How do I get the visible text of a web page with Selenium WebDriver, without the HTML tags?
I need something equivalent to the function HtmlPage.asText() from HtmlUnit.
It is not enough to fetch the markup with WebDriver.getPageSource() and parse it with jsoup, because the page may contain elements hidden by external CSS, and I am not interested in those.
13 Answers
Select the top element with By.tagName("body") (or some other selector that matches it), then call getText() on that element. It returns all of the visible text.
I can help you with C# Selenium.
By using this you can select all the text on that particular page and save it to a text file at your preferred location.
Make sure you have these using directives:
using System;
using System.IO;
using System.Text;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;
After reaching the particular page, try this code:
// Locate the body element once and read its visible text
IWebElement body = driver.FindElement(By.TagName("body"));
var result = body.Text;
// Folder location; a fixed date format avoids culture-specific '/' characters in the folder name
var dir = @"C:\Textfile" + DateTime.Now.ToString("yyyy-MM-dd");
// If the folder doesn't exist, create it
if (!Directory.Exists(dir))
    Directory.CreateDirectory(dir);
// Appends the page text to Copiedtext.txt, creating the file if it does not exist
File.AppendAllText(Path.Combine(dir, "Copiedtext.txt"), result);
I'm not sure what language you're using, but in C# the IWebElement object has a .Text property. It returns the text that is displayed between the element's opening and closing tags.
I would create an IWebElement using XPath to grab the entire page. In other words, you're grabbing the body element and looking at the text in it.
string pageText = driver.FindElement(By.XPath("/html/body")).Text;
If the above code does not work with your Selenium bindings (for example, in Java), use this:
String yourtext = driver.findElement(By.tagName("body")).getText();
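Since the question uses the Java bindings, the save-to-file step from the C# answer above could be sketched in Java roughly as follows. This is only an illustration, not code from any of the answers: the class name PageTextSaver and the method save are hypothetical, the folder and file names simply mirror the C# example, and the page text is assumed to have been obtained already with driver.findElement(By.tagName("body")).getText(). It needs Java 11 or later for Files.writeString.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;

// Hypothetical helper; the name and layout are assumptions for illustration.
class PageTextSaver {

    // Appends pageText to <baseDir>/Textfile<yyyy-MM-dd>/Copiedtext.txt
    // and returns the path of that file.
    static Path save(Path baseDir, String pageText, LocalDate date) throws IOException {
        // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd), so the folder
        // name contains no path separators.
        Path dir = baseDir.resolve("Textfile" + date);
        // Creates the folder if it is missing; does nothing if it already exists.
        Files.createDirectories(dir);
        Path file = dir.resolve("Copiedtext.txt");
        // CREATE + APPEND: create the file on first use, append afterwards.
        Files.writeString(file, pageText, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return file;
    }

    public static void main(String[] args) throws IOException {
        // In real use, replace the placeholder with the text read from the browser:
        // String pageText = driver.findElement(By.tagName("body")).getText();
        String pageText = "Example visible text";
        Path saved = save(Files.createTempDirectory("pagetext"), pageText, LocalDate.now());
        System.out.println("Saved to " + saved);
    }
}
```

Passing the base folder and the date as parameters keeps the helper independent of any particular drive layout, and it makes the behaviour easy to check without a browser session.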