Make Test Automation Scripts Fast Using HTML Parsing Frameworks



When test execution speed matters most and Selenium WebDriver scripts are too slow, an HTML parser library such as JSOUP can be used instead.

JSOUP is a Java library for working with real-world HTML.

It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.


Selenium WebDriver scripts are very slow

Selenium WebDriver scripts are slow because their speed depends on:
  • browser load time, which depends on how powerful the computer is
  • the site's performance, which depends on server hardware, number of concurrent users and site architecture
  • the performance of the internet connection

Let's take for example a simple script that implements the following test case:
  • open the home page of the http://www.vpl.ca site
  • do a keyword search
  • on the results page, click on the title link for the first result
  • on the details page, check that the book title and book author values exist

All code samples are just that, samples.

No page object classes are being used.

The driver and wait objects are assumed to be created in a setup method that is not shown.

@Test
public void test2() {

    driver.get("http://www.vpl.ca");

    WebElement searchField = wait.until(ExpectedConditions
        .visibilityOfElementLocated(By.xpath("//input[@id='globalQuery']")));

    searchField.click();
    searchField.sendKeys("java");

    WebElement searchButton = wait.until(ExpectedConditions
        .elementToBeClickable(By.xpath("//input[@class='search_button']")));

    searchButton.click();

    WebElement resultTitle = wait.until(ExpectedConditions
        .elementToBeClickable(By.xpath("(//a[@testid='bib_link'])[1]")));

    resultTitle.click();

    WebElement bookTitleElement = wait.until(ExpectedConditions
        .visibilityOfElementLocated(By.xpath("//h1[@id='item_bib_title']")));

    String bookTitleValue = bookTitleElement.getText();
    assertTrue(bookTitleValue.length() > 0);

    //some books do not have an author, so a timeout of the wait is ignored here
    //(TimeoutException is org.openqa.selenium.TimeoutException)
    try {
        WebElement bookAuthorElement = wait.until(ExpectedConditions
            .visibilityOfElementLocated(By.xpath("//a[@testid='author_search']")));

        String bookAuthorValue = bookAuthorElement.getText();
        assertTrue(bookAuthorValue.length() > 0);
    }
    catch (TimeoutException e) {
        //no author displayed for this book
    }

}


The test script runs correctly in about 15 seconds.

Let's assume that we want to create a script for another test case that does the same things as the previous one but for all book title links from the results page (10 links).

The script is a bit more complicated as it needs to iterate through all book title links:
  • open the home page of the http://www.vpl.ca site
  • do a keyword search
  • on the results page, do the following for each book title link
    • click on the title link
    • on the details page, check that the book title and book author values exist
    • go back
    • continue with the next link

@Test
public void test1() {

    driver.get("http://www.vpl.ca");

    WebElement searchField = wait.until(ExpectedConditions
        .visibilityOfElementLocated(By.xpath("//input[@id='globalQuery']")));

    searchField.click();
    searchField.sendKeys("java");

    WebElement searchButton = wait.until(ExpectedConditions
        .elementToBeClickable(By.xpath("//input[@class='search_button']")));

    searchButton.click();

    //repeat the details page checks for each of the 10 results
    for (int i = 1; i <= 10; i++) {

        WebElement resultTitle = wait.until(ExpectedConditions
            .elementToBeClickable(By.xpath("(//a[@testid='bib_link'])[" + i + "]")));

        resultTitle.click();

        WebElement bookTitleElement = wait.until(ExpectedConditions
            .visibilityOfElementLocated(By.xpath("//h1[@id='item_bib_title']")));

        String bookTitleValue = bookTitleElement.getText();
        assertTrue(bookTitleValue.length() > 0);

        //some books do not have an author, so a timeout of the wait is ignored here
        //(TimeoutException is org.openqa.selenium.TimeoutException)
        try {
            WebElement bookAuthorElement = wait.until(ExpectedConditions
                .visibilityOfElementLocated(By.xpath("//a[@testid='author_search']")));

            String bookAuthorValue = bookAuthorElement.getText();
            assertTrue(bookAuthorValue.length() > 0);
        }
        catch (TimeoutException e) {
            //no author displayed for this book
        }

        //return to the results page before continuing with the next link
        driver.navigate().back();
    }

}


The script runs successfully but it needs 98 seconds to complete.

Is there a way to make the second script faster?

The first script proves that the site navigation between the home page, result and details pages works well.

If we agree that the navigation is not important for the second script, we can implement it with the JSOUP HTML parser library instead of the Selenium WebDriver framework.


Implement time consuming automation scripts using the JSOUP library

Let's start cooking :)

We need the soup ingredients first: vegetables, herbs, oil ..............

Just kidding.



We will be cooking a different type of soup: JSOUP.

A few words about JSOUP (from the jsoup official site):

jsoup is a Java library for working with real-world HTML.
It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.

jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.
  1. scrape and parse HTML from a URL, file, or string
  2. find and extract data, using DOM traversal or CSS selectors
  3. manipulate the HTML elements, attributes, and text
  4. clean user-submitted content against a safe white-list, to prevent XSS attacks
  5. output tidy HTML

To use JSOUP, first download the JAR file from http://jsoup.org/ and add it to the project's build path.
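
Before pointing JSOUP at a live site, it can help to try the API on a small HTML string. The following is a minimal sketch, assuming the jsoup JAR is on the classpath. The HTML snippet, the class name JsoupBasics and the method firstTitleLink are made up for illustration and do not come from the vpl.ca site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupBasics {

    // Hypothetical HTML, loosely shaped like a search results list
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<span class='title'><a href='/item/show/1'>Java Basics</a></span>"
        + "<span class='title'><a href='/item/show/2'>Advanced Java</a></span>"
        + "</body></html>";

    // Returns the first link placed directly inside a span with the title class,
    // or null when the page has no such link
    static Element firstTitleLink(String html) {
        Document page = Jsoup.parse(html);
        return page.select("span.title > a").first();
    }

    public static void main(String[] args) {
        // Parse the string into a document and count the matching spans
        Document page = Jsoup.parse(SAMPLE_HTML);
        Elements titles = page.select("span.title");
        System.out.println("titles found: " + titles.size());

        // Read the text and the href attribute of the first title link
        Element link = firstTitleLink(SAMPLE_HTML);
        System.out.println(link.text() + " -> " + link.attr("href"));
    }
}
```

The same select, text and attr calls work on a document fetched from a URL, so a selector can be tried out this way before it is used against the real pages.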

The second script written with JSOUP looks like this:

@Test
public void test1() throws IOException {

    Document resultsPage = Jsoup.connect("https://vpl.bibliocommons.com/search?q=java&t=keyword").get();

    Elements titles = resultsPage.select("span.title");

    for (Element title : titles) {

        Element link = title.child(0);

        String detailsPageUrl = "https://vpl.bibliocommons.com" + link.attr("href");

        Document detailsPage = Jsoup.connect(detailsPageUrl).get();

        Elements bookTitle = detailsPage.getElementsByAttributeValue("testid", "text_bibtitle");

        if (bookTitle.size() > 0) {
            assertTrue(bookTitle.get(0).text().length() > 0);
        }

        Elements bookAuthor = detailsPage.getElementsByAttributeValue("testid", "author_search");

        if (bookAuthor.size() > 0) {
            assertTrue(bookAuthor.get(0).text().length() > 0);
        }

    }

}

Let's see what each line does:

//establishes a connection to the results page;
//uses the get method to get the page content and return a document object
Document resultsPage = Jsoup.connect("https://vpl.bibliocommons.com/search?q=java&t=keyword").get();

//selects all span elements that have the title class from the document object
Elements titles = resultsPage.select("span.title");

//for each span element from the list
for (Element title : titles) {

    //gets the first child element of the span element; this is the title link
    Element link = title.child(0);

    //gets the href attribute of the a element and
    //prepends the site address to build the details page url
    String detailsPageUrl = "https://vpl.bibliocommons.com" + link.attr("href");

    //establishes a connection to the details page
    //gets the page and returns a document object
    Document detailsPage = Jsoup.connect(detailsPageUrl).get();

    //finds all elements in the details page that have a testid attribute with the text_bibtitle value
    Elements bookTitle = detailsPage.getElementsByAttributeValue("testid", "text_bibtitle");

    //if any element was found, gets the first one using get(0) and its text using text()
    //asserts that the text length is > 0
    if (bookTitle.size() > 0)
        assertTrue(bookTitle.get(0).text().length() > 0);

    //finds all elements in the details page that have a testid attribute with the author_search value
    Elements bookAuthor = detailsPage.getElementsByAttributeValue("testid", "author_search");

    //if any element was found, gets the first one using get(0) and its text using text()
    //asserts that the text length is > 0
    if (bookAuthor.size() > 0)
        assertTrue(bookAuthor.get(0).text().length() > 0);

}
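
The script builds the details page URL by prepending the site address to the href value. JSOUP can also resolve relative links itself when the document has a base URI, which is set automatically when a page is fetched with Jsoup.connect(...).get(). Below is a minimal sketch of that alternative, using a made-up HTML string and a hypothetical helper named detailsUrls. The base URI is passed explicitly here only because the HTML is parsed from a string.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsoluteLinks {

    // Collects the absolute URL of every title link in the given results HTML
    static List<String> detailsUrls(String html, String baseUri) {
        Document page = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : page.select("span.title > a[href]")) {
            // absUrl resolves the href value against the document's base URI
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical results markup with one relative link
        String html = "<span class='title'><a href='/item/show/1'>Java</a></span>";
        System.out.println(detailsUrls(html, "https://vpl.bibliocommons.com"));
    }
}
```

Whether this is preferable to string concatenation is a judgment call. It mainly helps when a page mixes relative and absolute links.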

The script is simpler than the WebDriver one.

It does not interact with the site through a browser.

Instead, it uses HTTP requests to get the page data and then parses and navigates through that data. Because JSOUP does not execute JavaScript, this approach only works for content that is present in the HTML returned by the server.

The best part of this script is that it executes in 8 seconds!

Compare this with 98 seconds needed for executing the WebDriver script.
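
To put the two numbers side by side: 98 seconds against 8 seconds is a ratio of 98 / 8 = 12.25, so the JSOUP version is roughly 12 times faster on this run. Timings like these vary with the network and the machine, so it is worth measuring them yourself. Here is a small sketch of one way to do that with the standard library only. The class TimingSketch and its methods are hypothetical helpers, not part of JSOUP or Selenium.

```java
import java.time.Duration;

public class TimingSketch {

    // Runs the task once and returns how long it took
    static Duration timeIt(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return Duration.ofNanos(System.nanoTime() - start);
    }

    // Ratio between the slow and the fast duration
    static double speedup(Duration slow, Duration fast) {
        return (double) slow.toMillis() / fast.toMillis();
    }

    public static void main(String[] args) {
        // The two durations reported in this article
        Duration webDriverRun = Duration.ofSeconds(98);
        Duration jsoupRun = Duration.ofSeconds(8);
        System.out.println("speedup: " + speedup(webDriverRun, jsoupRun));

        // Timing an arbitrary task; the sleep stands in for a real test
        Duration elapsed = timeIt(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("elapsed ms: " + elapsed.toMillis());
    }
}
```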




Hope that you liked the soup recipe!

If you have any questions about the recipe, ingredients or the cook, please post them in the comments section.

The cook appreciates your interest in his recipes :)



2 Responses to "Make Test Automation Scripts Fast Using HTML Parsing Frameworks"

  1. Hi,

    Thanks for the nice article. However, do you think using JSOUP as a Selenium alternative is a good idea?

    1. In some cases, JSOUP is a better choice.

      Think about a case where you want to check that all links from a page work (are not broken).

      Or about another situation where you want to be sure that none of the images included in the page is broken.
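
A link check of this kind could be sketched as below. This is an illustration under assumptions, not a finished tool: the class LinkCheckSketch and its method names are made up, treating any status of 400 or above as broken is an arbitrary choice, and the URL in main is simply the site used in the article.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkCheckSketch {

    // Collects the distinct absolute http(s) URLs of all links in the document
    static Set<String> collectLinks(Document page) {
        Set<String> urls = new LinkedHashSet<>();
        for (Element link : page.select("a[href]")) {
            String url = link.absUrl("href");
            // skips mailto:, javascript: and links that cannot be resolved
            if (url.startsWith("http")) {
                urls.add(url);
            }
        }
        return urls;
    }

    // Returns true when the URL cannot be fetched or answers with status >= 400
    static boolean isBroken(String url) {
        try {
            int status = Jsoup.connect(url)
                    .ignoreHttpErrors(true)   // do not throw on 4xx/5xx responses
                    .ignoreContentType(true)  // links may point to non-HTML files
                    .execute()
                    .statusCode();
            return status >= 400;
        } catch (IOException | IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Document page = Jsoup.connect("http://www.vpl.ca").get();
        for (String url : collectLinks(page)) {
            System.out.println((isBroken(url) ? "BROKEN " : "ok     ") + url);
        }
    }
}
```

The image check would follow the same pattern, with the img[src] selector and absUrl("src") in place of the link selector.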
