Posts Tagged ‘tutorial’

Android XML Adventure – Parsing HTML using HtmlCleaner

February 4, 2012

Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)


  1. What is the “Thing” called XML?
  2. Parsing XML Data w/ SAXParser
  3. Parsing XML Data w/ DOMParser
  4. Parsing XML Data w/ XMLPullParser
  5. Create & Write XML Data
  6. Compare: XML Parsers
  7. Parsing XML using XPath
  8. Parsing HTML using HtmlCleaner
  9. Parsing HTML using JSoup
  10. Sample Project 1: RSS Parser – using SAXParser
  11. Sample Project 1: RSS Parser – using DOM Parser
  12. Sample Project 1: RSS Parser – using XMLPullParser
  13. Sample Project 2: HTML Parser – using HtmlCleaner
  14. Sample Project 2: HTML Parser – using JSoup
  15. Finalization on the “Thing” called XML!


After a long break, I’d like to come back to the series. Sorry for keeping you all waiting.

In this article, I will give a simple guide on how to use HtmlCleaner to parse HTML data with XPath queries.

You probably already know what XPath is and have learned how to use the XPath library on Android from the earlier article in this series.

This time, we will use XPath to query the value we want from an HTML page rather than an XML file. Interesting, isn’t it?
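To see why an HTML page usually needs a cleaner before XPath can be applied to it, here is a minimal sketch in plain Java, with no Android or HtmlCleaner classes involved. It assumes the strict XML parser from the Java standard library and two made-up markup snippets; the class name `WellFormedSketch` and the method `isWellFormedXml` are hypothetical names used only for this illustration. Real-world HTML often leaves tags unclosed, and a strict XML parser rejects that.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedSketch {

    // Returns true when a strict XML parser accepts the markup.
    static boolean isWellFormedXml(String markup) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException ex) {
            // the markup is not well-formed XML
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // every tag is closed: accepted
        System.out.println(isWellFormedXml("<ul><li>a</li><li>b</li></ul>")); // true
        // <li> tags left open, as browsers tolerate: rejected
        System.out.println(isWellFormedXml("<ul><li>a<li>b</ul>"));           // false
    }
}
```

A cleaner such as HtmlCleaner repairs markup of the second kind into a proper tree, and that tree is what the XPath query runs against.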

The HTML Page target is my blog:

The data we want to parse is the “Statistics” widget, the number of views on my blog, which sits at the bottom-right of the page. The current number is: 80,303 views.

The XPath for this is: “//div[@id='blog-stats']/ul/li”
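To make the query concrete, here is a small sketch of what this XPath selects. It uses the standard javax.xml.xpath API on a hand-written, well-formed snippet, so it runs without HtmlCleaner. The markup in `SAMPLE` is an assumption about how the stats widget might look (the real page may differ), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathStatsSketch {

    // Assumed stand-in for the blog sidebar; not copied from the real page.
    static final String SAMPLE =
            "<html><body>"
            + "<div id=\"sidebar\"><ul><li>not this one</li></ul></div>"
            + "<div id=\"blog-stats\"><ul><li>80,303 views</li></ul></div>"
            + "</body></html>";

    // Same query as in the article
    static final String XPATH_STATS = "//div[@id='blog-stats']/ul/li";

    // Evaluates the query and returns the text of the first matching node.
    static String queryStats(String markup) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(XPATH_STATS,
                new InputSource(new StringReader(markup)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(queryStats(SAMPLE)); // 80,303 views
    }
}
```

Reading the query from left to right: `//div` looks for a div anywhere in the document, `[@id='blog-stats']` keeps only the one with that id, and `/ul/li` steps down to the list item inside it.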

First, download the HtmlCleaner library and set it up. Get it from here:

Open Eclipse and create a new project, then right-click the project in the left pane and select Properties.

HtmlCleaner Setup

In the Properties dialog, go to Java Build Path and open the Libraries tab. Click the “Add External JARs” button on the right; in the file dialog that opens, select the HtmlCleaner JAR, then click Open. That is all it takes to set up the library.

Next is the application layout. I use the default one with a single TextView, which is enough to confirm the value.

Let’s get straight to the source code 🙂



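import java.net.URL;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class HtmlCleanerStudyActivity extends Activity {

    // HTML page (fill in the address of the page to parse)
    static final String BLOG_URL = "";
    // XPath query
    static final String XPATH_STATS = "//div[@id='blog-stats']/ul/li";

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // init view layout
        // NOTE: R.layout.main and R.id.text are assumed names; use the ones from your project
        setContentView(R.layout.main);
        final TextView textView = (TextView) findViewById(R.id.text);

        // run the download on a worker thread: apps targeting Android 3.0 or later
        // get a NetworkOnMainThreadException for network calls on the UI thread
        new Thread(new Runnable() {
            public void run() {
                // decide output
                String value = "";
                try {
                    value = getBlogStats();
                } catch (Exception ex) {
                    // assumed handling: show the error instead of the statistics
                    value = "Error: " + ex;
                }
                // views may only be touched from the UI thread
                final String output = value;
                runOnUiThread(new Runnable() {
                    public void run() {
                        textView.setText(output);
                    }
                });
            }
        }).start();
    }

    /**
     * get blog statistics
     */
    public String getBlogStats() throws Exception {
        String stats = "";

        // config cleaner properties (the defaults are kept as they are)
        HtmlCleaner htmlCleaner = new HtmlCleaner();
        CleanerProperties props = htmlCleaner.getProperties();

        // create URL object
        URL url = new URL(BLOG_URL);
        // get HTML page root node
        TagNode root = htmlCleaner.clean(url);

        // query XPath
        Object[] statsNode = root.evaluateXPath(XPATH_STATS);
        // process data if found any node
        if (statsNode.length > 0) {
            // I already know there's only one node, so pick index at 0.
            TagNode resultNode = (TagNode) statsNode[0];
            // get text data from HTML node
            stats = resultNode.getText().toString();
        }

        // return value
        return stats;
    }
}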

Also, remember to declare the INTERNET permission in AndroidManifest.xml, that is, a `<uses-permission>` element whose android:name is android.permission.INTERNET.

Run it, and get the result:

HtmlCleaner Output

This is the output on my phone, a Galaxy S II.
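The value that comes back is plain text. If you need it as a number, for example to compare it against an earlier reading, a small helper can strip everything that is not a digit. This is only a sketch: it assumes the text looks like the “80,303 views” quoted earlier, and `parseViewCount` is a hypothetical helper that is not part of the sample project.

```java
class StatsTextSketch {

    // Keeps only the digits of the stats text and parses them as a number.
    static long parseViewCount(String statsText) {
        String digits = statsText.replaceAll("[^0-9]", "");
        if (digits.isEmpty()) {
            throw new NumberFormatException("no digits in: " + statsText);
        }
        return Long.parseLong(digits);
    }

    public static void main(String[] args) {
        System.out.println(parseViewCount("80,303 views")); // 80303
    }
}
```

Stripping non-digits keeps the helper independent of the thousands separator, but it would also merge unrelated numbers if the text ever contained more than one, so treat it as a starting point.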

This library is simple and pretty fast, and I like using it. If you know of any better libraries, please let me know; I’d like to try them too.

If you run into trouble, you can get the full source code: Get HtmlCleaner Sample Project


Pete Houston

Categories: Tutorials

Set icon for Android application

July 13, 2011

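It’s very simple! All you need to do is specify the icon you want in AndroidManifest.xml, using the android:icon attribute of the `<application>` element (for example android:icon="@drawable/icon", where icon is the image file name without its extension).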

Set Application Icon

Remember to put the icon file (PNG or JPG) in the res/drawable directory.

Hope you like it!



Pete Houston