Archive

Archive for February, 2012

Weather sucks totally

February 27, 2012 Leave a comment

Weather these days totally goes crazy, one day freakin’ hot, one day freezing like hell…the pattern has been started up for two weeks…Consequently, I’ve got a fever today….

Oh Mr. Weather, when can you stop these madness? :(, I need some health to work on my ICS Launcher..

Advertisements
Categories: Of Diary Tags: ,

Android XML Adventure – Parsing HTML using JSoup

February 4, 2012 22 comments

Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)

TABLE OF CONTENTS

  1. What is the “Thing” called XML?
  2. Parsing XML Data w/ SAXParser
  3. Parsing XML Data w/ DOMParser
  4. Parsing XML Data w/ XMLPullParser
  5. Create & Write XML Data
  6. Compare: XML Parsers
  7. Parsing XML using XPath
  8. Parsing HTML using HtmlCleaner
  9. Parsing HTML using JSoup
  10. Sample Project 1: RSS Parser – using SAXParser
  11. Sample Project 1: RSS Parser – using DOM Parser
  12. Sample Project 1: RSS Parser – using XMLPullParser
  13. Sample Project 2: HTML Parser – using HtmlCleaner
  14. Sample Project 2: HTML Parser – using JSoup
  15. Finalization on the “Thing” called XML!

=========================================

Another library used common for parsing HTML is JSoup.

Unlike HtmlCleaner, JSoup uses the concept of attributes as a selector to identify each node in HTML tree.

I suggest you should learn the basics syntax of JSoup selector before continue, http://jsoup.org/cookbook/extracting-data/selector-syntax

Well, we will do the same thing as previous article, we get the blog statistics using JSoup.

The syntax is like this: ” div#blog-stats ul li

Literally, it means: select the node <li> inside node <ul> , which has parent is a <div> having ID value is “blog-stats“.

Download the libary JSoup and add it as “External JARs”.

Head straight to the source code to get our desire value:

package pete.android.study;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class JSoupStudyActivity extends Activity {

	// blog url
	static final String BLOG_URL = "https://xjaphx.wordpress.com/";

    @Override
    public void onCreate(Bundle savedInstanceState) {
    	// set layout view
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        // process
        try {
        	((TextView)findViewById(R.id.tv)).setText(getBlogStats());
        } catch (Exception ex) {
        	((TextView)findViewById(R.id.tv)).setText("Error");
        }
    }

    protected String getBlogStats() throws Exception {
    	String result = "";
    	// get html document structure
    	Document document = Jsoup.connect(BLOG_URL).get();
    	// selector query
    	Elements nodeBlogStats = document.select("div#blog-stats ul li");
    	// check results
    	if(nodeBlogStats.size() > 0) {
    		// get value
    		result = nodeBlogStats.get(0).text();
    	}

    	// return
    	return result;
    }
}

Remember to add INTERNET permission. Here the result on my Galaxy S II phone, which is a little chocky-cocky:

JSoup Sample

JSoup Sample

Not so much different from XPath, is it?

Cheers,

Pete Houston

Categories: Tutorials Tags: , , , , , ,

Android XML Adventure – Parsing HTML using HtmlCleaner

February 4, 2012 26 comments

Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)

TABLE OF CONTENTS

  1. What is the “Thing” called XML?
  2. Parsing XML Data w/ SAXParser
  3. Parsing XML Data w/ DOMParser
  4. Parsing XML Data w/ XMLPullParser
  5. Create & Write XML Data
  6. Compare: XML Parsers
  7. Parsing XML using XPath
  8. Parsing HTML using HtmlCleaner
  9. Parsing HTML using JSoup
  10. Sample Project 1: RSS Parser – using SAXParser
  11. Sample Project 1: RSS Parser – using DOM Parser
  12. Sample Project 1: RSS Parser – using XMLPullParser
  13. Sample Project 2: HTML Parser – using HtmlCleaner
  14. Sample Project 2: HTML Parser – using JSoup
  15. Finalization on the “Thing” called XML!

=========================================

After a long time, now I’d like to come back to the series, sorry guys for make you all waiting.

In this article, I will give a simple guide on how to use HtmlCleaner to parse HTML data in XPath format.

You might get to know what XPath is already and learned how to use XPath library on Android system.

This time, we will use a XPath to query the value we desire to have from an HTML page not XML file, interesting, isn’t it?

The HTML Page target is my blog: https://xjaphx.wordpress.com/

The data will be desired to parse is the “Statistics“, number of Views on my blog, which is on the bottom-right side of the blog. The current number is: 80,303 views.

The XPath for this is: “//div[@id=’blog-stats’]/ul/li

First, get the HtmlCleaner library and set it up, get it from here: http://htmlcleaner.sourceforge.net/

Open your Eclipse and create new project, then right click to the project on the left pane, select Properties.

HtmlCleaner Setup

HtmlCleaner Setup

Ok, on tab Libraries, click button “Add External JARs” on the right side, a dialog to select JAR files open up, select the HtmlCleaner library, then click button Open. It’s done for setting up the library.

Next is the layout of application, I use the default one, only one TextView, well, just enough to confirm value.

Let’s get straight to the source code 🙂

package pete.android.study;

import java.net.URL;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class HtmlCleanerStudyActivity extends Activity {

	// HTML page
	static final String BLOG_URL = "https://xjaphx.wordpress.com/";
	// XPath query
	static final String XPATH_STATS = "//div[@id='blog-stats']/ul/li";

    @Override
    public void onCreate(Bundle savedInstanceState) {
    	// init view layout
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        // decide output
        String value = "";
        try {
        	value = getBlogStats();
        	((TextView)findViewById(R.id.tv)).setText(value);
        } catch(Exception ex) {
        	((TextView)findViewById(R.id.tv)).setText("Error");
        }
    }

    /*
     * get blog statistics
     */
    public String getBlogStats() throws Exception {
    	String stats = "";

    	// config cleaner properties
    	HtmlCleaner htmlCleaner = new HtmlCleaner();
    	CleanerProperties props = htmlCleaner.getProperties();
    	props.setAllowHtmlInsideAttributes(false);
    	props.setAllowMultiWordAttributes(true);
    	props.setRecognizeUnicodeChars(true);
    	props.setOmitComments(true);

    	// create URL object
    	URL url = new URL(BLOG_URL);
    	// get HTML page root node
    	TagNode root = htmlCleaner.clean(url);

    	// query XPath
    	Object[] statsNode = root.evaluateXPath(XPATH_STATS);
    	// process data if found any node
    	if(statsNode.length > 0) {
    		// I already know there's only one node, so pick index at 0.
    		TagNode resultNode = (TagNode)statsNode[0];
    		// get text data from HTML node
    		stats = resultNode.getText().toString();
    	}

    	// return value
    	return stats;
    }
}

Also, remember to set INTERNET permission as well on AndroidManifest.xml

Run it, and get the result:

HtmlCleaner Output

HtmlCleaner Output

It’s the output from my phone: Galaxy S II.

This library is simple and pretty fast and I’d like to use it. If you know any other better libraries, please let me know, I’d like to get it too.

In case you have some trouble, you can get this full source code: Get HtmlCleaner Sample Project

Cheers,

Pete Houston

Categories: Tutorials Tags: , , , ,