Posts Tagged ‘xpath’

Android XML Adventure – Parsing HTML using JSoup

February 4, 2012 22 comments

Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)

TABLE OF CONTENTS

  1. What is the “Thing” called XML?
  2. Parsing XML Data w/ SAXParser
  3. Parsing XML Data w/ DOMParser
  4. Parsing XML Data w/ XMLPullParser
  5. Create & Write XML Data
  6. Compare: XML Parsers
  7. Parsing XML using XPath
  8. Parsing HTML using HtmlCleaner
  9. Parsing HTML using JSoup
  10. Sample Project 1: RSS Parser – using SAXParser
  11. Sample Project 1: RSS Parser – using DOM Parser
  12. Sample Project 1: RSS Parser – using XMLPullParser
  13. Sample Project 2: HTML Parser – using HtmlCleaner
  14. Sample Project 2: HTML Parser – using JSoup
  15. Finalization on the “Thing” called XML!

=========================================

Another library commonly used for parsing HTML is JSoup.

Unlike HtmlCleaner, which we queried with XPath, JSoup identifies nodes in the HTML tree with CSS-style selectors built from tag names, IDs, classes, and attributes.

I suggest you learn the basic syntax of JSoup selectors before continuing: http://jsoup.org/cookbook/extracting-data/selector-syntax

We will do the same thing as in the previous article: get the blog statistics, this time using JSoup.

The selector looks like this: “div#blog-stats ul li”

Literally, it means: select every <li> node inside a <ul> node that is itself inside the <div> whose ID is “blog-stats”.

Download the JSoup library and add it to the project as an external JAR (“Add External JARs”).

Let’s head straight to the source code to get the value we want:

package pete.android.study;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class JSoupStudyActivity extends Activity {

	// blog url
	static final String BLOG_URL = "https://xjaphx.wordpress.com/";

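    @Override
    public void onCreate(Bundle savedInstanceState) {
        // set layout view
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        final TextView tv = (TextView)findViewById(R.id.tv);

        // process on a worker thread: apps targeting Android 3.0+ get a
        // NetworkOnMainThreadException when they touch the network on the UI thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                String text;
                try {
                    text = getBlogStats();
                } catch (Exception ex) {
                    text = "Error";
                }
                final String output = text;
                // views may only be updated from the UI thread
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        tv.setText(output);
                    }
                });
            }
        }).start();
    }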

    protected String getBlogStats() throws Exception {
    	String result = "";
    	// get html document structure
    	Document document = Jsoup.connect(BLOG_URL).get();
    	// selector query
    	Elements nodeBlogStats = document.select("div#blog-stats ul li");
    	// check results
    	if(nodeBlogStats.size() > 0) {
    		// get value
    		result = nodeBlogStats.get(0).text();
    	}

    	// return
    	return result;
    }
}

Remember to add the INTERNET permission. Here is the result on my Galaxy S II phone, which is a little chocky-cocky:

JSoup Sample

Not so much different from XPath, is it?
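To make the comparison concrete, here is a small sketch in plain Java. It does not use JSoup at all: it evaluates XPath with the standard `javax.xml.xpath` package against a made-up, well-formed snippet. The spaces in “div#blog-stats ul li” mean “descendant at any depth”, so the closest XPath counterpart uses `//` between steps, while the XPath from the HtmlCleaner article uses `/` (direct child). The class name `SelectorVsXPath`, the method `countMatches`, and the markup are assumptions for illustration only.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical comparison helper, for illustration only.
class SelectorVsXPath {

    // Made-up markup: the <ul> is wrapped in an extra <div>,
    // so it is a descendant of div#blog-stats but not a direct child.
    static final String MARKUP =
            "<div id=\"blog-stats\"><div><ul><li>80,303 views</li></ul></div></div>";

    // Counts the nodes that an XPath expression selects in MARKUP.
    static int countMatches(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(MARKUP)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException ex) {
            throw new IllegalStateException("Invalid XPath: " + expression, ex);
        }
    }

    public static void main(String[] args) {
        // Child steps, as in the HtmlCleaner article: the wrapper <div> breaks the match
        System.out.println(countMatches("//div[@id='blog-stats']/ul/li"));
        // Descendant steps, like the spaces in "div#blog-stats ul li"
        System.out.println(countMatches("//div[@id='blog-stats']//ul//li"));
    }
}
```

With this markup, the child-step expression selects nothing and the descendant-step expression selects the single `<li>`. On the real blog page both forms may well return the same node; the difference only shows up when extra wrapper elements sit in between.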

Cheers,

Pete Houston

Categories: Tutorials

Android XML Adventure – Parsing HTML using HtmlCleaner

February 4, 2012 26 comments

Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)

=========================================

After a long time, I’d like to come back to the series. Sorry, guys, for making you all wait.

In this article, I will give a simple guide on how to use HtmlCleaner to query HTML data with XPath.

You probably already know what XPath is and have learned how to use the XPath library on Android.

This time, we will use an XPath expression to query the value we want from an HTML page instead of an XML file. Interesting, isn’t it?

The target HTML page is my blog: https://xjaphx.wordpress.com/

The data we want to parse is the “Statistics“ value, the number of views on my blog, shown at the bottom-right of the page. The current number is: 80,303 views.

The XPath for this is: “//div[@id='blog-stats']/ul/li”
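To see what this expression selects, here is a minimal sketch in plain Java. It uses the standard `javax.xml.xpath` package instead of HtmlCleaner, and it only works because the sample markup is well-formed XML; real pages usually are not, which is why HtmlCleaner is used in this article. The class `BlogStatsXPath`, the method `firstMatch`, and the `PAGE` markup are made up for illustration and are not the blog's actual HTML.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class BlogStatsXPath {

    // Made-up, well-formed stand-in for the blog's sidebar markup.
    static final String PAGE = "<html><body>"
            + "<div id=\"archives\"><ul><li>February 2012</li></ul></div>"
            + "<div id=\"blog-stats\"><ul><li>80,303 views</li></ul></div>"
            + "</body></html>";

    // Returns the text of the first node selected by the expression,
    // or an empty string when nothing matches.
    static String firstMatch(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException ex) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, ex);
        }
    }

    public static void main(String[] args) {
        // Only the <li> under the <div> whose id is "blog-stats" is selected
        System.out.println(firstMatch(PAGE, "//div[@id='blog-stats']/ul/li"));
    }
}
```

The `[@id='blog-stats']` predicate is what skips the other `<div>`: without it, `//div/ul/li` would match the list items of every sidebar block.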

First, get the HtmlCleaner library from here and set it up: http://htmlcleaner.sourceforge.net/

Open Eclipse and create a new project, then right-click the project in the left pane and select Properties.

HtmlCleaner Setup

Ok, in the Properties dialog, open “Java Build Path” and go to the Libraries tab. Click the “Add External JARs” button on the right side; a dialog for selecting JAR files opens. Select the HtmlCleaner library, then click Open. The library is now set up.

Next is the layout of the application. I use the default one with a single TextView, which is just enough to confirm the value.

Let’s get straight to the source code 🙂

package pete.android.study;

import java.net.URL;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class HtmlCleanerStudyActivity extends Activity {

	// HTML page
	static final String BLOG_URL = "https://xjaphx.wordpress.com/";
	// XPath query
	static final String XPATH_STATS = "//div[@id='blog-stats']/ul/li";

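    @Override
    public void onCreate(Bundle savedInstanceState) {
        // init view layout
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        final TextView tv = (TextView)findViewById(R.id.tv);

        // fetch on a worker thread: apps targeting Android 3.0+ get a
        // NetworkOnMainThreadException when they touch the network on the UI thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                // decide output
                String value;
                try {
                    value = getBlogStats();
                } catch(Exception ex) {
                    value = "Error";
                }
                final String output = value;
                // views may only be updated from the UI thread
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        tv.setText(output);
                    }
                });
            }
        }).start();
    }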

    /*
     * get blog statistics
     */
    public String getBlogStats() throws Exception {
    	String stats = "";

    	// config cleaner properties
    	HtmlCleaner htmlCleaner = new HtmlCleaner();
    	CleanerProperties props = htmlCleaner.getProperties();
    	props.setAllowHtmlInsideAttributes(false);
    	props.setAllowMultiWordAttributes(true);
    	props.setRecognizeUnicodeChars(true);
    	props.setOmitComments(true);

    	// create URL object
    	URL url = new URL(BLOG_URL);
    	// get HTML page root node
    	TagNode root = htmlCleaner.clean(url);

    	// query XPath
    	Object[] statsNode = root.evaluateXPath(XPATH_STATS);
    	// process data if found any node
    	if(statsNode.length > 0) {
    		// I already know there's only one node, so pick index at 0.
    		TagNode resultNode = (TagNode)statsNode[0];
    		// get text data from HTML node
    		stats = resultNode.getText().toString();
    	}

    	// return value
    	return stats;
    }
}

Also, remember to declare the INTERNET permission in AndroidManifest.xml.
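The permission is declared as a direct child of the `<manifest>` element, outside `<application>`. A minimal fragment, with the rest of the manifest omitted:

```xml
<uses-permission android:name="android.permission.INTERNET" />
```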

Run it, and get the result:

HtmlCleaner Output

It’s the output from my phone: Galaxy S II.

This library is simple and pretty fast, and I like using it. If you know of any better libraries, please let me know; I’d like to try them too.

In case you have some trouble, you can get this full source code: Get HtmlCleaner Sample Project

Cheers,

Pete Houston

Categories: Tutorials

Android XML Adventure – Parsing XML using XPath

December 24, 2011 4 comments

Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)

=========================================

XPath is a syntax for querying nodes directly, by tag name, by attribute such as `id`, or with one of its built-in functions. It is especially useful when you need to pull many values out of a document as a list.

You might want to study XPath a bit first: W3Schools – XPath Tutorials

Specifically, we will apply XPath on the Android platform, which ships an XPath library in the `javax.xml.xpath` package as part of the framework. Reference: Android XPath Library.

Using XPath is pretty simple:

1. Create an `InputSource` object from a `String`, an `InputStream`, `Resources`, `Assets`, …

2. Create an `XPath` object.

3. Define your XPath expression, which is a `String`.

4. Evaluate the expression against the `InputSource` created in step 1.

5. Read the data out of the evaluation result.
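The five steps above can be sketched in plain Java, outside of Android. This is a minimal sketch under a few assumptions: the class name `XPathSteps`, the method `namesIn`, and the inline XML in `main` are made up for illustration, and the `InputSource` is built from a `String` rather than from a resource.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class XPathSteps {

    // Returns the text of every <name> element found in the given XML string.
    static List<String> namesIn(String xml) {
        // Step 1: create an InputSource, here from a String
        InputSource source = new InputSource(new StringReader(xml));
        // Step 2: create an XPath object
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Step 3: define the XPath expression
        String expression = "//name";
        List<String> names = new ArrayList<String>();
        try {
            // Step 4: evaluate the expression against the InputSource
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, source, XPathConstants.NODESET);
            // Step 5: read the data out of the resulting nodes
            for (int i = 0; i < nodes.getLength(); ++i) {
                names.add(nodes.item(i).getTextContent());
            }
        } catch (XPathExpressionException ex) {
            throw new IllegalStateException("XPath evaluation failed", ex);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<list><person><name>Pete Houston</name></person>"
                + "<person><name>Nina Jones</name></person></list>";
        System.out.println(namesIn(xml));
    }
}
```

Asking for `XPathConstants.NODESET` is what makes the result a `NodeList`; the other constants (`STRING`, `NUMBER`, `BOOLEAN`, `NODE`) return a single value instead.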

That’s it. Here is an example: I have the following XML file, `data.xml`, placed in the `/res/raw` folder.

<?xml version="1.0" encoding="UTF-8"?>
<sample>
	<info>
		<title>Using XPath to parse XML</title>
		<author>Pete Houston</author>
	</info>
	<list>
		<person id="1">
			<name>Pete Houston</name>
			<age>28</age>
		</person>

		<person id="2">
			<name>Nina Jones</name>
			<age>27</age>
		</person>

		<person id="3">
			<name>Yumin Hanazuki</name>
			<age>22</age>
		</person>
	</list>
</sample>

The following code goes through the 5 steps above and displays the data in the UI.

package pete.android.tutorial.xml.xpath;

import java.util.ArrayList;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import android.app.ListActivity;
import android.os.Bundle;
import android.widget.ArrayAdapter;
import android.widget.Toast;

public class XPathStudyActivity extends ListActivity {
    // data
	ArrayList<String> mPeople = new ArrayList<String>();

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        try {
        	parseData();
        } catch(Exception ex) {
        	Toast.makeText(this, "Exception: " + ex.getMessage(), Toast.LENGTH_LONG).show();
        }

        // pass adapter w/ data queried through XPath to ListView
        ArrayAdapter<String> adapter = new ArrayAdapter<String>(this, android.R.layout.simple_list_item_1, mPeople);
        setListAdapter(adapter);
    }

    private void parseData() throws Exception {
        // step 1: create an InputSource object from /res/raw/data.xml
        InputSource inputSrc = new InputSource(getResources().openRawResource(R.raw.data));
        // step 2: create the XPath instance, this is the parser
        XPath xpath = XPathFactory.newInstance().newXPath();
        // step 3: specify the XPath expression
        String expression = "//name";
        // step 4: evaluate the expression to get the list of matching nodes
        NodeList nodes = (NodeList)xpath.evaluate(expression, inputSrc, XPathConstants.NODESET);

        // check for null before touching the result
        int len = (nodes != null) ? nodes.getLength() : 0;
        Toast.makeText(this, "count: " + len, Toast.LENGTH_SHORT).show();

        // step 5: if any node was found, read the text of each one
        if(len > 0) {
            mPeople.clear();
            for(int i = 0; i < len; ++i) {
                Node node = nodes.item(i);
                mPeople.add(node.getTextContent());
            }
        }
    }
}

In the sample above, I query all `name` tags from the `data.xml` file and display them in the list. The XPath library is very simple to use!
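Beyond `//name`, the same steps work with expressions that filter by attribute or call a built-in function. Here is a small sketch in plain Java, assuming a trimmed-down copy of `data.xml` held in a string; the class `XPathQueries` and its method names are hypothetical and not part of the sample project.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class XPathQueries {

    // Trimmed-down copy of the <list> part of data.xml
    static final String XML = "<sample><list>"
            + "<person id=\"1\"><name>Pete Houston</name><age>28</age></person>"
            + "<person id=\"2\"><name>Nina Jones</name><age>27</age></person>"
            + "<person id=\"3\"><name>Yumin Hanazuki</name><age>22</age></person>"
            + "</list></sample>";

    // Evaluates an expression and returns the text of the first match.
    static String evalString(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException ex) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, ex);
        }
    }

    // Evaluates an expression that yields a number, such as count(...).
    static double evalNumber(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (XPathExpressionException ex) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, ex);
        }
    }

    public static void main(String[] args) {
        // Filter by attribute: the person whose id is 2
        System.out.println(evalString("//person[@id='2']/name"));
        // Filter by child value: people younger than 25
        System.out.println(evalString("//person[age < 25]/name"));
        // Built-in function: how many <person> nodes there are
        System.out.println(evalNumber("count(//person)"));
    }
}
```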

Have fun,
Pete Houston

Categories: Tutorials

Android XML Adventure – What is the “Thing” called XML?

October 9, 2011 1 comment

Currently I’m working on XML data storage for an Android application. It’s quite interesting, so I’ve decided to turn it into a series.

– What is the thing called XML?

Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards.

The design goals of XML emphasize simplicity, generality, and usability over the Internet. It is a textual data format with strong support via Unicode for the languages of the world. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services.

Many application programming interfaces (APIs) have been developed that software developers use to process XML data, and several schema systems exist to aid in the definition of XML-based languages.

As of 2009, hundreds of XML-based languages have been developed, including RSS, Atom, SOAP, and XHTML. XML-based formats have become the default for most office-productivity tools, including Microsoft Office (Office Open XML), OpenOffice.org (OpenDocument), and Apple’s iWork.

(Quoted from Wikipedia: http://en.wikipedia.org/wiki/XML)

– As you can see, XML is really useful and is applied everywhere on the Internet nowadays, so you’d better know more about it.

– In Android, XML is used for resources such as layouts and strings (localization), for the predefined SharedPreferences storage, or as a custom database…

– In this series, “Android XML Adventure“, I will show you how to handle XML files in Android.

TABLE OF CONTENTS

  1. What is the “Thing” called XML?
  2. Parsing XML Data w/ SAXParser
  3. Parsing XML Data w/ DOMParser
  4. Parsing XML Data w/ XMLPullParser
  5. Create & Write XML Data
  6. Compare: XML Parsers
  7. Parsing XML using XPath
  8. Parsing HTML using HtmlCleaner
  9. Parsing HTML using JSoup
  10. Sample Project 1: RSS Parser – using SAXParser
  11. Sample Project 1: RSS Parser – using DOM Parser
  12. Sample Project 1: RSS Parser – using XMLPullParser
  13. Sample Project 2: HTML Parser – using HtmlCleaner
  14. Sample Project 2: HTML Parser – using JSoup
  15. Finalization on the “Thing” called XML!

Stay tuned: this series will make you fall in love with XML for real 🙂

Cheers,

Pete Houston