
Android XML Adventure – Parsing HTML using JSoup


Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)

TABLE OF CONTENTS

  1. What is the “Thing” called XML?
  2. Parsing XML Data w/ SAXParser
  3. Parsing XML Data w/ DOMParser
  4. Parsing XML Data w/ XMLPullParser
  5. Create & Write XML Data
  6. Compare: XML Parsers
  7. Parsing XML using XPath
  8. Parsing HTML using HtmlCleaner
  9. Parsing HTML using JSoup
  10. Sample Project 1: RSS Parser – using SAXParser
  11. Sample Project 1: RSS Parser – using DOM Parser
  12. Sample Project 1: RSS Parser – using XMLPullParser
  13. Sample Project 2: HTML Parser – using HtmlCleaner
  14. Sample Project 2: HTML Parser – using JSoup
  15. Finalization on the “Thing” called XML!

=========================================

Another library commonly used for parsing HTML is JSoup.

Unlike HtmlCleaner, JSoup identifies nodes in the HTML tree with CSS-style (jQuery-like) selectors, built from tag names, IDs, classes, and attributes.

I suggest learning the basic syntax of JSoup selectors before continuing: http://jsoup.org/cookbook/extracting-data/selector-syntax
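To get a feel for the selector syntax without touching the network, here is a minimal sketch that parses a small HTML string with Jsoup.parse() and runs a few common selectors against it. The markup and the SelectorDemo class are hypothetical, and it assumes the JSoup JAR is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorDemo {

    // Hypothetical markup, used only to exercise the selectors below
    static final String HTML =
            "<div id='menu'>"
          + "<a class='nav' href='/home'>Home</a>"
          + "<a class='nav' href='/about'>About</a>"
          + "<a href='http://example.com/'>External</a>"
          + "</div>";

    static final Document DOC = Jsoup.parse(HTML);

    public static void main(String[] args) {
        // by tag name: every <a> in the document
        System.out.println(DOC.select("a").size());              // 3
        // by tag and class: <a> elements with class="nav"
        System.out.println(DOC.select("a.nav").size());          // 2
        // by attribute prefix: <a> whose href starts with "http"
        System.out.println(DOC.select("a[href^=http]").text());  // External
        // by ID, then descendant: <a> anywhere inside <div id="menu">
        System.out.println(DOC.select("div#menu a").size());     // 3

        // first() returns null when nothing matches, so check before use
        Element first = DOC.select("a.nav").first();
        if (first != null) {
            System.out.println(first.attr("href"));              // /home
        }
    }
}
```

Parsing a fixed string like this is a cheap way to try out a selector before pointing it at a live page.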

We will do the same thing as in the previous article: get the blog statistics, this time using JSoup.

The selector we need is: “div#blog-stats ul li”

Literally, it means: select every <li> node inside a <ul> node, which is itself inside a <div> whose ID is “blog-stats”.
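Note that the space in this selector is the descendant combinator: “ul li” matches an <li> at any depth below the <ul>, not only its direct children. The sketch below contrasts it with the child combinator “>”. The markup is a hypothetical imitation of the blog-stats widget (the real page may differ), with a nested list added on purpose; the CombinatorDemo class is also hypothetical, and the JSoup JAR is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CombinatorDemo {

    // Hypothetical markup modelled on the "blog-stats" widget; the nested
    // list exists only to show the difference between the two combinators
    static final String HTML =
            "<div id='blog-stats'>"
          + "<ul>"
          + "<li>12,345 hits</li>"
          + "<li>More<ul><li>nested item</li></ul></li>"
          + "</ul>"
          + "</div>";

    // Number of elements a selector matches in HTML
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // descendant combinator (space): all three <li>, nested one included
        System.out.println(count("div#blog-stats ul li"));      // 3
        // child combinator (>): only the <li> directly under the outer <ul>
        System.out.println(count("div#blog-stats > ul > li"));  // 2
    }
}
```

For the blog statistics either form works as long as the list is flat; the stricter “>” form is handy when a page nests lists inside each other.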

Download the JSoup library JAR and add it to your project via “Add External JARs” in the build path settings.

Now head straight to the source code that retrieves the value we want:

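package pete.android.study;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class JSoupStudyActivity extends Activity {

    // blog url
    static final String BLOG_URL = "http://xjaphx.wordpress.com/";

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // set layout view: res/layout/main.xml must contain a TextView
        // declared with android:id="@+id/tv"
        setContentView(R.layout.main);

        final TextView tv = (TextView) findViewById(R.id.tv);

        // process: apps targeting Android 3.0+ may not do network I/O on the
        // main (UI) thread (NetworkOnMainThreadException), so fetch the page
        // on a worker thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                String text;
                try {
                    text = getBlogStats();
                } catch (Exception ex) {
                    // show the cause instead of a bare "Error"
                    text = "Error: " + ex.getMessage();
                }
                final String result = text;
                // views may only be updated from the UI thread
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        tv.setText(result);
                    }
                });
            }
        }).start();
    }

    protected String getBlogStats() throws Exception {
        String result = "";
        // get html document structure
        Document document = Jsoup.connect(BLOG_URL).get();
        // selector query
        Elements nodeBlogStats = document.select("div#blog-stats ul li");
        // check results
        if (nodeBlogStats.size() > 0) {
            // get value: text of the first matching <li>
            result = nodeBlogStats.get(0).text();
        }

        // return
        return result;
    }
}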
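If the screen stays empty, the selector most likely matched nothing: JSoup's select() returns an empty Elements list rather than throwing, which is what happens when the page markup has changed. As a sketch, the helper below collects every matched item instead of only the first one, and can be tried against a static HTML string without a device or network connection. The StatsExtractor class and its markup are hypothetical, and the JSoup JAR is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StatsExtractor {

    // Collect the text of every element matched by the selector.
    // An empty list means the selector matched nothing; JSoup does not
    // throw in that case.
    static List<String> extractAll(Document document, String selector) {
        List<String> items = new ArrayList<String>();
        for (Element element : document.select(selector)) {
            items.add(element.text());
        }
        return items;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the downloaded page
        String html = "<div id='blog-stats'><ul>"
                + "<li>12,345 hits</li><li>67 posts</li>"
                + "</ul></div>";
        Document document = Jsoup.parse(html);

        // prints [12,345 hits, 67 posts]
        System.out.println(extractAll(document, "div#blog-stats ul li"));
        // prints [] because no element has the id "sidebar-stats"
        System.out.println(extractAll(document, "div#sidebar-stats li"));
    }
}
```

In the activity, the same helper could be applied to the Document returned by Jsoup.connect(BLOG_URL).get() inside getBlogStats().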

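Remember to add the INTERNET permission to AndroidManifest.xml, <uses-permission android:name="android.permission.INTERNET" />, otherwise the request fails. Here is the result on my Galaxy S II phone, which looks a little chocky-cocky: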

(Screenshot: JSoup Sample)

Not so much different from XPath, is it?
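For comparison, the same query written as XPath is //div[@id='blog-stats']//ul//li. As an illustration only, the sketch below evaluates it with the JDK's own javax.xml.xpath API (swapped in here; this is not the HtmlCleaner approach) on a small, well-formed, hypothetical snippet. Unlike JSoup, javax.xml requires valid XML, so a real web page would normally have to be cleaned up first. The XPathComparison class is hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathComparison {

    // Hypothetical, well-formed snippet (javax.xml rejects malformed markup)
    static final String XML =
            "<html><body><div id='blog-stats'><ul>"
          + "<li>12,345 hits</li><li>67 posts</li>"
          + "</ul></div></body></html>";

    // XPath counterpart of the JSoup selector "div#blog-stats ul li"
    static final String EXPR = "//div[@id='blog-stats']//ul//li";

    static String firstItem() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate(EXPR, doc, XPathConstants.NODESET);
        // same "take the first match" logic as getBlogStats()
        return items.getLength() > 0 ? items.item(0).getTextContent() : "";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstItem()); // 12,345 hits
    }
}
```

The “#id” and space in the CSS selector map to [@id='...'] and // in XPath; JSoup's version is shorter and also copes with messy real-world HTML.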

Cheers,

Pete Houston

Categories: Tutorials
  1. aqua
    April 20, 2013 at 7:23 pm | #1

    Can you tell me what I should put in the layout .xml, and what does “// set layout view” mean?

  2. Greg
    January 19, 2013 at 7:43 pm | #2

    2.3.3 = WORKS, 4.0.3 and 4.2 = ERROR. WHY? Can you help me?

  3. Lawrence Macharia
    January 3, 2013 at 12:06 pm | #4

    for those who did not manage to make the code work, try adding this permission in your manifest

  4. December 21, 2012 at 10:54 pm | #5

    I always get the text “Error”, although I have followed this tutorial.
    Can anyone help me?

  5. ravi
    October 30, 2012 at 1:54 am | #6

    Thanx dude, really helpful :)

  6. cvele
    September 20, 2012 at 12:39 am | #7

    This is not working, I get nothing, any ideas???

  7. Mehrin Anannya
    June 11, 2012 at 3:13 pm | #8

    This is really interesting to work.
    Can anyone help me get all the text under the ul of div#subNavigation from the website http://www.juniv.edu/news? plz…….

  8. Mehrin Anannya
    June 11, 2012 at 3:09 pm | #9

    Can anyone help me?
    I want to get all the text under the ul of the div#subNavigation from the website http://www.juniv.edu/news. How can I do this?

  9. jhon
    April 30, 2012 at 4:21 pm | #10

    Can anybody help me..

  10. jhon
    April 30, 2012 at 4:20 pm | #11

    It's not working for me

  11. April 8, 2012 at 2:29 pm | #12

    This is pretty awesome! I needed to check the state of a switch for a remote-controlled garage door opener I’m working on, and this seems to be the way to go about it. Do you know if this will work with IP addresses? I really don’t want to put the button that activates the switch on a website, just on my local LAN (192.168.1.250, for example). Or do you know how to search HTML files locally? For example, if you made a test site in Notepad and wanted to play around with it: file:///C:/Users/User/Desktop/test.html

  12. manivannan
    February 24, 2012 at 1:54 am | #13

    Extraordinary work.
