Android XML Adventure – Parsing HTML using JSoup


Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)

TABLE OF CONTENTS

  1. What is the “Thing” called XML?
  2. Parsing XML Data w/ SAXParser
  3. Parsing XML Data w/ DOMParser
  4. Parsing XML Data w/ XMLPullParser
  5. Create & Write XML Data
  6. Compare: XML Parsers
  7. Parsing XML using XPath
  8. Parsing HTML using HtmlCleaner
  9. Parsing HTML using JSoup
  10. Sample Project 1: RSS Parser – using SAXParser
  11. Sample Project 1: RSS Parser – using DOM Parser
  12. Sample Project 1: RSS Parser – using XMLPullParser
  13. Sample Project 2: HTML Parser – using HtmlCleaner
  14. Sample Project 2: HTML Parser – using JSoup
  15. Finalization on the “Thing” called XML!

=========================================

Another library commonly used for parsing HTML is JSoup.

Unlike HtmlCleaner, JSoup uses CSS-style selectors, built from tag names, IDs, classes and attributes, to identify nodes in the HTML tree.

I suggest you learn the basic syntax of JSoup selectors before continuing: http://jsoup.org/cookbook/extracting-data/selector-syntax

We will do the same thing as in the previous article: get the blog statistics, this time using JSoup.

The selector looks like this: “div#blog-stats ul li”

Literally, it means: select every <li> node inside a <ul> node that is itself inside a <div> whose ID is “blog-stats”. A space in a selector matches any descendant, not only a direct child.
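
To see what the selector above does without touching the network, here is a minimal sketch that runs it against a small inline HTML string. The markup in SAMPLE_HTML and the names SelectorDemo and selectTexts are made up for illustration (the real blog page will look different), and the JSoup JAR is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorDemo {

    // Hypothetical markup standing in for the blog sidebar; not the real page.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<div id=\"blog-stats\"><ul><li>1234 hits</li></ul></div>"
        + "<div id=\"other\"><ul><li>not a stat</li></ul></div>"
        + "</body></html>";

    // Returns the text of every element matched by the CSS-style query.
    static List<String> selectTexts(String html, String query) {
        Document document = Jsoup.parse(html);
        List<String> texts = new ArrayList<String>();
        for (Element element : document.select(query)) {
            texts.add(element.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Only the <li> under div#blog-stats matches.
        System.out.println(selectTexts(SAMPLE_HTML, "div#blog-stats ul li")); // [1234 hits]
        // A bare tag selector matches both list items.
        System.out.println(selectTexts(SAMPLE_HTML, "li")); // [1234 hits, not a stat]
    }
}
```

The ID part of the query is what narrows the match down to the statistics block; without it, every <li> on the page would be returned.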

Download the JSoup library and add it to your project under “External JARs”.

Let's head straight to the source code to get the value we want:

package pete.android.study;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

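public class JSoupStudyActivity extends Activity {

    // blog url
    static final String BLOG_URL = "http://xjaphx.wordpress.com/";

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // set layout view
        setContentView(R.layout.main);

        final TextView tv = (TextView) findViewById(R.id.tv);

        // process: apps targeting Android 3.0 (API 11) or later get a
        // NetworkOnMainThreadException when they do network I/O on the UI
        // thread, so the page is fetched on a worker thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                String text;
                try {
                    text = getBlogStats();
                } catch (Exception ex) {
                    text = "Error";
                }
                final String result = text;
                // views may only be updated from the UI thread
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        tv.setText(result);
                    }
                });
            }
        }).start();
    }

    protected String getBlogStats() throws Exception {
        String result = "";
        // get html document structure
        Document document = Jsoup.connect(BLOG_URL).get();
        // selector query
        Elements nodeBlogStats = document.select("div#blog-stats ul li");
        // check results
        if (nodeBlogStats.size() > 0) {
            // get value of the first match
            result = nodeBlogStats.get(0).text();
        }

        // return
        return result;
    }
}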
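
If the activity shows nothing or just “Error”, it can help to check the selector separately from the network call. The sketch below is one way to do that, under the assumption that you have some HTML at hand as a string; the class name BlogStatsParser, the method firstStat and the sample markup are hypothetical and not part of the project above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class BlogStatsParser {

    // Parses already-downloaded HTML and returns the text of the first
    // matching <li>, or "" when the selector matches nothing.
    static String firstStat(String html) {
        Document document = Jsoup.parse(html);
        Elements stats = document.select("div#blog-stats ul li");
        return stats.isEmpty() ? "" : stats.get(0).text();
    }

    public static void main(String[] args) {
        // Hypothetical saved markup, used only to exercise the selector.
        String saved = "<div id=\"blog-stats\"><ul><li>42 hits</li></ul></div>";
        System.out.println(firstStat(saved));                 // 42 hits
        System.out.println(firstStat("<p>x</p>").isEmpty());  // true
    }
}
```

An empty result here points at the selector or the page markup, while an exception in the activity points at the download step.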

Remember to add the INTERNET permission to AndroidManifest.xml: <uses-permission android:name="android.permission.INTERNET" />. Here is the result on my Galaxy S II phone, which is a little chocky-cocky:

JSoup Sample

Not so much different from XPath, is it?

Cheers,

Pete Houston

Categories: Tutorials
  1. July 10, 2014 at 1:28 am

    Hey there! This is kind of off topic but I need some advice from an established blog.
    Is it tough to set up your own blog? I’m not very technical but I can figure things out pretty fast.

    I’m thinking about setting up my own but I’m not sure where to begin. Do you have any ideas or suggestions? Cheers

  2. June 22, 2014 at 6:33 pm

    Can you help me? I’ve followed everything in this tutorial and have also set the permission, but it crashes on startup with the message “Unfortunately, appname has stopped”.

    Can you suggest anything that might be causing this?

  3. aqua
    April 20, 2013 at 7:23 pm

    Can you tell me what I should do in the layout .xml, and what does “// set layout view” mean?

  4. Greg
    January 19, 2013 at 7:43 pm

    It works on 2.3.3, but gives an error on 4.03 and 4.2. Why? Can you help me?

  5. Lawrence Macharia
    January 3, 2013 at 12:06 pm

    for those who did not manage to make the code work, try adding this permission in your manifest

  6. December 21, 2012 at 10:54 pm

    I always get the text “Error”, even though I have followed this tutorial.
    Can anyone help me?

  7. ravi
    October 30, 2012 at 1:54 am

    Thanx dude really helpfull :)

  8. cvele
    September 20, 2012 at 12:39 am

    This is not working, I get nothing, any ideas???

  9. Mehrin Anannya
    June 11, 2012 at 3:13 pm

    This is really interesting to work.
    Can anyone help me get all the text under the ul of div#subNavigation from the website http://www.juniv.edu/news? Please.

  10. Mehrin Anannya
    June 11, 2012 at 3:09 pm

    Can anyone help me?
    I want to get all the text under the ul of div#subNavigation from the website http://www.juniv.edu/news. How can I do this?

  11. jhon
    April 30, 2012 at 4:21 pm

    Can anybody help me..

  12. jhon
    April 30, 2012 at 4:20 pm

    Its not working for me

  13. April 8, 2012 at 2:29 pm

    This is pretty awesome! I needed to check the state of a switch for a remote controlled garage door opener I’m working on. This seems to be the way to go about it. Do you know if this will work with IP addresses? I really don’t want to put the button to activate the switch on a website, just on my local LAN, 192.168.1.250 for example. Or do you know how to search HTML files locally? Like if you made a test site in Notepad and wanted to play around with it: file:///C:/Users/User/Desktop/test.html

  14. manivannan
    February 24, 2012 at 1:54 am

    extraordinary work .
