Android XML Adventure – Parsing HTML using JSoup

Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)


  1. What is the “Thing” called XML?
  2. Parsing XML Data w/ SAXParser
  3. Parsing XML Data w/ DOMParser
  4. Parsing XML Data w/ XMLPullParser
  5. Create & Write XML Data
  6. Compare: XML Parsers
  7. Parsing XML using XPath
  8. Parsing HTML using HtmlCleaner
  9. Parsing HTML using JSoup
  10. Sample Project 1: RSS Parser – using SAXParser
  11. Sample Project 1: RSS Parser – using DOM Parser
  12. Sample Project 1: RSS Parser – using XMLPullParser
  13. Sample Project 2: HTML Parser – using HtmlCleaner
  14. Sample Project 2: HTML Parser – using JSoup
  15. Finalization on the “Thing” called XML!


Another library commonly used for parsing HTML is JSoup.

Unlike HtmlCleaner, JSoup identifies nodes in the HTML tree with CSS-style selectors built from tag names, IDs, classes, and attributes.

I suggest you learn the basic JSoup selector syntax before continuing: http://jsoup.org/cookbook/extracting-data/selector-syntax

We will do the same thing as in the previous article: get the blog statistics, this time using JSoup.

The selector looks like this: “div#blog-stats ul li”

Literally, it means: select every <li> node inside a <ul> node that is itself inside the <div> whose ID is “blog-stats”. Each space means “descendant of”, so the nesting does not have to be direct.

Download the JSoup library and add it to the project through “Add External JARs”.

Head straight to the source code to get the value we want:

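package pete.android.study;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class JSoupStudyActivity extends Activity {

    // blog url
    static final String BLOG_URL = "https://xjaphx.wordpress.com/";

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // set layout view (assumption: a TextView built in code, so no layout XML is needed)
        final TextView tv = new TextView(this);
        setContentView(tv);

        // process on a background thread: apps targeting Android 3.0 or later throw
        // NetworkOnMainThreadException when the network is used on the UI thread
        new Thread(new Runnable() {
            public void run() {
                String text;
                try {
                    text = getBlogStats();
                } catch (Exception ex) {
                    text = "Error";
                }
                final String shown = text;
                // views may only be updated from the UI thread
                runOnUiThread(new Runnable() {
                    public void run() {
                        tv.setText(shown);
                    }
                });
            }
        }).start();
    }

    protected String getBlogStats() throws Exception {
        String result = "";
        // get html document structure
        Document document = Jsoup.connect(BLOG_URL).get();
        // selector query
        Elements nodeBlogStats = document.select("div#blog-stats ul li");
        // check results
        if (nodeBlogStats.size() > 0) {
            // get value: text of the first matching <li>
            result = nodeBlogStats.get(0).text();
        }

        // return
        return result;
    }
}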
Remember to add the INTERNET permission to the manifest. Here is the result on my Galaxy S II phone, which is a little chocky-cocky:

JSoup Sample

Not so different from XPath, is it?
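To make the comparison concrete, here is a minimal sketch of the same lookup written with XPath, using only the JDK's javax.xml classes. The markup in SAMPLE is a made-up, well-formed stand-in for the blog sidebar (the JDK XML parser will not accept typical real-world HTML), and the class and method names are mine for illustration, not taken from the earlier XPath article. The idea is simply that each space in “div#blog-stats ul li” plays roughly the role of “//” in XPath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBlogStats {

    // Hypothetical, well-formed sample markup; the real page may look different.
    static final String SAMPLE =
          "<html><body>"
        + "<div id=\"blog-stats\"><ul><li>12,345 hits</li></ul></div>"
        + "<div id=\"other\"><ul><li>ignored</li></ul></div>"
        + "</body></html>";

    // Rough XPath counterpart of the JSoup selector "div#blog-stats ul li":
    // "#blog-stats" becomes [@id='blog-stats'], each space becomes "//".
    static final String QUERY = "//div[@id='blog-stats']//ul//li";

    // Returns the text of the first node matching the query, or "" if none match.
    static String firstMatch(String xml, String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(query, doc, XPathConstants.NODESET);
            // Same guard as the JSoup version: check the result set before reading it.
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : "";
        } catch (Exception ex) {
            throw new IllegalStateException("could not evaluate " + query, ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(SAMPLE, QUERY)); // prints "12,345 hits"
    }
}
```

The shape is the same in both versions: load a document, run one query, guard against an empty result, read the text of the first match. The JSoup selector is just shorter to write and copes with messy HTML.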


Pete Houston

Categories: Tutorials
  2. Molyakos
    September 10, 2015 at 5:41 pm

    Hey, why does this not work? It always returns the text “Error”. Why? HELP! PLEASE!

  5. Aidar
    January 5, 2015 at 4:47 am

    Hi Pete! How do I show a ProgressDialog while my app is parsing an HTML page? For example, with a heavy HTML page I sometimes have to wait a long time before the data appears on the device. Sometimes it looks as though the data has not loaded and the screen stays black, but after a while the data appears.
    Anyway, how do I add a ProgressDialog based on your example?

    Thanks in advance.

  7. June 22, 2014 at 6:33 pm

    Can you help me? I’ve followed everything in this tutorial and have also set the permission, but it crashes on startup with the message “Unfortunately, appname has stopped”.

    Can you suggest anything that might be causing this?

  8. aqua
    April 20, 2013 at 7:23 pm

    Can you tell me what I should put in the layout .xml, and what does “// set layout view” mean?

  9. Greg
    January 19, 2013 at 7:43 pm

    On 2.3.3 it works; on 4.0.3 and 4.2 I get an error. Why? Can you help me?

  10. Lawrence Macharia
    January 3, 2013 at 12:06 pm

    For those who did not manage to make the code work, try adding this permission in your manifest: <uses-permission android:name="android.permission.INTERNET" />

  11. December 21, 2012 at 10:54 pm

    It always returns the text “Error”, even though I followed this tutorial.
    Can anyone help me?

  12. ravi
    October 30, 2012 at 1:54 am

    Thanks dude, really helpful :)

  13. cvele
    September 20, 2012 at 12:39 am

    This is not working, I get nothing, any ideas???

  14. Mehrin Anannya
    June 11, 2012 at 3:13 pm

    This is really interesting to work with.
    Can anyone help me get all the text under the ul of div#subNavigation from the website http://www.juniv.edu/news? Please.

  15. Mehrin Anannya
    June 11, 2012 at 3:09 pm

    Can anyone help me?
    I want to get all the text under the ul of div#subNavigation from the website http://www.juniv.edu/news. How can I do this?

  16. jhon
    April 30, 2012 at 4:21 pm

    Can anybody help me..

  17. jhon
    April 30, 2012 at 4:20 pm

    It’s not working for me

  18. April 8, 2012 at 2:29 pm

    This is pretty awesome! I needed to check the state of a switch for a remote-controlled garage door opener I’m working on, and this seems to be the way to go about it. Do you know if this will work with IP addresses? I really don’t want to put the button that activates the switch on a public website, just on my local LAN, for example. Or do you know how to parse HTML files locally, like a test page made in Notepad to play around with? file:///C:/Users/User/Desktop/test.html

  19. manivannan
    February 24, 2012 at 1:54 am

    Extraordinary work.
