Android XML Adventure – Parsing HTML using JSoup
Article Series: Android XML Adventure
Author: Pete Houston (aka. `xjaphx`)
TABLE OF CONTENTS
- What is the “Thing” called XML?
- Parsing XML Data w/ SAXParser
- Parsing XML Data w/ DOMParser
- Parsing XML Data w/ XMLPullParser
- Create & Write XML Data
- Compare: XML Parsers
- Parsing XML using XPath
- Parsing HTML using HtmlCleaner
- Parsing HTML using JSoup
- Sample Project 1: RSS Parser – using SAXParser
- Sample Project 1: RSS Parser – using DOM Parser
- Sample Project 1: RSS Parser – using XMLPullParser
- Sample Project 2: HTML Parser – using HtmlCleaner
- Sample Project 2: HTML Parser – using JSoup
- Finalization on the “Thing” called XML!
=========================================
Another library commonly used for parsing HTML is JSoup.
Unlike HtmlCleaner, JSoup uses CSS-like selectors (tag names, IDs, classes, and attributes) to identify nodes in the HTML tree.
I suggest you learn the basic syntax of JSoup selectors before continuing: http://jsoup.org/cookbook/extracting-data/selector-syntax
We will do the same thing as in the previous article: get the blog statistics, this time using JSoup.
The selector looks like this: "div#blog-stats ul li"
Literally, it means: select every <li> node inside a <ul> node that is itself inside a <div> whose ID is "blog-stats".
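To see how the selector behaves without touching the network, here is a minimal sketch that runs it against an inline HTML string. The class name `SelectorDemo`, the helper `firstBlogStat`, and the sample markup are all made up for illustration; the real blog page may be structured differently.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorDemo {
    // Hypothetical markup standing in for the blog sidebar (an assumption, not the real page).
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<div id=\"blog-stats\"><ul><li>12,345 hits</li></ul></div>"
        + "<div id=\"other\"><ul><li>not selected</li></ul></div>"
        + "</body></html>";

    // Returns the text of the first <li> matched by the selector, or "" if nothing matches.
    static String firstBlogStat(String html) {
        Document document = Jsoup.parse(html);
        Elements items = document.select("div#blog-stats ul li");
        return items.isEmpty() ? "" : items.first().text();
    }

    public static void main(String[] args) {
        // Only the <li> under div#blog-stats matches; the one under div#other does not.
        System.out.println(firstBlogStat(SAMPLE_HTML));
    }
}
```

Trying selectors on a small string like this is a quick way to check them before pointing the code at a live page.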
Download the JSoup library and add it to the project as an external JAR.
Head straight to the source code to get our desired value:
package pete.android.study;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class JSoupStudyActivity extends Activity {

    // blog url
    static final String BLOG_URL = "http://xjaphx.wordpress.com/";

    @Override
    public void onCreate(Bundle savedInstanceState) {
        // set layout view
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        // process
        try {
            ((TextView)findViewById(R.id.tv)).setText(getBlogStats());
        } catch (Exception ex) {
            ((TextView)findViewById(R.id.tv)).setText("Error");
        }
    }

    protected String getBlogStats() throws Exception {
        String result = "";
        // get html document structure
        Document document = Jsoup.connect(BLOG_URL).get();
        // selector query
        Elements nodeBlogStats = document.select("div#blog-stats ul li");
        // check results
        if(nodeBlogStats.size() > 0) {
            // get value
            result = nodeBlogStats.get(0).text();
        }
        // return
        return result;
    }
}
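One caveat: getBlogStats() does network I/O, and for apps targeting Android 3.0 or later the platform throws NetworkOnMainThreadException when that happens on the UI thread. The code above would then land in the catch block and show "Error". Below is a plain-Java sketch of moving such a call onto a worker thread. The class `BackgroundFetch` and the helper `runInBackground` are hypothetical names, the placeholder task stands in for getBlogStats(), and the sketch assumes Java 8 lambdas are available.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;

public class BackgroundFetch {
    // Runs the task on a new worker thread and hands its result (or "Error") to the callback.
    static Thread runInBackground(Callable<String> task, Consumer<String> onResult) {
        Thread worker = new Thread(() -> {
            String text;
            try {
                text = task.call();
            } catch (Exception ex) {
                text = "Error"; // same fallback text the activity shows
            }
            onResult.accept(text); // note: this runs on the worker thread
        });
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder task; in the activity this would be () -> getBlogStats().
        Thread worker = runInBackground(() -> "stats placeholder", System.out::println);
        worker.join();
    }
}
```

In the activity, the callback would need to wrap the TextView update in runOnUiThread(...), since views may only be modified from the UI thread.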
Remember to add the INTERNET permission, or the request will fail. Here is the result on my Galaxy S II phone, which is a little chocky-cocky.
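The INTERNET permission is declared in AndroidManifest.xml. A minimal fragment, assuming the usual manifest layout of an Android project, looks like this:

```xml
<!-- Place inside <manifest>, as a sibling of <application> -->
<uses-permission android:name="android.permission.INTERNET" />
```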
Not so much different from XPath, is it?
Cheers,
Pete Houston

Can you tell me what I should do in the layout .xml, and what does “// set layout view” mean?
2.3.3 = works; 4.03, 4.2 = error. Why? Can you help me?
Hi Greg, I’ve found one issue when using Jsoup, you can check if it works for you : https://xjaphx.wordpress.com/2013/01/29/a-note-when-using-jsoup-user-agent/
for those who did not manage to make the code work, try adding this permission in your manifest
I always get the text "Error" back, even though I followed this tutorial.
Can anyone help me?
Thanks dude, really helpful.
This is not working, I get nothing, any ideas???
This is really interesting to work.
Can anyone help me get all the text under the ul of div#subNavigation from the website http://www.juniv.edu/news? plz…….
Can anyone help me?
I want to get all the text under the ul of the div#subNavigation from the website http://www.juniv.edu/news. How can I do this?
Can anybody help me..
It's not working for me.
This is pretty awesome! I needed to check the state of a switch for a remote-controlled garage door opener I’m working on, and this seems to be the way to go about it. Do you know if this will work with IP addresses? I really don’t want to put the button that activates the switch on a website, just on my local LAN, 192.168.1.250 for example. Or do you know how to search HTML files locally? Like if you made a test site in Notepad and wanted to play around with it: file:///C:/Users/User/Desktop/test.html
Extraordinary work.
Thanks