Home > Tutorials > Android XML Adventure – Parsing XML Data with DOM

Android XML Adventure – Parsing XML Data with DOM


Article Series: Android XML Adventure

Author: Pete Houston (aka. `xjaphx`)

TABLE OF CONTENTS

  1. What is the “Thing” called XML?
  2. Parsing XML Data w/ SAXParser
  3. Parsing XML Data w/ DOMParser
  4. Parsing XML Data w/ XMLPullParser
  5. Create & Write XML Data
  6. Compare: XML Parsers
  7. Parsing XML using XPath
  8. Parsing HTML using HtmlCleaner
  9. Parsing HTML using JSoup
  10. Sample Project 1: RSS Parser – using SAXParser
  11. Sample Project 1: RSS Parser – using DOM Parser
  12. Sample Project 1: RSS Parser – using XMLPullParser
  13. Sample Project 2: HTML Parser – using HtmlCleaner
  14. Sample Project 2: HTML Parser – using JSoup
  15. Finalization on the “Thing” called XML!

=========================================

In this article, I’d like to introduce you the concepts of parsing XML using DOM (Document Object Model).

In DOM, everything is treated like a node, as you know in data structure; and all nodes link together and form as a whole, called `Document Tree`. There’s always a node called `ROOT` which contains the links to all other child nodes. In a XML document, there is one and only one `ROOT`, often called `Root Element` or `Document Element`.

I assume that you’re already know what XML is, and what kinds of data it holds. In DOM, a node is an interface that presents all other XML objects.

So what can a `Node` in DOM be?

a DOM Node presents a XML object

a DOM Node presents a XML object

Reference to Android – DOM Node Class

That’s the concept of DOM. About how it works, to avoid lots of words to read that make you feel tiring and boring, I’ve visualized for a better explanation and presentation:

DOM - XML Parsing Process

DOM - XML Parsing Process

From `DocumentBuilderFactory`, create an `DocumentBuilder`, at this point, it needs an input for XML data (like a string or an input stream…) in order to create a `Document` that represents a DOM tree. For parsing, it starts with retrieving the very very first `RootElement`, afterward you can choose which XML tags in document to find and parse them by using `getElementByTagName()`.

Have you understood how DOM works for now? Let’s go for a practice, we will use the input XML from previous article to parse. Having the same layout and main application interface, we just need to update the `StudyParser.java` implementing DOM methods.


package pete.android.study.parser;

import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import pete.android.study.data.Study;

public class StudyParser {
	public static Study parse(InputStream is) {
		// create new Study object to hold data
		Study study = null;

		try {
			// init Study object
			study = new Study();
			// create Document object
			Document xmlDoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
			// get `root` Element, the first thing always should be done!!!
			Element root = xmlDoc.getDocumentElement();
			// collect child tags need to be parsed <study>
			NodeList listStudyNodes = root.getElementsByTagName(Study.STUDY);
			// iterate through list nodes of <study> tag
			for(int i = 0; i < listStudyNodes.getLength(); ++i) {
				// get current node
				Node curNode = listStudyNodes.item(i);
				// this tag is <study>, get `id` attribute first
				study.mId = Integer.parseInt( ((Attr)(curNode.getAttributes().item(0))).getValue() );

				// get all child tags inside <study>
				NodeList listChilds = curNode.getChildNodes();
				for(int j = 0; j < listChilds.getLength(); ++j) {
					// get a child
					Node child = listChilds.item(j);
					// if this node is ELEMENT type
					if(child.getNodeType() == Node.ELEMENT_NODE) {
						// get element name
						String childName = child.getNodeName();
						// get element text content
						String childValue = ((Element)child).getTextContent();
						// if this tag is <topic>
						if(Study.TOPIC.equalsIgnoreCase(childName)) {
							study.mTopic = childValue;
						}
						// if this tag is <content>
						else if(Study.CONTENT.equalsIgnoreCase(childName)) {
							study.mContent = childValue;
						}
						// if this tag is <author>
						else if(Study.AUTHOR.equalsIgnoreCase(childName)) {
							study.mAuthor = childValue;
						}
						// if this tag is <date>
						else if(Study.DATE.equalsIgnoreCase(childName)) {
							study.mDate = childValue;
						}
					}
				}
			}
		// of course, handling exception
		} catch (ParserConfigurationException e) {
			study = null;
		} catch (SAXException e) {
			study = null;
		} catch (IOException e) {
			study = null;
		}

		// return Study object
		return study;
	}
}

Reading the comment in the source code for inner-sight explanation.

These are some notes I’d suggest you when implementing DOM method:

1. A `Node` can re-present any XML object, so always check `Node`’s type before using it.

2. Always convert `Node` to its correct type for better commands (methods, properties).

3. If XML document is really deep (lots of childs’ of child tag), I suggest not to use DOM since it would really reduce the performance, since it needs time to construct and search for every single node.

4. If you use DOM a lot, always remember to refer to Developers’ API (Java SE 7 or Android DOM Reference).

OK! It’s done for now, try to practice as much as possible to learn it. I’ve just created and showed you the way, the fundamentals out of it. You need to keep walking by yourself if you want to study more about DOM.

Good luck and see you in the next article!

Cheers,

Pete Houston

  1. December 31, 2012 at 4:36 pm

    Everyone loves it when folks come together and share ideas.
    Great site, continue the good work!

  1. No trackbacks yet.

Leave a comment