DOM (Document Object Model)

What is DOM?

A DOM Document is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy allows a developer to navigate around the tree looking for specific information. Analyzing the structure normally requires the entire document to be loaded and the hierarchy to be built before any work is done. Because it is based on a hierarchy of information, the DOM is said to be tree-based, or object-based.

For exceptionally large documents, parsing and loading the entire document can be slow and resource-intensive. In exchange, DOM provides an API that lets a developer add, edit, move, or remove nodes at any point in the tree. Event-based models such as SAX, by contrast, report parsing events as they stream through the document and do not let a developer change the data in the original document.
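The following is a minimal sketch of the four kinds of tree edits mentioned above, using the standard JAXP and org.w3c.dom APIs. The class name EditTree and the sample XML are hypothetical, chosen only for illustration. The modified tree is serialized with an identity Transformer so the changes can be inspected.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EditTree {
    // Hypothetical sample document, used only for illustration.
    public static final String XML = "<news><item>Old</item><item>Gone</item></news>";

    public static String edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Grab both <item> elements before changing anything (the NodeList is live).
        NodeList items = root.getElementsByTagName("item");
        Element first = (Element) items.item(0);
        Element second = (Element) items.item(1);

        first.setTextContent("New");               // edit: replace the text of the first <item>
        root.removeChild(second);                  // remove: detach the second <item>

        Element added = doc.createElement("item"); // add: build a new <item> and append it
        added.setTextContent("Added");
        root.appendChild(added);

        // move: inserting a node that is already in the tree relocates it
        root.insertBefore(added, root.getFirstChild());

        // Serialize the modified tree back to a string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(edit(XML));
    }
}
```

Because the whole tree is in memory, these edits can happen in any order and anywhere in the document. That is the flexibility a streaming SAX handler does not offer.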

What are the node types available in DOM?

The main node types are described below.

Elements

Elements are the basic building blocks of XML. Typically, elements have children that are other elements, text nodes, or a combination of both. Element nodes are also the only type of node that can have attributes.

Attributes

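Attribute nodes contain information about an element node, but they are not considered children of the element. For example, in <item id="42">Hello</item>, the id attribute belongs to the item element, yet it does not appear among item's child nodes.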

Text

A text node holds the character data itself. It can contain meaningful content or just whitespace, such as the line breaks and indentation between elements.

Document

The document node is the overall parent for all of the other nodes in the document.
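A small sketch that maps each of these node types to the constants defined on org.w3c.dom.Node and confirms that an attribute is not a child of its element. The class name NodeTypes and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeTypes {
    // Hypothetical sample: one element with one attribute and some text.
    public static final String XML = "<news><item id=\"42\">Hello</item></news>";

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Maps the Node type constants to readable names.
    public static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:  return "Document";
            case Node.ELEMENT_NODE:   return "Element";
            case Node.ATTRIBUTE_NODE: return "Attribute";
            case Node.TEXT_NODE:      return "Text";
            default:                  return "Other";
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element item = (Element) doc.getDocumentElement().getFirstChild();
        Attr id = item.getAttributeNode("id");
        Node text = item.getFirstChild();

        System.out.println(describe(doc) + ", " + describe(item) + ", "
                + describe(id) + ", " + describe(text));
        // The attribute is not one of item's children: only the text node is.
        System.out.println("children of <item>: " + item.getChildNodes().getLength());
        // Attr nodes are outside the tree, so their parent is null.
        System.out.println("attribute parent is null: " + (id.getParentNode() == null));
    }
}
```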

How to parse an XML file using DOM?

To work with the information in an XML file, the file must be parsed to create a Document object.

Document is an interface, so it cannot be instantiated directly; instead, the application generally uses a factory. In a Java environment, parsing the XML file is a three-step process:

1. Create the DocumentBuilderFactory. This object creates the DocumentBuilder.
2. Create the DocumentBuilder. The DocumentBuilder does the actual parsing to create the Document object.
3. Parse the file to create the Document object.

Start by creating the application, a class called NewsProcessor.
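A minimal sketch of what NewsProcessor might look like, following the three steps with the standard JAXP classes. The method name parseFile and the fallback sample file are hypothetical; pass a real file path as the first argument to parse your own document.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class NewsProcessor {
    public static Document parseFile(File file)
            throws ParserConfigurationException, SAXException, IOException {
        // Step 1: create the factory, which creates DocumentBuilder instances.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Step 2: create the builder, which does the actual parsing.
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Step 3: parse the file into a Document.
        return builder.parse(file);
    }

    public static void main(String[] args) {
        try {
            File file;
            if (args.length > 0) {
                file = new File(args[0]);
            } else {
                // No path given: write a small hypothetical sample to a temp file.
                file = File.createTempFile("news", ".xml");
                file.deleteOnExit();
                Files.write(file.toPath(),
                        "<news><item>Hello</item></news>".getBytes(StandardCharsets.UTF_8));
            }
            Document doc = parseFile(file);
            System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}
```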

Inside a try-catch block in the NewsProcessor class, the application creates the DocumentBuilderFactory and uses it to create the DocumentBuilder. Finally, the DocumentBuilder parses the file to create the Document.

How to validate the document using DOM?

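Call setValidating(true) on the DocumentBuilderFactory instance before creating the DocumentBuilder. The parser then checks the document against its DTD while parsing. Register an ErrorHandler on the DocumentBuilder to receive the validation errors.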
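A sketch of DTD validation with setValidating(true). It assumes a hypothetical internal DTD in which <news> holds zero or more <item> text elements. Validity errors are recoverable, so the parser reports them to the ErrorHandler's error() method instead of aborting. This sketch collects them into a list. The class name ValidatingParse is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidatingParse {
    // Hypothetical internal DTD: <news> holds zero or more <item> text elements.
    static final String DTD =
            "<!DOCTYPE news [<!ELEMENT news (item*)><!ELEMENT item (#PCDATA)>]>";
    public static final String VALID = DTD + "<news><item>Hello</item></news>";
    public static final String INVALID = DTD + "<news><title>Hello</title></news>";

    // Parses with DTD validation enabled and returns any validity errors found.
    public static List<String> validate(String xml)
            throws ParserConfigurationException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                System.err.println("warning: " + e.getMessage());
            }
            @Override public void error(SAXParseException e) {
                errors.add(e.getMessage()); // recoverable validity error
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e; // well-formedness errors still stop parsing
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add("fatal: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid doc errors:   " + validate(VALID));
        System.out.println("invalid doc errors: " + validate(INVALID));
    }
}
```

Without a registered ErrorHandler, what happens to validity errors depends on the parser implementation, so setting one explicitly is the safer choice.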

How to get the root element using DOM?

Once the document is parsed and a Document is created, an application can step through the structure to review, find, or display information. This navigation is the basis for many operations performed on a Document. Stepping through the document begins with the root element. A well-formed document has exactly one root element, also known as the document element, which Document.getDocumentElement() returns.
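A short sketch of retrieving the root element. The class name RootElement and the sample XML are hypothetical. The root's parent is the Document node itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class RootElement {
    // Hypothetical sample document.
    public static final String XML = "<news><item>One</item><item>Two</item></news>";

    public static Element root(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // A well-formed document has exactly one root, returned by getDocumentElement().
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Root element: " + root(XML).getNodeName());
    }
}
```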

How to get the children of a node using DOM?

Once the application determines the root element, it retrieves a list of the root element’s children as a NodeList by calling getChildNodes(). A NodeList is an ordered series of items through which the application can iterate. A simple way to verify the retrieval is to print how many items the NodeList contains with getLength(). Note that this count includes every child node, including text nodes such as the whitespace between elements, and not just child elements.
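A sketch that retrieves the root's children and counts them two ways. It assumes a hypothetical indented sample document. The whitespace between the tags becomes text nodes, so getLength() reports more items than there are child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildCount {
    // Hypothetical sample, indented the way hand-written XML usually is.
    public static final String XML =
            "<news>\n  <item>One</item>\n  <item>Two</item>\n</news>";

    public static NodeList children(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return root.getChildNodes();
    }

    // Counts only the element nodes in a NodeList, skipping text nodes.
    public static int countElements(NodeList list) {
        int count = 0;
        for (int i = 0; i < list.getLength(); i++) {
            if (list.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        NodeList kids = children(XML);
        // 5 items: 3 whitespace text nodes plus 2 <item> elements
        System.out.println("All children:     " + kids.getLength());
        System.out.println("Element children: " + countElements(kids));
    }
}
```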

What are getFirstChild() and getNextSibling() in DOM?

The parent-child and sibling relationships offer an alternative way to iterate through the children of a node. This approach may be more appropriate in some situations, such as when these relationships and the order in which children appear are crucial to understanding the data. A for-loop starts with the first child of the root (getFirstChild()) and moves from each child to the next with getNextSibling() until no siblings remain, at which point the method returns null. On each pass, the application retrieves a Node object and outputs its name and value. Notice that element nodes carry a value of null rather than the expected text. The actual content is carried by the text nodes that are children of the elements.
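A sketch of the sibling-walking loop, assuming a hypothetical sample document with no whitespace between tags. It prints each child's name and value, then the text node inside it. The output shows that the element's value is null and the content lives in a child named "#text".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingWalk {
    // Hypothetical sample document.
    public static final String XML = "<news><item>One</item><item>Two</item></news>";

    public static List<String> walk(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> lines = new ArrayList<>();
        // Start at the first child and follow next-sibling links until null.
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            // Element nodes report null as their node value.
            lines.add(child.getNodeName() + " = " + child.getNodeValue());
            // Their content sits in child text nodes, visited the same way.
            for (Node inner = child.getFirstChild(); inner != null; inner = inner.getNextSibling()) {
                lines.add("  " + inner.getNodeName() + " = " + inner.getNodeValue());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        walk(XML).forEach(System.out::println);
    }
}
```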

The Node interface defines constants that represent each type of node, such as ELEMENT_NODE and ATTRIBUTE_NODE. If getNodeType() returns ELEMENT_NODE, the node is an element. For every element it finds, the application retrieves a NamedNodeMap containing all of the element’s attributes by calling getAttributes(). The application can iterate through a NamedNodeMap, printing each attribute’s name and value, just as it iterated through the NodeList.
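A sketch that combines the sibling walk with a node-type check and NamedNodeMap iteration. The sample document is hypothetical and includes a comment node to show that non-element nodes are skipped. The DOM does not promise any particular attribute order in a NamedNodeMap, so code should not rely on it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeDump {
    // Hypothetical sample: two elements with attributes and a comment between them.
    public static final String XML = "<news><item id=\"1\" lang=\"en\">One</item>"
            + "<!-- no attributes here --><item id=\"2\">Two</item></news>";

    public static List<String> attributes(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> out = new ArrayList<>();
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            // Only element nodes carry attributes; the comment node is skipped.
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            NamedNodeMap attrs = child.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.add(child.getNodeName() + ": " + attr.getNodeName() + " = " + attr.getNodeValue());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        attributes(XML).forEach(System.out::println);
    }
}
```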