Parsing XML using Python takes only a few lines of code. The XML can come from a file or from a string. Parsing XML here means reading the data held in the XML nodes and then using that data in our application. Extensible Markup Language (XML) is one of the most widely used data formats because it is well supported by modern applications and well suited to further data manipulation and customization.

Prerequisites

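Have Python installed on Windows (or Unix)
Python version and packages
Here I am using Python 3.6.6. The xml.etree.ElementTree module used below is part of the Python standard library, so no extra packages need to be installed.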

Preparing your workspace

Preparing your workspace is one of the first things you can do to make sure that you start off well. The first step is to check your working directory.

When you are working in the Python terminal, first navigate to the directory where your file is located and then start Python, i.e., make sure that your file is in the directory you want to work from.
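The working directory can also be checked from inside Python. The following is a minimal sketch, not part of the original script: the helper name describe_workspace is hypothetical, and it assumes the file is called bookstore.xml as in the script later in this post.

```python
from pathlib import Path


def describe_workspace(filename="bookstore.xml"):
    """Return the current working directory and whether `filename` exists in it.

    `describe_workspace` is a hypothetical helper used only for illustration.
    """
    cwd = Path.cwd()
    return cwd, (cwd / filename).is_file()


if __name__ == "__main__":
    cwd, found = describe_workspace()
    print("Working directory:", cwd)
    print("bookstore.xml found:", found)
```

If the second line prints False, navigate to the right directory (or move the file) before running the parsing script.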

Let’s move on to the example…

Example

I have opened a cmd prompt and navigated to the directory where I have put the XML file that we will be parsing using Python.

Creating the Python script

Now we will create a Python script that reads the XML file linked above and displays its content in the console.

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. We will be parsing the XML data using xml.etree.ElementTree. ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.
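To make the difference between the two levels concrete, here is a small sketch. The XML in SAMPLE is made-up illustration data (the real bookstore.xml may look different), and it is parsed from an in-memory buffer so the example runs without any file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real bookstore.xml may differ.
SAMPLE = (
    "<bookstore>"
    "<book category='cooking'><title lang='en'>Everyday Italian</title></book>"
    "</bookstore>"
)

# ElementTree level: the whole document (here read from an in-memory file object).
tree = ET.parse(io.StringIO(SAMPLE))

# Element level: a single node and its sub-elements.
root = tree.getroot()
first_book = root[0]
title = first_book.find("title")
print(root.tag, first_book.attrib, title.text)

# ElementTree level again: write the whole document back out.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
print(buffer.getvalue().decode("utf-8"))
```

Reading and writing go through the tree object, while tag, attrib, text and find() are used on individual elements.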

To parse XML held in a string instead of a file, use root = ET.fromstring(book_data_as_string). It returns the root element directly, so no getroot() call is needed.
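A short sketch of the string variant follows. The content of book_data_as_string is invented for illustration and is not taken from the actual bookstore.xml file.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML string used only for this example.
book_data_as_string = """<bookstore>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
</bookstore>"""

# fromstring() parses the string and returns the root Element.
root = ET.fromstring(book_data_as_string)

for book in root:
    print(book.tag, book.attrib)
    for field in book:
        print(field.tag, field.attrib, field.text)
```

From here on, root behaves the same as the root obtained from a parsed file.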

In the script below we first import the required module and then parse the whole XML file into the tree variable. Next we get the root node of the XML data. Finally we iterate through each level of nodes and print the node name, node attributes and node value.

Let's create the script to read the above XML file:

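import xml.etree.ElementTree as ET

# Parse the XML file into an ElementTree and get its root element
tree = ET.parse('bookstore.xml')
root = tree.getroot()

# Walk three levels below the root, printing each node's tag, attributes and text
for child in root:
    print(child.tag, child.attrib)
    for node in child:
        print(node.tag, node.attrib, node.text)
        for c in node:
            print(c.tag, c.attrib, c.text)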

Testing the Application

Make sure the bookstore.xml file is in the C:\py_scripts directory. When you run the above Python script, you should see the following output in the console.

I could not show the whole output here, but you will see all of it in the console when you run the script.

Thanks for reading.
