Welcome to Blogs @ Andrew Qu
Parsing XML File in Python

Summary

A brief introduction to XML, followed by examples of how to parse an XML file in Python.

Basics of XML

XML stands for Extensible Markup Language. It is a way of describing data. The basic construct of XML looks like this:

<tag attribute="val">data</tag>
<tag> is called the opening tag. The opening tag can contain optional attributes.
</tag> is called the closing tag.

Anything between the opening and closing tags is the content, or value. The content can be a simple string, more tags (with their own sub-tags), or empty.

Let's look at a simple XML file.

<?xml version="1.0" encoding="utf-8"?>
<students>
    <note>List of students</note>
    <student name="John" sex="M" year="6" class="alpha" />
    <student name="Mary" sex="F" year="6" class="beta" />
    <student name="Bob"  sex="M" year="7" class="cedar" />
    <student name="Tony" sex="M" year="7" class="cedar" />
    <student name="Julie" sex="F" year="7" class="ash" />
</students>

An XML file should begin with an XML declaration that states the version. The declaration looks like a tag but has no closing tag.

In the above example, <students>, <note> and <student> are tags, and </note> is a closing tag. An XML file must have exactly one root tag; here, <students> is the root tag. An XML document represents a tree structure (see picture below), so a tag is sometimes also called a node. In programming, it is also referred to as an "element".
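To make the tree idea concrete, here is a minimal sketch that walks a shortened copy of the example and lists each element indented under its parent. It assumes the parser is xml.dom.minidom from Python's standard library; the names XML_TEXT and outline() are made up for this illustration.

```python
from xml.dom.minidom import parseString

# Shortened inline copy of the example document (illustration only).
XML_TEXT = """<students>
    <note>List of students</note>
    <student name="John" sex="M" year="6" class="alpha" />
    <student name="Mary" sex="F" year="6" class="beta" />
</students>"""

def outline(node, depth=0):
    """Return one indented line per element, children listed under their parent."""
    lines = []
    # Skip text nodes (such as the whitespace between tags); keep only elements.
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines

root = parseString(XML_TEXT).documentElement   # the root tag, <students>
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The root element comes first, and every tag nested inside it appears one level deeper.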

The content between the opening tag and the closing tag is the value. Therefore, in <note>List of students</note>, "List of students" is the value.

A tag can contain optional attributes. Tag <note> has no attributes. Tag <student> has 4 attributes, namely "name", "sex", "year" and "class".
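The difference between a value and attributes can be checked in a few lines. This is only a sketch, assuming the standard library's xml.dom.minidom parser and a one-student copy of the example; the variable names are hypothetical.

```python
from xml.dom.minidom import parseString

# One-student copy of the example document (illustration only).
doc = parseString(
    '<students><note>List of students</note>'
    '<student name="John" sex="M" year="6" class="alpha" /></students>'
)

note = doc.getElementsByTagName('note')[0]
student = doc.getElementsByTagName('student')[0]

note_value = note.firstChild.data                  # text between <note> and </note>
note_has_attrs = note.hasAttributes()              # <note> carries no attributes
student_attrs = dict(student.attributes.items())   # attribute name -> value

print(note_value)
print(note_has_attrs)
print(sorted(student_attrs))
```

Attribute values always come back as strings, so the year is '6' rather than the number 6.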

Reading XML Files

Download a copy of the above XML file student.xml. Place the file in C:\temp\student.xml. It will be needed for this exercise.

  • Start Visual Studio 2010
  • Create a new project: [File] -> [New Project] -> [Other Languages] -> [Python] -> [Python Application]
  • Give the application name "ReadXml"
  • Click [OK]
  • In ReadXml.py file, type the following:
    download source
  • Run the program (Shift+Alt+F5)
Output:
Whole line: <note>List of students</note>
Note: List of students
>>> 
Some explanations:
  • parseString() - converts the XML string into a tree object called the DOM (Document Object Model).
  • getElementsByTagName() - returns a list of XML elements with the specified tag name. In the above code, the tag name is "note".
  • toxml() - converts the element back into an XML string. In this example, it returns "<note>List of students</note>".
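The three calls described above can be put together as follows. This is a sketch that reproduces the output shown earlier, not necessarily the original ReadXml.py. It assumes parseString() is imported from the standard library's xml.dom.minidom, and it uses an inline copy of student.xml in place of the file in C:\temp so that it runs anywhere.

```python
from xml.dom.minidom import parseString

# Assumption: the post reads this text from C:\temp\student.xml, for example
#   with open(r'C:\temp\student.xml') as f: data = f.read()
# An inline copy keeps the sketch self-contained.
data = """<?xml version="1.0" encoding="utf-8"?>
<students>
    <note>List of students</note>
    <student name="John" sex="M" year="6" class="alpha" />
    <student name="Mary" sex="F" year="6" class="beta" />
    <student name="Bob"  sex="M" year="7" class="cedar" />
    <student name="Tony" sex="M" year="7" class="cedar" />
    <student name="Julie" sex="F" year="7" class="ash" />
</students>"""

dom = parseString(data)                       # build the DOM tree
note = dom.getElementsByTagName('note')[0]    # first (and only) <note> element
whole_line = note.toxml()                     # element as an XML string
note_text = note.firstChild.data              # just the text inside the tag

print('Whole line: ' + whole_line)
print('Note: ' + note_text)
```

The variable data defined here is the same XML string that the exercises below pass to parseString().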
More Exercises

1. Get All Students Names and Sort

from xml.dom.minidom import parseString

# data is the XML string read from student.xml
studentList = parseString(data)

# get a list of all <student> elements
allStudents = studentList.getElementsByTagName('student')

# print names of all students, in document order
for s in allStudents:
    print(s.getAttribute('name'))

# collect the names, then sort them alphabetically
sortedNames = []
for s in allStudents:
    sortedNames.append(s.getAttribute('name'))

sortedNames.sort()
print('')
print('Names sorted:')
for name in sortedNames:
    print(name)
Results :
John
Mary
Bob
Tony
Julie

Names sorted:
Bob
John
Julie
Mary
Tony
>>> 

2. Get All Year 7 Student Names and Sort

from xml.dom.minidom import parseString

# data is the XML string read from student.xml
studentList = parseString(data)

# get a list of all <student> elements
allStudents = studentList.getElementsByTagName('student')

# collect names of year 7 students (attribute values are strings)
sortedNames = []
for s in allStudents:
    if s.getAttribute('year') == '7':
        sortedNames.append(s.getAttribute('name'))

sortedNames.sort()
print('All year 7 students:')
for name in sortedNames:
    print(name)
Results :
All year 7 students:
Bob
Julie
Tony
>>> 
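Both exercises follow the same pattern, so they can be folded into one small function. The sketch below is one possible way to do that; sorted_names() and its year parameter are hypothetical names, and the XML is an inline copy of student.xml so the example is self-contained.

```python
from xml.dom.minidom import parseString

# Inline copy of student.xml (illustration only).
data = """<students>
    <note>List of students</note>
    <student name="John" sex="M" year="6" class="alpha" />
    <student name="Mary" sex="F" year="6" class="beta" />
    <student name="Bob"  sex="M" year="7" class="cedar" />
    <student name="Tony" sex="M" year="7" class="cedar" />
    <student name="Julie" sex="F" year="7" class="ash" />
</students>"""

def sorted_names(xml_text, year=None):
    """Return student names in alphabetical order, optionally for one year only."""
    students = parseString(xml_text).getElementsByTagName('student')
    return sorted(
        s.getAttribute('name')
        for s in students
        # year is compared as a string because attribute values are strings
        if year is None or s.getAttribute('year') == year
    )

all_names = sorted_names(data)
year7_names = sorted_names(data, year='7')
print(all_names)
print(year7_names)
```

Passing no year gives the result of exercise 1, and passing year='7' gives the result of exercise 2.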
©Andrew Qu, 2015. All rights reserved. Code snippets may be used "AS IS" without any kind of warranty. DIY tips may be followed at your own risk.