A database, whether relational or document-oriented, is mainly used to store records. Each record has its own format or structure. When a record is written into a document without any labels, it is difficult for a novice user to understand. In our example above, we showed address data. Now imagine a document containing information about books, or the order in which chemicals must be added to prepare a solution. Without labels, it is not easy for anyone to tell what the data is or what the document is for. As we showed above with tags, there should be a well-formatted, structured way of representing the data in documents. XML is one of the methods used to represent data in this way.
XML is a markup language that is mainly used to represent structured data. Structured data is data stored together with a tag or label that says what the data is. A tag plays the same role as a column name in an RDBMS: it names the value it holds. This is why XML is used to store data in a DDB. One may ask why we need XML rather than simply labelling the data with ad hoc tags, as in the contact details example. The answer is that XML provides many features for handling structured data within a document:
- XML is a markup language designed to serve structured data over the internet, and users can view it easily and quickly.
- It supports a wide variety of applications.
- It is easy to write programs that process XML documents.
- XML keeps optional features to a minimum, so its complexity stays low. Hence XML is a simple language that anyone can use with minimal knowledge.
- XML documents can be created very quickly. They do not need the thorough analysis, design and development phases that an RDBMS schema does. One can even create and view XML in a plain text editor such as Notepad.
All these features make XML unique and ideal for representing data in a DDB.
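To make the column-name analogy concrete, and to show how little code it takes to produce XML, here is a minimal sketch using Python's standard-library module xml.etree.ElementTree. It turns one RDBMS-style row, modelled as a dict from column name to value, into an XML record whose tags are the column names. The function name row_to_xml and the sample row are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def row_to_xml(record_tag, row):
    """Turn a {column_name: value} row into <record_tag><column>value</column>...</record_tag>."""
    record = ET.Element(record_tag)
    for column, value in row.items():
        # Each column name becomes a tag, so every value carries its own label.
        ET.SubElement(record, column).text = str(value)
    return record

# Hypothetical row, as it might come from a Contact table.
row = {"Name": "Rose Mathew", "Town": "Clinton Township", "State": "MI"}
print(ET.tostring(row_to_xml("Contact", row), encoding="unicode"))
# <Contact><Name>Rose Mathew</Name><Town>Clinton Township</Town><State>MI</State></Contact>
```

One design caveat: ElementTree does not check that a tag is a legal XML name, so a column such as "Zip Code" (with a space) would have to be renamed before it can become a tag.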
A typical XML document begins with an XML declaration, <?xml ...?>. The declaration is optional, but it is useful because it marks the file as an XML document. If present, it must be the very first thing in the document, and it usually states the XML version being used.
<?xml version="1.0"?>
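A quick way to confirm that the declaration is optional, and that it must come first when it is present, is to feed both forms to a parser. This is a minimal sketch assuming Python's standard xml.etree.ElementTree; the helper name parses and the one-element sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

WITH_DECL = '<?xml version="1.0"?>\n<Contact category="ADDRESS"/>'
WITHOUT_DECL = '<Contact category="ADDRESS"/>'

def parses(text):
    """Return the root element's tag if text is well-formed XML, else None."""
    try:
        return ET.fromstring(text).tag
    except ET.ParseError:
        return None

print(parses(WITH_DECL))     # -> Contact (declaration present)
print(parses(WITHOUT_DECL))  # -> Contact (declaration omitted, still well-formed)
# Even a single newline before the declaration is an error,
# because the declaration has to be the very first thing in the document.
print(parses("\n" + WITH_DECL))  # -> None
```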
Once the declaration is written, one can write any structured data using tags, as we did in the address example above. The data represented with tags are called XML elements, and the extra details about an element are called its attributes. In other words, attributes are name-value pairs that appear inside the start tag of an XML element. In the example below, <Contact>, <Name>, <ApartmentNum> etc. are XML elements, and category="ADDRESS" is an attribute of the Contact element. An XML element holds some valid data and is always enclosed between a start tag <xml_element> and an end tag </xml_element>. Sometimes an element is empty, with no data, but its presence still carries meaning for the document or for other data. An empty element can be written as <xml_element/>. Every element that is opened must be closed. Elements can contain sub-elements; in the example below, Contact has several sub-elements that make up the address. Sub-elements must be closed in the reverse order in which they were opened: the element opened last is closed first.
Attributes give more meaning to the data held in the elements. Here, the attribute indicates what type of contact details are listed, such as address, phone or email.
<Contact category="ADDRESS">
  <Name>Rose Mathew</Name>
  <ApartmentNum>APT 201</ApartmentNum>
  <AppName>Lakeside terrace 1232</AppName>
  <Street>Lakeside Village Drive</Street>
  <Town>Clinton Township</Town>
  <State>MI</State>
  <Country>US</Country>
</Contact>
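The rules described above (attributes inside the start tag, nested sub-elements, empty elements, closing in reverse order) can be checked with a few lines of Python. This is a minimal sketch using the standard-library module xml.etree.ElementTree; the helper names read_contact and is_well_formed are hypothetical, and the sample is the Contact record with straight quotes and properly closed tags.

```python
import xml.etree.ElementTree as ET

CONTACT = """<Contact category="ADDRESS">
  <Name>Rose Mathew</Name>
  <ApartmentNum>APT 201</ApartmentNum>
  <AppName>Lakeside terrace 1232</AppName>
  <Street>Lakeside Village Drive</Street>
  <Town>Clinton Township</Town>
  <State>MI</State>
  <Country>US</Country>
</Contact>"""

def read_contact(xml_text):
    """Return the category attribute and a {sub-element tag: text} dict."""
    root = ET.fromstring(xml_text)      # raises ParseError if not well-formed
    category = root.get("category")     # attribute from the start tag
    fields = {child.tag: (child.text or "").strip() for child in root}
    return category, fields

def is_well_formed(xml_text):
    """True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

category, fields = read_contact(CONTACT)
print(category)          # -> ADDRESS
print(fields["Town"])    # -> Clinton Township

print(is_well_formed("<Contact><Verified/></Contact>"))         # empty element: True
print(is_well_formed("<Contact><Name>Rose</Contact></Name>"))   # wrong closing order: False
print(is_well_formed("<Contact><Name>Rose</Name>"))             # Contact never closed: False
```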
These elements and attributes are all known as nodes of the document. In short, nodes are the pieces a document is built from: its tags, the attributes inside them and the text between them. There are seven types of nodes in an XML document.
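- Root: This is the node from which all other nodes in the document descend. Strictly speaking, the root node sits above the outermost element; that outermost element, Contact in our example, is called the root (or document) element.
<Contact>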
- Element: This is any node in the document that begins with a start tag such as <Name> and ends with the matching end tag </Name>.
<ApartmentNum>APT 201</ApartmentNum> <AppName>Lakeside terrace 1232</AppName>
- Text: This is the value held by an element node. In the example below, ‘Rose Mathew’ is a text node.
<Name>Rose Mathew</Name>
- Attribute: This node appears inside the start tag of an element and gives more details about that element. It always consists of a name and its value.
<Contact category="ADDRESS">
- Comment: This node holds a comment or description about the data, an element, an attribute or anything else. It has nothing to do with the actual data and exists only to help readers understand the document. It starts with <!-- and ends with -->.
<!-- This is the comment node -->
- Processing Instruction: This node passes an instruction, such as sort or display, to the application that processes the document. It begins with <? followed by a target name and ends with ?>.
<?sort alpha-ascending?> <?StudentNames <Fred>, <Bert>, <Harry> ?>
- Namespace: A namespace indicates which vocabulary, or bucket, an element belongs to. For example, the same element name may appear in a document with different meanings depending on context: State in an address and State in an STD code. To tell them apart we use namespaces: each element name gets a prefix, and each prefix is declared with an xmlns:prefix attribute that binds it to a namespace URI.
<Address:State> <Phone:State>
Now it is clear what each State is for. This is similar to prefixing a column name with its table name or alias in SQL.
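The seven node types can be seen by walking a small document with Python's standard-library module xml.dom.minidom, which keeps comments and processing instructions as nodes. This is a minimal sketch under stated assumptions: the sample document, the namespace URIs http://example.com/address and http://example.com/phone, the code value "248" and the function name classify are all made up for illustration. Note that the DOM has no separate namespace node type; it reports namespace declarations as xmlns attributes, so the sketch labels those "Namespace" itself, and it reports the root as the Document node.

```python
from xml.dom import minidom, Node

# Hypothetical sample; the namespace URIs only need to be unique.
SAMPLE = """<?xml version="1.0"?>
<Contact category="ADDRESS"
         xmlns:Address="http://example.com/address"
         xmlns:Phone="http://example.com/phone">
  <!-- This is the comment node -->
  <?sort alpha-ascending?>
  <Name>Rose Mathew</Name>
  <Address:State>MI</Address:State>
  <Phone:State>248</Phone:State>
</Contact>"""

def classify(node, found):
    """Depth-first walk that appends a (node type, label) pair for every node."""
    t = node.nodeType
    if t == Node.DOCUMENT_NODE:
        found.append(("Root", "#document"))
    elif t == Node.ELEMENT_NODE:
        label = node.tagName
        if node.namespaceURI:  # set when the prefix is bound by xmlns:prefix
            label += " -> " + node.namespaceURI
        found.append(("Element", label))
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            # The DOM stores namespace declarations as xmlns attributes.
            kind = "Namespace" if attr.name.startswith("xmlns") else "Attribute"
            found.append((kind, f"{attr.name}={attr.value}"))
    elif t == Node.TEXT_NODE:
        if node.data.strip():  # skip whitespace used only for indentation
            found.append(("Text", node.data.strip()))
    elif t == Node.COMMENT_NODE:
        found.append(("Comment", node.data.strip()))
    elif t == Node.PROCESSING_INSTRUCTION_NODE:
        found.append(("Processing Instruction", f"{node.target} {node.data}"))
    for child in node.childNodes:
        classify(child, found)
    return found

for kind, label in classify(minidom.parseString(SAMPLE), []):
    print(f"{kind:<24}{label}")
```

The two State elements share a local name but resolve to different namespace URIs, which is exactly how the prefix disambiguates them.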
These are the basics we need to know to create an XML document. Let us now see how XML is applied in a database.