Get a jump on the future of data transfer
Standardization of data-exchange mechanisms enables a variety of applications to read from and write to the mechanisms' data streams. Applications can work with a standardized mechanism's data stream in many ways, including ways the mechanism's developers can't foresee. To date, programmers haven't standardized most data-exchange mechanisms; they've developed individual mechanisms to work only with specific applications. Extensible Markup Language (XML) is different.
What Is XML?
XML is a specification for storing and exchanging data that the World Wide Web Consortium (W3C) began developing in 1996 to standardize information delivery across the Internet; XML 1.0 became a W3C Recommendation in February 1998. The W3C defines XML as a subset of Standard Generalized Markup Language (SGML), which is a standard markup language for documents. XML isn't a markup language with a fixed vocabulary of its own; instead, you can think of it as a specification for defining such languages.
XML looks similar to HTML. XML uses tags and attributes to define data in the same way that HTML uses tags to define formatting. However, instead of having a fixed set of tags, as HTML has, XML lets you define the tags your XML streams use. Therefore, you can use tags to make records' content self-evident.
For example, suppose you're selling a course of instruction. You can define a record for your course using the following syntax:
<Courses>
<Course ID="ACD1000">
<Title>Advanced Component Development</Title>
<startdate>November 2, 1998</startdate>
<duration>3</duration>
</Course>
</Courses>
Most people who read this data stream can understand its content and structure. The data stream defines your course and specifies its course ID, title, starting date, and duration. You don't need to know a proprietary language to understand this data.
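A program can read the record just as directly. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is something XML itself requires); the function name read_course and the dictionary keys are illustrative choices, not part of any XML vocabulary.

```python
import xml.etree.ElementTree as ET

# The single-course record from the example above, held in a string.
COURSE_XML = """
<Courses>
  <Course ID="ACD1000">
    <Title>Advanced Component Development</Title>
    <startdate>November 2, 1998</startdate>
    <duration>3</duration>
  </Course>
</Courses>
"""

def read_course(xml_text):
    """Return the first Course record as a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)   # root is the <Courses> element
    course = root.find("Course")     # first <Course> child of the root
    return {
        "id": course.get("ID"),                          # attribute value
        "title": course.findtext("Title"),               # child element text
        "startdate": course.findtext("startdate").strip(),
        "duration": int(course.findtext("duration")),    # unit is not stated in the stream
    }

print(read_course(COURSE_XML))
```

The code needs no knowledge beyond the tag names that already appear in the stream, which is the same information a human reader relies on.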
XML is also useful for working with multiple records. For example, you can easily use XML to describe a situation in which a client orders two or more courses at the same time:
<Courses>
<Course ID="A1000">
<Title>Advanced Component Development</Title>
<startdate>November 2, 1998</startdate>
<duration>3</duration>
</Course>
<Course ID="B1000">
<Title>Using XML</Title>
<startdate>November 12, 1998</startdate>
<duration>2</duration>
</Course>
</Courses>
This example repeats the Course tag within the Courses tag to account for multiple user selections. XML also lets you nest tags in multiple layers to define hierarchies of data.
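Because the Course tag simply repeats, a reader can loop over the records without knowing in advance how many there are. Here is a short sketch under the same assumptions as before (Python with xml.etree.ElementTree); list_courses and total_duration are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# The two-course order from the example above.
ORDER_XML = """
<Courses>
  <Course ID="A1000">
    <Title>Advanced Component Development</Title>
    <startdate>November 2, 1998</startdate>
    <duration>3</duration>
  </Course>
  <Course ID="B1000">
    <Title>Using XML</Title>
    <startdate>November 12, 1998</startdate>
    <duration>2</duration>
  </Course>
</Courses>
"""

def list_courses(xml_text):
    """Return (ID, title, duration) for every Course under the root element."""
    root = ET.fromstring(xml_text)
    return [
        (course.get("ID"), course.findtext("Title"), int(course.findtext("duration")))
        for course in root.findall("Course")   # one entry per repeated <Course>
    ]

def total_duration(xml_text):
    """Add up the duration values of all ordered courses."""
    return sum(duration for _, _, duration in list_courses(xml_text))

for course_id, title, duration in list_courses(ORDER_XML):
    print(course_id, title, duration)
print(total_duration(ORDER_XML))  # 5
```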
Defining XML tags. HTML specifications explicitly define all the tags you can use in HTML code; this fixed set of tags is what lets browsers render pages consistently. When a browser reads an H1 tag, the browser knows it needs to display the text between the H1 start and end tags in the Heading 1 style. In contrast, XML doesn't have any predefined tags. This characteristic makes XML well suited to transporting information between applications. When two applications exchange information, both must understand the tags in the XML stream; interoperability requires nothing else.
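To make the exchange concrete, the sketch below plays both roles in one script: a sending side that writes course records as XML and a receiving side that reads them back. It assumes Python's standard xml.etree.ElementTree module; the function names and the tuple layout are illustrative assumptions, and the only thing the two sides share is the tag names.

```python
import xml.etree.ElementTree as ET

def write_courses(records):
    """Sending side: serialize (id, title, startdate, duration) tuples as XML text."""
    root = ET.Element("Courses")
    for course_id, title, startdate, duration in records:
        course = ET.SubElement(root, "Course", ID=course_id)
        ET.SubElement(course, "Title").text = title
        ET.SubElement(course, "startdate").text = startdate
        ET.SubElement(course, "duration").text = str(duration)
    return ET.tostring(root, encoding="unicode")

def read_courses(xml_text):
    """Receiving side: rebuild the tuples using only the agreed tag names."""
    root = ET.fromstring(xml_text)
    return [
        (
            course.get("ID"),
            course.findtext("Title"),
            course.findtext("startdate"),
            int(course.findtext("duration")),
        )
        for course in root.findall("Course")
    ]

records = [("B1000", "Using XML", "November 12, 1998", 2)]
stream = write_courses(records)
print(stream)
print(read_courses(stream) == records)  # True
```

Neither function depends on how the other is implemented, so either side could be replaced by a different application that honors the same tags.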
Applications can rely on only the data in the XML stream for definitions of XML tags, or they can refer to a Document Type Definition (DTD). A DTD describes an XML vocabulary, that is, a set of definitions of the elements you can use within XML data streams that are based on that vocabulary. Each definition defines one element, and definitions can specify the elements' data type. However, the definitions don't specify elements' content.
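As an illustration, the sketch below embeds one possible DTD for the Courses vocabulary in the document itself (an internal DTD subset). The declarations are an assumption written for this example, not a published vocabulary. Note also that Python's standard xml.etree.ElementTree parser reads past the DTD without validating against it, so the hypothetical helper undeclared_tags performs only a rough name check, not real DTD validation.

```python
import re
import xml.etree.ElementTree as ET

# Assumed DTD for illustration, followed by a stream that uses the vocabulary.
DOCUMENT = """<!DOCTYPE Courses [
  <!ELEMENT Courses (Course+)>
  <!ELEMENT Course (Title, startdate, duration)>
  <!ATTLIST Course ID CDATA #REQUIRED>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT startdate (#PCDATA)>
  <!ELEMENT duration (#PCDATA)>
]>
<Courses>
  <Course ID="B1000">
    <Title>Using XML</Title>
    <startdate>November 12, 1998</startdate>
    <duration>2</duration>
  </Course>
</Courses>
"""

def declared_elements(xml_text):
    """Collect the names that appear in <!ELEMENT ...> declarations."""
    return set(re.findall(r"<!ELEMENT\s+([\w.-]+)", xml_text))

def undeclared_tags(xml_text):
    """Return tags used in the stream that the DTD does not declare.

    This is a name check only; it does not verify nesting, order,
    or required attributes the way a validating parser would.
    """
    root = ET.fromstring(xml_text)
    used = {element.tag for element in root.iter()}
    return used - declared_elements(xml_text)

print(sorted(declared_elements(DOCUMENT)))
print(undeclared_tags(DOCUMENT))
```

A validating parser would go further and enforce the structure the DTD declares, such as Course containing Title, startdate, and duration in that order.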
The data stream in my previous examples uses the Courses, Course, Title, startdate, and duration tags to identify its data. The Course tag has an attribute (ID) that provides additional information about each record. By adding the ID attribute to the Course tag, I can include the course ID as part of the Course tag so that I don't need to add an ID tag to the record.
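The difference between an attribute and a child tag shows up when a program reads the record. The following sketch, again assuming Python's xml.etree.ElementTree, looks up a course by its ID attribute; find_course is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<Courses>
  <Course ID="A1000">
    <Title>Advanced Component Development</Title>
    <startdate>November 2, 1998</startdate>
    <duration>3</duration>
  </Course>
  <Course ID="B1000">
    <Title>Using XML</Title>
    <startdate>November 12, 1998</startdate>
    <duration>2</duration>
  </Course>
</Courses>
"""

def find_course(xml_text, course_id):
    """Return the Course element whose ID attribute matches, or None."""
    root = ET.fromstring(xml_text)
    for course in root.iter("Course"):
        if course.get("ID") == course_id:   # the ID lives on the tag itself
            return course
    return None

course = find_course(ORDER_XML, "B1000")
print(course.attrib)                     # attributes of the Course tag
print([child.tag for child in course])   # child tags; no separate ID tag is needed
print(course.findtext("Title"))
```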
The ability to create a vocabulary for an XML data stream is handy because you can make the data structure of that vocabulary's streams clear. You can create a vocabulary to match your database definitions, your corporate procedures, and your organization's documentation. You can mold XML to fit your company's needs. This method is different from other approaches that require you to use rigid, predefined data structures.