Using XML Attributes

One of the more common questions I get from those first learning XML relates to the use of attributes versus the use of elements. That is, “When should I use an attribute to capture data and when should I use an element?”

The answer depends on your philosophy. A good heuristic is the one described in the XML Attributes section of the W3Schools XML Tutorial.

I don’t agree with the entire discussion provided by the tutorial, and I will address those issues later, but overall I think W3Schools provides a great collection of tutorials for those getting started in a particular web technology and I find myself using it as a quick reference.

Back to the issue at hand. In a nutshell,

Use elements to store data and attributes to store metadata.

Roughly, we can think of metadata as “data about data”. Let’s look at some examples.

XML elements represent a chunk of information. They can be empty and simply indicate the presence of something. They can also be rather complex, describing their data through a deeply nested structure. For an example of the former, consider the horizontal rule element used in XHTML:

<hr />

It indicates the presence of a division line in a document. A common attribute used in conjunction with the horizontal rule element is the style attribute. For example,

<hr style="height:5px;"/>

Here, the style describes the horizontal rule. Let’s look at a more data-oriented example.

<address lastValidated="01-jan-2007">
  <street>1400 Washington Ave</street>
  <city>Albany</city>
  <state>NY</state>
  <zip>12222</zip>
</address>

The address element is a container for the complex information that is an address which consists of a street, city, state and zip code. The attribute lastValidated tells you something about the address, in this case when the address was checked to ensure it was still correct.

In both examples, the information remains the same independent of the attribute. For contrast, let’s look at an example where attributes are used to store the information.

<address street="1400 Washington Ave"
        city="Albany"
        lastValidated="01-jan-2007"
        state="NY"
        zip="12222" />

Here, the element <address> doesn’t exactly contain the data. In addition, the information is entangled with the metadata. While it is possible to force a specific element order on an XML structure, attributes can appear in any order. Since they serve to modify the meaning of the data, they really do not require an order.
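
To make the point about ordering concrete, here is a minimal sketch using Python’s standard xml.etree.ElementTree module. The second address string is simply my reshuffling of the example above, and the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same address; only the attribute order differs.
first = ET.fromstring(
    '<address street="1400 Washington Ave" city="Albany" '
    'lastValidated="01-jan-2007" state="NY" zip="12222" />'
)
second = ET.fromstring(
    '<address zip="12222" state="NY" city="Albany" '
    'lastValidated="01-jan-2007" street="1400 Washington Ave" />'
)

# Attributes are exposed as a mapping keyed by name, so order plays no role.
same_attributes = first.attrib == second.attrib

# Child elements, by contrast, come back as an ordered sequence.
ordered = ET.fromstring(
    "<address><street>1400 Washington Ave</street><city>Albany</city></address>"
)
child_order = [child.tag for child in ordered]
```

The parser treats both attribute versions as equivalent, while the element version keeps street before city, which is the order a schema could enforce.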

Another use of attributes as metadata is providing insight into the structure of an element. For example, suppose we are capturing a customer order. Customers are able to specify the same or different billing and delivery addresses. One option would be as follows:

<order>
  <customer>
    <customerName>Dr. Koch</customerName>
    <billingAddress>
      <street>1400 Washington Ave</street>
      <city>Albany</city>
      <state>NY</state>
      <zip>12222</zip>
    </billingAddress>
  </customer>
  <shipTo>
    <recipient>Dr. Koch</recipient>
    <recipientAddress>
      <street>1400 Washington Ave</street>
      <city>Albany</city>
      <state>NY</state>
      <zip>12222</zip>
    </recipientAddress>
  </shipTo>
  [other order info]
</order>

However, this is redundant since the “shipTo” address is the same as the “billing” address. We can use attributes to minimize the repetition of data. Applying this notion to our example, we have the following alternate structure:

<order>
  <customer>
    <customerName>Dr. Koch</customerName>
    <billingAddress>
      <street>1400 Washington Ave</street>
      <city>Albany</city>
      <state>NY</state>
      <zip>12222</zip>
    </billingAddress>
  </customer>
  <shipTo isBillingAddress="true" />
  [other order info]
</order>
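
A consumer of this structure has to honor the attribute when it looks up the delivery address. The following is only a sketch of how that might look in Python with the standard xml.etree.ElementTree module. The helper names address_fields and shipping_address are hypothetical, and the fallback branch assumes a shipTo that is not flagged spells out a recipientAddress as in the first structure.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<order>
  <customer>
    <customerName>Dr. Koch</customerName>
    <billingAddress>
      <street>1400 Washington Ave</street>
      <city>Albany</city>
      <state>NY</state>
      <zip>12222</zip>
    </billingAddress>
  </customer>
  <shipTo isBillingAddress="true" />
</order>
"""

ADDRESS_FIELDS = ("street", "city", "state", "zip")


def address_fields(element):
    """Collect the address child elements of `element` into a dict."""
    return {name: element.findtext(name) for name in ADDRESS_FIELDS}


def shipping_address(order):
    """Hypothetical helper: resolve the delivery address for an order."""
    ship_to = order.find("shipTo")
    if ship_to.get("isBillingAddress") == "true":
        # The attribute says "reuse the billing address", so read it from there.
        return address_fields(order.find("customer/billingAddress"))
    # Assumption: otherwise the address is spelled out under recipientAddress.
    return address_fields(ship_to.find("recipientAddress"))


order = ET.fromstring(ORDER_XML)
resolved = shipping_address(order)
```

The data lives in one place, and the attribute only tells the reader where to find it.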

Now, let’s address some of the comments made in the section “Avoid Using Attributes?” of the W3Schools discussion of attributes. Really the subtitle should be “Avoid Using Attributes to Store Data” as that’s what the discussion is trying to address.

Two of the points made in the discussion directly address structural limitations of attributes:
  • attributes cannot contain multiple values (child elements can)
  • attributes cannot describe structures (child elements can)
If we interpret the first to mean that the same attribute cannot appear twice in an element’s opening tag, then both are true. This does not diminish the capacity of attributes; rather, it forces us to design better structures. If we need repeated or more complex attributes, we can define structures to accommodate them. On the other hand, if the first point asserts that a given attribute cannot hold multiple values, I would have to disagree. Consider the class attribute on XHTML tags when using CSS: it may hold several space-separated values. This is not to suggest the widespread use of attributes in this fashion, but it is possible.
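
As a small illustration of the class case, here is a sketch in Python using the standard xml.etree.ElementTree module. The paragraph element and its class names are made up for the example.

```python
import xml.etree.ElementTree as ET

# An XHTML-style element whose class attribute carries two values (invented sample).
paragraph = ET.fromstring('<p class="note warning">Check the address.</p>')

# The attribute arrives as one string; str.split() breaks it on whitespace.
classes = paragraph.get("class", "").split()

# Each value can then be tested on its own.
is_warning = "warning" in classes
```

Note that the splitting is the application’s job here: the parser hands back a single string, and the multi-value interpretation is a convention layered on top.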

Another point made was:
  • attributes are not easily expandable (for future changes)
This is a little unclear and, again, perhaps inaccurate. Considering that attributes can be defined via types and types can be extended, in a sense attributes can be expanded. Given that attributes can hold multiple values, a form of expandability can also be achieved that way.

The final two points made in the discussion are:
  • attributes are more difficult to manipulate by program code
  • attribute values are not easy to test against a Document Type Definition (DTD) – which is used to define the legal elements of an XML document
I have no idea how the first is true. Every standard technology (XSLT, DOM, SAX, etc.) used to access and manipulate XML documents has a way to access and manipulate attributes. With the evolution of XPath, this argument is even stronger.
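
For what it is worth, here is a minimal sketch of reading, selecting on, and writing attributes with Python’s standard xml.etree.ElementTree module, which supports a limited XPath subset. The addresses wrapper element, the second address, and the 01-feb-2007 date are sample data I made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: one address carries the metadata attribute, one does not.
ADDRESSES_XML = """
<addresses>
  <address lastValidated="01-jan-2007"><city>Albany</city></address>
  <address><city>Troy</city></address>
</addresses>
"""
root = ET.fromstring(ADDRESSES_XML)

# Select only the addresses that carry a lastValidated attribute,
# using an XPath-style predicate.
validated = root.findall("address[@lastValidated]")
validated_cities = [address.findtext("city") for address in validated]

# Read an attribute; a missing one falls back to the supplied default.
first, second = root.findall("address")
first_date = first.get("lastValidated")
second_date = second.get("lastValidated", "never")

# Write an attribute, then count the validated addresses again.
second.set("lastValidated", "01-feb-2007")
validated_after_update = len(root.findall("address[@lastValidated]"))
```

Each operation is a single call, which is about as much work as the equivalent operations on child elements.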

The second statement perhaps has some validity. But DTDs are not the only schema language in town. XML Schema Definition (XSD) is another and arguably more powerful standard. Using XSDs, attributes are easily validated.

Of course, this raises the question of whether or not XSDs can completely replace DTDs. That’s a topic for a different discussion.
