In-Depth
The X Factor
As the industry battles over XML file formats, what should dev managers focus on?
The rule of etiquette for conversation in polite company is pretty simple: Never talk about politics, sex, religion or money. Now you can add XML-based document file format specifications to that list, because if Miguel de Icaza's experience is any indication, talking about XML file formats is a great way to start a bar fight.
The Novell Inc. vice president and founder of the Linux-compatible open source Mono implementation of the .NET Framework wrote a blog posting in January that showed some support for the Microsoft Office Open XML (OOXML) specification. Despite also praising the competing, open source-backed OpenDocument Format (ODF) spec in his post, de Icaza was widely vilified for his opinion. Citing his concern about becoming a lightning rod on the issue, he declined to be interviewed for this article, but his position is well-documented.
Why is there so much passion in a debate about something as esoteric as XML-based file formats? In a word: Microsoft. The software giant in September failed to get its OOXML format ratified as an international standard by the International Organization for Standardization (ISO). The run-up to that vote -- and to a second vote slated for February 2008 -- has ignited a passionate debate about file formats and suspicions about Microsoft's intentions as it pushes OOXML.
High Stakes
At issue is the way organizations will store, access and manipulate files created by productivity applications like Microsoft Office and OpenOffice. Today's binary files are often opaque to third-party applications, meaning companies can't process these files anywhere but on the client. Moving to open, XML-based file formats radically changes that, says Alexander Falk, CEO of XML tools vendor Altova Inc.
"In IT you had the data inside the SQL database and then you had all the other trash that was locked up inside of Word documents and Excel spreadsheets. All the data inside databases and that data inside documents is now going to be accessible to the organization," Falk says. "That's a quantum shift."
Also, governments and large organizations increasingly look to open, standards-based file formats to ensure that documents created today can be read and manipulated in the future -- regardless of the fate of the vendor whose software created them. Moving to functional and open XML-based file formats obviates this challenge.
In fact, the transition to XML has been underway for several years, says Brian Jones, lead program manager for Microsoft Office.
"People think that Open XML is a reaction to ODF. But really, if you look at the history of both formats, they were both developed in parallel," says Jones. "The work we did at Microsoft in Open XML started in Office 2000. We shipped that in 1999 and probably started engineering it in 1997."
A Standard by Any Other Name
The problem is that Microsoft's OOXML and the OpenDocument Foundation's ODF specifications take very different tacks in solving the challenge.
ODF, published as an ISO standard in November 2006, is a compact, efficient and streamlined specification that industry watchers agree does a good job of building on other standards and components. Documents saved in ODF rely on standards-based CSS formatting for fonts, can make use of standards-based Scalable Vector Graphics (SVG) images, and are able to present equations and formulas in standards-based MathML markup language.
The ODF spec also aggressively shares resources among its component applications. As a result, the same XML structures and logic used to build tables in a word processing document are used to build tables in spreadsheet and presentation files. That approach reduces complexity, redundancy and the potential for bugs.
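For developers, the practical upshot of that sharing can be sketched in a few lines of Python. The example below is a minimal illustration, not production code: it assumes the standard ODF table and text namespaces, and the helper names read_tables and sample_content, along with the tiny hand-built content fragments, are hypothetical stand-ins for a real content.xml part. It also ignores nested tables.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs for the office, table and text vocabularies.
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def read_tables(content_xml):
    """Return every table as a list of rows, each row a list of cell strings.

    Simplification: nested tables are not handled separately.
    """
    root = ET.fromstring(content_xml)
    tables = []
    for table in root.iter(f"{{{TABLE}}}table"):
        rows = []
        for row in table.iter(f"{{{TABLE}}}table-row"):
            cells = [
                "".join(cell.itertext())
                for cell in row.iter(f"{{{TABLE}}}table-cell")
            ]
            rows.append(cells)
        tables.append(rows)
    return tables


def sample_content(body_tag):
    """Hypothetical, stripped-down content.xml; body_tag is 'text' or 'spreadsheet'."""
    return (
        f'<office:document-content xmlns:office="{OFFICE}" '
        f'xmlns:table="{TABLE}" xmlns:text="{TEXT}">'
        f"<office:body><office:{body_tag}>"
        '<table:table table:name="T1">'
        "<table:table-row>"
        "<table:table-cell><text:p>a</text:p></table:table-cell>"
        "<table:table-cell><text:p>b</text:p></table:table-cell>"
        "</table:table-row>"
        "</table:table>"
        f"</office:{body_tag}></office:body></office:document-content>"
    )


# The same function reads the table whether the body is a text document
# or a spreadsheet, because both use the same table elements.
text_tables = read_tables(sample_content("text"))
sheet_tables = read_tables(sample_content("spreadsheet"))
```

Because the word processing and spreadsheet fragments use identical table markup, one routine serves both, which is the kind of reduced redundancy the spec's supporters point to.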
By contrast, the Microsoft OOXML specification takes what might be called a kitchen sink approach. The specification itself is famously 6,000 pages long when printed out. Jones contends that many of those pages contain background and reference materials. But he says the size of the OOXML spec also reflects his team's effort to reproduce the full functionality of Microsoft Office binary files in an XML schema.
"Clearly the biggest thing we're talking about is backward compatibility with the base of documents. That's why the formats were such a large effort for us to design in the first place," says Jones. "The priority-one goal was to create an XML format that could support all the existing binary documents that are out there."
That effort, however, has drawn a good deal of developer criticism.
"The need for some of the rather more interesting compatibility tags left me feeling that the designer of OOXML was being a bit lazy," writes developer Martin Owens in an e-mail to Redmond Developer News. "I mean, if you can't describe what a line spacing should be in a blank canvas of XML text without resorting to pointing at specific applications, then I'm afraid you're just not explaining a standard."
Jones contends that application-specific tags are often needed to handle unique wrinkles presented by specific versions of Office applications. In a few instances, he says, the OOXML spec even faithfully reproduces known bugs in order to maintain compatibility with software that's been designed to work with those flaws. In short: Fixing the bug in the OOXML standard would break compatibility with older software.
It's a philosophy that many developers, including Doug Franklin, don't agree with. "Technically, it's not a standard at all," Franklin writes in an e-mail. "It's an attempt to codify existing technology without codifying it at all. Any 'standard' with stuff like 'break footnotes the way Word 4 breaks them' is complete crap."
Altova's Falk agrees that the Microsoft specification is a good deal larger and less elegant than the ODF spec. But he calls concerns about the stability or usability of OOXML unfounded.
"I can't say for all files, but today everything we've looked at is valid," Falk says. "You can manipulate and edit it. Can a standard be so technically flawed that it's like three steps back and one step forward? Hypothetically, it could happen, but I don't think it's an issue with Office Open XML. We've manipulated the files and are able to get files out of there.
"Again, we looked at it from a developer's perspective, not from the perspective of, 'is it an ideal file format,'" Falk adds.
XML expert, consultant and Microsoft MVP Don Demsak argues that both technologies share a fundamental flaw: they're not really striving to be standards at all.
"I think this whole OOXML versus ODF thing is a non-issue. Both formats are just serialization formats for the object models they're associated with, and are not designed as impartial, interoperable formats," Demsak writes in an e-mail.
Gary Edwards, president of the OpenDocument Foundation, which drives ODF development, believes codified document standards should not carry forward old flaws and application nuances.
"The world is not a clean slate, but it's going to somehow make that transition of existing documents, applications and processes to XML," he says in an e-mail.
"To us, that is an open XML file format consistent with the continuing work of the W3C that also meets the following criteria: open, unencumbered, universally interoperable, totally application-platform-vendor independent, with an acceptable citizen-driven governance," Edwards writes.
XML Marks the Spot
Even if OOXML fails to earn ISO ratification, the spec is already an approved Ecma International standard and will be carried into enterprises as part of the Microsoft Office franchise. So in a sense, the question becomes: Does standardization even matter?
"ISO would be another huge step, but in my view it's already a huge success," says Microsoft's Jones. "It's already changing the world. It's affecting the world of documents and availability of documents. I don't think in any way the ISO approval will determine if Open XML is a success."
In fact, experts like Falk, Burton Group analyst Peter O'Kelly and Demsak believe the industry will likely end up with two XML-based file format alternatives: ODF and OOXML. Demsak says he's fine with that.
"I don't see any reason why we can't use both, since they're all just XML," Demsak writes. "If we try to force everyone to use one format, each group will try to find holes in the spec to insert metadata they need to persist, and the other implementers won't know what to do with it."
Altova's Falk agrees. But he says usage of ODF and OOXML may cut broadly along geographic lines.
"There's a much bigger incentive in South America, India, Russia and China and all these other developing and emerging countries to use open source-based products, simply because of the price differential. So it may be we end up with one clear winner in Europe and North America, and another in the rest of the world," says Falk. "And that's also a good case. Being XML, it should be easy to convert between the two formats."
So what should development shops be doing to prepare? Falk urges developers to start working with ODF- and OOXML-compliant files, so they can understand the structures and get used to the tooling. He also says it's a good idea to begin studying up on the specifications.
Microsoft's Jones says developers may have to go deeper, and rethink the way they look to manipulate and process documents. "Until recently the way people have thought about working with an Office document is primarily by automating the application. So really the biggest shift is primarily thinking about that document rather than the application [that created it]."
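What that shift can look like in practice: an OOXML word processing file is a ZIP package whose main body lives in an XML part named word/document.xml, so a program can read it without launching Office at all. The Python sketch below is a hedged illustration under that assumption. The function names extract_paragraphs and make_sample_package are hypothetical, and the in-memory sample is deliberately stripped down; a real .docx also carries content-type and relationship parts.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(package):
    """Return the text of each paragraph in an OOXML word processing package.

    `package` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        pieces = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def make_sample_package():
    """Build a minimal, hypothetical package in memory for demonstration only."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>XML</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
    buf.seek(0)
    return buf


paragraphs = extract_paragraphs(make_sample_package())
```

Nothing here automates an application. The code treats the document as data, which means the same routine could run on a server with no copy of Office installed.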

Figure: The OOXML spec comprises word processing, spreadsheet and presentation application files, but critics object to the use of proprietary vocabularies like DrawingML and VML (instead of SVG-based graphics).
One way or the other, development shops face an XML-centric future. As the battle between the two file-format camps continues to play out, O'Kelly urges a measured approach.
"This is capitalism meets the democracy of standards, and it would be outlandish for [Microsoft] to not work in their own self interest," says O'Kelly. "Maybe we could channel Ronald Reagan here in the XML standardization fight and say 'trust but validate,' rather than 'trust but verify.'"