XML (Extensible Markup Language) is THE new format for the exchange of information within the business world. Until recently, the exchange of information (whether data, documents, etc.) was plagued by the existence of thousands of different formats; in a very short time, XML has become the de facto standard for electronic information exchange. You may not realise it, but when you shop on the internet, use a search engine, or check all the possible train connections between Basel and Berlin, the information streams behind the scenes are often in XML.
Why has XML taken off in such a short time? There are some simple reasons:
When exchanging information, people must agree on a format. Unfortunately, thousands of formats exist. With XML, this misery is over. For many applications, it is useful to agree on a common data structure. Such a data structure is described in either a DTD (Document Type Definition) or an XML-Schema. These describe how the data is organised and what its hierarchy is. We will not go into technical details here, but many communities have already agreed upon a common data structure. As a result, many XML-based languages have appeared in recent years, e.g. MathML (Mathematical Markup Language), CML (Chemical Markup Language), BSML (Bioinformatics Sequence Markup Language), GEML (Gene Expression Markup Language), Real Estate Transaction Information XML, FpML (Financial products Markup Language), and many, many others (see e.g. The XML Cover Pages). They all enable the exchange of information between the members of a community (e.g. all the mathematicians in the world, all the real-estate brokers in the US, etc.).
The successor of DTD is XML-Schema. DTDs have several disadvantages:
But now there is XML-Schema, with the following advantages: Furthermore, XML-Schema has very strong datatyping: a large number of built-in datatypes are defined in advance, ranging from integers and booleans to dates (ISO format) and something like a "QName". A full list of the built-in datatypes can be found at the W3C website.
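To give a feel for what datatyping means in practice, here is a minimal Python sketch that converts element text according to the lexical rules of three XML Schema datatypes (xs:integer, xs:boolean and xs:date). It is not a schema validator. The element names, the `FIELD_TYPES` table and the helper functions are hypothetical and exist only for this illustration, and time-zone suffixes on dates are not handled.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date


def parse_integer(text):
    # xs:integer: an optional sign followed by decimal digits
    value = text.strip()
    if not re.fullmatch(r"[+-]?[0-9]+", value):
        raise ValueError(f"not an xs:integer: {text!r}")
    return int(value)


def parse_boolean(text):
    # xs:boolean allows exactly four lexical forms
    mapping = {"true": True, "1": True, "false": False, "0": False}
    value = text.strip()
    if value not in mapping:
        raise ValueError(f"not an xs:boolean: {text!r}")
    return mapping[value]


def parse_date(text):
    # xs:date uses the ISO form YYYY-MM-DD (time-zone suffix not handled here)
    value = text.strip()
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        raise ValueError(f"not an xs:date: {text!r}")
    return date.fromisoformat(value)


# Hypothetical mapping from element name to declared datatype
FIELD_TYPES = {"age": parse_integer, "smoker": parse_boolean, "visit": parse_date}


def typed_record(xml_text):
    """Read a flat record and convert every child element to its declared type."""
    root = ET.fromstring(xml_text)
    return {child.tag: FIELD_TYPES[child.tag](child.text or "") for child in root}


SAMPLE = (
    "<patient><age>42</age><smoker>false</smoker>"
    "<visit>2001-05-17</visit></patient>"
)
record = typed_record(SAMPLE)
print(record)
```

A value such as `<smoker>yes</smoker>` would be rejected with a `ValueError`, which is the kind of error a real schema-aware parser reports before the data ever reaches the application.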
Examples of recent XML-Schemas for use in Life Sciences and Pharma are:
XML separates content from presentation. The owner of the content can prepare his/her work using a word processor or an XML editor such as XMLSpy, XML Notepad, or Liquid XML Studio (there are many others).
The content manager does not have to worry about how the data will be presented later. He/She does not need to know or understand anything about presentation. Think of those long, painful days when a content manager and a computer specialist sat together to put all this information into good-looking HTML pages!
The computer specialist can concentrate on building the presentation without being bothered by ever-changing content. He/She just builds a reusable stylesheet (XSL stylesheet) that can be used again when the content changes or when another document with the same structure is generated.
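This division of labour can be sketched in a few lines of Python. It is not XSLT: the standard-library ElementTree module stands in for an XSL processor, and the `leaflet`, `title` and `para` element names, the sample texts and the `render_html` function are all invented for the illustration. The point is only that one presentation routine serves every document that shares the same structure.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_html(xml_text):
    """Turn a <leaflet> document into an HTML fragment (plays the 'stylesheet' role)."""
    root = ET.fromstring(xml_text)
    parts = [f"<h1>{escape(root.findtext('title', default=''))}</h1>"]
    for para in root.findall("para"):
        parts.append(f"<p>{escape(para.text or '')}</p>")
    return "\n".join(parts)


# Two hypothetical documents with the same structure but different content
ENGLISH = "<leaflet><title>Nose drops</title><para>Use twice a day.</para></leaflet>"
GERMAN = "<leaflet><title>Nasentropfen</title><para>Zweimal täglich anwenden.</para></leaflet>"

for document in (ENGLISH, GERMAN):
    print(render_html(document))
```

When the content owner edits a paragraph or supplies a new document, `render_html` stays untouched; when the designer changes the layout, the XML stays untouched.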
Stylesheets
Constructing stylesheets is a real specialism. However, the learning curve is well worth going through. Once
the skill is mastered, the company will save many expensive working hours (time is money) and profit from the reusability
of both XML and XSL.
An example of an XSL stylesheet can be found here. This stylesheet makes it possible to view an XML-file directly
in MS Internet Explorer. Another example, using the CDISC-ODM standard,
is also available.
Other stylesheets can be constructed to transform the XML document to WML (for WAP devices), to HTML, or to PDF. The transformation
is usually performed on the server side.
For one set of XML files (e.g. all having the same DTD), whether they already exist or not (e.g. because they are generated
by a database query), one can develop several stylesheets, e.g. one (or more) for presentation on a web page,
one for WAP, one for generating Adobe PDF files, and one for generating a Word document.
One can even think of a stylesheet that works in a package for developing layouts for packaging cartons!
Such software packages usually run on Apple Macintosh computers, but thanks to the portability of XML this does not
create any problem.
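As a rough illustration of several presentations for one document, the Python sketch below renders the same XML twice, once as an HTML fragment and once as plain text. The two functions play the role of two stylesheets; they are not XSL, and the `product`, `name` and `dose` element names and the sample content are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; the content never changes between outputs
DOCUMENT = "<product><name>Nose drops</name><dose>2 drops, twice a day</dose></product>"


def to_html(root):
    # "Stylesheet" number one: an HTML fragment for a web page
    name = escape(root.findtext("name", default=""))
    dose = escape(root.findtext("dose", default=""))
    return f"<h1>{name}</h1><p>Dose: {dose}</p>"


def to_text(root):
    # "Stylesheet" number two: plain text, e.g. for a very small display
    name = root.findtext("name", default="")
    dose = root.findtext("dose", default="")
    return f"{name.upper()}\nDose: {dose}"


RENDERERS = {"html": to_html, "text": to_text}


def render(xml_text, target):
    """Parse once, then hand the tree to the renderer chosen by `target`."""
    return RENDERERS[target](ET.fromstring(xml_text))


print(render(DOCUMENT, "html"))
print(render(DOCUMENT, "text"))
```

Adding a further output format means adding one more entry to `RENDERERS`; the XML document itself is not touched.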
For example, the above stylesheet
presents the XML data (patient information) on a web page.
The same stylesheet is used to generate (automatically) the English,
German, French,
Italian or any other language version of the web page. The presented stylesheet
is an extremely simple one.
Another stylesheet can be used to transform the Spirig Nosedrops data to a PDF-file.
The stylesheet can be found here.
The transformation to PDF needs a special processing engine, which resides on the server.
Transformation to PDF is especially important because PDF documents cannot easily be manipulated and can be provided with electronic
signatures, and because PDF is the common format for submission to regulatory authorities.
A much more complicated example of transformation to PDF: a CDISC-ODM example (clinical data report).
Stylesheets can also be used to filter information and to add information, images, links, JavaScript, Java applets, and much more. The essence is: the pure data goes in the XML file, all presentation in the XSL file.
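Filtering and adding information during the transformation might look like the following Python sketch. Again, it only mimics what an XSL stylesheet would do; the `audience` attribute, the element names, the sample texts, the example URL and the `render_for` function are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical document in which each section is marked with its audience
DOCUMENT = (
    "<leaflet>"
    '<section audience="patient">Shake before use.</section>'
    '<section audience="doctor">Report adverse events to the manufacturer.</section>'
    '<section audience="patient">Keep away from children.</section>'
    "</leaflet>"
)


def render_for(xml_text, audience, more_info_url):
    """Keep only the sections for one audience and append a link."""
    root = ET.fromstring(xml_text)
    items = [
        f"<li>{escape(section.text or '')}</li>"
        for section in root.findall("section")
        if section.get("audience") == audience  # the filtering step
    ]
    # The link is added by the presentation layer; it is not in the XML data
    link = f'<a href="{escape(more_info_url, quote=True)}">More information</a>'
    return "<ul>" + "".join(items) + "</ul>" + link


print(render_for(DOCUMENT, "patient", "https://example.org/leaflet"))
```

The XML file still holds all the sections; which ones appear, and what is added around them, is decided entirely by the presentation code.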