XML4Pharma

Native XML databases

When working with XML documents, the question always arises how to store them.

Of course, you can always store XML documents as files in a file system, but this makes them very hard to manage and query once their number grows.

Another possibility is to map the structure of the XML documents onto a relational database structure, e.g. with a tool that transforms the XML file (or the XML Schema of the XML file) into a set of SQL statements that create the tables of the database. Such a solution is far from optimal, however, because the database structure has to be adapted whenever the structure of the XML file changes.
For example, one can easily generate a database structure from the MetaDataVersion element of a CDISC ODM XML file, but if the study setup changes, the database structure has to be changed too. Also, the same database structure cannot simply be reused for another study (unless a complicated structure of primary and foreign keys is set up).
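The mapping approach described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the metadata fragment is a simplified, namespace-free imitation of an ODM MetaDataVersion, the study content (VITALS, SYSBP, VISITDATE) is invented, and the SQL_TYPES table and the create_table_statements function are hypothetical names, not part of any CDISC tool.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free ODM-like metadata; the study content is invented.
METADATA = """
<MetaDataVersion OID="MDV.1" Name="Version 1">
  <ItemGroupDef OID="IG.VITALS" Name="VITALS" Repeating="No">
    <ItemRef ItemOID="IT.SYSBP" Mandatory="Yes"/>
    <ItemRef ItemOID="IT.VISITDATE" Mandatory="No"/>
  </ItemGroupDef>
  <ItemDef OID="IT.SYSBP" Name="SYSBP" DataType="integer"/>
  <ItemDef OID="IT.VISITDATE" Name="VISITDATE" DataType="date"/>
</MetaDataVersion>
"""

# Assumed mapping from ODM data types to SQL column types.
SQL_TYPES = {"integer": "INTEGER", "float": "REAL", "date": "DATE", "text": "VARCHAR(200)"}


def create_table_statements(metadata_xml):
    """Generate one CREATE TABLE statement per ItemGroupDef."""
    root = ET.fromstring(metadata_xml)
    item_defs = {item.get("OID"): item for item in root.findall("ItemDef")}
    statements = []
    for group in root.findall("ItemGroupDef"):
        # Every table gets a column identifying the subject.
        columns = ["SubjectKey VARCHAR(50) NOT NULL"]
        for ref in group.findall("ItemRef"):
            item = item_defs[ref.get("ItemOID")]
            sql_type = SQL_TYPES.get(item.get("DataType"), "VARCHAR(200)")
            not_null = " NOT NULL" if ref.get("Mandatory") == "Yes" else ""
            columns.append(f"{item.get('Name')} {sql_type}{not_null}")
        body = ",\n  ".join(columns)
        statements.append(f"CREATE TABLE {group.get('Name')} (\n  {body}\n);")
    return statements


statements = create_table_statements(METADATA)
print(statements[0])
```

The weakness discussed above is visible directly: adding an ItemRef to the study design changes the generated statement, so an existing table would have to be altered or rebuilt.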

Another possibility is to store XML documents as BLOBs (Binary Large Objects) in a classic (relational) database, and to use the management and query tools of that database.
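As a rough sketch of the BLOB approach, the following Python example uses the sqlite3 module from the standard library. The table name odm_documents, its columns and the two tiny documents are assumptions made up for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database with a hypothetical table holding one XML document per row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE odm_documents (id INTEGER PRIMARY KEY, study_oid TEXT, content BLOB)"
)

# Two invented, minimal documents keyed by their study OID.
documents = {
    "123-456-789": '<ODM><Study OID="123-456-789"/></ODM>',
    "987-654-321": '<ODM><Study OID="987-654-321"/></ODM>',
}
for study_oid, xml_text in documents.items():
    conn.execute(
        "INSERT INTO odm_documents (study_oid, content) VALUES (?, ?)",
        (study_oid, xml_text.encode("utf-8")),
    )

# SQL can only filter on the ordinary column; the BLOB itself is opaque to it.
row = conn.execute(
    "SELECT content FROM odm_documents WHERE study_oid = ?", ("123-456-789",)
).fetchone()

# To look inside the document, the application has to parse the BLOB itself.
root = ET.fromstring(row[0])
study = root.find("Study")
print(study.get("OID"))
```

In this sketch, anything that should be searchable by plain SQL (here the study OID) has to be copied into a regular column when the document is stored.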

A newer and more efficient way is to store XML documents in so-called "native XML databases".

What are "Native XML Databases"?

Native XML databases are databases that store XML documents and data in a very efficient way. Like classic (relational) databases, they allow data to be stored, queried, combined, secured, indexed, etc.

Native XML databases are not based on tables, but on so-called containers. Each container can hold a large number of XML documents or pieces of XML data that have some relation to each other. For example, in a clinical study, a container can store all CRFs (in XML format) of a specific study. Containers can also have subcontainers. The big difference with relational databases is that the structure of the XML data in a container does not have to be fixed. For example, you can store the CRFs in the same container as the XML files containing the site information (AdminData in the CDISC ODM). Of course, it may be wiser to keep the CRFs in a different container than the site information. The essential point is that the relation between the XML documents in a container can be a rather loose one.

Native XML databases are not queried with SQL statements, but with XPath expressions. XPath is a worldwide standard, set by the W3C (*), for navigating through XML documents. So when querying a native XML database, the user usually opens a container and then submits an XPath expression against all the XML documents in that container. A very simple XPath expression is:

//Study[@OID="123-456-789"]

meaning: select all (documents with) Study elements whose OID attribute has the value "123-456-789".
The system then retrieves either all documents or all Study elements that match the expression, depending on the choice of the user. So, a set of XML documents or XML elements is returned.
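The behaviour just described can be imitated with the Python standard library. This is a minimal sketch, not how any particular native XML database works: the "container" is simply a list of parsed documents, the query function is a hypothetical helper, the documents are invented, and ElementTree only supports a subset of XPath in which the expression has to be written relative to the document root (".//Study" instead of "//Study").

```python
import xml.etree.ElementTree as ET

# A "container" modelled as a plain list of parsed documents (invented content).
container = [
    ET.fromstring(text)
    for text in (
        '<ODM FileOID="F1"><Study OID="123-456-789"><GlobalVariables/></Study></ODM>',
        '<ODM FileOID="F2"><Study OID="000-000-000"/></ODM>',
        '<ODM FileOID="F3"><Study OID="123-456-789"/></ODM>',
    )
]


def query(container, path, return_documents=False):
    """Run one path expression against every document in the container.

    Returns the matching elements, or the documents that contain at least
    one match when return_documents is True.
    """
    results = []
    for document in container:
        matches = document.findall(path)
        if return_documents:
            if matches:
                results.append(document)
        else:
            results.extend(matches)
    return results


XPATH = ".//Study[@OID='123-456-789']"
elements = query(container, XPATH)
documents = query(container, XPATH, return_documents=True)
print([d.get("FileOID") for d in documents])  # ['F1', 'F3']
```

The return_documents flag corresponds to the choice mentioned above between retrieving whole documents and retrieving only the matching Study elements.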

Another, more complicated XPath expression is:

//ClinicalData/SubjectData[@SubjectKey="001"]//AuditRecord[DateTimeStamp<"2003-09-01T00:00:00"]

meaning: select all (documents with) ClinicalData elements that have a SubjectData element whose SubjectKey attribute has the value "001" AND that contain an AuditRecord (at any sublevel of that SubjectData element) with a DateTimeStamp before September 1st, 2003.
So, if the option to retrieve elements rather than full XML documents is chosen, this XPath expression retrieves all AuditRecords of patient "001" that were created before September 1st, 2003. Note that the "<" comparison of two date-time strings only works this way in XPath 2.0 and XQuery; strict XPath 1.0 converts both operands of "<" to numbers, so there the timestamps first have to be turned into numbers, for example with translate().
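The same selection can be sketched with the Python standard library. ElementTree does not support comparison predicates such as [DateTimeStamp<"..."], so in this illustration the path part is evaluated by ElementTree and the date condition is applied in Python. The sample document is invented and namespace-free, and the plain string comparison assumes that all timestamps use the same ISO 8601 layout without time zone offsets.

```python
import xml.etree.ElementTree as ET

# Invented, simplified ODM-like clinical data without namespaces.
ODM = """
<ODM>
  <ClinicalData StudyOID="123-456-789">
    <SubjectData SubjectKey="001">
      <StudyEventData StudyEventOID="SE.1">
        <AuditRecord><DateTimeStamp>2003-08-15T10:30:00</DateTimeStamp></AuditRecord>
        <AuditRecord><DateTimeStamp>2003-09-02T08:00:00</DateTimeStamp></AuditRecord>
      </StudyEventData>
    </SubjectData>
    <SubjectData SubjectKey="002">
      <AuditRecord><DateTimeStamp>2003-07-01T12:00:00</DateTimeStamp></AuditRecord>
    </SubjectData>
  </ClinicalData>
</ODM>
"""

CUTOFF = "2003-09-01T00:00:00"
PATH = ".//ClinicalData/SubjectData[@SubjectKey='001']//AuditRecord"

root = ET.fromstring(ODM)

early_records = []
for record in root.findall(PATH):
    stamp = record.findtext("DateTimeStamp")
    # Timestamps in one fixed ISO 8601 layout sort correctly as plain strings.
    if stamp is not None and stamp < CUTOFF:
        early_records.append(record)

print([r.findtext("DateTimeStamp") for r in early_records])  # ['2003-08-15T10:30:00']
```

Only the August record of subject "001" is kept: the September record is too late, and the record of subject "002" is excluded by the SubjectKey condition.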

Advantages of native XML databases

Native XML databases are much better suited than relational databases to storing, maintaining and querying large numbers of XML documents.
Unlike relational databases, no tables have to be set up, and no complicated design has to be made before setting up the database.
A classic database table has the disadvantage of being two-dimensional only, so that "deeper" structure has to be implemented using foreign keys, which can make a database design pretty complicated.
The often-heard statement that native XML databases are slower to query than relational databases no longer holds: modern XML databases are as fast as relational databases, and easier to maintain and support (at least for XML data).

Disadvantages of native XML databases

Native XML databases are not so common yet in the pharmaceutical world. There are two main reasons for this:

Services provided by XML4Pharma

XML4Pharma has extensive experience with native XML databases.
We can help you with the selection of a native XML database system and its DBMS, and with the development of software for maintaining and querying an XML database.
As we have a very good knowledge of XPath, we can help you develop the XPath expressions needed for your user queries, or teach your database managers and application developers how to work with XPath expressions and how to implement them in the database application software.


(*) Note:

We use XPath 1.0 expressions here, as these are very easy to demonstrate. During the last 1-2 years, the much more powerful XPath 2.0 and XQuery languages have been developed.

Contact XML4Pharma
XML4Pharma, Katzelbachweg 18, 8052 Thal, Austria