Using SDF (Structured Data File) with DRB API® - PART 1

CAUTION: This tutorial is under construction!!!

Details of the tutorial

Purpose: This tutorial helps beginners work with the DRB API® using SDF (Structured Data File) to describe their own data so that it can be read (the writing facilities offered by SDF will be addressed in a separate tutorial).
Note: The presented samples are ordered by increasing complexity.
Target audience: Beginner
Required knowledge: Before you continue you should have a basic understanding of the following:
  • XML
  • XML Schema
  • XPath (basics)
If you want to study these subjects first, find the corresponding tutorials on the dedicated list.
What you will learn in this tutorial: You will learn the basics of SDF usage, in particular how to use SDF markups in your XML Schema files and how to declare your schema in an XPath so that it can be used by the DRB API®.

Prerequisites

You may find it useful to configure your environment with an alias, so that the commands in this tutorial are shorter.
The following definition is for Bash shell:

alias drb='java -jar [DRB_INSTALLATION_HOME]/lib/java/drb-[M]-[m]-[t].jar'

SDF Overview

The purpose of SDF is to describe your data (ASCII and/or binary) using dedicated markups directly inside your XML Schema, so that it can be parsed as a structured hierarchy of nodes.
Note: For more detail, see the "Concepts" chapter of the DRB API® Handbook.

At any time you can obtain help about SDF markups using the dedicated command:

drb --help sdf [name]

Where name is an optional parameter specifying an SDF markup name.

Important notice: This tutorial is only an introduction to SDF and does not attempt to describe all of its possibilities. Instead, it introduces a few simple use cases.

SDF Usage overview

This section presents SDF usage step by step, so that you can understand the overall process:

  1. Create your schema using appropriate SDF markups
    1. Convert your data specification to XML Schema file (using SDF markups)
    2. Validate the edited XML Schema file
  2. Use the dedicated syntax in XQuery path
The following sections detail each of these steps for different use cases.

CASE 00: XML Data

If the input is XML data, you do not need SDF markups to describe it, since the DRB API® can already read it.

CASE 01: ASCII data

Consider the file asciifile.txt:

NAME=John Doe
GENDER=M
AGE=32
WEIGHT=75.6

Following the instructions provided by a specification document (which does not really exist in this case study), we wrote the XML Schema document asciifile.xsd:

<?xml version="1.0" encoding="UTF-8"?>
 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sdf="http://www.gael.fr/2005/04/drb/sdf"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
 
   <!-- Definition of the root node which contains all the content -->
   <xs:element name="Human_List">
      <xs:complexType>
         <xs:annotation>
            <xs:appinfo>
               <sdf:block>
                  <sdf:encoding>ASCII</sdf:encoding>
               </sdf:block>
            </xs:appinfo>
         </xs:annotation>
         
         <xs:sequence>
 
            <xs:element name="Human" type="HumanType"
                        maxOccurs="unbounded">
               <xs:annotation>
                  <xs:documentation xml:lang="en">
                  A small set of information related to a human person.
                  </xs:documentation>
               </xs:annotation>
            </xs:element>

         </xs:sequence>
      </xs:complexType>
   </xs:element>


   <xs:complexType name="HumanType">
      <xs:sequence>
 
         <xs:element name="Name" type="NameType">
            <xs:annotation>
               <xs:documentation xml:lang="en">
               The name of the Human person.
               </xs:documentation>
            </xs:annotation>
         </xs:element>
         
         <xs:element name="Gender" type="GenderType">
            <xs:annotation>
               <xs:documentation xml:lang="en">
               The gender of the Human person.
               </xs:documentation>
            </xs:annotation>
         </xs:element>
         
         <xs:element name="Age" type="AgeType">
            <xs:annotation>
               <xs:documentation xml:lang="en">
               The age of the Human person.
               </xs:documentation>
            </xs:annotation>
         </xs:element>
         
         <xs:element name="Weight" type="WeightType">
            <xs:annotation>
               <xs:documentation xml:lang="en">
               The weight of the Human person.
               </xs:documentation>
            </xs:annotation>
         </xs:element>

      </xs:sequence>
   </xs:complexType>

   <xs:simpleType name="NameType">
      <xs:annotation>
         <xs:documentation xml:lang="en">
         In this type definition note that the sdf:encoding is already
         inherited from parents (see 'Human_List') therefore in this case
         it is optional.
         </xs:documentation>

         <xs:appinfo>
            <sdf:block>
               <sdf:encoding>ASCII</sdf:encoding>
               <sdf:padding type="header"
                            content="NAME=">5</sdf:padding>
               <sdf:delimiter>&#10;</sdf:delimiter>
            </sdf:block>
         </xs:appinfo>
      </xs:annotation>
      
      <xs:restriction base="xs:string"/>
   </xs:simpleType>
   
   <xs:simpleType name="GenderType">
      <xs:annotation>
         <xs:documentation xml:lang="en">
         </xs:documentation>

         <xs:appinfo>
            <sdf:block>
               <sdf:encoding>ASCII</sdf:encoding>
               <sdf:padding type="header"
                            content="GENDER=">7</sdf:padding>
               <sdf:delimiter>&#10;</sdf:delimiter>
            </sdf:block>
         </xs:appinfo>
      </xs:annotation>
      
      <xs:restriction base="xs:string">
         <xs:length value="1"/>
         <xs:enumeration value="M"/>
         <xs:enumeration value="F"/>
      </xs:restriction>
   </xs:simpleType>
   
   <xs:simpleType name="AgeType">
      <xs:annotation>
         <xs:documentation xml:lang="en">
         </xs:documentation>

         <xs:appinfo>
            <sdf:block>
               <sdf:encoding>ASCII</sdf:encoding>
               <sdf:padding type="header"
                            content="AGE=">4</sdf:padding>
               <sdf:delimiter>&#10;</sdf:delimiter>
            </sdf:block>
         </xs:appinfo>
      </xs:annotation>
      
      <xs:restriction base="xs:int"/>
   </xs:simpleType>
   
   <xs:simpleType name="WeightType">
      <xs:annotation>
         <xs:documentation xml:lang="en">
         </xs:documentation>

         <xs:appinfo>
            <sdf:block>
               <sdf:encoding>ASCII</sdf:encoding>
               <sdf:padding type="header"
                            content="WEIGHT=">7</sdf:padding>
               <sdf:delimiter>&#10;</sdf:delimiter>
            </sdf:block>
         </xs:appinfo>
      </xs:annotation>
      
      <xs:restriction base="xs:float"/>
   </xs:simpleType>

</xs:schema>

The following command dumps the content of the file:

drb -s "asciifile.txt/(asciifile.xsd)*"

The output of the command may be:

<Human_List>
   <Human>
      <Name>John Doe</Name>
      <Gender>M</Gender>
      <Age>32</Age>
      <Weight>75.6</Weight>
   </Human>
</Human_List>

Explanations:

  • SDF markups are always placed inside an sdf:block node, as follows:
    <xs:annotation>
       <xs:appinfo>
          <sdf:block>
             <!-- PUT HERE THE SDF MARKUPS -->
          </sdf:block>
       </xs:appinfo>
    </xs:annotation>
  • The ASCII encoding is specified by the dedicated markup:
    <sdf:encoding>ASCII</sdf:encoding>
    Note that this markup is inherited, so you only have to specify it once, in the top-level node.
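
The field types in asciifile.xsd all follow the same pattern, which suggests how one more line of the file could be described. The fragment below is only a sketch under assumptions: the line HEIGHT=1.82 and the names Height and HeightType are hypothetical and not part of the sample. It reuses only the markups already shown above. In the sample, the value of sdf:padding matches the number of characters of its content attribute (HEIGHT= is 7 characters), and sdf:delimiter holds the line feed that ends the value; sdf:encoding is left out here because the schema documentation states that it is inherited from Human_List.

```xml
<!-- Hypothetical type for an extra line such as "HEIGHT=1.82".
     It would be referenced from HumanType with:
     <xs:element name="Height" type="HeightType"/> -->
<xs:simpleType name="HeightType">
   <xs:annotation>
      <xs:appinfo>
         <sdf:block>
            <!-- Header of 7 characters whose content is "HEIGHT=" -->
            <sdf:padding type="header"
                         content="HEIGHT=">7</sdf:padding>
            <!-- The value ends at the line feed character -->
            <sdf:delimiter>&#10;</sdf:delimiter>
         </sdf:block>
      </xs:appinfo>
   </xs:annotation>

   <!-- Same base type as WeightType, since the value is a decimal number -->
   <xs:restriction base="xs:float"/>
</xs:simpleType>
```

For the exact meaning of each markup, check the output of drb --help sdf [name] rather than relying on this sketch.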

CASE 02: Binary Data

CASE 03: ASCII and Binary Data

This section exists only to point out that SDF can handle a data file containing both ASCII and binary parts. The XML Schema describing such a file is simply a mix of the two previous cases.