Full list of in-depth articles -
Feedback and questions -
Request demo -
Extracting metadata from Office 2003 XSDs with SoProMach
1. Mining XSD metadata with SoProMach
2. XML schemas: Visual browsing and textual browsing
Two previous articles described how SoProMach can extract
metadata directly from relational database management systems, such as
Oracle (first article) and
SQL Server (second article). The present article describes
how SoProMach can also extract metadata from XML Schema Definition
(XSD) files.
The extracted metadata can then be used in different ways. The
example featured in this article produces a textual summary of the
metadata, suitable for textual browsing. The sample
XML schemas used in this article are the two largest XSD files
made publicly available by Microsoft in its
"Office 2003: XML Reference Schemas": the Microsoft
Excel and Microsoft Word XML schemas.
Extracting metadata from an XML schema is very similar to extracting
data from an XML file, since an XSD file is itself an XML document.
With SoProMach, the core of both tasks can be performed
by means of the Tefigel built-in function tag_file_process
(see description).
In short, the Tefigel processor calls a set of user-provided Tefigel
macros for each XML node in the input. The user can define both generic
(node-independent) and node-specific macros. The Tefigel processor
transparently passes the information read from the XML file
to the user-defined macros, by means of the Tefigel variables defined
in the aforementioned description.
The Tefigel code used for this article is listed at the end of the article.
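For readers unfamiliar with Tefigel, the callback pattern described above can be sketched in Python. This is only an illustration of the general idea, not the Tefigel code used for this article, and it does not reproduce the interface of tag_file_process. The names process_nodes, collect_metadata and SAMPLE_XSD are hypothetical, the sample schema is invented, and the rule that the generic handler runs before any node-specific handler is an assumption made for the sketch.

```python
import io
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to XML Schema tags.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented sample schema, used only to exercise the sketch.
SAMPLE_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="colorType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="red"/>
      <xs:enumeration value="blue"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="cell" type="colorType"/>
</xs:schema>
"""


def process_nodes(source, node_handlers, generic_handler=None):
    """Walk the XML input and call user-provided handlers for each node.

    Assumption of this sketch: the generic handler is called for every
    node, then the node-specific handler (if one is registered for the tag).
    """
    for _event, node in ET.iterparse(source, events=("start",)):
        if generic_handler is not None:
            generic_handler(node)
        specific = node_handlers.get(node.tag)
        if specific is not None:
            specific(node)


def collect_metadata(source):
    """Collect named schema components and a per-tag node count."""
    named = []   # (kind, name) pairs, in document order
    counts = {}  # local tag name -> number of occurrences

    def on_any_node(node):
        local_name = node.tag.split("}")[-1]
        counts[local_name] = counts.get(local_name, 0) + 1

    def on_simple_type(node):
        named.append(("simpleType", node.get("name")))

    def on_element(node):
        named.append(("element", node.get("name")))

    handlers = {XS + "simpleType": on_simple_type, XS + "element": on_element}
    process_nodes(source, handlers, generic_handler=on_any_node)
    return {"named": named, "counts": counts}


if __name__ == "__main__":
    print(collect_metadata(io.BytesIO(SAMPLE_XSD)))
```

The handlers only read attributes, which are already available when a node's start tag is reported, so the sketch never needs to hold the whole schema in memory.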
Depending on the action to be performed, a model architect or
a developer may from time to time prefer to view the XSD
information in visual mode or in textual mode, as shown in the following
diagram.
Figure 1 - Visual browsing and textual browsing
For example, navigating through the XML schema hierarchy is best accomplished
using a graphical tool. In contrast, a model architect interested
in the detailed description of Microsoft Word's "verticalAlignRunType"
may find it more convenient to read that description from a text file.
The screenshots
below show the same XSD files (excel.xsd and wordnet.xsd) in both modes.
Visual Studio 2005 Beta 1 can load and graphically display the
XSD files, whereas Notepad is sufficient for viewing the
corresponding text files generated with SoProMach.
Figure 2 - Graphical visualization of "wordnet.xsd"
Figure 3 - Textual visualization of "wordnet.xsd"
Figure 4 - Graphical visualization of "excel.xsd"
Figure 5 - Textual visualization of "excel.xsd"
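To give a rough idea of what a textual description of a schema type can look like, the following Python sketch prints a plain-text summary of the simple types declared at the top level of an XSD. It is not the SoProMach output format, which is shown only in the screenshots. The function summarize_simple_types, the layout of the summary and the type "sampleAlignType" are hypothetical and were chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix map used in the ElementTree path expressions below.
XS_NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Invented sample schema; "sampleAlignType" is not taken from the Office schemas.
SAMPLE_TYPES_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="sampleAlignType">
    <xs:annotation>
      <xs:documentation>Hypothetical type used only for this sketch.</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
      <xs:enumeration value="top"/>
      <xs:enumeration value="bottom"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""


def summarize_simple_types(xsd_text):
    """Return a plain-text summary of the top-level simple types in a schema."""
    root = ET.fromstring(xsd_text)
    lines = []
    for simple_type in root.findall("xs:simpleType", XS_NS):
        restriction = simple_type.find("xs:restriction", XS_NS)
        base = restriction.get("base") if restriction is not None else "?"
        lines.append("simpleType {} (base: {})".format(simple_type.get("name"), base))

        # Optional human-readable documentation, with whitespace normalized.
        doc = simple_type.find("xs:annotation/xs:documentation", XS_NS)
        if doc is not None and doc.text:
            lines.append("  doc: " + " ".join(doc.text.split()))

        # Enumerated values allowed by the restriction, if any.
        if restriction is not None:
            for enum in restriction.findall("xs:enumeration", XS_NS):
                lines.append("  value: " + enum.get("value"))
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize_simple_types(SAMPLE_TYPES_XSD))
```

A summary of this kind can be searched with any text editor, which is the advantage of the textual mode when the name of the type of interest is already known.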
The Excel and Word XSD files from
"Office 2003: XML Reference Schemas" used as input are listed below.
The following files contain the corresponding textual descriptions produced
by SoProMach.
The XSD files used in the example are parsed with
the five Tefigel scripts listed below; overall, the code amounts
to about 100 lines of Tefigel.
Written on 2 February 2005