

Extracting metadata from Office 2003 XSDs with SoProMach

  1  Mining XSD metadata with SoProMach
  2  XML schemas: Visual browsing and textual browsing
  Appendix A  XSD files and textual descriptions
  Appendix B  Tefigel scripts

Two previous articles described how SoProMach can extract metadata directly from relational database management systems, such as Oracle (first article) and SQL Server (second article). The present article describes how SoProMach can also extract metadata from XML Schema Definition (XSD) files.

The extracted metadata can then be used in different ways: the example featured in this article produces a textual summary of the metadata, suitable for textual browsing. The sample XML schemas used in this article are the two largest XSD files that Microsoft has made publicly available in its "Office 2003: XML Reference Schemas": the Microsoft Excel and Microsoft Word XML schemas.

1 - Mining XSD metadata with SoProMach

Extracting metadata from an XML schema is very similar to extracting data from an XML file: in fact, with SoProMach the core of both tasks can be performed by the Tefigel built-in function tag_file_process (see its description).

In short, the Tefigel processor calls a set of user-provided Tefigel macros for each XML node in the input. The user can define both generic (node-independent) and node-specific macros. The Tefigel processor transparently feeds the user-defined macros with the information read from the XML file, by means of the Tefigel variables defined in the aforementioned description.
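To make this callback model more concrete, the following minimal Python sketch applies the same idea with the standard library's xml.etree.ElementTree.iterparse. It is not Tefigel and not SoProMach's implementation: the function process_nodes, the handler signature (tag name, attributes, depth) and the split into "enter"/"exit" handlers are assumptions chosen for illustration. Generic handlers run for every element, while a node-specific handler runs only for the tag it is registered under.

```python
import io
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

# Hypothetical handler signature: (local tag name, attributes, nesting depth).
# It loosely stands in for the information Tefigel exposes through its
# variables; it is NOT Tefigel's actual interface.
Handler = Callable[[str, Dict[str, str], int], None]


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, e.g. '{http://...}element' -> 'element'."""
    return tag.rsplit("}", 1)[-1]


def process_nodes(source, on_enter: Handler, on_exit: Handler,
                  specific: Optional[Dict[str, Handler]] = None) -> None:
    """Visit every element once: call the generic handlers for all nodes and,
    if one is registered for the tag, the node-specific handler as well."""
    specific = specific or {}
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            attrs = dict(elem.attrib)  # attributes are available at 'start'
            on_enter(name, attrs, depth)
            handler = specific.get(name)
            if handler is not None:
                handler(name, attrs, depth)
            depth += 1
        else:
            depth -= 1
            on_exit(name, dict(elem.attrib), depth)


# Small invented schema used only to exercise the sketch.
SAMPLE_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="sampleType">
    <xsd:restriction base="xsd:string"/>
  </xsd:simpleType>
  <xsd:element name="root" type="sampleType"/>
</xsd:schema>"""

log: List[str] = []         # filled by the generic handlers
type_names: List[str] = []  # filled by the node-specific handler

process_nodes(
    io.BytesIO(SAMPLE_XSD.encode("utf-8")),
    on_enter=lambda n, a, d: log.append("  " * d + "<" + n),
    on_exit=lambda n, a, d: log.append("  " * d + n + ">"),
    specific={"simpleType": lambda n, a, d: type_names.append(a["name"])},
)
print("\n".join(log))
print(type_names)
```

Keeping the generic and node-specific handlers separate mirrors the split described above: the generic handlers see the whole tree (here, an indented trace), while the node-specific handler collects only what it cares about (here, the names of simple types).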

The Tefigel code used for this article is listed in Appendix B.

2 - XML schemas: Visual browsing and textual browsing

Depending on the action to be performed, a model architect or a developer may from time to time prefer to view the XSD information in visual mode or in textual mode, as shown in the following diagram.

Figure 1 - Visual browsing and textual browsing


For example, navigating through the XML schema hierarchy is best accomplished with a graphical tool. By contrast, a model architect interested in the detailed description of Microsoft Word's "verticalAlignRunType" may find it more convenient to read that description from a text file.

The screenshots below show the same XSD files (excel.xsd and wordnet.xsd) in both modes. Visual Studio 2005 Beta 1 can load and graphically display the XSD files, whereas Notepad is sufficient to view the corresponding text files generated with SoProMach.
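As an illustration of what such a textual summary might contain, the following Python sketch lists each top-level named type together with its description and enumeration values. It assumes that the descriptions are stored in standard xsd:annotation/xsd:documentation elements; the function summarize_types, the output layout and the sample type sampleAlignType are hypothetical and do not reproduce SoProMach's actual output format or the contents of the Office schemas.

```python
import xml.etree.ElementTree as ET
from typing import List

# Clark-notation prefix for the XML Schema namespace.
XS = "{http://www.w3.org/2001/XMLSchema}"


def summarize_types(xsd_text: str) -> List[str]:
    """Return text lines for every top-level named simpleType/complexType:
    a header line, its documentation (whitespace-normalized) and, if present,
    its enumeration values."""
    root = ET.fromstring(xsd_text)
    lines: List[str] = []
    for node in root:
        kind = node.tag[len(XS):] if node.tag.startswith(XS) else node.tag
        if kind not in ("simpleType", "complexType") or "name" not in node.attrib:
            continue  # skip elements, attributes, unnamed types, etc.
        lines.append(f"{kind} {node.attrib['name']}")
        # Assumed location of the human-readable description.
        for doc in node.findall(f"{XS}annotation/{XS}documentation"):
            text = " ".join((doc.text or "").split())
            if text:
                lines.append(f"  doc: {text}")
        # Enumeration facets anywhere below the type (e.g. in a restriction).
        values = [e.attrib["value"] for e in node.iter(f"{XS}enumeration")]
        if values:
            lines.append("  values: " + ", ".join(values))
    return lines


# Invented sample schema, not taken from the Office 2003 XSDs.
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="sampleAlignType">
    <xsd:annotation>
      <xsd:documentation>Invented example:
        vertical alignment of a run.</xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="top"/>
      <xsd:enumeration value="bottom"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="root" type="sampleAlignType"/>
</xsd:schema>"""

print("\n".join(summarize_types(SAMPLE)))
```

Run on the sample, this prints three lines: `simpleType sampleAlignType`, `  doc: Invented example: vertical alignment of a run.` and `  values: top, bottom`. The element declaration is skipped because the sketch only summarizes named types.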

Figure 2 - Graphical visualization of "wordnet.xsd"


Figure 3 - Textual visualization of "wordnet.xsd"


Figure 4 - Graphical visualization of "excel.xsd"


Figure 5 - Textual visualization of "excel.xsd"


Appendix A - XSD files and textual descriptions

The Excel and Word XSD files from "Office 2003: XML Reference Schemas" used as input are listed below.

excel.xsd
wordnet.xsd

The following files contain the corresponding textual descriptions produced by SoProMach.

excel.txt
wordnet.txt

Appendix B - Tefigel scripts

The XSD files used in the example are parsed with the five Tefigel scripts listed below; overall, the code amounts to about 100 lines of Tefigel.

xmacro/tag_node.in.tfg
xmacro/tag_node.out.tfg
xmacro/tag_node.tval.tfg
xmacro/tag_tree.in.tfg
xsd_parse.tfg


Written on 2 February 2005

Copyright © 2003-2012 Somusar