A Composite/Visitor Example

XML documents often represent tree structures. In object-oriented programming, two well-known patterns for manipulating tree structures are composite and visitor. Both come from Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (Addison-Wesley, 1994; ISBN 0-201-63361-2). These patterns are useful for manipulating large, complex tree-structured documents. The composite and visitor options available in Relaxer are derived from the patterns described in the book, but are not exactly the same.

The visitor pattern in Relaxer is actually an extension of the pattern in the book. Where the book's visitor has a single method for visiting a node, the Relaxer version has two: enter() and leave(). Through these methods, Relaxer code can traverse a tree structure and signal upon entering or leaving a node. For the purposes of Relaxer, this pattern is called the tree-visitor pattern.
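The enter()/leave() idea can be sketched without Relaxer at all. The following is a minimal illustration, not Relaxer's generated code: the TreeVisitor, Node, and TreeVisitorSketch names are hypothetical stand-ins, and calling leave() even when enter() returns false is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; these are not Relaxer's generated classes.
interface TreeVisitor {
    // Called before a node's children; return false to skip the children.
    boolean enter(Node node);
    // Called after a node's children have been visited or skipped.
    void leave(Node node);
}

class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) { this.name = name; }

    // Appends a child and returns this node so calls can be chained.
    Node add(Node child) {
        children.add(child);
        return this;
    }
}

class TreeVisitorSketch {
    // Depth-first traversal that signals on both entry and exit.
    static void traverse(Node node, TreeVisitor visitor) {
        if (visitor.enter(node)) {
            for (Node child : node.children) {
                traverse(child, visitor);
            }
        }
        visitor.leave(node);
    }

    // Builds a small tree and records the order of enter/leave calls.
    static String trace() {
        Node root = new Node("record")
            .add(new Node("datum").add(new Node("subdatum")))
            .add(new Node("datum"));
        final StringBuilder out = new StringBuilder();
        traverse(root, new TreeVisitor() {
            public boolean enter(Node n) {
                out.append("<").append(n.name).append(">");
                return true;
            }
            public void leave(Node n) {
                out.append("</").append(n.name).append(">");
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <record><datum><subdatum></subdatum></datum><datum></datum></record>
        System.out.println(trace());
    }
}
```

Because every node produces a matched enter/leave pair, a visitor of this kind can emit opening and closing output around a subtree, which is what makes it convenient for document transformations.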

Relaxer can generate composite and visitor patterned code to manipulate classes that are also generated by Relaxer. Such mechanisms are convenient for document-oriented applications because models for such applications are usually represented with tree structures.

Relaxer's visitor option is useful for transformations such as from XML to LaTeX, from DocBook to XHTML, and so on. The composite option is useful for manipulating large, complex objects that have a uniform tree structure, especially because of its facility for managing parent node relations. Many Relaxer options that use visitor also use composite internally to manipulate tree structures.
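To make the point about parent node relations concrete, here is a small composite sketch. It assumes nothing about Relaxer's generated API: CompositeNode and CompositeSketch are hypothetical names, and the path() and size() helpers exist only to show what a uniform tree with parent links makes easy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical composite node; not one of Relaxer's generated classes.
class CompositeNode {
    private final String name;
    private CompositeNode parent;   // null for the root
    private final List<CompositeNode> children = new ArrayList<>();

    CompositeNode(String name) { this.name = name; }

    // Adding a child also records the parent link, so the tree can be walked upward.
    CompositeNode add(CompositeNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    CompositeNode getParent() { return parent; }

    List<CompositeNode> getChildren() { return children; }

    // Path from the root to this node, built by following parent links.
    String path() {
        return (parent == null ? "" : parent.path()) + "/" + name;
    }

    // Number of nodes in the subtree rooted here; leaves and branches are treated alike.
    int size() {
        int n = 1;
        for (CompositeNode c : children) {
            n += c.size();
        }
        return n;
    }
}

class CompositeSketch {
    public static void main(String[] args) {
        CompositeNode sub = new CompositeNode("subdatum");
        CompositeNode root = new CompositeNode("record")
            .add(new CompositeNode("datum").add(sub));
        System.out.println(sub.path());   // prints /record/datum/subdatum
        System.out.println(root.size());  // prints 3
    }
}
```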

Extracting Data with Relaxer

Imagine that you want to extract data from an XML document. You could use a general purpose solution such as XSLT, Perl or even sed. By using Relaxer, however, you could devise a very specific solution in Java, where you have control over the generation and maintenance of the source code. This example will show you how to do this in simple terms.

First, have a look at the following RELAX NG schema in the patterns directory of the example archive:

record.rng
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="record"/>
  </start>
  <define name="record">
    <element name="record">
      <optional>
        <element name="description">
          <text/>
        </element>
      </optional>
      <oneOrMore>
        <ref name="datum"/>
      </oneOrMore>
    </element>
  </define>
  <define name="datum">
    <element name="datum">
      <mixed>
        <zeroOrMore>
          <choice>
            <ref name="subdatum"/>
          </choice>
        </zeroOrMore>
      </mixed>
    </element>
  </define>
  <define name="subdatum">
    <element name="subdatum">
      <text/>
    </element>
  </define>
</grammar>

There is a similar RELAX Core schema as well:

record.rxm
<?xml version="1.0"?>
<!DOCTYPE module SYSTEM "relaxCore.dtd">
<module relaxCoreVersion="1.0">

  <interface>
    <export label="record"/>
  </interface>

  <tag name="record"/>

  <elementRule label="record" role="record">
    <sequence>
      <element name="description" type="string" occurs="?"/>
      <ref label="datum" occurs="+"/>
    </sequence>
  </elementRule>

  <tag name="datum"/>

  <elementRule label="datum" role="datum">
    <mixed>
     <ref label="subdatum"/>
    </mixed>
  </elementRule>

  <tag name="subdatum"/>

  <elementRule label="subdatum" role="subdatum" type="string"/>

</module>

The following document is an instance of both these schemas:

record.xml
<?xml version="1.0" encoding="UTF-8"?>

<record>
 <description>new part numbers</description>
 <datum>000-256-0940</datum>
 <datum>000-256-0941<subdatum>KLU</subdatum></datum>
 <datum>000-256-0942<subdatum>KLV</subdatum></datum>
 <datum>000-256-0943<subdatum>KLW</subdatum></datum>
</record>

A valid instance of these schemas, such as record.xml, has a root element of <record>, an optional <description> element, followed by one or more <datum> elements. Each <datum> element may or may not contain a <subdatum> element in its mixed content.
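The shape just described can be checked informally with the JDK's own DOM parser (JAXP). This is only a sketch: it counts elements and does not validate against record.rng or record.rxm, and the RecordShapeSketch class and its summary format are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RecordShapeSketch {
    // Returns a one-line summary of the record's shape.
    static String summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            int descriptions = root.getElementsByTagName("description").getLength();
            NodeList data = root.getElementsByTagName("datum");
            int withSub = 0;
            for (int i = 0; i < data.getLength(); i++) {
                Element datum = (Element) data.item(i);
                if (datum.getElementsByTagName("subdatum").getLength() > 0) {
                    withSub++;
                }
            }
            return root.getTagName() + ": " + descriptions + " description, "
                + data.getLength() + " datum, " + withSub + " with subdatum";
        } catch (Exception e) {
            // Parsing problems are rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<record>"
            + "<description>new part numbers</description>"
            + "<datum>000-256-0940</datum>"
            + "<datum>000-256-0941<subdatum>KLU</subdatum></datum>"
            + "</record>";
        // prints record: 1 description, 2 datum, 1 with subdatum
        System.out.println(summarize(xml));
    }
}
```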

With the following command line, Relaxer processes record.rng with the composite and visitor patterns:

C:\Relaxer\patterns>relaxer -verbose -java -useJAXP -composite -visitor record.rng

In a matter of seconds, Relaxer generates over 10,000 lines of Java source code, spread across a number of files.

As part of the process, Relaxer creates a number of interfaces (according to convention, these Java file names begin with the letter I). IDatumMixed.java, for example, is used by Datum.java and implemented by IDatumMixedChoice.java, RString.java, and Subdatum.java. In addition, a class is created for each element defined in the schema, that is, Record.java, Datum.java, and Subdatum.java. (The <description> element is handled in Record.java.) Utility classes, such as RString.java, RStack.java, and RVisitorBase.java, are also generated, as well as the basic, underlying classes UJAXP.java, URelaxer.java, and URVisitor.java.

To examine the code easily, run Javadoc on all the Java files in the patterns directory:

C:\Relaxer\patterns>javadoc -d doc *.java

Then open doc/index.html in a browser to examine the documentation for these classes.

With this code in place, you can write your own applications to manipulate instances of record.rng and extract some desired output from them. Here is an example of a small application that does just that. It is called RecordApp.java and is found in the patterns directory.

RecordApp.java
import java.io.File;

/**
 * A small application for processing documents.
 * @author Mike Fitzgerald
 *
 */
public class RecordApp {

    public static void main(String[] args) throws Exception {

        // Instantiate a Record with an input file
        Record r = new Record(new File(args[0]));

        // Instantiate a DatumProcessor
        DatumProcessor dp = new DatumProcessor();

        // Traverse the input file with the processor
        URVisitor.traverse(r, dp);

        // Print the result to standard output
        System.out.println(dp.getText());

    }

}

RecordApp.java instantiates the Record class, passing an XML document to its constructor. It also instantiates the DatumProcessor class, which extracts the desired data from the document loaded by the Record constructor. It then calls the traverse() method of the URVisitor class to walk the input document r with the processor dp. Finally, it prints the result to standard output.

The file DatumProcessor.java follows:

DatumProcessor.java
import java.lang.StringBuffer;

/**
 * A small helper application for processing documents.
 * @author Mike Fitzgerald
 *
 */
public class DatumProcessor extends RVisitorBase {

    // Instantiate a string buffer for collecting text
    private StringBuffer buffer = new StringBuffer();

    /**
     * Behavior upon entering a <code>&lt;datum&gt;</code> element.
     * @param datum the element to be ranged over
     * @return <code>true</code> to traverse child nodes; <code>false</code> to
     *         skip traversing child nodes
     *
     */
    public boolean enter(Datum datum) {
        String d = datum.getContentAsString();
        buffer.append("\n");
        buffer.append("Datum: ");
        buffer.append(d);
        return (true);
    }

    /**
     * Behavior upon entering a <code>&lt;subdatum&gt;</code> element.
     * @param subdatum the element to be ranged over
     * @return <code>true</code> to traverse child nodes; <code>false</code> to
     *         skip traversing child nodes
     *
     */
    public boolean enter(Subdatum subdatum) {
        String sd = subdatum.getContent();
        buffer.append(" Subdatum: ");
        buffer.append(sd);
        return (true);
    }

    /**
     * Exit behavior when leaving a <code>&lt;datum&gt;</code> element.
     * @param datum the element to be ranged over
     *
     */
    public void leave(Datum datum) {
        buffer.append("\n\n");
    }

    /**
     * Gets text as a String from StringBuffer buffer.
     * @return the StringBuffer buffer as a String
     *
     */
    public String getText() {
        return buffer.toString();
    }

}

This class extends the RVisitorBase class, providing a few methods specifically for processing datum and subdatum nodes. Text is collected in a StringBuffer called buffer. The enter() methods define behavior upon entering the nodes, and the leave() method defines an action for when exiting a datum node. The getText() method is used in the RecordApp class to output the text in buffer. This code does not process a description node, even if one exists in a <record> document.

Finally, compile the program RecordApp.java. When you issue this command, javac also compiles the other Java files in the patterns directory that RecordApp depends on:

C:\Relaxer\patterns>javac RecordApp.java

After you compile the program successfully, run it with record.xml as an argument:

C:\Relaxer\patterns>java RecordApp record.xml

You should see output that looks like this:

RecordApp output
Datum: 000-256-0940


Datum: 000-256-0941 Subdatum: KLU


Datum: 000-256-0942 Subdatum: KLV


Datum: 000-256-0943 Subdatum: KLW

That concludes the visitor/composite portion of the tutorial. See the sample/visitor, sample/namespace, and sample/mathml directories for additional composite/visitor examples.