XML documents often represent tree structures. In object-oriented programming, two well-known patterns for manipulating tree structures are composite and visitor. Both come from Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (Addison-Wesley, 1994; ISBN 0-201-63361-2). These patterns are useful for manipulating large, complex tree-structured documents. The composite and visitor options available in Relaxer are derived from the patterns described in the book, but are not identical to them.
The visitor pattern in Relaxer is actually an extension of the pattern in the book. The Relaxer pattern has two methods for visiting a node rather than only one: enter() and leave(). Through these methods, Relaxer code can traverse a tree structure or signal upon entering or leaving a node. For the purposes of Relaxer, this pattern is called the tree-visitor pattern.
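The idea of paired enter() and leave() callbacks can be sketched in a few lines of plain Java. This is only an illustration and does not use Relaxer's generated classes: the names TreeVisitorSketch, Node, Visitor, traverse(), and render() are hypothetical, and calling leave() even when enter() returns false is an assumption made for this sketch, not documented Relaxer behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeVisitorSketch {

    /** Hypothetical visitor with paired callbacks (not Relaxer's IRVisitor). */
    interface Visitor {
        boolean enter(Node node); // return false to skip the node's children
        void leave(Node node);
    }

    /** Hypothetical tree node holding a name and child nodes. */
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Depth-first walk: enter the node, visit its children, then leave it. */
    static void traverse(Node node, Visitor visitor) {
        if (visitor.enter(node)) {
            for (Node child : node.children) {
                traverse(child, visitor);
            }
        }
        visitor.leave(node); // assumption: leave() fires even if children were skipped
    }

    /** Uses both callbacks to write an opening and a closing tag per node. */
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        traverse(root, new Visitor() {
            public boolean enter(Node n) {
                out.append('<').append(n.name).append('>');
                return true;
            }

            public void leave(Node n) {
                out.append("</").append(n.name).append('>');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        Node record = new Node("record")
                .add(new Node("description"))
                .add(new Node("datum").add(new Node("subdatum")));
        System.out.println(render(record));
    }
}
```

A single visit() method could not emit the closing tags in the right place; the second callback is what lets a visitor act after a node's children have been processed.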
Relaxer can generate code that follows the composite and visitor patterns to manipulate classes that are also generated by Relaxer. Such mechanisms are convenient for document-oriented applications because the models for such applications are usually represented as tree structures.
Relaxer's visitor option is useful for transformations such as from XML to LaTeX, from DocBook to XHTML, and so on. The composite option is useful for manipulating large, complex objects that have a uniform tree structure, especially because of its facility for managing parent node relations. Many Relaxer options that use visitor also use composite internally to manipulate tree structures.
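To make the remark about parent node relations concrete, here is a minimal composite sketch in plain Java. It is not Relaxer's generated IRNode code: CompositeSketch, Node, add(), getParent(), childCount(), and path() are hypothetical names chosen for this illustration. The point is only that each node records its parent when it is attached, so code can walk upward as well as downward.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositeSketch {

    /** Hypothetical uniform node type: every node can hold children and knows its parent. */
    static class Node {
        private final String name;
        private Node parent; // null for the root
        private final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        /** Attaches a child and keeps the parent link consistent. */
        Node add(Node child) {
            if (child.parent != null) {
                child.parent.children.remove(child); // detach from the old parent first
            }
            child.parent = this;
            children.add(child);
            return this;
        }

        Node getParent() {
            return parent;
        }

        int childCount() {
            return children.size();
        }

        /** Builds a path such as /record/datum by following the parent links. */
        String path() {
            return (parent == null ? "" : parent.path()) + "/" + name;
        }
    }

    public static void main(String[] args) {
        Node record = new Node("record");
        Node datum = new Node("datum");
        Node subdatum = new Node("subdatum");
        record.add(datum);
        datum.add(subdatum);
        System.out.println(subdatum.path()); // prints /record/datum/subdatum
    }
}
```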
Imagine that you want to extract data from an XML document. You could use a general purpose solution such as XSLT, Perl or even sed. By using Relaxer, however, you could devise a very specific solution in Java, where you have control over the generation and maintenance of the source code. This example will show you how to do this in simple terms.
First, have a look at the following RELAX NG schema in the patterns directory of the example archive:
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="record"/>
  </start>
  <define name="record">
    <element name="record">
      <optional>
        <element name="description">
          <text/>
        </element>
      </optional>
      <oneOrMore>
        <ref name="datum"/>
      </oneOrMore>
    </element>
  </define>
  <define name="datum">
    <element name="datum">
      <mixed>
        <zeroOrMore>
          <choice>
            <ref name="subdatum"/>
          </choice>
        </zeroOrMore>
      </mixed>
    </element>
  </define>
  <define name="subdatum">
    <element name="subdatum">
      <text/>
    </element>
  </define>
</grammar>
There is a similar RELAX Core schema as well:
<?xml version="1.0"?>
<!DOCTYPE module SYSTEM "relaxCore.dtd">
<module relaxCoreVersion="1.0">
  <interface>
    <export label="record"/>
  </interface>
  <tag name="record"/>
  <elementRule label="record" role="record">
    <sequence>
      <element name="description" type="string" occurs="?"/>
      <ref label="datum" occurs="+"/>
    </sequence>
  </elementRule>
  <tag name="datum"/>
  <elementRule label="datum" role="datum">
    <mixed>
      <ref label="subdatum"/>
    </mixed>
  </elementRule>
  <tag name="subdatum"/>
  <elementRule label="subdatum" role="subdatum" type="string"/>
</module>
The following document is an instance of both these schemas:
<?xml version="1.0" encoding="UTF-8"?>
<record>
  <description>new part numbers</description>
  <datum>000-256-0940</datum>
  <datum>000-256-0941<subdatum>KLU</subdatum></datum>
  <datum>000-256-0942<subdatum>KLV</subdatum></datum>
  <datum>000-256-0943<subdatum>KLW</subdatum></datum>
</record>
A valid instance of these schemas, such as record.xml, has a root element of <record>, an optional <description> element, followed by one or more <datum> elements. Each <datum> element may or may not contain a <subdatum> element in its mixed content.
With the following command line, Relaxer processes record.rng with the composite and visitor patterns:
C:\Relaxer\patterns>relaxer -verbose -java -useJAXP -composite -visitor record.rng
In a matter of seconds, Relaxer generates over 10,000 lines of source code, producing the following Java files:
Datum.java
IDatumMixed.java
IDatumMixedChoice.java
IRNode.java
IRVisitable.java
IRVisitor.java
Record.java
RString.java
RStack.java
RVisitorBase.java
Subdatum.java
UJAXP.java
URelaxer.java
URVisitor.java
As part of the process, Relaxer creates a number of interfaces (by convention, the names of these Java files begin with the letter I). IDatumMixed.java, for example, is used by Datum.java and implemented by IDatumMixedChoice.java, RString.java, and Subdatum.java. In addition, a class is created for the elements defined in the schema, that is, Record.java, Datum.java, and Subdatum.java. (The <description> element is handled in Record.java.) Utility classes, such as RString.java, RStack.java, and RVisitorBase.java, are also generated, as well as the basic, underlying classes UJAXP.java, URelaxer.java, and URVisitor.java.
To examine the code easily, run Javadoc on all the Java files in the patterns directory:
C:\Relaxer\patterns>javadoc -d doc *.java
Then open doc/index.html in a browser to examine the documentation for these classes.
With this code in place, you can write your own applications to manipulate instances of record.rng and extract some desired output from them. Here is an example of a small application that does just that. It is called RecordApp.java and is found in the patterns directory.
import java.io.File;

/**
 * A small application for processing documents.
 * @author Mike Fitzgerald
 *
 */
public class RecordApp {

    public static void main(String[] args) throws Exception {

        // Instantiate a Record with an input file
        Record r = new Record(new File(args[0]));

        // Instantiate a DatumProcessor
        DatumProcessor dp = new DatumProcessor();

        // Traverse the input file with the processor
        URVisitor.traverse(r, dp);

        // Print the result to standard output
        System.out.println(dp.getText());
    }
}
RecordApp.java instantiates the Record class, passing an XML document as the argument to its constructor. It also instantiates the DatumProcessor class, which extracts the desired data from the file loaded by the Record constructor. It then uses the traverse() method from the URVisitor class to walk the input document r with the processor dp. Finally, it prints the result to standard output.
The file DatumProcessor.java follows:
import java.lang.StringBuffer;

/**
 * A small helper application for processing documents.
 * @author Mike Fitzgerald
 *
 */
public class DatumProcessor extends RVisitorBase {

    // Instantiate a string buffer for collecting text
    private StringBuffer buffer = new StringBuffer();

    /**
     * Behavior upon entering a <code>&lt;datum&gt;</code> element.
     * @param datum the element to be ranged over
     * @return <code>true</code> to traverse child nodes;
     *         <code>false</code> to skip traversing child nodes
     *
     */
    public boolean enter(Datum datum) {
        String d = datum.getContentAsString();
        buffer.append("\n");
        buffer.append("Datum: ");
        buffer.append(d);
        return (true);
    }

    /**
     * Behavior upon entering a <code>&lt;subdatum&gt;</code> element.
     * @param subdatum the element to be ranged over
     * @return <code>true</code> to traverse child nodes;
     *         <code>false</code> to skip traversing child nodes
     *
     */
    public boolean enter(Subdatum subdatum) {
        String sd = subdatum.getContent();
        buffer.append(" Subdatum: ");
        buffer.append(sd);
        return (true);
    }

    /**
     * Exit behavior when leaving a <code>&lt;datum&gt;</code> element.
     * @param datum the element to be ranged over
     *
     */
    public void leave(Datum datum) {
        buffer.append("\n\n");
    }

    /**
     * Gets text as a String from StringBuffer buffer.
     * @return the StringBuffer buffer as a String
     *
     */
    public String getText() {
        return (new String(buffer));
    }
}
This class extends the RVisitorBase class, providing a few methods specifically for processing datum and subdatum nodes. Text is collected in a StringBuffer called buffer. The enter() methods define behavior upon entering the nodes, and the leave() method defines an action for when exiting a datum node. The getText() method is used in the RecordApp class to output the text in buffer.
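The reason a processor needs to override only the methods it cares about can be shown with a stand-alone sketch. It assumes, without confirmation from the Relaxer sources, that a base visitor class supplies do-nothing defaults for every node type. All names here (TypedVisitorSketch, Visitor, VisitorBase, Collector, accept(), and the simplified Datum and Subdatum) are hypothetical stand-ins, not the generated classes.

```java
public class TypedVisitorSketch {

    /** Hypothetical visitor interface with one enter/leave pair per node type. */
    interface Visitor {
        boolean enter(Datum datum);
        void leave(Datum datum);
        boolean enter(Subdatum subdatum);
        void leave(Subdatum subdatum);
    }

    /** Hypothetical base class: every callback is a no-op that keeps traversing. */
    static class VisitorBase implements Visitor {
        public boolean enter(Datum datum) { return true; }
        public void leave(Datum datum) { }
        public boolean enter(Subdatum subdatum) { return true; }
        public void leave(Subdatum subdatum) { }
    }

    /** Simplified stand-in for a generated leaf element class. */
    static class Subdatum {
        final String text;

        Subdatum(String text) {
            this.text = text;
        }

        void accept(Visitor v) {
            v.enter(this); // a leaf has no children to descend into
            v.leave(this);
        }
    }

    /** Simplified stand-in for a generated element class with one optional child. */
    static class Datum {
        final String text;
        final Subdatum child; // may be null

        Datum(String text, Subdatum child) {
            this.text = text;
            this.child = child;
        }

        void accept(Visitor v) {
            // The static type of 'this' selects the matching enter/leave overload.
            if (v.enter(this) && child != null) {
                child.accept(v);
            }
            v.leave(this);
        }
    }

    /** Overrides three of the four callbacks; leave(Subdatum) stays a no-op. */
    static class Collector extends VisitorBase {
        final StringBuilder buffer = new StringBuilder();

        @Override
        public boolean enter(Datum datum) {
            buffer.append("Datum: ").append(datum.text);
            return true;
        }

        @Override
        public boolean enter(Subdatum subdatum) {
            buffer.append(" Subdatum: ").append(subdatum.text);
            return true;
        }

        @Override
        public void leave(Datum datum) {
            buffer.append('\n');
        }
    }

    public static void main(String[] args) {
        Collector collector = new Collector();
        new Datum("000-256-0941", new Subdatum("KLU")).accept(collector);
        System.out.print(collector.buffer);
    }
}
```

Returning false from enter(Datum) in this sketch would suppress the subdatum callbacks for that node, which mirrors the meaning of the boolean return value described in the Javadoc comments of DatumProcessor.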
This code does not process a description node, even if one exists in a <record> document.
Finally, compile the program RecordApp.java. When you issue this command, all the other Java files in the patterns directory will be compiled as well:
C:\Relaxer\patterns>javac RecordApp.java
After you compile the program successfully, run it with record.xml as an argument:
C:\Relaxer\patterns>java RecordApp record.xml
You should see output that looks like this:
Datum: 000-256-0940
Datum: 000-256-0941 Subdatum: KLU
Datum: 000-256-0942 Subdatum: KLV
Datum: 000-256-0943 Subdatum: KLW
That concludes the visitor/composite portion of the tutorial. See the sample/visitor, sample/namespace, and sample/mathml directories for additional composite/visitor examples.