Copyright © 2006 Index Data
It's useful to think of YP2 as an interpreter providing a small number of primitives and operations, but operating on a very complex data type, namely the "package".
A package represents a Z39.50 or SRW/U request (whether for Init, Search, Scan, etc.) together with information about where it came from. Packages are created by front-end filters such as frontend_net (see below), which reads them from the network; other front-end filters are possible. They then pass along a route consisting of a sequence of filters, each of which transforms the package and may also have side-effects such as generating logging. Eventually, the route will yield a response, which is sent back to the origin.
There are many kinds of filter: some that are defined statically as part of YP2, and others that may be provided by third parties and dynamically loaded. They all conform to the same simple API of essentially two methods: configure() is called at startup time, and is passed a DOM tree representing that part of the configuration file that pertains to this filter instance: it is expected to walk that tree extracting relevant information. process() is called every time the filter has to process a package.
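The two-method shape can be sketched roughly as follows. This is only an illustration under stated assumptions: Package and ConfigNode are simplified placeholders rather than YP2's real package and DOM types, and the names Filter, Tag, run_route and demo_route are invented here. The route is modelled as a plain loop over filters, which may not match how YP2 actually hands a package from one filter to the next.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: YP2's real package and DOM types are far richer.
struct Package {
    std::string request;              // simplified request payload
    std::vector<std::string> trace;   // records which filters saw the package
};
struct ConfigNode {
    std::string text;                 // stands in for a DOM subtree
};

// The two-method shape: configure() once at startup, process() per package.
class Filter {
public:
    virtual ~Filter() = default;
    virtual void configure(const ConfigNode &node) = 0;
    virtual void process(Package &package) const = 0;
};

// An illustrative filter that tags each package with a configured label.
class Tag : public Filter {
public:
    void configure(const ConfigNode &node) override { label_ = node.text; }
    void process(Package &package) const override {
        package.trace.push_back(label_);
    }
private:
    std::string label_;
};

// A route is modelled here as an ordered sequence of filters (an assumption).
using Route = std::vector<std::unique_ptr<Filter>>;

void run_route(const Route &route, Package &package) {
    for (const auto &filter : route)
        filter->process(package);
}

// Builds a two-filter route and pushes one package through it.
std::vector<std::string> demo_route() {
    Route route;
    for (const char *label : {"first", "second"}) {
        auto filter = std::make_unique<Tag>();
        filter->configure(ConfigNode{label});
        route.push_back(std::move(filter));
    }
    Package package{"search", {}};
    run_route(route, package);
    return package.trace;
}
```

Calling demo_route() configures each filter once and then processes a single package, mirroring the startup-time and per-package split described above.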
While all filters provide the same API, there are different modes of functionality. Some filters are sources: they create packages (frontend_net); others are sinks: they consume packages and return a result (z3950_client, backend_test, http_file); the others are true filters, that read, process and pass on the packages they are fed (auth_simple, log, multi, session_shared, template, virt_db).
The filters are here named by the string that is used as the type attribute of a <filter> element in the configuration file to request them, with the name of the class that implements them in parentheses.
Simple authentication and authorisation. The configuration specifies the name of a file that is the user register, which contains username and password pairs, one per line, colon separated. When a session begins, it is rejected unless username and password are supplied, and match a pair in the register.
### discuss authorisation phase
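As a rough sketch of the register check, assuming the file holds one username:password pair per line: the function names load_register and accept_session are hypothetical, and the handling of malformed lines (skipped here) is a guess rather than documented behaviour.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "username:password" lines into a lookup table.
std::map<std::string, std::string> load_register(std::istream &in) {
    std::map<std::string, std::string> users;
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;  // assumption: lines without a separator are ignored
        users[line.substr(0, colon)] = line.substr(colon + 1);
    }
    return users;
}

// Accept a session only if both credentials match a pair in the register.
bool accept_session(const std::map<std::string, std::string> &users,
                    const std::string &username,
                    const std::string &password) {
    const auto it = users.find(username);
    return it != users.end() && it->second == password;
}
```

Taking a std::istream rather than a filename keeps the sketch testable with an in-memory std::istringstream.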
A sink that provides dummy responses in the manner of the yaz-ztest Z39.50 server. This is useful only for testing.
A source that accepts Z39.50 and SRW connections from a port specified in the configuration, reads protocol units, and feeds them into the next filter, eventually returning the result to the origin.
A sink that returns the contents of files from the local filesystem in response to HTTP requests. (Yes, Virginia, this does mean that YP2 is also a Web-server in its spare time. So far it does not contain either an email-reader or a Lisp interpreter, but that day is surely coming.)
Writes logging information to standard output, and passes on the package unchanged.
Performs multicast searching. See the extended discussion of multi-database searching below.
When this is finished, it will implement global sharing of result sets (i.e. between threads and therefore between clients), but it's not yet done.
Does nothing at all, merely passing the package on. (Maybe it should be called nop or passthrough?) This exists not to be used, but to be copied - to become the skeleton of new filters as they are written.
Performs virtual database selection. See the extended discussion of virtual databases below.
Performs Z39.50 searching and retrieval by proxying the packages that are passed to it. Init requests are sent to the address specified in the VAL_PROXY otherInfo attached to the request: this may have been specified by the client, or generated by a virt_db filter earlier in the route. Subsequent requests are sent to the same address, which is remembered at Init time in a Session object.
Some other filters that do not yet exist, but which would be useful, are briefly described. These may be added in future releases.
Command-line interface for generating requests.
Translate SRW requests into Z39.50 requests.
SRW searching and retrieval.
SRU searching and retrieval.
A9 OpenSearch searching and retrieval.
If YP2 is an interpreter providing operations on packages, then its configuration file can be thought of as a program for that interpreter. Configuration is by means of a single file, the name of which is supplied as the sole command-line argument to the yp2 program.
The configuration files are written in XML. (But that's just an implementation detail - they could just as well have been written in YAML or Lisp-like S-expressions, or in a custom syntax.)
Since XML has been chosen, an XML schema, config.xsd, is provided for validating configuration files. This file is supplied in the etc directory of the YP2 distribution. It can be used by (among other tools) the xmllint program supplied as part of the libxml2 distribution:
xmllint --noout --schema etc/config.xsd my-config-file.xml
(A recent version of libxml2 is required, as support for XML Schemas is a relatively recent addition.)
All elements and attributes are in the namespace http://indexdata.dk/yp2/config/1. This is most easily achieved by setting the default namespace on the top-level element, as here:
<yp2 xmlns="http://indexdata.dk/yp2/config/1">
The top-level element is <yp2>. This contains a <start> element, a <filters> element and a <routes> element, in that order. <filters> is optional; the other two are mandatory. All three are non-repeatable.
The <start> element is empty, but carries a route attribute, whose value is the name of the route at which to start running - analogous to the name of the start production in a formal grammar.
If present, <filters> contains zero or more <filter> elements; filters carry a type attribute and contain various elements that provide suitable configuration for filters of that type. The filter-specific elements are described below. Filters defined in this part of the file must carry an id attribute so that they can be referenced from elsewhere.
<routes> contains one or more <route> elements, each of which must carry an id attribute. One of the routes must have the ID value that was specified as the start route in the <start> element's route attribute. Each route contains zero or more <filter> elements. These are of two types. They may be empty, but carry a refid attribute whose value is the same as the id of a filter previously defined in the <filters> section. Alternatively, a filter within a route may omit the refid attribute, but contain configuration elements similar to those used for filters defined in the <filters> section.
All <filter> elements have in common that they must carry a type attribute whose value is one of the supported ones, listed in the schema file and discussed below. In addition, <filter>s occurring in the <filters> section must have an id attribute, and those occurring within a route must either have a refid attribute referencing a previously defined filter or contain their own configuration information.
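The reference rules above (the start route must exist, and every refid must name a filter defined in the <filters> section) can be sketched as a small consistency check. The Config and FilterRef structures and the check_references function are hypothetical, invented for this illustration; they model an already parsed file and say nothing about how YP2 itself represents or validates its configuration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory model of a parsed configuration file.
struct FilterRef {
    std::string type;   // value of the type attribute
    std::string refid;  // empty when the filter is configured inline
};
struct Config {
    std::string start_route;                               // <start route="...">
    std::set<std::string> filter_ids;                      // ids from <filters>
    std::map<std::string, std::vector<FilterRef>> routes;  // route id -> filters
};

// Returns an empty string when the references are consistent,
// otherwise a short description of the first problem found.
std::string check_references(const Config &config) {
    if (config.routes.empty())
        return "no routes defined";
    if (config.routes.count(config.start_route) == 0)
        return "start route not defined: " + config.start_route;
    for (const auto &entry : config.routes)
        for (const FilterRef &filter : entry.second)
            if (!filter.refid.empty() &&
                config.filter_ids.count(filter.refid) == 0)
                return "unknown refid: " + filter.refid;
    return "";
}
```

Checks of this kind complement schema validation, since a schema can require that the attributes are present but the cross-references between them still have to be resolved.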
In general, each filter recognises different configuration elements within its element, as each filter has different functionality. These are as follows:
<filter type="auth_simple">
  <userRegister>../etc/example.simple-auth</userRegister>
</filter>

<filter type="frontend_net">
  <threads>10</threads>
  <port>@:9000</port>
</filter>

<filter type="http_file">
  <mimetypes>/etc/mime.types</mimetypes>
  <area>
    <documentroot>.</documentroot>
    <prefix>/etc</prefix>
  </area>
</filter>

<filter type="log">
  <message>B</message>
</filter>

<filter type="session_shared">
  ### Not yet defined
</filter>

<filter type="virt_db">
  <virtual>
    <database>loc</database>
    <target>z3950.loc.gov:7090/voyager</target>
  </virtual>
  <virtual>
    <database>idgils</database>
    <target>indexdata.dk/gils</target>
  </virtual>
</filter>

<filter type="z3950_client">
  <timeout>30</timeout>
</filter>
Two of YP2's filters are concerned with multiple-database operations. Of these, virt_db can work alone to control the routing of searches to one of a number of servers, while multi can work with the output of virt_db to perform multicast searching, merging the results into a unified result-set. The interaction between these two filters is necessarily complex, reflecting the real complexity of multicast searching in a protocol such as Z39.50 that separates initialisation from searching, with the database to search known only during the latter operation.
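The routing half of this, where virt_db alone picks a server for a named database, amounts to a lookup from virtual database name to target. The sketch below assumes exactly that and nothing more: VirtualDatabases and select_target are hypothetical names, the empty-string result for an unknown database is an arbitrary choice, and the multicast merging done by multi is not modelled at all.

```cpp
#include <map>
#include <string>

// Hypothetical lookup table: virtual database name -> backend target.
using VirtualDatabases = std::map<std::string, std::string>;

// Returns the target for a database, or an empty string if it is unknown.
std::string select_target(const VirtualDatabases &databases,
                          const std::string &database) {
    const auto it = databases.find(database);
    return it == databases.end() ? std::string() : it->second;
}
```

With the two <virtual> entries from the virt_db configuration example loaded into the table, a search on loc would be routed to z3950.loc.gov:7090/voyager.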
### Much, much more to say!
Stop! Do not read this! You won't enjoy it at all.
This chapter contains documentation of the YP2 source code, and is of interest only to maintainers and developers. If you need to change YP2's behaviour or write a new filter, then you will most likely find this chapter helpful. Otherwise it's a waste of your good time. Seriously: go and watch a film or something. This is Spinal Tap is particularly good.
Still here? OK, let's continue.
In general, classes seem to be named big-endianly, so that FactoryFilter is not a filter that filters factories, but a factory that produces filters; and FactoryStatic is a factory for the statically registered filters (as opposed to those that are dynamically loaded).
The classes making up the YP2 application are here listed by class-name, with the names of the source files that define them in parentheses.
A factory class that exists primarily to provide the create() method, which takes the name of a filter class as its argument and returns a new filter of that type. To enable this, the factory must first be populated by calling add_creator() for static filters (this is done by the FactoryStatic class, see below) and add_creator_dyn() for filters loaded dynamically.
A subclass of FactoryFilter which is responsible for registering all the statically defined filter types. It does this by knowing about all those filters' structures, which are listed in its constructor. Merely instantiating this class registers all the static classes. It is for the benefit of this class that struct yp2_filter_struct exists, and that all the filter classes provide a static object of that type.
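The relationship between the two factory classes might look something like the following sketch. It borrows the names FactoryFilter, FactoryStatic, create() and add_creator() from the descriptions above, but the signatures, the Base, Log and Template classes, and the use of std::function and lambdas are all assumptions made for illustration; the real code registers filters through struct yp2_filter_struct, which is not reproduced here.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal filter base; the real one has configure() and process().
class Base {
public:
    virtual ~Base() = default;
    virtual std::string name() const = 0;
};
class Log : public Base {
public:
    std::string name() const override { return "log"; }
};
class Template : public Base {
public:
    std::string name() const override { return "template"; }
};

// Sketch of a registry mapping filter type names to creator functions.
class FactoryFilter {
public:
    using Creator = std::function<std::unique_ptr<Base>()>;
    virtual ~FactoryFilter() = default;

    // Returns false if a creator is already registered under this name.
    bool add_creator(const std::string &type, Creator creator) {
        return creators_.emplace(type, std::move(creator)).second;
    }

    // Returns a new filter of the named type, or throws if it is unknown.
    std::unique_ptr<Base> create(const std::string &type) const {
        const auto it = creators_.find(type);
        if (it == creators_.end())
            throw std::runtime_error("no such filter type: " + type);
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// Merely instantiating this subclass registers the statically known filters.
class FactoryStatic : public FactoryFilter {
public:
    FactoryStatic() {
        add_creator("log", [] { return std::unique_ptr<Base>(new Log); });
        add_creator("template",
                    [] { return std::unique_ptr<Base>(new Template); });
    }
};
```

The point of the subclass is that the caller never lists filter types by hand: constructing a FactoryStatic is enough to make create("log") work.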
The virtual base class of all filters. The filter API is, on the surface at least, extremely simple: two methods. configure() is passed a DOM tree representing that part of the configuration file that pertains to this filter instance, and is expected to walk that tree extracting relevant information. And process() processes a package (see below). That surface simplicity is a bit misleading, as process() needs to know a lot about the Package class in order to do anything useful.
Individual filters. Each of these is implemented by a header and a source file, named filter_*.hpp and filter_*.cpp respectively. All the header files should be pretty much identical, in that they declare the class, including a private Rep class and a member pointer to it, and the two public methods. The only extra information in any filter header is additional private types and members (which should really all be in the Rep anyway) and private methods (which should also remain known only to the source file, but C++'s brain-damaged design requires this dirty laundry to be exhibited in public. Thanks, Bjarne!)
The source file for each filter needs to supply:
A definition of the private Rep class.
Some boilerplate constructors and destructors.
A configure() method that uses the appropriate XML fragment.
Most important, the process() method that does all the actual work.
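The header and source split described above follows the familiar private-implementation pattern. Here is a minimal single-file sketch of it, with the caveats that the class name Example is hypothetical, std::string stands in for both the DOM fragment and the package, and std::unique_ptr is used for the member pointer without any claim that YP2 uses the same smart pointer.

```cpp
#include <memory>
#include <string>

// Header part: declares the class, a private Rep and a pointer to it.
class Example {
    class Rep;  // only declared here; defined in the source file
public:
    Example();
    ~Example();
    void configure(const std::string &setting);
    std::string process(const std::string &package) const;
private:
    std::unique_ptr<Rep> m_p;
};

// Source part: the definition of the private Rep class...
class Example::Rep {
public:
    std::string prefix;
};

// ...the boilerplate constructor and destructor...
Example::Example() : m_p(new Rep) {}
Example::~Example() = default;  // defined here, where Rep is a complete type

// ...a configure() that stores what it finds in the configuration...
void Example::configure(const std::string &setting) {
    m_p->prefix = setting;
}

// ...and the process() method that does the actual work.
std::string Example::process(const std::string &package) const {
    return m_p->prefix + package;
}
```

Because all state lives in Rep, the header can stay the same when the implementation changes, which is why the filter headers end up looking nearly identical.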
Represents a package on its way through the series of filters that make up a route. This is essentially a Z39.50 or SRU APDU together with information about where it came from, which is modified as it passes through the various filters.
This class provides a compatibility layer so that we have an IPC mechanism that works the same under Unix and Windows. It's not particularly exciting.
A namespace of various small utility functions and classes, collected together for convenience. Most importantly, includes the yp2::util::odr class, a wrapper for YAZ's ODR facilities.
A namespace of various XML utility functions and classes, collected together for convenience.
In addition to the YP2 source files that define the classes described above, there are a few additional files which are briefly described here:
The main function of the yp2 program.
Identical to yp2_prog.cpp: it's not clear why.
Unit-tests for various modules.
### Still to be described: ex_filter_frontend_net.cpp, filter_dl.cpp, plainfile.cpp, tstdl.cpp.