NAME

Alvis::Pipeline - Perl extension for passing XML documents along the Alvis pipeline


SYNOPSIS

 use Alvis::Pipeline;
 my $in = Alvis::Pipeline::Read->new(host => "harvester.alvis.info",
                                     port => 16716);
 my $out = Alvis::Pipeline::Write->new(port => 29168,
                                       spooldir => "/home/alvis/spool");
 while (my $xmlDOM = $in->read(1)) {
     # process() stands for the component's own transformation code
     my $transformed = process($xmlDOM);
     $out->write($transformed);
 }


DESCRIPTION

This module provides a simple means for components in the Alvis pipeline to pass documents between themselves without needing to know about the underlying transfer protocol. Pipe objects may be created either for reading or for writing; components in the middle of the pipeline will create one of each. Apart from close(), each pipe supports exactly one method, which is either read() or write() depending on the type of the pipe. The unit of reading and writing is the whole XML document; neither smaller fragments nor larger aggregates can be transferred.

Underneath this interface, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is used. Although the Alvis pipeline is often presented in terms of documents being pushed down the pipeline from above, the implementation in fact pulls documents down from below. This is achieved by having the write() method simply store the XML document in a spooling area, where it waits until a downstream reader requests it and removes it from that area. Therefore, write() never blocks, but code that writes documents down the pipeline must not assume that a document, once successfully written, has also been successfully read by the downstream component.
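To make the pull model concrete, here is a minimal sketch in Perl of a spool directory that behaves as just described: writing a document only drops it into the spool and returns, and a reader's request removes the oldest document. It assumes nothing about the real spool format used by Alvis::Pipeline; the ToySpool class, its take() method and the numbered-file layout are hypothetical, chosen only to show why write() cannot block and why a successful write() says nothing about whether the document has been read.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Toy model of the spool-and-pull idea. This is NOT the code inside
# Alvis::Pipeline; the class name and file layout are invented.
package ToySpool;

sub new {
    my ($class, $dir) = @_;
    return bless { dir => $dir, next => 1 }, $class;
}

# Writer side: drop the document into the spool directory and return
# immediately. Nothing here waits for, or knows about, any reader.
sub write {
    my ($self, $xml) = @_;
    my $name = sprintf("%08d.xml", $self->{next}++);
    my $path = File::Spec->catfile($self->{dir}, $name);
    open(my $fh, '>', $path) or die "cannot create $path: $!";
    print {$fh} $xml;
    close($fh) or die "cannot close $path: $!";
    return 1;
}

# Reader side: what a downstream request triggers. Remove and return the
# oldest spooled document, or undef if the spool is currently empty.
sub take {
    my ($self) = @_;
    opendir(my $dh, $self->{dir}) or die "cannot open $self->{dir}: $!";
    my @names = grep { /^\d+\.xml\z/ } readdir($dh);
    closedir($dh);
    return unless @names;
    my ($oldest) = sort @names;   # zero-padded names sort in write order
    my $path = File::Spec->catfile($self->{dir}, $oldest);
    open(my $fh, '<', $path) or die "cannot read $path: $!";
    my $xml = do { local $/; <$fh> };
    close($fh);
    unlink($path) or die "cannot remove $path: $!";
    return $xml;
}

package main;

my $spool = ToySpool->new(tempdir(CLEANUP => 1));
$spool->write("<doc id='1'/>");   # returns at once; no reader exists yet
$spool->write("<doc id='2'/>");
my $first = $spool->take();
print "$first\n";                 # <doc id='1'/>
```

In this toy, the writer never talks to the reader at all, so the only sign that a document has gone downstream is its disappearance from the spool.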

The adoption of OAI-PMH as the pipeline's document-passing protocol gives us a pre-made way to express the various necessary operations, and will make it easier in future for new Alvis components to be written that can participate in the pipeline.

In general, though, document producers, filters and consumers in the Alvis pipeline that use this module need not be concerned with OAI-PMH, and can simply use the API described herein.

The documents expected to pass through this pipeline are those representing documents acquired for, and being analysed by, Alvis. These documents are expressed as XML constructed according to the specifications described in the Metadata Format for Enriched Documents. However, while this is the motivating pipeline that led to the creation of this module, there is no reason why other kinds of documents should not also be passed through the pipeline using this software.


METHODS

new()

 my $in = Alvis::Pipeline::Read->new(host => "harvester.alvis.info",
                                     port => 16716);
 my $out = Alvis::Pipeline::Write->new(port => 29168,
                                       spooldir => "/home/alvis/spool");

Creates a new pipeline, either for reading or for writing. Any number of name-value pairs may be passed as parameters. Among these, most are optional but some are mandatory:

read()

 # Read-pipes only
 $xmlDOM = $in->read($block);

Reads an XML document from the specified inbound pipe and returns a DOM tree representing it. If no document is ready to read, read() either returns an undefined value (if no argument is provided, or if the argument is false) or blocks until one is available (if the argument is provided and true). read() throws an exception if an error occurs.

Once a document has been read in this way, it will no longer be available for subsequent read()s, so a sequence of read() calls will read all the available records one at a time. This is unusual behaviour for an OAI-PMH repository, but then what we are doing here is an unusual deployment of OAI-PMH.
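The non-blocking form of read() suits components that have other work to do between documents. The sketch below relies only on the behaviour documented above: it handles whatever is available right now and then returns. StubReadPipe and drain_available() are hypothetical names; the stub stands in for a real Alvis::Pipeline::Read object so the example runs on its own, and it hands back strings where the real read() returns DOM trees.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an Alvis::Pipeline::Read object. It mimics only
# the documented contract: read() consumes documents, and returns undef when
# none is ready and the argument is false. (The real read() returns DOM trees.)
package StubReadPipe;

sub new {
    my ($class, @docs) = @_;
    return bless { docs => [@docs] }, $class;
}

sub read {
    my ($self, $block) = @_;
    die "stub cannot block on an empty queue\n" if $block && !@{ $self->{docs} };
    return shift @{ $self->{docs} };
}

package main;

# Hypothetical helper: handle every document that is ready now, using
# non-blocking reads, then return so the caller can do other work.
# read() reports errors by throwing, so each call is wrapped in eval.
sub drain_available {
    my ($pipe, $handler) = @_;
    my $count = 0;
    while (1) {
        my $doc;
        eval { $doc = $pipe->read(0); 1 } or die "read failed: $@";
        last unless defined $doc;
        $handler->($doc);
        $count++;
    }
    return $count;
}

my $in = StubReadPipe->new('<doc id="a"/>', '<doc id="b"/>');
my @seen;
my $n = drain_available($in, sub { push @seen, $_[0] });
print "$n documents handled\n";   # 2 documents handled
```

With a real read-pipe, passing a true argument instead would make the loop wait for the next document rather than returning as soon as the pipe is momentarily empty.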

write()

 # Write-pipes only
 $out->write($xmlDocument);

Writes an XML document to the specified outbound pipe. The document may be passed in either as a DOM tree (XML::LibXML::Element) or a string containing the text of the document. Throws an exception if an error occurs.

(In reality, all this does is place the document in a spooling area, whence it will subsequently be picked up when the downstream component asks to read a record. But that implementation detail can be ignored.)
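Because write() accepts either a DOM tree or a string, a writer has to reduce both forms to document text at some point before spooling. The following is a minimal sketch of one way to do that, assuming only that the DOM objects offer a toString() method, which XML::LibXML nodes (including XML::LibXML::Element) provide. document_text() and the FakeElement stand-in are hypothetical and are not part of Alvis::Pipeline, whose internal handling may differ.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper, not part of Alvis::Pipeline: reduce the argument of
# write() to document text. Any blessed object offering toString() (as
# XML::LibXML nodes do) is serialised; a plain string is assumed to already
# be the text of the document; other references are rejected.
sub document_text {
    my ($doc) = @_;
    die "write(): no document given\n" unless defined $doc;
    return $doc->toString() if blessed($doc) && $doc->can('toString');
    die "write(): cannot handle a ", ref($doc), " reference\n" if ref $doc;
    return $doc;
}

# Minimal stand-in for an XML::LibXML::Element so the sketch runs on its own.
package FakeElement;
sub new      { my ($class, $xml) = @_; return bless { xml => $xml }, $class }
sub toString { my ($self) = @_; return $self->{xml} }

package main;

my $from_string = document_text('<doc id="s"/>');
my $from_dom    = document_text(FakeElement->new('<doc id="e"/>'));
print "$from_string\n$from_dom\n";
```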

close()

 $pipe->close();

Closes a pipe, after which no further reading or writing may be done on it. This is important for write-pipes, as it frees up the network port on which the under-the-hood OAI server is listening. Tidy reading clients will also call this when they are done.


SEE ALSO

Alvis Task T3.2 - Metadata Format for Enriched Documents. Milestone M3.2 - Month 12 (December 2004). Includes a useful overview of the Alvis processing pipeline. http://www.miketaylor.org.uk/alvis/t3-2/m3-2.html

The Open Archives Initiative. http://www.openarchives.org/

The Open Archives Initiative Protocol for Metadata Harvesting Version 2.0. http://www.openarchives.org/OAI/openarchivesprotocol.html


AUTHOR

Mike Taylor, <mike@indexdata.com>


COPYRIGHT AND LICENSE

Copyright (C) 2005 by Index Data ApS.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.