ZIG<<<< at Manchester University

11th November 2002

1. Who's who
2. What people are doing
        2.1. Ashley
        2.2. Rob
        2.3. Mike
        2.4. Adam
        2.5. Sebastian
        2.6. Ian
3. Emergent Agenda
4. ZOOM
        4.1. Pre-connection options
        4.2. Asynchronous interfaces
        4.3. [Lunch]
        4.4. Marketing
        4.5. Extensibility
        4.6. C++ binding issues
5. SRW
        5.1. Record transfer syntax
6. CQL
        6.1. Qualifier-set names
7. ZeeRex
8. Other agenda items

1. Who's who

Ashley Sanders - Manchester University
Rob Sanderson - Liverpool University
Mike Taylor - speaking only for himself :-)
Adam Dickmeiss - Index Data
Sebastian Hammer - Index Data
Ian Ibbotson - Knowledge Integration

2. What people are doing

2.1. Ashley

Needs to produce a target that searches other targets. (Yet another SuNFiSH-alike!) This came about because there was a bit of money sloshing around, and someone needed to come up with an idea for something to spend it on. It's called CC-Interop (COPAC/Clumps Interop). Related to the M25 project, which has many Z-targets and a web front-end that searches across them. They'd like to put a Z interface on top of that. No-one knows what the existing web system is written in(!) so the fanout Z-target would be a from-scratch project.

Interested in ZOOM, wants to sort out C++ binding.

No interest in or respect for SRW :-)

2.2. Rob

[Everything that was in the prototype agenda :-)]

2.3. Mike

ZOOM marketing, ZeeRex, CQL and zSQLgate.

2.4. Adam

Wants to get specifications more in the form of interfaces - abstracting specified functionality away from being Z39.50-specific, so that we can build, for example, SRW back-ends.

Wants some of the C binding extensions to be merged back into the AAPI. Alternatively, some sanctioned way to extend the official ZOOM interface. The biggest hole is in the area of asynchronous operations: the C binding's solution is not great, in that it doesn't let you wait on both clients and servers as the YAZ++ proxy does.

I propose that the acid test of an IR interface is the ability to build an asynchronous proxy using it. Adam points out that there are lots of problems in the details: different event loops, thread models, etc.

Adam doesn't feel that most of the SOAP tools are powerful and general enough to address multiple servers: they assume you will use a separate thread for each server. Maybe we should just accept that and use threads in these enlightened days? Adam is reluctant, and points out that threads don't work well in the Microsoft world. In the COM world, the COM system starts the threads for you: if you start your own thread, there's no way to close your COM component down properly!

2.5. Sebastian

Sebastian has a different perspective from Adam's, because much of his work involves deploying, while Adam mostly builds tools.

One of ID's most ambitious projects is DEF, a union catalogue of about 100 Danish Library Z-servers, all more or less conforming to the Bath profile. Now that Sebastian's been actually using this stuff, he's aware of how slow it can be: searches are done in parallel, but you don't see any results until the last server responds. Sebastian's thinking about building cleverer clients in Java or Flash to display dynamic results.

Sebastian fears the impact of SRW, sending us ``back to the early nineties'' in terms of wrestling with toolkits, choosing profiles, etc. - while still having to deal with the difficult problems which are mostly semantics. The cynical perspective on this is that the confusion will produce a lot more consultancy work :-)

So this is where we are: everyone's done web gateways that search a hundred or so servers. That's not enough any more. It has to be useful, not merely interesting. That means it needs to be reliable, predictable, etc. If we try to sell SRW without these attributes, it will not get very far.

Finally, Sebastian is interested in SRW as an arena in which to reach out into broader worlds of structured information retrieval, breaking out of the library domain.

2.6. Ian

Fingers in many pies. Getting involved again in JISC projects, where Ian is not hearing much interest in SRW. People are still complaining that searching ten Z-targets is slow.

The problem with dynamic results updates comes when the fastest server returns the least relevant record, and you don't want to display it at the top. When you sort on relevance, you run into problems with different servers' relevance scores needing to be interpreted differently, not to mention deduplication.

Rachel Bruce at JISC is in charge of the Common Services Framework (which Rob's people are going to implement). In connection with this, Ian would like the distinction between collections and the servers that hold them to be made explicit, so you can say that ``this collection is hosted on those three Z-servers and this SRW server.''

Ian's colleague Rob is still pursuing local government work. In this context, SRW has one enormous (if pathetic) advantage over Z39.50 - namely, that it runs on port 80 which is open in firewalls, and people in the commercial world will flatly refuse to open port 210. [For heaven's sake! - Ed.]

In this arena too, Ian is keen on separating the ideas of collection and server, so that (for example) you can push your intranet's copy of a collection out onto a public server when you're happy with it.

3. Emergent Agenda

These seem to be the issues arising from what people are doing:

ZOOM:
- Pre-connection options
- Asynchronous interfaces
- Marketing
- Extensibility: not just standard extensions (like sort) but also non-standard things like record update.
- Rob wants to drop the type parameters of the Query constructor. [SORTED]
SRW record transfer syntax
CQL, including its possible role as a canonical abstract query representation.
ZeeRex
SRW: why, when, where and whither
XML and Z39.50
Large records and SIT
Multiplexing Z39.50 gateways. (Ian is working on this with his HeterogeneousSetOfServer thing, Ashley needs to build one for the M25 people, and I want to do the SuNFiSH project. Index Data have already built something similar for the UNIVERSE project.)
Applications which are useful rather than merely interesting: requires asynchronicity, intelligent fallback on server failure, dynamic feedback, etc. Related to accessibility requirements. Dynamic UI programs.
Using ZING as a framework in which to rethink IR architecture.

4. ZOOM

4.1. Pre-connection options

We seem to agree that the right solution to this is to have an unconnected-connections constructor. Then you can set your options (authentication, etc.) and call conn->connect().

We must document the standard options in the ZOOM AAPI. Setting non-standard options should return an ``unknown options'' error indicator. So we need to separate the Get Option and Set Option methods.
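For concreteness, here is a rough C++ sketch of what an unconnected connection with separate Set Option and Get Option methods might look like. Everything in it is an assumption made for illustration: the class shape, the method names, the two option names, the boolean error indicator and the host/port arguments to connect() are not taken from the AAPI or from any existing binding.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: class, method and option names are illustrative,
// not taken from the ZOOM AAPI.
class connection {
  public:
    // Builds an unconnected connection; nothing touches the network yet.
    connection() : port_(0), connected_(false) {
        // Assumed stand-ins for the documented standard options.
        options_["user"] = "";
        options_["password"] = "";
    }

    // Set Option: returns false (the "unknown option" indicator) for any
    // name outside the known set, and leaves the stored options untouched.
    bool setOption(const std::string &name, const std::string &value) {
        std::map<std::string, std::string>::iterator it = options_.find(name);
        if (it == options_.end()) return false;
        it->second = value;
        return true;
    }

    // Get Option: a separate method, so that reading an option can never
    // be mistaken for setting one. Returns false for unknown names.
    bool getOption(const std::string &name, std::string &value) const {
        std::map<std::string, std::string>::const_iterator it = options_.find(name);
        if (it == options_.end()) return false;
        value = it->second;
        return true;
    }

    // Stand-in for the real connect(): a real binding would open the
    // session here, using whatever options have been set by now.
    void connect(const std::string &host, int port) {
        host_ = host;
        port_ = port;
        connected_ = true;
    }

    bool isConnected() const { return connected_; }

  private:
    std::map<std::string, std::string> options_;
    std::string host_;
    int port_;
    bool connected_;
};
```

The point of the shape is only the ordering: construct, set options, then connect. Whether an unknown option is signalled by a return value or by an exception is left open here.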

4.2. Asynchronous interfaces

We all agree that we need to write specifications for asynchronous operations in the AAPI. The choice is between two basic models:

Event-driven - operations such as Connection.Search and Result Set.Get Record return null results when in asynchronous mode; messages are generated when the operations complete, and can be polled for.
Callback-based - operations such as searching and retrieval have an additional, optional, function-pointer parameter; when the operation completes, the result (a new Result Set, Record, etc.) is passed to the nominated function.

Consensus seems to be that the former is more flexible - you can easily build callbacks out of events, but not vice versa.
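
To illustrate the "callbacks out of events" claim, here is a minimal C++ sketch in which a callback dispatcher is layered on top of a pollable event queue. All the names (eventQueue, callbackDispatcher, the event fields and event types) are invented for this example and are not proposed AAPI names; the queue is filled by hand rather than by real network activity.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>

// Hypothetical names throughout; this is not AAPI text.
enum class eventType { searchComplete, recordReady };

struct event {
    int operationId;      // identifies the operation that completed
    eventType type;
    std::string payload;  // stand-in for a Result Set, Record, etc.
};

// Event-driven core: completions are queued and can be polled for.
class eventQueue {
  public:
    void post(const event &e) { pending_.push(e); }

    // Copies the oldest pending event into `out` and removes it.
    // Returns false if nothing is waiting.
    bool poll(event &out) {
        if (pending_.empty()) return false;
        out = pending_.front();
        pending_.pop();
        return true;
    }

  private:
    std::queue<event> pending_;
};

// Callback layer built purely on top of poll().
class callbackDispatcher {
  public:
    typedef std::function<void(const event &)> callback;

    explicit callbackDispatcher(eventQueue &q) : queue_(q) {}

    void onCompletion(int operationId, callback cb) {
        callbacks_[operationId] = cb;
    }

    // Drains the queue, invoking the registered callback for each event.
    // Events with no registered callback are discarded in this sketch.
    // Returns the number of callbacks invoked.
    int dispatch() {
        int invoked = 0;
        event e;
        while (queue_.poll(e)) {
            std::map<int, callback>::iterator it = callbacks_.find(e.operationId);
            if (it != callbacks_.end()) {
                it->second(e);
                ++invoked;
            }
        }
        return invoked;
    }

  private:
    eventQueue &queue_;
    std::map<int, callback> callbacks_;
};
```

Going the other way is harder: with only callbacks, the application cannot choose when completions are delivered, which is the control that polling provides.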

Adam will draft some prose for the AAPI.

4.3. [Lunch]

4.4. Marketing

Get Ray to make the font bigger on www.loc.gov/z3950/agency/newzing/zing-home.html
We need to get an SRW back-end behind a ZOOM front-end, preferably YAZ's ZOOM-C (which would instantly SRW-enable lots of the other bindings for free: Perl, C++, Visual Basic, Ada - but not Java, Tcl or Python.)
Mail to lists like oss4lib.
Make a ZOOM Freshmeat project.
Article in a magazine like D-LIB.

4.5. Extensibility

Adam wants guidelines in the ZOOM AAPI for specifying extensions in a way that ``doesn't make me cross''. We can't think of what such guidelines would look like - not for C, anyway: in more OO languages, we could say something like ``extensions should be implemented in subclasses'', but that makes no sense in C.

The upshot seems to be just that Adam should more clearly document which parts of the ZOOM-C API are standard ZOOM, and which are extensions. Also, some of the extensions - notably not-yet-connected connections - need to be factored back into the AAPI.

4.6. C++ binding issues

Global error-information functions.
Dump these.
Global option functions.
We can dump these too: their only real use is no longer required, since we have introduced pre-connection options.
Promoting recordSyntax to a class.
Yes: we need to do this, so that we can translate between the enumerated values and the strings that are passed into set_option().
Ashley's resultSet.getRecord() returns a record object, while mine returns a pointer; he's also therefore removed the clone() method.
Adam suggests we lose getRecord() in favour of a record constructor:
```
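	// Fragment only: resultSet is assumed to be declared elsewhere in the binding.
	class record {
	  public:
	  // Constructs the record at position i of the result set rs.
	  record(resultSet &rs, size_t i);
	};

	resultSet rs;
	record r = record(rs, 0);        // a record object held by value
	record *rp = new record(rs, 0);  // or on the heap, if a pointer is wanted
	delete rp;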
```
We think we don't need clone() any more except as a performance measure; and good implementations will achieve this anyway, by reference-counting or similar measures.
Remove the extraneous repeated declarations and implementations of errcode() in the non-base exception classes.
string vs. char*
Alright then, I give in.
Harmonise my exception classes with Ashley's; include initRefusedException. These exception types should probably be promoted into the AAPI, since they necessarily crop up in other bindings.
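
On the clone() point above, here is one way the "reference-counting or similar measures" might look: a sketch only, assuming C++11's std::shared_ptr. The resultSet stand-in, its fetchRaw() method, and the raw() and shareCount() accessors are all made up for the example; a real binding would fetch the record from the server rather than fabricate a string.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for a real result set; fetchRaw() just makes up data.
class resultSet {
  public:
    std::string fetchRaw(std::size_t i) const {
        return "record #" + std::to_string(i);
    }
};

class record {
  public:
    // The record constructor suggested above: the data is fetched once
    // and held behind a reference-counted pointer.
    record(const resultSet &rs, std::size_t i)
        : data_(std::make_shared<const std::string>(rs.fetchRaw(i))) {}

    // The compiler-generated copy constructor and assignment operator copy
    // only the pointer, so copying a record shares the data rather than
    // duplicating it. That is what makes an explicit clone() unnecessary.

    const std::string &raw() const { return *data_; }

    // How many record objects currently share this data (for illustration).
    long shareCount() const { return data_.use_count(); }

  private:
    std::shared_ptr<const std::string> data_;
};
```

Because the shared data is const, handing out copies by value is safe: no copy can change what another copy sees.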

5. SRW

5.1. Record transfer syntax

[We discussed this between 4.3 and 4.4]

On the old question of whether SRW should return result-set records as XML fragments or strings, Rob suggests that we could use a well-defined Dublin Core schema, and so return DC records as XML fragments; while general records must be encoded as strings because their structure is not known in advance.

6. CQL

6.1. Qualifier-set names

The qualifier-set name in CQL qualifiers must be significant in itself, and not require looking up in a ZeeRex record to find a qualifier-set URL. The way things are at the moment:

A CQL query does not stand alone and must be interpreted in the context of a ZeeRex record.
That's no good for CQL applications that don't have ZeeRex (maybe because they are not SRW or Z39.50 applications at all).
For applications that do understand ZeeRex, there's no guarantee that a record fetched three seconds ago, which maps the qualifier-set name bath to http://www.bathprofile/what/ever is still valid for the search to be submitted now.
Broadcast searching is impossible, since in general 100 servers will require 100 different qualifier-set names, so that each must be probed to discover the name corresponding to the desired qualifier-set.
Persistent queries can't work if the saved qualifier-set name's interpretation can change under your feet.

The same argument applies to record-schema names.

These are hard problems - we can't think of The Correct Solution. The best we can do is set up an authoritative global registry of qualifier-set names; but that may not work for record-schema names, since we expect to have many more of these.

Adam's suggestions:

   "srw.prefix.dc=http://purl.org/dublincore/qualset" dc.title=computer
   prefix dc="http://purl.org/dublincore/qualset" dc.title=computer
   >dc="http://purl.org/dublincore/qualset" dc.title=computer

The last of these introduces new syntax: a search clause beginning with >, which is followed by a qualifier-set name, an equals sign, a qualifier-set identifier and a sub-query. Mmmm ... Nice!

The qualifier-set name and equals sign are optional: if they are omitted, the >-clause specifies the default qualifier set that pertains to unqualified terms in the governed sub-query.
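
As a rough illustration of the >-clause described above, the following C++ sketch splits such a clause into its three parts. It is not a CQL parser: it assumes the identifier is always double-quoted with no embedded quotes, treats the identifier as an opaque string, and hands back the sub-query unparsed. The names prefixClause and parsePrefixClause are invented for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct prefixClause {
    std::string name;        // empty means "default qualifier set"
    std::string identifier;  // the qualifier-set identifier, unquoted
    std::string subQuery;    // the governed sub-query, left unparsed
};

// Returns true and fills `out` if `query` begins with a well-formed
// >-clause; returns false (leaving `out` alone) otherwise.
bool parsePrefixClause(const std::string &query, prefixClause &out) {
    if (query.empty() || query[0] != '>') return false;
    std::size_t pos = 1;

    // Optional part: qualifier-set name followed by an equals sign.
    std::string name;
    if (pos < query.size() && query[pos] != '"') {
        std::size_t eq = query.find('=', pos);
        if (eq == std::string::npos || eq == pos) return false;
        name = query.substr(pos, eq - pos);
        pos = eq + 1;
    }

    // Mandatory part: the double-quoted qualifier-set identifier.
    if (pos >= query.size() || query[pos] != '"') return false;
    std::size_t close = query.find('"', pos + 1);
    if (close == std::string::npos) return false;
    std::string identifier = query.substr(pos + 1, close - pos - 1);

    // Everything after the intervening whitespace is the sub-query.
    std::size_t sub = query.find_first_not_of(" \t", close + 1);
    if (sub == std::string::npos) return false;

    out.name = name;
    out.identifier = identifier;
    out.subQuery = query.substr(sub);
    return true;
}
```

An empty name in the result corresponds to the case where the name and equals sign are omitted, so that the identifier names the default qualifier set for the sub-query.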

7. ZeeRex

We think we're there with the DTD. Rob now needs to update the commentary, and I have some changes to make to the web site.

8. Other agenda items

We'll discuss the ``fluffy'' ones in the pub.

Feedback to <mike@miketaylor.org.uk> is welcome!