Searching multiple collections for vertebrate palaeontology specimens

19th May 2005

1. Introduction
        1.1. Motivating example
        1.2. Relevance Ranking
2. Implementation
        2.1. Thesauri
        2.2. Distributed Search
        2.3. Heuristic Semantic Analysis

1. Introduction

1.1. Motivating example

Imagine trying to express a search like this:

I want to find specimens of stegosaur metacarpals from the Kimmeridgian, found on the Isle of Wight and held in the Natural History Museum.

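If the exact material you want doesn't exist, there are five degrees of freedom that a clever search engine could slide along to find papers that would be interesting to you: taxon (stegosaur), skeletal element (metacarpals), geological age (Kimmeridgian), locality (Isle of Wight) and holding institution (Natural History Museum).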

In the absence of better hits, such an engine might offer up information on Tithonian ankylosaur manual phalanges from Dorset held in the OUMNH.
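
As a rough illustration, the motivating search can be written down as a record with one field per degree of freedom, and a candidate hit compared against it axis by axis. This is only a sketch: the class name SpecimenQuery, its field names and the function slipped_axes are invented here, not part of any existing system.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SpecimenQuery:
    """Hypothetical record: one field per degree of freedom."""
    taxon: str
    element: str
    age: str
    locality: str
    institution: str


def slipped_axes(wanted, offered):
    """Return the names of the axes on which `offered` differs from `wanted`."""
    return [
        f.name
        for f in fields(wanted)
        if getattr(wanted, f.name) != getattr(offered, f.name)
    ]


# The search from the motivating example...
wanted = SpecimenQuery(
    "stegosaur", "metacarpal", "Kimmeridgian",
    "Isle of Wight", "Natural History Museum",
)
# ...and the fallback hit, which has slipped along every axis.
offered = SpecimenQuery(
    "ankylosaur", "manual phalanx", "Tithonian", "Dorset", "OUMNH",
)

print(slipped_axes(wanted, offered))
```

Plain string inequality is the crudest possible test of slippage; how far a hit has slipped along an axis is a question for the thesauri of section 2.1.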

1.2. Relevance Ranking

### Rank by number of degrees of slippage?

### Allow users to specify which axes are most/least significant.
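
One way the two notes above might fit together is sketched below: each axis that has slipped costs a hit its weight, the weight defaults to 1 (so the unweighted score is simply the number of slipped axes), and users can override weights for the axes they care about more or less. The names AXES, slippage and rank are assumptions made for this sketch, and the two candidate records are invented for illustration.

```python
AXES = ("taxon", "element", "age", "locality", "institution")


def slippage(wanted, candidate, weights=None):
    """Sum the weights of the axes on which `candidate` differs from `wanted`.

    Axes missing from `weights` count 1.0, so with no weights at all the
    score is just the number of slipped axes.
    """
    weights = weights or {}
    return sum(
        weights.get(axis, 1.0)
        for axis in AXES
        if wanted[axis] != candidate[axis]
    )


def rank(wanted, candidates, weights=None):
    """Order candidates from least to most slippage."""
    return sorted(candidates, key=lambda c: slippage(wanted, c, weights))


wanted = {
    "taxon": "stegosaur",
    "element": "metacarpal",
    "age": "Kimmeridgian",
    "locality": "Isle of Wight",
    "institution": "Natural History Museum",
}
# Two made-up candidates, each slipping on exactly one axis.
other_museum = dict(wanted, institution="OUMNH")
other_taxon = dict(wanted, taxon="ankylosaur")

# A user who doesn't much mind which museum holds the specimen:
relaxed = {"institution": 0.1}
best = rank(wanted, [other_taxon, other_museum], relaxed)[0]
print(best["institution"])
```

Unweighted, the two candidates tie at one slipped axis each; with the relaxed weights, the specimen in the other museum comes out on top.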

### View and rotate a 3D slice of the slippage space to see what areas are best represented (and which areas, because they're sparsely populated, will make good research subjects).

2. Implementation

2.1. Thesauri

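To make this work, the searching system would need to have five "thesauri" (in the most general sense of structured collections of authority records), one for each degree of freedom: taxa, skeletal elements, geological ages, localities and institutions.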

These thesauri would need to be provided by experts in the field. Experience shows that building them is usually more work than people expect, and is in any case an inexact science. That's OK: even a vague, imprecise and error-strewn thesaurus will yield useful results.
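
To make "structured collection of authority records" a little more concrete, here is a minimal sketch of a toy taxon thesaurus in which each term points to one broader term, together with a distance measure that counts the steps from two terms up to their nearest shared broader term. The layout, the names BROADER, ancestors and distance, and the handful of terms are all assumptions for illustration; a real thesaurus would be far richer.

```python
# Toy thesaurus: each term maps to its single broader term.
BROADER = {
    "stegosaur": "thyreophoran",
    "ankylosaur": "thyreophoran",
    "thyreophoran": "ornithischian",
    "ornithischian": "dinosaur",
}


def ancestors(term, broader=BROADER):
    """Return [term, its broader term, that term's broader term, ...]."""
    chain = [term]
    # Stop at the top of the hierarchy, or if an error-strewn
    # thesaurus loops back on itself.
    while chain[-1] in broader and broader[chain[-1]] not in chain:
        chain.append(broader[chain[-1]])
    return chain


def distance(a, b, broader=BROADER):
    """Steps from a and b up to their nearest shared term, or None."""
    up_a = ancestors(a, broader)
    up_b = ancestors(b, broader)
    for hops_a, term in enumerate(up_a):
        if term in up_b:
            return hops_a + up_b.index(term)
    return None  # the two terms share no broader term


print(distance("stegosaur", "ankylosaur"))
```

A distance like this could stand in for the all-or-nothing notion of slippage: an ankylosaur is two steps from a stegosaur, whereas a term missing from the thesaurus is simply unrelated.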

2.2. Distributed Search

### New sites can "nuzzle up to" the network.

2.3. Heuristic Semantic Analysis

### Guess which bits of title/abstract are author, taxon, etc.
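
The crudest version of such guessing is a vocabulary lookup: scan the text for terms that appear in the thesauri and tag each hit with the axis it came from. The sketch below assumes that approach; the name guess_fields, the tiny VOCAB table and the sample title are made up for illustration, and spotting authors would need a different heuristic altogether.

```python
import re

# Tiny stand-in vocabularies, one per axis (illustrative only).
VOCAB = {
    "taxon": ["stegosaur", "ankylosaur"],
    "age": ["kimmeridgian", "tithonian"],
    "locality": ["isle of wight", "dorset"],
}


def guess_fields(text, vocab=VOCAB):
    """Map each axis to the vocabulary terms found in `text`.

    Matching is case-insensitive, on whole words, and tolerates a
    trailing plural "s". Axes with no hits are left out of the result.
    """
    lowered = text.lower()
    found = {}
    for axis, terms in vocab.items():
        hits = [
            term
            for term in terms
            if re.search(r"\b" + re.escape(term) + r"s?\b", lowered)
        ]
        if hits:
            found[axis] = hits
    return found


# An invented title, not a real paper.
print(guess_fields("A new stegosaur from the Kimmeridgian of Dorset"))
```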

Feedback to <mike@miketaylor.org.uk> is welcome!