Update: Support for browsing metadata hierarchies, as well as ranking (partially) based on the hierarchy level, has now been enabled in the VLO. For more information, see the part on metadata hierarchies further down in this post.
This week, the VLO faceted browser, now at its new host vlo.clarin.eu, has been upgraded to the new minor version 3.3. The major addition in this version is support for advanced querying by means of a relatively intuitive syntax based on that of Lucene. Such queries can be entered directly into the VLO's search box. They may include operators such as AND, OR and NOT, and can also be used to search in specific fields.
The behaviour of the VLO when a user enters 'simple' search terms has also changed: it no longer interprets the entered string as a complete phrase, but as a set of (whitespace separated) keywords connected by an AND operator. To search for an entire phrase, simply use double quotes. For some example queries, see the bottom of this post.
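The change in how 'simple' search terms are interpreted can be illustrated with a small Python sketch. It is not the VLO's actual code: the function name expand_simple_query is made up for this illustration, and the splitting rule (whitespace, with double-quoted phrases kept together) is an assumption based on the description above.

```python
import re


def expand_simple_query(text: str) -> str:
    """Rewrite a 'simple' search string as an explicit AND query.

    Hypothetical helper: it mimics the behaviour described in the post,
    not the VLO's real implementation.
    """
    # Keep double-quoted phrases intact; split everything else on whitespace.
    tokens = re.findall(r'"[^"]*"|\S+', text)
    return " AND ".join(tokens)


print(expand_simple_query('German corpus'))
print(expand_simple_query('"German corpus"'))
print(expand_simple_query('"spoken language" corpus'))
```

In this sketch, two separate keywords end up connected by AND, while a quoted phrase stays a single unit that has to match as a whole.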
Furthermore, the ranking of the search results now corresponds to the entered search term(s) in a more natural way, where matches in 'primary' fields such as the title or description result in a higher ranking than a match in a random location in the metadata document. Documents that match the exact search phrase get higher priority. Using the new search syntax, users can even influence the ranking by 'boosting' one or more search terms.
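As a rough illustration of boosting and field-specific search, the Python sketch below assembles a query string. It assumes the VLO accepts the caret notation (term^factor) and the field:value notation of the Lucene query syntax on which its own syntax is based; the helper names field_term and boost are invented for this example, and the VLO's help page remains the authoritative reference.

```python
def field_term(field: str, value: str) -> str:
    """Restrict a search term to one field, quoting values that contain whitespace."""
    if any(ch.isspace() for ch in value):
        value = f'"{value}"'
    return f"{field}:{value}"


def boost(term: str, factor: float) -> str:
    """Append a Lucene-style boost (term^factor) to a term or phrase."""
    return f"{term}^{factor:g}"


# Give 'corpus' extra weight and require Swedish as a language.
query = " AND ".join([boost("corpus", 2), field_term("language", "Swedish")])
print(query)
```

Under these assumptions, the resulting string can be pasted into the search box like any other query; the boost affects only the ordering of the results, not which records match.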
Two notable changes have been made with respect to facet mapping in this version: the facet 'continent' has been removed; and the mapping of content languages has been improved - in particular for languages that are known by multiple names yet have a single code in ISO 639-3 (e.g. Moldavian/Moldovan/Romanian), which in earlier versions caused duplication of content language information in some cases.
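The idea behind the improved language mapping can be sketched as follows. This is only an illustration in Python: the lookup table is hand-written for this example (ron is the ISO 639-3 code for Romanian), and the function name language_codes is hypothetical; the post does not describe how the VLO implements the mapping internally.

```python
# Hand-written illustration: several names that share one ISO 639-3 code.
NAME_TO_CODE = {
    "moldavian": "ron",
    "moldovan": "ron",
    "romanian": "ron",
}


def language_codes(names):
    """Map language names to codes, dropping duplicates but keeping order.

    Names missing from the table are skipped in this sketch.
    """
    seen = []
    for name in names:
        code = NAME_TO_CODE.get(name.strip().lower())
        if code is not None and code not in seen:
            seen.append(code)
    return seen


print(language_codes(["Moldavian", "Romanian", "Moldovan"]))
```

Because all three names resolve to the same code, a record that lists them ends up with a single content language value rather than three.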
As of early November, the VLO also takes metadata hierarchies into account. This feature causes 'collection' records to appear in the search results with higher priority and also allows users to browse such hierarchies from within record pages in the VLO.
A good example is the first record retrieved when searching for DoBeS archive. It collects a large number of records, each of which collects resources on a specific endangered language or set of languages, often with further subdivision into types of resources, specific languages, etc. For collection records such as this one, the info page contains a section labelled 'Hierarchy' which displays an interactive tree that can be expanded to reveal the entire underlying hierarchy. Any of the (subcollection) records in the tree can be opened simply by clicking its name.
Records that are part of a hierarchy (whether they are collection records themselves or 'leaves') have a link to their parent, and provide the option to perform 'upwards expansion' on the tree, i.e. to also see the record's siblings without having to leave the record page. The DoBeS archive record illustrates this as well, since it is part of a larger collection called IMDI-corpora.
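The relations described above (parent links, leaves, and 'upwards expansion' to siblings) can be modelled with a minimal tree structure. The Python sketch below is purely illustrative and says nothing about the VLO's internals: the Record class is hypothetical, and the sibling and subcollection names are placeholders; only IMDI-corpora and DoBeS archive come from the example in this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Record:
    """A metadata record that may have a parent and child records."""
    name: str
    parent: Record | None = field(default=None, repr=False)
    children: list[Record] = field(default_factory=list, repr=False)

    def add_child(self, name: str) -> Record:
        """Create a child record that links back to this record."""
        child = Record(name, parent=self)
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        """A leaf is a record without children."""
        return not self.children

    def siblings(self) -> list[Record]:
        """'Upwards expansion': the other children of this record's parent."""
        if self.parent is None:
            return []
        return [r for r in self.parent.children if r is not self]


root = Record("IMDI-corpora")
dobes = root.add_child("DoBeS archive")
other = root.add_child("Another collection (placeholder)")
leaf = dobes.add_child("Some subcollection (placeholder)")

print(dobes.parent.name)
print([r.name for r in dobes.siblings()])
```

Here each record stores a reference to its parent, which is what makes it possible to show siblings without leaving the record's own page.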
Metadata providers who are interested in enabling or improving the presentation of the hierarchy in their metadata can find information on how such hierarchies should be represented in the FAQ item "How can I create a hierarchical collection with CMDI?". Any collection included in the VLO that follows this standard will automatically be processed in the way described above.
Example search queries
To demonstrate some of the possibilities of the advanced query syntax, here are some example search queries that you can try for yourself. Of course, you can also use these as a basis for formulating your own search queries. Click on any of the queries to perform a search in the live VLO.
German corpus (same result set but different ranking due to phrase matching)
country:Finland AND language:Swedish AND NOT language:Finnish
A more extensive description of the available search options, including more examples, can be found at the VLO's new help page.