HOWTO use the search capabilities of CPS

1   The (different) Search Capabilities of CPS

There are 2 ways to search for documents in CPS:

  1. Skim through all the documents and test the documents one by one.
  2. Query the indexes in the portal_catalog tool.

The first method is the simpler one, but it is only practical on a web site with a limited number of documents, or inside a specific folder that holds a limited number of documents.

The second method is optimized and relies on indexes. It is more complex for a developer to use, maintain, and modify, but it is far faster.

The search capabilities of CPS are based on querying the indexes in the portal_catalog tool.
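The difference between the two methods can be illustrated with a toy model. This is only a sketch in plain Python 3: the documents, the field names, and the functions scan, build_index and query are made up for illustration and are not part of CPS or of the portal_catalog API.

```python
from collections import defaultdict

# Toy documents standing in for CPS content (illustrative data only).
DOCS = {
    "doc1": {"portal_type": "File", "Title": "myfile"},
    "doc2": {"portal_type": "News", "Title": "release notes"},
    "doc3": {"portal_type": "File", "Title": "budget"},
}


def scan(docs, field, value):
    """Method 1: test every document one by one; cost grows with len(docs)."""
    return sorted(doc_id for doc_id, doc in docs.items()
                  if doc.get(field) == value)


def build_index(docs, field):
    """Method 2, step 1: build a value -> document ids mapping once."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].add(doc_id)
    return index


def query(index, value):
    """Method 2, step 2: answer a query with a single dictionary lookup."""
    return sorted(index.get(value, ()))


type_index = build_index(DOCS, "portal_type")
print(scan(DOCS, "portal_type", "File"))
print(query(type_index, "File"))
```

Both calls return the same documents. The index costs extra work whenever a document is added or changed, which is the maintenance burden mentioned above, but each query then avoids visiting every document.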

2   Search Hooks/Plugs in CPS

There are 2 search hooks (or search plugs) in CPS, that is, points where one can query for search results:

  1. From Python code, call search.py (its documentation is in the source code of search.py itself).
  2. From the browser, redirect the user to a URL such as: http://localhost:8080/cps/workspaces/search_form?portal_type=File&Title=myfile
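The URL form of the hook can be built programmatically rather than by hand, so that values containing spaces or special characters are escaped correctly. The following is a minimal sketch using the Python 3 standard library (Zope 2.7 itself ran on Python 2); build_search_url is a hypothetical helper, and the only criteria known from this document are portal_type and Title.

```python
from urllib.parse import urlencode


def build_search_url(base_url, **criteria):
    """Return base_url followed by the criteria as an encoded query string."""
    return base_url + "?" + urlencode(criteria)


url = build_search_url(
    "http://localhost:8080/cps/workspaces/search_form",
    portal_type="File",
    Title="myfile",
)
print(url)
```

With these arguments the result is the same URL as the example given above.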

Note that an "advanced_search_form" also exists (CPSDefault/skins/cps_default/advanced_search_form.pt), but it is only a user interface: you cannot pass arguments in the URL to display results directly.

3   Search Indexes

There are as many indexes in the portal_catalog tool as there are fields we want to index in documents.

There are many different types of indexes. Choose the index type according to the kind of value held by the document field to index.

For fields that hold text values, there are 3 possible index types:

The TextIndex index type is a deprecated type. It should not be used anymore.

4   Search Query Syntax

The "Searching and Categorizing Content" chapter of the Zope Book is very helpful in explaining how to do searches through indexes: http://www.zope.org/Documentation/Books/ZopeBook/2_6Edition/SearchingZCatalog.stx

4.1   Text index Query parser

For ZCTextIndex, the query syntax is documented in the source code of Zope-2.7/lib/python/Products/ZCTextIndex/QueryParser.py:

This particular parser recognizes the following syntax:

    Start = OrExpr
    OrExpr = AndExpr ('OR' AndExpr)*
    AndExpr = Term ('AND' NotExpr)*
    NotExpr = ['NOT'] Term
    Term = '(' OrExpr ')' | ATOM+

The key words (AND, OR, NOT) are recognized in any mixture of case.

An ATOM is one of the following:

  • A sequence of characters not containing whitespace or parentheses or double quotes, and not equal (ignoring case) to one of the key words 'AND', 'OR', 'NOT'; or
  • A non-empty string enclosed in double quotes. The interior of the string can contain whitespace, parentheses and key words, but not quotes.
  • A hyphen followed by one of the two forms above, meaning that it must not be present.

An unquoted ATOM may also contain globbing characters. Globbing syntax is defined by the lexicon; for example "foo*" could mean any word starting with "foo".
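To make the idea of globbing concrete, here is a small sketch that expands a pattern against a list of words. It uses shell-style matching from Python's fnmatch module as a stand-in. The real expansion is done by the ZCTextIndex lexicon and may behave differently. WORDS and expand_glob are made-up names.

```python
from fnmatch import fnmatchcase

# Illustrative vocabulary; a real lexicon holds the words found in documents.
WORDS = ["foo", "food", "fool", "bar", "foobar"]


def expand_glob(pattern, words):
    """Return the words matching a shell-style pattern, in their given order."""
    return [word for word in words if fnmatchcase(word, pattern)]


print(expand_glob("foo*", WORDS))  # "*" matches any run of characters
print(expand_glob("foo?", WORDS))  # "?" matches exactly one character
```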

When multiple consecutive ATOMs are found at the leaf level, they are connected by an implied AND operator, and an unquoted leading hyphen is interpreted as a NOT operator.

Summarizing the default operator rules:

  • a sequence of words without operators implies AND, e.g. foo bar.
  • double-quoted text implies phrase search, e.g. "foo bar".
  • words connected by punctuation imply phrase search, e.g. foo-bar.
  • a leading hyphen implies NOT, e.g. foo -bar.
  • these can be combined, e.g. foo -"foo bar" or foo -foo-bar.
  • "*" and "?" are used for globbing (i.e. prefix search), e.g. foo*.
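The grammar and the default operator rules above can be sketched as a small recursive-descent parser. This is not the code of QueryParser.py: the Parser class, the tuple-based tree, and the WORD and PHRASE node names are assumptions made for illustration. The sketch also leaves punctuation-connected words such as foo-bar as a single WORD and does not expand globbing characters, since both are handled by the lexicon.

```python
import re

# A token is a parenthesis, an optionally negated quoted phrase, or a run
# of characters containing no whitespace, parentheses or double quotes.
TOKEN_RE = re.compile(r'\(|\)|-?"[^"]+"|[^\s()"]+')
KEYWORDS = {"AND", "OR", "NOT"}


class Parser:
    def __init__(self, query):
        self.tokens = TOKEN_RE.findall(query)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def at_keyword(self, word):
        token = self.peek()
        return token is not None and token.upper() == word

    def parse(self):  # Start = OrExpr
        tree = self.or_expr()
        if self.peek() is not None:
            raise ValueError("unexpected token %r" % self.peek())
        return tree

    def or_expr(self):  # OrExpr = AndExpr ('OR' AndExpr)*
        nodes = [self.and_expr()]
        while self.at_keyword("OR"):
            self.take()
            nodes.append(self.and_expr())
        return nodes[0] if len(nodes) == 1 else ("OR",) + tuple(nodes)

    def and_expr(self):  # AndExpr = Term ('AND' NotExpr)*
        nodes = [self.term()]
        while self.at_keyword("AND"):
            self.take()
            nodes.append(self.not_expr())
        return nodes[0] if len(nodes) == 1 else ("AND",) + tuple(nodes)

    def not_expr(self):  # NotExpr = ['NOT'] Term
        if self.at_keyword("NOT"):
            self.take()
            return ("NOT", self.term())
        return self.term()

    def term(self):  # Term = '(' OrExpr ')' | ATOM+
        if self.peek() == "(":
            self.take()
            tree = self.or_expr()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return tree
        nodes = []
        while self.is_atom(self.peek()):
            nodes.append(self.atom(self.take()))
        if not nodes:
            raise ValueError("expected a search term")
        # Consecutive ATOMs are joined by an implied AND.
        return nodes[0] if len(nodes) == 1 else ("AND",) + tuple(nodes)

    @staticmethod
    def is_atom(token):
        return (token is not None and token not in ("(", ")")
                and token.upper() not in KEYWORDS)

    @staticmethod
    def atom(token):
        # A leading hyphen means the word or phrase must not be present.
        negated = token.startswith("-") and len(token) > 1
        if negated:
            token = token[1:]
        if token.startswith('"'):
            node = ("PHRASE", token.strip('"'))
        else:
            node = ("WORD", token)
        return ("NOT", node) if negated else node


print(Parser('foo -"foo bar"').parse())
print(Parser("(a or b) AND NOT c").parse())
```

Each method mirrors one grammar rule, which keeps the sketch easy to compare with the rules listed above.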

4.2   Advanced search topics

With some indexes it is possible to combine terms with operators such as AND, OR, and NOT. These operator keywords can be changed, for example to their French equivalents (ET, OU, PAS, NON, etc.). To do that, special query parsing must be added to the tool.

It is also possible to drop the left-truncation feature.
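One possible way to support French keywords, sketched here under stated assumptions, is to rewrite them into the standard ones before the query reaches the index. This is not how CPS does it (the text above only says that special parsing must be added). FRENCH_OPERATORS and translate_operators are hypothetical names, and the mapping of both PAS and NON to NOT is a guess.

```python
import re

# Hypothetical mapping from French keywords to the standard operators.
FRENCH_OPERATORS = {"ET": "AND", "OU": "OR", "PAS": "NOT", "NON": "NOT"}

# Quoted phrases are split out so that their content is left untouched.
_QUOTED_RE = re.compile(r'("[^"]*")')

# A keyword counts only when delimited by whitespace, parentheses or the
# ends of the text, so that "pas-de-deux" is not rewritten.
_KEYWORD_RE = re.compile(
    r"(?<![^\s()])(%s)(?![^\s()])" % "|".join(FRENCH_OPERATORS),
    re.IGNORECASE,
)


def translate_operators(query):
    """Replace French operator keywords outside quoted phrases."""
    pieces = []
    for piece in _QUOTED_RE.split(query):
        if piece.startswith('"'):
            pieces.append(piece)  # quoted phrase: keep as typed
        else:
            pieces.append(_KEYWORD_RE.sub(
                lambda match: FRENCH_OPERATORS[match.group(1).upper()],
                piece))
    return "".join(pieces)


print(translate_operators('chat ou "et alors" PAS chien'))
```

As with the English keywords, a user who wants to search for the literal word "et" would have to put it inside a quoted phrase.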

TODO: explain why it is interesting

TODO: explain how

5   Search in Directories

Currently there are inconsistencies in the CPS user interface concerning wildcards and empty queries. In those cases, the search results in directories differ from those in the local roles management forms and in the document search forms.

This is somewhat confusing, but we will try to harmonize all kinds of search as soon as possible.