2.1 Quick Start

To illustrate how itools.catalog is used, we are going to index and search the Web! Well, a couple of pages of it.

Create a new catalog

To create a new, empty catalog, we use the function make_catalog:

    >>> from itools.catalog import make_catalog
    >>>
    >>> catalog = make_catalog('catalog_test')

The argument is the path where the catalog will be created. The value returned by make_catalog is a catalog object, which offers an API for indexing, unindexing and searching documents.
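The three operations named above can be pictured with a toy inverted index, which maps each word to the set of documents that contain it. The sketch below is a hypothetical, in-memory stand-in (the class ToyCatalog and its methods are invented for illustration); it is not how itools.catalog is implemented.

```python
from collections import defaultdict


class ToyCatalog:
    """Hypothetical in-memory inverted index: word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of documents containing it
        self.words_by_doc = {}            # id -> words indexed for that document

    def index_document(self, doc_id, text):
        # Re-indexing a document replaces its previous entry.
        self.unindex_document(doc_id)
        words = set(text.lower().split())
        self.words_by_doc[doc_id] = words
        for word in words:
            self.postings[word].add(doc_id)

    def unindex_document(self, doc_id):
        # Remove the document from every posting list it appears in.
        for word in self.words_by_doc.pop(doc_id, ()):
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]

    def search(self, word):
        # Return a copy of the matching ids so callers cannot alter the index.
        return set(self.postings.get(word.lower(), ()))


catalog = ToyCatalog()
catalog.index_document(0, "Python is a programming language")
catalog.index_document(1, "Git is a version control system")
print(sorted(catalog.search("python")))  # [0]
catalog.unindex_document(0)
print(sorted(catalog.search("python")))  # []
```

Keeping the per-document word list is what makes unindexing cheap: the toy catalog only visits the posting lists the document actually appears in.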

Define the objects to be indexed

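Objects to be indexed must inherit from the base class CatalogAware and implement two methods: get_catalog_fields, which returns the list of fields the catalog should know about, and get_catalog_values, which returns a dictionary mapping each field name to the document's value for it: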

    >>> from itools.catalog import CatalogAware
    >>> from itools.catalog import KeywordField, TextField
    >>> from itools.html import HTMLFile
    >>>
    >>> class Document(CatalogAware, HTMLFile):
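    ...     def get_catalog_fields(self):
    ...         # 'url' is indexed as a single keyword and also stored, so it
    ...         # can be read back from search results; 'body' is full text.
    ...         return [KeywordField('url', is_stored=True),
    ...                 TextField('body')]
    ...     def get_catalog_values(self):
    ...         # Map each field name to the value indexed for this document
    ...         return {'url': str(self.uri), 'body': self.to_text()}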
    ...
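The difference between the two field types can be sketched as follows, assuming the usual convention of full-text catalogs: a keyword field indexes its whole value as a single term, while a text field is split into lowercase words. The analyze function and its tokenizing rule are hypothetical; the analyzers in itools may normalize text differently.

```python
import re


def analyze(field_type, value):
    """Hypothetical: turn a field value into the terms that get indexed."""
    if field_type == "keyword":
        # The whole value is one term, so only an exact match will hit.
        return [value]
    if field_type == "text":
        # Split into lowercase words so that any single word can match.
        return re.findall(r"\w+", value.lower())
    raise ValueError("unknown field type: %r" % field_type)


print(analyze("keyword", "http://www.python.org"))
# ['http://www.python.org']
print(analyze("text", "Python is a programming language."))
# ['python', 'is', 'a', 'programming', 'language']
```

This is why a URL suits a keyword field (it is looked up as a whole), while a page body suits a text field (it is searched word by word).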

Index

Now we are going to index a couple of web pages:

    >>> # Load support for the HTTP protocol
    >>> import itools.http
    >>>
    >>> # Index a couple of web pages
    >>> for url in ['http://www.python.org', 'http://git.or.cz/']:
    ...     document = Document(url)
    ...     catalog.index_document(document)
    ...
    >>> # Save changes
    >>> catalog.save_changes()

Note that all changes are made in memory; nothing is written to the file system until save_changes is called.
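This deferred-write behaviour can be sketched as a pending-changes buffer: indexing only touches memory, and save_changes writes the accumulated state to the catalog's path in one go. This is a hypothetical illustration (the class BufferedCatalog and its JSON file are invented here); itools.catalog has its own on-disk format.

```python
import json
import os
import tempfile


class BufferedCatalog:
    """Hypothetical catalog that keeps changes in memory until saved."""

    def __init__(self, path):
        self.path = path
        self.documents = {}  # id -> field values, modified in memory only

    def index_document(self, doc_id, values):
        # Nothing is written to disk here.
        self.documents[doc_id] = values

    def save_changes(self):
        # Write to a temporary file in the same directory, then rename it
        # over the catalog file, so a crash while writing leaves the
        # previously saved catalog intact.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.documents, f)
        os.replace(tmp_path, self.path)


workdir = tempfile.mkdtemp()
catalog = BufferedCatalog(os.path.join(workdir, "catalog.json"))
catalog.index_document("1", {"url": "http://www.python.org"})
print(os.path.exists(catalog.path))  # False: the change is only in memory
catalog.save_changes()
print(os.path.exists(catalog.path))  # True
```

Batching writes this way lets many index_document calls share a single, comparatively expensive write to disk.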

Search

Time to search. Here we look for the word python in the body field:

    >>> results = catalog.search(body='python')
    >>> for document in results.get_documents():
    ...     print document.url
    ...
    http://www.python.org
    >>>
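The url can be printed from the results presumably because that field was declared with is_stored=True: a stored field keeps its original value next to the index, whereas body is only searchable. The sketch below illustrates that split under this assumption; the Results class, its get_documents method and the search function are hypothetical and only mimic the names used in the example above.

```python
from collections import defaultdict
from types import SimpleNamespace


class Results:
    """Hypothetical result set holding the ids of the matching documents."""

    def __init__(self, doc_ids, stored):
        self.doc_ids = sorted(doc_ids)
        self.stored = stored

    def get_documents(self):
        # Rebuild each hit from its stored fields only; the body text was
        # indexed but not stored, so it is not available here.
        return [SimpleNamespace(**self.stored[i]) for i in self.doc_ids]


postings = defaultdict(set)  # (field, word) -> ids of matching documents
stored = {}                  # id -> values of the stored fields

pages = [
    {"url": "http://www.python.org", "body": "Python programming language"},
    {"url": "http://git.or.cz/", "body": "Git version control"},
]
for doc_id, page in enumerate(pages):
    stored[doc_id] = {"url": page["url"]}  # only 'url' is stored
    for word in page["body"].lower().split():
        postings[("body", word)].add(doc_id)


def search(field, word):
    # Look up one word in one field; .get avoids creating empty entries.
    return Results(postings.get((field, word.lower()), set()), stored)


for document in search("body", "python").get_documents():
    print(document.url)  # http://www.python.org
```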