
  ********************************************************
  * ===============                                      *
  * IMPORTANT NOTE:                                      *
  * ===============                                      *
  *                                                      *
  * Before calling any PyLucene API that requires the    *
  * Java VM, start it by calling initVM(classpath, ...)  *
  *                                                      *
  * More about this function in jcc/README.              *
  *                                                      *
  ********************************************************

  README file for PyLucene with JCC
  ---------------------------------

  Contents
  --------

   - Installing PyLucene
   - API documentation for PyLucene


  Installing PyLucene
  -------------------

  PyLucene is a Python extension built with JCC.

  To build PyLucene, JCC needs to be built first. Sources for JCC are
  available in the jcc sub-directory of this source tree.
  Instructions for building and installing JCC are in jcc/INSTALL.

  See the INSTALL file for instructions on building PyLucene.


  API documentation for PyLucene
  ------------------------------

  PyLucene is currently built from the Java Lucene trunk. It intends to
  support the entire Lucene API.

  PyLucene also includes a number of Lucene contrib packages: the Snowball
  analyzer and stemmers, the highlighter package, analyzers for languages
  other than English, regular expression queries and specialized queries
  such as 'more like this'.

  This document only covers the pythonic extensions to Lucene offered
  by PyLucene as well as some differences between the Java and Python
  APIs. For the API documentation on Java Lucene, please visit:
      http://lucene.apache.org/java/docs/api/index.html

  To help with debugging and to support some Lucene APIs, PyLucene also
  exposes some Java runtime APIs.

   - Contents

     . Samples
     . Threading support with attachCurrentThread()
     . Exception handling with lucene.JavaError
     . Differences between the Java Lucene and PyLucene APIs
     . Pythonic extensions to the Java Lucene APIs
     . Extending Java Lucene classes from Python

   - Samples

     The best way to learn PyLucene is to look at the many samples included
     with the PyLucene source release or on the web at

         http://svn.osafoundation.org/pylucene/trunk/samples/
         http://svn.osafoundation.org/pylucene/trunk/samples/LuceneInAction/

     A large number of samples are shipped with PyLucene. Most notably, all
     the samples published in the "Lucene in Action" book that did not
     depend on a third party Java library for which there was no obvious
     Python equivalent were ported to Python and PyLucene.

     "Lucene in Action" is a great companion to learning Lucene. Having all
     the samples available in Python should make it even easier for Python
     developers. 

     "Lucene in Action" was written by Erik Hatcher and Otis Gospodnetic,
     both part of the Java Lucene development team, and is available from
     Manning Publications at http://www.manning.com/hatcher2.

   - Threading support with attachCurrentThread()

     Any thread other than the main thread that was not created by the
     Java Runtime must be attached to the Java VM before it can use
     PyLucene APIs. To do so, call the attachCurrentThread() method, from
     that thread, on the JCCEnv object returned by the initVM() or
     getVMEnv() functions.
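
     As a minimal sketch of the rule above, a thread can attach itself at
     the start of its run() method. The class name AttachedThread and the
     'work' callable are hypothetical and only serve as illustration; 'env'
     is assumed to be the JCCEnv object returned by initVM() or getVMEnv().

```python
import threading


class AttachedThread(threading.Thread):
    """Thread that attaches itself to the Java VM before doing any work.

    env:  assumed to be the JCCEnv object returned by initVM() or getVMEnv()
    work: hypothetical callable that uses PyLucene APIs
    """

    def __init__(self, env, work):
        threading.Thread.__init__(self)
        self.env = env
        self.work = work
        self.result = None

    def run(self):
        # The attach call must be made from within the new thread itself,
        # before that thread touches any PyLucene API.
        self.env.attachCurrentThread()
        self.result = self.work()
```

     Such a thread would typically be created as
     AttachedThread(lucene.getVMEnv(), some_function), then started and
     joined like any other Python thread.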

   - Exception handling with lucene.JavaError

     Java exceptions are caught at the language barrier and reported to
     Python by raising a JavaError instance whose args tuple contains the
     actual Java Exception instance.
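
     The following sketch shows one way to get at the wrapped Java
     exception. It relies only on the args tuple described above; the
     helper name java_exception_of and the do_search() call in the usage
     comment are hypothetical.

```python
def java_exception_of(error):
    """Return the Java exception carried by a caught JavaError, or None.

    Assumes, as described above, that the first element of the error's
    args tuple is the Java Exception instance.
    """
    args = getattr(error, 'args', ())
    return args[0] if args else None


# Hypothetical usage, assuming the Java VM was started with initVM():
#
#     from lucene import JavaError
#
#     try:
#         do_search()                      # any code calling PyLucene APIs
#     except JavaError as e:
#         java_exc = java_exception_of(e)  # the wrapped Java exception
```

     Keeping the unpacking in one helper avoids scattering args[0]
     lookups through the calling code.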

   - Differences between the Java Lucene and PyLucene APIs

     . The PyLucene API exposes all Java Lucene classes in a flat namespace
       in the lucene module.
       For example, the Java import statement:
         import org.apache.lucene.index.IndexReader;
       corresponds to the Python import statement:
         from lucene import IndexReader

     . Instead of taking array arguments the read method defined
       on org.apache.lucene.index.TermDocs takes the number of entries to
       read from the enumeration and returns one array containing the
       concatenation of values returned by the original API.

       For example:

           values = termDocs.read(16)

       values is an array of at most 32 ints. The first half holds the
       document numbers and the second half the term frequencies actually
       read.

     . Downcasting is a common operation in Java but not a concept in
       Python. Because the wrapper objects implement exactly the APIs of
       the declared type of the wrapped object, all classes implement two
       class methods called instance_ and cast_ that verify and cast an
       instance respectively.
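
     The layout of the array returned by the TermDocs read method, as
     described above, can be unpacked with a small helper. This is only a
     sketch: the function name split_read_values is hypothetical, and it
     assumes the returned array can be iterated like a Python sequence.

```python
def split_read_values(values):
    """Split the array returned by termDocs.read(n) into two lists.

    The first half of the array holds the document numbers and the
    second half holds the matching term frequencies.
    """
    values = list(values)        # assumes the returned array is iterable
    half = len(values) // 2
    return values[:half], values[half:]
```

     With it, docs, freqs = split_read_values(termDocs.read(16)) yields
     two lists of equal length that can be walked in parallel.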

   - Pythonic extensions to the Java Lucene APIs

     Java is a very verbose language. Python, on the other hand, offers
     many syntactically attractive constructs for iteration, property
     access, etc. As the Java Lucene samples from the "Lucene in Action"
     book were ported to Python, PyLucene received a number of pythonic
     extensions listed here:

     . Iterating over search hits is a very common operation. Hits
       instances are iterable in Python. Each iteration yields one hit,
       which gives access to the matching document and its score.

         The Java loop:

             for (int i = 0; i < hits.length(); i++) {
                 Document doc = hits.doc(i);
                 System.out.println(hits.score(i) + " : " + doc.get("title"));
             }

         can be written in Python:

             for hit in hits:
                 hit = Hit.cast_(hit)
                 print hit.getScore(), ':', hit.getDocument()['title']

         If hits.iterator()'s next() method were declared to return Hit
         instead of Object, the above cast_() call would be unnecessary.

       The same Java loop can also be written:

             for i in xrange(len(hits)):
                 print hits.score(i), ':', hits[i]['title']

     . Hits instances partially implement the Python 'sequence' protocol.

         The Java expressions:

             hits.length()
             doc = hits.doc(i)

         are better written in Python:

             len(hits)
             doc = hits[i]

     . Document instances have fields whose values can be accessed through
       the mapping protocol.

         The Java expression:

             doc.get("title")

         is better written in Python:

             doc['title']

     . Document instances can be iterated over for their fields.

         The Java loop:

             Enumeration fields = doc.fields();
             while (fields.hasMoreElements()) {
                 Field field = (Field) fields.nextElement();
                 ...
             }

         is better written in Python:

             for field in doc.getFields():
                 field = Field.cast_(field)
                 ...

       Once JCC supports heeding Java 1.5 annotations and once Java Lucene
       makes use of them, such casting should become unnecessary.

   - Extending Java Lucene classes from Python

     Many areas of the Lucene API expect the programmer to provide their own
     implementation or specialization of a feature where the default is
     inappropriate. For example, text analyzers and tokenizers are an area
     where many parameters and environmental or cultural factors call for
     customization.

     PyLucene enables this by providing Java extension points listed below
     that serve as proxies for Java to call back into the Python
     implementations of these customizations.

     These extension points are simple Java classes for which JCC generates
     the native C++ implementations. It is easy to add more such extension
     classes to the 'java' directory of the PyLucene source tree.

     To learn more about this topic, please refer to the jcc/README file.

     Please refer to the classes in the 'java' tree for currently available
     extension points. Examples of uses of these extension points are to be
     found in PyLucene's unit tests and "Lucene in Action" samples.
