User Guide

Acrobat SDK User’s Guide
13 Searching and Indexing
When creating a replacement search plug-in for Acrobat, you must decide which indexes
your search plug-in will use. You can either create your own indexes (see Extracting and
Highlighting Text) or search the Lextek indexes created by the Acrobat 7 Catalog plug-in.
Indexing PDF Documents
You can use the Acrobat SDK to create a full-text index of a set of PDF documents. A full-text
index is a searchable database of all the text in the documents. After building an index, you
can search the entire library quickly.
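
To make the idea of a full-text index concrete, the following is a minimal sketch of an inverted index written in plain JavaScript. It is not how Catalog stores its indexes; the function names buildIndex and lookup and the sample document names are hypothetical and exist only for illustration.

```javascript
// Build a toy inverted index: word -> list of document names containing it.
// "docs" is a hypothetical map of document name -> already extracted plain text.
function buildIndex(docs) {
  var index = {};
  for (var name in docs) {
    if (!Object.prototype.hasOwnProperty.call(docs, name)) continue;
    var words = docs[name].toLowerCase().split(/[^a-z0-9]+/);
    for (var i = 0; i < words.length; i++) {
      var w = words[i];
      if (w === "") continue;
      if (!Object.prototype.hasOwnProperty.call(index, w)) index[w] = [];
      // Documents are processed one at a time, so checking the last entry
      // is enough to record each document at most once per word.
      if (index[w][index[w].length - 1] !== name) index[w].push(name);
    }
  }
  return index;
}

// Look up a single word; returns a copy of the list of matching document names.
function lookup(index, word) {
  var key = word.toLowerCase();
  return Object.prototype.hasOwnProperty.call(index, key) ? index[key].slice() : [];
}

// Hypothetical sample library.
var library = {
  "guide.pdf": "Indexing PDF documents with Catalog",
  "notes.pdf": "Searching an index is fast"
};
var idx = buildIndex(library);
```

A query then only consults the word table instead of rescanning every document, which is why searching a large library through an index is fast. For example, lookup(idx, "PDF") returns the names of the documents whose text contains that word.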
You can build and manipulate indexes from a plug-in, from Acrobat JavaScript, or from an
external application using interapplication communication (IAC) calls (DDE or Apple events).
Extracting and Highlighting Text
For indexing PDF files, Acrobat provides text extraction APIs. Text extraction also supplies
position information that can be used to highlight search hits in the original PDF file. The
text extraction tools are provided as calls in the plug-in API on the Acrobat platforms
(Mac OS and Windows).
You can extract ASCII text from a PDF file using a plug-in or using Acrobat JavaScript. You
can also save the PDF document as text or rich text.
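
The following sketch illustrates how word extraction and position information might be combined from Acrobat JavaScript. It assumes a Doc object that provides numPages, getPageNumWords, getPageNthWord, getPageNthWordQuads, and addAnnot as described in the Acrobat JavaScript reference; check the names and parameters against the reference for your Acrobat version. The helper names extractWords and highlightTerm are hypothetical.

```javascript
// Collect every word of a document together with its page number and quads.
// "doc" is assumed to behave like an Acrobat JavaScript Doc object.
function extractWords(doc) {
  var hits = [];
  for (var p = 0; p < doc.numPages; p++) {
    var count = doc.getPageNumWords(p);
    for (var n = 0; n < count; n++) {
      hits.push({
        page: p,
        // Third argument true asks for punctuation and white space to be stripped.
        word: doc.getPageNthWord(p, n, true),
        // Quads give the word's position on the page, usable for highlighting.
        quads: doc.getPageNthWordQuads(p, n)
      });
    }
  }
  return hits;
}

// Highlight every occurrence of "term" (case-insensitive); returns the number marked.
function highlightTerm(doc, term) {
  var words = extractWords(doc);
  var wanted = term.toLowerCase();
  var marked = 0;
  for (var i = 0; i < words.length; i++) {
    if (words[i].word.toLowerCase() === wanted) {
      doc.addAnnot({ page: words[i].page, type: "Highlight", quads: words[i].quads });
      marked++;
    }
  }
  return marked;
}
```

Inside Acrobat, a call such as highlightTerm(this, "catalog") from the JavaScript console would apply the sketch to the current document. Passing the document as a parameter keeps the functions independent of any global state.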
Indexing and Acrobat JavaScript
You can extend and customize indexes for multiple PDF documents using the
Acrobat JavaScript Catalog, CatalogJob, and Index objects. These objects may be used to
build, retrieve, or remove indexes.
The Index object represents a Catalog-generated index and contains a build method that
is used to create an index.
For more information, see the Acrobat JavaScript Scripting Guide.
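
As an illustration of the build method described above, the sketch below rebuilds an index through the Catalog object. It assumes that catalog.getIndex takes a device-independent path to an index definition file and returns an Index object, and that build takes an optional completion script and a rebuild flag and returns a CatalogJob with path, type, and status properties; confirm these details in the Acrobat JavaScript reference. The function name rebuildIndex and the path in the usage note are hypothetical.

```javascript
// Rebuild a Catalog index and summarize the resulting job.
// "catalogObj" is assumed to behave like the Acrobat JavaScript Catalog object.
// "indexPath" is a device-independent path to an index definition file.
// "onDone" is an optional string of JavaScript to run when the build finishes.
function rebuildIndex(catalogObj, indexPath, onDone) {
  if (!catalogObj) {
    return null; // Catalog plug-in is not available in this environment.
  }
  var index = catalogObj.getIndex(indexPath);
  // Second argument true requests a full rebuild rather than an incremental one (assumption).
  var job = index.build(onDone || "", true);
  return { path: job.path, type: job.type, status: job.status };
}
```

In Acrobat, this could be invoked as rebuildIndex(typeof catalog !== "undefined" ? catalog : null, "/c/mydocuments/index.pdx"), where the path stands in for a real index definition file. The guard on the Catalog object matters because the object only exists when the Catalog plug-in is loaded.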
The Acrobat Catalog Plug-in
Acrobat Catalog is a plug-in that allows you to create a full-text index of a set of PDF
documents. The Catalog plug-in has an HFT consisting of several methods that plug-in
developers can import and use. In addition, Catalog supports DDE, and broadcasts several
Windows messages.
For more information, see the Acrobat and PDF Library API Overview.