Designing Your Document Management Solution

Document management software offers a wide assortment of features. Let's look at some of these to find the ones most important for your project.

The Benefits of Automation

Once you have decided on the best way to organize, store and retrieve your documents, the next part of the planning stage is to find the most efficient way to scan these documents and associate them with the correct index field values. Creating an efficient scanning and indexing process will save you countless hours of labor over the life of your project.

The two main methods for automating indexing are barcode recognition and Optical Character Recognition (OCR). Barcode recognition is faster and more accurate, but it only works if your documents carry a barcode, either on the document itself or on a cover page. OCR reads printed data directly from the page, which means most documents can be processed as-is. However, several conditions affect the practicality of OCR; these are discussed later in this section.

If your index data already exists in another database, SimpleIndex® has two features that can make use of this data to automate processing. The Index Autofill feature lets you enter one key field that is used in a database lookup to retrieve matching values and fill in the remaining index fields automatically. SimpleIndex can also pre-set index values through the Command Line Interface so that a scanned document receives these values automatically.

Using Barcode Recognition

Barcode recognition is the most efficient way to capture index data printed on documents. Some documents already have key information in barcode format on them. If your project is to scan new documents on an ongoing basis, it may be possible to redesign those documents to include barcodes. Having a barcode with index data on the document is the best-case scenario, because all the index data is on the document from the moment it is created, in a format that can be read with near 100% accuracy.

If it is not possible to print barcodes on the document itself, an alternative is to have the person who creates the document print out a barcode cover page and place it on the file before it is scanned. The SimpleCoversheet application was designed to make this easy by providing a simple interface for selecting index values and printing a standard coversheet that contains these values in barcode format.

Barcode recognition can also be useful when you have documents with a variable number of pages that will all receive the same index values. If it is not possible to generate an indexed coversheet for these at the time they are created, a generic barcode coversheet can be used to separate the scanned images into multi-page files, one for each document. A second process can then be used to index these images one file at a time instead of one page at a time, greatly increasing throughput.
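
The separation step described above can be illustrated with a short Python sketch. It is not SimpleIndex's implementation; it only shows the idea of cutting a scanned stack into documents at each generic coversheet. The barcode value "SEPARATOR" and the function name are assumptions made for this example, and the barcode reading itself is assumed to have already happened.

```python
from typing import List, Optional

# Hypothetical value printed as a barcode on the generic separator coversheet.
SEPARATOR = "SEPARATOR"


def split_on_separator(page_barcodes: List[Optional[str]]) -> List[List[int]]:
    """Group page numbers into documents, starting a new one at each separator.

    page_barcodes[i] is the barcode value read from page i, or None when no
    barcode was found. Separator sheets themselves are left out of the result.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    for page_number, barcode in enumerate(page_barcodes):
        if barcode == SEPARATOR:
            # A coversheet closes the document collected so far (if any).
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page_number)
    if current:
        documents.append(current)
    return documents
```

Each inner list is one multi-page file, so the second indexing pass handles one entry per document rather than one entry per page.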

Using OCR

Traditionally, zone OCR solutions require you to specify a region on the page where index information will be found. This region is recognized and the result is inserted into an index field. The problem with traditional zone OCR is that if the region is moved slightly due to variations in scanning, the result could contain extra neighboring characters or cut off desired characters. This limits the usefulness of traditional zone OCR to documents where the index value is in the exact same place every time and has plenty of white space around it.

SimpleIndex's OCR contains many advanced features to overcome the inherent limitations of zone OCR. This is done by providing template and dictionary matching for OCR fields. These features search the OCR results for a certain pattern or list of possible values and return only the matching data. This allows you to draw your OCR zones much larger than normal, ensuring that no matter how much the data shifts around it will always be contained within that region.
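
To make the idea of template matching concrete, here is a minimal Python sketch that searches the raw text of an oversized OCR zone for a value with a known shape. It uses a regular expression as a stand-in for a template; the pattern (two uppercase letters, a hyphen, six digits) and the function name are assumptions for illustration and do not reflect SimpleIndex's own template syntax.

```python
import re
from typing import Optional

# Hypothetical template: two uppercase letters, a hyphen, then six digits,
# for example "AB-123456".
INVOICE_TEMPLATE = re.compile(r"\b[A-Z]{2}-\d{6}\b")


def match_template(ocr_text: str,
                   template: re.Pattern = INVOICE_TEMPLATE) -> Optional[str]:
    """Return the first substring of the OCR text that fits the template.

    Everything else the large zone picked up (labels, dates, totals) is
    ignored, so the zone does not need to be drawn tightly around the value.
    """
    match = template.search(ocr_text)
    return match.group(0) if match else None
```

Because only text that fits the pattern is returned, neighboring characters captured by the larger zone do not end up in the index field.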

It is even possible to draw your zone around the entire page and find key information that is not printed in any fixed location. For example, a doctor's office may receive lab reports from many different labs. Each report is formatted differently, but each contains the patient's name somewhere on it. Using the dictionary matching feature with a patient name list, SimpleIndex can identify the correct patient for each lab report automatically.
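
The lab report example can be sketched in Python as a whole-page dictionary search. This is a simplified illustration under stated assumptions, not SimpleIndex's matching logic: it assumes exact whole-word matches after normalizing case and whitespace, and the function names are invented for the example. Real OCR output often contains misread characters, which this sketch does not attempt to tolerate.

```python
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Uppercase the text and collapse every run of whitespace to one space."""
    return " ".join(text.upper().split())


def match_dictionary(ocr_text: str, known_values: Iterable[str]) -> List[str]:
    """Return every known value that appears as whole words in the OCR text."""
    haystack = normalize(ocr_text)
    matches = []
    for value in known_values:
        # Word boundaries keep "Ann Lee" from matching inside "Joann Leeds".
        pattern = r"\b" + re.escape(normalize(value)) + r"\b"
        if re.search(pattern, haystack):
            matches.append(value)
    return matches
```

The function returns a list rather than a single value so the caller can decide what to do when a page matches no patient or more than one.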

When implementing OCR for document automation, carefully consider the data you are trying to recognize. Is the text legible? Does it appear in a fixed location? Does it conform to a unique pattern that won't be found anywhere else on the page? Is there a list available with all the possible values for this field? Answer these questions and you will know which OCR approach is best for your application.

Using Index Autofill

The Autofill feature of SimpleIndex is an easy way to associate many index fields with one document without retyping data that already exists in another application. Autofill uses a database lookup to retrieve records that match a key value entered by the user. Blank index fields are then filled in automatically with the data from this lookup. The result is a document database with many different possible search fields, of which only one needed to be entered during scanning.

The key field may be typed by the user, or it may be read from the document automatically using barcode recognition or OCR. The lookup is performed either when the user changes this field or when the index values are saved. If the lookup finds multiple matching records, the user will be notified and the first set of values will be used by default.
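
The lookup behavior described above can be sketched with Python's built-in sqlite3 module. The table name, column names, and function name below are hypothetical and chosen only for illustration; SimpleIndex's actual lookup configuration is not shown here. The sketch fills only blank fields and reports how many records matched, so the caller can notify the user when there is more than one.

```python
import sqlite3
from typing import Dict, Tuple

# Hypothetical columns of a hypothetical "employees" lookup table.
LOOKUP_COLUMNS = ("last_name", "first_name", "department")


def autofill(conn: sqlite3.Connection, employee_id: str,
             fields: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    """Fill blank index fields from the first record matching the key value.

    Returns the filled-in fields and the number of matching records.
    """
    rows = conn.execute(
        "SELECT last_name, first_name, department FROM employees "
        "WHERE employee_id = ? ORDER BY rowid",
        (employee_id,),
    ).fetchall()
    filled = dict(fields)
    if rows:
        # Use the first match by default; values already entered are kept.
        for column, value in zip(LOOKUP_COLUMNS, rows[0]):
            if not filled.get(column):
                filled[column] = value
    return filled, len(rows)
```

A match count greater than one corresponds to the case where the user is notified and the first set of values is used.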

Using Pre-Indexed Batches

Pre-indexed batches are a unique feature of SimpleIndex that greatly improves throughput when scanning a single document at a time. Pre-indexed batches can be configured to allow the user to enter index values prior to scanning, or they can be executed from the command line to circumvent user interaction altogether.

Some typical scenarios for pre-indexed batches are:

  1. User scans one document at a time by entering the field values first and then scanning; the images are saved with these values automatically.
  2. User has several pre-defined documents that they scan. All field values are saved with the configuration file. User loads the scanner and double-clicks the appropriate configuration to scan and save that file automatically.
  3. SimpleIndex is integrated with an existing application. A "Scan Current Record" button is implemented that launches SimpleIndex and passes the index values for the current document through the command line. The user loads the scanner and clicks this button; images are scanned and saved automatically.
