Cost and efficiency remain core areas of focus when approaching Electronically Stored Information (ESI) from a document management perspective.
During the process, various questions arise, such as:
- How to optimize the accuracy of the data set eligible for review?
- How to filter out the majority of irrelevant material?
Data managers can rely on search utilities to build a refined data set. To determine which search operation yields the fewest documents without over-exclusion, a data manager benefits from taking full advantage of search-function "best practices".
ESI databases place words and characters in an index for optimal search performance. The ability to retrieve a desired term can depend on:
- Similar content in the data set
- Possible derivative forms of the term
- Opportunities for the term to have alternative meanings
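The role the index plays in term retrieval can be sketched with a minimal inverted index. The tokenizer and documents below are illustrative assumptions, not any specific platform's behavior; real ESI systems use far more sophisticated analyzers.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens; punctuation acts as a separator."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document IDs containing it."""
    index = {}
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index

docs = {
    1: "Please e-mail the contract revisions.",
    2: "The email thread covers contract terms.",
}
index = build_index(docs)
print(index.get("email"))     # {2} -- the hyphen splits "e-mail" into "e", "mail"
print(index.get("contract"))  # {1, 2} -- appears in both documents
```

Note how the derivative form "e-mail" never appears in the index as a single term, which is exactly why a search plan must consider how the engine indexes the source data, not just the literal term.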
These considerations can result in the development of a “search plan” that considers the literal term usage within the document text. However, other questions to consider include how the search engine interprets the:
- Terms in the search structure
- Related terms in the source data
- Resulting index of the retrieval universe
An optimized search structure will account for both the source data and how the host system handles the search process.
To evaluate the effectiveness of search criteria, an initial assessment of the tokenization requirements will quickly identify areas for further consideration.
A “tokenized” character is read as a space during the indexing or search process, even though it displays as a text character.
Conversely, a “non-tokenized” character is read exactly as it appears in the text.
Any tokenized characters identified in the initial assessment warrant evaluation regarding best syntax to refine the search results.
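The tokenized/non-tokenized distinction can be illustrated with a small sketch. The character sets here are assumptions chosen for the example; which characters a given platform tokenizes varies by product and configuration.

```python
def tokenize(text, tokenized_chars):
    """Treat each character in tokenized_chars as a space, then split on whitespace."""
    for ch in tokenized_chars:
        text = text.replace(ch, " ")
    return text.lower().split()

address = "jane.doe@example.com"

# "@" and "." tokenized: the address fragments into separate terms.
print(tokenize(address, "@."))  # ['jane', 'doe', 'example', 'com']

# "@" and "." non-tokenized: the address indexes as one literal term.
print(tokenize(address, ""))    # ['jane.doe@example.com']
```

Whether the address survives as a single searchable term or fragments into four common words changes the retrieval universe dramatically, which is why the initial assessment matters.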
The natural question that follows is: why would one want operations that treat symbols differently?
One example is the following scenario:
Suppose a case has a subpoena that states the exact words, e-mail addresses, and characters that should be produced.
In this instance, using a tokenized operator would generate overly inclusive results. When a data manager has an idea of what to look for within a data set but is uncertain how a word may appear, a tokenized operator provides variable components to determine what should and should not remain in the data set.
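The subpoena scenario can be contrasted in code. The documents, query, and helper below are hypothetical, assuming "@" and "." are the tokenized characters; the point is only the difference in result sets.

```python
def tokenize(text, tokenized_chars="@."):
    """Treat each tokenized character as a space, then split on whitespace."""
    for ch in tokenized_chars:
        text = text.replace(ch, " ")
    return text.lower().split()

docs = {
    1: "Contact jane.doe@example.com about the merger.",
    2: "Jane Doe visited example.com last week.",
}
query = "jane.doe@example.com"

# Non-tokenized (exact) matching: only the literal address matches.
exact_hits = {i for i, t in docs.items() if query in t.lower()}

# Tokenized matching: any document containing all the fragments matches,
# sweeping in document 2 even though it lacks the address itself.
terms = set(tokenize(query))
token_hits = {i for i, t in docs.items() if terms <= set(tokenize(t))}

print(exact_hits)  # {1}
print(token_hits)  # {1, 2}
```

For a subpoena naming exact addresses, the exact (non-tokenized) result set is the defensible one; the tokenized set is the over-inclusive starting point useful when the form of a term is uncertain.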
Table 1 - How tokenized characters affect indexing and searching
Capital Novus provides powerful modules to take advantage of tokenization in ESI data management with eZSuite’s eZReview and eZVUE. These modules use various operators to customize searches across data sets and within metadata fields, providing the flexibility to leverage tokenization for refined searching. Indexing routines match fields with tokenized schemes for best results based on the type of information expected for the associated field.