Search Engines
- Dr. Moria Levy
- Jul 1, 2002
- 3 min read

A wise man once said: "Just as the concern with hunger at the beginning of the twentieth century was replaced by a massive focus on proper nutrition and diets by the end of the century, so too the problem of information scarcity in the early eighties was replaced, toward the end of the nineties, with the equally significant problem of information overflow and excess."
We document a lot today. Not always everything, and not always the right things, but ever-larger amounts of disk space are dedicated to storing documents and other content items.
This extensive documentation quickly leads to a navigation problem: it is hard to locate the content items that document important knowledge, even though we know they exist somewhere. Sometimes the most valuable knowledge in people's minds is where the document recording the knowledge is located, no less than the knowledge itself.
One technological means of helping us locate this documented knowledge is the organizational search engine. Search components exist in many different systems within the organization, but each is specialized to its own system, and more and more organizations are looking for a general-purpose search engine.
Many search engines are available on the market. They are good, but the market is confusing, because any random group of search engines we examine typically contains products that barely overlap or compete with one another. In other words, one sells books and the other sells newspapers: both have pages, but the differences are substantial.
To understand which tool we need most in the organization (everything is required, but resources are never sufficient...), we must first understand the types of capabilities addressed by search engines. These are described in the following diagram:

Explanation
Colors:
Components colored in orange expand the number of results. For example, with a thesaurus, a search for the word "airplane" also retrieves documents containing "aircraft" (a synonym) and documents about the "Lavi" (a parent-child relationship).
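A minimal sketch of thesaurus-based expansion may make this concrete. The thesaurus structure, the function names, and the sample documents are all assumptions made for illustration; a real product would use a far richer thesaurus and a proper index rather than substring matching.

```python
# Hypothetical thesaurus: each term lists its synonyms and its narrower
# (child) terms. The structure and the entries are illustrative assumptions.
THESAURUS = {
    "airplane": {"synonyms": ["aircraft"], "narrower": ["Lavi"]},
}


def expand_query(term, thesaurus=THESAURUS):
    """Return the original term plus its synonyms and narrower terms."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    terms += entry.get("synonyms", [])
    terms += entry.get("narrower", [])
    return terms


def search(term, documents, thesaurus=THESAURUS):
    """Return documents containing any of the expanded terms (case-insensitive)."""
    terms = [t.lower() for t in expand_query(term, thesaurus)]
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


# Sample documents, invented for the example.
DOCS = [
    "Maintenance schedule for the airplane fleet",
    "Aircraft safety regulations",
    "Lavi project design review",
    "Cafeteria menu",
]

if __name__ == "__main__":
    for doc in search("airplane", DOCS):
        print(doc)
```

Without the thesaurus, only the first document would match; with it, the search also reaches the "aircraft" and "Lavi" documents, which is exactly the expansion effect described above.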
Components colored in yellow reduce the number of results and/or organize them so the user can focus. For example, when the user requests documents related to Smith Company, the Text Mining capability returns content related to the company and filters out content about Mr. Smith. It does so by analyzing relationships within the sentence where the name appears, such as whether it is followed by "Ltd." or preceded by a title such as "Mr." or "Mrs."
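The company-versus-person distinction can be approximated, very roughly, with pattern matching on the words around the name. The sketch below is only an illustration under that assumption: the function names and the short lists of company suffixes and personal titles are hypothetical, and real text mining products analyze sentence structure far more deeply.

```python
import re


def mentions_company(text, name):
    """True if the name is directly followed by a company marker such as 'Ltd.'."""
    pattern = re.compile(
        r"\b" + re.escape(name) + r"\s+(Company|Ltd\.?|Inc\.?)",
        re.IGNORECASE,
    )
    return bool(pattern.search(text))


def mentions_person(text, name):
    """True if the name is directly preceded by a personal title such as 'Mr.'."""
    pattern = re.compile(
        r"\b(Mr|Mrs|Ms|Dr)\.?\s+" + re.escape(name) + r"\b",
        re.IGNORECASE,
    )
    return bool(pattern.search(text))


def filter_company_docs(documents, name):
    """Keep only documents in which the name appears as a company."""
    return [doc for doc in documents if mentions_company(doc, name)]


# Sample documents, invented for the example.
DOCS = [
    "Smith Ltd. reported record revenue.",
    "Mr. Smith joined the board.",
    "We signed a contract with Smith Company.",
]

if __name__ == "__main__":
    for doc in filter_company_docs(DOCS, "Smith"):
        print(doc)
```

A plain keyword search for "Smith" would return all three documents; the context check narrows the list to the two that refer to the company.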
Components colored in purple relate to the search engine's interfaces with other systems. For example, APIs allow the search function to be invoked from within operational IT systems.
Components colored in green are not independent components but infrastructure that is essential to the success of any infrastructure product. For example, a suitable permissions mechanism knows how to act on behalf of the user (at the user's authorization level) when the engine searches and retrieves information from various sources in the organization, and to include or ignore each result according to the user's permissions. It sounds trivial, but in reality it is far from it.
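As a simplified illustration of permission-aware search, the sketch below drops any hit the user is not allowed to see. The `Document` class, the group names, and `search_as_user` are assumptions made for the example; real products must also reconcile the different permission models of each content source, which is where most of the difficulty lies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A content item together with the groups allowed to read it (assumed model)."""
    title: str
    allowed_groups: frozenset


def search_as_user(query, documents, user_groups):
    """Return matching documents that at least one of the user's groups may read."""
    user_groups = frozenset(user_groups)
    # Step 1: plain keyword match on the title.
    hits = [d for d in documents if query.lower() in d.title.lower()]
    # Step 2: keep a hit only if the user shares a group with the document.
    return [d for d in hits if d.allowed_groups & user_groups]


# Sample documents and groups, invented for the example.
DOCS = [
    Document("Budget 2002", frozenset({"finance"})),
    Document("Budget guidelines", frozenset({"finance", "all-staff"})),
    Document("Holiday schedule", frozenset({"all-staff"})),
]

if __name__ == "__main__":
    for doc in search_as_user("budget", DOCS, {"all-staff"}):
        print(doc.title)
```

Both budget documents match the query, but a user who belongs only to the hypothetical "all-staff" group receives just the guidelines document.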
Positions:
Components located in the left area of the diagram are relevant when defining the words on which the search will be performed (before accessing the various contents).
Components located in the middle are relevant when processing the results obtained from the various contents before returning them to the user.
The components on the right are related to the external world of content, coming from outside the central repository where the search is performed.
The components at the top are related to the user interface. The clouds are related to infrastructures.
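The positions described above can be read as stages of a pipeline: prepare the search words, retrieve from the central repository and from external sources, then process the results before returning them. The following sketch shows that flow under heavy simplification; the function names, the source names, and the sample data are all hypothetical.

```python
def prepare_query(raw_query):
    """Left of the diagram: define the words the search will run on."""
    return [word.lower() for word in raw_query.split()]


def retrieve(words, sources):
    """Central and external content: collect (source, document) hits from each source."""
    hits = []
    for source_name, docs in sources.items():
        for doc in docs:
            if any(word in doc.lower() for word in words):
                hits.append((source_name, doc))
    return hits


def process_results(hits):
    """Middle of the diagram: remove duplicate documents before returning results."""
    seen = set()
    unique = []
    for source_name, doc in hits:
        if doc not in seen:
            seen.add(doc)
            unique.append((source_name, doc))
    return unique


def run_search(raw_query, sources):
    """Run the three stages in order."""
    return process_results(retrieve(prepare_query(raw_query), sources))


# Sample sources, invented for the example.
SOURCES = {
    "central": ["Engine maintenance report", "Holiday schedule"],
    "external": ["Engine maintenance report", "Engine supplier catalog"],
}

if __name__ == "__main__":
    for source_name, doc in run_search("engine", SOURCES):
        print(source_name, "-", doc)
```

Keeping the stages separate mirrors the diagram: a product that specializes in one area can, in principle, replace one stage while the others stay unchanged.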
Product Families:
Most products are built around two or three components that form the center of gravity of their capabilities, so products overlap only partially. The leading product families focus on:
Linguistic retrieval (Lingual)
Federated Search
Smart search / Text Mining
Catalog products (Auto Categorization/Categorization)