Issue link: https://epubs.iltanet.org/i/37773
next five years, and took a leap into the high-end arena when it acquired FAST Search & Transfer, a company that owned the powerful FAST Enterprise Search Platform (ESP) search engine. It seems to have paid off. The 2008 Gartner Magic Quadrant report for information access, which included enterprise search capabilities, showed Microsoft at the top of the magic quadrant. The 2010 Gartner MarketScope Enterprise Search report rated Microsoft a “Strong Positive,” which is the highest rating assigned.

FAST ESP provided the foundation for the FAST Search Server 2010 for SharePoint, commonly referred to as “FS4SP.” It is used by many large e-commerce websites, with some installations reporting over a billion items in their search indexes. FAST ESP’s roots go back to early Internet search engines such as “AlltheWeb” and “Lycos.” Powerful and very flexible, FAST ESP has a fairly complex architecture (as an indication, the documentation for the product is over 2,000 pages). FAST ESP started out as a non-Microsoft product, as its use of an Apache server, Python scripting, a JDBC connector for data, and open source components makes apparent. After Microsoft purchased FAST Search & Transfer, its attractive pricing, combined with the extreme scalability and speed (sub-second queries against millions of documents), content extraction, connectors, and the ability to honor security set on crawled content, made FAST ESP a serious contender as an internal, SharePoint-hosted enterprise search engine.

FAST DOCUMENT PROCESSING AND CONTENT EXTRACTION

One of the appealing capabilities of the FAST ESP search engine is the ability to insert functionality into the document processing pipeline. The document processing stage does a tremendous amount of work and offers many advanced features; the most interesting of these are content extraction and advanced language processing. Content extractors are used to automatically build metadata based on the text in the documents.
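To make the idea of a pipeline stage that builds metadata from document text more concrete, here is a minimal sketch in plain Python. It does not use the FAST ESP or FS4SP API. The stage functions, the `legal_terms` field, and the term list are all hypothetical names chosen for illustration, and a real extractor would be considerably more sophisticated than a fixed vocabulary match.

```python
import re
from typing import Callable, Dict, List

# A document is modeled as a simple dict; this shape is an assumption
# made for the sketch, not the format FAST ESP uses internally.
Document = Dict[str, object]
Stage = Callable[[Document], Document]

# Hypothetical vocabulary; a firm would maintain its own term list.
LEGAL_TERMS = ["force majeure", "indemnification", "summary judgment"]


def normalize_whitespace(doc: Document) -> Document:
    """Collapse runs of whitespace so later stages see clean text."""
    doc["text"] = re.sub(r"\s+", " ", str(doc["text"])).strip()
    return doc


def extract_legal_terms(doc: Document) -> Document:
    """Store the known legal terms found in the text as metadata."""
    text = str(doc["text"]).lower()
    doc["legal_terms"] = [
        term
        for term in LEGAL_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    ]
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


# Inserting new functionality means adding a stage to this list.
PIPELINE: List[Stage] = [normalize_whitespace, extract_legal_terms]

if __name__ == "__main__":
    sample = {"text": "The  indemnification clause survives\n a force majeure event."}
    print(run_pipeline(sample, PIPELINE)["legal_terms"])
```

Each stage receives the document and returns it with extra fields attached, so the extracted terms travel with the document to indexing, where they could be used as searchable metadata.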
Examples of extractors available with FAST ESP include people names, company names, locations, credit card numbers and keywords. Organizations can also build their own custom extractors. An example of a custom extractor useful to a law firm might be one for legal terms. The extracted content then becomes available for metadata searching and refining. Refining refers to the ability to filter search results by clicking on metadata terms, which are usually listed on the left side of the search results. This is generally referred to as faceted searching, and is commonly found on e-commerce sites for filtering search results based on brands, product types or price ranges. In a law firm it is common to offer search result filtering by client, matter, document type, date and practice area. Keyword extraction, another FAST ESP function, is an advanced feature that extracts common phrases in the documents and displays them as refiners. This helps users determine the general theme of the documents displayed in the search results.

Document processors can also be used for document conversions and other processing before the content is indexed. A good example is creating HTML versions of documents so search hits can be highlighted for the user. Advanced linguistics capabilities include recognition of more than 80 languages and more than 400 document formats. Content processing also includes