SharePoint OCR

SharePoint 2013 Enterprise Search has a built-in feature that runs Optical Character Recognition (OCR) on scanned TIFF images and indexes the recognized text during a crawl. It works for images stored both inside and outside SharePoint. To configure it, follow the steps below.

Enable Windows TIFF IFilter Feature on Crawl Servers

First, use Server Manager to confirm that the Windows TIFF IFilter feature is installed on each crawl server.
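
If you prefer a script to the Server Manager UI, the same check can be sketched in PowerShell. This is a minimal sketch, assuming Windows Server 2012 or later and that the feature is named Windows-TIFF-IFilter on your build; confirm the name with Get-WindowsFeature before relying on it.

```powershell
# Run in an elevated PowerShell session on each crawl server.
Import-Module ServerManager

# Assumption: the feature name is Windows-TIFF-IFilter.
# Verify with: Get-WindowsFeature *TIFF*
$feature = Get-WindowsFeature -Name Windows-TIFF-IFilter

if (-not $feature.Installed) {
    # Install only if the feature is not already present.
    Install-WindowsFeature -Name Windows-TIFF-IFilter
}
```

On Windows Server 2008 R2 the install cmdlet is Add-WindowsFeature rather than Install-WindowsFeature.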

Configure Local Group Policy Editor

Next, open the Local Group Policy Editor (gpedit.msc) on each crawl server and locate the OCR folder under ‘Computer Configuration > Administrative Templates’. Edit the policy setting “Select OCR languages from a code page”, choose ‘Enabled’, and select the languages that appear in your scanned documents.

Configure Content Parsing for TIFF Images

Open the SharePoint Management Shell (using ‘Run as Administrator’) and run the following commands:

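# Get the Search Service Application (assumes the farm has exactly one)
$ssa = Get-SPEnterpriseSearchServiceApplication
# Register both TIFF extensions so the content processing component parses them
New-SPEnterpriseSearchFileFormat -SearchApplication $ssa -FormatId tif -FormatName "TIFF Image File" -MimeType "image/tiff"
New-SPEnterpriseSearchFileFormat -SearchApplication $ssa -FormatId tiff -FormatName "TIFF Image File" -MimeType "image/tiff"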

Next, restart the SharePoint Search Host Controller service.
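
The restart can also be done from the same elevated shell. A minimal sketch, assuming the service is registered under the name SPSearchHostController, which is the usual name on a SharePoint 2013 server; check your own servers before running it.

```powershell
# Assumption: the service name is SPSearchHostController.
# Verify with: Get-Service *SPSearch*
Restart-Service -Name SPSearchHostController

# Show the service state after the restart.
Get-Service -Name SPSearchHostController
```

Run this on each server that hosts search components.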

Add New File Types in Search Service Application

Open the Search Service Application administration page and select ‘File Types’ under the ‘Crawling’ navigation heading. Add two new file types: ‘tif’ and ‘tiff’.
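
The two file types can alternatively be added from the SharePoint Management Shell. This is a sketch rather than the only way to do it; it assumes a single Search Service Application and that neither extension is already listed (the cmdlet reports an error for an extension that already exists).

```powershell
# Run in the SharePoint Management Shell as administrator.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Add each extension to the crawler's list of included file types.
foreach ($ext in 'tif', 'tiff') {
    New-SPEnterpriseSearchCrawlExtension -SearchApplication $ssa -Name $ext
}

# List the registered file types to confirm the additions.
Get-SPEnterpriseSearchCrawlExtension -SearchApplication $ssa
```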

Perform a Full Crawl

Finally, perform a full crawl of your content. Keep in mind that the crawl may take considerably longer than before, depending on how many TIFF images it has to process. You may need to make further adjustments, such as scoping a content source to include only the content that requires OCR, or modifying crawl schedules.
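
A full crawl can be started from the SharePoint Management Shell as well as from Central Administration. The sketch below assumes a content source named "Local SharePoint sites", which is the default name; substitute the name of the content source that holds your TIFF images.

```powershell
# Run in the SharePoint Management Shell as administrator.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Assumption: the content source name below is a placeholder for your own.
$cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"

# Start a full crawl only if no crawl is currently running.
if ($cs.CrawlStatus -eq 'Idle') {
    $cs.StartFullCrawl()
}

# Report the current crawl state.
$cs.CrawlStatus
```

Pointing the script at a dedicated content source that contains only the scanned images is one way to keep the slower OCR crawl separate from the rest of your content.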