Automatic text recognition in images or scanned documents by Optical Character Recognition (OCR)

Automatic text recognition (OCR) for Solr or Elastic Search

Text stored in image formats like JPG, PNG, TIFF or GIF (e.g. scans, photos or screenshots) cannot be found by standard full text search. This enhancer therefore enriches the metadata of images (such as filename, format and size) with the results of automatic text recognition, also called optical character recognition (OCR), using free open source software like Tesseract OCR.

Much information is also not searchable by full text search because it is stored in graphical formats embedded in PDF documents (e.g. scans or screenshots instead of text), so the enhancer extracts images from PDF files for automatic text recognition (OCR), too.

OCR is enabled by default in the virtual machine packages like Open Semantic Desktop Search or Open Semantic Search Appliance. On other installations, install the package tesseract-ocr (included in your Linux distribution).
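To illustrate the idea of enriching image metadata with OCR results, here is a minimal Python sketch. It is not the enhancer's actual implementation. It assumes the tesseract command from the tesseract-ocr package is on the PATH and that the English language data is installed. The function names and the field name ocr_text are hypothetical and chosen only for this example.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Return the argument list for running Tesseract on one image.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on an image and return the recognized text.

    Returns an empty string if Tesseract is not installed or fails,
    so a missing OCR engine does not stop indexing.
    """
    if shutil.which("tesseract") is None:
        return ""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout if result.returncode == 0 else ""


def enrich_with_ocr(metadata, ocr_text):
    """Return a copy of the image metadata with the OCR text added.

    "ocr_text" is a hypothetical field name, not one taken from the
    enhancer or from a Solr schema.
    """
    enriched = dict(metadata)
    enriched["ocr_text"] = ocr_text.strip()
    return enriched
```

A call such as enrich_with_ocr({"filename": "scan.png", "format": "PNG"}, ocr_image("scan.png")) would then produce a document in which the recognized text sits next to the existing metadata, ready to be sent to the search index.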