How to use the OCR option on Linux/macOS
As you know, PDF Focus .Net supports OCR. Deploying it on Windows is straightforward. On Linux/macOS, however, you need a few additional settings; once they are in place, scanned PDF documents are converted with good quality.
So, let's get started, step by step.
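- Install the latest version of PDF Focus .Net from NuGet into your project.
- Add the additional dependencies to your Visual Studio project.
- Install Tesseract 5.x. On Debian/Ubuntu, run the following command:
sudo apt install tesseract-ocr
On macOS, apt is not available; install Tesseract with Homebrew instead:
brew install tesseract
Depending on your distribution release, the packaged Tesseract may be 4.x rather than 5.x; check the installed version with tesseract --version.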
An example of a full project file (*.csproj):
Tesseract uses Leptonica, a pedagogically oriented open-source library containing software that is broadly useful for image processing and image analysis applications. To install it on Debian/Ubuntu, run:
sudo apt install libleptonica-dev
On macOS, Homebrew installs Leptonica automatically as a dependency of tesseract.
If you also want to install the developer tools, which can be used for training, run the following command:
sudo apt install libtesseract-dev
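After running the installation commands above, you can check what is actually installed. The following is a minimal bash sketch, not part of PDF Focus .Net: the function names are hypothetical, it assumes the first line of tesseract --version looks like "tesseract 5.3.0", and it assumes libleptonica-dev and libtesseract-dev register the pkg-config modules lept and tesseract.

```shell
#!/usr/bin/env bash
# Sketch: verify the OCR prerequisites installed with apt.
# ASSUMPTION: pkg-config module names "lept" (Leptonica) and "tesseract".

# tesseract_major: read `tesseract --version` output on stdin and print the
# major version from its first line ("tesseract 5.3.0" -> "5").
tesseract_major() {
  head -n 1 | sed -E 's/^tesseract v?([0-9]+).*/\1/'
}

# pc_version MODULE: print the version pkg-config reports, or "missing"
# (also "missing" when pkg-config itself is not installed).
pc_version() {
  pkg-config --modversion "$1" 2>/dev/null || echo "missing"
}

# check_ocr_prereqs: report what is installed; return 1 unless Tesseract 5+ is found.
check_ocr_prereqs() {
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract: not found on PATH" >&2
    return 1
  fi
  local major
  # Older builds print the version to stderr, so merge both streams.
  major=$(tesseract --version 2>&1 | tesseract_major)
  echo "tesseract major version: $major"
  echo "leptonica (dev files): $(pc_version lept)"
  echo "tesseract (dev files): $(pc_version tesseract)"
  # The numeric test fails (error hidden) if $major is not a number.
  if ! [ "$major" -ge 5 ] 2>/dev/null; then
    echo "Tesseract 5.x or newer is required" >&2
    return 1
  fi
}
```

To use it, save the functions to a file, source it, and run check_ocr_prereqs; a non-zero exit status means Tesseract 5.x was not found.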
The instructions in this article target Linux and can also be applied to other UNIX-like operating systems.
Important! Because the Tesseract package is configured for Windows, you need to manually add two files to the “x64” folder:
- libleptonica-1.82.0.so
- libtesseract50.so
You can download these files from the Internet or take them from our full code sample for OCR on Linux/macOS. The full C# code sample is available for download directly from GitHub.
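If Tesseract and Leptonica were installed with apt, one possible alternative to downloading the two files is to symlink the system libraries into the “x64” folder under the required names. The bash sketch below is not the official procedure: the function names are hypothetical, it assumes a Debian/Ubuntu library directory such as /usr/lib/x86_64-linux-gnu, and the source file name patterns (liblept.so*, libleptonica.so*, libtesseract.so*) vary between distributions. The target names suggest the package expects Tesseract 5.0 and Leptonica 1.82, so linking a different major version (for example libtesseract.so.4) may not work.

```shell
#!/usr/bin/env bash
# Sketch (not the official procedure): symlink system Leptonica/Tesseract
# libraries into an "x64" folder under the file names the package expects.
# ASSUMPTION: Debian/Ubuntu-style library names and locations.

# find_lib DIR PATTERN...: print the first file in DIR matching a PATTERN.
find_lib() {
  local dir=$1 pattern f
  shift
  for pattern in "$@"; do
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for f in "$dir"/$pattern; do
      # An unmatched glob stays literal, so check that the file exists.
      if [ -e "$f" ]; then
        printf '%s\n' "$f"
        return 0
      fi
    done
  done
  return 1
}

# link_ocr_libs LIB_DIR APP_DIR: create APP_DIR/x64 and the two symlinks.
link_ocr_libs() {
  local lib_dir app_dir=$2 lept tess
  # Resolve LIB_DIR to an absolute path so the links work from any folder.
  lib_dir=$(CDPATH= cd -- "$1" 2>/dev/null && pwd) || {
    echo "No such directory: $1" >&2; return 1; }
  lept=$(find_lib "$lib_dir" 'liblept.so*' 'libleptonica.so*') || {
    echo "Leptonica library not found in $lib_dir" >&2; return 1; }
  tess=$(find_lib "$lib_dir" 'libtesseract.so*') || {
    echo "Tesseract library not found in $lib_dir" >&2; return 1; }
  mkdir -p "$app_dir/x64" || return 1
  # -s creates a symbolic link; -f replaces an existing file of the same name.
  ln -sf "$lept" "$app_dir/x64/libleptonica-1.82.0.so"
  ln -sf "$tess" "$app_dir/x64/libtesseract50.so"
}
```

For example, with a hypothetical build output folder: link_ocr_libs /usr/lib/x86_64-linux-gnu bin/Debug/net6.0. Symlinks follow package upgrades automatically, but they leave the application folder dependent on the system packages; if you deploy the folder to another machine, copying the real files with cp -L is the safer choice.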
If you need a new code example or have a question, email us at support@sautinsoft.com.