How to use the OCR option on Linux/macOS

As you know, PDF Focus .Net supports OCR. Deploying it on Windows requires no extra work. On Linux and macOS, however, some additional setup is needed; once it is in place, scanned PDF documents are converted with good quality.

Let's go through the setup step by step.

Add the latest version of PDF Focus .Net to your project from NuGet.

Then add the following additional dependencies to your Visual Studio project file:

<PackageReference Include="SkiaSharp.NativeAssets.Linux" Version="2.88.7" />
<PackageReference Include="SkiaSharp.NativeAssets.macOS" Version="2.88.7" />
<PackageReference Include="HarfBuzzSharp.NativeAssets.Linux" Version="*" />
<PackageReference Include="HarfBuzzSharp.NativeAssets.macOS" Version="*" />
<PackageReference Include="System.Reflection.Emit" Version="*" />

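To install Tesseract 5.x on a Debian/Ubuntu-based Linux system, run the following command. On macOS, where apt is not available, use a package manager such as Homebrew instead (brew install tesseract):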

sudo apt install tesseract-ocr

Then install Leptonica and the Tesseract development libraries:

sudo apt install libleptonica-dev

sudo apt install libtesseract-dev
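Before moving on, it can help to confirm that the tools are actually visible on your system. The sketch below is one way to do that. The helper name `require_cmd` is made up for this example, and the commented Homebrew line is an assumption for macOS setups.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given command is on PATH,
# otherwise report it on stderr and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# macOS has no apt; with Homebrew the rough equivalent would be:
#   brew install tesseract leptonica

if require_cmd tesseract; then
    # Show the installed version and the languages Tesseract can see.
    tesseract --version 2>&1 | head -n 1
    tesseract --list-langs || echo "no languages reported" >&2
fi
```

If the version line does not start with 5, your distribution probably ships an older Tesseract package.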

Next, add the code that converts a scanned PDF to an editable Word document.

Make sure that you specify the correct path to the “tessdata” folder.
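If you are not sure where “tessdata” lives on your machine, a small search like the following may help. It is only a sketch: `find_tessdata` is a hypothetical helper, and the listed directories are typical locations rather than guaranteed ones. `TESSDATA_PREFIX` is the environment variable Tesseract itself consults.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory from the arguments that
# contains at least one *.traineddata file; fail if none does.
find_tessdata() {
    for dir in "$@"; do
        for f in "$dir"/*.traineddata; do
            if [ -f "$f" ]; then
                echo "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Typical locations (assumptions; they vary by distribution and package manager).
find_tessdata \
    "${TESSDATA_PREFIX:-/nonexistent}" \
    /usr/share/tesseract-ocr/5/tessdata \
    /usr/share/tesseract-ocr/4.00/tessdata \
    /usr/share/tessdata \
    /usr/local/share/tessdata \
    /opt/homebrew/share/tessdata \
    || echo "tessdata folder not found; locate it manually" >&2
```

The printed directory is the path to pass to your conversion code.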

“OcrLanguage” must be set to match the language of your original PDF document: Eng, Ger, Fra, etc. Once your project is launched, all dependencies will be downloaded and additional folders will be added to the debug folder.
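The chosen language only works if the matching trained-data file is present. Tesseract names these files by language code, for example `eng.traineddata`, `deu.traineddata` or `fra.traineddata`. The sketch below checks a folder for the codes you need. `check_langs` is a hypothetical helper, the example path is an assumption, and the suggested package names follow the Debian/Ubuntu `tesseract-ocr-<code>` convention.

```shell
#!/bin/sh
# Hypothetical helper: for each language code, report whether
# <tessdata>/<code>.traineddata exists; suggest an apt package when it does not.
check_langs() {
    tessdata=$1
    shift
    for code in "$@"; do
        if [ -f "$tessdata/$code.traineddata" ]; then
            echo "$code: ok"
        else
            # Debian/Ubuntu package names use '-' where the code has '_'.
            echo "$code: missing (try: sudo apt install tesseract-ocr-$(echo "$code" | tr '_' '-'))"
        fi
    done
}

# Example call; the path is an assumption, use your actual tessdata folder.
check_langs /usr/share/tesseract-ocr/5/tessdata eng deu fra
```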

Important! Because the bundled Tesseract binaries target Windows, you need to manually add two native library files to the “x64” folder:

libleptonica-1.82.0.so
libtesseract50.so

You can download them from the Internet or take them from our full code sample for OCR on Linux/macOS.
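Once you have the two files, placing them can be scripted. The sketch below is hedged: `place_native_libs` is a hypothetical helper, both paths in the example call are assumptions about your project layout, and the file names are written in lowercase on the assumption that the loader expects exactly these names on a case-sensitive file system.

```shell
#!/bin/sh
# Hypothetical helper: copy the two native libraries from a download folder
# into the target "x64" folder, failing if either file is absent.
place_native_libs() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for lib in libleptonica-1.82.0.so libtesseract50.so; do
        if [ ! -f "$src/$lib" ]; then
            echo "not found: $src/$lib" >&2
            return 1
        fi
        cp "$src/$lib" "$dest/" || return 1
    done
    echo "copied to $dest"
}

# Example call; both paths are assumptions, adjust them to your project:
# place_native_libs "$HOME/Downloads" ./bin/Debug/net8.0/x64
```

The function stops at the first missing file, so a partial copy is reported rather than silently accepted.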

After completing the steps above, run the final commands in the terminal: "dotnet restore" and then "dotnet run".

The result of the conversion is a Word file that you can edit and save.

You can download the full C# code sample directly from GitHub: The link

If you need a new code example or have a question, email us at support@sautinsoft.com.

Questions and suggestions from you are always welcome!

We have been developing .Net components since 2002. We know the PDF, DOCX, RTF, HTML, XLSX and image formats. If you need any assistance with creating, modifying or converting documents in various formats, we can help you. We will write any code example for you absolutely free.
