Simple way to extract Text from PDF in C# .Net?

PDF Focus .Net

A .NET assembly that provides an API to convert PDF to DOCX, RTF, HTML, XML, Text, Excel and images in .NET and C#.

Introduction

If you are looking for a .NET library to extract text data from PDF, you are in the right place. PDF Focus .Net helps you extract text from any PDF document.

To illustrate how to easily extract text from PDF, let's look at simple code in C#:

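            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

            // Load the source PDF from disk
            f.OpenPdf(@"d:\Wikipedia.pdf");

            // Convert only if the document was opened and has pages
            if (f.PageCount > 0)
            {
                // Write the text of the whole document to a file
                f.ToText(@"d:\Wikipedia.txt");
            }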

You can extract text from a whole document or from specific pages. The library produces clean text without unwanted spaces between the letters in words and supports Unicode symbols.

Furthermore, the text layout looks the same as in the RTF, with proper line breaks and columns.
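
As an illustration of extracting text from specific pages, the sketch below collects the text of each page separately. It uses only the calls that appear in this article (OpenPdf, PageCount and the ToText overload that takes a first and last page). The class name, the list variable and the file path are placeholders chosen for the example, and it assumes page numbers start at 1, as in the "1st page" example that calls ToText(1, 1).

```csharp
using System;
using System.Collections.Generic;

class PerPageExtraction
{
    static void Main()
    {
        SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

        // Placeholder path; replace with your own PDF file.
        f.OpenPdf(@"d:\Wikipedia.pdf");

        // Hypothetical container: one string per page.
        List<string> pages = new List<string>();

        // Assumption: pages are numbered from 1, matching ToText(1, 1) for the first page.
        for (int page = 1; page <= f.PageCount; page++)
        {
            // Extract a single page by passing the same number as first and last page.
            pages.Add(f.ToText(page, page));
        }

        Console.WriteLine("Extracted {0} page(s).", pages.Count);
    }
}
```

Keeping the pages separate can be handy when you need to search or index a document page by page. If PageCount is 0, the loop simply does not run and the list stays empty.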


Download

To see this functionality firsthand, download the latest «PDF Focus .Net» with code examples (104.0 MB).

Limitations

The limitations of the free version of PDF Focus .Net are the trial notice "Created by unlicensed version of PDF Focus .Net" and the random addition of the word "TRIAL".


Some examples to convert PDF to Text in C# and VB.Net

1. Convert PDF file to Text using C#:

            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

            f.OpenPdf(@"d:\Cook Book.pdf");

            // Pages 2 and 3 must exist, so require at least 3 pages
            if (f.PageCount > 2)
            {
                // Convert only pages 2 to 3 to text
                f.ToText(@"d:\Cook Book.txt", 2, 3);
            }

2. Convert whole PDF document to Text in memory using C#:

            // Requires: using System.IO;
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

            // Read PDF to byte array
            byte[] pdf = File.ReadAllBytes(@"d:\Sample.pdf");

            f.OpenPdf(pdf);

            if (f.PageCount > 0)
            {
                // Extract the text of the whole document as a string
                string text = f.ToText();

                // Save to text file
                File.WriteAllText(@"d:\Sample.txt", text);
            }

3. Extract Text from all pages of PDF in ASP.Net/VB.Net:

        Dim f As New SautinSoft.PdfFocus()
        Dim url As New Uri("http://www.website.com/sample.pdf")

        f.OpenPdf(url)

        If f.PageCount > 0 Then
            'Convert whole PDF to Text (extract text from PDF)
            Dim text As String = f.ToText()

            'Show text
            TextBox1.Text = text

        Else
            TextBox1.Text = "Converting failed!"
        End If

4. Convert 1st page of PDF to Text in VB.Net:

        Dim f As New SautinSoft.PdfFocus()

        'Requires: Imports System.IO
        Dim pdf() As Byte = File.ReadAllBytes("d:\Simple.pdf")
        Dim text As String = ""

        f.OpenPdf(pdf)

        If f.PageCount > 0 Then
            text = f.ToText(1, 1)

            'show text
            If text <> "" Then
                TextBox1.Text = text
            End If
        End If
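
The examples above each handle a single file. As a rough sketch of how they might be combined, the following C# program converts every PDF in a folder to a text file with the same name. It relies only on the OpenPdf, PageCount and ToText calls shown in this article. The folder path, the class name and the ExtractText helper are hypothetical, and the try/catch is a precaution, since this article does not state whether the library throws on unreadable input.

```csharp
using System;
using System.IO;

class BatchPdfToText
{
    // Hypothetical helper: returns the extracted text, or null if the PDF has no pages.
    static string ExtractText(string pdfPath)
    {
        SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
        f.OpenPdf(pdfPath);

        // PageCount > 0 is the same check used in the examples above.
        return f.PageCount > 0 ? f.ToText() : null;
    }

    static void Main()
    {
        // Assumed input folder; replace with your own.
        string folder = @"d:\pdfs";

        foreach (string pdfPath in Directory.GetFiles(folder, "*.pdf"))
        {
            // Write the result next to the source file, with a .txt extension.
            string txtPath = Path.ChangeExtension(pdfPath, ".txt");

            try
            {
                string text = ExtractText(pdfPath);
                if (text == null)
                {
                    Console.WriteLine("Skipped (no pages): " + pdfPath);
                    continue;
                }

                File.WriteAllText(txtPath, text);
                Console.WriteLine("Converted: " + pdfPath);
            }
            catch (Exception ex)
            {
                // Precaution only: report the file and keep processing the rest.
                Console.WriteLine("Failed: " + pdfPath + " (" + ex.Message + ")");
            }
        }
    }
}
```

A new PdfFocus object is created for each file so that one failed document does not affect the next one.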

Requirements and Technical Information

Requires .NET Framework 4.0 or higher. Our product is compatible with all .NET languages and supports all operating systems where .NET Framework and .NET Core can be used. Note that PDF Focus .Net is written entirely in managed C#, which makes it a standalone, independent library.

Supported platforms: .NET Framework 4.0 and higher, and .NET Core 2.0 and higher:

  • .NET Framework 4.0, 4.5, 4.6.1 and higher. An older version for .NET 2.0 is available separately.
  • .NET Standard 2.0
  • .NET Core 2.0 and higher


Multi-platform component, runs on:


Our component has proven itself on cloud platforms and services:

  • Microsoft Azure
  • Amazon Web Services (AWS)
  • Google Cloud Platform
  • SharePoint
  • Docker
  • etc.

If you need a new code example or have a question, email us at support@sautinsoft.com or ask in the Online Chat.