PDF Focus .Net - convert PDF to HTML with just a few lines of C# code!

PDF Focus .Net

A .Net assembly that provides an API to convert PDF to many formats: DOCX, RTF, HTML, XML, Text, Excel, and Images in .Net and C#.

Introduction

Without belaboring the point, let's see how to add a PDF to HTML feature to any .Net application. First, add a reference to the "SautinSoft.PdfFocus.dll" assembly, which gives your application the ability to convert PDF to HTML. You may download it here, 69.3 Mb.

Let's take a look at a very straightforward example in C#:

            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
            f.OpenPdf(@"d:\Odyssey.pdf");
            f.ToHtml(@"d:\Odyssey.html");
          

PDF Focus .Net generates documents in HTML5 format. You can also choose between two conversion modes:

  • HTML-Fixed is better for rendering, because it reproduces the PDF layout exactly, including the page structure.
    The markup of such documents is very complex and has many tags positioned by (x,y) coordinates.

  • HTML-Flowing is better for further processing by a human: editing and combining.
    The markup of such documents is much simpler and has a flowing structure, so it is easy for a human to understand.

The difference between HTML-Fixed and HTML-Flowing modes.

Another point of interest: PDF Focus .Net can generate HTML documents with images embedded using Base64 encoding. This means you can convert PDF to HTML completely in memory, without writing anything to disk.

<html>
	<head>...</head>
	<body>
		<div style="position:relative;margin: 0px 0px ....">Homer is
        the author of the Iliad and the Odyssey ...</div>
		<div style="">
        <img src="data:image/gif;base64,R0lGODlhUAAPAKIAAAsL..."></div>
	</body>
</html>
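
The embedded image in the markup above is an ordinary "data:" URI: the image bytes are Base64-encoded and placed directly into the src attribute. As a minimal sketch of that idea, using only the standard System.Convert class (the DataUriDemo class and ToDataUri method are illustrative names, not part of PDF Focus .Net, and the six bytes stand in for real image data):

```csharp
using System;

public static class DataUriDemo
{
    // Builds a "data:" URI that embeds the given bytes directly in HTML.
    // mimeType is supplied by the caller, e.g. "image/gif".
    public static string ToDataUri(byte[] data, string mimeType)
    {
        return "data:" + mimeType + ";base64," + Convert.ToBase64String(data);
    }

    public static void Main()
    {
        // The 6-byte GIF signature "GIF89a", standing in for a real image.
        byte[] gifSignature = { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 };

        string img = "<img src=\"" + ToDataUri(gifSignature, "image/gif") + "\">";

        // Prints: <img src="data:image/gif;base64,R0lGODlh">
        Console.WriteLine(img);
    }
}
```

The encoded text begins with the same "R0lGODlh" prefix seen in the sample markup, because every GIF89a file starts with these six bytes.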

All HTML documents produced using SautinSoft.PdfFocus.dll comply with W3C standards and can be checked with the W3C Markup Validation Service.


Download

To see this functionality firsthand, download the latest «PDF Focus .Net» with code examples, 69.3 Mb.

Limitations

The free version of PDF Focus .Net has two limitations: it adds the trial notice "Created by unlicensed version of PDF Focus .Net" and randomly inserts the word "TRIAL".


Some examples to convert PDF to HTML in C# and VB.Net

Want to adjust the result of a PDF to HTML conversion? See our tips ...

1. Convert PDF file to HTML file in C#:

            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
            f.OpenPdf(@"d:\History.pdf");

            if (f.PageCount > 0)
            {
                int result = f.ToHtml(@"d:\History.html");

                //Open HTML document
                if (result==0)
                {
                    System.Diagnostics.Process.Start(@"d:\History.html");
                }
            }
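
A note for .NET Core and .NET 5.0 and higher: there ProcessStartInfo.UseShellExecute defaults to false, so passing a document path straight to Process.Start fails instead of opening the file in its associated application. A minimal sketch of one way around this, using only the standard System.Diagnostics API (OpenDocumentDemo and BuildStartInfo are illustrative names, not part of PDF Focus .Net):

```csharp
using System;
using System.Diagnostics;

public static class OpenDocumentDemo
{
    // Builds start info that asks the OS shell to open the file
    // with its associated application (e.g. the default browser).
    public static ProcessStartInfo BuildStartInfo(string path)
    {
        return new ProcessStartInfo(path) { UseShellExecute = true };
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: pass the path of the HTML file to open.");
            return;
        }

        // Opens the converted document, e.g. d:\History.html
        Process.Start(BuildStartInfo(args[0]));
    }
}
```

On .NET Framework the default is already true, so setting the property explicitly keeps the behavior the same on both runtimes.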
      

2. Convert PDF to HTML in memory using C#:

            byte[] pdf = System.IO.File.ReadAllBytes(@"c:\Book.pdf");

            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
            f.OpenPdf(pdf);

            if (f.PageCount > 0)
            {
                f.HtmlOptions.IncludeImageInHtml = true;
                f.HtmlOptions.Title = "Simple text";
                string html = f.ToHtml();
                //now the variable 'html' contains HTML document
            }

      

3. Export PDF to HTML in ASP.Net-C#:


        SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
        f.OpenPdf(FileUpload1.FileBytes);
        string html = String.Empty;

        if (f.PageCount > 0)
        {
            //Convert the whole PDF document to HTML
            f.HtmlOptions.IncludeImageInHtml = true;
            html = f.ToHtml();
        }

        //Send the HTML to the browser as a downloadable file
        if (html != "")
        {
            Response.Buffer = true;
            Response.Clear();
            Response.ContentType = "text/html";
            Response.AddHeader("Content-Disposition", "attachment; filename=Result.html");
            Response.Write(html);
            Response.Flush();
            Response.End();
        }
      

4. Convert PDF file to HTML file in VB.Net:

        Dim f As New SautinSoft.PdfFocus()
        f.OpenPdf("c:\Simple Text.pdf")

        If f.PageCount > 0 Then
            Dim result As Integer = f.ToHtml("c:\Result.html")

            'Show HTML document
            If result = 0 Then
                System.Diagnostics.Process.Start("c:\Result.html")
            End If
        End If

Requirements and Technical Information

Requires .NET Framework 4.0 or higher. Our product is compatible with all .NET languages and supports every operating system where .NET Framework or .NET Core can be used. Note that PDF Focus .Net is written entirely in managed C#, which makes it a fully standalone, independent library.

.Net Framework 4.0 and higher and .Net Core 2.0 and higher

.NET Framework 4.5, 4.6.1 and higher. The old version for .NET 2.0 can be found here

.NET Standard 2.0

.NET Core, .NET 5.0 and higher.


Multi-platform component, runs on:


Our component has proven itself on cloud platforms and services:

  • Microsoft Azure
  • Amazon Web Services (AWS)
  • Google Cloud Platform
  • SharePoint
  • Docker
  • Xamarin Forms
  • etc.