How to set a single font for all text during PDF to HTML conversion in C# and .NET


Processing PDF files and converting them to other formats is a common task for developers, whether the goal is ensuring compatibility, improving readability, or preparing content for further processing. One popular tool for such tasks is the PDF Focus .NET component from the SautinSoft SDK, which can convert PDF to HTML, Word, RTF, and other formats. Specifying a single font during or after conversion makes the resulting document more consistent and easier to manage.

In this article, we'll discuss how to specify a single font for all text when converting PDF to HTML in C# and .NET. We'll explain the advantages of this approach, discuss who might benefit from it, and highlight implementation considerations.

When a PDF is converted to HTML, the output may use many different fonts carried over from the source, which makes the content look inconsistent and harder to read.
Enforcing a single font is especially helpful:

  • When automatically processing large numbers of documents.
  • When a consistent appearance is required for web publishing or internal systems.
  • When the output must be normalized before further styling is applied.

Characteristics of this procedure:

  • Automation: the option is often used in document-processing systems, in scripts or server-side applications.
  • Flexibility: the output no longer depends on the styles of the source PDF, which can be heterogeneous.
  • Extensibility: the single font can be combined with other style settings; for example, you can also specify font size, color, and line spacing.
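The same effect can also be approximated after conversion, which is handy when the HTML already exists or when you want to control the extra properties listed above yourself. The sketch below is a hypothetical helper, not part of PDF Focus .NET: it inserts a CSS rule that forces one font family and size on every element, using !important so the rule wins over inline style attributes written by the converter. It assumes the HTML file is UTF-8 encoded and, ideally, contains a </head> tag.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper, not part of PDF Focus .NET: forces one font family and size
// on every element of an already converted HTML document by adding a CSS rule.
public static class SingleFontCss
{
    // Returns a copy of the HTML with the override rule added.
    // Assumes fontFamily contains no quote characters.
    public static string Inject(string html, string fontFamily, double sizePt)
    {
        // Invariant culture so 8.5 is written as "8.5", never "8,5".
        string size = sizePt.ToString(CultureInfo.InvariantCulture);

        // "!important" lets the rule win over the converter's inline style attributes.
        string style = "<style>* { font-family: '" + fontFamily + "' !important; " +
                       "font-size: " + size + "pt !important; }</style>";

        // Place the rule just before </head>, after any styles the converter emitted.
        int headEnd = html.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);
        if (headEnd >= 0)
            return html.Insert(headEnd, style);

        // No </head> found: append instead. A <style> element outside <head> is
        // non-conforming, but browsers still apply it.
        return html + style;
    }

    // Reads the file, adds the rule, and writes it back (assumes UTF-8 text).
    public static void ApplyToFile(string htmlFile, string fontFamily, double sizePt)
    {
        string html = File.ReadAllText(htmlFile);
        File.WriteAllText(htmlFile, Inject(html, fontFamily, sizePt));
    }
}
```

For example, calling SingleFontCss.ApplyToFile(htmlFile, "Verdana", 8) after a successful ToHtml call would enforce Verdana 8pt even on elements that still carry their own inline font settings. A universal selector is blunt: it also overrides headings and monospaced text, so narrow the selector if some elements should keep their own fonts.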

Note that sometimes it is useful not only to set a single font in the output but also to replace fonts in the PDF itself before conversion (for example, if the source PDF contains non-standard fonts); this makes the resulting HTML even more predictable.

Complete code

using System;
using System.IO;
using SautinSoft;

namespace Sample
{
    class Sample
    {
        static void Main(string[] args)
        {
            string pdfFile = @"..\..\..\simple text.pdf";
            string htmlFile = Path.ChangeExtension(pdfFile, ".html");

            // Convert PDF file to HTML file
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

            // Let's change all text to Verdana 8pt.
            f.HtmlOptions.SingleFontFamily = "Verdana";
            f.HtmlOptions.SingleFontSize = 8;

            // After purchasing the license, please insert your serial number here to activate the component:
            //f.Serial = "XXXXXXXXXXX";

            f.OpenPdf(pdfFile);

            if (f.PageCount > 0)
            {
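                // Convert at most the first 3 pages.
                int from = 1;
                int to = Math.Min(f.PageCount, 3);

                // ToHtml returns 0 on success.
                int result = f.ToHtml(htmlFile, from, to);

                // Open the resulting HTML document in the default browser.
                // UseShellExecute = true is required on .NET Core and .NET 5+.
                if (result == 0)
                {
                    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(htmlFile) { UseShellExecute = true });
                }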
            }
        }
    }
}

The same example in VB.NET:

Imports System
Imports System.IO
Imports SautinSoft

Namespace Sample
	Friend Class Sample
		Shared Sub Main(ByVal args() As String)
			Dim pdfFile As String = "..\..\..\simple text.pdf"
			Dim htmlFile As String = Path.ChangeExtension(pdfFile, ".html")

			' Convert PDF file to HTML file
			Dim f As New SautinSoft.PdfFocus()

			' Let's change all text to Verdana 8pt.
			f.HtmlOptions.SingleFontFamily = "Verdana"
			f.HtmlOptions.SingleFontSize = 8

			' After purchasing the license, please insert your serial number here to activate the component:
			'f.Serial = "XXXXXXXXXXX"

			f.OpenPdf(pdfFile)

			If f.PageCount > 0 Then
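				' Convert at most the first 3 pages.
				Dim from As Integer = 1
				Dim [to] As Integer = Math.Min(f.PageCount, 3)

				' ToHtml returns 0 on success.
				Dim result As Integer = f.ToHtml(htmlFile, from, [to])

				' Open the resulting HTML document in the default browser.
				' UseShellExecute = True is required on .NET Core and .NET 5+.
				If result = 0 Then
					System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo(htmlFile) With {.UseShellExecute = True})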
				End If
			End If
		End Sub
	End Class
End Namespace
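For the automated, server-side scenario described earlier, the same options can be applied to every PDF in a folder. This is a minimal sketch that uses only the PdfFocus members shown in the sample above; the class name BatchSample, the method ConvertFolder, and the folder paths in the usage note are hypothetical. Unlike the sample, it converts all pages of each document and creates a fresh PdfFocus instance per file.

```csharp
using System;
using System.IO;
using SautinSoft;

// Hypothetical batch helper built only from the PdfFocus members used in the sample.
public static class BatchSample
{
    // Converts every *.pdf in inputDir to HTML in outputDir with all text set to
    // Verdana 8pt. Returns how many files ToHtml reported as converted (result 0).
    public static int ConvertFolder(string inputDir, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        int converted = 0;

        foreach (string pdfFile in Directory.GetFiles(inputDir, "*.pdf"))
        {
            // A fresh converter per file keeps documents isolated from each other.
            PdfFocus f = new PdfFocus();
            f.HtmlOptions.SingleFontFamily = "Verdana";
            f.HtmlOptions.SingleFontSize = 8;

            // After purchasing the license, insert your serial number here:
            //f.Serial = "XXXXXXXXXXX";

            f.OpenPdf(pdfFile);
            if (f.PageCount == 0)
            {
                Console.WriteLine("Skipped (no pages or could not open): " + pdfFile);
                continue;
            }

            string htmlFile = Path.Combine(outputDir,
                Path.GetFileNameWithoutExtension(pdfFile) + ".html");

            // Convert all pages; as in the sample, 0 means success.
            if (f.ToHtml(htmlFile, 1, f.PageCount) == 0)
                converted++;
            else
                Console.WriteLine("Failed: " + pdfFile);
        }

        return converted;
    }
}
```

For example, BatchSample.ConvertFolder(@"..\..\..\pdfs", @"..\..\..\html") would return the number of documents converted successfully. Creating one converter per file costs a little extra overhead, but it ensures that nothing from one document carries over to the next.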



If you need a new code example or have a question, email us at support@sautinsoft.com.



Questions and suggestions from you are always welcome!

We have been developing .NET components since 2002. We work with PDF, DOCX, RTF, HTML, XLSX and image formats. If you need any assistance with creating, modifying or converting documents in various formats, we can help. We will write any code example for you absolutely free.