How to set a single font for all text when converting PDF to HTML in C# and .NET
Converting PDF files to other formats is a common task for developers, whether the goal is compatibility, better readability, or preparing content for further processing. One popular tool for this is the PDF Focus .NET component from the SautinSoft SDK, which converts PDF to HTML, Word, RTF, and other formats. Specifying a single font during or after conversion makes the final document more consistent and easier to manage.
In this article, we'll show how to specify a single font for all text when converting PDF to HTML in C# and .NET. We'll explain the advantages of this approach, who might benefit from it, and what to consider when implementing it.
When a PDF is converted to HTML, the resulting document sometimes mixes several fonts, which makes the content less visually consistent and harder to read.
This is especially true:
- When automatically processing large numbers of documents.
- When a consistent appearance is required for web publishing or internal systems.
- When styles must be standardized as a baseline for further formatting.
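For the automated, high-volume case in the list above, a batch loop might look like the following minimal sketch. It uses only the PdfFocus members that appear in this article's complete sample (HtmlOptions.SingleFontFamily, HtmlOptions.SingleFontSize, OpenPdf, PageCount, and ToHtml). The ConvertFolder helper and the folder names are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;
using SautinSoft;

public static class BatchSample
{
    // Hypothetical helper: converts every PDF in inputDir to HTML in outputDir,
    // forcing one font family and size. Returns the number of successful conversions.
    public static int ConvertFolder(string inputDir, string outputDir, string fontFamily, int fontSize)
    {
        Directory.CreateDirectory(outputDir);
        int converted = 0;

        foreach (string pdfFile in Directory.GetFiles(inputDir, "*.pdf"))
        {
            string htmlFile = Path.Combine(
                outputDir, Path.GetFileNameWithoutExtension(pdfFile) + ".html");

            // A fresh converter per file keeps the sketch simple.
            PdfFocus f = new PdfFocus();
            f.HtmlOptions.SingleFontFamily = fontFamily;
            f.HtmlOptions.SingleFontSize = fontSize;

            f.OpenPdf(pdfFile);

            // As in the complete sample, a result of 0 is treated as success.
            if (f.PageCount > 0 && f.ToHtml(htmlFile, 1, f.PageCount) == 0)
            {
                converted++;
            }
        }

        return converted;
    }

    public static void Main(string[] args)
    {
        // Folder names are assumptions chosen for illustration.
        int count = ConvertFolder(@"..\..\..\input", @"..\..\..\output", "Verdana", 8);
        Console.WriteLine("Converted {0} file(s).", count);
    }
}
```

Files that fail to open or convert are simply skipped here; a production job would likely log them for review.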
Characteristics of this procedure:
- Automation: often used in document processing systems at the script or server application level.
- Flexibility: eliminates dependence on source PDF styles, which can be heterogeneous.
- Extensibility: can be combined with more complex styling, such as font sizes, colors, and line spacing.
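One way to combine the single font with additional styling is to post-process the generated HTML after conversion. The sketch below is plain C# string handling and is not part of the PDF Focus API. AddGlobalStyle is a hypothetical helper, and whether the override takes effect depends on how the converted page declares its own styles.

```csharp
using System;
using System.Globalization;

public static class HtmlStyleSample
{
    // Hypothetical helper: inserts a <style> block just before </head>.
    // If the markup has no </head>, the block is prepended instead.
    public static string AddGlobalStyle(string html, string color, double lineHeight)
    {
        // Invariant culture ensures the decimal separator is a dot, as CSS requires.
        string css = string.Format(
            CultureInfo.InvariantCulture,
            "<style>body, body * {{ color: {0} !important; line-height: {1} !important; }}</style>",
            color, lineHeight);

        int headEnd = html.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);
        return headEnd >= 0 ? html.Insert(headEnd, css) : css + html;
    }

    public static void Main()
    {
        string html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>";
        Console.WriteLine(AddGlobalStyle(html, "#222222", 1.4));
    }
}
```

In practice the input would be the text of the HTML file written by the converter, read with File.ReadAllText and saved back with File.WriteAllText.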
A related point: sometimes it is useful not only to set a single font for the output but also to replace fonts within the PDF before conversion (for example, if the source PDF contains non-standard fonts), which makes the resulting HTML even more predictable.
Complete code
C#
using System;
using System.Diagnostics;
using System.IO;
using SautinSoft;

namespace Sample
{
    class Sample
    {
        static void Main(string[] args)
        {
            string pdfFile = @"..\..\..\simple text.pdf";
            string htmlFile = Path.ChangeExtension(pdfFile, ".html");

            // Convert the PDF file to an HTML file.
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

            // Render all text as Verdana 8pt.
            f.HtmlOptions.SingleFontFamily = "Verdana";
            f.HtmlOptions.SingleFontSize = 8;

            // After purchasing the license, please insert your serial number here to activate the component:
            //f.Serial = "XXXXXXXXXXX";

            f.OpenPdf(pdfFile);

            if (f.PageCount > 0)
            {
                // Convert at most the first three pages.
                int from = 1;
                int to = Math.Min(3, f.PageCount);
                int result = f.ToHtml(htmlFile, from, to);

                // Show the resulting HTML document in a browser.
                if (result == 0)
                {
                    // UseShellExecute = true is required on .NET Core and later
                    // to open a file with its associated application.
                    Process.Start(new ProcessStartInfo(htmlFile) { UseShellExecute = true });
                }
            }
        }
    }
}
VB.NET
Imports System
Imports System.Diagnostics
Imports System.IO
Imports SautinSoft

Namespace Sample
    Friend Class Sample
        Shared Sub Main(ByVal args() As String)
            Dim pdfFile As String = "..\..\..\simple text.pdf"
            Dim htmlFile As String = Path.ChangeExtension(pdfFile, ".html")

            ' Convert the PDF file to an HTML file.
            Dim f As New SautinSoft.PdfFocus()

            ' Render all text as Verdana 8pt.
            f.HtmlOptions.SingleFontFamily = "Verdana"
            f.HtmlOptions.SingleFontSize = 8

            ' After purchasing the license, please insert your serial number here to activate the component:
            'f.Serial = "XXXXXXXXXXX"

            f.OpenPdf(pdfFile)

            If f.PageCount > 0 Then
                ' Convert at most the first three pages.
                Dim from As Integer = 1
                Dim [to] As Integer = Math.Min(3, f.PageCount)
                Dim result As Integer = f.ToHtml(htmlFile, from, [to])

                ' Show the resulting HTML document in a browser.
                If result = 0 Then
                    ' UseShellExecute = True is required on .NET Core and later
                    ' to open a file with its associated application.
                    Process.Start(New ProcessStartInfo(htmlFile) With {.UseShellExecute = True})
                End If
            End If
        End Sub
    End Class
End Namespace
If you need a new code example or have a question, email us at support@sautinsoft.com.