How to set the location of images when converting PDF to HTML in C# and .NET
Converting PDF files to HTML is a common task for developers working with documents, websites, and process automation, especially when the output must preserve the structure and appearance of the original files. In this article, we'll examine one key aspect: configuring where images are placed when converting PDF to HTML with the PDF Focus .NET component from SautinSoft. We'll explain what this involves, what benefits it offers, and how to implement it correctly.
Converting PDF to HTML turns PDF content into a format suitable for display in a browser and for integration into web applications.
This approach allows you to:
- Make content more accessible.
- Make it editable and indexable by search engines.
- Integrate it easily into web environments and automated systems.
However, the quality of the conversion depends heavily on the careful placement of elements, especially images and graphics.
By default, when converting PDF to HTML, images may end up misplaced or incorrectly positioned because of differences between the two formats or the converter's settings.
Therefore, it is important to control image placement precisely, so that the document's design and structure are preserved and the result reads well in the browser.
Controlling image placement is especially useful when you need to:
- Maintain precise image positioning to preserve the document's visual presentation.
- Integrate images into a specific part of the page, for example, next to text.
- Create responsive output that matches the website design.
- Provide a clear structure for automatic processing or indexing.
In SautinSoft PDF Focus .NET, image placement during conversion is controlled through parameters and settings that allow you to:
- Specify where to place images: inline with text, in a separate folder, or at specified coordinates.
- Prevent automatic insertion of images in incorrect positions.
- Override styles or add custom CSS classes for images in the final HTML.
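To make the "inline" option concrete: an image embedded in an HTML document is usually an `<img>` element whose `src` is a base64 data URI. The sketch below builds such a tag with plain .NET calls only. It does not use PDF Focus .NET, the `InlineImageDemo` class and `ToImgTag` method are hypothetical names chosen for illustration, and the exact markup the component emits may differ.

```csharp
using System;

// Hypothetical helper, not part of PDF Focus .NET: shows what an image
// embedded as base64 looks like inside an HTML document.
public static class InlineImageDemo
{
    // Builds an <img> tag whose src is a data URI carrying the image bytes.
    public static string ToImgTag(byte[] imageBytes, string mimeType)
    {
        string base64 = Convert.ToBase64String(imageBytes);
        return $"<img src=\"data:{mimeType};base64,{base64}\" />";
    }

    public static void Main()
    {
        // The first three bytes of a JPEG file, used here as stand-in data.
        byte[] sample = { 0xFF, 0xD8, 0xFF };

        // Prints: <img src="data:image/jpeg;base64,/9j/" />
        Console.WriteLine(ToImgTag(sample, "image/jpeg"));
    }
}
```

Embedding keeps the HTML self-contained in one file, while base64 encoding makes the image data roughly a third larger than the original bytes. Storing images in a folder avoids that overhead but means the folder must be shipped along with the HTML file.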
Complete code (C# first, then the VB.NET equivalent)
using System;
using System.IO;
using SautinSoft;

namespace Sample
{
    class Sample
    {
        static void Main(string[] args)
        {
            // Before starting, we recommend getting a free key:
            // https://sautinsoft.com/start-for-free/

            // Apply the key here:
            // SautinSoft.PdfFocus.SetLicense("...");

            // This sample shows how to control where images are stored in the resulting HTML document.
            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");
            // Use a full path: Path.GetDirectoryName() returns an empty string for a bare file name.
            string htmlFile = Path.GetFullPath("Result.html");

            // Convert the PDF file to an HTML file.
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

            // Way 1 (default): images are stored inside the HTML document as base64-encoded JPEG images.
            /*
            f.HtmlOptions.IncludeImageInHtml = true;
            // 'Auto' keeps the same image format as in the source PDF;
            // 'Jpeg' makes the document smaller;
            // 'Png' keeps the highest quality, but also gives the largest size.
            f.EmbeddedImagesFormat = PdfFocus.eImageFormat.Jpeg;
            */

            // Way 2: images are stored as JPG files in a special folder "{pdf name}_images".
            // Images get the names "picture100.jpg", "picture101.jpg" .. "pictureN.jpg".
            f.HtmlOptions.ImageFolder = Path.GetDirectoryName(htmlFile);
            // 'Auto' keeps the same image format as in the source PDF;
            // 'Jpeg' makes the document smaller;
            // 'Png' keeps the highest quality, but also gives the largest size.
            f.EmbeddedImagesFormat = PdfFocus.eImageFormat.Jpeg;
            // Set the JPEG quality to 95 percent.
            f.EmbeddedJpegQuality = 95;
            f.HtmlOptions.ImageSubFolder = String.Format("{0}_images", Path.GetFileNameWithoutExtension(pdfFile));
            f.HtmlOptions.ImageFileName = "picture";
            f.HtmlOptions.ImageNumStart = 100;
            f.HtmlOptions.IncludeImageInHtml = false;

            // Way 3: images are stored as PNG files in the same directory as the HTML file.
            // All images on each page will be combined into a single image.
            /*
            f.HtmlOptions.ImageFolder = Path.GetDirectoryName(htmlFile);
            // 'Jpeg' makes the document smaller; 'Png' keeps the highest quality.
            f.EmbeddedImagesFormat = PdfFocus.eImageFormat.Png;
            f.HtmlOptions.ImageSubFolder = "";
            f.HtmlOptions.IncludeImageInHtml = false;
            */

            f.OpenPdf(pdfFile);

            if (f.PageCount > 0)
            {
                int res = f.ToHtml(htmlFile);

                // Open the result for demonstration purposes.
                if (res == 0)
                    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(htmlFile) { UseShellExecute = true });
            }
        }
    }
}
Imports System
Imports System.IO
Imports SautinSoft

Namespace Sample
    Friend Class Sample
        Shared Sub Main(ByVal args() As String)
            ' Before starting, we recommend getting a free key:
            ' https://sautinsoft.com/start-for-free/

            ' Apply the key here:
            ' SautinSoft.PdfFocus.SetLicense("...")

            ' This sample shows how to control where images are stored in the resulting HTML document.
            Dim pdfFile As String = Path.GetFullPath("..\..\..\simple text.pdf")
            ' Use a full path: Path.GetDirectoryName() returns an empty string for a bare file name.
            Dim htmlFile As String = Path.GetFullPath("Result.html")

            ' Convert the PDF file to an HTML file.
            Dim f As New SautinSoft.PdfFocus()

            ' Way 1 (default): images are stored inside the HTML document as base64-encoded JPEG images.
            'f.HtmlOptions.IncludeImageInHtml = True
            ' 'Auto' keeps the same image format as in the source PDF;
            ' 'Jpeg' makes the document smaller;
            ' 'Png' keeps the highest quality, but also gives the largest size.
            'f.EmbeddedImagesFormat = PdfFocus.eImageFormat.Jpeg

            ' Way 2: images are stored as JPG files in a special folder "{pdf name}_images".
            ' Images get the names "picture100.jpg", "picture101.jpg" .. "pictureN.jpg".
            f.HtmlOptions.ImageFolder = Path.GetDirectoryName(htmlFile)
            ' 'Jpeg' makes the document smaller; 'Png' keeps the highest quality.
            f.EmbeddedImagesFormat = PdfFocus.eImageFormat.Jpeg
            ' Set the JPEG quality to 95 percent.
            f.EmbeddedJpegQuality = 95
            f.HtmlOptions.ImageSubFolder = String.Format("{0}_images", Path.GetFileNameWithoutExtension(pdfFile))
            f.HtmlOptions.ImageFileName = "picture"
            f.HtmlOptions.ImageNumStart = 100
            f.HtmlOptions.IncludeImageInHtml = False

            ' Way 3: images are stored as PNG files in the same directory as the HTML file.
            ' All images on each page will be combined into a single image.
            'f.HtmlOptions.ImageFolder = Path.GetDirectoryName(htmlFile)
            ' 'Jpeg' makes the document smaller; 'Png' keeps the highest quality.
            'f.EmbeddedImagesFormat = PdfFocus.eImageFormat.Png
            'f.HtmlOptions.ImageSubFolder = ""
            'f.HtmlOptions.IncludeImageInHtml = False

            f.OpenPdf(pdfFile)

            If f.PageCount > 0 Then
                Dim res As Integer = f.ToHtml(htmlFile)

                ' Open the result for demonstration purposes.
                If res = 0 Then
                    System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo(htmlFile) With {.UseShellExecute = True})
                End If
            End If
        End Sub
    End Class
End Namespace
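When images go to a folder (Way 2), it helps to know in advance where the files should appear. The sketch below previews the expected paths using only `System.IO`, following the naming scheme stated in the sample's comments ("{pdf name}_images" and "picture100.jpg", "picture101.jpg", ...). The `ImagePathPreview` class and its `Preview` method are hypothetical and not part of PDF Focus .NET; the `.jpg` extension and the numbering are assumptions taken from those comments, so the component's actual output may differ.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of PDF Focus .NET: previews where extracted
// images would land under the naming scheme described in the sample comments.
public static class ImagePathPreview
{
    // Returns paths of the form {imageFolder}/{subFolder}/{baseName}{n}.jpg,
    // with n counting up from numStart.
    public static string[] Preview(string imageFolder, string subFolder,
                                   string baseName, int numStart, int count)
    {
        var paths = new string[count];
        for (int i = 0; i < count; i++)
            paths[i] = Path.Combine(imageFolder, subFolder, baseName + (numStart + i) + ".jpg");
        return paths;
    }

    public static void Main()
    {
        string pdfFile = "simple text.pdf";

        // Path.GetDirectoryName("Result.html") returns an empty string because
        // the name has no directory part, so resolve the full path first.
        string htmlFile = Path.GetFullPath("Result.html");
        string imageFolder = Path.GetDirectoryName(htmlFile) ?? ".";
        string subFolder = Path.GetFileNameWithoutExtension(pdfFile) + "_images";

        // Print the first three expected image paths.
        foreach (string path in Preview(imageFolder, subFolder, "picture", 100, 3))
            Console.WriteLine(path);
    }
}
```

Comparing these paths with the files actually written after `ToHtml` is a quick way to confirm that the folder settings took effect.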
If you need a new code example or have a question, email us at support@sautinsoft.com.