Convert PDF to HTML in memory using C# and .NET


Document processing in modern applications increasingly requires dynamic file format conversion without saving intermediate files to disk. One popular scenario is converting a PDF document to HTML for display in web interfaces or for further processing. In this article, we'll explore how to implement in-memory PDF to HTML conversion in C# and .NET using the powerful PDF Focus .NET component from SautinSoft library.
In-memory PDF to HTML conversion is a process in which the entire data stream is converted without writing intermediate files to disk. This is critical in new applications where security, speed, and efficiency are paramount.

Key Benefits:

  • Data Security: Avoiding the need to save files to disk reduces the risk of data leakage.
  • Performance: Avoiding disk read/write operations speeds up processing.
  • Integration: Easy implementation in ASP.NET applications, cloud solutions, APIs, and services.

Key Concept.
Using the PDF Focus API, you can load a PDF document from memory (for example, from a byte array or stream), convert it to HTML, and retrieve the result as a string or stream—all without touching the file on disk.

This approach is widely used in various scenarios:

  • Web applications and portals: displaying PDF documents in the browser without storing them on the server.
  • Document processing and storage: automatically converting PDFs to HTML for indexing, searching, and further analysis.
  • Mobile and cloud solutions: avoiding file system operations, reducing load and increasing speed.
  • Integration with security systems: avoiding the use of file streams to reduce the risk of leaks.

Other important considerations and tips:

  • Licensing: ensure you have a valid license key for PDF Focus .NET to use it in commercial applications.
  • Performance: working with large files may require resource management and tuning API parameters.
  • Converting complex documents: PDF Focus handles a wide range of PDFs well, but in rare cases, HTML post-processing may be required to remove artifacts.
  • Error handling: Always implement exception handling to avoid data loss or crashes.
  • HTML customization: You can further manage styles, CSS, and markup during conversion to tailor the result to your project's needs.

Complete code

using System;
using System.IO;

namespace Sample
{
    class Sample
    {
        static void Main(string[] args)
        {
            // Before starting, we recommend to get a free key:
            // https://sautinsoft.com/start-for-free/
            
            // Apply the key here:
            // SautinSoft.PdfFocus.SetLicense("...");
			
            ConvertPdfBytesToHtml();
            //ConvertPdfStreamToHtml();
        }

        private static void ConvertPdfBytesToHtml()
        {
            // We need files only for demonstration purposes.
            // The whole conversion process will be done in memory.
            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");
            string htmlFile = "Result.html";
			
            // Convert PDF to HTML in memory
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();


            // Let's force the component to store images inside HTML document
            // using base-64 encoding.
            // Thus the component will not use HDD.
            f.HtmlOptions.IncludeImageInHtml = true;
            f.HtmlOptions.Title = "Simple text";            

            // Read a PDF document to byte array
            // Assume that we already have the  PDF as array of bytes.
            byte[] pdf = File.ReadAllBytes(pdfFile);

            f.OpenPdf(pdf);

            if (f.PageCount > 0)
            {
                // Convert PDF to HTML in memory
                string html = f.ToHtml();

                // Save HTML to the file only for demonstration purpose.
                if (html != null)
                {
                    File.WriteAllText(htmlFile, html);
                    // Open the result for demonstration purposes.
                        System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(htmlFile) { UseShellExecute = true });

                }
            }
        }
        private static void ConvertPdfStreamToHtml()
        {
            // We need files only for demonstration purposes.
            // The whole conversion process will be done in memory.
            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");
            string htmlFile = "Result.html";

                                  // Get your free key here:   
			 // https://sautinsoft.com/start-for-free/
			
            // Convert PDF to HTML in memory
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();


            // Let's force the component to store images inside HTML document
            // using base-64 encoding.
            // Thus the component will not use HDD.
            f.HtmlOptions.IncludeImageInHtml = true;
            f.HtmlOptions.Title = "Simple text";

            // Assume that we have a PDF document as Stream.
            using (FileStream fs = File.OpenRead(pdfFile))
            {
                f.OpenPdf(fs);

                if (f.PageCount > 0)
                {
                    // Convert PDF to HTML to a MemoryStream.
                    using (MemoryStream msHtml = new MemoryStream())
                    {
                        int res = f.ToHtml(msHtml);
                        // Open the result for demonstration purposes.
                        if (res == 0)
                        {
                            File.WriteAllBytes(htmlFile, msHtml.ToArray());
                            System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(htmlFile) { UseShellExecute = true });
                        }
                    }
                }
            }
        }
    }
}

Download

Imports System
Imports System.IO

Namespace Sample
    Friend Class Sample
        Shared Sub Main(ByVal args() As String)
			' Before starting, we recommend to get a free key:
			' https://sautinsoft.com/start-for-free/

			' Apply the key here
			' SautinSoft.PdfFocus.SetLicense("...");

            ConvertPdfBytesToHtml()
            'ConvertPdfStreamToHtml()
        End Sub

        Private Shared Sub ConvertPdfBytesToHtml()
            ' We need files only for demonstration purposes.
            ' The whole conversion process will be done in memory.
            Dim pdfFile As String = Path.GetFullPath("..\..\..\simple text.pdf")
            Dim htmlFile As String = "Result.html"
		
            ' Convert PDF to HTML in memory
            Dim f As New SautinSoft.PdfFocus()

            ' Let's force the component to store images inside HTML document
            ' using base-64 encoding.
            ' Thus the component will not use HDD.
            f.HtmlOptions.IncludeImageInHtml = True
            f.HtmlOptions.Title = "Simple text"

            ' Read a PDF document to byte array
            ' Assume that we already have the  PDF as array of bytes.
            Dim pdf() As Byte = File.ReadAllBytes(pdfFile)

            f.OpenPdf(pdf)

            If f.PageCount > 0 Then
                ' Convert PDF to HTML in memory
                Dim html As String = f.ToHtml()

                ' Save HTML to the file only for demonstration purpose.
                If html IsNot Nothing Then
                    File.WriteAllText(htmlFile, html)
                    ' Open the result for demonstration purposes.
                    System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo(htmlFile) With {.UseShellExecute = True})

                End If
            End If
        End Sub
        Private Shared Sub ConvertPdfStreamToHtml()
            ' We need files only for demonstration purposes.
            ' The whole conversion process will be done in memory.
            Dim pdfFile As String = Path.GetFullPath("..\..\..\simple text.pdf")
            Dim htmlFile As String = "Result.html"
	                                ' Get your free key here: 
	                                ' https://sautinsoft.com/start-for-free/
		
            ' Convert PDF to HTML in memory
            Dim f As New SautinSoft.PdfFocus()

            ' Let's force the component to store images inside HTML document
            ' using base-64 encoding.
            ' Thus the component will not use HDD.
            f.HtmlOptions.IncludeImageInHtml = True
            f.HtmlOptions.Title = "Simple text"

            ' Assume that we have a PDF document as Stream.
            Using fs As FileStream = File.OpenRead(pdfFile)
                f.OpenPdf(fs)

                If f.PageCount > 0 Then
                    ' Convert PDF to HTML to a MemoryStream.
                    Using msHtml As New MemoryStream()
                        Dim res As Integer = f.ToHtml(msHtml)
                        ' Open the result for demonstration purposes.
                        If res = 0 Then
                            File.WriteAllBytes(htmlFile, msHtml.ToArray())
                            System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo(htmlFile) With {.UseShellExecute = True})
                        End If
                    End Using
                End If
            End Using
        End Sub
    End Class
End Namespace

Download


If you need a new code example or have a question: email us at support@sautinsoft.com or ask at Online Chat (right-bottom corner of this page) or use the Form below:


Captcha

Questions and suggestions from you are always welcome!

We are developing .Net components since 2002. We know PDF, DOCX, RTF, HTML, XLSX and Images formats. If you need any assistance with creating, modifying or converting documents in various formats, we can help you. We will write any code example for you absolutely free.