How to convert DOCX to HTML in C# and .NET. Various examples.

RTF to HTML .Net

.Net assembly to convert Text, RTF and DOCX to HTML 3.2, 4.01, XHTML and HTML5 in .Net and C#.

Introduction

With "RTF to HTML .Net", any .NET application can convert DOCX documents to HTML and XHTML. For example, to convert a DOCX file to HTML in C#, add a reference to the .dll and write a few lines of code:


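            // Create the converter and load the DOCX file.
            SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
            r.OpenDocx(@"c:\Quiet Flows the Don.docx");

            // Choose HTML5 output and write the result to disk.
            r.OutputFormat = SautinSoft.RtfToHtml.eOutputFormat.HTML_5;
            r.ToHtml(@"c:\Quiet Flows the Don.html");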

The library provides a full API for converting DOCX to HTML. During conversion you can also adjust the following:

  • Output format: HTML 3.2, HTML 4.01, HTML 5, XHTML or Text.
  • Generate the output document in plain HTML 3.2 without CSS.
  • Store images on the file system or embed them into the HTML document using Base64 encoding.
  • Save CSS data in a <style>...</style> block or as inline styles <tag style="...">.
  • Specify the encoding of the output HTML.
  • Set the document title; create only the part of the HTML between the <body>...</body> tags.
  • Set a common font, size and color for the whole document.
  • Detect hyperlinks in text and turn them into real hyperlinks.
  • Override the visibility of table borders.
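
As a rough illustration, several of these options can be combined in one C# snippet. The sketch below uses only the members that appear in the examples on this page (OutputFormat, Encoding, TextStyle.InlineCSS, ImageStyle.IncludeImageInHtml, OpenDocx, ToHtml). The input path is hypothetical, and the project is assumed to reference the "RTF to HTML .Net" assembly; properties for the remaining options are not shown here because this page does not name them.

```csharp
using System.IO;

class ConversionOptionsSketch
{
    static void Main()
    {
        // Hypothetical input path; replace it with a real DOCX file.
        string docxFile = @"d:\sample.docx";
        string htmlFile = Path.ChangeExtension(docxFile, ".html");

        SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();

        // Output format: HTML5.
        r.OutputFormat = SautinSoft.RtfToHtml.eOutputFormat.HTML_5;
        // Encoding of the output HTML.
        r.Encoding = SautinSoft.RtfToHtml.eEncoding.UTF_8;
        // Write CSS as inline style="..." attributes instead of a <style> block.
        r.TextStyle.InlineCSS = true;
        // Embed images into the HTML as Base64 instead of saving them as files.
        r.ImageStyle.IncludeImageInHtml = true;

        r.OpenDocx(docxFile);
        r.ToHtml(htmlFile);
    }
}
```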

Download

To see this functionality firsthand, download the latest «RTF to HTML .Net» with code examples (51.8 MB).

Limitations

The free version of RTF to HTML .Net has the following limitations: it adds the trial notice "Created by unlicensed version of RTF to HTML .Net" and randomly inserts the word "TRIAL".


Examples of converting DOCX to HTML in C# and VB.Net

1. Convert DOCX file to HTML file in C#:

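            // Requires: using System.IO; using System.Diagnostics;
            SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
            string docxFile = @"d:\The Gift.docx";
            string htmlFile = Path.ChangeExtension(docxFile, ".html");

            r.OpenDocx(docxFile);
            r.OutputFormat = SautinSoft.RtfToHtml.eOutputFormat.HTML_5;
            if (r.ToHtml(htmlFile))
            {
                // Open the HTML file in the default browser.
                // UseShellExecute is set explicitly because it defaults to false on .NET Core.
                Process.Start(new ProcessStartInfo(htmlFile) { UseShellExecute = true });
            }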

2. Convert DOCX to HTML in memory using C#; store images inside the HTML using Base64:

            // Requires: using System.IO;
            SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
            string docxFile = @"d:\The Gift.docx";
            byte[] docxBytes = File.ReadAllBytes(docxFile);

            // Store all images inside the HTML document.
            r.ImageStyle.IncludeImageInHtml = true;

            r.OpenDocx(docxBytes);
            string htmlString = r.ToHtml();
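
To make the idea of Base64-embedded images concrete, here is a minimal standalone sketch of the general data URI technique, using only the .NET base class library. It does not show how "RTF to HTML .Net" builds its output internally; the class and method names (DataUriSketch, ToImgTag) are hypothetical, and the three placeholder bytes stand in for real image data.

```csharp
using System;

static class DataUriSketch
{
    // Builds an <img> tag whose src is a data URI carrying the image bytes as Base64.
    public static string ToImgTag(byte[] imageBytes, string mimeType)
    {
        string base64 = Convert.ToBase64String(imageBytes);
        return "<img src=\"data:" + mimeType + ";base64," + base64 + "\" />";
    }

    static void Main()
    {
        // Three placeholder bytes stand in for real image data.
        byte[] fakeImage = { 1, 2, 3 };

        // Prints: <img src="data:image/png;base64,AQID" />
        Console.WriteLine(ToImgTag(fakeImage, "image/png"));
    }
}
```

Embedding images this way keeps the HTML self-contained in a single string or file, at the cost of a larger document, since Base64 text is roughly a third bigger than the raw bytes.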

3. Convert DOCX to HTML in VB.Net; make all CSS styles inline, i.e. place them within the style="..." attribute:

            ' Requires: Imports System.IO
            Dim r As New SautinSoft.RtfToHtml()

            ' Set HTML5 format.
            r.OutputFormat = SautinSoft.RtfToHtml.eOutputFormat.HTML_5
            ' Set UTF-8 encoding.
            r.Encoding = SautinSoft.RtfToHtml.eEncoding.UTF_8
            ' Make all CSS inline.
            r.TextStyle.InlineCSS = True

            Dim docxFile As String = "e:\Petersburg.docx"
            Dim htmlFile As String = Path.ChangeExtension(docxFile, ".html")
            r.ConvertFile(docxFile, htmlFile)

4. Convert DOCX to HTML in C#; get the list of all images from the DOCX:

            // Requires: using System.IO; using System.Collections.Generic;
            SautinSoft.RtfToHtml r = new SautinSoft.RtfToHtml();
            string docxFile = @"d:\The Gift.docx";
            byte[] docxBytes = File.ReadAllBytes(docxFile);

            // Store all images inside the HTML document.
            r.ImageStyle.IncludeImageInHtml = true;

            // The list is filled with the document's images during conversion.
            List<SautinSoft.RtfToHtml.SautinImage> listImages = new List<SautinSoft.RtfToHtml.SautinImage>();
            r.OpenDocx(docxBytes);
            string htmlString = r.ToHtml(listImages);

            // Loop over the images and save each one to disk.
            int count = 1;
            foreach (SautinSoft.RtfToHtml.SautinImage img in listImages)
            {
                img.Img.Save(String.Format(@"d:\image{0}.png", count));
                count++;
            }
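
The in-memory examples end with an HTML string and leave the next step open. One option is to write that string to disk with an explicit encoding. The sketch below uses only the .NET base class library; the class name SaveHtmlSketch, the placeholder markup and the temporary file name are hypothetical, and it assumes the converter was configured for UTF-8 so that the bytes on disk match the charset the HTML declares.

```csharp
using System;
using System.IO;
using System.Text;

static class SaveHtmlSketch
{
    // Writes an HTML string to disk as UTF-8 without a byte order mark.
    public static void Save(string html, string path)
    {
        File.WriteAllText(path, html, new UTF8Encoding(false));
    }

    static void Main()
    {
        // Placeholder markup standing in for the string returned by ToHtml().
        string htmlString = "<html><body><p>Привет</p></body></html>";
        string path = Path.Combine(Path.GetTempPath(), "sketch.html");

        Save(htmlString, path);

        // Read the file back as UTF-8 and confirm the text survived the round trip.
        Console.WriteLine(File.ReadAllText(path, Encoding.UTF8) == htmlString);
    }
}
```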

Requirements and Technical Information

«RTF to HTML .Net» can be used on 32-bit and 64-bit platforms with .NET Framework 4.0 and higher or .NET Core 2.0 and higher. The component doesn't require Internet Explorer, Microsoft Office or any other software. It is a fully standalone, independent library.

DOCX conversion requires .NET Framework 4.0 or higher, or .NET Core 2.0 or higher. If you are looking for a standalone C# library to create and parse Word documents, try our Document .Net.

Our product is compatible with all .NET languages and supports all operating systems where .NET Framework can be used. Note that «RTF to HTML .Net» is written entirely in managed C#.

.NET Framework 4.0 and higher and .NET Core 2.0 and higher:

  • .NET Framework 4.0, 4.5, 4.6.1 and higher.
  • .NET Standard 2.0.
  • .NET Core 2.0 and higher.


Multi-platform component, runs on:


Our component has proven itself on cloud platforms and services:

  • Microsoft Azure
  • Amazon Web Services (AWS)
  • Google Cloud Platform
  • SharePoint
  • Docker
  • etc.

If you need a new code example or have a question, email us at support@sautinsoft.com.