How to extract Text from HTML using C# and VB.NET

HTML to RTF .Net

.Net assembly to convert HTML to Text, RTF and DOCX in .Net and C#.


If you are looking for a .NET solution to extract text from an HTML document, you are in the right place. To illustrate, here is the simplest C# code:

            SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();
            string htmlString = "Hello World!";
            string outputFile = @"c:\Test\result.txt";

            // Load the HTML, then save its text content to a file.
            if (h.OpenHtml(htmlString))
            {
                bool ok = h.ToText(outputFile);

                // Open the result for demonstration purposes.
                if (ok)
                    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(outputFile) {
                    UseShellExecute = true });
            }

You may be surprised by how much functionality is built in, including the ability to convert to other formats such as RTF and DOCX.

Figure: overview of the HtmlToRtf .Net class.


To see this functionality firsthand, download the latest «HTML to RTF .Net» with code examples (19.7 MB).


The free version of «HTML to RTF .Net» has two limitations: it adds the trial notice "Created by unlicensed version of HTML to RTF .Net" and randomly inserts the word "TRIAL".

Three examples to extract Text from HTML in C# and VB.NET

1. Simple extraction of Text from HTML file in C#:

            // Requires: using System.IO; using SautinSoft;
            SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();

            string htmlFile = @"d:\Resurrection.html";
            // Write the result next to the source file, with a .txt extension.
            string textFile = Path.ChangeExtension(htmlFile, ".txt");

            h.OutputFormat = HtmlToRtf.eOutputFormat.TextUnicode;
            h.ConvertFile(htmlFile, textFile);

2. Convert HTML to Text in memory using C#:

            // Requires: using System.IO; using SautinSoft;
            SautinSoft.HtmlToRtf h = new SautinSoft.HtmlToRtf();

            string htmlFile = @"d:\Resurrection.html";
            string htmlString = File.ReadAllText(htmlFile);

            // Start the conversion; the extracted text is returned as a string.
            h.OutputFormat = HtmlToRtf.eOutputFormat.TextAnsi;
            string textString = h.ConvertString(htmlString);
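
An in-memory conversion usually ends with the text being stored somewhere. The following is a minimal sketch of that step, not part of the library: the class TextExtractionSketch and the method SaveExtractedText are hypothetical names introduced here for illustration. The converter is passed in as a delegate, so the helper itself does not depend on the SautinSoft assembly. Treating an empty result as a failure is an assumption of this sketch, not documented library behavior.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of «HTML to RTF .Net».
public static class TextExtractionSketch
{
    // 'convert' stands in for an HTML-to-text function such as
    // h.ConvertString from the example above (assumed to take and return a string).
    public static bool SaveExtractedText(Func<string, string> convert, string html, string textFile)
    {
        if (convert == null) throw new ArgumentNullException(nameof(convert));

        string text = convert(html);

        // Assumption: a null or empty result means nothing was extracted.
        if (string.IsNullOrEmpty(text))
            return false;

        // Write the extracted text as UTF-8.
        File.WriteAllText(textFile, text, Encoding.UTF8);
        return true;
    }
}
```

Assuming ConvertString has the signature used in the example above, the call could look like TextExtractionSketch.SaveExtractedText(h.ConvertString, htmlString, @"d:\Resurrection.txt").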

3. Extract Text from HTML in memory using VB.NET:

			' Requires: Imports System.IO and Imports SautinSoft at the top of the file.
			Dim h As New SautinSoft.HtmlToRtf()

			Dim htmlFile As String = "d:\Resurrection.html"
			Dim htmlString As String = File.ReadAllText(htmlFile)

			' Start the conversion; the extracted text is returned as a string.
			h.OutputFormat = HtmlToRtf.eOutputFormat.TextUnicode
			Dim textString As String = h.ConvertString(htmlString)

Requirements and Technical Information

Requires only .NET Framework 4.0 or higher, or .NET Core 2.0 or higher. The product is compatible with all .NET languages and supports every operating system on which .NET Framework or .NET Core can be used.

Note that «HTML to RTF .Net» is written entirely in managed C#, which makes it a standalone, independent library.

.NET Framework 4.5, 4.6.1 and higher.

.NET Standard 2.0

.NET Core, .NET 5.0 and higher.

Multi-platform component, runs on:

Our component has proven itself on cloud platforms and services:

  • Microsoft Azure
  • Amazon Web Services (AWS)
  • Google Cloud Platform
  • SharePoint
  • Docker
  • Xamarin Forms
  • etc.