Processing RTL documents is a common need for international businesses, government agencies, educational institutions, and many other organizations. Virtual assistants, automated translation systems, search engines, and archival storage systems all depend on accurate data extraction from RTL-language content. This article shows how to handle such documents in C# and .NET with the SautinSoft Document.NET SDK.
Documents in Arabic, Hebrew, and other right-to-left (RTL) languages require special handling during processing. Their conventions and structure may differ from those typical of left-to-right (LTR) text.
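As a rough illustration of why such text needs separate handling, the sketch below checks whether a string contains characters from the Hebrew or Arabic Unicode blocks, which is one simple way to decide whether content needs RTL-aware processing. It uses only the .NET base class library. The names RtlDetector, IsRtlChar, and ContainsRtl are hypothetical and are not part of Document.NET, and the list of ranges is deliberately partial.

```csharp
using System;

// Hypothetical helper for illustration; not part of SautinSoft Document.NET.
public static class RtlDetector
{
    // Returns true if the character falls in one of a few common RTL Unicode blocks.
    // The list is intentionally incomplete (for example, it skips Syriac, Thaana, and N'Ko).
    public static bool IsRtlChar(char c)
    {
        return (c >= '\u0590' && c <= '\u05FF')   // Hebrew
            || (c >= '\u0600' && c <= '\u06FF')   // Arabic
            || (c >= '\u0750' && c <= '\u077F')   // Arabic Supplement
            || (c >= '\uFB1D' && c <= '\uFDFF')   // Hebrew and Arabic presentation forms
            || (c >= '\uFE70' && c <= '\uFEFC');  // Arabic Presentation Forms-B
    }

    // Returns true if at least one character in the text is in the ranges above.
    public static bool ContainsRtl(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return false;
        }

        foreach (char c in text)
        {
            if (IsRtlChar(c))
            {
                return true;
            }
        }

        return false;
    }
}
```

A call such as RtlDetector.ContainsRtl(paragraphText) could be used to route a document to an RTL-specific pipeline. A production check would more likely rely on the Unicode bidirectional character classes than on hand-picked ranges.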
Tasks related to extracting text and data from RTL documents:
Without proper processing, there is a risk of losing important data or misinterpreting the content.
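One concrete way data can be lost is writing extracted text with an encoding that cannot represent RTL scripts. The sketch below is independent of Document.NET and uses only the .NET base class library. It round-trips a string through a given encoding so the result can be compared with the original. The names EncodingDemo and RoundTrip are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of SautinSoft Document.NET.
public static class EncodingDemo
{
    // Encodes the text to bytes and decodes it back with the same encoding.
    // If the encoding cannot represent a character, the result differs from the input.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

With Encoding.UTF8 a Hebrew or Arabic string comes back unchanged, whereas Encoding.ASCII replaces every character it cannot represent with a question mark, so the original content is unrecoverable.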
Complete code
using SautinSoft.Document;
using System;
using System.IO;
using System.Linq;

namespace Sample
{
    class Sample
    {
        static void Main(string[] args)
        {
            // Get your free trial key here:
            // https://sautinsoft.com/start-for-free/
            ConvertRTLcontent();
        }

        /// <summary>
        /// How to convert documents with Right-To-Left content to HTML.
        /// </summary>
        /// <remarks>
        /// Details: https://sautinsoft.com/products/document/help/net/developer-guide/convert-documents-with-right-to-left-content-to-html.php
        /// </remarks>
        public static void ConvertRTLcontent()
        {
            string sourcePath = @"..\..\..\RTL.docx";
            string destPath = "RTL.html";

            // Load a document with Arabic, Hindi, Hebrew content.
            DocumentCore dc = DocumentCore.Load(sourcePath);

            // Save the document as HTML.
            dc.Save(destPath, new HtmlFixedSaveOptions());

            // Open the source and the destination documents in their default applications.
            System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(sourcePath) { UseShellExecute = true });
            System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(destPath) { UseShellExecute = true });
        }
    }
}

Imports SautinSoft.Document
Imports System
Imports System.IO
Imports System.Linq

Namespace Sample
    Friend Class Sample
        Shared Sub Main(ByVal args() As String)
            ' Get your free trial key here:
            ' https://sautinsoft.com/start-for-free/
            ConvertRTLcontent()
        End Sub

        ''' <summary>
        ''' How to convert documents with Right-To-Left content to HTML.
        ''' </summary>
        ''' <remarks>
        ''' Details: https://sautinsoft.com/products/document/help/net/developer-guide/convert-documents-with-right-to-left-content-to-html.php
        ''' </remarks>
        Public Shared Sub ConvertRTLcontent()
            Dim sourcePath As String = "..\..\..\RTL.docx"
            Dim destPath As String = "RTL.html"

            ' Load a document with Arabic, Hindi, Hebrew content.
            Dim dc As DocumentCore = DocumentCore.Load(sourcePath)

            ' Save the document as HTML.
            dc.Save(destPath, New HtmlFixedSaveOptions())

            ' Open the source and the destination documents in their default applications.
            System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo(sourcePath) With {.UseShellExecute = True})
            System.Diagnostics.Process.Start(New System.Diagnostics.ProcessStartInfo(destPath) With {.UseShellExecute = True})
        End Sub
    End Class
End Namespace
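The sample above assumes the source file exists and that the conversion succeeds. A minimal sketch of a more defensive variant follows. It uses only the DocumentCore.Load, Save, and HtmlFixedSaveOptions calls already shown in the sample. The wrapper class RtlConverter and the method TryConvertToHtml are hypothetical names introduced here, and catching the general Exception type is an assumption, since the specific exceptions thrown by the SDK are not described in this article.

```csharp
using System;
using System.IO;
using SautinSoft.Document;

// Hypothetical wrapper for illustration; only Load, Save, and
// HtmlFixedSaveOptions come from the sample above.
public static class RtlConverter
{
    // Returns true on success, false if the source file is missing
    // or the conversion throws.
    public static bool TryConvertToHtml(string sourcePath, string destPath)
    {
        // Fail early with a clear message instead of letting Load throw.
        if (!File.Exists(sourcePath))
        {
            Console.Error.WriteLine($"Source file not found: {sourcePath}");
            return false;
        }

        try
        {
            DocumentCore dc = DocumentCore.Load(sourcePath);
            dc.Save(destPath, new HtmlFixedSaveOptions());
            return true;
        }
        catch (Exception ex)
        {
            // Assumption: any failure during load or save is reported and swallowed.
            Console.Error.WriteLine($"Conversion failed: {ex.Message}");
            return false;
        }
    }
}
```

Returning a Boolean keeps the calling code simple in batch scenarios, where one unreadable file should not stop the conversion of the rest.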
If you need a new code example or have a question, email us at support@sautinsoft.com.