How to save a document as PDF/A with different conformance levels in C# and VB.NET

The ISO 19005 standard (PDF/A) defines requirements for creating documents suitable for archiving based on the widely available PDF format. The standard specifies in detail what content is allowed and what is not. These and other specifications are intended to ensure long-term readability of the documents regardless of the application software and operating system in which they were originally produced.

All information that is necessary for the document to be displayed unchanged each time must be embedded in the file. This includes all the contents of the document: text, bitmaps and vector graphics, fonts, color information, etc.

The constraints common to PDF/A-1, PDF/A-2, and PDF/A-3 include:

  • Audio and video content are forbidden. 3D artwork is also forbidden.
  • All fonts must be embedded and also must be legally embeddable for unlimited, universal rendering.
  • Colorspaces must be specified in a device-independent manner.
  • JavaScript and executable file launches are prohibited.
  • Encryption is disallowed.
  • Use of standards-based metadata is mandated.

The Different PDF/A Standards

PDF/A-1. The first PDF/A standard (ISO 19005-1:2005) was based on PDF version 1.4 and published in 2005. It lacks JPEG 2000 compression, transparency, layers, and attachments.

PDF/A-1a: level A satisfies all requirements in the specification. The content must be tagged with a structure tree, meaning that elements such as reading order, figures, and tables are explicitly identified by structure tags.

PDF/A-1b: level B is a lower level of conformance. It covers the requirements of this part of ISO 19005 regarding the visual appearance of electronic documents, but not their structural or semantic properties, and it does not require that all text have Unicode equivalents. It is well suited to scanned documents.

PDF/A-2. This standard (ISO 19005-2:2011) is based on PDF 1.7 (ISO 32000-1:2008) and introduces several features unavailable in PDF 1.4: highly efficient JPEG 2000 compression, support for transparency effects and layers, embedding of OpenType fonts, and provisions for digital signatures in accordance with the PAdES (PDF Advanced Electronic Signatures) standard. It also allows PDF/A files to be embedded in a PDF/A-2 file, so that a set of documents can be archived as individual documents in a single file.

PDF/A-2a: indicates complete compliance with the ISO 19005-2 requirements, including those related to structural and semantic properties of documents. A valid a-level PDF/A will have text that can be reliably searched and copied.

PDF/A-2b: level B is a lower level of conformance. It covers the requirements of this part of ISO 19005 regarding the visual appearance of electronic documents, but not their structural or semantic properties.

PDF/A-2u: level U conformance represents level B conformance with the additional requirement that all text in the document have Unicode equivalents. Therefore, u-level conformance will have text that can be reliably searched and copied, but the reading order will not be guaranteed.

PDF/A-3. This standard (ISO 19005-3:2012) is based on PDF 1.7 (ISO 32000-1:2008). PDF/A-3 adds a single but very important feature to ISO 19005-2: it allows files in any other format to be embedded in a PDF/A file, not just other PDF/A files (as allowed in PDF/A-2). Files that comply with these requirements are termed “associated” files; an explicit association must be made between each embedded file and the containing PDF, or an object or structure within it (e.g., image, page, or logical section).

However, a PDF/A viewer is not required to do anything extra with these attached files beyond ensuring their proper extraction. Therefore, the standard cannot guarantee whether you will be able to read or otherwise use these files in the future.

PDF/A-3a: indicates complete compliance with the ISO 19005-3 requirements, including those related to structural and semantic properties of documents. A valid a-level PDF/A will have text that can be reliably searched and copied.

PDF/A-3b: level B is a lower level of conformance, satisfying requirements intended to be those minimally necessary to ensure that the rendered visual appearance of a conforming file is preservable over the long term. The specification notes that Level B conforming files might not have sufficiently rich internal information to allow for the preservation of the document’s logical structure and content text stream in natural reading order, which is provided by Level A conformance.

PDF/A-3u: level U conformance represents level B conformance with the additional requirement that all text in the document have Unicode equivalents. Therefore, u-level conformance will have text that can be reliably searched and copied, but the reading order will not be guaranteed.
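The part and level descriptions above can be condensed into a small decision helper. The sketch below is only an illustration: `PdfAChooser`, `Choose`, and its parameter names are hypothetical and not part of any library, and the rules are simplified to the distinctions drawn in this article (part 1 has no level U, so a Unicode requirement there falls back to level A).

```csharp
using System;

// Hypothetical helper, not part of any library: maps archiving
// requirements to the PDF/A flavour described in the text above.
public static class PdfAChooser
{
    public static string Choose(bool needsTaggedStructure,
                                bool needsUnicodeText,
                                bool needsPdf17Features,
                                bool needsArbitraryAttachments)
    {
        // Part: 3 allows attachments in any format, 2 adds JPEG 2000,
        // transparency and layers, 1 is the PDF 1.4 baseline.
        int part = needsArbitraryAttachments ? 3
                 : needsPdf17Features ? 2
                 : 1;

        // Level: "a" = tagged structure plus Unicode text,
        // "u" = Unicode text only (parts 2 and 3), "b" = visual appearance only.
        string level;
        if (needsTaggedStructure)
            level = "a";
        else if (needsUnicodeText)
            level = part == 1 ? "a" : "u";
        else
            level = "b";

        return "PDF/A-" + part + level;
    }
}
```

For example, `PdfAChooser.Choose(false, true, true, false)` asks for searchable text and PDF 1.7 features without tagging or attachments, and yields `PDF/A-2u`.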

So, if you need to save a document as a PDF with the PDF/A-1b compliance level, just type a few lines of code.

using SautinSoft.Document;

// Load an existing DOCX file and save it as PDF/A-1b.
DocumentCore dc = DocumentCore.Load(@"d:\input.docx");
dc.Save(@"d:\output.pdf", new PdfSaveOptions()
{
    Compliance = PdfCompliance.PDF_A1b
});

The code snippet below shows how you can create a new document containing text and save this document as a PDF/A-2a.

using SautinSoft.Document;

// Build a one-line document in memory and save it as PDF/A-2a.
DocumentCore dc = new DocumentCore();
DocumentBuilder db = new DocumentBuilder(dc);
db.CharacterFormat.FontName = "Times New Roman";
db.CharacterFormat.Size = 24;
db.Writeln("Hello World!");
dc.Save(@"d:\output.pdf", new PdfSaveOptions()
{
    Compliance = PdfCompliance.PDF_A2a
});

Declaring a document as PDF/A does not guarantee that it actually conforms. Each document must be validated against the requirements of the format it declares.
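Before running a full validator, it can be useful to see what a file claims to be. PDF/A files declare their part and level in the XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below reads that declaration with a regular expression; the class and method names are hypothetical, and it assumes the XMP packet is stored uncompressed and in UTF-8, which is the usual case. It reports only the declared level and does not validate anything.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: reports the PDF/A level a file DECLARES in its XMP
// metadata. It does not check that the file actually conforms.
public static class PdfADeclaration
{
    // Handles both XMP serializations:
    //   attribute form: pdfaid:part="1"
    //   element form:   <pdfaid:part>1</pdfaid:part>
    private static string ReadProperty(string text, string name)
    {
        Match m = Regex.Match(
            text,
            "pdfaid:" + name + @"\s*(?:=\s*[""']|>)\s*([A-Za-z0-9]+)");
        return m.Success ? m.Groups[1].Value : null;
    }

    // Returns e.g. "PDF/A-1b", or null when no pdfaid:part is present.
    public static string Read(string pdfText)
    {
        string part = ReadProperty(pdfText, "part");
        if (part == null)
            return null;

        string level = ReadProperty(pdfText, "conformance") ?? "";
        return "PDF/A-" + part + level.ToLowerInvariant();
    }

    public static string ReadFromFile(string path)
    {
        // ISO-8859-1 maps every byte to exactly one char, so the binary
        // parts of the PDF cannot break decoding.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        return Read(latin1.GetString(File.ReadAllBytes(path)));
    }
}
```

Calling `PdfADeclaration.ReadFromFile(@"d:\output.pdf")` on a file saved by one of the snippets above should return the level that was requested. A `null` result means no PDF/A declaration was found in readable form.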

If you want to check your PDF/A files for validity, see the links to free online services below.

https://www.pdf-online.com/osa/validate.aspx

https://bfo.com/blog/2017/11/08/verify_pdfa_online

https://pdfrecover.herokuapp.com/pdfaconvert

For complete code examples and more information on working with PDF/A, see: https://www.sautinsoft.com/products/document/examples/create-and-save-document-in-pdf-a-format-net-csharp-vb.php

Thank you for your time.
