What is PDF Remediation? P

If you are responsible for your organization’s website and online documents, you may find yourself answering questions such as “What is Accessibility?” and “What is PDF remediation?” and “What are tags?” Below is a brief overview of what PDF remediation involves.

What does accessible mean?

“Accessible” means “able to be reached or entered.” When referring to digital content, it means able to be used by a person using assistive technology, or by someone who has a disability. Accessible documents are readable by assistive technology such as screen readers or connected Braille displays, but “accessible” also means understandable by all people, including those with cognitive disorders or brain injuries, and usable on a variety of technology and platforms. For more information about accessible digital content, check out this article, The Four Pillars of WCAG.

What are “Tags?”

Tags are digital labels that provide information to assistive technology about the elements a document contains. These can include headings, images, tables, lists, links, etc. Tags also tell assistive technology where these various elements belong in the order of the document. Tags provide in a hierarchy (or, “outline”) of how a document should be read and they provide structure. They inform assistive technology users about what they are reading and help them to more easily navigate and move through the content.

What is PDF Remediation?

PDF Remediation is the process of “tagging” digital elements of PDF documents so that they can be read using assistive technology. Again, these “tags” identify the elements and inform the assistive technology about the order in which they are meant to be read. Many organizations use the PDF file format because visually, it remains the same no matter what platform is used to open it. For visual users, the PDF format is stable and consistent across many platforms and a variety of devices.

Ideally, all documents would be created accessibly, and stay that way even if “saved as PDF.” In truth, that is often not the case. Even completely accessible documents created in MS Word, Google Docs, or other authoring tools may not be accessible when saved to PDF format. Not all existing tags are preserved when content is converted to PDF, and some elements may still require remediation in order to remain accessible. Similarly, a document that begins as inaccessible or only partially accessible will not be more accessible when saved as a PDF. Remediation is required.

The benefits of adding correct PDF tagging go beyond accessibility. They also improve the SEO of any online documents and make such documents more usable for everyone reading them.

Common PDF elements requiring remediation to be made accessible are images, headings, links, lists, tables, and reading order. Other elements may require remediation as well, but these are the most commonly occurring elements that need to be tagged.

Headings

Headings are a navigation tool that help organize a document and inform the reader of what it contains. Just like newspaper headings, document headings tell the user what type of content follows. For an assistive technology user, headings are essential in dividing content into easily understood sections. A person using assistive technology can choose to move through a document reading only the headings to tell them what it contains. Without headings, a person reading a document cannot find specific information without reading every single line of text it contains.

If, for example, the document is a handbook, and the person reading it needs help with a specific topic, they don’t want to have to read the entire 50-page document to find out how to replace batteries. Instead, they will want to navigate to that section by using headings. Users can skim through headings to find what they need, and properly tagged digital headings help assistive technology users do this.

Images

All images must have alt text in order to be understood by assistive technology. Without alt text, any image found in a PDF (or any other format) will simply read as “image” or “graphic.” This means that whatever information was meant to be conveyed by that image or graphic is unavailable to an assistive technology user. Some images are purely decorative and may not require alt text. These can be tagged as an “artifact” and will be ignored by assistive technology. This might include background images, boxes, text shadows, or repetitive logos.

Painting of George Washington Crossing the Delaware in a rowboat full of troops. He is stood in the front of the boat with a furled American Flag.

Alt text should be short and describe the image as it pertains to the content. The best way to do this is an article unto itself, but an example might be a picture of George Washington Crossing the Delaware. How the alt text is written depends on the context. Is it a document about Presidents? The Revolutionary War? Boating in the 1700s? 18th-century art? Readers need to know how the image forwards the information being conveyed in the rest of the document.

Charts, graphs, flow charts, and infographics need to be clearly and completely described. Sometimes the best way to do this is to include the data table from which the chart or graph is derived.

Links

Links within PDF documents need to be tagged as links. If the text of the document doesn’t indicate where the link leads, some explanation is necessary so the reader knows where the link is going. Otherwise, they may not realize they are leaving your website, or where they will end up. It’s the digital equivalent of jumping off a cliff. This kind of information is useful for all users. Most people prefer to know where a link is leading, and most people would rather see a link attached to descriptive text than a string of HTML code.

Lists

Sample Table of Contents

Lists need to be tagged as lists. If there is no indication that text is part of a list, it will simply appear to be a wall of unrelated text or a bunch of words with no context.

Properly tagged lists will allow assistive technology to inform the reader that items are “item 1 of 12” so they know the items are part of a list. This can be quite complex in the case of a nested list (or “outline”). A table of contents is an example of a list.

Without proper list tags, the table of contents pictured here will simply read as a wall of items. The reader will be unable to understand that for example, that “Research Design,” “Participants,” “Measures” and “Procedure and Analytic Plan” fall under the section “Methods.”

Tables

Tables can be difficult for assistive technology users to parse. Because each cell usually refers to both a row and a column header for content, additional information is required so that the data can be clearly understood. Row and column headers must be identified in order for the data to be understood and to facilitate navigation.

**Table 9, Rainfall by continent 2009**
Rainfall (inches)	Americas	Asia	Europe	Africa
Average	133	244	155	166
24 hour high	27	28	29	20
12 hour high	11	13	13	16

If, for example, in the table to the left, the user wants to know the average rainfall in each country, they need to know that the columns read Americas, Asia, Europe, and Africa and that the rows read Average, 24-hour high and 12-hour high. Then can then navigate across the rows to get the information. However, without an identified column header tag, all the reader gets is a list of numbers: 133, 244, 155, and 166 with no context. Similarly, without the Table Summary of Table 9, Rainfall by continent 2009, they won’t know what year this data is from.

Reading Order

Polio Vaccine Frontpage headline in Pittsburgh Press

Reading order is just what it says: the order in which the elements are read. Think again of a newspaper with headings, columns of text, and boxed items such as ads or references to other articles. Without an identified reading order, assistive technology will not understand how to read down the first column, starting again at the top of the second. It will not understand that an advertisement next to the article being read is meant to be separate and should be read before or after, not in the middle of the article.

Without clear reading order tags, this newspaper pictured to the right would be read as “Baseball The Pittsburgh Press Final Latest stocks 70 pages 5 cents Vol 71, No 291, Tuesday, April 12, 1955, Weather- mild and wet, Polio is conquered public airing biggest thrill came 31 months ago vaccine Salk shots…” This would be utterly incomprehensible and the sense of the information is lost. By clearly marking the correct reading order for each heading, paragraph, and image in this newspaper page, someone using assistive technology would get the same information as anyone else.

PDF Remediation organizes and labels

PDF remediation is the process by which digital information is clearly labeled (or, “tagged”) and organized so that people using assistive technology can get the same information from the document that anyone else does. It requires a remediation tool and some understanding of remediation procedures. Most organizations are legally required to make their websites and digital documents accessible. In many states, there are laws to protect people with disabilities, and digital content is part of that mandate. These are the basics of PDF remediation.