Part 3 of 6 in series
Learning in Public: Accessibility Literacy

Content as Semantic Data

Using the Abstract Syntax Tree (AST) to automate structural integrity and deep-linking in MDX content.


In the previous parts of this series, I focused on the mechanics of the browser: how it perceives structure through the Accessibility Tree and how it handles interactive logic through state and focus management. However, for a content-heavy site like a blog, the most significant opportunity for accessibility is not in the components I write, but in the pipeline that processes my content.

Context: Content as a Data Structure

When we write markdown, it is easy to think of it as just a convenient shorthand for HTML. We write a hash followed by a space, and we expect an <h1> tag. We write some text in brackets followed by a URL in parentheses, and we expect a link.

Before any HTML exists, though, the parser turns that text into a tree of typed nodes: a heading node with a depth, a link node with a URL. That makes the content pipeline the best place to bake in accessibility. By treating MDX as a data structure (an Abstract Syntax Tree, or AST), we can automate structural integrity during the transformation from raw MDX to static HTML. Every post then inherits the same structural guarantees by default, regardless of manual effort during authoring.
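
To make "content as a data structure" concrete, here is a minimal sketch of the kind of tree a markdown parser produces. The node types are hand-written and only loosely modelled on mdast (the real definitions carry more fields, such as positions), so treat the names as illustrative assumptions.

```typescript
// Hand-written node types, loosely modelled on mdast (assumption: the real
// types live in the `mdast` package and include extra fields like `position`).
type TextNode = { type: 'text'; value: string };
type LinkNode = { type: 'link'; url: string; children: TextNode[] };
type HeadingNode = { type: 'heading'; depth: number; children: TextNode[] };
type ParagraphNode = { type: 'paragraph'; children: Array<TextNode | LinkNode> };
type RootNode = { type: 'root'; children: Array<HeadingNode | ParagraphNode> };

// Roughly what a parser produces for:
//   # Hello
//   Read the [docs](https://example.com).
const tree: RootNode = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Hello' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Read the ' },
        { type: 'link', url: 'https://example.com', children: [{ type: 'text', value: 'docs' }] },
        { type: 'text', value: '.' },
      ],
    },
  ],
};

// Because the content is data, structural questions become simple queries.
const headingDepths = tree.children
  .filter((node): node is HeadingNode => node.type === 'heading')
  .map((node) => node.depth);
```

Once headings and links are nodes rather than strings, every rule in the rest of this post is a transformation over this tree.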

Approach: Automating Structural Integrity

To build a reliable map of my content, I use the rehype ecosystem to manipulate the AST during the transformation process.

1. Unique Identifiers: rehype-slug

For a screen reader user, being able to jump to a specific section is vital for navigating a long post. However, this requires every heading to have a unique id attribute. Manually managing these is brittle and error-prone.

By integrating rehype-slug, I automate this process. The plugin scans every heading in the document and generates a URL-safe slug based on the text content:

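// web/lib/markdown.ts
import rehypeSlug from 'rehype-slug';
import rehypeStringify from 'rehype-stringify';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import { unified } from 'unified';

// `content` is the raw markdown source of the post
const processedContent = await unified()
  .use(remarkParse) // markdown text -> markdown AST
  .use(remarkRehype) // markdown AST -> HTML AST
  .use(rehypeSlug) // Automatically generates id attributes on headings
  .use(rehypeStringify) // HTML AST -> HTML string
  .process(content);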

The insight here was that unique IDs are the foundation of deep-linking and document scanning. Without them, the "map" we provide to the Accessibility Tree is incomplete.
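
As a rough illustration of what "URL-safe and unique" involves, the sketch below uses a deliberately simplified, hypothetical slugger. It is not the algorithm rehype-slug uses (it handles ASCII only and ignores several edge cases), but it shows the two jobs any slugger has: normalising the text and de-duplicating repeated headings.

```typescript
// Hypothetical, simplified slugger for illustration only. It lowercases,
// drops punctuation, replaces whitespace with hyphens, and suffixes repeats.
function createSlugger(): (text: string) => string {
  const seen = new Map<string, number>(); // base slug -> times seen so far
  return (text: string): string => {
    const base = text
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9\s-]/g, '') // ASCII-only, for brevity
      .replace(/\s+/g, '-');
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    // The first occurrence keeps the bare slug; repeats get -1, -2, ...
    return count === 0 ? base : `${base}-${count}`;
  };
}

const slug = createSlugger();
const ids = [
  'Trade-offs: Automation vs. Complexity',
  'Literacy Gained',
  'Literacy Gained', // a repeated heading must not produce a duplicate id
].map((text) => slug(text));
// ids: 'trade-offs-automation-vs-complexity', 'literacy-gained', 'literacy-gained-1'
```

The de-duplication step is the part that is hardest to get right by hand, which is a large part of why delegating it to the pipeline pays off.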

2. Discoverable Anchors: rehype-autolink-headings

Having IDs is a good start, but they are useless if they aren't discoverable. I use rehype-autolink-headings to inject an anchor link into every heading. This allows both sighted and non-sighted users to obtain a direct link to any section of the post.

However, it is important to ensure these links remain accessible, so I configure the plugin to include an aria-label and hide the decorative anchor icon from screen readers to avoid redundant announcements:

// web/lib/markdown.ts
.use(rehypeAutolinkHeadings, {
  behavior: 'append',
  properties: {
    className: ['anchor'],
    ariaLabel: 'Link to section',
  },
  content: {
    type: 'element',
    tagName: 'span',
    properties: { className: ['anchor-icon'], ariaHidden: true },
    children: [{ type: 'text', value: '#' }],
  },
})
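
To show the effect of this configuration on the tree, here is a hand-rolled sketch. The `appendAnchor` helper and the `HElement` and `HText` types are hypothetical stand-ins, not the plugin's implementation; they only mimic what the 'append' behavior does to a heading that already has an id.

```typescript
// Minimal hast-like types for illustration (assumption: the real ones come
// from the `hast` package).
type HText = { type: 'text'; value: string };
type HElement = {
  type: 'element';
  tagName: string;
  properties: Record<string, string | boolean | string[]>;
  children: Array<HElement | HText>;
};

// Hypothetical helper mimicking the 'append' behavior: the link becomes the
// last child of the heading and points at the heading's own id.
function appendAnchor(heading: HElement): HElement {
  const id = heading.properties.id;
  if (typeof id !== 'string') return heading; // no id, nothing to link to
  const anchor: HElement = {
    type: 'element',
    tagName: 'a',
    properties: { href: `#${id}`, className: ['anchor'], ariaLabel: 'Link to section' },
    children: [
      {
        type: 'element',
        tagName: 'span',
        // The "#" glyph is decorative, so it is hidden from assistive tech.
        properties: { className: ['anchor-icon'], ariaHidden: true },
        children: [{ type: 'text', value: '#' }],
      },
    ],
  };
  return { ...heading, children: [...heading.children, anchor] };
}

const linkedHeading = appendAnchor({
  type: 'element',
  tagName: 'h2',
  properties: { id: 'literacy-gained' },
  children: [{ type: 'text', value: 'Literacy Gained' }],
});
```

Note the ordering dependency this makes visible: the link can only be built once the heading has an id, so the slug step has to run first.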

3. The Navigational Map: Table of Contents

A 2,000-word post can be overwhelming. Sighted users scan headings to find what they need. Screen reader users do the same by jumping from heading to heading. By extracting these headings directly from the AST, we can generate a Table of Contents (ToC) that serves as a permanent navigational map at the top of the post.

I use unist-util-visit to walk the tree before it is converted to HTML, extracting the text and level of every heading:

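// web/lib/markdown.ts
import GithubSlugger from 'github-slugger';
import type { Heading as MdastHeading, Root } from 'mdast';
import { toString } from 'mdast-util-to-string';
import { visit } from 'unist-util-visit';

// Assumption: `slugger` is a GithubSlugger instance, created fresh per document
const slugger = new GithubSlugger();
const headings: Array<{ level: number; text: string; slug: string }> = [];

.use(() => (tree: Root) => {
  visit(tree, 'heading', (node: MdastHeading) => {
    const text = toString(node); // plain text of the heading, without markup
    headings.push({
      level: node.depth,
      text,
      slug: slugger.slug(text),
    });
  });
})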

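This keeps the ToC and the actual heading IDs in sync because both are derived from the same source of truth: the heading text in the AST, run through the same slug algorithm. That holds provided the slugger here applies the same rules as rehype-slug, which is built on github-slugger, and is created fresh for each document so that duplicate-heading suffixes match.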
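
The extracted headings form a flat list, while a ToC is usually rendered as nested lists. A minimal sketch of that last step follows. The `buildToc` helper and the `TocItem` type are hypothetical names, and the sample slugs are made up for illustration.

```typescript
type TocHeading = { level: number; text: string; slug: string };
type TocItem = TocHeading & { children: TocItem[] };

// Hypothetical helper: nests a flat, document-ordered heading list by level.
function buildToc(headings: TocHeading[]): TocItem[] {
  const roots: TocItem[] = [];
  const stack: TocItem[] = []; // currently open ancestors, shallowest first
  for (const heading of headings) {
    const item: TocItem = { ...heading, children: [] };
    // Close any open items that sit at the same or a deeper level.
    let top = stack[stack.length - 1];
    while (top !== undefined && top.level >= item.level) {
      stack.pop();
      top = stack[stack.length - 1];
    }
    // Whatever remains on top of the stack is the nearest shallower heading.
    (top !== undefined ? top.children : roots).push(item);
    stack.push(item);
  }
  return roots;
}

const toc = buildToc([
  { level: 2, text: 'Context', slug: 'context' },
  { level: 2, text: 'Approach', slug: 'approach' },
  { level: 3, text: 'Unique Identifiers', slug: 'unique-identifiers' },
  { level: 3, text: 'The Navigational Map', slug: 'the-navigational-map' },
  { level: 2, text: 'Trade-offs', slug: 'trade-offs' },
]);
```

Here `toc` has three top-level entries, and the two level-3 headings are nested under "Approach". Each entry's `slug` becomes the `href` of its ToC link, so the link and the heading id cannot drift apart.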

4. External Links: A Custom rehype Plugin

Later in this series, I will discuss the importance of identifying external links for users. However, relying on myself to remember to add icons and screen-reader text to every outbound link by hand is brittle. Instead, I wrote a custom rehype plugin that automatically detects external links in the AST and appends the necessary context:

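// web/lib/markdown.ts
import type { Element, Root as HastRoot } from 'hast';
import { visit } from 'unist-util-visit';

const rehypeExternalLinks = () => (tree: HastRoot) => {
  visit(tree, 'element', (node: Element) => {
    if (node.tagName === 'a' && typeof node.properties?.href === 'string') {
      const href = node.properties.href;
      // Match absolute http(s) URLs only; a bare startsWith('http') would
      // also catch relative paths such as "http-caching-notes"
      if (/^https?:\/\//.test(href)) {
        node.properties.target = '_blank';
        node.properties.rel = 'noopener noreferrer';

        // Append a decorative icon and visually-hidden screen-reader text
        // (both nodes are defined elsewhere in this file)
        node.children.push(
          iconNode,   // SVG with ariaHidden: true
          srOnlyNode, // <span class="sr-only">(opens in a new tab)</span>
        );
      }
    }
  });
};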

This means every external link in every post automatically receives the visual icon and the (opens in a new tab) announcement, without my needing to remember. It is the same principle as rehype-slug: by baking the rule into the pipeline, we guarantee consistency across all the content.
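
The detection step is the part most worth getting right, since a prefix check on the href treats absolute links back to the site itself as external. One possible refinement, sketched below, compares origins instead. The `isExternalLink` helper and the `https://blog.example` origin are hypothetical and not taken from the plugin above.

```typescript
// Hypothetical helper: an href counts as external when it resolves to an
// http(s) URL whose origin differs from the site's own origin.
function isExternalLink(href: string, siteOrigin: string): boolean {
  let url: URL;
  try {
    url = new URL(href, siteOrigin); // relative hrefs resolve against the site
  } catch {
    return false; // unparseable hrefs are left untouched
  }
  const isWeb = url.protocol === 'http:' || url.protocol === 'https:';
  return isWeb && url.origin !== new URL(siteOrigin).origin;
}

const site = 'https://blog.example'; // placeholder origin for this sketch
const results = {
  relative: isExternalLink('/posts/part-2', site),
  hash: isExternalLink('#context', site),
  sameSite: isExternalLink('https://blog.example/about', site),
  lookalike: isExternalLink('http-caching-notes', site),
  mail: isExternalLink('mailto:me@blog.example', site),
  outbound: isExternalLink('https://github.com/rehypejs/rehype', site),
};
```

Only `outbound` is `true` here. Whether same-site absolute links should open in a new tab is a design decision; the point is that the rule lives in one function rather than in the author's memory.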

Trade-offs: Automation vs. Complexity

Choosing to automate structural integrity comes with a trade-off in pipeline complexity. I am now dependent on a specific set of plugins and a custom processing step. If a plugin changes its slugification logic, I risk breaking existing deep links.
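
One way to contain that risk is to treat published slugs as a contract and check them on every build. The sketch below assumes the previous build's slugs are stored somewhere, for example in a committed snapshot file. The `findBrokenDeepLinks` helper and the sample slugs are hypothetical.

```typescript
// Hypothetical guard: compare the slugs a previous build published with the
// slugs the current pipeline generates, and report any that disappeared.
function findBrokenDeepLinks(previous: string[], current: string[]): string[] {
  const live = new Set(current);
  return previous.filter((slug) => !live.has(slug));
}

// Example: a plugin upgrade starts stripping hyphens inside words.
const published = [
  'context-content-as-a-data-structure',
  'trade-offs-automation-vs-complexity',
];
const regenerated = [
  'context-content-as-a-data-structure',
  'tradeoffs-automation-vs-complexity',
];

const broken = findBrokenDeepLinks(published, regenerated);
// broken: ['trade-offs-automation-vs-complexity']
```

Failing the build when `broken` is non-empty turns a silent regression into an explicit decision: pin the plugin version, or accept the change and add redirects for the old anchors.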

However, this is an acceptable trade-off because the alternative (manual maintenance) is a guaranteed path to inconsistency and regression. By baking these rules into the pipeline, I ensure that every new post inherits these accessibility features without me needing to remember a checklist.

Literacy Gained

The practical shift was subtle but important. Markdown stopped feeling like presentation and started behaving like structure.

Accessibility then became less about UI polish and more about data discipline. When content is modelled semantically, the AST becomes a mechanism for maintaining navigability and inclusion by default.

In the next part, we will move from the content pipeline to the browser’s global environment, exploring how we can design for user intent through motion and focus preferences.