Revisiting the language hints

The web gets written in multiple languages. Knowing you're about to click on a link that sends you to a page in another language really helps navigation.

Thomas, a good friend of mine, recently published an article explaining how to provide language hints with CSS (fr). Following a discussion on Twitter, it turns out injecting content with CSS causes accessibility issues (thanks for pointing it out, Julie (fr)). Users with custom stylesheets for their needs might override the content generated by CSS and miss out on the information. As I'm using a very similar technique, the site needed updating again.

Client-side JavaScript would have allowed injecting some HTML with the content, making it more accessible. It has the same brittleness issue as the CSS, though: if the JavaScript doesn't get loaded, no content. Instead, I preferred generating those hints at build time.

Embracing abstract syntax trees

Presented with a heap of Markdown, the first step will be to find all the links with an hreflang attribute. CSS and client-side JavaScript can use the a[hreflang] selector to spot them. Working with remark, already in place for rendering the content, we can do the same for processing the Markdown.

This tool turns the Markdown text into a tree of JavaScript objects (an abstract syntax tree, or AST). A bit like an HTML document is turned into the DOM in the browser, but with each node representing a piece of Markdown: a Link, a Heading… The tree can then be traversed and the nodes transformed with plug-ins.
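To make that a bit more concrete, here's a hand-written approximation of what such a tree could look like for a single sentence containing a link. The URL is a placeholder and `walk` is a hypothetical helper written for this sketch, not something remark provides under that name; real trees also carry position information, left out here.

```javascript
// Hand-written approximation of the tree for:
// Read [the article](https://example.com/article)
const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Read ' },
        {
          type: 'link',
          url: 'https://example.com/article',
          children: [{ type: 'text', value: 'the article' }]
        }
      ]
    }
  ]
};

// Hypothetical helper: depth-first walk calling `visitor` on every node
function walk(node, visitor) {
  visitor(node);
  for (const child of node.children || []) {
    walk(child, visitor);
  }
}

const nodeTypes = [];
walk(tree, node => nodeTypes.push(node.type));
console.log(nodeTypes.join(' > '));
// root > paragraph > text > link > text
```

Plug-ins do essentially this kind of traversal, then read or modify the nodes they're interested in.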

Working with an AST makes the process much more robust than processing the text itself, via a regular expression for example. The object structure removes the need to care about whether the link has other attributes before or after the hreflang, possibly HTML content inside, maybe with line breaks. We can focus on finding links with an hreflang in their properties and adding a final node to their children.

Converting the tree

Actually, we want to find all the <a> tags, not just the [Markdown links] and then inject a <span>. These are HTML concepts, not Markdown ones, and won't appear in remark's tree. But we can convert it to one representing HTML, thanks to remark being part of an ecosystem of tools for manipulating text, structured like HTML and Markdown, or not.

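This is the role of two plug-ins: remark-rehype, which converts the Markdown tree into an HTML one, and rehype-raw, which parses the raw HTML embedded in the Markdown into proper nodes of that tree.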

Those need to be added to the pipeline of remark plug-ins currently processing the Markdown. Because of jstransformer-remark's API for receiving plug-ins, configuring remark-rehype takes a little gymnastics with functions. Without the allowDangerousHtml option, inline <a> tags wouldn't be detected afterwards by rehype-raw.

This brings the pipeline to (after installing the two packages with npm i remark-rehype rehype-raw):

/* … */
.use(
  inPlace({
    engineOptions: {
      plugins: [
        require('remark-slug'),
        require('remark-autolink-headings'),
        function() {
          return require('remark-rehype')({
            allowDangerousHtml: true
          });
        },
        require('rehype-raw')
      ]
    }
  })
)
/* … */
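For reference, here's a hand-written approximation of what a link could look like once it sits in the HTML tree. The URL is a placeholder, and `findLinksWithHreflang` is a hypothetical, dependency-free stand-in for an `a[hreflang]` selector, only there to show that the attribute is now a plain property to look up.

```javascript
// Hand-written approximation of the HTML tree for:
// <p>Read <a href="https://example.com/fr" hreflang="fr">the article</a></p>
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        { type: 'text', value: 'Read ' },
        {
          type: 'element',
          tagName: 'a',
          // Attributes become properties, with DOM-style names (`hrefLang`)
          properties: { href: 'https://example.com/fr', hrefLang: 'fr' },
          children: [{ type: 'text', value: 'the article' }]
        }
      ]
    }
  ]
};

// Hypothetical stand-in for a selector like `a[hreflang]`
function findLinksWithHreflang(node, found = []) {
  if (
    node.type === 'element' &&
    node.tagName === 'a' &&
    node.properties &&
    'hrefLang' in node.properties
  ) {
    found.push(node);
  }
  for (const child of node.children || []) {
    findLinksWithHreflang(child, found);
  }
  return found;
}

const links = findLinksWithHreflang(tree);
console.log(links.map(link => link.properties.hrefLang)); // [ 'fr' ]
```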

At this stage, the resulting HTML is a bit wrecked though, having lost all its semantics. This is due to the jstransformer-remark glue used for processing, so let's set out to fix that.

Patching an NPM module

So that it transforms Markdown to HTML out of the box, jstransformer-remark adds the remark-html plug-in by default, after all the plug-ins provided by users. However, we're now feeding it a tree representing HTML instead of Markdown, and the semantics get lost.

Converting back to a Markdown tree, with rehype-remark, loses a good few classes and attributes, unfortunately. Instead, we'll be looking at patching jstransformer-remark to remove the addition of remark-html.

The code being inside node_modules, any change would be wiped when we reinstall the project and not saved by Git. Enter patch-package, which will let us save the changes as a patch and apply them after each install.

It's perfect for making a small one-off change to a small library, and much lighter than forking the project. It can still be nice to raise an issue on the original package to see if the change could be integrated into it, removing the need for the patch in the future.

Once patch-package is installed with npm i -D patch-package, we can go and edit the file from jstransformer-remark, in node_modules/jstransformer-remark/index.js:

/* … */
// plugins.push(html)
/* … */

Once it's done, running npx patch-package will create the patch, comparing our edits with a clean version, and store it inside a patches folder.

What remains is to apply this patch after each install. This can be done through a postinstall script in the package.json file. It'll run patch-package, which will apply the patches in the repository, ensuring that jstransformer-remark is in the shape we need it to be.
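As a sketch, and following what patch-package's documentation suggests, the relevant part of package.json would look something like this (all the other keys of the file are left out):

```json
{
  "scripts": {
    "postinstall": "patch-package"
  }
}
```

npm runs the postinstall script on its own after each install, so there's no extra command to remember.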

remark-html was responsible for rendering the tree back into a string of HTML. Now that we've removed it, we need to add that step back. rehype-stringify, added at the end of the plug-in list, will take care of that.

Building the plug-in

Everything is now ready for creating our own plug-in in src/rehype/hreflang.js. remark/rehype plug-ins have a very similar structure to metalsmith ones: an attacher function that receives the plug-in options and returns the function that'll actually handle the processing:

module.exports = function hreflang(/* options */) {
  return function(tree) {
    /* transform the tree here */
  }
}

Let's not forget to add it to the list of remark plug-ins:

/* … */
.use(
  inPlace({
    engineOptions: {
      plugins: [
        require('remark-slug'),
        require('remark-autolink-headings'),
        function() {
          return require('remark-rehype')({
            allowDangerousHtml: true
          });
        },
        require('rehype-raw'),
        require('./rehype/hreflang'),
        require('rehype-stringify')
      ]
    }
  })
)
/* … */

The syntax tree doesn't have a document.querySelectorAll like the DOM, but the hast-util-select package provides the same feature with its selectAll function.

const {selectAll} = require('hast-util-select');

module.exports = function hreflang({
  selector = 'a[hreflang]'
} = {}) {
  return function(tree) {
    const linksWithHreflang = selectAll(selector, tree);
    /* Coming soon */
  }
}

The CSS injecting the language info had a little escape hatch for links where we didn't want to show the language: adding the no-hreflang class. We can use the matches function from hast-util-select to allow the same behaviour in the plug-in:

const {selectAll, matches} = require('hast-util-select');

module.exports = function hreflang({
  selector = 'a[hreflang]',
  ignoreSelector = '.no-hreflang'
} = {}) {
  return function(tree) {
    const linksWithHreflang = selectAll(selector, tree).filter(
      link => !ignoreSelector || !matches(ignoreSelector, link)
    );
    /* Coming soon */
  }
}

Adding the language information

We need to loop over those links and add the appropriate span. Constructing a node manually is very possible. A text node representing a space looks like this, for example: {type: 'text', value: ' '}.
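For comparison, here's a sketch of what building the whole span by hand could look like. `createLanguageSpan` is a hypothetical helper, not part of any package; the node shape follows how HTML trees represent elements, with the class stored as a `className` array.

```javascript
// Hypothetical helper building, by hand, the node for
// <span class="hreflang">fr</span>
function createLanguageSpan(language, className = 'hreflang') {
  return {
    type: 'element',
    tagName: 'span',
    // In the HTML tree, the class attribute is a `className` array
    properties: { className: [className] },
    children: [{ type: 'text', value: language }]
  };
}

const span = createLanguageSpan('fr');
console.log(JSON.stringify(span, null, 2));
```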

The object notation quickly becomes verbose for more complex HTML structures, though. The hastscript package provides a handy shortcut to quickly build hast nodes.

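const {selectAll, matches} = require('hast-util-select');
// CommonJS-era releases of hastscript export `h` directly;
// newer ESM releases use `import {h} from 'hastscript'` instead
const h = require('hastscript');

module.exports = function hreflang({
  selector = 'a[hreflang]',
  ignoreSelector = '.no-hreflang',
  className = 'hreflang'
} = {}) {
  return function(tree) {
    // Keep the links with an hreflang, minus those opting out
    const linksWithHreflang = selectAll(selector, tree).filter(
      link => !ignoreSelector || !matches(ignoreSelector, link)
    );
    for (const link of linksWithHreflang) {
      const span = h('span', { class: className }, link.properties.hrefLang);

      // Add a little space
      link.children.push({ type: 'text', value: ' ' });
      // Add the generated span with the language
      link.children.push(span);
    }
  };
};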
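To see the effect without installing anything, here's a dependency-free approximation of the plug-in applied to a tiny hand-built tree. It's only a sketch: `collect` and `a` are hypothetical helpers, a plain tree walk stands in for hast-util-select, the span is built by hand rather than with hastscript, and the escape hatch is reduced to a class name check.

```javascript
// Hypothetical helper: gather every node passing `test`, depth-first
function collect(node, test, found = []) {
  if (test(node)) found.push(node);
  for (const child of node.children || []) collect(child, test, found);
  return found;
}

// Approximation of the plug-in, without hast-util-select or hastscript
function hreflang({ ignoreClass = 'no-hreflang', className = 'hreflang' } = {}) {
  return function(tree) {
    const links = collect(tree, node => {
      if (node.type !== 'element' || node.tagName !== 'a') return false;
      const { hrefLang, className: classes = [] } = node.properties || {};
      return Boolean(hrefLang) && !classes.includes(ignoreClass);
    });
    for (const link of links) {
      link.children.push(
        { type: 'text', value: ' ' },
        {
          type: 'element',
          tagName: 'span',
          properties: { className: [className] },
          children: [{ type: 'text', value: link.properties.hrefLang }]
        }
      );
    }
  };
}

// Hypothetical helper creating an <a> node
const a = (properties, text) => ({
  type: 'element',
  tagName: 'a',
  properties,
  children: [{ type: 'text', value: text }]
});

const tree = {
  type: 'root',
  children: [
    a({ href: '/fr', hrefLang: 'fr' }, 'Article'),
    a({ href: '/de', hrefLang: 'de', className: ['no-hreflang'] }, 'Artikel'),
    a({ href: '/en' }, 'Plain link')
  ]
};

hreflang()(tree);
console.log(tree.children.map(link => link.children.length)); // [ 3, 1, 1 ]
```

Only the first link gains the space and the span: the second opts out with the class, and the third has no hreflang to show.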

Last, we can update the CSS so it no longer injects extra content and instead styles the .hreflang selector appropriately:

.hreflang {
  vertical-align: super;
  font-size: 80%;
}

It's been a bit bumpy, but there we go! With the plug-in and the CSS in place, the language hints for the links are now more robust and accessible, as they're straight in the HTML. remark and the ecosystem it's part of have been really handy for manipulating structured text like Markdown and HTML. They're not tied to static site generation and can be used in a variety of situations (as a library or CLI). A good one for the toolbox!