Revisiting the language hints
The web gets written in multiple languages. Knowing you're about to click on a link that sends you to a page in another language really helps navigation.
Thomas, a good friend of mine, recently published an article explaining how to provide language hints with CSS (fr). Following a discussion on Twitter, it turns out injecting content with CSS causes accessibility issues (thanks for pointing it out Julie (fr)). Users with custom stylesheets for their needs might override the content generated by CSS and miss out on the information. As I'm using a very similar technique, there was a need to update the site again.
Client-side JavaScript would have allowed injecting some HTML with the content, making it more accessible. It has the same brittleness issue as the CSS, though: if the JavaScript doesn't get loaded, there's no content. Instead, I preferred generating those hints at build time.
Embracing abstract syntax trees
Presented with a heap of Markdown, the first step will be to find all the links with an hreflang attribute. CSS and client-side JavaScript can use the a[hreflang] selector to spot them. Working with remark, already in place for rendering the content, we can do the same for processing the Markdown.
This tool turns the Markdown text into a tree of JavaScript objects (an abstract syntax tree, or AST). It's a bit like how an HTML document is turned into the DOM in the browser, except each node represents a piece of Markdown: a Link, a Heading… The tree can then be traversed and its nodes transformed with plug-ins.
Working with an AST makes the process much more robust than processing the text itself, via a regular expression for example. The object structure removes the need to care whether the link has other attributes before or after the hreflang, possibly HTML content inside, maybe with line breaks. We can focus on finding links with an hreflang in their properties and adding a final node to their children.
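To make the idea concrete, here is a minimal sketch of such a tree and of a walk that picks out the links. The tree is hand-written and simplified, following the shape of the HTML syntax trees used later in this article; findLinksWithHreflang and the example.com URLs are hypothetical, for illustration only.

```javascript
// A hand-written, simplified syntax tree: real trees carry more detail (positions, etc.).
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        { type: 'text', value: 'Read ' },
        {
          type: 'element',
          tagName: 'a',
          properties: { href: 'https://example.com/article', hrefLang: 'fr' },
          children: [{ type: 'text', value: 'the article' }]
        },
        { type: 'text', value: ' or ' },
        {
          type: 'element',
          tagName: 'a',
          properties: { href: 'https://example.com/home' },
          children: [{ type: 'text', value: 'the home page' }]
        }
      ]
    }
  ]
};

// Hypothetical helper: depth-first walk collecting <a> elements that have an hreflang.
function findLinksWithHreflang(node, found = []) {
  if (
    node.type === 'element' &&
    node.tagName === 'a' &&
    node.properties &&
    node.properties.hrefLang
  ) {
    found.push(node);
  }
  for (const child of node.children || []) {
    findLinksWithHreflang(child, found);
  }
  return found;
}

const links = findLinksWithHreflang(tree);
console.log(links.map(link => link.properties.hrefLang)); // [ 'fr' ]
```

Whatever other attributes or content the link has, the check stays the same: it only looks at the node's type, tag name and properties.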
Converting the tree
Actually, we want to find all the <a> tags, not just the [Markdown links], and then inject a <span>. These are HTML concepts, not Markdown ones, and won't appear in remark's tree. But we can convert it to one representing HTML, thanks to remark being part of an ecosystem of tools for manipulating text, structured like HTML and Markdown, or not.
This is the role of two plug-ins:
- remark-rehype will turn the Markdown Links into <a> nodes
- rehype-raw will take care of any <a> written as inline HTML
Those need to be added to the pipeline of remark plug-ins currently processing the Markdown. Because of jstransformer-remark's API for receiving plug-ins, configuring remark-rehype takes a bit of gymnastics with functions. Without the allowDangerousHtml option, inline <a> tags wouldn't be detected afterwards by rehype-raw.
This brings the pipeline to the following (after installing the two packages with npm i remark-rehype rehype-raw):
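/* … */
.use(
  inPlace({
    engineOptions: {
      plugins: [
        require('remark-slug'),
        require('remark-autolink-headings'),
        // Wrapper function so options can be passed to remark-rehype
        function() {
          return require('remark-rehype')({
            // Keep inline HTML in the tree so rehype-raw can parse it afterwards
            allowDangerousHtml: true
          });
        },
        require('rehype-raw')
      ]
    }
  })
)
/* … */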
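The function wrapped around remark-rehype is a general pattern: it pre-applies options to a plug-in that the pipeline will call without any. Here is a minimal sketch of that pattern, assuming the pipeline invokes each plug-in function with no arguments. withOptions and fakePlugin are hypothetical names used for illustration, not part of remark or jstransformer-remark.

```javascript
// Hypothetical helper: pre-applies options to a plug-in,
// for pipelines that call each plug-in without arguments.
function withOptions(attacher, options) {
  return function () {
    // Forward `this` in case the plug-in relies on the processor it is attached to.
    return attacher.call(this, options);
  };
}

// Stand-in plug-in for illustration only (not a real package):
// it records the options it received on the tree.
function fakePlugin(options = {}) {
  return function transformer(tree) {
    tree.data = { receivedOptions: options };
    return tree;
  };
}

// Simulate a pipeline that invokes the plug-in with no options.
const attach = withOptions(fakePlugin, { allowDangerousHtml: true });
const transformer = attach();
const result = transformer({ type: 'root', children: [] });

console.log(result.data.receivedOptions); // { allowDangerousHtml: true }
```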
At this stage, the resulting HTML is a bit wrecked though, having lost all its semantics. This is due to the jstransformer-remark glue used for processing, so let's set out to fix that.
Patching an NPM module
So that it transforms Markdown to HTML out of the box, jstransformer-remark adds the remark-html plug-in by default, after all the plug-ins provided by users. However, we're now feeding it a tree representing HTML instead of Markdown, and it's all lost.
Converting back to a Markdown tree, with rehype-remark, loses a good few classes and attributes, unfortunately. Instead, we'll patch jstransformer-remark to remove the addition of remark-html.
The code being inside node_modules, any change would be wiped when we reinstall the project, and wouldn't be saved by Git. Enter patch-package, which will let us save the changes as a patch and apply them after each install.
It's perfect for making a small one-off change to a small library, and much lighter than forking the project. It can still be nice to raise an issue on the original package to see if the change could be integrated into it, removing the need for the patch in the future.
Once patch-package is installed with npm i -D patch-package, we can go and edit the file from jstransformer-remark, in node_modules/jstransformer-remark/index.js:
/* … */
// plugins.push(html)
/* … */
Once it's done, running npx patch-package will create the patch, comparing our edits with a clean version, and store it inside a patches folder.
What remains is to apply this patch after each install. This can be done through a postinstall script in the package.json file. It'll run patch-package, which will apply the patches in the repository, ensuring that jstransformer-remark is in the shape we need it to be.
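As a sketch, the relevant part of package.json could look like the fragment below. It only shows the scripts entry; the rest of the file (name, dependencies, other scripts) is omitted and will differ from project to project.

```json
{
  "scripts": {
    "postinstall": "patch-package"
  }
}
```

npm runs the postinstall script after each install, so the patch gets reapplied without any manual step.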
remark-html was responsible for rendering the tree back into a string of HTML. Now that we've removed it, we need to add that step back. rehype-stringify, added at the end of the plug-in list, will take care of that.
Building the plug-in
Everything is now ready for creating our own plug-in in src/rehype/hreflang.js. remark/rehype plug-ins have a very similar structure to metalsmith ones: an attacher function that receives the plug-in options and returns the function that'll actually handle the processing:
module.exports = function hreflang(/* options */) {
  return function(tree) {
    /* transform the tree here */
  }
}
Let's not forget to add it to the list of remark plug-ins:
/* … */
.use(
  inPlace({
    engineOptions: {
      plugins: [
        require('remark-slug'),
        require('remark-autolink-headings'),
        function() {
          return require('remark-rehype')({
            allowDangerousHtml: true
          });
        },
        require('rehype-raw'),
        // Our own plug-in, run once the tree represents HTML
        require('./rehype/hreflang'),
        // Renders the tree back into a string of HTML
        require('rehype-stringify')
      ]
    }
  })
)
/* … */
Picking the links
The syntax tree doesn't have a document.querySelectorAll like the DOM, but the hast-util-select package provides the same feature with its selectAll function.
const {selectAll} = require('hast-util-select');

module.exports = function hreflang({
  selector = 'a[hreflang]'
} = {}) {
  return function(tree) {
    const linksWithHreflang = selectAll(selector, tree);
    /* Coming soon */
  }
}
The CSS injecting the language info had a little escape hatch for links whose language we didn't want to show: adding the no-hreflang class. We can use the matches function from hast-util-select to allow the same behaviour in the plug-in:
const {selectAll, matches} = require('hast-util-select');

module.exports = function hreflang({
  selector = 'a[hreflang]',
  ignoreSelector = '.no-hreflang'
} = {}) {
  return function(tree) {
    const linksWithHreflang = selectAll(selector, tree).filter(
      link => !ignoreSelector || !matches(ignoreSelector, link)
    );
    /* Coming soon */
  }
}
Adding the language information
We need to loop over those links and add the appropriate span. Constructing a node manually is very possible. A text node representing a space looks like this, for example: {type: 'text', value: ' '}.
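For comparison, here is a sketch of what building the span by hand could look like, following the element shape used by HTML syntax trees (type, tagName, properties, children). buildLanguageSpan is a hypothetical helper written for this illustration.

```javascript
// Hypothetical helper: builds, by hand, the node for <span class="hreflang">fr</span>.
function buildLanguageSpan(language, className = 'hreflang') {
  return {
    type: 'element',
    tagName: 'span',
    // The class attribute is stored as a list under `className`
    properties: { className: [className] },
    // The visible text is itself a child node
    children: [{ type: 'text', value: language }]
  };
}

const span = buildLanguageSpan('fr');
console.log(JSON.stringify(span, null, 2));
```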
The object notation quickly becomes verbose for more complex HTML structures, though. The hastscript package provides a handy shortcut to quickly build hast nodes.
const {selectAll, matches} = require('hast-util-select');
const h = require('hastscript');

module.exports = function hreflang({
  selector = 'a[hreflang]',
  ignoreSelector = '.no-hreflang',
  className = 'hreflang'
} = {}) {
  return function(tree) {
    // Links with an hreflang, minus those opting out via the ignore selector
    const linksWithHreflang = selectAll(selector, tree).filter(
      link => !ignoreSelector || !matches(ignoreSelector, link)
    );
    for (const link of linksWithHreflang) {
      const span = h('span', { class: className }, link.properties.hrefLang);
      // Add a little space
      link.children.push({ type: 'text', value: ' ' });
      // Add the generated span with the language
      link.children.push(span);
    }
  };
};
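To see the effect on a small tree without installing anything, here is a dependency-free sketch of the same transformation. It is not the plug-in itself: addLanguageHints replaces selectAll, matches and hastscript with a hand-written walk, and toHtml is a deliberately minimal serializer (no escaping, no void elements). Both names, and the sample tree, are hypothetical.

```javascript
// Simplified stand-in for the plug-in's transformer, without hast-util-select or hastscript.
function addLanguageHints(node, { className = 'hreflang', ignoreClass = 'no-hreflang' } = {}) {
  // Visit the children first, so the spans added below are never revisited
  for (const child of node.children || []) {
    addLanguageHints(child, { className, ignoreClass });
  }
  if (node.type !== 'element' || node.tagName !== 'a') return node;
  const { hrefLang, className: classes = [] } = node.properties || {};
  if (!hrefLang || classes.includes(ignoreClass)) return node;
  node.children.push(
    { type: 'text', value: ' ' },
    {
      type: 'element',
      tagName: 'span',
      properties: { className: [className] },
      children: [{ type: 'text', value: hrefLang }]
    }
  );
  return node;
}

// Minimal serializer, just enough to print this example.
function toHtml(node) {
  if (node.type === 'text') return node.value;
  const inner = (node.children || []).map(toHtml).join('');
  if (node.type !== 'element') return inner;
  const attributes = Object.entries(node.properties || {})
    .map(([name, value]) => {
      const attribute = name === 'className' ? 'class' : name.toLowerCase();
      return ` ${attribute}="${Array.isArray(value) ? value.join(' ') : value}"`;
    })
    .join('');
  return `<${node.tagName}${attributes}>${inner}</${node.tagName}>`;
}

const paragraph = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    {
      type: 'element',
      tagName: 'a',
      properties: { href: '/article', hrefLang: 'fr' },
      children: [{ type: 'text', value: 'An article' }]
    },
    { type: 'text', value: ' and ' },
    {
      type: 'element',
      tagName: 'a',
      properties: { href: '/other', hrefLang: 'de', className: ['no-hreflang'] },
      children: [{ type: 'text', value: 'another' }]
    }
  ]
};

const html = toHtml(addLanguageHints(paragraph));
console.log(html);
// <p><a href="/article" hreflang="fr">An article <span class="hreflang">fr</span></a> and <a href="/other" hreflang="de" class="no-hreflang">another</a></p>
```

The first link gains a space and a span carrying its language, while the second one, marked with no-hreflang, is left untouched.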
Lastly, we can update the CSS so it no longer injects extra content and instead styles the .hreflang selector appropriately:
.hreflang {
  vertical-align: super;
  font-size: 80%;
}
It's been a bit bumpy, but there we go! With the plug-in and the CSS in place, the language hints for the links are now more robust and accessible, as they're straight in the HTML. remark and the ecosystem it's part of have been really handy for manipulating structured text like Markdown and HTML. They're not tied to static site generation and can be used in a variety of situations (as a library or CLI). A good one for the toolbox!