HTML is an extremely flexible markup language. No really, you can put <div>s inside <span>s inside <p> tags, but that doesn't mean there are no rules. Over the course of many years, browser engines became quite resilient to malformed HTML.
But they weren't ready for JavaScript.
The Curious Case of a <pre> tag
I was working on the Notesnook Web Clipper and testing it on various websites when I came across this anomaly in a Hackernoon blog post.
<p class="paragraph">bla bla bla
<pre><code>code</code></pre>
bla bla bla
</p>
You can inspect the actual source yourself. Anywhere you see inline code, you can be sure that it's actually a <pre> tag.
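There's nothing really wrong with this code syntactically: every tag is opened and closed. But browsers can't keep it as it is, because the HTML spec only allows phrasing content inside a <p> tag, and a <pre> tag is not phrasing content. So what do browsers do? Loyal to the HTML spec, they close the <p> as soon as they see the <pre> start tag, which moves the <pre> outside. The leftover </p> at the end then turns into an empty paragraph.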
<p class="paragraph">bla bla bla</p>
<pre><code>code</code></pre>
bla bla bla
<p></p>
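The repair above can be mimicked with a toy model of two parser rules: a start tag such as <pre> closes an open <p>, and a stray </p> produces an empty paragraph. This is only a sketch; repairParagraphs, the token list, and the short CLOSES_P table are my own simplifications, not how a real HTML parser is structured.

```javascript
// Toy model only: a deliberately incomplete list of start tags that
// implicitly close an open <p>.
const CLOSES_P = new Set(['pre', 'div', 'ul', 'ol', 'p']);

// Takes pre-split tokens (tags and text) and returns the repaired markup.
function repairParagraphs(tokens) {
  const out = [];
  let pOpen = false;
  for (const token of tokens) {
    const start = /^<([a-z0-9]+)[^>]*>$/i.exec(token);
    if (start) {
      const name = start[1].toLowerCase();
      if (pOpen && CLOSES_P.has(name)) {
        out.push('</p>'); // the open paragraph is closed before the block tag
        pOpen = false;
      }
      if (name === 'p') pOpen = true;
      out.push(token);
    } else if (token === '</p>') {
      if (!pOpen) out.push('<p>'); // stray </p> becomes an empty paragraph
      out.push('</p>');
      pOpen = false;
    } else {
      out.push(token); // text and other end tags pass through untouched
    }
  }
  return out.join('');
}

const input = [
  '<p class="paragraph">', 'bla bla bla',
  '<pre>', '<code>', 'code', '</code>', '</pre>',
  'bla bla bla', '</p>',
];
console.log(repairParagraphs(input));
// <p class="paragraph">bla bla bla</p><pre><code>code</code></pre>bla bla bla<p></p>
```

The output matches the structure the browser ends up with, including the empty <p></p> at the end.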
But that doesn't explain how the Hackernoon folks were getting the browser to render the invalid markup.
Aside: Whoever runs the blog over at Hackernoon, kudos! But seriously, why not just use <code> for inline code instead of taking a <pre> and forcing it to display: inline using CSS? I am genuinely curious.
Turns out the answer was quite simple: Next.js, or more precisely, JavaScript.
Next.js Hydration
Next.js sends over a static, server-rendered HTML blob, which the browser renders right away. After the first render, this blob is hydrated and turned into a dynamic monster of a website. This gives you all the SEO benefits plus a modern web experience. It's a cool strategy, as long as the page looks the same before and after the hydration step.
In the case of Hackernoon, it doesn't.
To confirm my suspicion that JavaScript was responsible for this, I manually throttled the network speed and stopped the hydration step. And lo and behold! The browser was working as it should, i.e., it was automatically fixing the bad markup as expected. So for sure the magic was somewhere in the client-side JavaScript.
I opened the DevTools, switched to the console tab, and just for the heck of it tried putting a <pre> tag into a <p> tag. It worked.
const paragraph = document.createElement('p');
const pre = document.createElement('pre');
paragraph.appendChild(pre);
And then I realized: Next.js is based on React, which renders everything via JavaScript, and the DOM API doesn't enforce this restriction. So this was happening automatically. I bet the folks over at Hackernoon don't even realize what's going on.
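To see the difference side by side, here is a small sketch that builds the same nesting twice: once through the HTML parser (via innerHTML) and once through DOM calls. The function name compareNesting and the sample markup are my own, not anything from Hackernoon or React. It needs a browser (or another DOM implementation) to do anything, so it only runs when document is defined.

```javascript
// Hypothetical helper: builds a <p> containing a <pre> in two ways and
// returns the markup each approach ends up with.
function compareNesting(doc) {
  // 1. Through the HTML parser: the <pre> start tag closes the open <p>.
  const parsed = doc.createElement('div');
  parsed.innerHTML = '<p>before<pre>code</pre>after</p>';

  // 2. Through DOM calls: appendChild/append do not check HTML nesting rules.
  const built = doc.createElement('div');
  const p = doc.createElement('p');
  const pre = doc.createElement('pre');
  pre.textContent = 'code';
  p.append('before', pre, 'after');
  built.appendChild(p);

  return { parsed: parsed.innerHTML, built: built.innerHTML };
}

// Only run where a DOM is available (e.g. pasted into the DevTools console).
if (typeof document !== 'undefined') {
  console.log(compareNesting(document));
  // In a spec-following browser I would expect:
  //   parsed: '<p>before</p><pre>code</pre>after<p></p>'
  //   built:  '<p>before<pre>code</pre>after</p>'
}
```

The second result is the shape a JavaScript-rendered page can end up with after hydration, which the parser alone would never produce.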
Does this work for all tags?
Next, I tried putting a <p> tag inside a <br>. It failed, of course, but I received no errors. This was expected, however, because <br> is a void element and isn't meant to have children.
But what about putting <pre> inside <script>?
const script = document.createElement('script');
const pre = document.createElement('pre');
script.appendChild(pre);
console.log(script.outerHTML);
// Result: '\x3Cscript><pre></pre>\x3C/script>'
(Strange how the < characters are printed as \x3C.)
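One way to make sense of both results (nothing visible for <br>, but a nested <pre> for <script>) is a toy model of HTML serialization, where void elements are written without their children and every other element is written with them. This is a sketch under my own assumptions: el and serialize are made-up helpers, text is not escaped, and real browsers do far more than this.

```javascript
// Void elements as listed in the HTML standard.
const VOID_TAGS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper: a plain-object stand-in for a DOM element.
function el(tag, ...children) {
  return { tag, children };
}

// Toy serializer: void elements are written without their children,
// everything else is written with them. Text is not escaped here.
function serialize(node) {
  if (typeof node === 'string') return node;
  if (VOID_TAGS.has(node.tag)) return `<${node.tag}>`;
  return `<${node.tag}>${node.children.map(serialize).join('')}</${node.tag}>`;
}

console.log(serialize(el('br', el('p'))));       // <br>
console.log(serialize(el('script', el('pre')))); // <script><pre></pre></script>
```

In this model the tree happily accepts a child in both cases; only the serialization of the void element drops it.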
There's no way to render HTML inside a <script> tag directly, right? This shouldn't work, and it doesn't, because everything inside <script> is supposed to be JavaScript, right?!
<body>
<script>
<pre>hello</pre>;
</script>
</body>
But what if you append a <script> that contains a <pre> inside it via JavaScript? You get this.
This means you can write JavaScript inside a <pre> tag and put it inside a <script> tag...and it will get treated like a normal <pre> tag by the browser. Crazy.
So what?
Okay, yes, this isn't groundbreaking, but it is annoying. I can live with it, but it raises a few questions:
- Why not add checks to prevent this?
- What else might be lurking in browsers that can be similarly twisted & turned?
- Is this intended? And if so, why prevent it on the initial render?
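On the first question, a check does not have to live in the browser. As a rough sketch, a rendering layer could walk its own element tree before touching the DOM and flag nesting the parser would have repaired. Everything here is hypothetical: findInvalidNesting, the { tag, children } tree shape, and the tiny NOT_ALLOWED_IN_P table are illustrations, not an existing API or a complete rule set.

```javascript
// Deliberately incomplete: a few tags the HTML parser refuses to keep
// inside an open <p>.
const NOT_ALLOWED_IN_P = new Set(['pre', 'div', 'ul', 'ol', 'p']);

// Walks a { tag, children } tree and reports listed tags that have a <p>
// ancestor. String children are treated as text and skipped.
function findInvalidNesting(node, ancestors = []) {
  const problems = [];
  if (ancestors.includes('p') && NOT_ALLOWED_IN_P.has(node.tag)) {
    problems.push(`<${node.tag}> inside <p>`);
  }
  for (const child of node.children || []) {
    if (typeof child !== 'string') {
      problems.push(...findInvalidNesting(child, [...ancestors, node.tag]));
    }
  }
  return problems;
}

const tree = {
  tag: 'p',
  children: [
    'bla bla bla',
    { tag: 'pre', children: [{ tag: 'code', children: ['code'] }] },
    'bla bla bla',
  ],
};
console.log(findInvalidNesting(tree)); // [ '<pre> inside <p>' ]
```

A check like this only warns; it says nothing about why the DOM API itself stays permissive.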
Maybe someone can reasonably answer those, but one thing is for sure: JavaScript adds a baffling amount of uncertainty & unpredictability to the web.