10 May 2021 - 20:42

After I built the site, the next step was checking performance and security. This log, besides being my learning notebook, is also a test-bed for my experiments.

What is CSP? #

An HTTP header for fine-grained control over where resources are loaded from. With a strict Content-Security-Policy, we can block most XSS (Cross-Site Scripting) attacks. Read further for why XSS is a problem.

May 6 — GitHub Pages #

GitHub Pages does not let us specify HTTP headers. One workaround is to include <meta http-equiv="Content-Security-Policy" content="..."> as the first child of <head>. Yet Netlify lets us set our response headers, among other goodies, so I skipped ahead. Out of the box the grade is a D.

May 8 — Netlify (A false hope) #

Setting up a site on Netlify from GitHub is trivial. Point it to your repository, enter your build command and publish directory. Done.

Our response headers live in a _headers file at the root of the publish directory. Mozilla suggests starting with default-src 'none'; img-src 'self'; script-src 'self'; style-src 'self' for CSP.
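As a rough illustration of how such a policy ends up in the _headers file, here is a minimal Python sketch. The helper names build_csp and netlify_headers are made up for this example, and the "/*" path (apply to every page) is an assumption about how the site is configured, not something taken from my actual setup.

```python
# Sketch: assemble a Content-Security-Policy value from its directives and
# wrap it in a Netlify _headers entry. Helper names are illustrative only.

def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives as 'name source source; name source ...'."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


def netlify_headers(path: str, csp: str) -> str:
    """Render one _headers entry: a path line, then an indented header line."""
    return f"{path}\n  Content-Security-Policy: {csp}\n"


# The starting policy quoted in the text above.
starter = {
    "default-src": ["'none'"],
    "img-src": ["'self'"],
    "script-src": ["'self'"],
    "style-src": ["'self'"],
}

policy = build_csp(starter)
print(netlify_headers("/*", policy))
```

Keeping the directives in a dictionary makes it easy to add a source to one directive later without hand-editing a long header line.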

Directive	Feature
`default-src`	default policy for allowed fallback sources
`img-src`	for images
`font-src`	for fonts
`script-src`	for scripts, i.e. JavaScript
`style-src`	for styles, i.e. CSS
`frame-src`	for iframes
`connect-src`	e.g. XMLHttpRequest, WebSocket
`object-src`	for plugins, e.g. Flash, Silverlight
`base-uri`	restricts URLs for <base>

'none' - nothing
'self' - same origin
https://example.com/external.js - specific external resource
https: - any source, but only over HTTPS
'unsafe-hashes' - lets hashes match code in event handler attributes, e.g. onclick
'unsafe-inline' - allows inline script and style blocks
'unsafe-eval' - allows eval() and similar string-to-code functions

I began with something like what Mozilla suggests, extended to allow CDNs for third-party scripts. Every <script> is required to have a hash or a nonce. A nonce is a cryptographically secure random token, generated per request and attached to a script block. A static site cannot generate one per request. So we should put sha256 hashes in integrity attributes (SRI, Subresource Integrity) to ensure the scripts have not been tampered with. This is simple with Hugo templates.

<!-- For inline script blocks -->
{{ with (resources.Get "inline.js" | minify | fingerprint) }}
<script integrity="{{ .Data.Integrity }}">
  {{ .Content | safeJS }}
</script>
{{ end }}
<!-- For external scripts -->
{{ $script := resources.Get "external.js" | minify | fingerprint }}
<script
  src="{{ $script.RelPermalink }}"
  integrity="{{ $script.Data.Integrity }}"
></script>
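For reference, an integrity value of this kind is the algorithm name, a dash, and the base64-encoded digest of the script bytes. The Python sketch below shows that computation under that assumption; the function names and the sample script body are made up for illustration and are not part of Hugo.

```python
import base64
import hashlib


def integrity(content: bytes, algorithm: str = "sha256") -> str:
    """Return an SRI-style value: '<algorithm>-<base64 of the raw digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def csp_source(content: bytes) -> str:
    """Wrap the value in single quotes, as a CSP source list expects."""
    return f"'{integrity(content)}'"


inline_js = b"console.log('hello');"  # hypothetical script body
print(integrity(inline_js))   # value for the integrity attribute
print(csp_source(inline_js))  # same value, quoted for script-src
```

One caveat: for an inline block, browsers hash the exact text between <script> and </script>, whitespace included, so the bytes hashed at build time have to match what ends up in the page.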

The result is an A+. Yet there is a problem: the error log shows that MathJax uses inline scripts and eval. So we are not done yet.

Hugo can highlight code blocks (no highlight.js), preprocess SCSS (via hugo-extended, no Node.js), minify resources, and generate hashes during the build. But it cannot yet generate diagrams (mermaid.js) or typeset math (KaTeX, MathJax). Also, client-side search (FlexSearch in my case) requires JavaScript. Thus, we still need some third-party libraries.

May 9 — Two steps back #

I had to add 'unsafe-inline' and the CDN domains to restore full functionality, although the errors about eval turned out to be false alarms. B+.

May 10 — Onward! #

I learned about 'strict-dynamic' and collected the integrity hashes of all inline and external scripts. It worked in Chrome. Sadly, it caused many problems in Firefox. After hours of debugging and reading bug reports, I realized that although Firefox has supported it for years, it is unusable there. Because CSP-3 is still a working draft, hashes for external scripts are unsupported. Still B+.

'strict-dynamic' - lets already trusted scripts load additional scripts

A Bittersweet Victory #

It seems CSP-2 (the current W3C Recommendation) supports hashes only for inline scripts, so I needed more fine-grained regexps to tell inline and external scripts apart.

I created a git pre-commit hook that updates the hashes whenever I commit my site. It is written in PowerShell, since I am on Windows. The search uses ripgrep: -oIN prints only the matches, without file names or line numbers, and -r rewrites each match, which the first two commands use to wrap it in single quotes. The unique results are joined on a single line and written to a file.

The first regexp collects all integrity strings, the second keeps only those of inline scripts, and the third lists the sources of external scripts.

hugo --minify
(rg -oIN '<script.*?(sha\d{3}-.{43}=)\"' -r '''$1''' public | sort -unique) -join ' ' | out-file -encoding ASCII -noNewline data/script_hash.txt
(rg -oIN '<script.*?(sha\d{3}-.{43}=)\".*?>[^\n<>]+?</script>' -r '''$1''' public | sort -unique) -join ' ' | out-file -encoding ASCII -noNewline data/inline_script_hash.txt
(rg -oIN '<script.*?src=\"?(http.*?\.js)[ \">]' -r '$1' public | sort -unique) -join ' ' | out-file -encoding ASCII -noNewline data/external_script_source.txt
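For readers not on PowerShell, the same three searches can be sketched in Python with the regexps copied from the commands above. This is only an approximation of the hook, not what I actually run: the helper names are hypothetical, it reads only *.html files (ripgrep searches every file under public), and the hash strings in the sample are placeholders with the right shape rather than real digests.

```python
import re
from pathlib import Path

# Same three regexps as the ripgrep commands above.
ALL_HASHES = re.compile(r'<script.*?(sha\d{3}-.{43}=)"')
INLINE_HASHES = re.compile(r'<script.*?(sha\d{3}-.{43}=)".*?>[^\n<>]+?</script>')
EXTERNAL_SOURCES = re.compile(r'<script.*?src="?(http.*?\.js)[ ">]')


def collect(pattern: re.Pattern, pages: list[str], quote: bool = False) -> str:
    """Return the unique captures from all pages, sorted and space-joined."""
    found = set()
    for html in pages:
        found.update(pattern.findall(html))
    items = sorted(found)
    if quote:  # CSP hash sources must be wrapped in single quotes
        items = [f"'{item}'" for item in items]
    return " ".join(items)


def read_pages(public_dir: str) -> list[str]:
    """Load every generated HTML page (assumption: only *.html matters)."""
    return [p.read_text(encoding="utf-8") for p in Path(public_dir).rglob("*.html")]


# Placeholder hashes and made-up markup, one <script> per line.
INLINE = "sha256-" + "a" * 43 + "="
EXTERNAL = "sha256-" + "b" * 43 + "="
sample = "\n".join([
    f'<script integrity="{INLINE}">let a=1</script>',
    f'<script src="/js/site.min.js" integrity="{EXTERNAL}"></script>',
    '<script src="https://cdn.jsdelivr.net/npm/mermaid@8.9.3/dist/mermaid.min.js"></script>',
])

print(collect(ALL_HASHES, [sample], quote=True))
print(collect(INLINE_HASHES, [sample], quote=True))
print(collect(EXTERNAL_SOURCES, [sample]))
```

Against a real build, the pages would come from read_pages("public") and each result would be written to the matching file under data/.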

The Hugo layout template index.headers generates _headers. Here is only the part relevant to script-src.

script-src 'sha256-aECzxYUJ57J5H6YymaVqtppSpIqD2Z9YAIAZfd/2xMY='
'sha256-MktN23nRzohmT1JNxPQ0B9CzVW6psOCbvJ20j9YxAxA='
'sha256-OBZ1TAxtlr9xf3a+8VMnoX0v39PPCWCsN6DfNkKio/I=' 'self'
https://cdn.jsdelivr.net/npm/mathjax@3.1.4/es5/tex-mml-chtml.js
https://cdn.jsdelivr.net/npm/mermaid@8.9.3/dist/mermaid.min.js;

And the result is an A+. The whole Content-Security-Policy line:

default-src 'none'; base-uri 'self'; manifest-src 'self'; connect-src 'self';
font-src 'self' https://cdn.jsdelivr.net; img-src 'self' data:; script-src
'sha256-aECzxYUJ57J5H6YymaVqtppSpIqD2Z9YAIAZfd/2xMY='
'sha256-MktN23nRzohmT1JNxPQ0B9CzVW6psOCbvJ20j9YxAxA='
'sha256-OBZ1TAxtlr9xf3a+8VMnoX0v39PPCWCsN6DfNkKio/I=' 'self'
https://cdn.jsdelivr.net/npm/mathjax@3.1.4/es5/tex-mml-chtml.js
https://cdn.jsdelivr.net/npm/mermaid@8.9.3/dist/mermaid.min.js; style-src 'self'
'unsafe-inline' https://cdn.jsdelivr.net; object-src 'none'
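A long policy line like this is easier to check once it is split back into directives. The sketch below does that in Python; parse_csp is a hypothetical helper, and the sample string is a shortened version of the policy above with the script-src part left out for brevity.

```python
def parse_csp(policy: str) -> dict[str, list[str]]:
    """Split a policy into {directive name: list of sources}."""
    directives: dict[str, list[str]] = {}
    for part in policy.split(";"):
        tokens = part.split()
        if tokens:
            # Directive names are case-insensitive; if a name repeats,
            # the first occurrence is the one that counts.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


# Shortened version of the policy above.
sample_policy = (
    "default-src 'none'; base-uri 'self'; connect-src 'self'; "
    "img-src 'self' data:; object-src 'none'"
)

for name, sources in parse_csp(sample_policy).items():
    print(name, sources)
```

With the policy in this form, a quick check such as confirming that default-src is still 'none' is a one-line lookup.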

Demystifying Content Security Policy