jq, but for HTML
html-query is a small utility I wrote to simplify extracting structured data from HTML documents, in a similar vein to jq. It uses CSS selectors to extract data from HTML documents and outputs JSON.
I wanted to experiment with WebAssembly, so I wrote it in Rust and compiled it to Wasm to provide an in-browser playground: https://orf.github.io/html-query/
Example - extracting Hacker News post titles
Running:
curl https://news.ycombinator.com | \
hq '{posts: .athing | [ {title: .titleline > a, url: .titleline > a | @(href)} ] }'
Would output a JSON document like so:
{
  "posts": [
    {
      "title": "Database Fundamentals",
      "url": "https://tontinton.com/posts/database-fundementals/"
    },
    {
      "title": "I bricked my Christmas lights",
      "url": "https://www.whizzy.org/2023-12-14-bricked-xmas/"
    },
    ...
  ]
}
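To make the expression above a little less magical, here is a rough hand-written equivalent in Python using only the standard library. It is a sketch of one way to read the query, not how hq is implemented: for every element with class athing, take the first a that is a direct child of a .titleline element, and emit its text as title and its href attribute as url. The sample markup, the PostExtractor class and the extract_posts function are all made up for illustration; the real Hacker News page is more involved.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample markup, loosely shaped like a Hacker News listing.
SAMPLE_HTML = """
<table>
  <tr class="athing">
    <td><span class="titleline"><a href="https://tontinton.com/posts/database-fundementals/">Database Fundamentals</a>
      <span class="sitebit">(<a href="from?site=tontinton.com">tontinton.com</a>)</span></span></td>
  </tr>
  <tr class="athing">
    <td><span class="titleline"><a href="https://www.whizzy.org/2023-12-14-bricked-xmas/">I bricked my Christmas lights</a></span></td>
  </tr>
</table>
"""


class PostExtractor(HTMLParser):
    """Hand-rolled stand-in for: .athing | [ {title: .titleline > a, url: ... | @(href)} ]"""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._stack = []         # open elements as (tag, classes)
        self._post = None        # post being built while inside an .athing element
        self._capturing = False  # True while inside the matched <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        parent = self._stack[-1] if self._stack else None
        self._stack.append((tag, classes))

        if "athing" in classes:
            self._post = {"title": None, "url": None}
        elif (
            tag == "a"
            and self._post is not None
            and self._post["title"] is None          # only the first match per post
            and parent is not None
            and "titleline" in parent[1]             # direct child, like ".titleline > a"
        ):
            self._post["title"] = ""
            self._post["url"] = attrs.get("href")    # like "@(href)"
            self._capturing = True

    def handle_data(self, data):
        if self._capturing:
            self._post["title"] += data

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (assumes reasonably well-formed input).
        while self._stack:
            open_tag, classes = self._stack.pop()
            if open_tag != tag:
                continue
            if tag == "a":
                self._capturing = False
            if "athing" in classes and self._post is not None:
                self.posts.append(self._post)
                self._post = None
            break


def extract_posts(html: str) -> dict:
    """Return {"posts": [...]} in the same shape as the hq output shown above."""
    parser = PostExtractor()
    parser.feed(html)
    return {"posts": parser.posts}


result = extract_posts(SAMPLE_HTML)
print(json.dumps(result, indent=2))
```

The point of the comparison is mostly the size difference: the parser needs a stack and a couple of flags to express what the hq query says in one line.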