html-query (GitHub: orf/html-query, 596 stars)

jq, but for HTML

html-query is a small utility I wrote to simplify extracting structured data from HTML documents, in a similar vein to jq. Queries are written with CSS selectors, and the matched data is output as JSON.

I wanted to experiment with WebAssembly, so I wrote it in Rust and compiled it to Wasm to provide an in-browser playground: https://orf.github.io/html-query/

Example - extracting Hacker News post titles

Running:

curl https://news.ycombinator.com | \
  hq '{posts: .athing | [ {title: .titleline > a, url: .titleline > a | @(href)} ] }'

would output a JSON document like so:

{
  "posts": [
    {
      "title": "Database Fundamentals",
      "url": "https://tontinton.com/posts/database-fundementals/"
    },
    {
      "title": "I bricked my Christmas lights",
      "url": "https://www.whizzy.org/2023-12-14-bricked-xmas/"
    },
    ...
  ]
}
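
Because the output is plain JSON, it can be passed straight to any JSON-aware tool or script. As a minimal sketch (not part of html-query itself), the Python below parses a document shaped like the one above and prints one line per post. `SAMPLE_OUTPUT` holds only the two posts shown, with the elided entries left out, and `format_posts` is a hypothetical helper named here for illustration.

```python
import json

# Sample hq output: the two posts shown above, with the elided entries left out.
SAMPLE_OUTPUT = """
{
  "posts": [
    {"title": "Database Fundamentals",
     "url": "https://tontinton.com/posts/database-fundementals/"},
    {"title": "I bricked my Christmas lights",
     "url": "https://www.whizzy.org/2023-12-14-bricked-xmas/"}
  ]
}
"""


def format_posts(raw_json: str) -> list[str]:
    """Hypothetical helper: turn hq-style JSON into 'title <url>' lines."""
    document = json.loads(raw_json)
    return [f"{post['title']} <{post['url']}>" for post in document["posts"]]


if __name__ == "__main__":
    for line in format_posts(SAMPLE_OUTPUT):
        print(line)
```

In practice the JSON would come from hq's standard output rather than a hard-coded string, for example by reading `sys.stdin` in place of `SAMPLE_OUTPUT`.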