Code Review Videos > Postgres > How to Count and Group JSON Data in PostgreSQL Using FILTER

How to Count and Group JSON Data in PostgreSQL Using FILTER

Today I needed a query that could take some nested JSON data, a column ID, and then create a single row for each of these distinct column ID’s to show some summary numbers of different nested JSON values.

I’ve horribly explained that, and so you can imagine just how much fun I had writing the query. SQL is always harder if you can’t quite articulate in a sentence or two exactly what it is that you’re trying to display.

Anyway, words fail me, so hopefully a code based example will clear it up much more definitively.

Setting Up Our Demo Environment

I’m going to use Docker to quickly spin up a throwaway Postgres DB. For this I will use docker-compose:

version: '3.9'
services:
  db:
    image: postgres:15.2
    ports:
      - "5495:5432"
    environment:
      POSTGRES_DB: app_db
      POSTGRES_USER: app_user
      POSTGRES_PASSWORD: my_top_secret_password
    volumes:
      - "./init.sql:/docker-entrypoint-initdb.d/init.sql"Code language: JavaScript (javascript)

These contents go into a docker-compose.yaml file.

This docker compose file defines a single service named “db”. The service runs a Postgres database version 15.2 using the official Postgres Docker image.

The service exposes port 5495 on the host machine (your local machine), which is mapped to the default Postgres port 5432 in the container. This allows you to access the database from your PC.

The environment variables should hopefully be self explanatory, but they are defined by the official postgres docker image, and I am using the explicit config above to override them to my own values.

The volumes section maps the local file “./init.sql” to the container directory “/docker-entrypoint-initdb.d/init.sql“. This allows you to run any SQL scripts in that file when the container is started up, which is very useful for initialising the database schema and loading our demo data.

Here’s the corresponding init.sql file:

create table some_form_data (
  id serial primary key,
  category_id varchar(6) not null,
  submission_response jsonb
);

insert into some_form_data (category_id, submission_response)
values
  ('CAT001', '{"age": 35, "has_children": true}'),
  ('CAT002', '{"age": 42, "has_children": false}'),
  ('CAT001', '{"age": 25, "has_children": true}'),
  ('CAT003', '{"age": 30, "has_children": true}'),
  ('CAT001', '{"age": 50, "has_children": false}'),
  ('CAT002', '{"age": 28, "has_children": true}'),
  ('CAT003', '{"age": 21, "has_children": false}'),
  ('CAT002', '{"age": 39, "has_children": true}'),
  ('CAT003', '{"age": 29, "has_children": true}'),
  ('CAT001', '{"age": 47, "has_children": true}');Code language: JavaScript (javascript)

You can then spin this up with docker-compose up.

I’m going to use JetBrains DataGrip to connect, and I documented how to do this over on this post. So if you’re unsure, check that out.

Counting and Grouping JSON Data in PostgreSQL Using FILTER

OK, so we have our demo database up and running.

Imagine, if you will, that the some_form_data contains form submission data, where the category ID is a known quantity, but the submission_response is built up using a wizard, and so the resulting data is stored off as JSON for … well, simplicity I guess. I don’t know. I inherited the database, I didn’t create it.

What I needed to do was get a report where for each CATEGORY_ID, I wanted something like this:

Category IDover_30_with_childrenunder_30_without_children
CAT00120
CAT00211
CAT00301

And it took a lot of frustration on my part to figure out how to do this.

I was trying joins and CTEs, and all kinds of stuff.

But after taking a pause and reading the manual a bit further, I stumbled across this bit:

postgres manual filter where clause

Now, not only did that look like exactly what I was trying to achieve, it massively simplified my “work in progress” / horrible query.

From there I got the query down to this:

select
  category_id,
  count(*) filter (where
      submission_response->>'has_children' = 'true'
          and
      submission_response->>'age' >= '30'
  ) as over_30_with_children,
  count(*) filter (where
      submission_response->>'has_children' = 'false'
          and
      submission_response->>'age' < '30'
  ) as under_30_without_children
from some_form_data
group by category_id;Code language: SQL (Structured Query Language) (sql)

That actually looks messier than it was, because I’ve formatted it above so it doesn’t have a horizontal scroll.

For me it looked like:

select
  category_id,
  count(*) filter (where submission_response->>'has_children' = 'true' and submission_response->>'age' >= '30') as over_30_with_children,
  count(*) filter (where submission_response->>'has_children' = 'false' and submission_response->>'age' < '30') as under_30_without_children
from some_form_data
group by category_id;Code language: SQL (Structured Query Language) (sql)

Which, considering what I had, is a massive improvement.

The FILTER clause is used in conjunction with an aggregate function like COUNT, SUM or AVG to selectively include or exclude certain rows from the resulting aggregate value.

The basic syntax of a FILTER clause looks like this:

<code>COUNT(*) FILTER (WHERE some_condition)
</code>Code language: SQL (Structured Query Language) (sql)

COUNT(*) is the aggregate function being used.

some_condition would be something that resolves to a boolean value, which we can use to determine whether a given row should be included in the count.

Ultimately from the query above I got this:

postgres select count filter where output

It might not seem like much, but that’s the thing with SQL. When you finally get the outcome you want, the query often doesn’t look like that much at all. But all the hard work goes into refining the query to get exactly what you want. That’s the hard part. Or at least, it is for me.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.