til / Using generate_series to avoid gaps in data in Postgres
To create statistics, I started collecting information whenever a user views a post. The data contains:
- When the view occurred (
created_at
) - Which post it was (referenced to the
post
table) - The user agent
With this information, I can create some nice graphs, see which browser and operating systems the user has, and create top lists. No information that contains personal data, I don’t want to run into any GDPR issues. 😅
Since I currently don’t have that much data, I needed to pad it to create the graphs for the last 30 days and the current year. To achieve this, I used Postgres’s generate_series
function to create the dates and then fill them with the data I have.
-- Generate a series from the current date - 30 days until the current
-- date with an interval of one day. The time part becomes 00:00:00 and
-- is not needed, so we cast the values to DATE to remove the time.
-- The result is: 2022-05-11, 2022-05-22, ..., 2022-06-10,
SELECT generate_series (
CURRENT_DATE - '30 days'::INTERVAL,
CURRENT_DATE,
'1 day'
)::DATE
Now that I have a series, I can combine it with the actual data by matching the generated date with the date in the created_at
column of my data.
-- Generate a temporary table with the dates of the last 30 days
WITH days AS (
SELECT generate_series (
CURRENT_DATE - '30 days'::INTERVAL,
CURRENT_DATE,
'1 day'
)::DATE AS day
)
-- Join the dates with our actual data by matching with the date
-- when the view was created.
-- Select the day and the number of views based
-- on the number of IDs when grouping the views by day.
-- Order the list ascending by the date.
SELECT days.day, COUNT(v.id)
FROM days
LEFT JOIN views AS v ON date_trunc('day', created_at) = days.day
GROUP BY 1
ORDER BY 1 ASC
With this I get data that is perfect for the graphs.
day | count |
---|---|
2022-05-11 | 0 |
28 dates… | 0 |
2022-06-10 | 10 |