til / using generate_series to avoid gaps in data in Postgres
To create statistics, I started collecting information whenever a user views a post. The data contains:
- When the view occurred (
- Which post it was (referenced to the
- The user agent
With this information, I can create some nice graphs, see which browser and operating systems the user has, and create top lists. No information that contains personal data, I don’t want to run into any GDPR issues. 😅
Since I currently don’t have that much data, I needed to pad it to create the graphs for the last 30 days and the current year. To achieve this, I used Postgres’s
generate_series function to create the dates and then fill them with the data I have.
1-- Generate a series from the current date - 30 days until the current 2-- date with an interval of one day. The time part becomes 00:00:00 and 3-- is not needed, so we cast the values to DATE to remove the time. 4-- The result is: 2022-05-11, 2022-05-22, ..., 2022-06-10, 5SELECT generate_series ( 6 CURRENT_DATE - '30 days'::INTERVAL, 7 CURRENT_DATE, 8 '1 day' 9)::DATE
Now that I have a series, I can combine it with the actual data by matching the generated date with the date in the
created_at column of my data.
1-- Generate a temporary table with the dates of the last 30 days 2WITH days AS ( 3 SELECT generate_series ( 4 CURRENT_DATE - '30 days'::INTERVAL, 5 CURRENT_DATE, 6 '1 day' 7 )::DATE AS day 8) 9 10-- Join the dates with our actual data by matching with the date 11-- when the view was created. 12 13-- Select the day and the number of views based 14-- on the number of IDs when grouping the views by day. 15-- Order the list ascending by the date. 16SELECT days.day, COUNT(v.id) 17FROM days 18LEFT JOIN views AS v ON date_trunc('day', created_at) = days.day 19GROUP BY 1 20ORDER BY 1 ASC
With this I get data that is perfect for the graphs.