til/using generate_series to avoid gaps in data in Postgres
To create statistics, I started collecting information whenever a user views a post. The data contains:
- When the view occurred (
- Which post it was (referenced to the
- The user agent
With this information, I can create some nice graphs, see which browser and operating systems the user has, and create top lists. No information that contains personal data, I don't want to run into any GDPR issues. 😅
Since I currently don't have that much data, I needed to pad it to create the graphs for the last 30 days and the current year. To achieve this, I used Postgres's
generate_series function to create the dates and then fill them with the data I have.
-- Generate a series from the current date - 30 days until the current -- date with an interval of one day. The time part becomes 00:00:00 and -- is not needed, so we cast the values to DATE to remove the time. -- The result is: 2022-05-11, 2022-05-22, ..., 2022-06-10, SELECT generate_series ( CURRENT_DATE - '30 days'::INTERVAL, CURRENT_DATE, '1 day' )::DATE
Now that I have a series, I can combine it with the actual data by matching the generated date with the date in the
created_at column of my data.
-- Generate a temporary table with the dates of the last 30 days WITH days AS ( SELECT generate_series ( CURRENT_DATE - '30 days'::INTERVAL, CURRENT_DATE, '1 day' )::DATE AS day ) -- Join the dates with our actual data by matching with the date -- when the view was created. -- Select the day and the number of views based -- on the number of IDs when grouping the views by day. -- Order the list ascending by the date. SELECT days.day, COUNT(v.id) FROM days LEFT JOIN views AS v ON date_trunc('day', created_at) = days.day GROUP BY 1 ORDER BY 1 ASC
With this I get data that is perfect for the graphs.