Home Writing

til / using generate_series to avoid gaps in data in Postgres

To create statistics, I started collecting information whenever a user views a post. The data contains:

  • When the view occurred (created_at)
  • Which post it was (referenced to the post table)
  • The user agent

With this information, I can create some nice graphs, see which browser and operating systems the user has, and create top lists. No information that contains personal data, I don’t want to run into any GDPR issues. 😅

Since I currently don’t have that much data, I needed to pad it to create the graphs for the last 30 days and the current year. To achieve this, I used Postgres’s generate_series function to create the dates and then fill them with the data I have.

1-- Generate a series from the current date - 30 days until the current
2-- date with an interval of one day. The time part becomes 00:00:00 and
3-- is not needed, so we cast the values to DATE to remove the time.
4-- The result is: 2022-05-11, 2022-05-22, ..., 2022-06-10,
5SELECT generate_series (
6  CURRENT_DATE - '30 days'::INTERVAL,
8  '1 day'  

Now that I have a series, I can combine it with the actual data by matching the generated date with the date in the created_at column of my data.

 1-- Generate a temporary table with the dates of the last 30 days
 2WITH days AS (
 3  SELECT generate_series (
 4    CURRENT_DATE - '30 days'::INTERVAL,
 6	'1 day'
 7  )::DATE AS day  
10-- Join the dates with our actual data by matching with the date
11-- when the view was created.
13-- Select the day and the number of views based
14-- on the number of IDs when grouping the views by day.
15-- Order the list ascending by the date.
16SELECT days.day, COUNT(v.id)  
17FROM days
18LEFT JOIN views AS v ON date_trunc('day', created_at) = days.day  

With this I get data that is perfect for the graphs.

day count
2022-05-11 0
28 dates… 0
2022-06-10 10

  • Loading next post...
  • Loading previous post...