Home Writing

til / using generate_series to avoid gaps in data in Postgres

To create statistics, I started collecting information whenever a user views a post. The data contains:

  • When the view occurred (created_at)
  • Which post it was (referenced to the post table)
  • The user agent

With this information, I can create some nice graphs, see which browser and operating systems the user has, and create top lists. No information that contains personal data, I don’t want to run into any GDPR issues. 😅

Since I currently don’t have that much data, I needed to pad it to create the graphs for the last 30 days and the current year. To achieve this, I used Postgres’s generate_series function to create the dates and then fill them with the data I have.

-- Generate a series from the current date - 30 days until the current
-- date with an interval of one day. The time part becomes 00:00:00 and
-- is not needed, so we cast the values to DATE to remove the time.
-- The result is: 2022-05-11, 2022-05-22, ..., 2022-06-10,
SELECT generate_series (
  CURRENT_DATE - '30 days'::INTERVAL,
  CURRENT_DATE,
  '1 day'  
)::DATE

Now that I have a series, I can combine it with the actual data by matching the generated date with the date in the created_at column of my data.

-- Generate a temporary table with the dates of the last 30 days
WITH days AS (
  SELECT generate_series (
    CURRENT_DATE - '30 days'::INTERVAL,
	CURRENT_DATE,
	'1 day'
  )::DATE AS day  
)  

-- Join the dates with our actual data by matching with the date
-- when the view was created.

-- Select the day and the number of views based
-- on the number of IDs when grouping the views by day.
-- Order the list ascending by the date.
SELECT days.day, COUNT(v.id)  
FROM days
LEFT JOIN views AS v ON date_trunc('day', created_at) = days.day  
GROUP BY 1
ORDER BY 1 ASC

With this I get data that is perfect for the graphs.

day count
2022-05-11 0
28 dates… 0
2022-06-10 10

  • Loading next post...
  • Loading previous post...