I want to share with you some of the capabilities that postgresql offer. I hope that you would like it and please feel free to share your feedback.
I’ve break down some of the features that help me during my work for the past couple of years, from search by tags, location, time series and etc… I think of this article as a quick cookbook for using postgres.
Let’s say you build an app that helps you find the best restaurants in town and you would like to add a filter to the restaurant tags.
The schema should look something like this:
restaurant_id (PK) | Location {lat, lan} | tags {STRING} | …
insert into “restaurants” (“location”, tags”) values (‘{ “lat”: 12.12, “lng”: 12.12 }’, ‘{“sushi”}’);
Create will return the restaurant_id
*ignore the location column, I’ll cover it in a different post since location search is a big subject by itself.
Update: Now we would like to add a tag but keep the old tags, so we can use array_cat or ||
UPDATE “restaurants” SET tags = array_cat(topics, ‘{sea food, japanese}’);
UPDATE “restaurants” SET tags = topics || ‘{sea food, japanese, rice}’;
Remove: We realize that “rice” as tags doesn’t make sense let’s remove it.
UPDATE “restaurants” SET tags = array_remove(tags, ‘rice’);
Replace: We also realize we want better performance and would like to change ‘sea food’ to ‘seafood’
UPDATE “restaurants” SET tags = array_remove(tags, ‘sea food’) || ‘{seafood}’ WHERE topics @> ARRAY[‘sea food’];
Search: now you would like to filter the results by tags we can just use this query.
SELECT * FROM “restaurants” WHERE ‘sushi’ = ANY (tags)
Another way for searching is to search by specific tag and filter out values by another tag.
SELECT * FROM “restaurants” WHERE ‘romantic’ = ANY (tags) and ‘sushi’ != ALL (tags)
In the last query we will get restaurants that tag as romantic but not tag as sushi
Today IOT is defiantly a HOT term! storing your data in a time series format is crucial for success. We can find time series data everywhere from smart house devices up to self driving cars that send their data such as battery consumption, location, logs, etc…
In most cases before designing my data schema I think about the usage, meaning how what’s the trade-off between read/write is it 50/50 or 80/20 mostly write to the database.
This will effect how we design the schema, the indexes that we should use and how we’re going to shared our data across the databases.
I design this schema to be very generic and to be able to solve most of the issues with timeseries, so it might not fit to your problem 100% so feel free to customize the schema to be more fitted to your needs but the basic idea is here:
A schema example is:
For the timestamp column, the data type should be timestamp without time zone this mean you’ll have to make sure that every timestamp you inserting to your database is a UTC timestamp and NOT GMT.
Our database should be looking like this:
We’re able to search by date and filter by source and type.
For Example:
SELECT * FROM timeseries WHERE SOURCE=1 AND timestamp between ‘2017–01–01T00:00:00.000Z’ and ‘2017–02–01T00:00:00.000Z’ AND TYPE=’battery’
That’s a pretty basic and simple query and you might want to break your query to specific time interval for example break the data to 10 min interval (meaning equal time gap), We can use number of postgres native function to achieve it:
to_timestamp: convert string to time stamp
extract: extract(field from timestamp)
epoch: For date and timestamp values, the number of seconds since 1970–01–01 00:00:00–00
An example from Postgres site
SELECT EXTRACT(EPOCH FROM timestamptz ‘2013–07–01 12:00:00’) -
EXTRACT(EPOCH FROM timestamptz ‘2013–03–01 12:00:00’);
Result: 10537200
time series query example, let’s say for example we want to print a battery report for a 10 min interval for a self driving car and let’s say that we have a fleet of self driving car, We would like to view the logs of car id 1 and view the battery data in a 10 min interval between a specific date range.
SELECT DISTINCT ON (time_series) time_series, data, type FROM (SELECT TO_TIMESTAMP((EXTRACT(EPOCH FROM timestamp) / (600))::int * 600) AS time_series, data, timestamp, type
FROM timeseries WHERE source=1 AND type = ‘battery’
AND timestamp BETWEEN ‘2017–01–01T00:00:00.000Z’ and ‘2017–01–25T00:10:00.000Z’) timeseries ORDER BY time_series, ABS(EXTRACT(EPOCH FROM timestamp) — EXTRACT(EPOCH FROM time_series)) limit 1000
Wait what?!? don’t worry let’s explain what’s happen in this query?!
I’ll start from the inner query
SELECT TO_TIMESTAMP((EXTRACT(EPOCH FROM timestamp) / (600))::int * 600)
This will return a list of dates break to 600 seconds interval (our 10 min interval)
AS time_series, data, timestamp, type
FROM timeseries WHERE source=1 AND type = ‘battery’
Here we define the columns and filter the data by the source to be 1 and the type.
AND timestamp BETWEEN ‘2017–01–01T00:00:00.000Z’ and ‘2017–01–25T00:10:00.000Z’)
Set the date range
timeseries ORDER BY time_series, ABS(EXTRACT(EPOCH FROM timestamp) — EXTRACT(EPOCH FROM time_series)
this will order our result by time_series. This part will make sure we get the "closest" time to our time interval, meaning it will choose timestamp=9:59:59 instead of choosing timestamp=10:02:00 for the 10:00 time interval.
Than the outer query will distinct the result by our time_series column.
If you like it please comment, share or just press the like button.
Also, if you find a bug or have question feel free to comment, or ping me.
cheers,
Doron.