Speeding up PostgreSQL data loading

Published in

Programming & Dev

2023-09-07 20:03

Greetings Hive minds! I'm a longtime PostgreSQL contributor and author specializing in performance tuning. Now that Hive account @mahdiyari has the PostgreSQL-based HafSQL in beta, I noticed index building is becoming a serious drag. Thought I'd share my last conference video as possibly helpful: Speedrunning the Open Street Map osm2pgsql Loader

That talk took the roughly terabyte-sized Open Street Map Planet data set and re-tuned everything for NVMe to drop loading time, which on current hardware I now have down to just over 4 hours. All the config changes to PG and Linux are documented. HafSQL's starter postgresql.conf probably needs less shared_buffers and more maintenance_work_mem to speed up all its index builds; giving max_wal_size a few GB of disk first would help too. Hoping to get the full Hive data set running here so I can test it myself.
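For anyone who wants a starting point, here is a rough sketch of those three settings in postgresql.conf. The numbers are placeholders I'm assuming for a dedicated server with around 64GB of RAM and NVMe storage. They are not HafSQL's actual defaults, not the values from the talk, and not tested against Hive data, so treat them as a direction to experiment in.

```
# Illustrative values only (assumed ~64GB RAM, NVMe); tune and measure on your own hardware.

# Smaller buffer cache during bulk loading leaves more RAM for index builds.
shared_buffers = 4GB

# Memory available to each CREATE INDEX (and VACUUM) operation.
maintenance_work_mem = 2GB

# Let more WAL accumulate between checkpoints so heavy writes force fewer of them.
max_wal_size = 16GB
```

Changing shared_buffers needs a server restart; maintenance_work_mem and max_wal_size can be picked up with a config reload.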

My day job at Crunchy Data is 100% open source work like multi-cloud PostgreSQL. I run a benchmark lab and the Hive blockchain makes a nicely sized data set for my upcoming work.