
Quickly Import a Postgres Production Database to your local psql docker environment.

Now I know you are thinking. "Could he make the title any longer?"

Yes. Yes, I could.

The full title should in fact be:
"How to export a production Postgres database using pg_dump and then import it into your Laravel Sail and Psql docker environment for local testing with pg_restore."



The problem


Now this might just be a "me" problem, but I don't like dealing with the database and I shy away from CLI-heavy operations. So it is fair to say that convenience comes at a cost.

When you are developing an application you want access to realistic data sets. Edge cases happen, and the closer your local data is to the data on the live site, the better and quicker you can debug any issues that might arise.

So aside from backup purposes, using a replica of the live database allows you to better test and code for edge cases and it is just better, ok. Dammit.

For the longest time I have been using pgAdmin to export the database to SQL and that is my backup. Fair enough, it works.

Now, without writing any fancy import script myself, I would use HeidiSQL to import that exported file. Sure, the DB was small, about 1 MB, but the import took about 30 minutes.

I did not know better.

The DB is now at 7 MB and close to 50 000 rows, and it takes 2 hours to import via HeidiSQL.

So I just import fresh data less often.

What if production went down and I needed to restore the database? Would that take 2 hours? Data grows quickly, and with how I was doing it, to the best of my knowledge it would eventually take almost a day to restore.

Obviously you eventually put better fail-safes in place. For all intents and purposes, just know that you should always be looking at how you can further optimize and improve your workflow when running applications.

One of those steps for me was finally taking the evening and figuring out how to speed up the process.

Turns out that the pgAdmin SQL dump is a plain-text format, which means that HeidiSQL (GUI) and CLI tools like psql will use INSERT statements to recreate the data on a fresh database.

The SQL INSERT command is extremely slow for this because it has to write each row one by one. There are ways to improve it, and reading through this article in the Postgres docs - www.postgresql.org/docs/10/populate.html - I realised that I was not intending to write my own custom importer.

COPY is faster than INSERT

One of the biggest takeaways from that article was that COPY is faster than INSERT on large data sets. I equate this to how copying one 4 GB zip is quicker than copying 2000 small files of the same total size.

The same idea applies, since reads and writes need to be closed off each time; if there is only one operation, less time is wasted finalizing operations.
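To make that concrete, here is a minimal sketch of the two styles, using a made-up users table, database, and file names that are not from my actual project:

# Replaying a plain-text dump full of INSERT statements runs one statement per row
psql -U sail -d exampleDatabase -f insert_style_dump.sql

# COPY loads all of a table's rows in a single statement
psql -U sail -d exampleDatabase -c "\copy users FROM 'users.csv' WITH (FORMAT csv)"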

Please note a lot of how I phrase things and explain is based on my own opinion and how well I may or may not understand something. Feel free to correct me.

pgAdmin dump/export is a text file

Now for all intents and purposes the .sql I was getting from pgAdmin is perfect. If you were writing your own importer I think the tips from that Postgres docs article would be great and you could implement them all.

The reason their tip on using COPY stuck with me is that COPY works on whole tables. So you would need to loop over the .sql file and read the tables one by one and copy them over. I might attempt to write my own importer in the future, but many tutorials keep mentioning pg_dump and pg_restore, and I want fast and convenient.

Why pg_dump

Because it exports the data and writes the instructions for how to restore a database in the most efficient way.

pg_dump already implements most of the tips from the populate article above. Its counterpart pg_restore is then able to use the exported instructions to quickly restore the database in a fraction of the time a normal import takes from, say, HeidiSQL or the psql utility.

The import that previously took 2 hours took only about a minute once I changed to that combination. All of this is just my experience, so I guess results may vary.
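Stripped down to its simplest form, the round trip looks something like this (generic user, database, and file names; the Sail-specific commands I actually ran follow below):

pg_dump -U someUser -d someDatabase -Ft > backup.tar
pg_restore -U someUser -d someDatabase backup.tar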



Exporting and importing your database

Ok, so apart from being an example of using pg_dump and pg_restore, this is also specific to how I did it using Laravel Sail with Postgres (the pgsql image), which is a Docker Compose virtual environment.

I am also using WSL inside Windows 10 with Docker Desktop to run everything, apart from the live data, which is on a standard VPS.


First Dump your Data

You will want to log in to your production server, and if you don't have a specific location where you keep backup files, make a folder for that. Say we make a "mybackups" folder in the home directory of the user you are logged in as; then you can always dump to that location.

The CLI command you will run in your terminal:
pg_dump -U userName -d databaseName -h 127.0.0.1 -Ft > ~/mybackups/myfilename.tar.gz

To break it down: pg_dump is the utility you are using, the -U userName flag is the user that has permission to access the database, -d databaseName is the database you want to export, and -h 127.0.0.1 is the host's IP.

Adding the host IP avoids an issue where the command can silently fail, and it instructs pg_dump to prompt you for the password of the postgres user that has permissions on the database.

The last bit, -Ft > ~/mybackups/myfilename.tar.gz, sets the format (-F) to tar, and I also name the export file appropriately. There are multiple formats available and you should check the docs to see what suits you.

When emitting an archive format like tar, pg_dump ignores flags such as --no-owner or --clean; instead we will instruct pg_restore to apply them.

Run the command and enter the password when prompted; you should then be able to find the file in the location you specified.
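If you want a quick sanity check that the dump is usable (optional, and assuming the same path as above), you can confirm the file exists and list the archive's table of contents with pg_restore before copying it anywhere:

ls -lh ~/mybackups/myfilename.tar.gz
pg_restore --list ~/mybackups/myfilename.tar.gz | head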

Copy the dump over to the PGSQL container

Copy the file from the server to your local machine and place it in a location you prefer. I copied mine via the scp tool inside my WSL to a folder, then moved it to the root of my Laravel install and added that folder to the .gitignore so it won't be committed.

scp username@[remoteIP]:~/mybackupfolder/backupfile /destination/directory/onlocal

Use a period as the destination if you are already in the target directory, to download to your current location.
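For example, if you have already cd'd into the folder that should receive the backup (same placeholder host and paths as above):

scp username@[remoteIP]:~/mybackupfolder/backupfile .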

Then you can optionally move or copy the file into your Laravel root or a folder inside it. It is best to have it in the project because we will need to mount it as a volume for the pgsql container to use.
mv backupFile.tar.gz /laravelProject/backups/backupFile.tar.gz


Accessing the backup inside the PGSQL container

Laravel Sail uses Docker Compose to orchestrate the local environment. To use files inside the containers they need to be accessible, and to make them accessible you need to instruct Docker to mount the data into the container.

Inside your Laravel docker-compose file you can add a volume to the pgsql container instructions.

Below the existing

volumes:
    - 'sailpgsql:/var/lib/postgresql/data'

line, add your local folder location and the destination folder on the container as a new volume:

volumes:
    - 'sailpgsql:/var/lib/postgresql/data'
    - './folderWithBackups:/var/lib/folderOnContainer'
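For context, the block sits nested under the pgsql service, roughly like this (service and volume names can differ slightly between Sail versions):

pgsql:
    # ...image, ports, environment, etc. stay as Sail generated them
    volumes:
        - 'sailpgsql:/var/lib/postgresql/data'
        - './folderWithBackups:/var/lib/folderOnContainer'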

Then spin up your Sail install with sail up -d, and next we will get to accessing the pgsql container.


Using psql and pg_restore inside the PGSQL container with exec

When Sail is up and your site is running, run docker ps -a and copy the ID of the Pgsql container.
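If the list is long, you can narrow it down by name (this assumes Sail's default pgsql service name):

docker ps -a --filter "name=pgsql" --format "table {{.ID}}\t{{.Names}}"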

Now you could run pg_restore with the --clean flag and trust that everything is wiped, but I chose to manually remove and recreate the Sail database just to be sure.

First we will drop the entire database:
docker exec -it [containerID] psql -U sail -d template1 -c "drop database yourDatabaseName;"

You will find the user and database name in your .env file for Sail; those are the credentials it uses when you normally connect. The template1 database being connected to via the -d flag is a default database inside Postgres, because you can't drop a database you are connected to. So we connect to template1 and then drop "yourDatabase".
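If you want to confirm what exists before and after the drop, you can list the databases from that same template1 connection (same placeholders as above):

docker exec -it [containerID] psql -U sail -d template1 -c "\l"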

Then we recreate the database with:
docker exec -it [containerID] psql -U sail -d template1 -c 'create database [yourDatabaseName] with owner sail;'

We create the database with sail as the owner and the same name as in the .env file. This means the permissions should all still be the same for Laravel to operate just fine.


Finally pg_restore

docker exec -it [containerId] pg_restore -U sail -d yourDatabaseName --clean --no-owner -Ft /locationOnContainer/toYourFile.tar.gz

Now you execute pg_restore on the file inside the folder you mounted earlier. The flags are very similar, and in many cases exactly the same, as with pg_dump. The only difference is that when using a -F format of t (.tar, an archive), pg_dump ignores flags such as --clean and --no-owner, so they are passed to pg_restore instead.

I use --clean to tell pg_restore to remove any old data and overwrite it with the new data. Because we dropped the database first that is a non-issue, but you don't need to drop the database beforehand if you use that option.

I use --no-owner because I want my local user to own all the data in the database and not have issues with permissions.

Once you have the workflow down, or maybe have written a shell script to run it for you, restoring a local copy or even a production database should go a hell of a lot quicker.
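If you do go the script route, a minimal sketch of the local restore side could look like this. The container ID, database name, and backup path are placeholders for your own values, and it assumes the same sail user and mounted backup folder as above:

#!/usr/bin/env bash
# Rough sketch of the local restore steps from this post, wrapped in one script.
# CONTAINER, DB_NAME and BACKUP_PATH are placeholders - swap in your own values.
set -euo pipefail

CONTAINER="yourPgsqlContainerId"
DB_NAME="yourDatabaseName"
BACKUP_PATH="/var/lib/folderOnContainer/backupFile.tar.gz"

# Drop and recreate the database (connect to template1, since you can't drop the DB you're connected to)
docker exec -i "$CONTAINER" psql -U sail -d template1 -c "drop database if exists $DB_NAME;"
docker exec -i "$CONTAINER" psql -U sail -d template1 -c "create database $DB_NAME with owner sail;"

# Restore the tar-format dump, stripping ownership so the sail user owns everything
docker exec -i "$CONTAINER" pg_restore -U sail -d "$DB_NAME" --clean --no-owner -Ft "$BACKUP_PATH"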