Hive Pressure #3: Catching up with the head block.

A basic Hive node has a very simple configuration, and with minor changes it can serve as a seed node, a witness node, a broadcaster node, a private node for your wallet (which is what exchanges use), or even a simple API node for your Hive microservices.

Regardless of its role, as long as a node has unrestricted network access, it will be part of the Hive p2p network, thus supporting Hive reliability and resilience.

Before your node becomes fully functional, it has to reach the head block of the blockchain.

Get the Hive daemon
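
If you don't have the hived binary yet, one common way to get it is to build it from source. This is only a rough sketch assuming a typical out-of-source CMake build; check the repository's README for the current branch, dependencies, and build options.

# Rough sketch: build hived from source (verify against the repository's README).
git clone --recurse-submodules https://github.com/openhive-network/hive.git
cd hive
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)
# The resulting binary typically ends up under programs/hived/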

Get the blocks

The easy way or the fast way.

  • Sync from the p2p network
    By default, when a fresh Hive node starts, it connects to the Hive p2p network and retrieves blocks from it.
    See: --resync-blockchain

  • Get blocks yourself
    A Hive node can use an existing block_log, either from another instance or from a public source such as https://gtg.openhive.network/get/blockchain
    Our goal is to reach the head block as soon as possible, so that's the way we'll go (see the download example below).
    The block_log currently takes over 350GB, so depending on your connection and source, downloading it can take anywhere from under an hour to half a day (on 1Gbps and 100Mbps links respectively).
    By default it's expected to be located at ~/.hived/blockchain/block_log.
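
For example, you could fetch the block_log straight into its default location. The exact file URL below is an assumption based on the public source mentioned above; adjust it to whatever your source actually serves.

# Fetch an existing block_log into the default location.
# Resuming (-c) helps if the transfer gets interrupted.
mkdir -p ~/.hived/blockchain
wget -c -O ~/.hived/blockchain/block_log \
  https://gtg.openhive.network/get/blockchain/block_log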

Configure your node

Configuration settings are by default in ~/.hived/config.ini
This should be enough:

plugin = witness
plugin = rc

shared-file-dir = "/run/hive"
shared-file-size = 24G

flush-state-interval = 0

Please note that I'm using a custom location for the shared_memory.bin file, keeping it on a tmpfs volume for maximum performance. Make sure you have enough space there if you are going to do the same.
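
If /run/hive doesn't exist yet, you can set it up yourself. A minimal sketch, assuming a Linux box with enough free RAM; the 25G size is just an example leaving a bit of headroom over the 24G shared file:

# Create a dedicated tmpfs mount for the shared memory file.
# Size it above shared-file-size (24G in the config above) to leave headroom.
sudo mkdir -p /run/hive
sudo mount -t tmpfs -o size=25G tmpfs /run/hive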

Process the blocks

Having all the blocks is not enough; your node also needs to be aware of the current state of Hive.
Live nodes get blocks from the p2p network and process them, updating state one block at a time (a new block arrives every three seconds), but when you start from scratch, you have to catch up.

  • Snapshot
    A snapshot is the fastest way because most of the work is already done.
    That, however, only works for compatible configurations.
    We will play with snapshots another time.

  • Replay
    Once you have the block_log and config.ini files in place, you need to start hived with --replay-blockchain.
    Replay uses the existing block_log to build the shared memory file up to the highest block stored there, and then continues with sync up to the head block.
    There's very little use of multi-threading here because every block depends on the previous one.
    A lot of data is being processed, so your hardware specs really do matter here.
    Not long ago Hive crossed the 55 million block mark.
    Let's see how long it takes to replay that many blocks on different hardware.

hived --force-replay --set-benchmark-interval 100000

Test Setups

Alpha

A popular workstation setup. Good enough but will run out of storage soon.

Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
64GB RAM (DDR4 4x16GB 2133MHz)
2x256GB SSD in RAID0 (SAMSUNG MZ7LN256HMJP)

Bravo

Old but not obsolete. CPU released in 2014. New disks after the old ones failed.

Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
32GB RAM (DDR3 4x8GB 1600MHz)
2x480GB SSD in RAID0 (KINGSTON SEDC500M480G)

Charlie

The newest and the most expensive CPU in my list. Also the only AMD.

AMD Ryzen 5 3600
64GB RAM (DDR4 4x16GB 2666MHz)
2x512GB NVMe in RAID0 (SAMSUNG MZVLB512HBJQ)

Delta

My favorite: high-quality components for serious tasks.

Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz
64GB RAM (DDR4 4x16GB 2666MHz ECC)
2x512GB NVMe in RAID0 (WD CL SN720)

Warning: spoilers ahead

What do you think? Which one will win the race?

Results

Server  | Time [s] | H:M:S
Alpha   |    28120 | 7h48m40s
Bravo   |    26280 | 7h18m00s
Charlie |    25032 | 6h57m12s
Delta   |    23314 | 6h28m34s
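
For a rough sense of throughput: Delta's 23314 seconds for roughly 55 million blocks works out to about 2,350 blocks processed per second on average, while Alpha manages roughly 1,950.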

What are your --replay times?
