Below is a list of Hive-related programming issues worked on by the BlockTrades team during the last week:
Hived work (blockchain node software)
We launched the first testnet of hived for hardfork 25 last Wednesday and have been experimenting with configuring tinman for it.
Tinman is a testnet management tool for hived that can create accounts and inject various forms of test data into a hived node.
During some of our testing, we found a problem where hived would consume 100% of a CPU core if it was launched without a valid seed node: the blockchain code was stuck in a tight loop polling for transactions and blocks that never show up (because the node has no peers to get data from).
Even more problematically, no reasonable diagnostic information was reported in this situation, so it took a little while for us to identify the real problem. We’re currently working on a fix for this issue that lowers the CPU overhead in this case, and also adds warning messages to identify the problem.
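As a rough illustration of the kind of fix described above (this is not the actual hived code, which is C++), a polling loop can avoid spinning by sleeping with a growing delay whenever nothing arrives, and by logging a warning once when there are no peers. The names `poll_for_items`, `fetch_items`, and `has_peers` are hypothetical and exist only for this sketch.

```python
import logging
import time

log = logging.getLogger("p2p_sketch")


def poll_for_items(fetch_items, has_peers, max_idle_polls,
                   base_delay=0.01, max_delay=1.0, sleep=time.sleep):
    """Poll for blocks/transactions without spinning when none arrive.

    fetch_items and has_peers are hypothetical callables standing in for
    the node's networking layer. The loop stops after max_idle_polls
    consecutive empty polls so that the sketch terminates.
    """
    delay = base_delay
    warned = False
    idle = 0
    received = []
    while idle < max_idle_polls:
        items = fetch_items()
        if items:
            received.extend(items)
            # Data arrived: go back to fast polling and re-arm the warning.
            delay = base_delay
            idle = 0
            warned = False
            continue
        idle += 1
        if not has_peers() and not warned:
            # Report the likely cause once instead of failing silently.
            log.warning("no connected peers; check the seed node configuration")
            warned = True
        # Sleep instead of spinning, doubling the delay up to max_delay.
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return received
```

The important part is that an empty poll always costs a sleep, so a node with no peers sits mostly idle rather than pinning a core, and the warning points directly at the configuration problem.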
We found a problem with beem (the Python-based API library for Hive) used for running API tests on hived. @howo provided a quick patch and we’re working on a long-term fix now (should be fixed tomorrow).
@howo also made updates to the recurrent transfers function based on specification change requests by our devs, and we are planning a final review of those changes tomorrow, with the plan to merge it into the develop branch for deployment in the next testnet launch.
Jussi caching optimization
Over the weekend, I observed that our hivemind was loaded much more heavily than normal (this was initially noticeable on hive.blog where it was taking longer to open a post to read).
Eventually I found that our node was getting hit with a dramatic increase in the number of bridge.get_discussion API calls (these calls are made when you navigate to a user's post on a site like peakd or hive.blog) and this led to a CPU bottleneck in the Python code that processes API calls.
It turns out there's a new site that allows for browsing Hive posts, but unfortunately it is currently coded to make this call every second or so (I suspect to detect new comments on the post) and this resulted in a large increase in the number of API calls our hivemind had to process.
The ultimate solution was fairly simple: we reconfigured our Jussi process to cache results of this call for 1 second (previously we had no caching configured for this call).
All API calls first go to Jussi before being passed to hivemind, and if Jussi has cached the answer to a previous API call, then it can just return the previous answer instead of asking hivemind again.
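To make the caching idea concrete, here is a minimal sketch of a time-to-live cache sitting in front of a backend call. It is not Jussi's implementation or configuration format; `TTLCache`, `cache_key`, and the example parameters are assumptions made up for this illustration.

```python
import json
import time


def cache_key(method, params):
    """Build a stable key from an API method name and its parameters."""
    return method + ":" + json.dumps(params, sort_keys=True)


class TTLCache:
    """Return a stored answer while it is younger than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (time stored, value)

    def get_or_call(self, key, backend_call):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh enough: answer from the cache, backend is not touched.
            return entry[1]
        # Missing or expired: ask the backend and remember the answer.
        value = backend_call()
        self._entries[key] = (now, value)
        return value
```

With a 1 second TTL, a client that repeats the same call every second or so still gets near-current data, but many clients asking about the same post within that second cost the backend only one call.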
Up till now, we haven't enabled a lot of caching on api.hive.blog other than a few basic things so that we could identify which calls are expensive for hivemind to process, and then optimize the processing of those calls.
But now that optimization work is mostly complete, so we'll be taking a look soon at ways to optimize our Jussi caching to reduce load on hivemind and hived, as this should allow for a substantial scaling increase for our node.
Hivemind (2nd layer applications + social media middleware)
We isolated the cause of the memory leak in hivemind: it appears to be a case where Python isn’t releasing memory from dictionaries when the data in the dictionary is cleared. We’ve added code to do a “deep clean” of those dictionaries. We'll likely have performance measurements late next week on how much memory this saves at the current block height.
As a side note, I believe current nodes can recover this memory prior to obtaining our fix by stopping and restarting their hivemind instance.
We’ve also been making some changes to hivemind tests based on the change in the way community muting is being implemented, and those changes will probably be merged into the develop branch tomorrow.
We’ve been running full hivemind syncs on several systems to benchmark performance under different hivemind configurations. We found a slowdown on one machine running with one new hivemind command-line option and we’re trying to analyze if it is due to the use of the new command-line option or some other configuration issue on that system (e.g. hardware or database configuration). At this point, we’re trying various experiments to isolate the root problem on that system.
Modular hivemind (framework for hive applications)
This week, we completed the first-pass work on the fork resolution code for modular hivemind using a fully SQL-based solution relying on shadow tables that save changes that need to be undone in the case of a fork switch.
We’re currently creating an architecture document to describe how the fork resolution code works and how to use it to create a Hive-based application.
Updating hived testnet
We’re planning to launch an updated testnet on Thursday with the latest fixes to hived discussed above.
That will be followed by a launch of an API node configured to draw data from the testnet (probably on Friday, if all goes well).
This API node will allow Hive applications to begin adding code changes to support new features added by the hardfork, such as vote expiration reporting.
New hivemind release planned soon
In the next week we’ll continue testing the latest changes to hivemind: first more performance testing of hivemind sync, then real-world performance testing on api.hive.blog.
If all that testing goes well, we may be able to release a new version of hivemind to API node operators by the end of next week. This release would contain the various bug fixes and performance optimizations reported in previous posts.
SQL Account history plugin for hived (requirement for modular hivemind)
We planned to get to this task last week, but unfortunately it was delayed by other tasks. We hope to complete the changes to the SQL account history plugin that pushes data to postgres (eliminating the need for hivemind to pull the data via RPC calls) in the next couple of days.
After the plugin changes are completed, we’ll run a simultaneous replay of hived with a full sync of hivemind to measure the speedup, and see if it matches our expected improvement in hivemind full sync time (we expect 2 days versus the current 4 days required).
Anticipated hardfork date
I believe we still need to do more testnet-based testing before notifying exchanges of the new version of the code, and since we want to give them approximately 30 days to update after we have a well-tested code release, I think we're still at least 1.5 months out from a possible hardfork date.