Hadoop MapReduce brainstorming: Bitshares

Why? I'm currently studying for an upcoming BigData exam which largely focuses on Map Reduce.

Market maker incentives

Given the transaction history for all holders of a selected asset (a large amount of data dumped to disk in a NoSQL manner), summarize each market maker's participation (the sum of buys and sells in a time period) for use during a manual sharedrop (market maker incentives).

The final output will be a text file containing rows of 'username final_trading_value percent_total_market_activity'.

We could then perform a sharedrop which takes percent_total_market_activity and the user's current_asset_holdings into account when performing the final distribution (so as to incentivize holding), or simply the activity so that users that no longer hold said asset but performed market maker activities are still rewarded.
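The two distribution options above could be sketched roughly as follows. This is only an illustration: the function name, the `holding_weight` knob and the linear blend of the two shares are my own assumptions, not a settled design.

```python
def sharedrop_allocations(activity_pct, holdings, total_drop, holding_weight=0.5):
    """Split total_drop between the users listed in activity_pct.

    activity_pct:   {username: percent_total_market_activity}
    holdings:       {username: current_asset_holdings}
    holding_weight: hypothetical knob; 0 rewards activity only,
                    larger values reward current holders more.
    """
    total_activity = sum(activity_pct.values())
    total_holdings = sum(holdings.get(user, 0) for user in activity_pct)
    allocations = {}
    for user, pct in activity_pct.items():
        activity_share = pct / total_activity if total_activity else 0.0
        # Users who no longer hold the asset simply get a holding share of 0.
        holding_share = holdings.get(user, 0) / total_holdings if total_holdings else 0.0
        share = (1 - holding_weight) * activity_share + holding_weight * holding_share
        allocations[user] = total_drop * share
    return allocations
```

With `holding_weight=0` a user who sold everything is still rewarded purely on activity; with a non-zero weight the same user receives less than a holder with equal activity.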

Acknowledged proposed MR design limitations

Chained MR jobs are used instead of a single MR job. A partitioner/sorter could potentially be introduced to merge the two jobs into one.
The current plan is to treat buyers and sellers identically; however, we could include a user input for selecting a strategy (incentivize buyers only, sellers only, or both).
Not production MR code, simply a pseudocode plan.
If a 'market-maker' does not hold the

User input variables

Timestamp range for market history
Trading pair - bitUSD:BTS
Reference asset - bitUSD.

Pre-MR Data steps

Note: This will require interacting with the full_node client (with the History_API plugin enabled) through websockets as the cli_wallet has insufficient commands available to it & I don't believe you can authenticate over http remote procedure calls.

Example websocket commands:

Documentation: Github wiki docs, Bitshares.eu wiki docs

Note: The API docs do not include example output, so you'll need to run the commands yourself to see their full output.

Login

> {"id":2,"method":"call","params":[1,"login",["",""]]}
< {"id":2,"jsonrpc":"2.0","result":true}

Get asset holder count

> {"id":1, "method":"call", "params":[2,"get_asset_holders_count",["1.3.0"]]}
< {"id":1,"jsonrpc":"2.0","result":24085}

Get asset holder accounts & asset holdings (10 instead of 24085 for a simple example)

Acquire list of asset holders -> Output to text file 'asset_holders.json'

> {"id":1, "method":"call", "params":[2,"get_asset_holders",["1.3.0", 0, 10]]}
< {"id":1,"jsonrpc":"2.0","result":[
{"name":"poloniexcoldstorage","account_id":"1.2.24484","amount":"29000120286608"},
{"name":"chbts","account_id":"1.2.224015","amount":"21323905140061"},
{"name":"yunbi-cold-wallet","account_id":"1.2.29211","amount":"14000000042195"},
{"name":"bitcc-bts-cold","account_id":"1.2.152313","amount":"10943523959939"},
{"name":"yunbi-granary","account_id":"1.2.170580","amount":"10000000048617"},
{"name":"jubi-bts","account_id":"1.2.253310","amount":"6992157568429"},
{"name":"bittrex-deposit","account_id":"1.2.22583","amount":"6843227690310"},
{"name":"btschbtc","account_id":"1.2.224081","amount":"5000098977059"},
{"name":"bterdeposit","account_id":"1.2.9173","amount":"2195728656599"},
{"name":"aloha","account_id":"1.2.10946","amount":"2061578333527"}]}
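The example above fetches only 10 holders. To fetch all of them, the same call would be repeated with an increasing start offset. Below is a minimal sketch that only builds the request strings. It assumes the API id `2` from the example above, and that the node accepts the chosen `page_size` (I have not verified the maximum, so adjust it as needed). Sending the requests over the websocket is left out.

```python
import json

def holder_page_requests(asset_id, holder_count, page_size=100, api_id=2):
    """Yield JSON-RPC request strings that page through get_asset_holders.

    holder_count is the result of get_asset_holders_count.
    page_size and api_id are assumptions; match them to your node.
    """
    offsets = range(0, holder_count, page_size)
    for request_id, start in enumerate(offsets, start=1):
        yield json.dumps({
            "id": request_id,
            "method": "call",
            "params": [api_id, "get_asset_holders", [asset_id, start, page_size]],
        })
```

Each response's `result` list could then be appended to 'asset_holders.json'.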

Dump each asset holder's transaction history to a JSON file on disk

Note: This stage doesn't require websockets & can be performed using the web rpc.

curl --data '{"jsonrpc": "2.0", "method": "get_account_history", "params": ["customminer", "1000"], "id": 1}' http://127.0.0.1:8092/rpc > customminer_account_history.json

Finally merge the files

Merge the many JSON files into one large JSON file containing the asset holders' transaction history (potentially using the Unix jq program).
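If jq turns out to be awkward for this, the merge could also be done in a few lines of Python. This sketch assumes every dump follows the '<account>_account_history.json' naming used in the curl example, and that keying the merged file by account name is acceptable. Both are my assumptions.

```python
import json
from pathlib import Path

def merge_histories(input_dir, output_path, suffix="_account_history.json"):
    """Merge per-account history dumps into one JSON object keyed by account name.

    Returns the number of account files merged.
    """
    merged = {}
    for path in sorted(Path(input_dir).glob("*" + suffix)):
        account = path.name[: -len(suffix)]  # e.g. 'customminer'
        with path.open() as handle:
            merged[account] = json.load(handle)
    with open(output_path, "w") as handle:
        json.dump(merged, handle)
    return len(merged)
```

Note that this loads everything into memory, so for a genuinely large dump a line-per-record output format would be a better fit for Hadoop.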

Websocket clients

I've been looking into this and I don't believe you can automate wscat or dump its command output to disk, so a simple bash script is out of the question. I've narrowed down my preference to either Haskell or NodeJS.

Haskell: wuss
NodeJS: ws
Python: python-bitshares?
Other: https://github.com/joewalnes/websocketd (Maybe server only?)

Map Phase 1

Import user-variable: Timestamp range
- Disregard any transaction outwith timestamp range!
Filter each entry in the transaction history JSON file for: Filled Order (FO)
For each parsed FO:
- Extract trade participants (buyer & seller)
- Extract trade amount
- Counter: Overall_traded_currency
Produce key: User1_User2 //buyer_seller
Produce value: User1TradeAmount_User2TradeAmount
End Map phase, outputting the key and value pair towards the reducer.
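The map phase above could be sketched in Python as below. Since the API docs lack example output, the record layout here (`type`, `timestamp`, `buyer`, `seller`, `buy_amount`, `sell_amount`) is entirely hypothetical, and I'm assuming both amounts are already expressed in the reference asset. The counter is incremented by both sides of each trade so that the per-user percentages later sum to 100.

```python
from datetime import datetime

def map_phase_1(records, start, end):
    """Return ([(key, value), ...], overall_traded) for filled orders in range.

    records use a hypothetical, simplified layout; adapt the field
    names to whatever the real account history dump contains.
    """
    pairs = []
    overall_traded = 0  # stands in for the Overall_traded_currency counter
    for record in records:
        if record.get("type") != "fill_order":
            continue  # only Filled Orders (FO) are of interest
        when = datetime.fromisoformat(record["timestamp"])
        if not (start <= when <= end):
            continue  # disregard transactions outwith the timestamp range
        key = f"{record['buyer']}_{record['seller']}"
        value = f"{record['buy_amount']}_{record['sell_amount']}"
        pairs.append((key, value))
        overall_traded += record["buy_amount"] + record["sell_amount"]
    return pairs, overall_traded
```

The 'User1_User2' key format only works if usernames never contain an underscore; a different separator would be needed otherwise.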

Reduce Phase 1

Note: Within the 1st reduce phase, each reducer call sees every trade keyed user1_user2 and nothing else, not even the reverse pairing user2_user1. Splitting this into per-user totals will require a second reduce job or changing the logic of the first MR task.

Input key & value pair from the mapper.
Output to text file 'participants.txt'
- User1:buy_amount
- User2:sell_amount
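A rough Python sketch of this reducer follows. The grouping dictionary stands in for Hadoop's shuffle, and the key/value string formats match the hypothetical 'User1_User2' and 'User1TradeAmount_User2TradeAmount' layout from Map Phase 1. Amounts are assumed to be integers.

```python
from collections import defaultdict

def reduce_phase_1(pairs):
    """Turn (buyer_seller, buyAmount_sellAmount) pairs into 'user:amount' rows."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)  # stands in for the shuffle/group step
    lines = []
    for key in sorted(grouped):
        buyer, seller = key.split("_", 1)
        bought = sold = 0
        for value in grouped[key]:
            buy_amount, sell_amount = value.split("_", 1)
            bought += int(buy_amount)
            sold += int(sell_amount)
        lines.append(f"{buyer}:{bought}")
        lines.append(f"{seller}:{sold}")
    return lines
```

The same user can still appear on several rows (once per trading partner and direction), which is why the second MR job is needed.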

Map phase 2

For each row in 'participants.txt' file:
- Split row on ':'
- Key = Username
- Value = buy/sell amount
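As a sketch, the second mapper is little more than a line parser. It assumes the 'username:amount' rows written by the first reducer, with integer amounts.

```python
def map_phase_2(lines):
    """Yield (username, amount) for each 'username:amount' row."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        username, amount = line.split(":", 1)
        yield username, int(amount)
```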

Partitioner

Partition on the key (username) so that identical username <k,v> pairs are sent to the same reducer; each reducer then receives its keys in sorted order.
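As far as I understand it, Hadoop's default hash partitioner already routes identical keys to the same reducer, so a custom partitioner may not be needed at all. The idea can be sketched as below, using CRC32 as a stand-in for the key hash (my choice for the sketch, since Python's built-in `hash()` for strings changes between runs).

```python
import zlib

def partition(key, num_reducers):
    """Return the reducer index for a key; identical keys always map together."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers
```

Every <k,v> pair for a given username therefore lands on one reducer, regardless of which mapper emitted it.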

Reduce phase 2

Sum the buy & sell values (amount, not trading value) for each user.
- Divide the total by the 'Overall_traded_currency' counter used during MR phase 1 and multiply by 100. This gives us a % of total market activity for a user.
- Output to text file 'end_data.txt'
  - username summed_trading_value percent_total_market_activity
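The final reducer could look roughly like this in Python. It assumes `overall_traded` is the counter value carried over from MR phase 1, and that the counter was incremented for both sides of each trade so the percentages sum to 100. The four-decimal formatting is an arbitrary choice.

```python
from collections import defaultdict

def reduce_phase_2(pairs, overall_traded):
    """Sum amounts per user and return 'username total percent' rows."""
    totals = defaultdict(int)
    for username, amount in pairs:
        totals[username] += amount
    rows = []
    for username in sorted(totals):
        # Guard against an empty time range, where nothing was traded.
        percent = 100.0 * totals[username] / overall_traded if overall_traded else 0.0
        rows.append(f"{username} {totals[username]} {percent:.4f}")
    return rows
```

Writing the returned rows to 'end_data.txt', one per line, gives the file described above.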

Any input? Please do comment below!

Do you have an idea for a Map Reduce program for Bitshares or Steemit? I'd love to hear about it!

An alternative to getting the list of asset holders and their individual full account history would be to dump the transactions from each block within a time range input by the user, if possible.

Best regards,
@cm-steem
