Using hiveSQL to find content

I was asked recently about using @hiveSQL as a tool to find 'good posts' to curate, and so at the risk of facing the wrath of @arcange (who may appear in the comments to tell me my query is killing the DataBase), here is a sample query with the code explanations.


data.jpg
middle right, lol


First you will need some login details for hiveSQL. Send 0.01 HBD to @hiveSQL and wait for an encrypted memo. I use peakD and Hive Keychain and so there will be a padlock next to the memo to decrypt it.


image.png
Reaz wants dataaa, hiveSQL obliges


image.png
Click the padlock to find out what's inside!

Next you will need a tool to connect to the DataBase. I use Linqpad which is free for what we require: https://www.linqpad.net/Download.aspx

Once it is downloaded and installed, you can use the details sent to you via the encrypted memo to connect.


image.png
Check 'Remember this connection' at the bottom of this window to save some time


Now you are connected it's time to pull some data. But first, a quick run through the buttons you'll need. From left to right:

image.png

Run your query, Stop your query, Rich Text results, and Tabular Results. Also, make sure your Language is set to SQL.

And a note before we start - it is not a good idea to select everything from a table. If your query is running for more than a minute or two, it might be an idea to stop it and have a rethink. It is also not good to use 'LIKE' statements to scan lots of text. I have done that here and but only on a small set of data - there is likely a better way!


Let's start with finding out content created in last 1 hour by author having reputation between 30 to 60 and post having minimum 500 words and reward less than 1$?

This is a fairly basic template, and so I've added some more filters in. The full query is below but I want to break it up to explain various parts.


SELECT
    author, A.reputation_ui,'https://hive.blog/'+ url, 
    left(body,500),
    pending_payout_value, 
    C.created, 
    body_length as CharacterCount

This is what will hopefully be returned by your query. The author, their reputation, the URL of the post (with your choice of condenser appended to the front, the first 500 characters of text (for an idea of the content), when it was created, and the number of characters in the content. A word count is not available and so dividing that total by 4 or 5 should give a rough idea.


FROM 
    Comments C
INNER JOIN Accounts A on C.author=A.name
WHERE

The Comments table is where 'posts' and comments/replies are kept. They are treated the same by the blockchain and SQL database, but are distinguishable in the DB using a couple of methods (one is discussed further below). For the author reputation, a join is made to the Accounts table. This is not so important for today but for those who care, the 'C' and the 'A' are aliases given to these tables when referencing them and the columns they hold.

'WHERE'.... stuff ='s other stuff.


    (
    NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"cn"%') 
    AND NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"ru"%')
    AND NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"kr"%') 
    )

This is used to filter out tags/topics. I don't know these languages and so wish to exclude them from the results. $.tags is a section of C.json_metadata and so wildcard searching, although painful, is less of an issue. There is likely a better way to do this, and I hope someone can chip in.


AND NOT category IN ('porn','tits','arse','fanny')

The category is the first tag and fortunately has its own column. You can use this to exclude tribes ('hive-XXXXX') or other topics you'd rather not see in the results.


AND depth = 0

This ensures that only top level comments (posts to you and I) are returned and it's use will save time and DataBase pain.


AND NOT author IN ('abh12345','add_spammer_here')

You know those authors who you are sick of seeing each day? You can exclude them here.


AND NOT active_votes LIKE('%"spaminator"%')

Another part of the query to make arcange cry, a wildcard search through the voters so far. In this case, if 'spaminator' has given the thumbs down to the content, it wont appear in the results.


AND datediff(minute, C.created,GEUTCTUTCDATE()) < 60*4

This compares the creation date of the post to this moment in time, converts that into seconds and compares against 60*4 - 4 hours. I ran the query with the suggested 1 hour window and 0 results were returned, damn!


AND A.reputation_ui BETWEEN 30 AND 60

Joining to the Accounts table and using 'A' as the alias, we can check the author reputation. In this case, the reputation will need to be between..... 30 and 60 to appear in the results.


AND pending_payout_value > 0.0100
AND pending_payout_value < 3.0000 

The could also be written as a 'BETWEEN' statement and it is to check the current pending payout of the post.


AND body_length  > 4000

The post must be at least 4000 characters in length, which if we use 4.79 characters as the average word length, is about 835 words. Fairly chunky for a post these days.


ORDER BY created desc

And finally, the way to order the results. I have used the date with the newest post first.


And here is the query in full:

SELECT
    author, A.reputation_ui,'https://hive.blog/'+ url, 
    left(body,500),
    pending_payout_value, 
    C.created, 
    body_length as CharacterCount
FROM 
    Comments C
INNER JOIN Accounts A on C.author=A.name
WHERE
    (
    NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"cn"%') 
    AND NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"ru"%')
    AND NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"kr"%') 
    )
AND NOT category IN ('hive','porn')
AND depth = 0
AND NOT author IN ('abh12345','add_spammer_here')
AND NOT active_votes LIKE('%"spaminator"%')
AND datediff(minute, C.created, GETUTCUTCDATE()) < 60*4
AND A.reputation_ui BETWEEN 30 AND 60
AND pending_payout_value > 0.0100
AND pending_payout_value < 3.0000 
AND body_length > 4000
ORDER BY created desc

The query above has taken less than a second to return a set of results which has been between 8 and 16 items. To me, that doesn't sound like a lot of posts in the past 4 hours, but obviously it is due to the criteria chosen and this can of course be changed.

And change you will need to, unless you speak Spanish or Polish. Although I did find this one:

@gubbatv/posts

And I'm not convinced it is her so I sent a short message and will comment on the post if I find out more.

Edit: Confirmed to be the owner of https://gubbatv.com/

Anyway, I hope this helps @sanjeevm, and anyone else who is interested. If you plan to have a go with the above, check the post again later to see if arcange has commented with some nicer script to replace the wildcard 'LIKE' searches.

Cheers,

Ash

H2
H3
H4
3 columns
2 columns
1 column
80 Comments
Ecency