@Checky 0.2.1 - Highlighted Differences, Better Mention Detection and More!

Click here to get redirected to the GitHub repository of this project.

It's been a while since @checky's last update post. These past two weeks, I've been focusing exclusively on @checky, fixing a few bugs here and there and adding some features that I thought might be useful to some of you. I also put some effort into improving @checky's performance in terms of speed and memory usage. This is probably the last update post for @checky, since there isn't much left to do, as you will see in the "What's coming next ?" section of this post. Before we look at the changes made to the bot, I want to once again thank you all for the support you gave @checky these past two weeks, whether it was by upvoting, commenting or simply choosing not to flag its comments when it failed at doing its job. Finally, I want to give special thanks to @mineopoly (Absolutely Nothing to Post (Comedy open Mic 33)) and @eoj (@checky Tribute Post) for making me laugh by talking about @checky in their posts! OK, now I'm done, let's get started!

Highlighted differences

When @checky made suggestions for lengthy mentions, it was sometimes hard to see at first glance what the differences between your mentions and the mentions suggested by @checky were. It should now be much easier thanks to difference highlighting. Any added, replaced or transposed character is written in bold in the suggested mentions so that you can instantly see the differences and correct them faster than ever before. Deleted characters can't be bold though, so a solution for that remains to be added to @checky. I have two ideas to show that a character has been deleted. The first one would be to highlight, in the suggested mention, the two characters that surrounded the deleted one (for @rageapeanut, the suggestion @ragepeanut would have its "e" and "p" in bold). The second one would be to highlight the deleted character in the supposedly wrong mention (the extra "a" of @rageapeanut would be in bold next to the suggestion @ragepeanut). If you have a preference, make sure to tell me in the comments so I can see what most steemians prefer. (commit)
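As a rough illustration of the idea (not @checky's actual code), the changed part of a suggestion can be found by stripping the prefix and suffix it shares with the wrong mention and wrapping what remains in bold markup. The function name `highlightDifferences` and the `**` markup are assumptions made for this sketch.

```javascript
// Hypothetical helper, not taken from @checky's source.
// Wraps the part of `suggestion` that differs from `wrong` in ** markers.
function highlightDifferences(wrong, suggestion) {
  // Length of the prefix shared by both mentions.
  let prefix = 0;
  while (
    prefix < wrong.length &&
    prefix < suggestion.length &&
    wrong[prefix] === suggestion[prefix]
  ) {
    prefix++;
  }

  // Length of the shared suffix, never overlapping the shared prefix.
  const maxSuffix = Math.min(wrong.length, suggestion.length) - prefix;
  let suffix = 0;
  while (
    suffix < maxSuffix &&
    wrong[wrong.length - 1 - suffix] === suggestion[suggestion.length - 1 - suffix]
  ) {
    suffix++;
  }

  const end = suggestion.length - suffix;
  const changed = suggestion.slice(prefix, end);
  // A pure deletion leaves nothing in the suggestion to put in bold.
  if (changed === '') return suggestion;
  return suggestion.slice(0, prefix) + '**' + changed + '**' + suggestion.slice(end);
}
```

With this sketch, a replaced, added or transposed character ends up between the markers, while the deletion case (@rageapeanut to @ragepeanut) comes back unchanged, which is exactly the gap described above.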

Improved mention detection

Since version 0.1.0 of this bot, special cases that weren't thought of when coding @checky's mention detection have started appearing. The one I should have thought of from the very beginning is mentions detected in code sections. It has been fixed by simply ignoring any mention found in code sections, and the same applies to quote blocks since the text in them isn't supposed to come from the author (commit 1 - commit 2). Mentions were also detected by the bot in image names (commit) and in links when they were preceded by a hashtag. This has also been fixed. One particular case that I couldn't possibly have thought of when coding @checky was supposedly wrong mentions being found in the titles of referenced posts. It has been fixed but may sometimes require the bot to make an additional API call (commit). Up until three days ago, @checky's mention detection was case insensitive (meaning it made no distinction between @checky and @ChECky). This was the case because I had realized that a few steemians liked to capitalize parts of the usernames of other steemians they mentioned. However, it ended up being more of a problem than anything because most of the capitalized mentions @checky found were not supposed to be Steem usernames, which is why I decided to make the mention detection case sensitive. Finally, a new app that you may have already heard of was released a few weeks ago: Share2Steem (which aims to do the opposite of Tweem). While I have no problem whatsoever with this app, it being a crossposting app means that mentions in its posts are most likely not Steem usernames. This is why @checky will ignore any post made through this service from now on (commit).
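The rules above (skip code sections and quote blocks, match case sensitively) could be sketched roughly as follows. This is an assumption-laden illustration rather than @checky's real detection code: the function name `findMentions` is made up, the username pattern is a simplified approximation of Steem's naming rules, and the image, link and referenced-post cases are left out.

```javascript
// Hypothetical sketch of mention detection; not @checky's actual implementation.
function findMentions(body) {
  const text = body
    .replace(/`{3}[\s\S]*?`{3}/g, ' ') // drop fenced code sections
    .replace(/`[^`\n]*`/g, ' ')        // drop inline code
    .replace(/^[ \t]*>.*$/gm, ' ');    // drop quote block lines

  // Case sensitive on purpose: "@ChECky" is not treated as a mention.
  // Simplified username shape: 3 to 16 characters, starts with a letter.
  const pattern = /(?<![\w/])@([a-z][a-z0-9.-]{1,14}[a-z0-9])(?![\w-])/g;

  const mentions = new Set();
  let match;
  while ((match = pattern.exec(text)) !== null) {
    mentions.add(match[1]);
  }
  return [...mentions];
}
```

Removing the ignored regions before matching keeps the mention pattern itself simple, at the cost of a few extra passes over the post body.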

Get some details on the mentions detected by @checky

Perhaps you already got a reply from @checky and couldn't find where the supposedly wrong mention was in your post. It can quickly become frustrating to know that you made a typo somewhere in your post without knowing exactly where. That's where the !where command comes into play. By simply replying !where to @checky's suggestion comment, you will be able to see the 30 words surrounding each supposedly wrong mention of the post. Be aware though that this command is still in its infancy, so you may end up getting some weird results, mostly markup getting cut in half. (commit 1 - commit 2 - commit 3)
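To give an idea of what such a command has to do, here is a minimal sketch that extracts the words around each occurrence of a mention. It assumes that "30 words surrounding" means 15 words on each side, and the name `whereIs` is hypothetical. Like the real command, it splits on whitespace only, so markup can get cut in half.

```javascript
// Hypothetical sketch of a !where lookup; not @checky's actual code.
// Returns one excerpt per occurrence of the mention, with `radius`
// words kept on each side (15 + 15 = the 30 surrounding words).
function whereIs(body, mention, radius = 15) {
  const words = body.split(/\s+/).filter(Boolean);
  const excerpts = [];
  words.forEach((word, index) => {
    if (word.includes('@' + mention)) {
      const start = Math.max(0, index - radius);
      const end = Math.min(words.length, index + radius + 1);
      excerpts.push(words.slice(start, end).join(' '));
    }
  });
  return excerpts;
}
```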

Suggested tags

After running this bot for a while, it has become apparent that more steemians than I had expected confuse the "@" character with the "#" character when talking about tags. In order to make @checky helpful to those users, I had to add a few more checks. If no suggested mention can be generated for a supposedly wrong mention, the bot now tries to see if the mention was actually supposed to be a tag. To do that, it first looks for the supposedly wrong mention in the post's tags and then, if it doesn't find it there, in the 1000 trendiest tags. If it still hasn't found it after all of that, it looks at the tags previously used by the author of the post (even though the API call for that seems to be deprecated). (commit 1 - commit 2)
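The lookup order described above can be sketched as a simple chain of fallbacks. In this illustration the three tag lists are passed in as plain arrays; in practice the trending tags and the author's tags would come from API calls. The function name `suggestTag` and the shape of the returned object are assumptions.

```javascript
// Hypothetical sketch of the tag fallback order; not @checky's actual code.
function suggestTag(mention, postTags, trendingTags, authorTags) {
  // Checked in order, from the most to the least specific source.
  const sources = [
    { name: 'post', tags: postTags },
    { name: 'trending', tags: trendingTags },
    { name: 'author', tags: authorTags }
  ];
  for (const { name, tags } of sources) {
    if (tags.includes(mention)) {
      return { tag: '#' + mention, source: name };
    }
  }
  return null; // the mention doesn't look like a tag either
}
```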

Speed, speed, speed

A major change has been made to @checky's code in order to boost its performance in terms of speed and memory usage: the use of JavaScript sets instead of arrays to store all the edits of supposedly wrong mentions. The old process would create an array for each type of edit (deletes, inserts, transposes and replaces) and merge them into one massive array from which correct mentions would be extracted. The new process only requires one set that it passes through the four edit functions. Since sets only hold unique values, there are fewer elements at the end of the edits in a set than in an array. Indeed, some combinations of edits give the same result (for example, a transpose can be equal to a replace in some situations). In order to reduce the size of the set even more, usernames generated by second generation edits need to be validated as correctly structured usernames before being added to the set. The use of a set instead of arrays may not make a big difference for short usernames, but it made a huge difference for long usernames, which went from getting fully processed in a few minutes with arrays to at most 20 seconds with sets. (commit)
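A minimal sketch of the set-based approach could look like the following. It is written in the spirit of Peter Norvig's spelling corrector and is not @checky's actual code: the function names, the alphabet and the username validation (a simplified approximation of Steem's naming rules) are all assumptions.

```javascript
// Hypothetical sketch; names and validation rules are assumptions.
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789.-';

// Simplified structure check: 3 to 16 characters, dot-separated segments
// that start with a letter and end with a letter or a digit.
function isValidUsername(name) {
  if (name.length < 3 || name.length > 16) return false;
  return /^[a-z][a-z0-9-]+[a-z0-9](\.[a-z][a-z0-9-]+[a-z0-9])*$/.test(name);
}

function deletes(word, edits) {
  for (let i = 0; i < word.length; i++) {
    edits.add(word.slice(0, i) + word.slice(i + 1));
  }
}

function transposes(word, edits) {
  for (let i = 0; i < word.length - 1; i++) {
    edits.add(word.slice(0, i) + word[i + 1] + word[i] + word.slice(i + 2));
  }
}

function replaces(word, edits) {
  for (let i = 0; i < word.length; i++) {
    for (const c of ALPHABET) {
      edits.add(word.slice(0, i) + c + word.slice(i + 1));
    }
  }
}

function inserts(word, edits) {
  for (let i = 0; i <= word.length; i++) {
    for (const c of ALPHABET) {
      edits.add(word.slice(0, i) + c + word.slice(i));
    }
  }
}

// One set goes through the four edit functions; duplicates vanish on add.
function editsOne(word, edits = new Set()) {
  deletes(word, edits);
  transposes(word, edits);
  replaces(word, edits);
  inserts(word, edits);
  return edits;
}

// Second generation edits are only kept if they look like usernames.
function editsTwo(word) {
  const firstGeneration = editsOne(word);
  const candidates = new Set(firstGeneration);
  for (const edit of firstGeneration) {
    for (const candidate of editsOne(edit)) {
      if (isValidUsername(candidate)) candidates.add(candidate);
    }
  }
  return candidates;
}
```

For a six-character mention, the four loops above perform 505 `add` calls in the first generation, but the resulting set is smaller because, for instance, replacing a character with itself yields the original word every time.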

Better stability

A few days ago, I came across a post on #steemdev from @fullnodeupdate that intrigued me. @fullnodeupdate is a service/bot created by @holger80 that updates a list of fully working nodes every 3 hours in its account json_metadata. This is great for stability since I have no guarantee that the nodes I'm using for @checky will keep working forever. I'm watching @checky closely for now, but there will come a time when I will have to keep it running without looking at its activity every hour to make sure everything is working as expected. In preparation for that moment, I have changed the code to use @fullnodeupdate as a source of nodes and removed the node-related properties from the config.json file. Just to be extra safe, a fail-safe node (the Steemit one) has been added to the config file. (commit 1 - commit 2)
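Here is a hedged sketch of how a node list could be built from such metadata while always keeping a fail-safe node. It assumes the account's json_metadata is a JSON string with a `nodes` array of URLs, which should be checked against what @fullnodeupdate actually publishes. The function name `selectNodes` is hypothetical, and fetching the account itself is left out.

```javascript
// Hypothetical sketch; the `nodes` key is an assumption about the
// structure of @fullnodeupdate's json_metadata.
function selectNodes(jsonMetadata, failSafeNode) {
  let nodes = [];
  try {
    const metadata = JSON.parse(jsonMetadata);
    if (Array.isArray(metadata.nodes)) {
      // Keep only entries that look like HTTPS endpoints.
      nodes = metadata.nodes.filter(
        node => typeof node === 'string' && node.startsWith('https://')
      );
    }
  } catch (err) {
    // Malformed or unexpected metadata: fall back to the fail-safe node only.
  }
  if (!nodes.includes(failSafeNode)) nodes.push(failSafeNode);
  return nodes;
}
```

Whatever the metadata contains, the returned list is never empty, so the bot always has at least one node to try.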

Other changes

Improved the way the bot finds suggestions. It now checks the generated usernames against the Steem API after having checked them against the already encountered usernames. (commit)
The bot now checks for the existence of supposedly wrong mentions appended by "the". (commit)
A bunch of somewhat useless operation checks have been removed. (commit)
Most of the Lodash functions used by the bot have been replaced by more fitting functions written in utils/helper.js. (commit)
A test_environment setting and a log_error setting have been added to config.json to avoid committing commented code in the future. (commit 1 - commit 2)
A very small risk of concurrent file access on data/users.json has been fixed. (commit)
Started using steno to make use of atomic file writing (avoiding corrupted files if the bot unexpectedly crashes). (commit)
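To illustrate what atomic file writing protects against, here is a small sketch using only Node's built-in modules. It is not how steno is implemented or used; it only shows the general write-then-rename idea, and the file name is made up for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write the data to a temporary file first, then rename it over the
// target. A rename within the same directory is generally atomic, so a
// crash mid-write leaves the previous version of the file untouched
// instead of a half-written one.
function writeFileAtomic(file, data) {
  const temporaryFile = file + '.tmp';
  fs.writeFileSync(temporaryFile, data);
  fs.renameSync(temporaryFile, file);
}

// Example usage with a throwaway file in the system's temp directory.
const usersFile = path.join(os.tmpdir(), 'checky-users-example.json');
writeFileAtomic(usersFile, JSON.stringify({ ragepeanut: true }));
```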

What's coming next ?

The only thing I'm sure of is that I will add a way to delete @checky's comments, something like automatically deleting the comment if the supposedly wrong mentions in a post have been corrected by its author in the 24 hours following @checky's comment. I fear, however, that it is going to cause @checky's comments to never be interacted with, which would mean no upvotes to help me with the server costs. If @checky ever becomes expensive to run, I will work on generating weekly statistics posts, though I hope it won't ever come to that. I am still debating whether or not making an add-on for @checky is a good idea. The biggest problem is that such an add-on would be pretty RAM-heavy due to Peter Norvig's brute-force spelling corrector algorithm, and I'm afraid that it could even freeze some steemians' browsers, which is obviously not what I want. I also want to start spending some of my time on a bigger project of mine. To be honest, I'd rather see mention checking added to the @steem-plus add-on than make a separate add-on just for that. I almost forgot: in a few days I should be launching a logo contest (more like a profile picture contest actually) for @checky. Currently, its profile picture gives off a cold vibe, and I'd like to make @checky look friendlier.

Contributions

If you want to contribute to this project or talk about an issue it has, feel free to visit its GitHub page. You can also clone it and follow the instructions written there to get it running (although this is not recommended since @checky already runs the script). My social media accounts are listed at the end of the README file. If you add me on Steam, tell me why on my wall, otherwise I won't accept your friend request.

Proof of Work Done

https://github.com/RagePeanut/
