Gidi: anti-plagiarism tool for Hive with the intention of being much more.

GIDI


Gidi will be a tool designed to make life easier for community administrators. Its main function, initially, is to check the publications that administrators report to us as possible plagiarism. Our goal is that administrators will not have to leave the Hive platforms to check whether the publications made within their communities contain plagiarism. A website will also be created that provides information about the project and its live results.

Gidi will use the Hive SQL Service to process the data of the publications as well as the complaints. It will also use the API of a paid anti-plagiarism tool to check the originality of the publications and identify possible reference sources.

In the future, we want to partner with Hivewatchers so that their experience can help us improve the platform, as we consider that they play an important role. However, as the project states, the volume of reports they have accepted is small compared with the volume of content generated on Hive, hence our intention to automate the process and handle larger volumes.

Initial business rules
• Scan publications in communities automatically using APIs.
• Define the initial plagiarism percentages as 0-50 non-plagiarism, 50-70 possible plagiarism, and 70-100 plagiarism.
• Display on the website publications with high plagiarism rates and report them.
• Notify community managers of positive plagiarism findings about a report in their community.
• Create a blacklist of users who plagiarize, sorted by the number of failures and other aspects, as well as a whitelist of users who have a good track record.
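
As an illustration of the rules above, here is a minimal sketch of how the three bands and a simple blacklist ranking could be coded. It is not the tool's actual implementation: the names `Verdict`, `classify`, and `blacklist` are hypothetical, and because the stated ranges share the values 50 and 70, the sketch assumes that a boundary score falls into the higher band.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    NON_PLAGIARISM = "non-plagiarism"
    POSSIBLE_PLAGIARISM = "possible plagiarism"
    PLAGIARISM = "plagiarism"


def classify(score: float) -> Verdict:
    """Map a plagiarism percentage (0-100) to one of the three initial bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # Assumption: the shared boundaries 50 and 70 belong to the higher band.
    if score >= 70:
        return Verdict.PLAGIARISM
    if score >= 50:
        return Verdict.POSSIBLE_PLAGIARISM
    return Verdict.NON_PLAGIARISM


def blacklist(findings: list[tuple[str, float]]) -> list[tuple[str, int]]:
    """Rank authors by how many of their posts fall in the plagiarism band."""
    strikes = Counter(
        author for author, score in findings
        if classify(score) is Verdict.PLAGIARISM
    )
    # most_common() sorts by count, highest first.
    return strikes.most_common()
```

For example, `blacklist([("alice", 85), ("bob", 40), ("alice", 72)])` would list only `alice`, with two failures. The "other aspects" mentioned in the rule are not modeled here.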

Given how expensive a quality anti-plagiarism service can be, there will be limits on the number of publications scanned each month. To avoid abuse, we will limit the number of publications that each person can report, depending on how they use our tool. At the moment, a maximum of 2.5 million words will be checked each month, which represents between 4000 and 6000 posts. We will also work on refining the text that is sent to the API so as not to consume more words than necessary, since many publications are written in several languages or contain tags and other characters that do not need to be scanned. In this way, resources will be saved and excessive consumption avoided.
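
To make the idea of refining the text more concrete, the following is a rough sketch, not the actual cleaning step. It reads the 2.5 million monthly figure as a word budget (an assumption based on the mention of consuming words), and the function names and regular expressions are illustrative heuristics rather than a complete Markdown or HTML parser.

```python
import re

MONTHLY_WORD_BUDGET = 2_500_000  # monthly figure from the proposal, read as words


def strip_markup(body: str) -> str:
    """Drop markup that carries no text worth checking (rough heuristic)."""
    text = re.sub(r"```.*?```", " ", body, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)         # markdown images
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)      # links: keep the label
    text = re.sub(r"<[^>]+>", " ", text)                      # HTML tags
    text = re.sub(r"https?://\S+", " ", text)                 # bare URLs
    text = re.sub(r"[#*_>`~|]+", " ", text)                   # markdown symbols
    return " ".join(text.split())                             # collapse whitespace


def word_count(body: str) -> int:
    """Number of words that would actually be sent to the paid API."""
    return len(strip_markup(body).split())


def posts_per_month(avg_words_per_post: int) -> int:
    """How many posts fit in the monthly budget at a given average length."""
    return MONTHLY_WORD_BUDGET // avg_words_per_post
```

Under these assumptions, the range of 4000 to 6000 posts corresponds to an average of roughly 420 to 625 words per cleaned post: `posts_per_month(625)` gives 4000 and `posts_per_month(500)` gives 5000.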

Reports will be published every 15 days from the account that we will create for the tool, specifying the work done in the preceding days and the progress we are making, in addition to a monthly report with statistics on the results that the tool is providing.
We will also discuss our different links with the communities, and we will focus on building a community around the tool that helps promote Hive beyond the platform. Creating and managing a Twitter account will be the main channel for that promotion.

Versions:

  • Pilot Version: The only purpose of the pilot version is to test the most important functionalities and to learn from the community in order to provide an experience that helps them. That is why this version will be used by only a few communities and will not have a public website; we will work with some administrators and collaborators to process their complaints. Tests will be run to see how the tool performs in a more dynamic environment, as so far the developers have only done some functional testing at their own expense. These tests will start a month after the proposal is approved, which is the time it will take us to create the pilot version.

  • Version 1: This version will have a website and will open complaints to the general public. These will be processed as allowed by the API and the results of the tests in the pilot version. At this stage, only texts will be scanned for plagiarism. Future versions will add scanning of links, images, and audio for audiovisual platforms.

  • Version 2: The purpose of this version is to bring the tool to the main frontends, such as Ecency and Peakd, so that it can visually highlight for all users the posts that have a high chance of being plagiarized. For this, we will need to collaborate with the development teams of these platforms. In this version we will also begin to detect the abusive use of crossposting in different communities.

  • Version 2.1: This version intends to start implementing more advanced functionalities for plagiarism detection and other data of interest to HIVE related to the topic, using AI and Data Mining with the intention of optimizing the results and reducing, if possible, the costs of third-party services.

Time and Commitment:

This is a part-time job because the members of the proposal have fixed work commitments: one is a university professor and software engineer, and the other is a photojournalist, programmer, and community manager.
Even so, we will give our full commitment because of our interest in the growth and improvement of Hive, and because we see all the opportunities that this healthy and creative environment will bring from many points of view.
We know that the process of adjusting the tool will take time and even if we manage to automate some processes, we understand the need to be aware of the results and the possible claims of failures during this process.
Maintaining and improving the infrastructure takes time as we intend to be in a continuous process of improvement.

Average time commitment per 7-day week: approximately 48 hours divided between two people. 48 / 7 ≈ 6.86, rounded to 7 hours per day, every day of the year (13 HBD per hour).
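
The arithmetic behind these figures can be checked with a few lines. This is only a worked calculation using the numbers given in this proposal, including the 23 HBD per day for the anti-plagiarism API stated in the ask; the variable names are illustrative.

```python
HOURS_PER_WEEK = 48        # combined for both members
DAYS_PER_WEEK = 7
RATE_HBD_PER_HOUR = 13
API_COST_HBD_PER_DAY = 23  # operational cost stated in the ask

hours_per_day = HOURS_PER_WEEK / DAYS_PER_WEEK                # about 6.86
billed_hours_per_day = round(hours_per_day)                   # rounded to 7
labor_hbd_per_day = billed_hours_per_day * RATE_HBD_PER_HOUR  # 7 * 13 = 91
total_hbd_per_day = labor_hbd_per_day + API_COST_HBD_PER_DAY  # 91 + 23 = 114

print(f"{hours_per_day:.2f} h/day, {labor_hbd_per_day} HBD work, "
      f"{total_hbd_per_day} HBD total per day")
```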

There are two members:

  • @manuphotos: developer, photographer, community manager, and creator of the HiveCuba community.
  • @tgarcialvarez: developer and teacher from the Cuban university CUJAE.
Current Ask

We are asking for 114 HBD per day. This ensures that our two members and collaborators are paid for their time and dedication so that the service can be supported (91 HBD per day), and it covers the operational costs of the anti-plagiarism API (23 HBD per day).

Final Thoughts

We would appreciate your support, criticism, and collaboration. Thank you for considering this proposal.

You can vote here:

Let's make Hive content cleaner!
