
See, here's the thing…

I don't care about other users.

That sounds pointed, but it is exactly true. If I wanted recommendations from other users, I would just accept upvotes or some other crowd indicator as the signal. Of course, we know from looking at Trending and Hot that it's a broken measure. It clearly doesn't work.

But more importantly, the opinion of other users has nothing to do with my opinion. The whole idea here is that individuation is far more effective at bringing me things that I will like than looking at what other people like and assuming that they are like me.

The Netflix recommendation system starts from a basic assumption: your registered preferences represent what you really want. It then looks at other people's registered preferences and tries to find the ones nearest to your expressed preferences in its fairly limited measure space. Next, it looks for things that those nearby people have found desirable, checks whether you've already seen and rated them, and goes from there.
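The neighbour-based process described above is usually called user-based collaborative filtering. Below is a minimal sketch of that generic technique, not Netflix's actual system. The ratings data, the function names, and the choice of cosine similarity as the "nearness" measure are all illustrative assumptions.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. Illustrative only, not real data.
Ratings = dict[str, dict[str, float]]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two users; unrated items count as zero."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(ratings: Ratings, user: str, k: int = 2) -> list[tuple[str, float]]:
    me = ratings[user]
    # 1. Find the k users nearest to `user` in preference space.
    neighbours = sorted(
        ((cosine(me, theirs), other) for other, theirs in ratings.items() if other != user),
        reverse=True,
    )[:k]
    # 2. Collect items those neighbours rated that `user` has not seen,
    #    weighting each neighbour's rating by how similar they are.
    scores: dict[str, float] = {}
    weights: dict[str, float] = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue
        for item, rating in ratings[other].items():
            if item in me:
                continue  # already seen and rated, skip it
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = [(item, scores[item] / weights[item]) for item in scores]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ratings = {
    "me": {"Alien": 5, "Heat": 4},
    "ann": {"Alien": 5, "Heat": 4, "Ronin": 5},
    "bob": {"Alien": 1, "Notting Hill": 5},
}
print(recommend(ratings, "me", k=1))
```

Note that nothing here ever looks at the movies themselves; the recommendation for "Ronin" comes entirely from the fact that "ann" rated things the way "me" did.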

They use other people's preferences as an intermediary because they don't have the technology, processing power, or time it would take to directly find features in the movies you have rated favorably. Image processing and audio processing are hard problems. It's much easier, and less data-intensive, to work with the information that is easy for them to acquire: expressed ratings and preferences.

We don't actually have that problem. If we wanted, we could create an eigenvector describing any individual's upvote tendencies, based on the word frequencies in their history. Then we could measure the distance between those upvote tendencies in our vector space of interesting words.
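A rough sketch of that user-to-user comparison, assuming each user's upvote history is available as a list of texts. Here a plain normalized word-frequency vector stands in for what the post calls an eigenvector (no eigendecomposition is performed), and the naive lowercase-and-split tokenization is an assumption. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def profile(texts: list[str]) -> dict[str, float]:
    """Relative word frequencies across everything a user upvoted."""
    # Naive tokenization (lowercase, split on whitespace): an assumption.
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """1 - cosine similarity: 0 means identical word usage, 1 means no shared words."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    if not na or not nb:
        return 1.0
    return 1.0 - dot / (na * nb)


alice = profile(["steem mining rigs", "mining pools and rigs"])
bob = profile(["mining rigs for steem"])
carol = profile(["sourdough bread recipes"])
print(cosine_distance(alice, bob), cosine_distance(alice, carol))
```

This only tells you which users upvote similar things, which is exactly the limitation raised next: it can surface content only after someone else has found it.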

But that wouldn't really be useful for actual discovery. It would only be useful for discovering content that someone else has already found and upvoted.

I want to go in the opposite direction. I want to be able to take a new post that no one has seen before, compare it to the eigenvector describing the things I have historically liked, and determine whether this new piece of content is something I might like. No amount of information from other people would make that any easier. Quite the opposite: it would only pollute the signal of my preferences.
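A minimal content-based sketch of that idea, assuming only my own liked texts are available. It sums word counts from my likes into a single preference vector and scores an unseen post by cosine similarity against it. The tokenization, the function names, and the 0.2 threshold are all hypothetical choices for illustration, not tuned values.

```python
from collections import Counter
from math import sqrt


def word_vector(text: str) -> Counter:
    # Naive tokenization (lowercase, whitespace split): an assumption.
    return Counter(text.lower().split())


def build_profile(liked_texts: list[str]) -> Counter:
    """Sum the word counts of everything I have upvoted into one preference vector."""
    profile = Counter()
    for text in liked_texts:
        profile.update(word_vector(text))
    return profile


def similarity(profile: Counter, post: Counter) -> float:
    """Cosine similarity between my preference vector and a post's word vector."""
    dot = sum(profile[w] * post[w] for w in post.keys() & profile.keys())
    norm = sqrt(sum(v * v for v in profile.values())) * sqrt(sum(v * v for v in post.values()))
    return dot / norm if norm else 0.0


def might_like(profile: Counter, new_post: str, threshold: float = 0.2) -> bool:
    """Judge an unseen post using only my own history; no other users involved."""
    return similarity(profile, word_vector(new_post)) >= threshold


me = build_profile(["mining rigs and steem", "steem witness mining"])
print(might_like(me, "new steem mining rigs"), might_like(me, "sourdough bread recipe"))
```

The post being scored needs no votes at all, so it can be judged the moment it appears.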

Again, I am actively working to avoid an impositional process that originates in the system. I don't want the system to tell me what other people like; I want it to look at content and filter it for the things I might like.

So far today I've managed to extract my likes from the database for an arbitrary period in the past, extract the text of those posts and comments, and process it into word frequency lists. I'm now preparing those lists to go into some kind of semantic analysis, but this is where things start getting hard.
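The steps so far might look roughly like this. The database schema isn't described, so this sketch assumes the likes have already been pulled out as hypothetical `(timestamp, text)` pairs. The tiny stop-word list and the regex tokenizer are also illustrative assumptions.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical record shape: (time of my upvote, body text of the post or comment).
Like = tuple[datetime, str]

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "it", "in", "i"}

# Runs of lowercase letters and apostrophes count as words (an assumption).
TOKEN = re.compile(r"[a-z']+")


def likes_in_window(likes: list[Like], start: datetime, end: datetime) -> list[str]:
    """Keep the text of likes whose timestamp falls in [start, end)."""
    return [text for when, text in likes if start <= when < end]


def word_frequencies(text: str) -> Counter:
    """Lowercase, pull out word tokens, drop stop words, and count the rest."""
    words = TOKEN.findall(text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def frequency_lists(likes: list[Like], start: datetime, end: datetime) -> list[Counter]:
    """One word-frequency list per liked post or comment in the window."""
    return [word_frequencies(t) for t in likes_in_window(likes, start, end)]


likes = [
    (datetime(2017, 6, 1), "The mining rigs and the pools"),
    (datetime(2017, 7, 15), "Steem is mining"),
    (datetime(2016, 1, 1), "old post"),
]
print(frequency_lists(likes, datetime(2017, 1, 1), datetime(2018, 1, 1)))
```

These per-item counters are the raw material a later semantic-analysis step would consume.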