How it works

This application is designed as a tool to discourage offensive language and cyberbullying. It does this with a subtle two-layer system (evasion and awareness), outlined below:

Evasion

Before any offensive-language detection takes place, the keyboard offers next-word prediction to make typing easier. This is done with an n-gram model built on a corpus of the ~1,000,000 most common English trigrams (e.g. "she ate the"), modified to steer the user away from writing offensive sentences in the first place. Together with a spellchecker that omits overtly inappropriate words, this encourages more polite messages without the user even noticing.
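A minimal sketch of trigram-based next-word prediction, assuming the corpus is a list of ((w1, w2, w3), count) pairs and that "steering" is done by never storing words from a blocklist. `BLOCKLIST`, `build_model` and `predict_next` are hypothetical names, and the app may instead down-weight rather than drop unwanted words.

```python
from collections import Counter, defaultdict

# Hypothetical blocklist: words the keyboard should never suggest.
BLOCKLIST = {"stupid", "idiot"}

def build_model(trigrams):
    """Map each two-word context to a Counter of the words that follow it.

    `trigrams` is an iterable of ((w1, w2, w3), count) pairs.
    """
    model = defaultdict(Counter)
    for (w1, w2, w3), count in trigrams:
        if w3 not in BLOCKLIST:  # blocked words are never offered as predictions
            model[(w1, w2)][w3] += count
    return model

def predict_next(model, w1, w2, k=3):
    """Return up to k of the most frequent next words after (w1, w2)."""
    followers = model.get((w1.lower(), w2.lower()), Counter())
    return [word for word, _ in followers.most_common(k)]

# Toy corpus standing in for the ~1,000,000-trigram frequency list.
corpus = [
    (("she", "ate", "the"), 120),
    (("she", "ate", "a"), 80),
    (("you", "are", "stupid"), 50),
    (("you", "are", "right"), 40),
]
model = build_model(corpus)
print(predict_next(model, "you", "are"))  # ['right']
print(predict_next(model, "She", "ate"))  # ['the', 'a']
```

Even though "stupid" is the most frequent continuation of "you are" in this toy corpus, it is never suggested.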

Awareness

Then, when the phrase is complete and the button is pressed, the application analyzes the input for offensiveness. It first splits the phrase into individual words, then uses a part-of-speech tagger to identify each word's grammatical role (e.g. noun, adjective, determiner). Next, it runs a series of tests on the phrase and passes the results to a machine-learning classifier, which makes the final decision. Some of the tests are:
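To illustrate the splitting and tagging step, here is a minimal sketch that uses a tiny lookup table in place of a real part-of-speech tagger. The lexicon, the Penn Treebank tag choice, and the names `tokenize` and `pos_tag` are assumptions for illustration; a trained tagger would handle unknown and ambiguous words far better.

```python
import re

# Hypothetical miniature lexicon using Penn Treebank tags. A real keyboard
# would use a trained POS tagger rather than a lookup table.
LEXICON = {
    "the": "DT", "a": "DT", "this": "DT",
    "is": "VBZ", "are": "VBP", "ate": "VBD",
    "very": "RB", "not": "RB",
    "good": "JJ", "bad": "JJ",
}
PUNCTUATION = {"!", "?", ".", ","}
WORD = re.compile(r"[a-z0-9']+")

def tokenize(phrase):
    """Split a phrase into lowercase words, single punctuation marks and
    single other symbols (such as emoji)."""
    return re.findall(r"[a-z0-9']+|[!?.,]|[^\sa-z0-9'!?.,]", phrase.lower())

def tag_for(token):
    """Punctuation gets '.', other symbols 'SYM', unknown words 'NN' (noun)."""
    if token in PUNCTUATION:
        return "."
    if WORD.fullmatch(token):
        return LEXICON.get(token, "NN")
    return "SYM"

def pos_tag(tokens):
    """Pair each token with its tag."""
    return [(tok, tag_for(tok)) for tok in tokens]

print(pos_tag(tokenize("This is VERY bad!")))
# [('this', 'DT'), ('is', 'VBZ'), ('very', 'RB'), ('bad', 'JJ'), ('!', '.')]
```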

- What is the length of the sentence?
- How many exclamation marks are there?
- How many question marks are there?
- How many emoji are there?
- How many verbs are there?
- How many determiners are there?
- How many adjectives are there?
- How many negations are there? (e.g. not, isn't)
- How many intensifiers are there? (e.g. very, extremely)
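The tests above amount to counting features over the tagged words. A minimal sketch, assuming input as (word, Penn Treebank tag) pairs, sentence length measured in non-punctuation tokens, and small hypothetical negation and intensifier lists; the emoji check is a rough code-point range test, not a complete emoji detector.

```python
# Hypothetical word lists; the app's actual lists are not specified.
NEGATIONS = {"not", "no", "never", "isn't", "don't", "can't"}
INTENSIFIERS = {"very", "extremely", "really"}

def is_emoji(token):
    """Rough check: any character in the main emoji/symbol code-point blocks."""
    return any(0x1F300 <= ord(ch) <= 0x1FAFF or 0x2600 <= ord(ch) <= 0x27BF
               for ch in token)

def extract_features(tagged):
    """Turn (word, Penn Treebank tag) pairs into a dict of count features."""
    words = [w.lower() for w, _ in tagged]
    tags = [t for _, t in tagged]
    return {
        # Assumption: sentence length is measured in non-punctuation tokens.
        "length": sum(1 for t in tags if t != "."),
        "exclamations": sum(w.count("!") for w in words),
        "questions": sum(w.count("?") for w in words),
        "emoji": sum(1 for w in words if is_emoji(w)),
        "verbs": sum(1 for t in tags if t.startswith("VB")),
        "determiners": tags.count("DT"),
        "adjectives": sum(1 for t in tags if t.startswith("JJ")),
        "negations": sum(1 for w in words if w in NEGATIONS),
        "intensifiers": sum(1 for w in words if w in INTENSIFIERS),
    }

tagged = [("This", "DT"), ("is", "VBZ"), ("not", "RB"), ("very", "RB"),
          ("good", "JJ"), ("!", "."), ("!", ".")]
print(extract_features(tagged))
```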

And, most importantly: what is the polarity of the sentence?

Polarity is determined in a similar way. We split the phrase into its individual words and look each one up in a large lexicon of words annotated with polarity scores (e.g. shit = -4). If a word is polar, its score is added to the running total. If the phrase contains a negation, we may reverse the polarity of an individual score (-4 -> +4), and if it contains an intensifier, we may multiply the score (-4 -> -8).
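A minimal sketch of this scoring, assuming that a negation or intensifier applies only to the next polar word and that every intensifier doubles the score (matching the -4 -> -8 example). The lexicon entries other than "shit = -4", and the name `polarity`, are hypothetical.

```python
# Hypothetical excerpt of a word-level polarity lexicon.
POLARITY = {"shit": -4, "hate": -3, "bad": -3, "good": 3, "love": 3}
NEGATIONS = {"not", "no", "never", "isn't", "don't"}
INTENSIFIERS = {"very": 2, "extremely": 2, "really": 2}  # score multipliers

def polarity(tokens):
    """Sum lexicon scores over the phrase.

    Assumption: a negation flips, and an intensifier multiplies, only the
    next polar word; both effects reset after that word is scored.
    """
    total = 0
    negate = False
    multiplier = 1
    for tok in (t.lower() for t in tokens):
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            multiplier *= INTENSIFIERS[tok]
        elif tok in POLARITY:
            score = POLARITY[tok] * multiplier
            total += -score if negate else score
            negate, multiplier = False, 1
    return total

print(polarity(["this", "is", "shit"]))  # -4
print(polarity(["not", "shit"]))         # 4
print(polarity(["very", "shit"]))        # -8
```

The scope of a negation is a design choice: "not very good" scores -6 here, but a real system might let a negation cover a whole clause.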

Machine learning

Now we introduce machine learning. A classifier takes in our features (e.g. the number of determiners) and learns to recognize offensiveness in new phrases from patterns in past examples. We use supervised machine learning because our training data is already annotated: the offensiveness of each example is known. We provide one data set for the classifier to train on and a separate set to test it on. In this way, the machine "learns" to classify future phrases with some degree of accuracy.
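The train-then-test procedure can be sketched as follows, assuming annotated examples are (features, label) pairs with 1 meaning offensive. `train_test_split` and `accuracy` are hypothetical helpers, and the lambda is only a stand-in for a trained classifier.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle annotated (features, label) pairs and hold out a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(predict, test_set):
    """Fraction of held-out examples whose predicted label matches the annotation."""
    if not test_set:
        return 0.0
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)

# Toy annotated data: 1 = offensive, 0 = not offensive.
data = [({"polarity": p}, int(p < 0)) for p in (-8, -4, -2, 1, 3, 5, -6, 2, -1, 4)]
train, test = train_test_split(data, test_fraction=0.2)
# Hypothetical stand-in for a trained classifier: flag negative polarity.
print(accuracy(lambda f: int(f["polarity"] < 0), test))  # 1.0
```

Keeping the test set out of training is what makes the accuracy figure an estimate of performance on phrases the classifier has never seen.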

With all of our features in hand, we then run a linear regression classifier to predict whether or not the phrase is offensive. Linear regression classification fits the best possible linear function to the annotated data points in feature space, so that the function separates offensive from inoffensive phrases as accurately as possible. The classifier has already learned this function through supervised training on an annotated corpus of ~5,500 Twitter phrases, so classifying a new phrase only requires evaluating the function on that phrase's feature values.
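One common form of linear regression classification fits weights by least squares to 0/1 labels and then thresholds the fitted score at 0.5. The sketch below assumes that form, solves the normal equations in pure Python with a tiny ridge term for numerical safety, and uses hypothetical names (`fit_linear_regression`, `predict`). In the app, each row of `X` would be a phrase's feature values in a fixed order; the app's actual training procedure is not specified.

```python
def fit_linear_regression(X, y, ridge=1e-6):
    """Least-squares weights w (bias first) so that w . [1, x] approximates y.

    Solves (A^T A + ridge * I) w = A^T y by Gauss-Jordan elimination, where A
    is X with a leading column of ones for the bias term.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    n = len(A[0])
    # Augmented matrix [A^T A + ridge*I | A^T y].
    M = []
    for i in range(n):
        row = [sum(a[i] * a[j] for a in A) + (ridge if i == j else 0.0)
               for j in range(n)]
        row.append(sum(a[i] * t for a, t in zip(A, y)))
        M.append(row)
    for col in range(n):
        # Partial pivoting: move the largest entry in this column into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def predict(weights, x, threshold=0.5):
    """Label a phrase offensive (1) if its fitted linear score reaches the threshold."""
    score = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1 if score >= threshold else 0

# Toy data: one feature (sentence polarity); 1 = offensive, 0 = not offensive.
X = [[-8], [-4], [-2], [2], [4], [8]]
y = [1, 1, 1, 0, 0, 0]
weights = fit_linear_regression(X, y)
print(predict(weights, [-3]), predict(weights, [3]))  # 1 0
```

The decision boundary is the set of feature vectors where the fitted score equals 0.5, a line in two dimensions or a hyperplane in general.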

