Easing the load
Labeling data by hand can be a very boring and repetitive task. The goal of this project was to assist the people assigning categories to companies by automatically generating category recommendations and, in some cases, categorizing companies automatically. This takes some of the burden off them and lets them focus on more productive work.
Data is king
In machine learning, even a good model is worth nothing without good training data, so the first step of this project was to build a pipeline that collects as much relevant data about companies as possible. This included augmenting the existing data and extending it with data pulled in from external sources.
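The write-up does not describe how external data was merged in, so the following is only a minimal sketch of one common enrichment step, under stated assumptions. It assumes company records are plain dictionaries and that an external source can be looked up by a shared key (here a hypothetical `domain` field). The `enrich` function and all field names are illustrative, not the project's actual code. Internal values always win; external data only fills fields that are missing or empty.

```python
from typing import Dict, List

# Hypothetical record shape: field name -> value.
Company = Dict[str, str]


def enrich(companies: List[Company], external: Dict[str, Company], key: str = "domain") -> List[Company]:
    """Extend each company record with fields from an external source.

    Assumption: records are matched on a shared key (default: a hypothetical
    "domain" field). Existing non-empty internal values are never overwritten.
    """
    enriched = []
    for company in companies:
        merged = dict(company)  # copy so the input records stay untouched
        extra = external.get(company.get(key, ""), {})
        for field, value in extra.items():
            # Only fill fields that are missing or empty internally.
            if not merged.get(field):
                merged[field] = value
        enriched.append(merged)
    return enriched


if __name__ == "__main__":
    companies = [
        {"name": "Acme", "domain": "acme.example", "description": ""},
        {"name": "Globex", "domain": "globex.example"},
    ]
    external = {"acme.example": {"description": "Industrial tools", "employees": "120"}}
    for record in enrich(companies, external):
        print(record)
```

Letting internal data take precedence is a conservative choice: an external source fills gaps but never silently replaces information the team already trusts.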
Understanding natural language
Much of the model's input was free text written by people. Before a model can make use of human language, that text has to be processed and converted into a form the computer can work with. To do this, we used state-of-the-art NLP techniques to normalize and tokenize the text, accounting for spelling errors and other mistakes.
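The specific NLP techniques used in the project are not named, so the sketch below swaps in deliberately simple standard-library stand-ins to show what the three steps do: Unicode normalization and lowercasing, regex-based tokenization, and spelling correction by fuzzy matching (`difflib.get_close_matches`) against a vocabulary. The vocabulary, the `0.8` similarity cutoff, and the function names are assumptions for illustration; this is not state-of-the-art NLP and not the project's pipeline.

```python
import difflib
import re
import unicodedata
from typing import Iterable, List

# Assumption: tokens are runs of lowercase ASCII letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def normalize(text: str) -> str:
    """Lowercase, strip accents and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def tokenize(text: str) -> List[str]:
    """Normalize the text, then split it into alphanumeric tokens."""
    return TOKEN_RE.findall(normalize(text))


def correct_spelling(tokens: Iterable[str], vocabulary: List[str], cutoff: float = 0.8) -> List[str]:
    """Replace unknown tokens with the closest vocabulary word, if one is similar enough."""
    known = set(vocabulary)
    corrected = []
    for token in tokens:
        if token in known:
            corrected.append(token)
            continue
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        # Keep the original token when no vocabulary word is close enough.
        corrected.append(matches[0] if matches else token)
    return corrected


if __name__ == "__main__":
    vocabulary = ["software", "consulting", "services", "logistics"]  # hypothetical
    tokens = tokenize("Sofware  Consulting & Services, Café")
    print(correct_spelling(tokens, vocabulary))
```

Here the misspelled "Sofware" is mapped to "software", while "cafe" has no close vocabulary match and is left unchanged. Falling back to the original token avoids forcing rare but valid words onto the wrong dictionary entry.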