Our team was tasked with leveraging speech recognition and machine learning to process thousands of handwritten safety reports into accurately categorized data in a timely manner.
In the existing workflow, data was recorded by hand, manually transcribed into computers, and then subjected to further human analysis before results could be determined. The process was labour-intensive, expensive, and prone to inaccuracies.
Our team built a system that receives input data from the user and processes it to produce an outcome. Several text mining techniques are applied to the input: removing extra whitespace, converting to lowercase, removing numbers, removing punctuation, removing stop words, and stemming. The data is then further cleaned, chunked, and tokenized, and used to train a model that produces a predictive outcome.
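The preprocessing steps listed above can be sketched as a small Python function. This is a minimal sketch under stated assumptions, not the team's actual implementation: the stop-word list is a short hypothetical sample, and the `stem` helper is a deliberately crude suffix stripper standing in for a real stemmer such as the Porter stemmer. Extra whitespace is handled at the end by splitting on runs of whitespace, which also tokenizes the text.

```python
import re
import string

# Hypothetical, abbreviated stop-word list; a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
              "at", "was", "is", "were", "for", "with"}

# Crude suffix stripping, standing in (by assumption) for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    text = text.lower()                                   # convert to lowercase
    text = re.sub(r"\d+", " ", text)                      # remove numbers
    text = text.translate(                                # replace punctuation with spaces
        str.maketrans(string.punctuation, " " * len(string.punctuation))
    )
    tokens = text.split()                                 # collapse whitespace and tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    return [stem(t) for t in tokens]                      # stem each remaining token
```

For example, `preprocess("Worker slipped on 2 wet stairs, near the loading dock!")` returns `['worker', 'slipp', 'wet', 'stair', 'near', 'load', 'dock']`; the truncated `slipp` shows why a proper stemmer is preferable to naive suffix stripping.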
Machine learning approaches such as semantic analysis, text and word classification, and supervised learning were explored, with models built on the training dataset evaluated against a separate prediction dataset.
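Supervised text classification of this kind can be illustrated with a multinomial Naive Bayes classifier over preprocessed tokens. Naive Bayes is a swapped-in example technique, not necessarily the model the team used, and the category names and toy training data below are hypothetical, invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing over token lists."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)            # documents per category
        self.word_counts = defaultdict(Counter)        # token counts per category
        self.vocab: set[str] = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, tokens: list[str]) -> str:
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, count in self.class_counts.items():
            score = math.log(count / self.total_docs)  # log prior P(label)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocab:
                    continue  # ignore tokens never seen during training
                # Laplace-smoothed log likelihood log P(token | label)
                score += math.log(
                    (self.word_counts[label][token] + 1) / (total_words + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (already preprocessed tokens).
train_docs = [
    ["slip", "wet", "floor"],
    ["fall", "ladder", "height"],
    ["slip", "ice", "walkway"],
    ["fall", "scaffold", "height"],
]
train_labels = ["slip_trip", "fall_from_height", "slip_trip", "fall_from_height"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
```

With this toy data, `model.predict(["wet", "walkway"])` returns `"slip_trip"` and `model.predict(["ladder", "height"])` returns `"fall_from_height"`. Working in log space avoids numeric underflow when reports contain many tokens.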
- Data Preparation
- Data Interpretation
- Natural Language Processing
- Artificial Intelligence
- Machine Learning
- Proof of Concept
Compared with its predecessor, the model's performance improved over successive iterations until it reached an F1 score above 95%. The model can now serve as a proof of concept for larger implementations throughout the industry.
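The F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), which is not the same quantity as plain accuracy. The source does not say how F1 was averaged across report categories, so the sketch below assumes macro averaging (the unweighted mean of per-category F1). The labels are hypothetical and the numbers are not the project's results.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, by assumption)."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # predicted p, but it was wrong
            fn[t] += 1   # missed the true class t
    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

For example, with hypothetical `y_true = ["slip", "slip", "fall", "fall"]` and `y_pred = ["slip", "fall", "fall", "fall"]`, the per-class F1 scores are 2/3 and 4/5, so `macro_f1` returns 11/15 (about 0.733) even though accuracy is 0.75.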