Cambridge-based Linguamatics, Brandwatch and the University of Sussex are being jointly funded by the UK’s Technology Strategy Board (TSB) to address challenges faced by automated language processing software in harnessing diverse data sources.
The project forms part of a broader TSB initiative focusing on enabling technologies to harness ‘big data’ for economic growth. The development will improve automatic extraction of information from scientific papers, news or social media for applications in research and development, marketing, and competitive intelligence.
Dr David Milward, Linguamatics CTO, said: “Good quality vocabularies are a key part of ‘intelligent’ text mining. This project will allow us to develop vocabularies much faster, and adapt them efficiently for new applications.”
The current generation of language processing has had considerable success in extracting useful information from unstructured text, whether this is research literature or social media.
However, adapting to a new domain is often a laborious process, with respect both to the type of data (e.g. newswire vs. patent literature) and to the terminology used in a given domain (e.g. in medical practice vs. pharmaceutical research).
Humans can perform these tasks on small data sets, but struggle to keep pace with the massively increasing amounts of electronic text.
The EVOKES project, which stands for Exploitation of Diverse Data via Automatic Adaptation of Knowledge Extraction Software, will exploit distributional similarity techniques developed by the University of Sussex. The project will run for 18 months.
Linguamatics is the leader in deploying natural language processing (NLP) based text mining for complex, high-value problem solving. Used by nine of the world’s top 10 pharmaceutical companies and other major commercial, academic and government organisations, Linguamatics operates globally, with offices in the UK and US.
• PHOTOGRAPH SHOWS: Dr David Milward





Linguamatics wins TSB funding

