
Key features of RCO text analysis
- complete parsing of a text across all linguistic levels (morphology, syntax, semantics, discourse);
- high-precision text mining: a set of proprietary algorithms applies linguistic rules and resolves ambiguity;
- a wide field of application: full information about text constituents is provided (part of speech, part of sentence, grammatical and semantic attributes, syntactic relations, etc.);
- wide information extraction features:
  - special entities (date, address, phone, monetary amount, credit card and account numbers, vehicle and passport numbers, different measures);
  - proper named entities (persons, organizations, geography, goods and other proper named objects);
  - terms named by noun phrases;
  - relationships between entities;
  - events, facts and their participants;
  - topics of text on which the author's attention was focused;
- key linguistic algorithms are language independent.
The RCO team has extensive experience in tuning linguistic algorithms for different European languages and has developed text analyzers for the following languages: English, Russian, Ukrainian.
Key stages of RCO text analysis
- text segmentation: extraction of blocks, paragraphs, sentences, delimiters and words;
- dictionary lookup and grammatical attribution of words and word strings;
- grammatical attribution of unknown words (based on rules of morphology);
- preliminary ambiguity resolution based on local context;
- recognition of special constructions based on special rules (extraction of dates, addresses, phone numbers, etc.);
- proper name extraction based on context in the full text (extraction of persons, organizations, geography, goods and other proper named objects);
- anaphora resolution for proper named entities;
- complete syntactic analysis of sentences and construction of the semantic network (extraction of relationships between objects);
- generation and standardization of noun and verb phrases (generation of terms that represent the text content);
- search for target graph templates in the semantic network (extraction of events and facts with their participants).
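The first stage (splitting text into words and delimiters) and the special-construction stage (date extraction) can be illustrated with a minimal Python sketch. It is not RCO's implementation: the regular expressions, the names `WORD_RE`, `DATE_RE`, `split_words` and `find_dates`, and the single supported date format are all assumptions chosen for illustration.

```python
import re

# Hypothetical rule for dates such as "September 7th, 2006".
# A production rule set would cover many more formats.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_RE = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}}(?:st|nd|rd|th)?,\s+\d{{4}}\b")

# A word is a run of word characters, optionally joined by "-" or "'";
# any other non-space character is treated as a delimiter token.
WORD_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_words(sentence: str) -> list[str]:
    """Split a sentence into word and delimiter tokens."""
    return WORD_RE.findall(sentence)


def find_dates(text: str) -> list[tuple[int, str]]:
    """Return (start offset, matched text) for every date-like span."""
    return [(m.start(), m.group()) for m in DATE_RE.finditer(text)]


if __name__ == "__main__":
    sentence = (
        "On September 7th, 2006 John Smith accepted conditions of a "
        "long-term agreement with New Design Ltd."
    )
    print(split_words(sentence))
    print(find_dates(sentence))
```

Running the special-entity rule before syntactic analysis lets later stages treat the whole date as a single constituent instead of four separate tokens.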
Semantic network: the RCO way to represent content of text
This network is the result of parsing the sentence: On September 7th, 2006 John Smith accepted conditions of a contract with New Design Ltd. for reconstruction of his family castle.
The following key features are illustrated:
- special entities extraction (date)
- proper named objects extraction (person, organization)
- grammatical and semantic attribution of words
- relationships extraction
- noun phrases extraction
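One way to picture such a network is as a labeled directed graph: nodes carry the extracted text and its type, and edges carry the relationship. The Python sketch below is an assumption about a plausible encoding, not RCO's actual data model; the class names, node identifiers, node kinds and relation labels (`agent`, `object`, `time`, `of`, `with`, `for`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str  # surface text or normalized phrase
    kind: str  # hypothetical label, e.g. "person" or "date"


@dataclass
class SemanticNetwork:
    nodes: dict[str, Node] = field(default_factory=dict)
    # Each edge is (source id, relation label, target id).
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, text: str, kind: str) -> None:
        self.nodes[node_id] = Node(text, kind)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, relation, target))

    def related(self, source: str, relation: str) -> list[Node]:
        """Return the nodes reachable from `source` via `relation`."""
        return [self.nodes[t] for s, r, t in self.edges
                if s == source and r == relation]


def build_example() -> SemanticNetwork:
    """Hand-built network for the example sentence (illustrative only)."""
    net = SemanticNetwork()
    net.add_node("accept", "accept", "verb")
    net.add_node("person", "John Smith", "person")
    net.add_node("date", "September 7th, 2006", "date")
    net.add_node("conditions", "conditions", "noun")
    net.add_node("contract", "contract", "noun")
    net.add_node("org", "New Design Ltd.", "organization")
    net.add_node("subject", "reconstruction of family castle", "noun_phrase")
    net.add_edge("accept", "agent", "person")
    net.add_edge("accept", "object", "conditions")
    net.add_edge("accept", "time", "date")
    net.add_edge("conditions", "of", "contract")
    net.add_edge("contract", "with", "org")
    net.add_edge("contract", "for", "subject")
    return net


if __name__ == "__main__":
    net = build_example()
    for node in net.related("accept", "agent"):
        print(node.kind, node.text)
```

Keeping relations as explicit triples makes it straightforward to search the graph for a fixed pattern of nodes and edges.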
Semantic templates: the RCO way to extract events and facts with their participants
This network template describes semantic restrictions on a set of sentences about 'contract events', such as the following:
- X has broken conditions of agreement with Y for Z.
- Conditions of deed with X have been accepted by Y.
- Conditions of contract on Z have been performed by Y.
So, a 'contract event' may be detected in the following sentence: On September 7th, 2006 John Smith accepted conditions of a long-term agreement with New Design Ltd. for reconstruction of his family castle.
Result of event participants extraction:
- Signer1 = 'John Smith'
- Signer2 = 'New Design'
- Contract = 'long-term agreement'
- Subject = 'reconstruction of family castle'
- Event = 'accept'
- Date = 'On September 7th, 2006'
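The extraction above can be approximated as matching a graph template against the semantic network. The Python sketch below uses a simple backtracking matcher over (source, relation, target) triples, where names starting with `?` are template variables. It is an illustrative simplification, not RCO's algorithm: the triple encoding, the relation labels and the function names `unify` and `match` are assumptions, and every slot is required, whereas a real template would presumably allow optional participants and classes of synonyms (agreement, deed, contract).

```python
# Semantic network of the example sentence as (source, relation, target)
# triples. Relation labels are hypothetical.
NETWORK = [
    ("accept", "agent", "John Smith"),
    ("accept", "object", "conditions"),
    ("accept", "time", "On September 7th, 2006"),
    ("conditions", "of", "long-term agreement"),
    ("long-term agreement", "with", "New Design"),
    ("long-term agreement", "for", "reconstruction of family castle"),
]

# Template for a 'contract event'; names starting with "?" are variables.
TEMPLATE = [
    ("?Event", "agent", "?Signer1"),
    ("?Event", "object", "conditions"),
    ("?Event", "time", "?Date"),
    ("conditions", "of", "?Contract"),
    ("?Contract", "with", "?Signer2"),
    ("?Contract", "for", "?Subject"),
]


def unify(pattern, value, bindings):
    """Bind a variable or compare a constant; return None on conflict."""
    if pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == value else None
        return {**bindings, pattern: value}
    return bindings if pattern == value else None


def match(template, network, bindings=None):
    """Return variable bindings satisfying every template edge, or None."""
    bindings = {} if bindings is None else bindings
    if not template:
        return bindings
    (src, rel, dst), rest = template[0], template[1:]
    for n_src, n_rel, n_dst in network:
        if n_rel != rel:
            continue
        step = unify(src, n_src, bindings)
        if step is None:
            continue
        step = unify(dst, n_dst, step)
        if step is None:
            continue
        result = match(rest, network, step)
        if result is not None:
            return result
    return None  # no assignment of variables fits this template


if __name__ == "__main__":
    for name, value in match(TEMPLATE, NETWORK).items():
        print(f"{name[1:]} = {value!r}")
```

Because the template is expressed over relations rather than word order, the same pattern can in principle cover active and passive phrasings once the parser maps them to the same network.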

