


Technologies

Key features of RCO text analysis

  • complete parsing of a text at all linguistic levels (morphology, syntax, semantics, discourse):
    • high-precision text mining: a series of proprietary algorithms takes different linguistic rules into account and resolves ambiguity;
    • a wide range of applications: full information about text constituents is provided (part of speech, syntactic role in the sentence, grammatical and semantic attributes, syntactic relations, etc.);
  • extensive information extraction capabilities:
    • special entities (dates, addresses, phone numbers, monetary amounts, credit card and account numbers, vehicle and passport numbers, various measures);
    • named entities (persons, organizations, geographic locations, goods and other named objects);
    • terms expressed as noun phrases;
    • relationships between entities;
    • events, facts and their participants;
    • topics of the text on which the author's attention is focused;
  • key linguistic algorithms are language-independent.
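To make the idea of special-entity extraction concrete, here is a minimal Python sketch that recognizes a few entity types (dates, phone numbers, monetary amounts) with regular expressions. It is an illustration only, assuming simple English-language formats: the `PATTERNS` table and `extract_special_entities` function are hypothetical and do not describe RCO's rule engine, which the text says relies on proprietary linguistic rules.

```python
import re

# Hypothetical, deliberately simplified patterns (not RCO's rules).
PATTERNS = {
    # e.g. "September 7th, 2006"
    "date": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|September|"
        r"October|November|December)\s+\d{1,2}(?:st|nd|rd|th)?,\s+\d{4}\b"
    ),
    # e.g. "+7 495 287-9887": country code, then 2 to 4 digit groups
    "phone": re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}"),
    # e.g. "$1,500.00" or "USD 200"
    "money": re.compile(r"(?:\$|USD\s?)\d+(?:,\d{3})*(?:\.\d{2})?"),
}


def extract_special_entities(text):
    """Return (entity type, matched text, start offset) tuples sorted by offset."""
    found = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((kind, m.group(), m.start()))
    return sorted(found, key=lambda item: item[2])


if __name__ == "__main__":
    sample = "On September 7th, 2006 John Smith paid $1,500.00. Tel./Fax: +7 495 287-9887"
    for entity in extract_special_entities(sample):
        print(entity)
```

Regular expressions only cover rigid surface formats; entities such as addresses or vehicle numbers in free text usually need context rules on top of patterns like these.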

The RCO team has extensive experience in tuning linguistic algorithms for various European languages and has developed text analyzers for the following languages: English, Russian and Ukrainian.

Key stages of RCO text analysis

  • text segmentation and extraction of blocks, paragraphs, sentences, delimiters and words;
  • dictionary-based processing of the text and assignment of grammatical attributes to words and word strings;
  • assignment of grammatical attributes to unknown words (based on morphological rules);
  • preliminary ambiguity resolution based on local context;
  • recognition of special constructions based on special rules (extraction of dates, addresses, phone numbers, etc.);
  • extraction of proper names based on their contexts across the full text (persons, organizations, geographic locations, goods and other named objects);
  • anaphora resolution for named entities;
  • complete syntactic analysis of sentences and construction of a semantic network (extraction of relationships between objects);
  • generation and standardization of noun and verb phrases (generation of terms that represent the text content);
  • search for target graph templates in the semantic network (extraction of events and facts with their participants).
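The first stages of this pipeline (segmentation, dictionary lookup, morphology-based attribution of unknown words and local-context disambiguation) can be illustrated with a toy Python sketch. Everything here is hypothetical: the tiny `LEXICON`, the `SUFFIX_RULES`, the tag names and the single disambiguation rule merely stand in for RCO's proprietary dictionaries and algorithms.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy lexicon: word form -> possible part-of-speech tags.
LEXICON = {
    "john": {"PROPN"}, "smith": {"PROPN"},
    "accepted": {"VERB", "ADJ"}, "conditions": {"NOUN"},
    "of": {"ADP"}, "a": {"DET"}, "contract": {"NOUN", "VERB"},
}

# Hypothetical suffix rules for unknown words, checked in order.
SUFFIX_RULES = [("ing", {"VERB", "NOUN"}), ("ly", {"ADV"}),
                ("ed", {"VERB", "ADJ"}), ("s", {"NOUN", "VERB"})]


@dataclass
class Token:
    text: str
    tags: set = field(default_factory=set)  # candidate grammatical tags


def segment(text):
    """Split text into sentences, then each sentence into words and punctuation."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences if s]


def attribute(word):
    """Dictionary lookup first, then morphology (suffix) rules, then a default."""
    key = word.lower()
    if key in LEXICON:
        return set(LEXICON[key])
    if key.isdigit():
        return {"NUM"}
    for suffix, tags in SUFFIX_RULES:
        if key.endswith(suffix):
            return set(tags)
    return {"NOUN"} if any(c.isalnum() for c in key) else {"PUNCT"}


def disambiguate(tokens):
    """Local-context rule: right after a determiner, keep only the noun reading."""
    for prev, cur in zip(tokens, tokens[1:]):
        if prev.tags == {"DET"} and "NOUN" in cur.tags:
            cur.tags = {"NOUN"}
    return tokens


def analyze(text):
    return [disambiguate([Token(w, attribute(w)) for w in sentence])
            for sentence in segment(text)]


if __name__ == "__main__":
    for sentence in analyze("John Smith accepted conditions of a contract. It was signed quickly."):
        print([(t.text, sorted(t.tags)) for t in sentence])
```

Note that "accepted" keeps both readings here: in a real pipeline such residual ambiguity would be left for the later syntactic analysis rather than forced by local rules.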

Semantic network: the RCO way to represent the content of a text

This network is the result of parsing the sentence: On September 7th, 2006 John Smith accepted conditions of a contract with New Design Ltd. for reconstruction of his family castle.

The example illustrates the following key features:

  • special entities extraction (date)
  • proper named objects extraction (person, organization)
  • assignment of grammatical and semantic attributes to words
  • relationships extraction
  • noun phrases extraction
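The network itself is not reproduced in this text, so the following Python sketch shows only one possible in-memory form of such a semantic network for the example sentence: nodes carry a lemma, a part of speech and semantic attributes, and labeled edges link heads to dependents. The node ids and relation labels (`agent`, `object`, `time`, `of`, `with`, `for`, `modifier`, `owner`) are hypothetical and are not taken from RCO's figure. The `owner` edge shows how "his" might be resolved to John Smith, and `noun_phrase` illustrates rebuilding a noun phrase from the graph.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lemma: str
    pos: str
    sem: frozenset = frozenset()  # semantic attributes (hypothetical labels)


@dataclass
class SemanticNetwork:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: list = field(default_factory=list)  # (head id, relation, dependent id)

    def add(self, node_id, lemma, pos, *sem):
        self.nodes[node_id] = Node(lemma, pos, frozenset(sem))

    def link(self, head, relation, dependent):
        self.edges.append((head, relation, dependent))

    def dependents(self, head, relation=None):
        """Ids of nodes attached to `head`, optionally filtered by relation."""
        return [d for h, r, d in self.edges
                if h == head and (relation is None or r == relation)]


def build_example():
    net = SemanticNetwork()
    net.add("date", "2006-09-07", "DATE", "date")
    net.add("smith", "John Smith", "PROPN", "person")
    net.add("accept", "accept", "VERB", "event")
    net.add("condition", "condition", "NOUN")
    net.add("contract", "contract", "NOUN", "contract")
    net.add("design", "New Design", "PROPN", "organization")
    net.add("reconstruction", "reconstruction", "NOUN")
    net.add("castle", "castle", "NOUN")
    net.add("family", "family", "NOUN")
    for head, rel, dep in [
        ("accept", "time", "date"), ("accept", "agent", "smith"),
        ("accept", "object", "condition"), ("condition", "of", "contract"),
        ("contract", "with", "design"), ("contract", "for", "reconstruction"),
        ("reconstruction", "of", "castle"), ("castle", "modifier", "family"),
        ("castle", "owner", "smith"),  # anaphora: "his" resolved to John Smith
    ]:
        net.link(head, rel, dep)
    return net


def noun_phrase(net, node_id):
    """Rebuild a simple noun phrase: modifiers + head + 'of'-complements."""
    words = [net.nodes[m].lemma for m in net.dependents(node_id, "modifier")]
    words.append(net.nodes[node_id].lemma)
    for comp in net.dependents(node_id, "of"):
        words += ["of", noun_phrase(net, comp)]
    return " ".join(words)


if __name__ == "__main__":
    net = build_example()
    print(noun_phrase(net, "reconstruction"))
```

Storing edges as (head, relation, dependent) triples keeps the structure easy to query, which matters for the template search described in the next section.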

Semantic templates: the RCO way to extract events and facts with their participants

This network template describes semantic restrictions on a set of sentences about 'contract events', such as the following:

  • X has broken conditions of agreement with Y for Z.
  • Conditions of deed with X have been accepted by Y.
  • Conditions of contract on Z have been performed by Y.

Thus, a 'contract event' can be detected in the following sentence:
On September 7th, 2006 John Smith accepted conditions of a long-term agreement with New Design Ltd. for reconstruction of his family castle.

Result of extracting the event participants:

  • Signer1 = 'John Smith'
  • Signer2 = 'New Design'
  • Contract = 'long-term agreement'
  • Subject = 'reconstruction of family castle'
  • Event = 'accept'
  • Date = 'On September 7th, 2006'
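Template search of this kind can be sketched as subgraph matching: each template variable carries a constraint on the node it may bind to, template edges must exist between the bound nodes, and a backtracking search tries assignments until all variables are bound. The Python sketch below is a toy under stated assumptions, not RCO's matcher: the node ids, relation labels and constraints are hypothetical, noun phrases are assumed to be already collapsed into single nodes, and passive variants such as "Conditions of deed with X have been accepted by Y" would only match if the parser normalized them into the same `agent`/`object` relations.

```python
# Hypothetical toy network for the example sentence; noun phrases are pre-collapsed.
NODES = {
    "n1": {"text": "On September 7th, 2006", "lemma": "date", "sem": {"date"}},
    "n2": {"text": "John Smith", "lemma": "john smith", "sem": {"person"}},
    "n3": {"text": "accept", "lemma": "accept", "sem": {"event"}},
    "n4": {"text": "conditions", "lemma": "condition", "sem": set()},
    "n5": {"text": "long-term agreement", "lemma": "agreement", "sem": {"contract"}},
    "n6": {"text": "New Design", "lemma": "new design", "sem": {"organization"}},
    "n7": {"text": "reconstruction of family castle", "lemma": "reconstruction", "sem": set()},
}
EDGES = {("n3", "time", "n1"), ("n3", "agent", "n2"), ("n3", "object", "n4"),
         ("n4", "of", "n5"), ("n5", "with", "n6"), ("n5", "for", "n7")}

PARTY = {"person", "organization"}

# Hypothetical 'contract event' template: variable -> constraint on its node.
TEMPLATE_VARS = {
    "Event": lambda n: n["lemma"] in {"accept", "break", "perform"},
    "Signer1": lambda n: bool(n["sem"] & PARTY),
    "Cond": lambda n: n["lemma"] == "condition",
    "Contract": lambda n: "contract" in n["sem"],
    "Signer2": lambda n: bool(n["sem"] & PARTY),
    "Subject": lambda n: True,
    "Date": lambda n: "date" in n["sem"],
}
# Edges that must exist between the nodes bound to these variables.
TEMPLATE_EDGES = [("Event", "agent", "Signer1"), ("Event", "object", "Cond"),
                  ("Cond", "of", "Contract"), ("Contract", "with", "Signer2"),
                  ("Contract", "for", "Subject"), ("Event", "time", "Date")]


def consistent(binding):
    """True if every template edge between two bound variables exists in the network."""
    return all((binding[h], rel, binding[d]) in EDGES
               for h, rel, d in TEMPLATE_EDGES if h in binding and d in binding)


def match(binding):
    """Backtracking search: bind template variables, in order, to distinct nodes."""
    order = list(TEMPLATE_VARS)
    if len(binding) == len(order):
        return dict(binding)
    var = order[len(binding)]
    for node_id, attrs in NODES.items():
        if node_id in binding.values() or not TEMPLATE_VARS[var](attrs):
            continue
        binding[var] = node_id
        if consistent(binding):
            found = match(binding)
            if found:
                return found
        del binding[var]  # undo and try the next candidate node
    return None


def extract_event():
    binding = match({})
    if binding is None:
        return None
    # "Cond" is an auxiliary variable and is not reported as a participant.
    return {var: NODES[n]["text"] for var, n in binding.items() if var != "Cond"}


if __name__ == "__main__":
    print(extract_event())
```

Run on this toy network, the search binds the same six participants listed above; checking edge constraints as soon as both ends are bound prunes wrong candidates early (for example, the date node is rejected as `Subject` because no `for` edge leads to it).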

 




Contacts

  Tel./Fax: +7 495 287-9887 e-mail: info@rco.ru



Copyright © 2010 RCO. All Rights Reserved.