


Publications




In total, the work of RCO developers on computational linguistics and artificial intelligence is presented in detail in more than 50 publications.

Since 2003, RCO has been one of the co-founders and an active participant of the Russian Information Retrieval Evaluation Seminar (see our reports for 2009, 2008, 2006, 2005, 2004, and 2003).

The list below contains abstracts of the most notable publications on key aspects of RCO technologies and on the most significant milestones in their evolution. Unfortunately, the full texts of these publications are available only in Russian, on our Russian website.


Ermakov A.E. Automatic extraction of facts from dossier texts: experience in anaphora resolution
Computational Linguistics and Intellectual Technologies: in Proc. of the International Conference Dialogue’2007, Moscow
The report describes experience in computer analysis of, and fact extraction from, a special type of document: the dossier. Technical solutions for fact search based on a syntactic parser and syntactic-semantic templates are described. The author focuses on the special rules of discourse organization used for anaphora resolution.


Ermakov A.E., Kiselev S.L. The linguistic model for computer-based analysis of emotions expressed in mass media texts
Computational Linguistics and Intellectual Technologies: in Proc. of the International Conference Dialogue’2005, Moscow
The report describes experience in determining the emotion category that a text expresses toward a given object (e.g., a person or a company). The means that text authors use to build a rich emotional image of an object are systematized, and a linguistic model for recognizing all components of this image is proposed. A scheme for choosing between positive and negative emotion categories is described.


Ermakov A.E. Reference of person and organization notations in Russian mass media texts: some empirical rules for computer-based text analysis
Computational Linguistics and Intellectual Technologies: in Proc. of the International Conference Dialogue’2005, Moscow
The report highlights problems that arise in computer-based extraction of named entities (notations of persons and organizations) from text. Key features of how these notations appear in mass media texts are considered. Taking these features into account enables a computer program to identify co-referent notations with an acceptable error rate. The author's implementation of a named entity extraction algorithm is presented.


Ermakov A.E., Pleshko V.V. Computer morphology in the context of coherent text analysis
Computational Linguistics and Intellectual Technologies: in Proc. of the International Conference Dialogue’2004, Moscow
The report addresses key problems of morphological analysis of words in Russian texts. The topics considered include analysis of unknown words, homonym disambiguation, and extraction of complex entities. It is shown how formal text properties and contextual information can be used to improve the precision of analysis. Other topics are also covered, such as the design of a morphological analyzer, efficient dictionary encoding, and analysis of unknown words by means of rules and by analogy with known words. Finally, the content of the morphological dictionary is discussed.


Kiselev S.L., Ermakov A.E., Pleshko V.V. Fact extraction from natural language text by means of network descriptions
Computational Linguistics and Intellectual Technologies: in Proc. of the International Conference Dialogue’2004, Moscow
The report describes a framework for analysis and fact extraction from Russian texts. In this framework, a text is represented as a network of syntactic and semantic relations between objects, which reflects the propositional structure used by the author. The network is invariant to the form in which the author expressed a fact. Facts are represented as network patterns that may contain matching conditions on nodes and connections, as well as bindings of fact properties to nodes. The framework makes it possible to find the required semantic structures in the text network and then to transform and interpret them.
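To make the idea of "facts as network patterns" more concrete, here is a minimal sketch of our own, not the framework from the report. It assumes a toy network of attributed nodes and labelled edges; the function `match_pattern`, the relation names `object`, `role`, `of`, and the "appointment" pattern are all hypothetical. A pattern is a set of node conditions plus required connections, and matching is a simple backtracking search that binds fact properties to nodes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy text network: node id -> attributes, plus labelled edges.
Nodes = Dict[str, Dict[str, str]]
Edges = List[Tuple[str, str, str]]  # (source, relation, target)
NodeCond = Callable[[Dict[str, str]], bool]


def match_pattern(nodes: Nodes, edges: Edges,
                  node_conds: Dict[str, NodeCond],
                  edge_conds: List[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    """Return every injective assignment of pattern variables to network nodes
    such that each node condition holds and each required edge is present."""
    edge_set = set(edges)
    variables = list(node_conds)
    results: List[Dict[str, str]] = []

    def extend(binding: Dict[str, str]) -> None:
        if len(binding) == len(variables):
            results.append(dict(binding))
            return
        var = variables[len(binding)]
        for node_id, attrs in nodes.items():
            if node_id in binding.values() or not node_conds[var](attrs):
                continue
            binding[var] = node_id
            # Check only the edge conditions whose two ends are both bound so far.
            if all((binding[s], rel, binding[t]) in edge_set
                   for s, rel, t in edge_conds if s in binding and t in binding):
                extend(binding)
            del binding[var]

    extend({})
    return results


# Toy network for "Ivanov was appointed director of Alpha"; an active-voice
# sentence with the same content would be expected to yield the same network.
nodes = {
    "n1": {"lemma": "appoint", "pos": "VERB"},
    "n2": {"lemma": "Ivanov", "type": "PERSON"},
    "n3": {"lemma": "director", "pos": "NOUN"},
    "n4": {"lemma": "Alpha", "type": "ORG"},
}
edges = [("n1", "object", "n2"), ("n1", "role", "n3"), ("n3", "of", "n4")]

# Hypothetical "appointment" fact pattern: conditions on nodes and connections.
appointment_nodes = {
    "event": lambda a: a.get("lemma") == "appoint",
    "person": lambda a: a.get("type") == "PERSON",
    "post": lambda a: a.get("pos") == "NOUN",
    "org": lambda a: a.get("type") == "ORG",
}
appointment_edges = [("event", "object", "person"),
                     ("event", "role", "post"),
                     ("post", "of", "org")]

# Bind fact properties to the lemmas of the matched nodes.
facts = [{prop: nodes[b[prop]]["lemma"] for prop in ("person", "post", "org")}
         for b in match_pattern(nodes, edges, appointment_nodes, appointment_edges)]
print(facts)
```

Because the pattern is matched against the network rather than the surface string, any wording that produces the same nodes and connections yields the same fact; the brute-force search here is only meant for toy inputs.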


Ermakov A.E. The meaning of text elements from the point of view of syntactic paradigm theory
Russian Language: its Historical Destiny and Present State: The Second International Congress of Russian Language Researchers, Moscow, Moscow State University, 2004
A method for estimating the meaning of text elements based on the communicative aspects of text generation is described. The method uses syntactic analysis to determine the parts of a sentence and their topic structure.


Ermakov A.E., Pleshko V.V., Mityunin V.A. RCO Pattern Extractor: a program for extracting special constituents from text
Informatization and information security of law-enforcement authorities: XI International Conference. Moscow, 2003
This article describes the software product RCO Pattern Extractor, which extracts textual constituents according to patterns based on a formal grammar. Its application area includes recognition of complex textual elements whose written form differs from ordinary words and noun phrases, e.g., dates, addresses, and phone numbers. The article covers text processing and the key features of the formal grammar used to build patterns.
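As a rough illustration of extracting constituents such as dates and phone numbers with composable patterns, the sketch below uses Python regular expressions in place of RCO Pattern Extractor's own grammar, whose syntax is not shown here. The rule names (`DAY`, `MONTH`, `D3`, and so on), the `{RULE}` reference syntax, and the date and phone formats are our assumptions. Named sub-rules are substituted into top-level patterns, loosely mimicking how a grammar builds larger constituents from smaller ones.

```python
import re
from typing import List, Tuple

# Hypothetical sub-rules; each is a self-contained regular expression.
RULES = {
    "DAY": r"(?:0?[1-9]|[12][0-9]|3[01])",
    "MONTH": r"(?:0?[1-9]|1[0-2])",
    "YEAR": r"(?:19|20)\d{2}",
    "D3": r"\d{3}",
    "D4": r"\d{4}",
}

# Hypothetical top-level patterns written in terms of the sub-rules above.
PATTERNS = {
    "DATE": r"\b{DAY}\.{MONTH}\.{YEAR}\b",
    "PHONE": r"\+7 {D3} {D3}-{D4}\b",
}


def expand(template: str) -> str:
    """Replace each {RULE} reference with that rule's regular expression."""
    return re.sub(r"\{(\w+)\}", lambda m: RULES[m.group(1)], template)


# One alternation with a named group per pattern; m.lastgroup names the match.
MATCHER = re.compile("|".join(f"(?P<{name}>{expand(t)})"
                              for name, t in PATTERNS.items()))


def extract(text: str) -> List[Tuple[str, str]]:
    """Return (pattern name, matched text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in MATCHER.finditer(text)]


print(extract("Signed on 12.03.2007; call +7 495 287-9887 for details."))
```

Plain regular expressions cannot express everything a richer pattern grammar can (for example, recursive structures or conditions on morphology), so this should be read only as a picture of the general approach.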


Ermakov A.E., Pleshko V.V. Application of syntactic parsing in statistical text analysis systems
Information Technologies. Vol 7, 2002.
The use of syntactic parsing algorithms in statistical text analysis systems is considered. Our experience has shown that these algorithms can improve the quality of some statistical text analysis methods for problems such as extraction of meaningful topics, detection of associative links between topics, and automatic text summarization.


Ermakov A.E. Incomplete syntactic analysis of text in information retrieval systems
Computational Linguistics and Intellectual Technologies: in Proc. of the International Conference Dialogue’2002, Moscow, 2002
The report describes the development of an incomplete syntactic parser for Russian and its integration into full-text document analysis systems. Simplified syntactic parsing that omits verb government is still capable of extracting noun groups and resolving morphological ambiguity. We designed such a parser on the basis of a context-free grammar. Simplifying the parsing algorithm allows us to process large collections of text documents in time suitable for information retrieval and text analysis systems. Our experiments demonstrate that the parser can be used at the document preprocessing stage of statistical text analysis algorithms to increase the precision of these systems.
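The following sketch shows, under strong simplifying assumptions, how a noun-group rule alone can reduce morphological ambiguity without any verb government. It is not the context-free parser from the report: it uses a single regular rule, ADJ* NOUN, plus a check that all members agree in gender, number, and case. The reading format and the function names `feats` and `noun_groups` are hypothetical; the readings for the example "большой дом" are hand-written.

```python
from typing import List, Set, Tuple

Reading = Tuple[str, str, str, str]   # (part of speech, gender, number, case)
Token = Tuple[str, Set[Reading]]      # (word form, candidate readings)
Feats = Set[Tuple[str, ...]]          # (gender, number, case) triples


def feats(readings: Set[Reading], pos: str) -> Feats:
    """Gender/number/case triples of the readings with the given part of speech."""
    return {r[1:] for r in readings if r[0] == pos}


def noun_groups(tokens: List[Token]) -> List[Tuple[List[str], Feats]]:
    """Find ADJ* NOUN groups whose members agree in gender, number and case.

    For each group, return its words and the feature triples that survive
    agreement, i.e. the readings left after disambiguation."""
    groups: List[Tuple[List[str], Feats]] = []
    i = 0
    while i < len(tokens):
        j = i
        while j < len(tokens) and feats(tokens[j][1], "ADJ"):
            j += 1  # extend the run of possible adjectives
        if j < len(tokens) and feats(tokens[j][1], "NOUN"):
            common = feats(tokens[j][1], "NOUN")
            for _, readings in tokens[i:j]:
                common &= feats(readings, "ADJ")
            if common:  # agreement holds: accept the group
                groups.append(([w for w, _ in tokens[i:j + 1]], common))
                i = j + 1
                continue
        i += 1  # no agreeing group starts here; try the next token
    return groups


# "большой" alone has six readings; agreement with "дом" leaves two.
bolshoy = {("ADJ", "m", "sg", "nom"), ("ADJ", "m", "sg", "acc"),
           ("ADJ", "f", "sg", "gen"), ("ADJ", "f", "sg", "dat"),
           ("ADJ", "f", "sg", "ins"), ("ADJ", "f", "sg", "loc")}
dom = {("NOUN", "m", "sg", "nom"), ("NOUN", "m", "sg", "acc")}
print(noun_groups([("большой", bolshoy), ("дом", dom)]))
```

The remaining nominative/accusative ambiguity is exactly the kind that would need verb government to resolve, which this simplified approach deliberately leaves out.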


Ermakov A.E. Thematic analysis of natural language text structure
Information Technologies. Vol 11, 2000
An approach to text analysis in information retrieval systems is described. Its basic idea is to use an associative semantic network to investigate text structure as a set of thematically related segments. Several applications of the approach, such as text summarization and thematic tracking, are suggested.


Ermakov A.E., Pleshko V.V. Associative model of text generation in document classification task
Information Technologies. Vol 12, 2000.
A probabilistic model of natural language text generation is proposed, based on neuropsychological interpretations of the human language communication process. The model takes into account semantic relations between terms and sentences. It is shown that the document classification task can be performed by means of this model. Issues of estimating the model parameters from training text corpora are also considered.
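The general principle of classifying documents with a generative model (pick the class most likely to have generated the text) can be shown with a much simpler, swapped-in technique: a multinomial naive Bayes classifier with Laplace smoothing. Unlike the model described above, it treats terms as independent and ignores semantic relations between terms and sentences. The function names, the smoothing parameter `alpha`, and the toy training data are our assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]]]


def train(docs: List[Tuple[List[str], str]], alpha: float = 1.0) -> Model:
    """Estimate class priors and Laplace-smoothed term probabilities (in log space)."""
    class_docs: Counter = Counter()
    term_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for terms, label in docs:
        class_docs[label] += 1
        term_counts[label].update(terms)
        vocab.update(terms)
    n_docs = sum(class_docs.values())
    log_prior = {c: math.log(k / n_docs) for c, k in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        total = sum(term_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((term_counts[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_lik


def classify(terms: List[str], model: Model) -> str:
    """Pick the class most likely to have generated the terms; unseen terms are skipped."""
    log_prior, log_lik = model

    def score(c: str) -> float:
        return log_prior[c] + sum(log_lik[c][t] for t in terms if t in log_lik[c])

    return max(log_prior, key=score)


model = train([
    (["match", "goal", "team"], "sport"),
    (["team", "coach", "win"], "sport"),
    (["market", "shares", "bank"], "finance"),
    (["bank", "rate", "market"], "finance"),
])
print(classify(["goal", "team", "win"], model))   # sport
print(classify(["bank", "rate"], model))          # finance
```

Estimating the parameters here amounts to counting terms in a labelled training corpus; a model that also captures associations between terms would need a correspondingly richer estimation procedure.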


Ermakov A.E., Pleshko V.V. Associative model of text content in applied tasks of computer text analysis
Russian Language: its Historical Destiny and Present State: The Second International Congress of Russian Language Researchers, Moscow, Moscow State University, 2001
A probabilistic associative model of natural language text generation and perception is proposed, based on neuropsychological interpretations of the human language communication process. Applications of the model to computer analysis of full-text documents, such as automatic classification and abstracting, are presented.




Contacts

  Tel./Fax: +7 495 287-9887 e-mail: info@rco.ru



Copyright © 2010 RCO. All Rights Reserved.