IPP-BAS - Institute for Parallel Processing, Bulgarian Academy of Sciences

The Linguistic Modelling Laboratory (LML), a department of the Institute for Parallel Processing (IPP) of the Bulgarian Academy of Sciences (BAS), will lead WP3 (Common sense ontology engineering) and WP5 (Multimedia ontology), both key enablers for the semantic-based knowledge flow system.

Since its establishment in 1987, the LML has hosted a number of projects applying knowledge representation to natural language processing from two perspectives: using represented knowledge for semantic analysis tasks, and exploring methods for the representation and acquisition of linguistic knowledge itself. The researchers at the LML have experience with knowledge representation languages, such as Conceptual Graphs, Description Logics and Typed Feature Logics, and with the systems that support them. In April 2000 the IPP-BAS was recognised as a Centre of Excellence in Information Technology (CEIT), financially supported by the EC under the Fifth Framework Programme. The total staff of IPP-BAS is 112 persons, including six full professors, 24 associate professors, 39 research assistants and 20 university-educated specialists.

The LML's experience with ontologies lies in the construction of a semantic dictionary for Bulgarian. The team will use the Core Ontology of the SIMPLE project (http://www.ub.es/gilcub/ingles/projects/european/simple.html), which contains about 140 concepts and about 50 relations. In creating the Bulgarian semantic dictionary, the team classified the senses of about 20 000 words with respect to the Core Ontology. In Natural Language Processing (NLP), the team has extensive experience with Bulgarian. It has participated in CLEF twice, in the Bulgarian-English Question Answering and Bulgarian-Bulgarian Question Answering tasks. It has also worked on adapting the GATE architecture to Bulgarian and Russian named entities. The team will therefore start with texts in Bulgarian and English, from which it will extract the semantic information needed to support ontology construction. Once the ontologies are “in shape”, the team will extend access to them by constructing appropriate dictionaries in French and Italian.

The BulTreeBank team is part of the Linguistic Modelling Laboratory. It developed an XML-based system called CLaRK, whose main design aim was to minimise human intervention during the creation of language resources. CLaRK incorporates several technologies: XML technology; Unicode; Regular Cascaded Grammars; and Constraints over XML Documents.