Web access to the opinions of the Attorney General of the Portuguese Republic (PGR), achieved by incorporating knowledge about the Portuguese language (namely a large lexicon and multi-word units automatically extracted from the PGR corpus) into the search engine used.
The project started in January 1998 and concluded in 2001.
Participating entities: Heurística, CENTRIA - UNL, Procuradoria Geral da República.
Funding entity: Agência de Inovação.
Principal researcher: Gabriel Pereira Lopes.
1) Automatic extraction of a thesaurus from the partially parsed PGR corpus. The results of this effort have not yet been integrated into the search engine used in this project.
2) Supervised and unsupervised classification of the documents in this collection of opinions. The supervised method used a neural-network-based approach and the keywords used in those documents. The unsupervised method used automatically extracted multi-word lexical units and statistical methods. Both must still be incorporated into the search engine used.
3) Work continued on statistically based parallel text alignment and the extraction of translation equivalents from parallel corpora. However, a large effort is still required to enable access to the opinions of the Attorney General of the Portuguese Republic in any of the European Community languages. Work in the framework of project TRADAUT-PT will provide a large basis for making this possible, at least for English and French speakers.
The project produced 28 publications and a demo, together with the final report.