Publication details
Main information
Title:
Extraction and Transformation of Data from Semi-Structured Text
Publication date:
June 2007
Citation:
Raminhos07:thesis
Abstract:
Extraction, Transformation and Loading (ETL) is becoming progressively less specific to the traditional data-warehousing domain and is being extended to the processing of textual data. The World Wide Web (WWW) is a major source of textual information in human-readable, semi-structured formats, covering multiple domains, some of them highly complex. Traditional ETL approaches, which develop specific source code for each data source and depend on repeated interaction between domain experts and computer-science experts, are inadequate: they are time-consuming and error-prone. A novel approach to ETL is proposed, based on its decomposition into two phases: ETD (Extraction, Transformation and Data Delivery) followed by IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert.
M. Sc. dissertation
Authors:
Ricardo Raminhos
Supervisors:
João Moura Pires
School:
Universidade Nova de Lisboa
Export formats
Plain text:
Ricardo Raminhos, Extraction and Transformation of Data from Semi-Structured Text, João Moura Pires (superv.), Universidade Nova de Lisboa, June 2007.
HTML:
<b>Ricardo Raminhos</b>, <u>Extraction and Transformation of Data from Semi-Structured Text</u>, <a href="http://centria.di.fct.unl.pt/people/members/view.php?code=542b14e1830dcf7566974fd36b6fccc7_amp_cscd=37b86088e9e80895b43f651bd3fa1bd5" class="supervisor">João Moura Pires</a> (superv.), Universidade Nova de Lisboa, June 2007.
BibTeX:
@mastersthesis{Raminhos07:thesis,
  author = {Ricardo Raminhos},
  title = {Extraction and Transformation of Data from Semi-Structured Text},
  school = {Universidade Nova de Lisboa},
  note = {Jo{\~a}o Moura Pires (superv.); },
  abstract = {The Extraction, Transformation and Loading (ETL) problematic is becoming progressively less specific to the traditional data-warehousing domain and is being extended to the processing of textual data. The World Wide Web (WWW) appears as a major source of textual information, following a human-readable semi-structured format, referring to multiple domains, some of them highly complex. Traditional ETL approaches following the development of specific source code for each data source and based on multiple domain / computer-science experts interactions, become an inadequate solution, time consuming and prone to error. A novel approach to ETL is proposed, based on its decomposition in two phases: ETD (Extraction, Transformation and Data Delivery) followed by IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert.},
  keywords = {ETL, Declarative Approach, Text Processing},
  month = {June},
  year = {2007},
}
Publication's urls
Full url:
http://centria.di.fct.unl.pt/publications/view.php?code=ccafedc8cd2832e3b55f811fb1518f35
Friendly url:
http://centria.di.fct.unl.pt/publications/view.php?code=Raminhos07:thesis
Departamento de Informática, FCT/UNL
Quinta da Torre 2829-516 CAPARICA - Portugal
Tel. (+351) 21 294 8536 FAX (+351) 21 294 8541