ON THE INTEGRATION OF LARGE DATABANKS BY A POWERFUL CATALOGUING METHOD

Authors

  • Zsolt T. Kardkovács
  • Gábor M. Surányi
  • Sándor Gajdos

Abstract

The integration of huge, disparate information sources has been an outstanding issue for more than a decade. Because sources differ in many ways that are difficult to reconcile, even when they cover the same or related subjects, constructing mediators and wrappers has become common practice. Although this approach delivers satisfactory results under certain circumstances, it lacks the most important property: scalability. Furthermore, it gives no support for discovering similarity, which is needed whenever no exact match can be returned for a particular search condition. For these reasons, this paper addresses the problem of true unification of data sources. We assume that the sources share a common schema, so the main objective is the identification of compatible or identical entities. Our novel method accomplishes this task by automatically establishing a catalogue of data elements. Each category of the catalogue holds a set of compatible or identical items. This organization has two advantages: it provides an intrinsic, fast lookup method, and it allows similarity among data elements to be defined. We prove that the catalogue is theoretically computable even if the schema contains derived attributes.
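
The catalogue idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how compatibility or similarity is decided, so the normalization rule in `canonical_key`, the attribute-agreement measure in `category_similarity`, and the sample records are all hypothetical assumptions chosen only to show categories of compatible items, hash-based lookup, and a similarity defined over categories.

```python
from collections import defaultdict

# Hypothetical compatibility rule (an assumption, not taken from the paper):
# two records are compatible when their key attributes match after
# lowercasing and collapsing whitespace.
def canonical_key(record, key_attributes):
    return tuple(" ".join(str(record[a]).lower().split()) for a in key_attributes)

def build_catalogue(sources, key_attributes):
    """Group records from all sources into categories of compatible items."""
    catalogue = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            key = canonical_key(record, key_attributes)
            catalogue[key].append((source_name, record))
    return catalogue

def lookup(catalogue, record, key_attributes):
    """Return the category matching a record via one hash lookup."""
    return catalogue.get(canonical_key(record, key_attributes), [])

def category_similarity(key_a, key_b):
    """Illustrative similarity: fraction of key attributes that agree."""
    matches = sum(1 for a, b in zip(key_a, key_b) if a == b)
    return matches / len(key_a)

# Invented sample data: two sources sharing one schema.
KEY = ("title", "author")
sources = {
    "library_a": [
        {"title": "Database Systems", "author": "J. Ullman", "year": 1988},
    ],
    "library_b": [
        {"title": "database  systems", "author": "j. ullman", "year": 1988},
        {"title": "Database Systems", "author": "C. Date", "year": 1988},
    ],
}

catalogue = build_catalogue(sources, KEY)
query = {"title": "DATABASE SYSTEMS", "author": "J. Ullman"}
matches = lookup(catalogue, query, KEY)
```

Here the two spellings of the Ullman record fall into one category, so `lookup` returns both without comparing the query against every record. When no exact category exists, `category_similarity` could rank neighbouring categories instead; a real system would need a compatibility rule that also handles the derived attributes mentioned in the abstract.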

Keywords:

automated data integration, data identity, data similarity, relational database, 0NF schema, catalogue

How to Cite

Kardkovács, Z. T., Surányi, G. M., Gajdos, S. “ON THE INTEGRATION OF LARGE DATABANKS BY A POWERFUL CATALOGUING METHOD”, Periodica Polytechnica Electrical Engineering, 48(1-2), pp. 61–70, 2004.

Section

Articles