Learning Causal Bayesian Networks from Literature Data

Authors

  • Péter Antal
  • András Millinghofer

Abstract

In biomedical domains, free-text electronic literature is an important resource for knowledge discovery and acquisition. This is particularly true in the context of data analysis, where it provides a priori components to enhance learning and references for evaluation. The biomedical literature contains a rapidly accumulating, voluminous collection of scientific observations, boosted by new high-throughput measurement technologies. The broader context of our work is to support statistical inference about the structural properties of the domain model. This is a two-step process, consisting of (1) the reconstruction of beliefs over mechanisms from the literature by learning generative models and (2) their use in a subsequent learning phase. To automate the extraction of this prior knowledge, we discuss the types of uncertainty in a domain with respect to causal mechanisms and introduce a hypothesis about a certain structural faithfulness between the causal Bayesian network model of the domain and a binary Bayesian network representing occurrences (i.e., causal relevance) of domain entities in publications describing causal relations. Based on this hypothesis, we propose various generative probabilistic models for the occurrences of biomedical concepts in scientific papers. Finally, we investigate how Bayesian network learning with minimal linguistic analysis support can be applied to discover and extract causal dependency domain models from the domain literature.

Keywords

Bayesian network learning, text mining

How to Cite

Antal, P., Millinghofer, A. “Learning Causal Bayesian Networks from Literature Data”, Periodica Polytechnica Electrical Engineering, 50(3-4), pp. 201–221, 2006.
