Data Preparation for Knowledge Discovery: A Comparison of Research and Clinical Databases

Hdl Handle:
http://hdl.handle.net/10755/159263
Type:
Presentation
Title:
Data Preparation for Knowledge Discovery: A Comparison of Research and Clinical Databases
Abstract:
Conference Sponsor: Midwest Nursing Research Society
Conference Year: 2004
Author: Poynton, Mollie, MSN, RN, APRN, BC
P.I. Institution Name: IUPUI
Contact Address: Environments for Health, 1111 Middle Drive - NU483, Indianapolis, IN, 46202, USA
Co-Authors: Mollie R. Poynton, MSN, RN, APRN, BC; Brynja Örlygsdóttir, MSN, RN, Doctoral student; Josette F. Jones, PhD, RN, BC, Assistant Professor; Connie Delaney, PhD, RN, FAAN
Data preparation is a crucial component of the knowledge discovery in databases (KDD) process. The nature and process of data preparation vary according to the type of data and the project goals, and data collected for clinical purposes differ dramatically from data collected for research. The purpose of this project was to compare and contrast data preparation for a research database with data preparation for a clinical database. The comparison was guided by Fayyad, Piatetsky-Shapiro, and Smyth's conceptualization of data preparation within the KDD process.

Two databases were compared. The 2000 U.S. National Health Interview Survey (NHIS), data purposefully collected by the National Institutes of Health for estimating health indicators, served as the exemplar research database. The exemplar clinical database contained information from an Icelandic clinical information system. The three conceptual data preparation steps were actualized as a data assay consisting of data characterization, data exploration, and data set assembly, and a data assay was conducted for each database. Additionally, preliminary feature selection was performed for one database. The software programs WizRule and Weka were applied as analytic aids. Machine learning algorithms from Weka, including BestFirst and GeneticSearch, were applied for feature selection.

The research database (NHIS) was consistently structured, with no missing values and no apparent data pollution. Although purpose-built for statistical analysis and relatively amenable to the KDD process, it required substantial preprocessing before data mining algorithms could be applied. Success of the preprocessing was evidenced by convergence of preliminary machine learning algorithms. Preparation of the clinical database was hampered by data access issues, and the data assay was limited. The nature of data preprocessing differed dramatically between the clinical and research databases.
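
The abstract names data characterization as the first part of the data assay but does not say what was computed. As a minimal sketch only, assuming characterization means counting missing and distinct values per field, the idea could look like the following in Python. The function name `characterize`, the `MISSING` markers, and the sample records are all hypothetical and are not taken from the study or from either database.

```python
from collections import Counter

# Assumed missing-value markers; a real project would take these from the codebook.
MISSING = {"", "NA", "N/A", "null", None}


def characterize(rows):
    """Summarize each field across a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        # A field absent from a record is treated as missing (row.get returns None).
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in MISSING]
        counts = Counter(present)
        report[col] = {
            "n_missing": len(values) - len(present),
            "n_distinct": len(counts),
            "most_common": counts.most_common(1)[0] if counts else None,
        }
    return report


# Hypothetical records, invented for illustration only.
records = [
    {"age": "34", "sex": "F", "smoker": "no"},
    {"age": "", "sex": "M", "smoker": "no"},
    {"age": "51", "sex": "F"},
]

for col, stats in characterize(records).items():
    print(col, stats)
```

A per-field summary of this kind is one way to surface the properties the abstract reports for the research database, such as whether any values are missing, before data exploration and data set assembly begin.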
Repository Posting Date:
26-Oct-2011
Date of Publication:
17-Oct-2011
Sponsors:
Midwest Nursing Research Society

Full metadata record

DC Field | Value | Language
dc.type | Presentation | en_GB
dc.title | Data Preparation for Knowledge Discovery: A Comparison of Research and Clinical Databases | en_GB
dc.identifier.uri | http://hdl.handle.net/10755/159263 | -
dc.description.abstract | [Same conference details and abstract text as given above] | en_GB
dc.date.available | 2011-10-26T21:51:18Z | -
dc.date.issued | 2011-10-17 | en_GB
dc.date.accessioned | 2011-10-26T21:51:18Z | -
dc.description.sponsorship | Midwest Nursing Research Society | en_GB
All Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.