Hdl Handle:
http://hdl.handle.net/10755/159518
Type:
Presentation
Title:
Data Mining the NHANES III: Evolutionary Computation for Feature Selection
Abstract:
Conference Sponsor:Midwest Nursing Research Society
Conference Year:2002
Author:Cullen, Phyllis
P.I. Institution Name:University of Iowa
Title:Assistant Professor
Contact Address:College of Nursing, 482 Nursing Building, Iowa City, IA, 52242, USA
Contact Telephone:319.353.3019
Feature selection is the step in data mining in which clusters of potentially important variables are identified in a dataset. Data mining uses a variety of techniques to detect patterns related to health outcomes that are not easily detected with traditional statistical methods. This methodological study in nursing informatics tested the applicability of an evolutionary computation method, the genetic algorithm, to high-dimensional, diverse healthcare data, using the classification of activity status as an example.
Purpose: To examine feature selection methods for an intelligent systems classifier (ISC) with diverse healthcare data.
Aims: 1) Apply an ISC to a secondary analysis of the NHANES III; 2) Evaluate the performance of feature selection methods for an ISC.
Theoretical framework: Philosophical realism.
Sample: The 20,050 adult cases in the NHANES III national health survey dataset.
Method: Data mining of the NHANES III using a genetic algorithm for feature selection and a neural network with a back-propagation variant for the classification task.
Analysis: Description of the selected feature sets and evaluation of the reduction achieved, descriptive statistics, a standard measure of error with a confusion matrix, and significance testing for differences in classifier performance on independent test sets.
Findings: Evolutionary computation, specifically a genetic algorithm, reduced the initial cluster of 225 features to a cluster of 11 features. Classification performance with the 11 features was better than chance, that is, above the random classification line of the ROC curve (p < .001, SE = .008, two-tailed, non-parametric, n = 5048). The genetic algorithm produced a set of diverse features including sensory, laboratory, exercise, and nutritional elements. No single group of features, such as laboratory, psychosocial, or physical examination data, was selected exclusively.
Therefore, a genetic algorithm is an effective tool for feature selection in nursing, a profession that routinely employs high-dimensional, diverse data in research and practice.
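The abstract does not report the genetic operators, fitness function, or parameter settings that were used, so the following is only a minimal sketch of how genetic-algorithm feature selection can work in general. Everything in it is an assumption made for illustration: the data are synthetic (not NHANES III), a simple nearest-centroid classifier stands in for the back-propagation neural network, and the names `make_data`, `centroid_accuracy`, `fitness`, and `select_features`, along with all parameter values, are hypothetical.

```python
import random

def make_data(rng, n_rows=200, n_features=12, informative=(0, 1, 2)):
    """Synthetic binary-outcome data; only the `informative` columns carry signal."""
    rows, labels = [], []
    for _ in range(n_rows):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            x[j] += 2.0 * y  # shift informative features for the positive class
        rows.append(x)
        labels.append(y)
    return rows, labels

def centroid_accuracy(train, test, mask):
    """Accuracy of a nearest-centroid classifier restricted to the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    (xtr, ytr), (xte, yte) = train, test
    centroids = {}
    for cls in (0, 1):
        members = [x for x, y in zip(xtr, ytr) if y == cls]
        centroids[cls] = [sum(x[j] for x in members) / len(members) for j in cols]
    correct = 0
    for x, y in zip(xte, yte):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += (min(dist, key=dist.get) == y)
    return correct / len(yte)

def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy minus a small cost per selected feature (assumed form)."""
    return centroid_accuracy(train, valid, mask) - penalty * sum(mask)

def select_features(train, valid, n_features, rng,
                    pop_size=30, generations=25, p_mut=0.05):
    """Generational GA over bit masks: tournament selection, one-point
    crossover, bit-flip mutation, and elitism (the best mask always survives)."""
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # avoid re-evaluating identical masks
            cache[key] = fitness(mask, train, valid)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best

rng = random.Random(0)  # fixed seed so the run is repeatable
rows, labels = make_data(rng)
train = (rows[:120], labels[:120])
valid = (rows[120:160], labels[120:160])
test = (rows[160:], labels[160:])  # held out, never seen by the GA

best_mask = select_features(train, valid, n_features=12, rng=rng)
selected = [j for j, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected)
print("held-out accuracy:", centroid_accuracy(train, test, best_mask))
```

Each candidate solution is a bit mask with one bit per feature, which is what lets the search reduce a large initial feature set to a small one. Scoring masks on a validation split and reporting accuracy on a separate held-out split mirrors the abstract's use of independent test sets.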
Repository Posting Date:
26-Oct-2011
Date of Publication:
17-Oct-2011
Sponsors:
Midwest Nursing Research Society

Full metadata record

DC Field | Value | Language
dc.type | Presentation | en_GB
dc.title | Data Mining the NHANES III: Evolutionary Computation for Feature Selection | en_GB
dc.identifier.uri | http://hdl.handle.net/10755/159518 | -
dc.description.abstract | Same text as the Abstract field above, with one additional row, Email: phyllis-cullen@uiowa.edu | en_GB
dc.date.available | 2011-10-26T22:05:22Z | -
dc.date.issued | 2011-10-17 | en_GB
dc.date.accessioned | 2011-10-26T22:05:22Z | -
dc.description.sponsorship | Midwest Nursing Research Society | en_GB
All Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.