An Application of Machine Learning for the Identification of Adolescent Smoking Risk Factors

Hdl Handle:
http://hdl.handle.net/10755/621851
Category:
Full-text
Format:
Text-based Document
Type:
Poster
Level of Evidence:
N/A
Research Approach:
N/A
Title:
An Application of Machine Learning for the Identification of Adolescent Smoking Risk Factors
Author(s):
Chung, Sophia; Li, Youngji
Lead Author STTI Affiliation:
Gamma
Author Details:
Sophia J. Chung, PhD, MSN, RN. Professional Experience: 2015-present -- Assistant Professor, Department of Nursing, University of Ulsan, South Korea. Author or coauthor of three publications primarily related to child health care and three additional publications on various topics, including health promotion and health literacy. About ten presentations at conferences or meetings. Author Summary: Sophia Chung is an assistant professor at the University of Ulsan. Her research focuses on adolescent health promotion, including physical activity, diet, and smoking. In this presentation, she explains the risk factors related to adolescents' smoking behaviors using a machine-learning approach.
Abstract:
Purpose: Smoking is a modifiable risk behavior that causes various health problems, including cancer and respiratory disease. Moreover, the literature shows that adolescent smoking behaviors are likely to persist into adulthood, and this holds in countries worldwide. In South Korea, despite many efforts to reduce smoking among adolescents, this modifiable risk behavior remains a significant social problem. An effective intervention to change adolescents' smoking behavior must understand and address the factors that underlie and influence it. These factors can be surfaced from data using an appropriate approach. Machine learning is well suited to revealing patterns of information in large, complex datasets that are useful in predicting outcomes (Chekround, 2016). For example, machine learning has been used to predict readmission of inpatients (Mortazavi, 2016; Frizzell, 2016). However, this approach had not yet been applied to an adolescent risk behavior such as smoking. Therefore, the goal of this study was to identify the predictors of adolescents' smoking behaviors in South Korea using a machine-learning approach.

Methods: The 2015 Korean Youth Risk Behavior Web-based Survey (KYRBS) was the data source for this study. The KYRBS is an annual, nationwide survey conducted in South Korea to examine health behaviors, including cigarette smoking, individual hygiene, and alcohol consumption. Data for the 2015 KYRBS were collected via self-report questionnaires completed by 68,043 students in grades 7 through 12 at 800 randomly selected schools in South Korea. For this study, we used the 5,123 surveys in which the items concerning smoking were completed. This study followed the machine-learning pipeline developed by Fayyad (1996) and Yoon (2015). To reduce the "curse of dimensionality," in which a high number of interrelated variables in a large dataset interferes with the accuracy of the machine-learning model, we selected clinically meaningful features based on the conceptual framework for adolescent risk behaviors (Jessor, 1991). Then, we applied three machine-learning algorithms available in Weka (i.e., J48, Naïve Bayes, and Logistic Regression) to build a predictive model for the smoking behavior of the adolescents represented in the KYRBS dataset. The final model was selected based not only on the accuracy of the predictive model but also on the F-measure, which is calculated from precision and recall.
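
The selection criterion described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' Weka workflow: the confusion-matrix counts below are hypothetical, the function names are made up for illustration, and the rule of ranking by accuracy first and F-measure second is an assumption, since the abstract does not say how the two metrics were combined.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) for the positive class (smokers)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def accuracy(tp, fp, fn, tn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def pick_model(confusions):
    """Pick the model with the best (accuracy, F-measure) pair.

    Assumption: accuracy is compared first and F-measure breaks ties.
    """
    def score(name):
        c = confusions[name]
        _, _, f = precision_recall_f(c["tp"], c["fp"], c["fn"])
        return (accuracy(**c), f)
    return max(confusions, key=score)


# Hypothetical counts for illustration only; not results from the study.
hypothetical = {
    "J48": {"tp": 60, "fp": 25, "fn": 40, "tn": 875},
    "NaiveBayes": {"tp": 70, "fp": 60, "fn": 30, "tn": 840},
    "Logistic": {"tp": 65, "fp": 20, "fn": 35, "tn": 880},
}

for name, c in hypothetical.items():
    p, r, f = precision_recall_f(c["tp"], c["fp"], c["fn"])
    print(f"{name}: accuracy={accuracy(**c):.3f} precision={p:.3f} "
          f"recall={r:.3f} F={f:.3f}")
print("selected:", pick_model(hypothetical))
```

Reporting the F-measure alongside accuracy matters when smokers are a minority of respondents, because a model can reach high accuracy simply by labeling most students as non-smokers.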

Results: Through the feature selection process, we classified 40 features into three predictive categories. Among the three machine-learning algorithms we applied, the Logistic Regression algorithm demonstrated the highest level of accuracy (i.e., 84.0% of adolescent smokers were correctly classified; F-measure = 0.795). In this model, grade (-0.06) and alcohol consumption (-0.56) were the top two features with the highest coefficients. In other words, being a middle school student and never having drunk alcohol were highly associated with smoking behavior.
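
For readers less familiar with logistic regression, the reported coefficients can be read as odds ratios by exponentiating them. The short Python sketch below shows only that arithmetic. The direction of each effect depends on how the variables and the outcome class were coded, which the abstract does not state, so no interpretation beyond the conversion itself is implied; the function name is hypothetical.

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(coefficient)


# Coefficients as reported in the Results; variable coding is not given there.
reported = {"grade": -0.06, "alcohol consumption": -0.56}

for feature, beta in reported.items():
    # An odds ratio below 1 means the odds of the modeled class fall as the
    # feature's coded value rises by one unit.
    print(f"{feature}: coefficient={beta:+.2f}, odds ratio={odds_ratio(beta):.3f}")
```

A coefficient near zero, such as the one for grade, corresponds to an odds ratio close to 1, that is, a small change in odds per coded unit.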

Conclusion: Our study demonstrates that a machine-learning approach is effective in identifying behavioral predictors from a large, complex dataset, in this case the behavioral predictors associated with smoking in the KYRBS. However, our results were inconsistent with those reported in the literature. Previous studies showed that increasing grade and previous alcohol consumption were associated with adolescents' smoking behaviors (Mendol, 2013; Talip, 2015). Further study of the association between smoking behaviors and alcohol consumption among Korean adolescents is needed. Although this study had some limitations (e.g., the KYRBS data are cross-sectional), our machine-learning approach shows promise, and subsequent research using longitudinal data can take into account trends in these associations when creating a predictive model.

Keywords:
adolescents; cigarette smoking; machine learning
Repository Posting Date:
14-Jul-2017
Date of Publication:
14-Jul-2017
Other Identifiers:
INRC17PST
Conference Date:
2017
Conference Name:
28th International Nursing Research Congress
Conference Host:
Sigma Theta Tau International
Conference Location:
Dublin, Ireland
Description:
Event Theme: Influencing Global Health Through the Advancement of Nursing Scholarship

All Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.