Year: 2016 | Volume: 34 | Issue: 1 | Page: 71-75
Comparison of linear and zero-inflated negative binomial regression models for appraisal of risk factors associated with dental caries
Manu Batra1, Aasim Farooq Shah2, Prashant Rajput3, Ishrat Aasim Shah4
1 Department of Public Health Dentistry, Surendera Dental College and Research Institute, Sri Ganganagar, Rajasthan, India
2 Department of Public Health Dentistry, Government Dental College and Hospital, Srinagar, India
3 Department of Public Health Dentistry, Kothiwal Dental College and Research Centre, Moradabad, Uttar Pradesh, India
4 Department of Health, School Health Program, J and K Government Health Services, Srinagar, Jammu and Kashmir, India
Date of Web Publication: 2-Feb-2016
Correspondence Address: Aasim Farooq Shah, Department of Public Health Dentistry, Government Dental College and Hospital, Shreen Bagh, Srinagar - 190 010, Jammu and Kashmir, India
Source of Support: None, Conflict of Interest: None
Abstract
Context: Dental caries among children has been described as a pandemic disease with a multifactorial nature. Various sociodemographic factors and oral hygiene practices are commonly tested for their influence on dental caries. In recent years, a statistical model that allows for covariate adjustment, commonly referred to as the zero-inflated negative binomial (ZINB) model, has been developed. Aim: To compare the fit of two models, the conventional linear regression (LR) model and the ZINB model, in assessing the risk factors associated with dental caries. Materials and Methods: A cross-sectional survey was conducted on 1138 12-year-old schoolchildren in Moradabad Town, Uttar Pradesh, between February and August 2014. Selected participants were interviewed using a questionnaire. Dental caries was assessed by recording the decayed, missing, or filled teeth (DMFT) index. Statistical Analysis Used: To assess the risk factors associated with dental caries in children, two approaches were applied: the LR model and the ZINB model. Results: The prevalence of caries-free subjects was 24.1%, and the mean DMFT was 3.4 ± 1.8. In the LR model, all variables were statistically significant. In the ZINB model, the negative binomial part showed place of residence, father's education level, tooth brushing frequency, and dental visits to be statistically significant, implying that the likelihood of being caries-free (DMFT = 0) is higher among children who live in urban areas, whose fathers had a university education, who brush twice a day, and who have ever visited a dentist. Conclusion: The current study reports that the LR model is poorly fitted and may lead to spurious conclusions, whereas the ZINB model showed better goodness of fit (Akaike information criterion values: LR, 3.94; ZINB, 2.39) and can be preferred when high variance and an excess of zeroes are present.
Keywords: Dental caries, linear regression, oral hygiene, zero-inflated negative binomial model
How to cite this article:
Batra M, Shah AF, Rajput P, Shah IA. Comparison of linear and zero-inflated negative binomial regression models for appraisal of risk factors associated with dental caries. J Indian Soc Pedod Prev Dent 2016;34:71-5
How to cite this URL:
Batra M, Shah AF, Rajput P, Shah IA. Comparison of linear and zero-inflated negative binomial regression models for appraisal of risk factors associated with dental caries. J Indian Soc Pedod Prev Dent [serial online] 2016 [cited 2019 Nov 20];34:71-5. Available from: http://www.jisppd.com/text.asp?2016/34/1/71/175521
Introduction
Oral health is an integral component of general health and is essential for well-being, and the two influence each other. Among oral diseases, dental caries is one of the most common and widespread. It is a cumulative, progressive disease that causes pain, infection, and possible disfigurement, particularly in children.
Because of its high global prevalence, dental caries among children has been described as a pandemic disease. The World Health Organization (WHO) has ranked it third among all chronic noncommunicable diseases requiring worldwide attention for prevention and treatment. Nearly 20% of 2-4-year-old children have clinically detectable caries, and by age 17, nearly 80% of young people have one or more carious teeth. Numerous studies have demonstrated the multifactorial nature of dental caries, and various sociodemographic factors and oral hygiene practices of the population are commonly tested for their influence on the disease.
The index most extensively used in epidemiological studies of dental caries prevalence and experience is the decayed, missing, or filled teeth (DMFT) index, together with its surface-based form (DMFS, or dmfs for primary teeth). It is a simple sum of the number of decayed, missing, and filled teeth (or surfaces) and represents the cumulative severity of caries experience in an individual or a population. DMF data typically contain a high proportion of zeroes, more than standard probability distributions predict. These zeroes are genuine outcome values, and it is important to account for them explicitly in the analysis. Because count outcomes do not meet the normality assumption required by many standard statistical tests, analysts have relied either on transformations to induce normality, which often fail, or on categorizing the outcome, which may discard information. Such approaches also often underestimate the observed dispersion, which can bias the results.
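As a minimal illustration of the index arithmetic and the excess-zero problem, the Python sketch below sums hypothetical decayed, missing, and filled counts into DMFT scores and compares the observed share of zeroes with the share a Poisson distribution of the same mean would predict. The per-child counts are invented for illustration, and the Poisson distribution is used only as an assumed reference model.

```python
import math

# Hypothetical (decayed, missing, filled) counts per child; not study data.
children = [(0, 0, 0), (2, 0, 1), (0, 0, 0), (3, 1, 0),
            (0, 0, 0), (5, 0, 2), (1, 0, 0), (0, 0, 0)]


def dmft(decayed: int, missing: int, filled: int) -> int:
    """DMFT is the simple sum of decayed, missing (due to caries), and filled teeth."""
    return decayed + missing + filled


scores = [dmft(*child) for child in children]
mean_score = sum(scores) / len(scores)
observed_zero_share = scores.count(0) / len(scores)
# Under a Poisson model with the same mean, P(Y = 0) = exp(-mean).
poisson_zero_share = math.exp(-mean_score)

print("DMFT scores:", scores)
print(f"observed zeroes: {observed_zero_share:.3f}, "
      f"Poisson-expected zeroes: {poisson_zero_share:.3f}")
```

With these made-up counts, half of the children score zero, whereas a Poisson model with the same mean (1.875) would predict only about 15% zeroes.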
Statistical models for count data with an excess of zeroes have received growing attention in recent years, especially in the econometric literature. In dental epidemiology, one solution is to assume that two latent (unobserved) groups contribute to the zeroes: children who come from a zero state (they have no decayed teeth because of their personal characteristics), and children who are susceptible to caries but record zero by chance or through misclassification. A more recently developed model, commonly referred to as the zero-inflated negative binomial (ZINB) model, recognizes these two groups and allows covariate adjustment in each. The ZINB model therefore addresses both overdispersion and zero inflation.
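For reference, one common way to write the ZINB model mixes a point mass at zero with a negative binomial count distribution. The form below assumes the NB2 parameterization with dispersion parameter $r$; the paper does not state which parameterization its software used. Here $\pi_i$ is the probability that child $i$ belongs to the structural-zero group and $\mu_i$ is the negative binomial mean for the susceptible group, each with its own covariates $\mathbf{z}_i$ and $\mathbf{x}_i$:

```latex
P(Y_i = y) =
\begin{cases}
\pi_i + (1 - \pi_i)\left(\dfrac{r}{r + \mu_i}\right)^{r}, & y = 0,\\[8pt]
(1 - \pi_i)\,\dfrac{\Gamma(y + r)}{\Gamma(r)\, y!}
  \left(\dfrac{r}{r + \mu_i}\right)^{r}
  \left(\dfrac{\mu_i}{r + \mu_i}\right)^{y}, & y = 1, 2, \dots
\end{cases}

\operatorname{logit}(\pi_i) = \mathbf{z}_i^{\top}\boldsymbol{\gamma}, \qquad
\log(\mu_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta}

E(Y_i) = (1 - \pi_i)\,\mu_i, \qquad
\operatorname{Var}(Y_i) = (1 - \pi_i)\,\mu_i\left(1 + \pi_i \mu_i + \frac{\mu_i}{r}\right)
```

Because the variance-to-mean ratio is $1 + \pi_i\mu_i + \mu_i/r$, it exceeds 1 whenever $\pi_i > 0$ or $r$ is finite, which is how the model captures both zero inflation and overdispersion; setting $\pi_i = 0$ recovers the ordinary negative binomial model.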
The aim of this study was to compare the fit of two models, the conventional linear regression (LR) model and the ZINB model, in assessing the risk factors associated with dental caries among 12-year-old schoolchildren in Moradabad, Uttar Pradesh.
Materials and Methods
A cross-sectional survey was conducted on 1138 schoolchildren aged 12-13 years in Moradabad Town, Uttar Pradesh, between February and August 2014. According to the WHO (2013), the 12-year age group is important because it is the age at which children leave primary school, and therefore the last age at which data can easily be obtained through a reliable sample of the school system. In addition, by this age all permanent teeth except the third molars are likely to have erupted. For these reasons, age 12 was chosen as the global monitoring age for caries, allowing international comparisons and monitoring of disease trends.
Ethical clearance was obtained from the Ethical Committee of Kothiwal Dental College and Research Centre, Moradabad, and permission was obtained from the authorities of the selected schools. A multistage stratified cluster sampling technique was used to achieve a representative sample. In the first stage, the city was divided into four zones: North, South, East, and West. In the second stage, four government schools were selected randomly from each zone with probability proportional to enrollment (PPE) size, so that schools with a larger number of regularly attending students were more likely to be selected than schools with fewer. From the selected schools, clusters of 12-13-year-old students were included in the study.
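Probability-proportional-to-size selection can be carried out in several ways, and the paper does not say which procedure was used. The sketch below assumes systematic PPS sampling, one standard method, and applies it within each of the four zones using invented enrollment figures and hypothetical school labels such as `North-1`.

```python
import random


def pps_systematic(sizes: dict[str, int], k: int, rng: random.Random) -> list[str]:
    """Systematic PPS: lay units end to end by size, then take k equally spaced points.

    A unit larger than the sampling interval could be hit twice; the example
    data below are chosen so that this does not happen.
    """
    total = sum(sizes.values())
    interval = total / k
    start = rng.uniform(0, interval)
    points = [start + i * interval for i in range(k)]

    selected, cumulative, next_point = [], 0, 0
    for school, enrollment in sizes.items():
        cumulative += enrollment
        # Every selection point falling inside this school's range selects it.
        while next_point < k and points[next_point] < cumulative:
            selected.append(school)
            next_point += 1
    return selected


# Hypothetical enrollments of regularly attending students (not study data).
zones = {
    "North": [300, 250, 280, 200, 310, 260],
    "South": [150, 220, 180, 260, 240, 190],
    "East": [400, 350, 300, 380, 320, 290],
    "West": [210, 230, 170, 250, 200, 180],
}

rng = random.Random(2014)  # fixed seed so the draw is reproducible
sample = {}
for zone, enrollments in zones.items():
    schools = {f"{zone}-{i + 1}": n for i, n in enumerate(enrollments)}
    sample[zone] = pps_systematic(schools, k=4, rng=rng)

for zone, chosen in sample.items():
    print(zone, chosen)
```

Larger schools occupy a longer stretch of the cumulative enrollment line and so are more likely to contain a selection point, which is the property the PPE design relies on.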
Selected participants were interviewed using a questionnaire covering sociodemographic background and oral hygiene practices. Sociodemographic data comprised gender, birth order, family size, place of residence, and parental education level. Oral hygiene questions covered the number of dental visits and the frequency and timing of tooth brushing.
Dental caries status was assessed using the DMFT index, which expresses numerically the caries burden in the permanent teeth as the number of teeth that are decayed (D), missing due to caries (M), or filled (F). It shows how much of the permanent dentition has been affected by caries up to the day of examination. The examination was conducted by three examiners. To assess inter-examiner reliability, the examiners examined 20 students who were not included in the study. The intraclass correlation of the DMFT scores across the three examiners was 0.89, indicating high agreement. Descriptive statistics were reported for caries-free (DMFT = 0) students (dependent variable) and for the other categorical variables, which comprised sociodemographic characteristics and oral hygiene practices.
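The paper does not state which form of the intraclass correlation was computed. As an assumption, the sketch below implements ICC(2,1) (two-way random effects, absolute agreement, single rater) from ANOVA mean squares, a common choice when several examiners score the same children. It is checked against the small 6-subject, 4-rater illustration from Shrout and Fleiss (1979), for which ICC(2,1) is about 0.29; these are not study data.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to subject i by rater j.
    """
    n = len(ratings)        # subjects (children)
    k = len(ratings[0])     # raters (examiners)
    grand_mean = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative data (6 subjects x 4 raters) from Shrout and Fleiss (1979).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(example):.2f}")
```

In the study's setting, each row would hold one child's DMFT scores from the three examiners (20 rows by 3 columns).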
To assess the risk factors associated with dental caries in children, two approaches were applied: the LR model and the ZINB model. The zero part of the ZINB model handles the zero inflation, and the negative binomial part accounts for the correlated nature of the DMFT counts. The Akaike information criterion (AIC) was used to compare the two models. All regression models were fitted using the LR and ZINB modules of Stata 9 statistical software (StataCorp, TX, USA).
Results
Descriptive statistics for the sociodemographic characteristics and oral hygiene habits of the 1138 children are shown in [Table 1]. The majority of children (67.4%) resided in urban areas, and only 24.1% were the first child of their parents. The level of education was higher for fathers (30.8%) than for mothers (16.3%). Overall, 53.1% of students brushed their teeth in the morning, whereas only 12.9% brushed twice a day, i.e., in the morning and at night, and 68.5% had never visited a dentist. The prevalence of caries-free subjects was 24.1% for the whole sample, and the mean DMFT among students with any caries was 3.4 ± 1.8.
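The two headline figures, caries-free prevalence over all children and mean ± SD DMFT among children with any caries, can be computed as in the following sketch on invented scores. For scale, 24.1% of 1138 children corresponds to about 274 caries-free children (0.241 × 1138 ≈ 274.3).

```python
import statistics


def caries_summary(dmft_scores: list[int]) -> dict[str, float]:
    """Caries-free prevalence (all children) and mean/SD DMFT among affected children."""
    affected = [s for s in dmft_scores if s > 0]
    return {
        "caries_free_pct": 100 * dmft_scores.count(0) / len(dmft_scores),
        "mean_dmft_affected": statistics.mean(affected),
        "sd_dmft_affected": statistics.stdev(affected),  # sample SD (n - 1 denominator)
    }


# Hypothetical DMFT scores for ten children (not study data).
summary = caries_summary([0, 0, 3, 2, 5, 4, 0, 1, 6, 3])
print(summary)
```

Restricting the mean to affected children, as the Results paragraph does, gives a different figure from the mean over the whole sample, so the denominator should always be stated.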
Table 1: Sociodemographic, oral hygiene practices, and dental caries status of children
The results of fitting the linear and ZINB regression models to the DMFT index with the caries risk predictors are shown in [Table 2]. In the LR model, all variables were statistically significant. In the ZINB model, the negative binomial part showed place of residence, father's education level, tooth brushing frequency, and dental visits to be statistically significant, implying that the likelihood of being caries-free (DMFT = 0) is higher among children living in urban areas, whose fathers had a university education, who brushed twice a day, and who had ever visited a dentist. After accounting for all the zeroes in the dataset, the zero-inflated part identified father's education and tooth brushing frequency as statistically significant.
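Because the ZINB model has two linear predictors, a child's predicted probability of DMFT = 0 combines both parts: the logit (zero-inflation) part gives the probability π of being a structural zero, and the negative binomial part adds the chance of a sampling zero among susceptible children. The sketch below uses made-up coefficients for a single hypothetical predictor (brushing twice daily); they are not the estimates in [Table 2], and the NB2 parameterization with dispersion r is assumed.

```python
import math


def zinb_prob_zero(logit_pi: float, log_mu: float, r: float) -> float:
    """P(Y = 0) under a ZINB model (NB2 form with dispersion parameter r)."""
    pi = 1 / (1 + math.exp(-logit_pi))   # probability of a structural zero
    mu = math.exp(log_mu)                # negative binomial mean
    nb_zero = (r / (r + mu)) ** r        # sampling zero among susceptible children
    return pi + (1 - pi) * nb_zero


# Hypothetical coefficients (not study estimates).
GAMMA_0, GAMMA_BRUSH = -1.5, 0.8   # zero part: log-odds of a structural zero
BETA_0, BETA_BRUSH = 1.4, -0.3     # count part: log of expected DMFT
R = 1.5                            # dispersion parameter

for brushes_twice in (0, 1):
    p0 = zinb_prob_zero(GAMMA_0 + GAMMA_BRUSH * brushes_twice,
                        BETA_0 + BETA_BRUSH * brushes_twice, R)
    print(f"brushes twice daily = {brushes_twice}: P(DMFT = 0) = {p0:.3f}")
```

With these invented numbers, brushing twice daily raises the predicted probability of being caries-free from about 0.30 to about 0.46, through both a higher structural-zero probability and a lower expected count. On the usual scales, exp(0.8) ≈ 2.23 would be read as an odds ratio in the zero part and exp(-0.3) ≈ 0.74 as a rate ratio in the count part.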
Table 2: Comparison between linear regression model and zero-inflated negative binomial model for caries-free (DMFT = 0) as dependent variable
To compare goodness of fit, the AIC values of the regression models were assessed (LR: 3.94; ZINB: 2.39). Because it has the lower AIC value, the ZINB model fits the data better.
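The comparison rests on AIC = 2k - 2 ln L, where k is the number of estimated parameters and L the maximized likelihood; lower values indicate a better trade-off between fit and complexity. The reported values (3.94 and 2.39) are small for a sample of 1138, which suggests they may be per-observation values (AIC divided by N), a form some Stata post-estimation tools report; this is an assumption. The sketch computes both forms from hypothetical log-likelihoods and parameter counts, not from the study's models.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def aic_per_observation(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """AIC scaled by sample size, a form some software reports."""
    return aic(log_likelihood, n_params) / n_obs


N_OBS = 1138
# Hypothetical maximized log-likelihoods and parameter counts (not study values).
models = {
    "LR": {"loglik": -2500.0, "k": 10},
    "ZINB": {"loglik": -1500.0, "k": 20},
}

aic_values = {name: aic_per_observation(m["loglik"], m["k"], N_OBS)
              for name, m in models.items()}
best = min(aic_values, key=aic_values.get)
for name, value in aic_values.items():
    print(f"{name}: AIC/N = {value:.2f}")
print("preferred model:", best)
```

Only differences in AIC between models fitted to the same data are meaningful; the absolute values carry no interpretation on their own.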
Discussion
Dental caries, one of the most common diseases of childhood, can be associated with various risk factors, among which sociodemographic factors and oral hygiene practices are considered key. To reduce the burden of dental caries among children, the risk factors associated with high caries levels need to be identified, and appropriate statistical modeling plays an important role in understanding them.
Because DMFT data tend to contain an excess of zeroes, they do not fit some standard distributions well and are referred to as zero-inflated. The extra zeroes are caused by a real effect on the caries distribution of interest, and zero inflation is a special case of overdispersion. It hampers sound statistical inference by violating the basic assumptions underlying the standard distributions and by misrepresenting the variance-mean relationship of the error.
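A quick diagnostic for overdispersion is the variance-to-mean ratio (index of dispersion), which is about 1 for Poisson-distributed counts and exceeds 1 when the data are overdispersed. A minimal sketch on invented DMFT-like counts:

```python
import statistics


def dispersion_index(counts: list[int]) -> float:
    """Sample variance divided by the mean; about 1 for Poisson counts."""
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical DMFT counts with many zeroes (not study data).
counts = [0, 0, 0, 0, 0, 1, 2, 3, 5, 8, 0, 0, 4, 0, 7]
print(f"mean = {statistics.mean(counts)}, "
      f"variance = {statistics.variance(counts):.2f}, "
      f"index = {dispersion_index(counts):.2f}")
```

Here the index is well above 1 (about 3.9), consistent with the overdispersion that zero-heavy caries data typically show.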
Therefore, choosing the best model to examine the risk factors associated with a disease is a trade-off between simplicity and accuracy. This is particularly true for childhood caries, since a large proportion of children are caries-free (zero counts) according to the DMFT index, while a small number of children typically account for a disproportionate amount of caries. This study compared the fit of two models, the conventional LR model and the ZINB model, in assessing the risk factors associated with dental caries.
In the current study, the LR model resulted in a poor fit. This may lead to spurious conclusions, such as concluding that a factor is important in predicting caries when in fact it may not be. It has been stated that although the LR model is still recommended for analyzing count data, it almost never fits very well because of overdispersion, and that it should be used for modeling independent counts. However, the DMFT is a count built from dependent counts, since all the teeth counted are observed in the same mouth. Consequently, when the LR model is used to predict the dependence of dental caries on childhood sociodemographic factors and oral hygiene practices, there is a risk of biased prediction.
In the present cross-sectional survey, DMFT values varied more between large sections of the sample than within them, and there was a very large number of zero counts; the observed frequency of zeroes was larger than that predicted by the LR model. Consequently, a zero-inflated model was used to re-evaluate the LR results.
All factors included in the LR model emerged as significant predictors of DMFT = 0 [Table 2], whereas in the ZINB model, the negative binomial part showed place of residence, father's education level, tooth brushing frequency, and dental visits to be statistically significant, implying that the likelihood of being caries-free (DMFT = 0) is higher among children living in urban areas, whose fathers had a university education, who brushed twice a day, and who had ever visited a dentist. After accounting for all the zeroes in the dataset, the zero-inflated part identified only father's education and tooth brushing frequency as statistically significant predictors. The significant factors identified by the ZINB model have also been reported in other studies as associated with being caries-free (DMFT = 0).
In the current setting, where there was an excess of zeroes, the ZINB model showed better goodness of fit (AIC values: LR, 3.94; ZINB, 2.39). Since DMFT count data frequently exhibit overdispersion in addition to possible zero inflation, a natural approach is to use a model that accommodates both. The ZINB model can therefore be preferred when high variance and an excess of zeroes are present. Moreover, the ZINB approach has been reported to fit best not only with cross-sectional caries data but also with longitudinal data.
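To see why a model handling both features is attractive, the sketch below simulates counts from a ZINB process with made-up parameters (π = 0.2, μ = 4, r = 2), drawing negative binomial counts as a gamma-Poisson mixture using only the Python standard library. In theory this process has mean (1 - π)μ = 3.2, P(Y = 0) = 0.2 + 0.8(2/6)² ≈ 0.289, and variance 3.2 × (1 + 0.8 + 2) ≈ 12.2, whereas a Poisson distribution with mean 3.2 would give only e^(-3.2) ≈ 0.041 zeroes.

```python
import math
import random
import statistics


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Inverse-transform Poisson draw; adequate for the moderate rates used here."""
    k, p = 0, math.exp(-lam)
    cumulative, u = p, rng.random()
    while u > cumulative and p > 0.0:
        k += 1
        p *= lam / k
        cumulative += p
    return k


def sample_zinb(pi: float, mu: float, r: float, rng: random.Random) -> int:
    """One ZINB draw: structural zero with probability pi, else negative binomial."""
    if rng.random() < pi:
        return 0
    # Gamma mixing (shape r, scale mu / r) of a Poisson rate gives NB(mean mu, dispersion r).
    lam = rng.gammavariate(r, mu / r)
    return sample_poisson(lam, rng)


rng = random.Random(7)
draws = [sample_zinb(0.2, 4.0, 2.0, rng) for _ in range(20_000)]

mean = statistics.mean(draws)
variance = statistics.variance(draws)
zero_share = draws.count(0) / len(draws)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, "
      f"zero share = {zero_share:.3f}, Poisson zero share = {math.exp(-mean):.3f}")
```

The simulated data show both signatures at once: far more zeroes than a Poisson model predicts, and a variance several times the mean.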
Limitations of the present study include its cross-sectional design, which limits the ability to identify causal factors; longitudinal designs would add to knowledge of the determinants of dental caries. In addition, no information was available on the children's nutritional factors. A strength of this study was the comparison of the conventional LR and ZINB models to address the high proportion of zero DMFT scores.
Conclusion
In conclusion, caries data should be analyzed with models that provide an appropriate fit and a meaningful interpretation. Attention should be paid to the functional form of the outcome to ensure that the underlying assumptions of the methods used are met. In the current study, the LR model was poorly fitted and may lead to spurious conclusions, whereas the ZINB model showed better goodness of fit and can be preferred when high variance and an excess of zeroes are present.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
Selwitz RH, Ismail AI, Pitts NB. Dental caries. Lancet 2007;369:51-9.
Petersen PE, Bourgeois D, Ogawa H, Estupinan-Day S, Ndiaye C. The global burden of oral diseases and risks to oral health. Bull World Health Organ 2005;83:661-9.
Kopycka-Kedzierawski DT, Auinger P, Billings RJ, Weitzman M. Caries status and overweight in 2- to 18-year-old US children: Findings from national surveys. Community Dent Oral Epidemiol 2008;36:157-67.
Petersen MR, Deddens JA. A comparison of two methods for estimating prevalence ratios. BMC Med Res Methodol 2008;8:9.
Greene WH. Accounting for Excess Zeros and Sample Selection in Poisson and Negative Binomial Regression Models. Working Paper. New York: Department of Economics, Stern School of Business, New York University; 1994.
Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992;34:1-14.
Mullahy J. Heterogeneity, excess zeros, and the structure of count data models. J Appl Econom 1997;12:337-50.
Böhning D, Dietz E, Schlattmann P, Mendonça L, Kirchner U. The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Stat Soc Ser A 1999;162:195-209.
Lewsey JD, Thomson WM. The utility of the zero-inflated Poisson and zero-inflated negative binomial models: A case study of cross-sectional and longitudinal DMF data examining the effect of socio-economic status. Community Dent Oral Epidemiol 2004;32:183-9.
Mwalili SM, Lesaffre E, Declerck D. The zero-inflated negative binomial regression model with correction for misclassification: An example in caries research. Stat Methods Med Res 2008;17:123-39.
Hur K, Hedeker D, Henderson W, Khuri S, Daley J. Modeling clustered count data with excess zeros in health care outcomes research. Health Serv Outcomes Res Methodol 2002;3:5-20.
WHO. Oral Health Surveys: Basic Methods. 5th ed. Geneva: World Health Organization; 2013.
Heilbron DC. Zero-altered and other regression models for count data with added zeros. Biom J 1994;36:531-47.
Tu W. Zero-Inflated Data. In: El-Shaarawi AH, Piegorsch WW, editors. Encyclopedia of Environmetrics. Chichester: John Wiley and Sons; 2002.
McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. London: Chapman & Hall; 1989.
Hinde J, Demetrio CG. Overdispersion: Models and estimation. Comput Stat Data Anal 1998;27:151-70.
Barry SC, Welsh AH. Generalized additive modelling and zero inflated count data. Ecol Modell 2002;157:179-88.
Solinas G, Campus G, Maida C, Sotgiu G, Cagetti MG, Lesaffre E, et al. What statistical method should be used to evaluate risk factors associated with dmfs index? Evidence from the National pathfinder survey of 4-year-old Italian children. Community Dent Oral Epidemiol 2009;37:539-46.
Diane L, Eastman MA. Dental outcomes of preterm infants. Newborn Infant Nurs Rev 2003;3:93-8.
Alvarez JO, Navia JM. Nutritional status, tooth eruption, and dental caries: A review. Am J Clin Nutr 1989;49:417-26.
Iida H, Auinger P, Billings RJ, Weitzman M. Association between infant breastfeeding and early childhood caries in the United States. Pediatrics 2007;120:e944-52.
Peres KG, Bastos JR, Latorre M do R. Caries severity in children and relationship with social and behavioral aspects. Rev Saude Publica 2000;34:402-8.
Traebert J, Guimarães Ldo A, Durante EZ, Serratine AC. Low maternal schooling and severity of dental caries in Brazilian preschool children. Oral Health Prev Dent 2009;7:39-45.
Mulu W, Demilie T, Yimer M, Meshesha K, Abera B. Dental caries and associated factors among primary school children in Bahir Dar city: A cross-sectional study. BMC Res Notes 2014;7:949.