pacman::p_load(caTools, ggplot2, dplyr)


【A】Preparing the Data

Data: The quality of (medical) care for diabetic patients

D = read.csv("data/quality.csv")  # Read in dataset
summary(D)
    MemberID     InpatientDays      ERVisits     OfficeVisits    Narcotics    
 Min.   :  1.0   Min.   : 0.00   Min.   : 0.0   Min.   : 0.0   Min.   : 0.00  
 1st Qu.: 33.5   1st Qu.: 0.00   1st Qu.: 0.0   1st Qu.: 7.0   1st Qu.: 0.00  
 Median : 66.0   Median : 0.00   Median : 1.0   Median :12.0   Median : 1.00  
 Mean   : 66.0   Mean   : 2.72   Mean   : 1.5   Mean   :13.2   Mean   : 4.57  
 3rd Qu.: 98.5   3rd Qu.: 3.00   3rd Qu.: 2.0   3rd Qu.:18.5   3rd Qu.: 3.00  
 Max.   :131.0   Max.   :30.00   Max.   :11.0   Max.   :46.0   Max.   :59.00  
 DaysSinceLastERVisit      Pain        TotalVisits   ProviderCount
 Min.   :  6          Min.   :  0.0   Min.   : 0.0   Min.   : 5   
 1st Qu.:207          1st Qu.:  1.0   1st Qu.: 8.0   1st Qu.:15   
 Median :641          Median :  8.0   Median :15.0   Median :20   
 Mean   :481          Mean   : 15.6   Mean   :17.4   Mean   :24   
 3rd Qu.:731          3rd Qu.: 23.0   3rd Qu.:22.5   3rd Qu.:30   
 Max.   :731          Max.   :104.0   Max.   :69.0   Max.   :82   
 MedicalClaims     ClaimLines    StartedOnCombination AcuteDrugGapSmall
 Min.   : 11.0   Min.   : 20.0   Mode :logical        Min.   : 0.00    
 1st Qu.: 25.5   1st Qu.: 83.5   FALSE:125            1st Qu.: 0.00    
 Median : 37.0   Median :120.0   TRUE :6              Median : 1.00    
 Mean   : 43.2   Mean   :142.9                        Mean   : 2.69    
 3rd Qu.: 49.5   3rd Qu.:185.0                        3rd Qu.: 3.00    
 Max.   :194.0   Max.   :577.0                        Max.   :71.00    
    PoorCare    
 Min.   :0.000  
 1st Qu.:0.000  
 Median :0.000  
 Mean   :0.252  
 3rd Qu.:0.500  
 Max.   :1.000  

Before modeling, we should examine the baseline probability P[y=1] in the data

base = table(D$PoorCare)   # counts of PoorCare = 0/1
base %>% prop.table        # baseline probability of each class

      0       1 
0.74809 0.25191 

🌻 sample.split() generates a logical splitting vector that preserves the proportion of PoorCare = 1 in both subsets

set.seed(88)
split = sample.split(D$PoorCare, SplitRatio = 0.7)  # split vector
table(split) %>% prop.table()
split
  FALSE    TRUE 
0.29771 0.70229 
table(D$PoorCare, split) %>% prop.table(2)
   split
      FALSE    TRUE
  0 0.74359 0.75000
  1 0.25641 0.25000

Splitting the data D into Train (TR) and Test (TS) sets

TR = subset(D, split == TRUE)
TS = subset(D, split == FALSE)
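
As a quick sanity check (a minimal sketch, not part of the original output), the row counts of the two subsets should reflect the 0.7 split ratio:

nrow(TR)   # 92 rows, consistent with the 91 null-deviance df reported below
nrow(TS)   # the remaining 39 rows (~30%)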

Building and examining the model - glm1

glm1 = glm(PoorCare ~ OfficeVisits + Narcotics, TR, family=binomial)
summary(glm1)

Call:
glm(formula = PoorCare ~ OfficeVisits + Narcotics, family = binomial, 
    data = TR)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.878  -0.635  -0.517  -0.224   2.146  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -2.5662     0.5249   -4.89 0.000001 ***
OfficeVisits   0.0814     0.0310    2.62   0.0087 ** 
Narcotics      0.0620     0.0320    1.94   0.0527 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 103.470  on 91  degrees of freedom
Residual deviance:  84.528  on 89  degrees of freedom
AIC: 90.53

Number of Fisher Scoring iterations: 4


【B】Model Coefficients and Marginal Effects

Model :

  logit(P[y=1]) = ln( P[y=1] / (1 - P[y=1]) ) = b0 + b1*OfficeVisits + b2*Narcotics

  equivalently  P[y=1] = 1 / (1 + exp(-(b0 + b1*OfficeVisits + b2*Narcotics)))

  with fitted values b0 = -2.5662, b1 = 0.0814 (OfficeVisits), b2 = 0.0620 (Narcotics).
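
As a sketch (the names b and x are just illustrative), the fitted equation can be evaluated by hand from coef(glm1) and checked against predict():

b = coef(glm1)                               # (Intercept), OfficeVisits, Narcotics
x = c(OfficeVisits = 12, Narcotics = 1)      # the medians (see the quantiles in 【C】)
eta = b[1] + b[2]*x[1] + b[3]*x[2]           # linear predictor = log-odds
1 / (1 + exp(-eta))                          # P[y=1]; should match predict(glm1, df, type="response") below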

Coefficients & Marginal Effect :

  A one-unit increase in a predictor adds its coefficient b to the log-odds, i.e. it multiplies the odds P/(1-P) by exp(b). This odds ratio exp(b) is the marginal effect on the odds scale; the corresponding increment in probability depends on the baseline probability, as illustrated below.


💡 Marginal Effects of Predictors

pop = function(p, k) { o = p/(1-p); o = k*o; o/(1+o) }    # apply odds ratio k to a baseline probability p
p0 = seq(0.1, 0.9, 0.1); k = 2                            # baseline probabilities and a fixed odds ratio
p1 = sapply(p0, pop, k)                                   # probabilities after the odds are multiplied by k
data.frame(k, p0, p1, increment = p1 - p0) %>% round(2)   # the increment varies even though k is constant
  k  p0   p1 increment
1 2 0.1 0.18      0.08
2 2 0.2 0.33      0.13
3 2 0.3 0.46      0.16
4 2 0.4 0.57      0.17
5 2 0.5 0.67      0.17
6 2 0.6 0.75      0.15
7 2 0.7 0.82      0.12
8 2 0.8 0.89      0.09
9 2 0.9 0.95      0.05
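
The same helper can be fed the odds ratios implied by glm1 (a sketch, not output from the original notes); the benchmark probability 0.178 is the median-level prediction derived in the next step:

or = exp(coef(glm1)[-1])                    # odds ratios for OfficeVisits and Narcotics
p0 = 0.178                                  # benchmark P[y=1] at the medians (computed below)
sapply(or, function(k) pop(p0, k) - p0)     # probability increment implied by each odds ratio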


Sometimes it is desirable to express the marginal effects as increments in probability. To do that we first need to fix a baseline probability. A common benchmark is to estimate P[y=1] with all predictors set to their medians. For example …

df = data.frame(OfficeVisits = median(D$OfficeVisits), Narcotics=median(D$Narcotics))
predict(glm1, df, type="response")
      1 
0.17831 

With both predictors set to their medians, the model predicts P[y=1] = 0.178.

exp(0.0814)
[1] 1.0848

In terms of odds ratio, the marginal effect of OfficeVisits is exp(0.0814) = 1.0848: each additional office visit multiplies the odds of poor care by 1.0848.

df = data.frame(OfficeVisits = median(D$OfficeVisits)+1, Narcotics=median(D$Narcotics))
predict(glm1, df, type="response")
      1 
0.19054 

At the median level (P[y=1] = 0.178), a unit increment in OfficeVisits increases the predicted probability of poor care to 0.191, and …

df = data.frame(OfficeVisits = median(D$OfficeVisits), Narcotics=median(D$Narcotics)+1)
predict(glm1, df, type="response")
      1 
0.18757 

a unit increment in Narcotics increases it to 0.188.
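
The two predict() calls above can be wrapped into a small helper (margin_at is a hypothetical name, not from the original notes) so the same exercise can be repeated at any baseline:

margin_at = function(ov, na) {                        # hypothetical helper
  grid = data.frame(
    OfficeVisits = c(ov, ov + 1, ov),
    Narcotics    = c(na, na,     na + 1))
  p = predict(glm1, grid, type = "response")
  c(baseline = p[[1]], d_OfficeVisits = p[[2]] - p[[1]], d_Narcotics = p[[3]] - p[[1]])
}
margin_at(median(D$OfficeVisits), median(D$Narcotics))   # reproduces the median-level numbers above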



【C】Practice

quantile(D$OfficeVisits)
  0%  25%  50%  75% 100% 
 0.0  7.0 12.0 18.5 46.0 
quantile(D$Narcotics)
  0%  25%  50%  75% 100% 
   0    0    1    3   59 


🗿 Quiz:
  ■ When OfficeVisits and Narcotics both equal their first quartile (25%):
    ■ What is the prob. that PoorCare = 1?
    ■ In terms of odds ratio, what are the marginal effects of the predictors?
    ■ In terms of probability increment, what are the marginal effects of the predictors?
  ■ When OfficeVisits and Narcotics both equal their third quartile (75%):
    ■ What is the prob. that PoorCare = 1?
    ■ In terms of odds ratio, what are the marginal effects of the predictors?
    ■ In terms of probability increment, what are the marginal effects of the predictors?
  ■ Comparing the above, can we tell the difference between the odds ratio and the probability increment?
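
A starting point for the quiz (a sketch only; the interpretation is left as the exercise) is to evaluate glm1 at the two quartiles and then repeat the +1 trick used above:

q = data.frame(
  OfficeVisits = quantile(D$OfficeVisits, c(0.25, 0.75)),
  Narcotics    = quantile(D$Narcotics,    c(0.25, 0.75)))
predict(glm1, q, type = "response")   # P[y=1] at the 1st and 3rd quartiles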