UNIT10B：LogisticReg. - The Coefficients

pacman::p_load(caTools, ggplot2, dplyr)

【A】Preparing the Data

Data: The quality of (medical) care for diabetic patients

D = read.csv("data/quality.csv")  # Read in dataset
summary(D)

    MemberID     InpatientDays      ERVisits     OfficeVisits    Narcotics    
 Min.   :  1.0   Min.   : 0.00   Min.   : 0.0   Min.   : 0.0   Min.   : 0.00  
 1st Qu.: 33.5   1st Qu.: 0.00   1st Qu.: 0.0   1st Qu.: 7.0   1st Qu.: 0.00  
 Median : 66.0   Median : 0.00   Median : 1.0   Median :12.0   Median : 1.00  
 Mean   : 66.0   Mean   : 2.72   Mean   : 1.5   Mean   :13.2   Mean   : 4.57  
 3rd Qu.: 98.5   3rd Qu.: 3.00   3rd Qu.: 2.0   3rd Qu.:18.5   3rd Qu.: 3.00  
 Max.   :131.0   Max.   :30.00   Max.   :11.0   Max.   :46.0   Max.   :59.00  
 DaysSinceLastERVisit      Pain        TotalVisits   ProviderCount
 Min.   :  6          Min.   :  0.0   Min.   : 0.0   Min.   : 5   
 1st Qu.:207          1st Qu.:  1.0   1st Qu.: 8.0   1st Qu.:15   
 Median :641          Median :  8.0   Median :15.0   Median :20   
 Mean   :481          Mean   : 15.6   Mean   :17.4   Mean   :24   
 3rd Qu.:731          3rd Qu.: 23.0   3rd Qu.:22.5   3rd Qu.:30   
 Max.   :731          Max.   :104.0   Max.   :69.0   Max.   :82   
 MedicalClaims     ClaimLines    StartedOnCombination AcuteDrugGapSmall
 Min.   : 11.0   Min.   : 20.0   Mode :logical        Min.   : 0.00    
 1st Qu.: 25.5   1st Qu.: 83.5   FALSE:125            1st Qu.: 0.00    
 Median : 37.0   Median :120.0   TRUE :6              Median : 1.00    
 Mean   : 43.2   Mean   :142.9                        Mean   : 2.69    
 3rd Qu.: 49.5   3rd Qu.:185.0                        3rd Qu.: 3.00    
 Max.   :194.0   Max.   :577.0                        Max.   :71.00    
    PoorCare    
 Min.   :0.000  
 1st Qu.:0.000  
 Median :0.000  
 Mean   :0.252  
 3rd Qu.:0.500  
 Max.   :1.000

Before modeling, we should examine the probability of P[y=1] within the data

base = table(D$PoorCare) # the base probability
base %>% prop.table


      0       1 
0.74809 0.25191

🌻 The sample.split() generate a splitting vector that

split the entire data set into training and testing data in the ratio specified by SplitRatio.
it also reserve the distribution of y in the training and testing data sets.

set.seed(88)
split = sample.split(D$PoorCare, SplitRatio = 0.7)  # split vector
table(split) %>% prop.table()

split
  FALSE    TRUE 
0.29771 0.70229

Splitting the data D into Train (TR) and Test (TS) sets

TR = subset(D, split == TRUE)
TS = subset(D, split == FALSE)

The proportion of y==1 in the training and testing data sets (TR & TS) are close to that of D.

sapply(list(D=D,TR=TR,TS=TS), function(df) mean(df$PoorCare))

      D      TR      TS 
0.25191 0.25000 0.25641

Building and examining the model - glm1

glm1 = glm(PoorCare ~ OfficeVisits + Narcotics, TR, family=binomial)
summary(glm1)


Call:
glm(formula = PoorCare ~ OfficeVisits + Narcotics, family = binomial, 
    data = TR)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.878  -0.635  -0.517  -0.224   2.146  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -2.5662     0.5249   -4.89 0.000001 ***
OfficeVisits   0.0814     0.0310    2.62   0.0087 ** 
Narcotics      0.0620     0.0320    1.94   0.0527 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 103.470  on 91  degrees of freedom
Residual deviance:  84.528  on 89  degrees of freedom
AIC: 90.53

Number of Fisher Scoring iterations: 4

Extracting the coefficients

b = coef(glm1); b

 (Intercept) OfficeVisits    Narcotics 
   -2.566177     0.081360     0.062005

🌻 Marginal Effects in Odd Ratios:

For every predictors, exponent of its coefficient estimate its marginal effect in odd ratio.

exp(b[2])

OfficeVisits 1.0848

When OfficeVices increase by 1, the odd of y=1 is multiplied by 1.0848

exp(b[2])^2

OfficeVisits 1.1767

When OfficeVices increase by 2, the odd of y=1 is multiplied by 1.1767

exp(b[3])

Narcotics 1.064

When Narcotics increase by 1, the odd of y=1 is multiplied by 1.064

exp(b[3])^1.5

Narcotics 1.0975

When Narcotics increase by 1.5, the odd of y=1 is multiplied by 1.0975

【B】Model Coefficients and Marginal Effects

Model :

Pr[y = 1] = 1/(1+exp(-f(x)))

$Logit = f(x) = b_0 + b_1 x_1 + b_2 x_2 \;\; (1)$

$Logit = f(x) = -2.6461 + 0.0821 \times OfficeVisits + 0.0763 \times Narcotics \;\; (2)$

Coefficients & Marginal Effect :

$Odd_0 = Exp(b_0 + b_1 x_1)\;\;(3)$

$Odd_1 = Exp[b_0 + b_1(x_1+1)] = Exp(b_0 + b_1 x_1 + b_1) = Exp(b_0 + b_1 x_1) \times Exp(b_1) \;\;(4)$

$Odd_1 = Odd_0 \times Exp(b_1) \:\:(5)$

$\frac{Odd_1}{Odd_0} = Exp(b_1) \:\:(6)$

💡 Marginal Effects of Predictors

The exponent of coefficients indicates the ratio of odds

The effects multiple (rather than add) to each others

Note the difference between ratio of odds vs. Increments of Probability

The same ratio of odd incurs different increments of probability in different level of probability.

as shown below, given a ratio of odd (k=2), it’d increase the probability by

0.08 when the probability level is 0.1

0.13 when the probability level is 0.2

0.16 when the probability level is 0.3

and so on

pop = function(p, k) {o = p/(1-p); o = k * o; o/(1+o)} p0 = seq(0.1, 0.9, 0.1); k = 2 p1 = sapply(seq(0.1, 0.9, 0.1), pop, k) data.frame(k, p0, p1, increment=p1-p0) %>% round(2)

k p0 p1 increment 1 2 0.1 0.18 0.08 2 2 0.2 0.33 0.13 3 2 0.3 0.46 0.16 4 2 0.4 0.57 0.17 5 2 0.5 0.67 0.17 6 2 0.6 0.75 0.15 7 2 0.7 0.82 0.12 8 2 0.8 0.89 0.09 9 2 0.9 0.95 0.05

Sometimes it is desirable to specify the marginal effects in increments of probability. In order to do that we need to specify a level of probability. Usually we’d estimate the P[y=1] that all predictors equal to their medians as a benchmark. For an example …

df = data.frame(OfficeVisits = median(D$OfficeVisits), Narcotics=median(D$Narcotics)) predict(glm1, df, type="response")

1 0.17831

Given that both predictors equals to their medians, using the model we can predict P[y=1] equals to 0.178

exp(0.0814)

[1] 1.0848

The marginal effect of OfficeVisits is exp(0.0814)=1.0848

df = data.frame(OfficeVisits = median(D$OfficeVisits)+1, Narcotics=median(D$Narcotics)) predict(glm1, df, type="response")

1 0.19054

At the median level (P[y=1] = 0.178), an unit increment in officeVisits would increase the probability of poor care to 0.191 … and

df = data.frame(OfficeVisits = median(D$OfficeVisits), Narcotics=median(D$Narcotics)+1) predict(glm1, df, type="response")

1 0.18757

an unit increment in Narcotis would increase the probability of poor care to 0.188

【C】Practice

quantile(D$OfficeVisits)

0% 25% 50% 75% 100% 0.0 7.0 12.0 18.5 46.0

quantile(D$Narcotics)

0% 25% 50% 75% 100% 0 0 1 3 59

🗿 Quiz：
■ When OfficeVisits and Narcotic both equal to their first quantile (25%)：
■ What is the prob. that PoorCare = 1?
■ In term of odd ratio, what are the marginal effects of the predictors?
■ In term of probability increment, what are the marginal effects of the predictors?
■ When OfficeVisits and Narcotic both equal to their third quantile (75%)：
■ What is the prob. that PoorCare = 1?
■ In term of odd ratio, what are the marginal effects of the predictors?
■ In term of probability increment, what are the marginal effects of the predictors?
■ Comparing the above, can we tell the difference between odd ratio and increment probability?

UNIT10B：LogisticReg. - The Coefficients

Case：Predictive Medication

Tony Chuo, Sun Yatsen U., Taiwan

2022-11-23 14:59:00

【A】Preparing the Data

【B】Model Coefficients and Marginal Effects

【C】Practice