`::p_load(caTools, ggplot2, dplyr) pacman`

Data: The quality of (medical) care for diabetic patients

```
= read.csv("data/quality.csv") # Read in dataset
D summary(D)
```

```
MemberID InpatientDays ERVisits OfficeVisits Narcotics
Min. : 1.0 Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.00
1st Qu.: 33.5 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 7.0 1st Qu.: 0.00
Median : 66.0 Median : 0.00 Median : 1.0 Median :12.0 Median : 1.00
Mean : 66.0 Mean : 2.72 Mean : 1.5 Mean :13.2 Mean : 4.57
3rd Qu.: 98.5 3rd Qu.: 3.00 3rd Qu.: 2.0 3rd Qu.:18.5 3rd Qu.: 3.00
Max. :131.0 Max. :30.00 Max. :11.0 Max. :46.0 Max. :59.00
DaysSinceLastERVisit Pain TotalVisits ProviderCount
Min. : 6 Min. : 0.0 Min. : 0.0 Min. : 5
1st Qu.:207 1st Qu.: 1.0 1st Qu.: 8.0 1st Qu.:15
Median :641 Median : 8.0 Median :15.0 Median :20
Mean :481 Mean : 15.6 Mean :17.4 Mean :24
3rd Qu.:731 3rd Qu.: 23.0 3rd Qu.:22.5 3rd Qu.:30
Max. :731 Max. :104.0 Max. :69.0 Max. :82
MedicalClaims ClaimLines StartedOnCombination AcuteDrugGapSmall
Min. : 11.0 Min. : 20.0 Mode :logical Min. : 0.00
1st Qu.: 25.5 1st Qu.: 83.5 FALSE:125 1st Qu.: 0.00
Median : 37.0 Median :120.0 TRUE :6 Median : 1.00
Mean : 43.2 Mean :142.9 Mean : 2.69
3rd Qu.: 49.5 3rd Qu.:185.0 3rd Qu.: 3.00
Max. :194.0 Max. :577.0 Max. :71.00
PoorCare
Min. :0.000
1st Qu.:0.000
Median :0.000
Mean :0.252
3rd Qu.:0.500
Max. :1.000
```

Before modeling, we should examine the probability of
`P[y=1]`

within the data

```
= table(D$PoorCare) # the base probability
base %>% prop.table base
```

```
0 1
0.74809 0.25191
```

🌻 The `sample.split()`

generate a splitting vector
that

- split the entire data set into training and testing data in the
ratio specified by
`SplitRatio`

. - it also reserve the distribution of
`y`

in the training and testing data sets.

```
set.seed(88)
= sample.split(D$PoorCare, SplitRatio = 0.7) # split vector
split table(split) %>% prop.table()
```

```
split
FALSE TRUE
0.29771 0.70229
```

Splitting the data `D`

into Train (`TR`

) and
Test (`TS`

) sets

```
= subset(D, split == TRUE)
TR = subset(D, split == FALSE) TS
```

The proportion of `y==1`

in the training and testing data
sets (`TR`

& `TS`

) are close to that of
`D`

.

`sapply(list(D=D,TR=TR,TS=TS), function(df) mean(df$PoorCare))`

```
D TR TS
0.25191 0.25000 0.25641
```

Building and examining the model - `glm1`

```
= glm(PoorCare ~ OfficeVisits + Narcotics, TR, family=binomial)
glm1 summary(glm1)
```

```
Call:
glm(formula = PoorCare ~ OfficeVisits + Narcotics, family = binomial,
data = TR)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.878 -0.635 -0.517 -0.224 2.146
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.5662 0.5249 -4.89 0.000001 ***
OfficeVisits 0.0814 0.0310 2.62 0.0087 **
Narcotics 0.0620 0.0320 1.94 0.0527 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 103.470 on 91 degrees of freedom
Residual deviance: 84.528 on 89 degrees of freedom
AIC: 90.53
Number of Fisher Scoring iterations: 4
```

Extracting the coefficients

`= coef(glm1); b b `

```
(Intercept) OfficeVisits Narcotics
-2.566177 0.081360 0.062005
```

🌻 **Marginal Effects in Odd Ratios**:

For every predictors, **exponent** of its coefficient estimate its
marginal effect in **odd ratio .**

`exp(b[2]) `

```
OfficeVisits
1.0848
```

When `OfficeVices`

increase by 1, the odd of y=1 is
multiplied by 1.0848

`exp(b[2])^2 `

```
OfficeVisits
1.1767
```

When `OfficeVices`

increase by 2, the odd of y=1 is
multiplied by 1.1767

`exp(b[3])`

```
Narcotics
1.064
```

When `Narcotics`

increase by 1, the odd of y=1 is
multiplied by 1.064

`exp(b[3])^1.5`

```
Narcotics
1.0975
```

When `Narcotics`

increase by 1.5, the odd of y=1 is
multiplied by 1.0975

**Model :**

`Pr[y = 1] = 1/(1+exp(-f(x)))`

\(Logit = f(x) = b_0 + b_1 x_1 + b_2 x_2 \;\; (1)\)

\(Logit = f(x) = -2.6461 + 0.0821 \times OfficeVisits + 0.0763 \times Narcotics \;\; (2)\)

**Coefficients & Marginal Effect :**

\(Odd_0 = Exp(b_0 + b_1 x_1)\;\;(3)\)

\(Odd_1 = Exp[b_0 + b_1(x_1+1)] = Exp(b_0 + b_1 x_1 + b_1) = Exp(b_0 + b_1 x_1) \times Exp(b_1) \;\;(4)\)

\(Odd_1 = Odd_0 \times Exp(b_1) \:\:(5)\)

\(\frac{Odd_1}{Odd_0} = Exp(b_1) \:\:(6)\)

💡 Marginal Effects of Predictors

- The exponent of coefficients indicates the ratio of odds
- The effects multiple (rather than add) to each others
- Note the difference between
**ratio of odds**vs.**Increments of Probability** - The same ratio of odd incurs different increments of probability in different level of probability.
- as shown below, given a ratio of odd (
`k=2`

), it’d increase the probability by- 0.08 when the probability level is 0.1
- 0.13 when the probability level is 0.2
- 0.16 when the probability level is 0.3
- and so on

```
= function(p, k) {o = p/(1-p); o = k * o; o/(1+o)}
pop = seq(0.1, 0.9, 0.1); k = 2
p0 = sapply(seq(0.1, 0.9, 0.1), pop, k)
p1 data.frame(k, p0, p1, increment=p1-p0) %>% round(2)
```

```
k p0 p1 increment
1 2 0.1 0.18 0.08
2 2 0.2 0.33 0.13
3 2 0.3 0.46 0.16
4 2 0.4 0.57 0.17
5 2 0.5 0.67 0.17
6 2 0.6 0.75 0.15
7 2 0.7 0.82 0.12
8 2 0.8 0.89 0.09
9 2 0.9 0.95 0.05
```

Sometimes it is desirable to specify the marginal effects in
increments of probability. In order to do that we need to specify a
level of probability. Usually we’d estimate the `P[y=1]`

that
all predictors equal to their medians as a benchmark. For an example
…

```
= data.frame(OfficeVisits = median(D$OfficeVisits), Narcotics=median(D$Narcotics))
df predict(glm1, df, type="response")
```

```
1
0.17831
```

Given that both predictors equals to their medians, using the model
we can predict `P[y=1]`

equals to 0.178

`exp(0.0814)`

`[1] 1.0848`

The marginal effect of `OfficeVisits`

is
`exp(0.0814)=1.0848`

```
= data.frame(OfficeVisits = median(D$OfficeVisits)+1, Narcotics=median(D$Narcotics))
df predict(glm1, df, type="response")
```

```
1
0.19054
```

At the median level (P[y=1] = 0.178), an unit increment in
`officeVisits`

would increase the probability of poor care to
0.191 … and

```
= data.frame(OfficeVisits = median(D$OfficeVisits), Narcotics=median(D$Narcotics)+1)
df predict(glm1, df, type="response")
```

```
1
0.18757
```

an unit increment in `Narcotis`

would increase the
probability of poor care to 0.188

`quantile(D$OfficeVisits)`

```
0% 25% 50% 75% 100%
0.0 7.0 12.0 18.5 46.0
```

`quantile(D$Narcotics)`

```
0% 25% 50% 75% 100%
0 0 1 3 59
```

🗿 Quiz：

■ When
`OfficeVisits`

and `Narcotic`

both equal to their
first quantile (25%)：

■ What is the prob. that
`PoorCare = 1`

?

■ In term of odd ratio, what are the
marginal effects of the predictors?

■ In term of probability
increment, what are the marginal effects of the predictors?

■ When
`OfficeVisits`

and `Narcotic`

both equal to their
third quantile (75%)：

■ What is the prob. that
`PoorCare = 1`

?

■ In term of odd ratio, what are the
marginal effects of the predictors?

■ In term of probability
increment, what are the marginal effects of the predictors?

■
Comparing the above, can we tell the difference between **odd
ratio** and **increment probability**?