Predict default or no default for
Using the information on the next slide
If x_j is a single predictor, then the rules that determine each decision have the following form:
If x_j > c then go to one node
If x_j ≤ c then go to the other node
This is called binary splitting.
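A binary split can be sketched in a few lines of R. This is only an illustration: the helper name `binary_split`, the node labels "left" and "right", and the engine sizes are all made up for the example and do not come from the slides or from `rpart`.

```r
# Hypothetical helper: send each observation to one of two nodes
# according to the rule x_j > cutoff (labels are an arbitrary choice)
binary_split <- function(x, cutoff) {
  ifelse(x > cutoff, "right", "left")
}

# Illustrative engine sizes (made-up values, not taken from mpg)
displ_example <- c(1.8, 2.0, 3.1, 4.7, 5.3)

# Apply the rule displ > 3
node <- binary_split(displ_example, cutoff = 3)
node
```

Every observation ends up in exactly one of the two nodes, which is what makes the split binary.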
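The Gini index for node m is
$$G = \sum_{k=1}^{K} p_{mk}(1 - p_{mk})$$
where p_{mk} is the proportion of training observations in node m that belong to class k.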
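To make the formula concrete, here is a minimal sketch of the Gini index for a single node, computed from a vector of class labels. The function name `gini` and the example labels are assumptions for illustration; `rpart` computes impurity internally and does not expose this function.

```r
# Hypothetical helper: Gini index of one node from its class labels
gini <- function(classes) {
  # p_mk: proportion of observations in the node belonging to each class
  p <- as.numeric(table(classes)) / length(classes)
  sum(p * (1 - p))
}

# A pure node (all front wheel drive)
g_pure <- gini(c("f", "f", "f", "f"))

# A node with the three classes equally represented
g_mixed <- gini(c("4", "f", "r"))

g_pure
g_mixed
```

The pure node has index 0, and the evenly mixed node has index 3 × (1/3)(2/3) = 2/3, so smaller values indicate purer nodes.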
Using the mpg dataset we will predict whether a car is a 4wd, a rear wheel drive or a front wheel drive, using highway fuel economy (hwy) and engine size (displ).
We use the rpart package and split the mpg data into training and test samples.
#mpg comes with ggplot2; add_column() is from tibble; filter() is from dplyr
library(ggplot2)
library(tibble)
library(dplyr)
#Find total number of observations
n<-NROW(mpg)
#Create a vector allocating each observation to train or test
train_or_test<-ifelse(runif(n)<0.7,'Train','Test')
#Add to mpg data frame
mpg_exp<-add_column(mpg,Sample=train_or_test)
#Isolate Training Data
mpg_train<-filter(mpg_exp,Sample=='Train')
#Isolate Test Data
mpg_test<-filter(mpg_exp,Sample=='Test')
library(rpart)
#Default Settings
rpart_small<-rpart(drv~displ+hwy,data = mpg_train)
#Bigger tree
#Allow for partitions with as few as two training observations
#Accept any split that improves fit
rpart_big<-rpart(drv~displ+hwy,data = mpg_train, control = rpart.control(minbucket=2, cp=0))
#Make predictions
pred_small<-predict(rpart_small,mpg_test,type='class')
pred_big<-predict(rpart_big,mpg_test,type='class')
To compute test misclassification
mean(pred_small!=mpg_test$drv)
## [1] 0.15625
mean(pred_big!=mpg_test$drv)
## [1] 0.140625
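The misclassification rate can also be read off a confusion matrix, which additionally shows which classes get confused with each other. The sketch below uses small made-up vectors (`truth_example`, `pred_example`) rather than the actual test set, so the numbers are illustrative only.

```r
# Made-up true and predicted drive types (not from mpg_test)
lev <- c("4", "f", "r")
truth_example <- factor(c("4", "f", "f", "r", "4", "f"), levels = lev)
pred_example  <- factor(c("4", "f", "4", "r", "4", "r"), levels = lev)

# Confusion matrix: rows are predictions, columns are true classes
conf_mat <- table(Predicted = pred_example, Actual = truth_example)
conf_mat

# Misclassification rate from the off-diagonal entries
err_from_table <- 1 - sum(diag(conf_mat)) / sum(conf_mat)

# Same quantity computed as on the slide
err_direct <- mean(pred_example != truth_example)
```

Both calculations give the same rate; the table simply breaks the errors down by class.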
In this case the bigger tree performs better out-of-sample (this is rare).
Without the type='class' option, probabilities are returned
pred_small<-predict(rpart_small,mpg_test)
pred_small
##            4          f          r
## 1  0.0000000 1.00000000 0.00000000
## 2  0.1315789 0.81578947 0.05263158
## 3  0.1315789 0.81578947 0.05263158
## 4  0.1315789 0.81578947 0.05263158
## 5  0.1315789 0.81578947 0.05263158
## 6  0.0000000 0.00000000 1.00000000
## 7  0.0000000 1.00000000 0.00000000
## 8  0.0000000 1.00000000 0.00000000
## 9  0.1315789 0.81578947 0.05263158
## 10 0.9743590 0.02564103 0.00000000
## 11 0.9743590 0.02564103 0.00000000
## 12 0.9743590 0.02564103 0.00000000
## 13 0.9743590 0.02564103 0.00000000
## 14 0.9743590 0.02564103 0.00000000
## 15 0.9743590 0.02564103 0.00000000
## 16 0.7777778 0.00000000 0.22222222
## 17 0.9743590 0.02564103 0.00000000
## 18 0.9743590 0.02564103 0.00000000
## 19 0.9743590 0.02564103 0.00000000
## 20 0.9743590 0.02564103 0.00000000
## 21 0.9743590 0.02564103 0.00000000
## 22 0.3333333 0.00000000 0.66666667
## 23 0.9743590 0.02564103 0.00000000
## 24 0.9743590 0.02564103 0.00000000
## 25 0.9743590 0.02564103 0.00000000
## 26 0.1315789 0.81578947 0.05263158
## 27 0.1315789 0.81578947 0.05263158
## 28 0.0000000 0.00000000 1.00000000
## 29 0.0000000 1.00000000 0.00000000
## 30 0.0000000 1.00000000 0.00000000
## 31 0.3125000 0.68750000 0.00000000
## 32 0.8571429 0.14285714 0.00000000
## 33 0.1315789 0.81578947 0.05263158
## 34 0.3125000 0.68750000 0.00000000
## 35 0.3125000 0.68750000 0.00000000
## 36 0.1315789 0.81578947 0.05263158
## 37 0.9743590 0.02564103 0.00000000
## 38 0.7777778 0.00000000 0.22222222
## 39 0.9743590 0.02564103 0.00000000
## 40 0.9743590 0.02564103 0.00000000
## 41 0.9743590 0.02564103 0.00000000
## 42 0.9743590 0.02564103 0.00000000
## 43 0.0000000 1.00000000 0.00000000
## 44 0.1315789 0.81578947 0.05263158
## 45 0.9743590 0.02564103 0.00000000
## 46 0.0000000 0.00000000 1.00000000
## 47 0.3125000 0.68750000 0.00000000
## 48 0.9743590 0.02564103 0.00000000
## 49 0.9743590 0.02564103 0.00000000
## 50 0.9743590 0.02564103 0.00000000
## 51 0.0000000 1.00000000 0.00000000
## 52 0.0000000 1.00000000 0.00000000
## 53 0.1315789 0.81578947 0.05263158
## 54 0.1315789 0.81578947 0.05263158
## 55 0.0000000 1.00000000 0.00000000
## 56 0.0000000 1.00000000 0.00000000
## 57 0.7777778 0.00000000 0.22222222
## 58 0.9743590 0.02564103 0.00000000
## 59 0.9743590 0.02564103 0.00000000
## 60 0.1315789 0.81578947 0.05263158
## 61 0.0000000 1.00000000 0.00000000
## 62 0.0000000 1.00000000 0.00000000
## 63 0.1315789 0.81578947 0.05263158
## 64 0.0000000 1.00000000 0.00000000
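A class prediction is simply the column with the largest probability in each row. The sketch below illustrates this on three rows copied from the output (rows 1, 10 and 22); the object names `probs_example` and `class_example` are made up for the illustration, and breaking ties by taking the first column is an assumption rather than documented `rpart` behaviour.

```r
# Three rows copied from the probability output (rows 1, 10 and 22)
probs_example <- matrix(
  c(0.0000000, 1.00000000, 0.00000000,
    0.9743590, 0.02564103, 0.00000000,
    0.3333333, 0.00000000, 0.66666667),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("1", "10", "22"), c("4", "f", "r"))
)

# For each row pick the class with the highest probability
# (ties.method = "first" is an assumption made for this sketch)
class_example <- colnames(probs_example)[max.col(probs_example, ties.method = "first")]
class_example
```

Rows 1, 10 and 22 are assigned to front wheel drive, 4wd and rear wheel drive respectively.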
rpart.plot is good for plotting trees themselves.
library(rpart.plot)
rpart.plot(rpart_small,extra = 0,type = 0)
rpart.plot actually provides more information. Try the following
rpart.plot(rpart_small)