


Predict default or no default for
Using the information on the next slide

If $x_j$ is a single predictor, then the rules that determine each decision have the following form:
If $x_j > c$ then go to one node
If $x_j \le c$ then go to the other node
This is called binary splitting.
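A binary split can be sketched in a few lines of R. The helper name `binary_split`, the cutoff of 2.5 and the example values are all hypothetical and serve only to illustrate the rule; this is not how any tree package implements splitting internally.

```r
# Hypothetical helper: send each observation to one of two child nodes
# by comparing a single predictor x_j with a cutoff.
binary_split <- function(xj, cutoff) {
  ifelse(xj > cutoff, "one node", "other node")
}

# Illustrative predictor values and an arbitrary cutoff (assumptions)
x_example <- c(1.8, 2.0, 2.5, 3.1, 5.7)
binary_split(x_example, cutoff = 2.5)
```

An observation exactly equal to the cutoff goes to the "other node", matching the $x_j \le c$ branch of the rule.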

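$$G=\sum_{k=1}^{K} p_{mk}(1-p_{mk})$$
Here $G$ is the Gini index of node $m$, and $p_{mk}$ is the proportion of training observations in node $m$ that belong to class $k$.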
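The formula for $G$ can be checked numerically. The following is a minimal sketch, assuming the class labels of the observations in one node are available as a vector; the function name `gini_index` and the example labels are illustrative, not part of the original slides.

```r
# Hypothetical helper: Gini index of a single node, computed from the
# class labels of the observations that fall in that node.
gini_index <- function(classes) {
  p <- table(classes) / length(classes)  # class proportions p_mk
  sum(p * (1 - p))
}

gini_index(c("f", "f", "f", "f"))  # pure node: 0
gini_index(c("4", "4", "f", "f"))  # two equally likely classes: 0.5
```

A pure node gives $G=0$, and $G$ grows as the classes in the node become more mixed, which is why a small value indicates a good split.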







Using the mpg dataset we will predict whether a car is a 4wd, a rear wheel drive or a front wheel drive.
The predictors are highway fuel economy (hwy) and engine size (displ).






We use the rpart package.
First split the mpg data into training and test samples.
#Load packages (tidyverse supplies mpg, add_column and filter)
library(tidyverse)
library(rpart)
#Find total number of observations
n<-NROW(mpg)
#Create a vector allocating each observation to train or test
train_or_test<-ifelse(runif(n)<0.7,'Train','Test')
#Add to mpg data frame
mpg_exp<-add_column(mpg,Sample=train_or_test)
#Isolate Training Data
mpg_train<-filter(mpg_exp,Sample=='Train')
#Isolate Test Data
mpg_test<-filter(mpg_exp,Sample=='Test')
#Default Settings
rpart_small<-rpart(drv~displ+hwy,data = mpg_train)
#Bigger tree
#Allow for partitions with as few as two training observations
#Accept any split that improves fit
rpart_big<-rpart(drv~displ+hwy,data = mpg_train, control = rpart.control(minbucket=2, cp=0))
#Make predictions
pred_small<-predict(rpart_small,mpg_test,type='class')
pred_big<-predict(rpart_big,mpg_test,type='class')
To compute the test misclassification rate:
mean(pred_small!=mpg_test$drv)
## [1] 0.15625
mean(pred_big!=mpg_test$drv)
## [1] 0.140625
In this case the bigger tree performs better out-of-sample (this is rare).
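With 64 test observations, the two error rates above correspond to 10/64 and 9/64 misclassified cars, so the trees differ on a single test observation. The sketch below shows how the misclassification rate relates to a confusion table. The label vectors are made up for illustration and are not the mpg results.

```r
# Made-up labels for illustration only (not taken from the mpg example)
drive_levels <- c("4", "f", "r")
actual    <- factor(c("4", "f", "f", "r", "4", "f", "r", "f"), levels = drive_levels)
predicted <- factor(c("4", "f", "4", "r", "4", "f", "f", "f"), levels = drive_levels)

# Misclassification rate: share of predictions that differ from the truth
misclass_rate <- mean(predicted != actual)
misclass_rate

# Confusion table: correct predictions sit on the diagonal
conf_mat <- table(Predicted = predicted, Actual = actual)
conf_mat

# The same rate recovered from the table
1 - sum(diag(conf_mat)) / sum(conf_mat)
```

The confusion table shows which classes are confused with each other, which the single error rate hides.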
Without the type='class' option probabilities are returned
pred_small<-predict(rpart_small,mpg_test)
pred_small
##            4          f          r
## 1  0.0000000 1.00000000 0.00000000
## 2  0.1315789 0.81578947 0.05263158
## 3  0.1315789 0.81578947 0.05263158
## 4  0.1315789 0.81578947 0.05263158
## 5  0.1315789 0.81578947 0.05263158
## 6  0.0000000 0.00000000 1.00000000
## 7  0.0000000 1.00000000 0.00000000
## 8  0.0000000 1.00000000 0.00000000
## 9  0.1315789 0.81578947 0.05263158
## 10 0.9743590 0.02564103 0.00000000
## 11 0.9743590 0.02564103 0.00000000
## 12 0.9743590 0.02564103 0.00000000
## 13 0.9743590 0.02564103 0.00000000
## 14 0.9743590 0.02564103 0.00000000
## 15 0.9743590 0.02564103 0.00000000
## 16 0.7777778 0.00000000 0.22222222
## 17 0.9743590 0.02564103 0.00000000
## 18 0.9743590 0.02564103 0.00000000
## 19 0.9743590 0.02564103 0.00000000
## 20 0.9743590 0.02564103 0.00000000
## 21 0.9743590 0.02564103 0.00000000
## 22 0.3333333 0.00000000 0.66666667
## 23 0.9743590 0.02564103 0.00000000
## 24 0.9743590 0.02564103 0.00000000
## 25 0.9743590 0.02564103 0.00000000
## 26 0.1315789 0.81578947 0.05263158
## 27 0.1315789 0.81578947 0.05263158
## 28 0.0000000 0.00000000 1.00000000
## 29 0.0000000 1.00000000 0.00000000
## 30 0.0000000 1.00000000 0.00000000
## 31 0.3125000 0.68750000 0.00000000
## 32 0.8571429 0.14285714 0.00000000
## 33 0.1315789 0.81578947 0.05263158
## 34 0.3125000 0.68750000 0.00000000
## 35 0.3125000 0.68750000 0.00000000
## 36 0.1315789 0.81578947 0.05263158
## 37 0.9743590 0.02564103 0.00000000
## 38 0.7777778 0.00000000 0.22222222
## 39 0.9743590 0.02564103 0.00000000
## 40 0.9743590 0.02564103 0.00000000
## 41 0.9743590 0.02564103 0.00000000
## 42 0.9743590 0.02564103 0.00000000
## 43 0.0000000 1.00000000 0.00000000
## 44 0.1315789 0.81578947 0.05263158
## 45 0.9743590 0.02564103 0.00000000
## 46 0.0000000 0.00000000 1.00000000
## 47 0.3125000 0.68750000 0.00000000
## 48 0.9743590 0.02564103 0.00000000
## 49 0.9743590 0.02564103 0.00000000
## 50 0.9743590 0.02564103 0.00000000
## 51 0.0000000 1.00000000 0.00000000
## 52 0.0000000 1.00000000 0.00000000
## 53 0.1315789 0.81578947 0.05263158
## 54 0.1315789 0.81578947 0.05263158
## 55 0.0000000 1.00000000 0.00000000
## 56 0.0000000 1.00000000 0.00000000
## 57 0.7777778 0.00000000 0.22222222
## 58 0.9743590 0.02564103 0.00000000
## 59 0.9743590 0.02564103 0.00000000
## 60 0.1315789 0.81578947 0.05263158
## 61 0.0000000 1.00000000 0.00000000
## 62 0.0000000 1.00000000 0.00000000
## 63 0.1315789 0.81578947 0.05263158
## 64 0.0000000 1.00000000 0.00000000
The rpart.plot package is good for plotting the trees themselves.
rpart.plot(rpart_small,extra = 0,type = 0)
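Returning to the class probabilities printed earlier in this section: each row can be turned into a class label by picking the column with the largest probability. The sketch below copies rows 1, 2, 10 and 22 of that output into a small matrix; the object names are hypothetical. With default rpart settings this should agree with what `type='class'` returns, although that equivalence is stated here as an assumption rather than checked against the fitted tree.

```r
# Rows 1, 2, 10 and 22 of the probability output, copied by hand
prob_example <- matrix(
  c(0.0000000, 1.00000000, 0.00000000,
    0.1315789, 0.81578947, 0.05263158,
    0.9743590, 0.02564103, 0.00000000,
    0.3333333, 0.00000000, 0.66666667),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "10", "22"), c("4", "f", "r"))
)

# For each row, take the name of the column with the highest probability
class_example <- colnames(prob_example)[max.col(prob_example, ties.method = "first")]
class_example
```

Rows that share identical probabilities fall in the same terminal node of the tree, which is why only a handful of distinct rows appear among the 64 test observations.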

rpart.plot actually provides more information. Try the following
rpart.plot(rpart_small)
