New York City Airbnb (II) Model Building (with one-hot encoding and IQR)
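Continuing from the previous post, today we move on to building models for the New York City Airbnb data. Let's get started.

1. Split the data into a training set and a test set at a 7:3 ratio.

    library(dplyr)

    # Give every row a unique id so the test set can be recovered with anti_join()
    airbnb <- airbnb %>% mutate(id = row_number())
    # Randomly sample 70% of the rows for training, then drop listings priced at 0
    # (call set.seed() first if you need a reproducible split)
    airbnb_train <- airbnb %>% sample_frac(.7) %>% filter(price > 0)
    # The test set is every row not sampled into training, again excluding price 0
    airbnb_test <- anti_join(airbnb, airbnb_train, by = 'id') %>% filter(price > 0)
    # Sanity check: returns TRUE, since the two sets together cover every row with a positive price
    nrow(airbnb_train) + nrow(airbnb_test) == nrow(airbnb %>% filter(price > 0))

Because the zero-price rows are removed after sampling, the split among the remaining listings is approximately, rather than exactly, 7:3.

2. Since neighbourhood, like neighbourhood_group and room_type, is stored as a character variable, we convert these columns into dummy variables, a technique also known as one-hot encoding. Strictly speaking, model.matrix() with an intercept applies treatment coding: a factor with k levels becomes k - 1 indicator columns, and its first level acts as the baseline.

    # Build indicator columns for the three categorical variables
    DummyTable <- model.matrix( ~ neighbourhood + neighbourhood_group + room_type, data = air)
    # Drop the "(Intercept)" column and attach the indicators to the data
    new <- cbind(air , DummyTable[,-1])
    # Remove the original categorical columns (positions 2, 3 and 6 of air)
    new <- new[,-c(2,3,6)]

Output:

      id latitude longitude price minimum_nights number_of_reviews reviews_per_month
    1  1 40.64749 -73.97237   149              1                 9              0.21
    2  2 40.75362 -73.98377   225              1                45              0.38
    3  3 40.80902 -73.94190   150              3                 0              0.00
    4  4 40.68514 -73.95976    89              1 ...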
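To make the difference between dummy coding and full one-hot coding concrete, here is a minimal, self-contained sketch in base R. It runs on a hypothetical four-row toy data frame; the names `toy`, `dummy`, `onehot` and `toy_encoded` and all values are made up for illustration and are not taken from the Airbnb data. The first model.matrix() call mirrors the approach above (intercept column dropped afterwards); the second is one way to keep every level by switching contrasts off with contrasts() and removing the intercept from the formula.

```r
# Hypothetical toy data standing in for a few Airbnb columns (values are made up)
toy <- data.frame(
  id = 1:4,
  neighbourhood_group = factor(c("Brooklyn", "Manhattan", "Queens", "Brooklyn")),
  room_type = factor(c("Private room", "Entire home/apt", "Private room", "Shared room")),
  price = c(149, 225, 150, 89)
)

# Treatment (dummy) coding, as in the post: with an intercept, a factor with
# k levels contributes k - 1 indicator columns; the first level (alphabetical
# by default) is the baseline and gets no column of its own.
dummy <- model.matrix(~ neighbourhood_group + room_type, data = toy)
dummy <- dummy[, -1]  # drop the "(Intercept)" column
print(colnames(dummy))

# Full one-hot coding: turn contrasts off for every factor and drop the
# intercept, so each level of each factor gets its own 0/1 column.
onehot <- model.matrix(
  ~ neighbourhood_group + room_type - 1,
  data = toy,
  contrasts.arg = lapply(toy[, c("neighbourhood_group", "room_type")],
                         contrasts, contrasts = FALSE)
)
print(colnames(onehot))

# Attach the dummy columns, keeping only the non-categorical columns by name
toy_encoded <- cbind(toy[, c("id", "price")], dummy)
print(toy_encoded)
```

For linear models the k - 1 version is usually what you want, since keeping every level alongside an intercept makes the columns linearly dependent. One practical caveat: model.matrix() builds its model frame with the default na.action (normally na.omit), so if any encoded column contains NA, the matrix has fewer rows than the data and its rows no longer line up with the original ones. Checking nrow(DummyTable) == nrow(air) before the cbind() step is a cheap safeguard.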