Tree-based models such as decision trees, random forests, and boosted trees provide powerful predictions and are fast to compute. There are many different ways to fit these models in R, including the rpart, randomForest, and xgboost packages. During this talk, we'll examine numerous ways to fit each of these model types (and more!) and compare them based on user-friendliness, accuracy, and speed.
Jared P. Lander is Chief Data Scientist of Lander Analytics, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master's from Columbia University in statistics and a bachelor's from Muhlenberg College in mathematics, he has experience in both academic research and industry. Jared oversees the long-term direction of the company and acts as Lead Data Scientist, researching the best strategy, models, and algorithms for modern data needs. This is in addition to his client-facing consulting and training. He specializes in data management, multilevel models, machine learning, generalized linear models, visualization, and statistical computing. He is the author of R for Everyone, a book about R programming geared toward data scientists and non-statisticians alike. The book is available from Amazon, Barnes & Noble, and InformIT. The material is drawn from the classes he teaches at Columbia and is incorporated into his corporate training. Very active in the data community, Jared is a frequent speaker at conferences, universities, and meetups around the world. He is a member of the Strata New York selection committee. His writings on statistics can be found at jaredlander.com.