As a Udacity Data Scientist Nanodegree Program student, I'm tasked with writing a blog post and a kernel following the CRISP-DM process. In this post, I adhere to the CRISP-DM process to address three fundamental questions often posed in housing markets, using the Ames dataset as a case study.
The Kaggle House Prices - Advanced Regression Techniques competition is a fantastic playground for budding data scientists like myself. It challenges us to predict house prices in Ames, Iowa, from 79 predictor variables using machine learning models. The competition has received over 20,000 submissions, and its well-analyzed dataset is an excellent resource for developing and showcasing our skills.
Objectives
What are the main house price ranges?
Identify the price ranges that encompass most homes in the dataset and how homes are distributed across them. This information will help segment the housing market and focus the analysis on the most relevant price ranges.
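One simple way to derive price ranges is quantile binning. The snippet below is a minimal sketch, not necessarily the method used in the notebook: it assumes pandas is available, uses toy prices invented for illustration, and the three labels (`low`, `mid`, `high`) are hypothetical names.

```python
import pandas as pd

# Toy sale prices, invented for illustration only (not taken from the Ames data).
prices = pd.Series(
    [90_000, 120_000, 135_000, 150_000, 165_000, 180_000, 210_000, 250_000, 320_000],
    name="SalePrice",
)

# Split the prices into three equally populated bins (terciles).
# The labels are hypothetical; any naming scheme would work.
price_range = pd.qcut(prices, q=3, labels=["low", "mid", "high"])

# Count how many homes fall in each range, in low-to-high order.
counts = price_range.value_counts().sort_index()
print(counts)
```

Quantile bins keep each range equally populated. Fixed-width bins (`pd.cut`) would be the alternative when round price boundaries matter more than balanced groups.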
In which areas can you locate these price ranges?
Determine the areas or neighborhoods where each price range is concentrated. By mapping price ranges to specific areas, I can uncover patterns and identify undervalued or overvalued regions.
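Mapping price ranges to neighborhoods can be sketched as a cross-tabulation. This is an illustrative example under stated assumptions: the rows are toy data invented for the sketch, only the column names `Neighborhood` and `SalePrice` follow the Kaggle dataset, and the `PriceRange` column and its labels are hypothetical.

```python
import pandas as pd

# Toy rows, invented for illustration; only the column names follow the Kaggle dataset.
homes = pd.DataFrame(
    {
        "Neighborhood": ["OldTown", "OldTown", "NAmes", "NAmes", "NridgHt", "NridgHt"],
        "SalePrice": [95_000, 110_000, 140_000, 155_000, 310_000, 340_000],
    }
)

# Hypothetical three-way split of prices into equally populated ranges.
homes["PriceRange"] = pd.qcut(homes["SalePrice"], q=3, labels=["low", "mid", "high"])

# Share of each neighborhood's homes that falls in each price range
# (normalize="index" makes every row sum to 1).
share = pd.crosstab(homes["Neighborhood"], homes["PriceRange"], normalize="index")
print(share)
```

Reading the table row by row shows which range dominates each neighborhood. Reading it column by column shows where a given range is concentrated.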
What variables best predict the price range of each home?
Identify the most influential variables, that is, those that most accurately predict the price range of an individual home.
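As a crude first pass at this question, numeric features can be ranked by how strongly they correlate with an ordinal price-range code. The sketch below is only that, a screening step I am assuming for illustration. It is not the modeling approach of the notebook, the rows are toy data, and `PriceRangeCode` is a hypothetical column name. A fitted model's feature importances would be a stronger basis for the final answer.

```python
import pandas as pd

# Toy rows, invented for illustration; only the column names follow the Kaggle dataset.
homes = pd.DataFrame(
    {
        "GrLivArea": [900, 1000, 1200, 1400, 1600, 1800, 2200, 2600],
        "OverallQual": [4, 5, 5, 6, 6, 7, 8, 9],
        "MoSold": [6, 3, 6, 3, 6, 3, 6, 3],
        "SalePrice": [100_000, 120_000, 140_000, 160_000,
                      180_000, 200_000, 250_000, 300_000],
    }
)

# Encode the price range as an ordinal code 0..3 (quartiles); labels=False returns integers.
homes["PriceRangeCode"] = pd.qcut(homes["SalePrice"], q=4, labels=False)

# Rank candidate features by absolute Pearson correlation with the range code.
features = ["GrLivArea", "OverallQual", "MoSold"]
ranking = (
    homes[features]
    .corrwith(homes["PriceRangeCode"])
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

In this toy data the month of sale carries no signal, so it lands at the bottom of the ranking. Correlation only captures linear, one-variable-at-a-time relationships, which is why it serves as a screening step rather than a conclusion.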
Following the CRISP-DM process, I'll systematically analyze and preprocess the data, build predictive models, and present the findings in a comprehensive blog post and notebook. This project will allow me to showcase my skills, including data exploration, feature engineering, model selection, and result interpretation.
The notebook and source code are available here: https://github.com/anibalsanchez/answering-house-prices-questions-based-on-advanced-regression-techniques