The Relationship between Weather and Yelp Reviews

1. Introduction


Does the weather affect people's moods so significantly that they will rate restaurants lower or higher? In this project, we use Yelp and NOAA weather data to investigate the relationship between the weather and restaurant reviews. In the previous report, we described our data collection process, which pulled from both Yelp and NOAA.gov. That process resulted in ~17,000 Yelp reviews from ~5,800 restaurants across the US, and daily weather measures (precipitation, snow, and temperature) from 7,711 weather stations across the US.

In this project, we begin by merging the datasets (by latitude and longitude) to explore the relationship between the reviews and the weather on the day of the review. From there, we run sentiment analysis on the review text and perform exploratory and predictive analysis on the data. We do not find a strong correlation between Yelp reviews and weather. Using machine learning classification methods, we cannot predict the rating based on weather and location better than random chance. Although the data do not support an affirmative answer to our principal data science question, we find some interesting results.
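
As an illustration of what a merge by latitude and longitude can look like, the sketch below pairs each review with the closest weather station and then looks up that station's observation for the review date. It is a minimal sketch under stated assumptions, not necessarily the procedure used in this project: the nearest-station rule, the great-circle (haversine) distance, and all field names (`lat`, `lon`, `id`, `date`, `prcp`, `snow`, `tmax`) are hypothetical choices made for the example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_station(lat, lon, stations):
    """Return the station dict closest to (lat, lon). Assumed matching rule."""
    return min(stations, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))

def attach_weather(reviews, stations, weather):
    """Join each review to its nearest station's weather on the review date.

    `weather` maps (station_id, date) -> dict of daily measures (hypothetical layout).
    Reviews with no matching observation are dropped.
    """
    merged = []
    for rev in reviews:
        st = nearest_station(rev["lat"], rev["lon"], stations)
        obs = weather.get((st["id"], rev["date"]))
        if obs is not None:
            merged.append({**rev, "station_id": st["id"], **obs})
    return merged
```

A linear scan over every station for every review is adequate at the scale of this dataset (thousands of stations, tens of thousands of reviews); a spatial index would be the natural replacement for much larger inputs.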

The project is divided into the following sections. Section 2 explains the Preprocessing, followed by Binning in Section 3, Sentiment Analysis in Section 4, and Basic Statistical Analysis in Section 5. From there, we examine Histograms and Outliers in Section 6 and review our Data Cleaning in Section 7. Next, we compute Correlations and examine Scatterplots of the data in Section 8. In Section 9, we discuss Clustering. In Section 10, we compute Association Rules. Finally, in Section 11, we perform Hypothesis Testing.