Need to level up your analysis skills? The best way is to get your hands dirty with actual data. But let's face it, working through the overused Titanic dataset yet again won't exactly pique your interest.
That's why I've compiled a list of really interesting datasets you can dig into. Each one comes with a few questions to guide your analysis.
Coffee Shop Sales Data
This is the perfect dataset for putting on your business analyst hat and finding the patterns that could help a real coffee shop boost its profits.
Here are some questions you can answer with your EDA:
When are the peak business hours?
What are the best-selling coffee drinks?
How do sales trends differ between weekdays and weekends? Is Monday a high-traffic day, or does the weekend bring in more customers?
Does demand change with the seasons?
What is the preferred payment method? Understanding the split between cash and card payments can inform decisions about payment processing systems.
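As a starting point for the peak-hours question, here is a minimal sketch that counts transactions per hour of the day. The rows, their layout (timestamp, drink, payment method), and the timestamp format are made up for illustration; the real dataset's columns may look different. It uses only the Python standard library, and the same idea maps onto a group-by in whatever analysis tool you prefer.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (timestamp, drink, payment method).
# Column layout and timestamp format are assumptions, not the real schema.
sales = [
    ("2024-03-01 08:15:00", "Latte", "card"),
    ("2024-03-01 08:40:00", "Americano", "cash"),
    ("2024-03-01 12:05:00", "Latte", "card"),
    ("2024-03-02 08:55:00", "Cappuccino", "card"),
    ("2024-03-02 16:20:00", "Latte", "cash"),
]


def sales_per_hour(rows):
    """Count transactions for each hour of the day (0-23)."""
    counts = Counter()
    for timestamp, _drink, _payment in rows:
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
        counts[hour] += 1
    return counts


hourly = sales_per_hour(sales)
# most_common(1) returns the single (hour, count) pair with the highest count.
peak_hour, peak_count = hourly.most_common(1)[0]
print(f"Busiest hour: {peak_hour}:00 with {peak_count} sales")
```

Swapping `.hour` for `.weekday()` on the parsed timestamp would give a similar breakdown for the weekday versus weekend question.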
The Sweet Science of Chocolate Bar Ratings
This dataset contains expert ratings of over 1,700 individual chocolate bars, including their origin, cocoa percentage, and the variety of bean used.
Here are some questions you can answer with your EDA:
Which countries of bean origin consistently produce the highest-rated chocolate? Is there a difference between where the beans are grown and where the bar is made?
What is the relationship between cocoa percentage and the chocolate's rating?
Which companies produce the most highly-rated chocolate bars? Do they specialize in a particular type of bean or origin?
How do ingredients like vanilla, lecithin, or salt affect the final rating of a chocolate bar?
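For the cocoa percentage question, one simple first check is the Pearson correlation coefficient. The sketch below computes it by hand so the arithmetic is visible. The numbers are invented for illustration and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented values for illustration only (not real ratings).
cocoa_percent = [60, 70, 72, 75, 85, 100]
rating = [3.0, 3.5, 3.75, 3.25, 2.75, 1.5]

r = pearson(cocoa_percent, rating)
print(f"Correlation between cocoa % and rating: {r:.2f}")
```

Keep in mind that this coefficient only captures linear relationships. If ratings peak at a mid-range cocoa percentage and fall off on both sides, a scatter plot will show it more clearly than a single number.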
Screen Time of Kids in India
This dataset provides a comprehensive view of children's screen time, connecting their usage habits directly to potential health outcomes and environmental factors.
Here are some questions you can answer with your EDA:
What are the most commonly reported Health_Impacts? Is there a strong correlation between the Avg_Daily_Screen_Time_hr and the likelihood of reporting a specific health impact like "eye strain" or "sleep issues"?
Does the Primary_Device used have an impact on the total screen time or the reported health effects?
The Learning Balance: What is the average Educational_to_Recreational_Ratio? Does this ratio differ by Age?
Urban vs. Rural Divide: How does the Avg_Daily_Screen_Time_hr differ between children in Urban_or_Rural areas?
Gender Dynamics: Are there any noticeable differences in the average screen time, primary device choice, or the educational-to-recreational ratio between genders?
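Several of the questions above boil down to comparing an average across groups. Here is a rough sketch of that pattern using the column names quoted in the questions (Avg_Daily_Screen_Time_hr, Urban_or_Rural, Age). The Gender column name, the category labels, and every value in the sample are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows reusing the column names mentioned above. The "Gender" column
# name, the labels, and all values are assumptions for illustration.
sample_csv = """Age,Gender,Avg_Daily_Screen_Time_hr,Urban_or_Rural
10,Male,3.5,Urban
12,Female,4.0,Urban
9,Female,2.5,Rural
14,Male,3.0,Rural
11,Male,4.5,Urban
"""


def mean_by_group(rows, group_col, value_col):
    """Average a numeric column separately for each value of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    return {key: mean(values) for key, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(sample_csv)))
by_area = mean_by_group(rows, "Urban_or_Rural", "Avg_Daily_Screen_Time_hr")
by_gender = mean_by_group(rows, "Gender", "Avg_Daily_Screen_Time_hr")
print(by_area)
print(by_gender)
```

To run this against the real file, replace `io.StringIO(sample_csv)` with an opened file handle and check the actual header names first.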
BoardGameGeek Dataset
This dataset contains information on over 20,000 board games, including their complexity, average rating, recommended number of players, and categories (like 'Strategy' or 'Party').
Here are some questions you can answer with your EDA:
Do more complex ("heavy") games receive higher or lower ratings on average?
What is the ideal number of players for the highest-rated games? Do games that support a wider range of players tend to be more popular?
Which game categories (e.g., Wargames, Family, Abstract) are the most common, and which have the highest average ratings?
How has the average complexity or rating of board games changed over the years? Are modern board games more or less complex than older ones?
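For the trend-over-time question, grouping games by decade of publication is one simple way to smooth out year-to-year noise. The sketch below assumes each record has a publication year, a complexity score, and an average rating; the tuple layout and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (year_published, complexity, average_rating).
# The values are invented and the field layout is an assumption.
games = [
    (1995, 2.3, 7.1),
    (1998, 1.8, 6.5),
    (2008, 2.8, 7.6),
    (2012, 3.1, 7.9),
    (2017, 3.9, 8.3),
    (2019, 2.4, 7.4),
]


def average_by_decade(records):
    """Return {decade: (mean complexity, mean rating)}, ordered by decade."""
    buckets = defaultdict(list)
    for year, complexity, rating in records:
        decade = year // 10 * 10  # e.g. 1998 -> 1990
        buckets[decade].append((complexity, rating))
    return {
        decade: (
            mean([c for c, _ in values]),
            mean([r for _, r in values]),
        )
        for decade, values in sorted(buckets.items())
    }


trend = average_by_decade(games)
for decade, (avg_complexity, avg_rating) in trend.items():
    print(f"{decade}s: complexity {avg_complexity:.2f}, rating {avg_rating:.2f}")
```

With the full dataset, it is worth checking how many games fall into each decade, since averages over a handful of old games can be misleading.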
Student Stress Monitoring Dataset
This dataset contains physiological data collected from students through surveys, designed for detecting their stress levels. It includes 20 features.
Here are some questions you can answer with your EDA:
Do students with high future_career_concerns also report higher levels of anxiety?
What is the relationship between academic performance and self-esteem?
Is there a difference in self-esteem between students who have experienced bullying and those who have not?
Is there a relationship between sleep quality and anxiety level?
Do individuals with high depression scores report a higher frequency of headache or breathing problems?
Is there a connection between a high noise level and poor sleep quality?
Do students who engage in more extracurricular activities report lower anxiety_levels?
Mushroom Dataset
It includes over 8,000 samples of mushrooms, described by 22 different features like cap shape, color, gill size, and habitat.
Here are some questions you can answer with your EDA:
Are poisonous mushrooms more likely to be found in certain habitats (e.g., woods, meadows, urban areas)?
Is there a relationship between the cap color and the gill color of a mushroom? Do certain color combinations indicate toxicity?
Can you find any features that have almost no correlation with whether a mushroom is poisonous, debunking common myths?
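Because nearly every mushroom feature is categorical, a useful first move is to compute the share of poisonous samples within each category. Below is a minimal sketch for the habitat question. The (habitat, class) pairs are invented, and the real dataset may encode these values differently, for example as single-letter codes.

```python
from collections import Counter

# Invented (habitat, class) pairs for illustration only. The real dataset
# may encode these values differently.
samples = [
    ("woods", "edible"), ("woods", "poisonous"),
    ("woods", "edible"), ("woods", "edible"),
    ("urban", "poisonous"), ("urban", "poisonous"), ("urban", "edible"),
    ("meadows", "edible"), ("meadows", "poisonous"),
]


def poisonous_share(pairs):
    """Fraction of samples labeled poisonous within each habitat."""
    totals = Counter(habitat for habitat, _ in pairs)
    poisonous = Counter(
        habitat for habitat, label in pairs if label == "poisonous"
    )
    # Counter returns 0 for habitats with no poisonous samples.
    return {habitat: poisonous[habitat] / totals[habitat] for habitat in totals}


shares = poisonous_share(samples)
for habitat, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{habitat}: {share:.0%} poisonous")
```

The same function works for any other feature, such as cap color or gill size, by changing which column feeds the first element of each pair. A feature whose shares are nearly identical across all its categories is a candidate for the "almost no relationship" question.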
Phone Price Dataset
This dataset tells you why a phone might be priced the way it is. It’s a great scenario for practicing feature analysis to understand value, and it is well suited to exploring correlations and uncovering surprising relationships.
Here are some questions you can answer with your EDA:
How strong is the relationship between the amount of RAM and the phone's price range?
Do more megapixels in the primary camera or front camera actually lead to a higher price?
Is there a clear trend where phones with a higher battery power command a higher price?
Do greater screen height and width consistently place a phone in a more expensive category?
How does the amount of internal memory correlate with price? Is it as influential as RAM?
World Happiness Dataset
This dataset contains data from the annual World Happiness Report, which ranks countries based on factors like GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption.
Here are some questions you can answer with your EDA:
Which factor (e.g., GDP, social support, life expectancy) has the strongest correlation with a country's overall happiness score?
Are there specific regions of the world that consistently rank higher or lower?
How have happiness scores for different countries changed between 2015 and 2022? Which countries have seen the most significant increase or decrease in happiness?
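For the question about change over time, one approach is to subtract each country's first-year score from its last-year score and sort the results. The sketch below uses placeholder country names and invented scores, and it skips any country that is missing one of the two years.

```python
# Placeholder countries and invented scores, for illustration only.
scores = {
    "Country A": {2015: 5.0, 2022: 5.5},
    "Country B": {2015: 7.0, 2022: 6.5},
    "Country C": {2015: 4.0, 2022: 5.5},
    "Country D": {2015: 6.0},  # no 2022 entry, so it is skipped
}


def score_change(data, start, end):
    """Change in score from start year to end year, per country."""
    changes = {}
    for country, by_year in data.items():
        # Only compare countries that have a score in both years.
        if start in by_year and end in by_year:
            changes[country] = by_year[end] - by_year[start]
    return changes


changes = score_change(scores, 2015, 2022)
ranked = sorted(changes.items(), key=lambda item: item[1], reverse=True)
biggest_gain = ranked[0]
biggest_drop = ranked[-1]
print("Largest increase:", biggest_gain)
print("Largest decrease:", biggest_drop)
```

When working with the real reports, check that country names and column names are consistent across years before comparing them.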
These datasets are more than just rows and columns; they're a playground for your curiosity, waiting for you to uncover the next great insight. Whether you're now inspired to predict a coffee shop's rush hour or track down the most highly rated chocolates, I hope this list has sparked your interest. So go on, fire up your favorite analysis tool, and pick your adventure.
Resources:
Get started with EDA using Python: Link
Complete Data Analyst Roadmap: Link