Pandas Capstone Project: Visualizing apps from the Apple App Store

Explore Apple App Store data with this exploratory data analysis (EDA) project. From loading and preparing data to creating insightful visualizations, you'll analyze app reviews and trends over time, and compare free vs paid apps. Dive into histograms, line plots, scatter plots, and bar charts while mastering pandas and matplotlib.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Create a column `Average Size in MB` using the column `Size_Bytes`

1,024 bytes make a kilobyte, and 1,024 kilobytes make a megabyte. So a megabyte equals 1,048,576 bytes, that is, (1024 * 1024) bytes.
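
The conversion above can be sketched as follows. This is a minimal illustration, not the project's official solution: the three rows are made-up sample values, and only the column names `Size_Bytes` and `Average Size in MB` come from the activity.

```python
import pandas as pd

# Made-up sample rows; the real project loads the App Store dataset instead.
df = pd.DataFrame({"Size_Bytes": [1_048_576, 52_428_800, 157_286_400]})

BYTES_PER_MB = 1024 * 1024  # 1,048,576 bytes in a megabyte

# Dividing the whole column at once is vectorized; no loop is needed.
df["Average Size in MB"] = df["Size_Bytes"] / BYTES_PER_MB
print(df)
```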

Find the top 20 apps by number of reviews

Create a dataframe top_20_apps_df containing the 20 apps with the most reviews, sorted in descending order and keeping all the columns. The first 5 apps should be:

YouTube: Watch, Listen, Stream
Instagram
Spotify New Music and Podcasts
Venmo
DoorDash - Food Delivery
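
One possible way to build such a dataframe is to sort by the review count and keep the first 20 rows. The sketch below uses made-up data, and the `App_Name` column is an assumed name for illustration; only `Reviews` and `top_20_apps_df` appear in the activities.

```python
import pandas as pd

# Made-up data: 30 hypothetical apps with increasing review counts.
df = pd.DataFrame({
    "App_Name": [f"App {i}" for i in range(30)],  # assumed column name
    "Reviews": [i * 1000 for i in range(30)],
})

# Sort by review count, highest first, and keep the first 20 rows.
# All columns are preserved because the whole dataframe is sorted.
top_20_apps_df = df.sort_values("Reviews", ascending=False).head(20)
print(top_20_apps_df.head())
```

`df.nlargest(20, "Reviews")` is a shorter alternative that should select the same rows in the same order.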

Create a histogram showing the number of reviews apps get

Most published apps get (sadly) just forgotten in the App Store without anybody ever downloading or reviewing them. Let's take a look at that. Create a histogram showing the distribution of Reviews.

Create your plot using the pandas .plot method, with the title "Distributions of Reviews in the AppStore" and a figure size of (14, 7).
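
A minimal sketch of such a histogram, assuming a dataframe `df` with a `Reviews` column. The skewed random numbers below are made up to mimic the shape described above; they are not the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Made-up, heavily skewed review counts standing in for the real column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Reviews": (rng.pareto(1.5, size=1000) * 1000).astype(int)})

# kind="hist" draws a histogram; title and figsize are passed straight through.
ax = df["Reviews"].plot(
    kind="hist",
    title="Distributions of Reviews in the AppStore",
    figsize=(14, 7),
)
```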

Plot the distribution of Reviews with `100` bins and a Y limit to `100`

The previous distribution plot doesn't help much with the analysis. We see a big blue bar on the "low" side of reviews and not much more. So, let's zoom in.

Create another histogram showing the distribution of the reviews. Set the title to "Distributions of Reviews in the AppStore (max 100 apps per bin)" and the figure size to (14, 7).

Important: this time, set the number of bins to 100 and the maximum value of the Y axis to 100 as well. Capping the Y axis at 100 apps per bin cuts off the tallest bar and lets us see the other bins much better.
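
The zoomed-in version could look like the sketch below. The data is again made up; `bins=100` controls the number of bars, and the Y limit is set on the returned axes object.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Made-up, heavily skewed review counts standing in for the real column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Reviews": (rng.pareto(1.5, size=1000) * 1000).astype(int)})

ax = df["Reviews"].plot(
    kind="hist",
    bins=100,  # 100 equal-width bins instead of the default 10
    title="Distributions of Reviews in the AppStore (max 100 apps per bin)",
    figsize=(14, 7),
)
# Cap the Y axis so the tallest bar no longer hides the rest.
ax.set_ylim(0, 100)
```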

Plot the same distribution as before, but improve the X-axis format

If you look closely at the X axis of our previous plot, it's a little hard to read because, by default, pandas and matplotlib use scientific notation (1e7). Plot THE SAME distribution plot (same title, bins, everything), but this time change the X axis to use a human-readable formatter with commas as thousands separators and no decimals. For example, the number 1e5 will become 100,000.
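
One way to get that formatting is matplotlib's `StrMethodFormatter` with the format string `"{x:,.0f}"`; `FuncFormatter` would work as well. The sketch below uses made-up data and is only one of several possible approaches.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
from matplotlib.ticker import StrMethodFormatter

# Made-up review counts, large enough to trigger scientific notation.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Reviews": (rng.pareto(1.5, size=1000) * 100_000).astype(int)})

ax = df["Reviews"].plot(
    kind="hist",
    bins=100,
    title="Distributions of Reviews in the AppStore (max 100 apps per bin)",
    figsize=(14, 7),
)
ax.set_ylim(0, 100)

# "," adds thousands separators and ".0f" drops the decimals.
ax.xaxis.set_major_formatter(StrMethodFormatter("{x:,.0f}"))
```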

Create a line plot that displays the total apps released by month

Using monthly_app_summary_df, create a line plot with the Month Released column on the X axis and the Total Apps Released column on the Y axis.

The title of the plot should be "Total apps released per month".
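
A minimal sketch of this line plot follows. The contents of `monthly_app_summary_df` are made up here, and storing `Month Released` as datetimes is an assumption; only the dataframe and column names come from the activity.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical monthly summary; the real one is built earlier in the project.
monthly_app_summary_df = pd.DataFrame({
    "Month Released": pd.date_range("2020-01-01", periods=6, freq="MS"),
    "Total Apps Released": [120, 135, 150, 160, 190, 210],
})

ax = monthly_app_summary_df.plot(
    kind="line",
    x="Month Released",
    y="Total Apps Released",
    title="Total apps released per month",
)
```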

Create a scatter plot showing the `Month Released` in the X-axis and `Average Size in MB` in the Y-axis

Now create a scatter plot to show the relationship between the month an app was released and its average size in MB. Let's try to uncover any trends!

The title of your plot should be "Average size of apps over time".
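
A sketch of the scatter plot, under two assumptions: that the monthly summary dataframe also carries an `Average Size in MB` column, and that a reasonably recent pandas version is in use (older releases rejected datetime values on the X axis of a scatter plot). The numbers are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical monthly summary with an average-size column (assumed layout).
monthly_app_summary_df = pd.DataFrame({
    "Month Released": pd.date_range("2020-01-01", periods=6, freq="MS"),
    "Average Size in MB": [85.2, 88.0, 91.5, 90.1, 97.3, 102.8],
})

ax = monthly_app_summary_df.plot(
    kind="scatter",
    x="Month Released",
    y="Average Size in MB",
    title="Average size of apps over time",
)
```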

Create a scatter plot that shows the number of reviews in the X axis, and the rating on the Y axis

Use the top_20_apps_df dataframe created in the second activity to create a scatter plot that shows the relationship between the number of reviews (on the X axis) and the average user rating (on the Y axis). The title of your plot should be "20 Most popular apps: Reviews vs Rating".
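
A minimal sketch follows. The column name `Average_User_Rating` is an assumption (the activity only says "Average user rating"), and the five rows are made-up stand-ins for the real top 20.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Made-up stand-in for top_20_apps_df; the rating column name is assumed.
top_20_apps_df = pd.DataFrame({
    "Reviews": [900_000, 750_000, 600_000, 450_000, 300_000],
    "Average_User_Rating": [4.7, 4.8, 4.5, 4.9, 4.6],
})

ax = top_20_apps_df.plot(
    kind="scatter",
    x="Reviews",
    y="Average_User_Rating",
    title="20 Most popular apps: Reviews vs Rating",
)
```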

Modify the previous plot to show the size of each bubble based on the size of the app in Megabytes

Update your previous scatter plot to show the size of each marker (or bubble) based on the Average size in Megabytes column. Everything else should stay the same (including the title).
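
In pandas, marker size is controlled by the `s` argument of a scatter plot. The sketch below passes the size column directly; the data and the `Average_User_Rating` column name are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Made-up stand-in for top_20_apps_df; the rating column name is assumed.
top_20_apps_df = pd.DataFrame({
    "Reviews": [900_000, 750_000, 600_000, 450_000, 300_000],
    "Average_User_Rating": [4.7, 4.8, 4.5, 4.9, 4.6],
    "Average Size in MB": [120.0, 250.0, 80.0, 300.0, 60.0],
})

ax = top_20_apps_df.plot(
    kind="scatter",
    x="Reviews",
    y="Average_User_Rating",
    s=top_20_apps_df["Average Size in MB"],  # one marker size per app
    title="20 Most popular apps: Reviews vs Rating",
)
```

matplotlib interprets `s` as the marker area in points squared, so very large or very small apps may need the column multiplied by a constant to stay readable.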

Create a bar chart comparing the total number of `Free` to `Paid` apps

The column Free indicates if an app is free or paid (with values True and False respectively). Create a bar chart that compares the total number of apps Free vs Paid. The title of your chart should be 'Free vs Paid apps on the AppStore'.
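
One possible approach is to count the values of the `Free` column with `value_counts()` and plot the result as bars. The six rows below are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Made-up sample: four free apps and two paid apps.
df = pd.DataFrame({"Free": [True, True, True, False, True, False]})

# value_counts() returns one row per distinct value with its frequency.
free_counts = df["Free"].value_counts()

ax = free_counts.plot(kind="bar", title="Free vs Paid apps on the AppStore")
```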

Create a bar chart showing the total number of apps by genre ascending order

Create a bar chart (titled "Total apps by Genre, in ascending order") showing how many apps there are per genre in the App Store, in ascending order.
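
A sketch of the counting and sorting step, assuming the genre is stored in a column called `Primary_Genre` (a hypothetical name; the activity does not give it). The rows are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Made-up sample; "Primary_Genre" is an assumed column name.
df = pd.DataFrame({
    "Primary_Genre": ["Games"] * 5 + ["Education"] * 3 + ["Finance"] * 1,
})

# ascending=True puts the least common genre first.
genre_counts = df["Primary_Genre"].value_counts(ascending=True)

ax = genre_counts.plot(kind="bar", title="Total apps by Genre, in ascending order")
```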

Create a multi-bar chart that compares the number of Free vs Paid apps per each Category/Genre

We'll now expand on our categorical bar chart. Use the genre_free_apps dataframe to create a bar chart that shows the number of Free vs Paid apps per category/genre. The title of your chart should be "Apps per Genre: Free vs Paid".
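
The layout of genre_free_apps is not shown here, so the sketch below assumes one row per genre and one column per group (`Free` and `Paid`). With that shape, pandas draws one bar per column side by side for each genre. All values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical shape for genre_free_apps: genres as index, one column per group.
genre_free_apps = pd.DataFrame(
    {"Free": [40, 25, 10], "Paid": [15, 5, 2]},
    index=["Games", "Education", "Finance"],
)

# Each column becomes its own bar series, grouped by the index labels.
ax = genre_free_apps.plot(kind="bar", title="Apps per Genre: Free vs Paid")
```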

Modify the previous bar chart and show the Free vs Paid apps in a stacked form

Instead of showing two bars per category, stack the Free vs Paid features into a single bar. Keep everything else, including the title, as it was.
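
With pandas, stacking is a single extra argument, `stacked=True`. The sketch below reuses an assumed layout for genre_free_apps (genres as index, `Free` and `Paid` as columns) with made-up values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical shape for genre_free_apps: genres as index, one column per group.
genre_free_apps = pd.DataFrame(
    {"Free": [40, 25, 10], "Paid": [15, 5, 2]},
    index=["Games", "Education", "Finance"],
)

# stacked=True places each Paid segment on top of the matching Free segment.
ax = genre_free_apps.plot(
    kind="bar",
    stacked=True,
    title="Apps per Genre: Free vs Paid",
)
```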

Author

Santiago Basulto

This project is part of

Intro to Pandas for Data Analysis
