1024 bytes form a kilobyte, and 1024 kilobytes form a megabyte. So, a megabyte is equal to 1,048,576 bytes, that is, (1024 * 1024) bytes.
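A minimal sketch of the bytes-to-megabytes conversion in pandas, assuming the sizes are stored in bytes. The toy dataframe and its column names (App_Name, Size_Bytes, Size_MB) are hypothetical stand-ins, since the real dataset's columns aren't shown here.

```python
import pandas as pd

# Toy data: "App_Name" and "Size_Bytes" are hypothetical column names
df = pd.DataFrame({
    "App_Name": ["App A", "App B", "App C"],
    "Size_Bytes": [1_048_576, 52_428_800, 3_145_728],
})

BYTES_PER_MB = 1024 * 1024  # 1,048,576 bytes in one megabyte

# Hypothetical new column holding the size in megabytes
df["Size_MB"] = df["Size_Bytes"] / BYTES_PER_MB
```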
Here's an example of how your column should look:
Create a dataframe top_20_apps_df that contains the top 20 apps with the most reviews, sorted in descending order and including all the columns. The first 5 apps should be:
YouTube: Watch, Listen, Stream
Instagram
Spotify New Music and Podcasts
Venmo
DoorDash - Food Delivery
And your dataframe should look something like:
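One way to build a top-N dataframe is to sort by the review count and keep the first rows. This sketch uses a toy dataframe with made-up values; App_Name is a hypothetical column name, while Reviews is the column the activities refer to.

```python
import pandas as pd

# Toy data: "App_Name" is hypothetical; "Reviews" is the column named in the text
df = pd.DataFrame({
    "App_Name": [f"App {i}" for i in range(30)],
    "Reviews": [(i * 919) % 1000 for i in range(30)],  # 30 distinct made-up counts
})

# Sort by review count, largest first, and keep the first 20 rows (all columns)
top_20_apps_df = df.sort_values("Reviews", ascending=False).head(20)
```

df.nlargest(20, "Reviews") selects the same top rows in a single call, though it may break ties differently than sort_values.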
Most published apps (sadly) just get forgotten in the App Store without anybody ever downloading or reviewing them. Let's take a look at that. Create a histogram showing the distribution of Reviews.
Create your plot using the pandas .plot method, set the title to Distributions of Reviews in the AppStore, and use a figure size of (14, 7).
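A minimal sketch of this histogram with the pandas .plot method. The review counts are made up and heavily skewed to mimic the real data; only the Reviews column name comes from the text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; skip in a notebook
import pandas as pd

# Made-up, heavily skewed review counts standing in for the real "Reviews" column
df = pd.DataFrame({"Reviews": [0, 0, 1, 2, 3, 5, 8, 40, 120, 900, 25_000, 2_000_000]})

# Series.plot with kind="hist" draws the histogram and returns the matplotlib Axes
ax = df["Reviews"].plot(
    kind="hist",
    title="Distributions of Reviews in the AppStore",
    figsize=(14, 7),
)
```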
Your plot should look something like:
The previous distribution plot doesn't help much with the analysis. We see a big blue bar on the "low" side of reviews and not much more. So, let's zoom in.
Create another histogram showing the distribution of the reviews. Set the title to Distributions of Reviews in the AppStore (max 100 apps per bin) and use a figure size of (14, 7).
But IMPORTANT: this time, set the number of bins to 100 and the maximum value of the Y axis to 100 as well. The Y axis of a histogram counts apps per bin, so capping it at 100 cuts off the huge bar of barely reviewed apps and lets us see the other apps better.
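A sketch of the zoomed-in version, again on made-up data: bins sets the number of bars and ylim caps the Y axis (apps per bin). Both are standard keyword arguments of the pandas .plot method.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import pandas as pd

# Made-up review counts standing in for the real "Reviews" column
df = pd.DataFrame({"Reviews": [0, 0, 1, 2, 3, 5, 8, 40, 120, 900, 25_000, 2_000_000]})

ax = df["Reviews"].plot(
    kind="hist",
    bins=100,        # 100 bins instead of the default 10
    ylim=(0, 100),   # show at most 100 apps per bin on the Y axis
    title="Distributions of Reviews in the AppStore (max 100 apps per bin)",
    figsize=(14, 7),
)
```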
Your plot should look something like:
If you look closely at the X axis of our previous plot, it's a little hard to read because, by default, pandas and matplotlib use scientific notation (1e7). Plot THE SAME distribution (same title, bins, everything), but this time change the X axis to use a human-readable formatter with commas as thousands separators and no decimals. For example, the number 1e5 becomes 100,000.
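One way to get comma-separated, decimal-free tick labels is matplotlib's StrMethodFormatter, which formats each tick value x with a Python format string. This sketch reuses the made-up review data from the previous steps.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import pandas as pd
from matplotlib.ticker import StrMethodFormatter

# Made-up review counts standing in for the real "Reviews" column
df = pd.DataFrame({"Reviews": [0, 0, 1, 2, 3, 5, 8, 40, 120, 900, 25_000, 2_000_000]})

ax = df["Reviews"].plot(
    kind="hist",
    bins=100,
    ylim=(0, 100),
    title="Distributions of Reviews in the AppStore (max 100 apps per bin)",
    figsize=(14, 7),
)

# "{x:,.0f}": comma thousands separators and zero decimals (1e5 -> "100,000")
ax.xaxis.set_major_formatter(StrMethodFormatter("{x:,.0f}"))
```

matplotlib.ticker.FuncFormatter with lambda x, pos: f"{x:,.0f}" would produce the same labels.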
Your plot should look something like:
Using monthly_app_summary_df, create a line plot with the Month Released column on the X axis and the Total Apps Released column on the Y axis.
The title of the plot should be "Total apps released per month".
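A sketch of the line plot, passing column names to x and y. The dataframe below is a small made-up stand-in for monthly_app_summary_df, whose real contents aren't shown here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import pandas as pd

# Toy stand-in for monthly_app_summary_df; the values are made up
monthly_app_summary_df = pd.DataFrame({
    "Month Released": ["2020-01", "2020-02", "2020-03", "2020-04", "2020-05", "2020-06"],
    "Total Apps Released": [120, 135, 128, 150, 160, 158],
})

# x and y take column names; pandas draws one line for the y column
ax = monthly_app_summary_df.plot(
    x="Month Released",
    y="Total Apps Released",
    title="Total apps released per month",
)
```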
Your plot should look something like:
Now create a scatter plot showing the relationship between the month the app was released and the Average size in MB. Let's try to uncover any trends!
The title of your plot should be "Average size of apps over time".
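A sketch of this scatter plot that calls matplotlib's Axes.scatter directly, so the release months can be plotted as real dates. It assumes monthly_app_summary_df holds datetime months; the Average Size MB column name and all values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy monthly summary; "Average Size MB" is a hypothetical column name, values are made up
monthly_app_summary_df = pd.DataFrame({
    "Month Released": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01"]
    ),
    "Average Size MB": [95.2, 97.8, 96.1, 101.4, 103.0, 104.7],
})

# One point per month: release month on X, average app size on Y
fig, ax = plt.subplots()
ax.scatter(monthly_app_summary_df["Month Released"], monthly_app_summary_df["Average Size MB"])
ax.set_title("Average size of apps over time")
ax.set_xlabel("Month Released")
ax.set_ylabel("Average Size MB")
```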
Your plot should look something like:
Use the top_20_apps_df dataframe created in the 2nd activity to create a scatter plot showing the relationship between the number of reviews (on the X axis) and the Average user rating (on the Y axis). The title of your plot should be "20 Most popular apps: Reviews vs Rating".
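A sketch using pandas' kind="scatter". To keep it self-contained, top_20_apps_df is replaced by a 5-row toy dataframe with made-up values, and Average_User_Rating is a hypothetical column name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import pandas as pd

# Toy stand-in for top_20_apps_df (made-up values);
# "Average_User_Rating" is a hypothetical column name
top_20_apps_df = pd.DataFrame({
    "Reviews": [2_500_000, 1_900_000, 1_400_000, 1_100_000, 900_000],
    "Average_User_Rating": [4.5, 4.8, 4.7, 4.9, 4.6],
})

ax = top_20_apps_df.plot(
    kind="scatter",
    x="Reviews",              # number of reviews on the X axis
    y="Average_User_Rating",  # average rating on the Y axis
    title="20 Most popular apps: Reviews vs Rating",
)
```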
Your plot should look something like:
Update your previous scatter plot to set the size of each marker (or bubble) based on the Average size in Megabytes column. Everything else should stay the same (including the title).
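A sketch of the bubble version: pandas' scatter plot accepts an s argument that sets each marker's area, so passing the size column makes bigger apps draw bigger bubbles. The toy data and the Size_MB and Average_User_Rating column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import pandas as pd

# Toy stand-in for top_20_apps_df; "Size_MB" and "Average_User_Rating" are hypothetical names
top_20_apps_df = pd.DataFrame({
    "Reviews": [2_500_000, 1_900_000, 1_400_000, 1_100_000, 900_000],
    "Average_User_Rating": [4.5, 4.8, 4.7, 4.9, 4.6],
    "Size_MB": [150.0, 80.0, 120.0, 200.0, 60.0],
})

ax = top_20_apps_df.plot(
    kind="scatter",
    x="Reviews",
    y="Average_User_Rating",
    s=top_20_apps_df["Size_MB"],  # marker area (points^2) taken from the app size
    title="20 Most popular apps: Reviews vs Rating",
)
```

Because s is a marker area in points squared, multiplying the column by a constant is a simple way to make the bubbles easier to compare if they come out too small or too large.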
Your plot should look something like:
The column Free indicates whether an app is free or paid (with values True and False, respectively). Create a bar chart comparing the total number of free vs paid apps. The title of your chart should be "Free vs Paid apps on the AppStore".
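A sketch of one approach: relabel the booleans, count each group with value_counts, and plot the counts as bars. The Free column comes from the text; the values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import pandas as pd

# Toy data: True = free, False = paid, as described above (values made up)
df = pd.DataFrame({"Free": [True, True, False, True, False, True]})

# Relabel the booleans so the bars read "Free"/"Paid", then count each group
counts = df["Free"].map({True: "Free", False: "Paid"}).value_counts()
ax = counts.plot(kind="bar", title="Free vs Paid apps on the AppStore")
```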
Your plot should look something like:
Create a bar chart (titled "Total apps by Genre, in ascending order") showing how many apps there are per genre in the App Store, in ascending order.
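A sketch using value_counts(ascending=True), which counts the apps per genre and sorts the counts from smallest to largest in one step. Genre is a hypothetical column name and the rows are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import pandas as pd

# Toy data; "Genre" is a hypothetical column name
df = pd.DataFrame({"Genre": ["Games", "Games", "Games", "Education", "Education", "Music"]})

# Count apps per genre, smallest count first
apps_per_genre = df["Genre"].value_counts(ascending=True)
ax = apps_per_genre.plot(kind="bar", title="Total apps by Genre, in ascending order")
```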
Your plot should look something like:
We'll now expand on our categorical bar chart. Use the genre_free_apps dataframe to create a bar chart showing the number of free vs paid apps per category/genre. The title of your chart should be "Apps per Genre: Free vs Paid".
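The text doesn't show how genre_free_apps is built, so this sketch assumes one row per genre and one column per app type (Free/Paid), built here with pd.crosstab from made-up data. Plotting a dataframe with kind="bar" then draws one bar per column for each genre.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import pandas as pd

# Toy data; "Genre" is a hypothetical column name, values are made up
df = pd.DataFrame({
    "Genre": ["Games", "Games", "Games", "Education", "Education", "Music"],
    "Free": [True, False, True, True, False, True],
})

# Hypothetical construction of genre_free_apps: rows = genres,
# columns = "Free"/"Paid", cells = number of apps
app_type = df["Free"].map({True: "Free", False: "Paid"}).rename("Type")
genre_free_apps = pd.crosstab(df["Genre"], app_type)

# Grouped bars: one Free bar and one Paid bar per genre
ax = genre_free_apps.plot(kind="bar", title="Apps per Genre: Free vs Paid")
```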
Your plot should look something like:
Instead of showing two bars per genre, stack the Free and Paid counts into a single bar. Keep everything else, including the title, as it was.
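A sketch of the stacked version: the only change from the grouped chart is stacked=True, which draws each genre's Paid count on top of its Free count. As before, genre_free_apps is a hypothetical crosstab built from made-up data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside a notebook
import pandas as pd

# Toy data and a hypothetical genre_free_apps (rows = genres, columns = Free/Paid)
df = pd.DataFrame({
    "Genre": ["Games", "Games", "Games", "Education", "Education", "Music"],
    "Free": [True, False, True, True, False, True],
})
app_type = df["Free"].map({True: "Free", False: "Paid"}).rename("Type")
genre_free_apps = pd.crosstab(df["Genre"], app_type)

# stacked=True puts both counts for a genre into one bar
ax = genre_free_apps.plot(kind="bar", stacked=True, title="Apps per Genre: Free vs Paid")
```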