All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
This activity involves visualizing data related to avocado sales volume per region. Carry out the following steps:
Start by grouping the DataFrame by region.
Calculate the sum of avocados sold for each region and then reset the index of the resulting DataFrame.
Order the regions by their total volume in descending order and exclude TotalUS to focus on individual regions.
Represent the processed data in a bar chart where the x-axis denotes the regions and the y-axis indicates the total volume of sales.
Use the following parameters while plotting:
Figure size : 12 by 8
Color : teal
Title: Total Volume of Avocados Sold by Region
xticks rotation : 90
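The steps above could be sketched roughly as follows. The small DataFrame is a hypothetical stand-in for the avocado dataset, and the axis label text is an assumption, since the activity only names the title.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows standing in for the real avocado dataset.
df = pd.DataFrame({
    "region": ["Albany", "Boston", "TotalUS", "Albany", "Boston", "TotalUS"],
    "Total Volume": [100.0, 300.0, 400.0, 150.0, 250.0, 400.0],
})

# Group by region, sum the volume, and reset the index.
region_volume = df.groupby("region")["Total Volume"].sum().reset_index()

# Sort in descending order of volume, then drop the TotalUS aggregate.
region_volume = region_volume.sort_values("Total Volume", ascending=False)
region_volume = region_volume[region_volume["region"] != "TotalUS"]

plt.figure(figsize=(12, 8))
plt.bar(region_volume["region"], region_volume["Total Volume"], color="teal")
plt.title("Total Volume of Avocados Sold by Region")
plt.xlabel("Region")        # label text is an assumption
plt.ylabel("Total Volume")  # label text is an assumption
plt.xticks(rotation=90)
plt.show()
```

Excluding TotalUS matters because it aggregates every other region and would otherwise dwarf the individual bars.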
Group the DataFrame by year to calculate the average of the AveragePrice of avocados for each year. Once you have that, create a line plot with the years on the x-axis and the average AveragePrice on the y-axis. Use the following information for your plot:
Figure size : 10 by 6
Colour : blue
Title: Average Price of Avocados by Year
markers : 'o'
linestyle : '-'
Finally, add gridlines to help visualize the changes in price more clearly, and then display your plot.
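One possible way to put this together is shown below. The sample DataFrame is hypothetical, and the axis labels are assumptions that the activity does not specify.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows standing in for the real avocado dataset.
df = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2017, 2017],
    "AveragePrice": [1.0, 1.5, 1.25, 1.75, 1.5, 2.0],
})

# Mean of AveragePrice per year; the result is a Series indexed by year.
yearly_avg = df.groupby("year")["AveragePrice"].mean()

plt.figure(figsize=(10, 6))
plt.plot(yearly_avg.index, yearly_avg.values,
         color="blue", marker="o", linestyle="-")
plt.title("Average Price of Avocados by Year")
plt.xlabel("Year")           # label text is an assumption
plt.ylabel("Average Price")  # label text is an assumption
plt.grid(True)
plt.show()
```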
Begin by setting up your figure with a size of 10 by 6. Next, create a histogram to visualize how the average price of avocados is distributed. Specify the number of bins (10 in this case) to break down the data. Choose colors that make the bars stand out: skyblue for the bars and black for the edges. Add labels to your axes and a title, Distribution of Average Price, to make clear what the histogram represents. Finally, include gridlines along the y-axis for easier reading, with a linestyle of '--' and an alpha setting of 0.7. Then display your plot to see how the prices are spread out.
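A minimal sketch of this histogram might look like the following. The price values are made up for illustration, and the axis label text is an assumption, since the activity only asks that labels be present.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical prices standing in for the AveragePrice column.
df = pd.DataFrame({
    "AveragePrice": [0.9, 1.1, 1.2, 1.3, 1.3, 1.4, 1.5, 1.6, 1.8, 2.1],
})

plt.figure(figsize=(10, 6))
# plt.hist returns the bin counts, the bin edges, and the bar patches.
counts, bin_edges, _ = plt.hist(df["AveragePrice"], bins=10,
                                color="skyblue", edgecolor="black")
plt.xlabel("Average Price")  # label text is an assumption
plt.ylabel("Frequency")      # label text is an assumption
plt.title("Distribution of Average Price")
# Gridlines on the y-axis only, dashed and slightly transparent.
plt.grid(axis="y", linestyle="--", alpha=0.7)
plt.show()
```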
First, calculate the total volume for each avocado type (4046, 4225, and 4770) by summing up their respective columns. Use these totals to create a pie chart that shows the proportion each type contributes to the overall volume using the information below:
Figure size : 8 by 8
Colours : #66c2a5, #fc8d62, #8da0cb
labels : '4046', '4225', '4770'
Title: Proportion of Total Volume for Each Avocado Type
autopct : %1.1f%%
startangle : 140
Finally, set the axis to be equal so your pie chart stays perfectly circular, then display it to see the breakdown of avocado types.
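The pie chart could be drafted along these lines. The sample DataFrame is hypothetical, and it assumes the three columns are named with the strings '4046', '4225', and '4770'.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows; column names are assumed to be strings.
df = pd.DataFrame({
    "4046": [100.0, 200.0],
    "4225": [300.0, 100.0],
    "4770": [50.0, 50.0],
})

# Total volume per avocado type, in the same order as the labels.
labels = ["4046", "4225", "4770"]
totals = [df[col].sum() for col in labels]

plt.figure(figsize=(8, 8))
plt.pie(totals, labels=labels,
        colors=["#66c2a5", "#fc8d62", "#8da0cb"],
        autopct="%1.1f%%", startangle=140)
plt.title("Proportion of Total Volume for Each Avocado Type")
plt.axis("equal")  # equal aspect ratio keeps the pie circular
plt.show()
```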
Figure size : 10 by 6
Colour : green
Title: Scatter Plot of Total Volume vs. Average Price
Group the DataFrame by the type column and sum the Total Volume for each avocado type (organic and conventional). Then plot a bar chart with the avocado types on the x-axis and their corresponding total sales volumes on the y-axis, using the following details:
Figure size : 10 by 6
Title : Total Volume of Avocados Sold by Type
colors : #1f77b4, #ff7f0e
Finally, use plt.show() to display your completed chart.
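As a rough illustration, the bar chart by type might be built like this. The sample data is hypothetical and the axis labels are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows standing in for the real avocado dataset.
df = pd.DataFrame({
    "type": ["conventional", "organic", "conventional", "organic"],
    "Total Volume": [1000.0, 50.0, 1200.0, 70.0],
})

# Sum of Total Volume per avocado type, indexed by type.
type_volume = df.groupby("type")["Total Volume"].sum()

plt.figure(figsize=(10, 6))
# One color per bar, in the order of the grouped index.
plt.bar(type_volume.index, type_volume.values,
        color=["#1f77b4", "#ff7f0e"])
plt.title("Total Volume of Avocados Sold by Type")
plt.xlabel("Type")          # label text is an assumption
plt.ylabel("Total Volume")  # label text is an assumption
plt.show()
```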
First, extract the unique years from the dataset and define the width of the bars and their positions on the x-axis. You then create a figure and axis for plotting. Next, you plot three sets of bars, each representing the total sales of a different type of avocado bag (Small Bags, Large Bags, and XLarge Bags) for each year. To calculate these totals, you use the groupby function on the DataFrame, grouping the data by the year column and summing the sales for each bag type. This ensures that the bars represent the aggregated sales for each year.
To distinguish between the bag types, the bars are color-coded: blue for Small Bags (left of center), green for Large Bags (centered), and red for XLarge Bags (right of center).
Adjust the positions of the bars to avoid overlap. Customize the plot with a title, x-axis and y-axis labels, and a legend. Finally, display the plot with a tight layout to ensure everything fits well.
Figure size : 10 by 6
Title : Volume of Different Bag Sizes Over the Years
colors : Small Bags='blue', Large Bags='green', XLarge Bags='red'
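The grouped bar chart described above could be sketched as follows. The sample data, the bar width of 0.25, and the axis label text are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows standing in for the real avocado dataset.
df = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016],
    "Small Bags": [10.0, 20.0, 30.0, 40.0],
    "Large Bags": [5.0, 5.0, 10.0, 10.0],
    "XLarge Bags": [1.0, 1.0, 2.0, 2.0],
})

# Unique years (sorted), bar width, and base x positions.
years = np.sort(df["year"].unique())
bar_width = 0.25  # assumed width; narrow enough for three bars per year
x = np.arange(len(years))

# Total sales per bag type for each year; groupby sorts the years ascending.
bag_totals = df.groupby("year")[["Small Bags", "Large Bags", "XLarge Bags"]].sum()

fig, ax = plt.subplots(figsize=(10, 6))
# Shift each series by one bar width so the bars sit side by side.
ax.bar(x - bar_width, bag_totals["Small Bags"], width=bar_width,
       color="blue", label="Small Bags")
ax.bar(x, bag_totals["Large Bags"], width=bar_width,
       color="green", label="Large Bags")
ax.bar(x + bar_width, bag_totals["XLarge Bags"], width=bar_width,
       color="red", label="XLarge Bags")

ax.set_title("Volume of Different Bag Sizes Over the Years")
ax.set_xlabel("Year")    # label text is an assumption
ax.set_ylabel("Volume")  # label text is an assumption
ax.set_xticks(x)
ax.set_xticklabels(years)
ax.legend()
plt.tight_layout()
plt.show()
```

Plotting against integer positions rather than the year values themselves keeps the offsets independent of how far apart the years are.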
Plot a time series chart showing yearly average prices. Your task includes:
Start by grouping the data by year and calculating the average price of avocados for each year. This creates a series where each year is associated with its average price. You then identify the years with the highest and lowest average prices to highlight these key points.
Then create a line graph of these yearly average prices, using the following information:
Figure size : 10 by 6
Colour : blue
label : 'Yearly Average Price'
marker: 'o'
linestyle: '-'
Horizontal line representing the overall average price
color: 'red'
linestyle : '--'
label : 'Overall Average Price'
Annotation of the maximum point
Text : {max_price:.2f}
Position: xy (max_year, max_price). Use xytext to position the text slightly above the point, (max_year, max_price + 0.05).
Arrow Properties: arrowprops=dict(facecolor='green', shrink=0.05)
Text Color: 'green' to match the text color with the arrow for consistency.
Text Alignment: ha='center' to center the text horizontally over the point.
Annotation of the minimum point
Text : {min_price:.2f}
Position: xy (min_year, min_price). Use xytext to position the text slightly below the point, (min_year, min_price - 0.05).
Arrow Properties: arrowprops=dict(facecolor='red', shrink=0.05)
Text Color: 'red' to match the text color with the arrow for consistency.
Text Alignment: ha='center' to center the text horizontally over the point.
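Putting the pieces above together, one possible sketch is shown below. The sample data is hypothetical, the chart title and axis labels are assumptions (the activity does not give them), and the overall average is taken here as the mean of every AveragePrice row, which is one reasonable reading of the instructions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows standing in for the real avocado dataset.
df = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2017, 2017],
    "AveragePrice": [1.25, 1.75, 1.0, 1.5, 1.5, 2.0],
})

# Average price per year, plus the years with the highest and lowest averages.
yearly_avg = df.groupby("year")["AveragePrice"].mean()
max_year, max_price = yearly_avg.idxmax(), yearly_avg.max()
min_year, min_price = yearly_avg.idxmin(), yearly_avg.min()

# Assumption: overall average computed across all rows.
overall_avg = df["AveragePrice"].mean()

plt.figure(figsize=(10, 6))
plt.plot(yearly_avg.index, yearly_avg.values, color="blue",
         marker="o", linestyle="-", label="Yearly Average Price")
plt.axhline(overall_avg, color="red", linestyle="--",
            label="Overall Average Price")

# Annotate the maximum slightly above the point.
plt.annotate(f"{max_price:.2f}", xy=(max_year, max_price),
             xytext=(max_year, max_price + 0.05),
             arrowprops=dict(facecolor="green", shrink=0.05),
             color="green", ha="center")
# Annotate the minimum slightly below the point.
plt.annotate(f"{min_price:.2f}", xy=(min_year, min_price),
             xytext=(min_year, min_price - 0.05),
             arrowprops=dict(facecolor="red", shrink=0.05),
             color="red", ha="center")

plt.title("Yearly Average Price of Avocados")  # title is an assumption
plt.xlabel("Year")                             # label text is an assumption
plt.ylabel("Average Price")                    # label text is an assumption
plt.legend()
plt.show()
```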