All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Who's the chattiest among the F.R.I.E.N.D.S group? Let’s start simple: Identify the character who speaks the most throughout the series.
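One way to approach this, sketched on a toy dataframe (the column names character and word_count are assumptions about the course dataset):

```python
import pandas as pd

# Toy stand-in for the course dataframe.
df = pd.DataFrame({
    "character":  ["Ross", "Rachel", "Ross"],
    "word_count": [120, 90, 80],
})

# Total words per character; idxmax() names the chattiest one.
chattiest = df.groupby("character")["word_count"].sum().idxmax()
print(chattiest)  # "Ross"
```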
Want to know how talkative the group was each season? Using the word_count column, calculate the total number of words spoken by all characters in each season. Use the .groupby() and .sum() methods.
For this activity, code to calculate word_count is already provided in the notebook.
Store the result in the seasonal_word_sum variable.
The result should match the following output:
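A minimal sketch of the groupby-and-sum pattern, on a toy dataframe (column names are assumptions about the course dataset):

```python
import pandas as pd

# Toy stand-in; the real notebook provides word_count for each line.
df = pd.DataFrame({
    "season":     [1, 1, 2, 2],
    "word_count": [10, 5, 7, 3],
})

# Total words spoken per season: group rows by season, sum word_count.
seasonal_word_sum = df.groupby("season")["word_count"].sum()
print(seasonal_word_sum)
```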
Curious about the pace of the show? Calculate the average number of scenes per episode across all seasons. Use .nunique() to avoid counting repeated scenes.
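One possible shape for this, assuming the dataset tags each dialogue line with season, episode, and scene columns (an assumption):

```python
import pandas as pd

# Toy stand-in: each row is one dialogue line.
df = pd.DataFrame({
    "season":  [1, 1, 1, 1, 1, 1],
    "episode": [1, 1, 1, 2, 2, 2],
    "scene":   [1, 1, 2, 1, 2, 3],
})

# Unique scenes per episode (nunique avoids counting repeated scene ids),
# then the mean across episodes.
scenes_per_episode = df.groupby(["season", "episode"])["scene"].nunique()
avg_scenes = scenes_per_episode.mean()
print(avg_scenes)  # (2 + 3) / 2 episodes
```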
Ever wondered who had the briefest or the most extended things to say? Find the shortest and longest dialogues spoken by any character using .min() and .max().
Store the result in the dialogue_lengths variable.
The result should match the following output:
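A sketch of one way to do this, assuming dialogue length is measured in characters and collected into a small Series (the exact output shape in the course may differ):

```python
import pandas as pd

df = pd.DataFrame({
    "dialogue": ["Hi.", "We were on a break!", "How you doin'?"],
})
df["dialogue_length"] = df["dialogue"].str.len()

# Shortest and longest dialogue lengths across all characters.
dialogue_lengths = pd.Series({
    "shortest": df["dialogue_length"].min(),
    "longest":  df["dialogue_length"].max(),
})
print(dialogue_lengths)
```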
Dive deeper into dialogue details! Calculate multiple statistics (mean, standard deviation, minimum, maximum, and median) for the dialogue lengths of each character using .agg().
Remember to use the previously calculated dialogue_length column.
Store the result in the char_stats variable.
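The multi-statistic .agg() call can be sketched like this (toy data; column names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "character":       ["Ross", "Ross", "Ross", "Phoebe", "Phoebe", "Phoebe"],
    "dialogue_length": [10, 20, 30, 5, 5, 20],
})

# One row per character, one column per statistic.
char_stats = df.groupby("character")["dialogue_length"].agg(
    ["mean", "std", "min", "max", "median"]
)
print(char_stats)
```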
Why would you use the .groupby() function when analyzing dialogue data from "F.R.I.E.N.D.S"?
Explore the vocabulary range of the characters. Define and use a custom aggregation function to count the unique words spoken by each character using .agg().
Store the result in the unique_words_per_character variable.
The result should match the following output:
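A custom aggregation function passed to .agg() might look like this (toy data; the tokenization rule here is an assumption):

```python
import pandas as pd

df = pd.DataFrame({
    "character": ["Joey", "Joey", "Chandler"],
    "dialogue":  ["how you doin", "you doin great", "could I be more tired"],
})

# Custom aggregation: pool every word a character says, count distinct ones.
def count_unique_words(dialogues):
    words = " ".join(dialogues).lower().split()
    return len(set(words))

unique_words_per_character = (
    df.groupby("character")["dialogue"].agg(count_unique_words)
)
print(unique_words_per_character)
```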
Phoebe's family stories are as complex as they are entertaining. Use a custom aggregation function to summarize the most frequently mentioned family members.
Store the result in the family_mentions variable.
The result should match the following output:
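One way a custom aggregation could summarize mention frequencies (the family-word list and tokenization below are illustrative assumptions):

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical family-member list; the real activity defines its own.
FAMILY = ["grandmother", "mom", "dad", "sister"]

df = pd.DataFrame({
    "character": ["Phoebe", "Phoebe", "Phoebe"],
    "dialogue": [
        "My grandmother taught me that song.",
        "My mom... well, my birth mom...",
        "My grandmother's cab is outside.",
    ],
})

# Custom aggregation: tally family words across all of a character's lines.
def family_counts(dialogues):
    words = re.findall(r"[a-z']+", " ".join(dialogues).lower())
    counts = Counter(w for w in words if w in FAMILY)
    return counts.most_common()

family_mentions = df.groupby("character")["dialogue"].agg(family_counts)
print(family_mentions)
```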
What is the advantage of using .agg() with multiple aggregation functions in a pandas DataFrame?
How much does Joey talk each season? Let's find out by counting the number of lines Joey speaks each season.
Store the result in the joey_lines variable.
The result should match the following output:
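Counting one character's lines per season can be sketched as (toy data; column names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "character": ["Joey", "Ross", "Joey", "Joey"],
    "season":    [1, 1, 1, 2],
})

# Keep only Joey's rows, then count lines per season with size().
joey_lines = df[df["character"] == "Joey"].groupby("season").size()
print(joey_lines)
```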
Throughout the series, Chandler's job remained a subject of confusion and humor. Your task is to explore dialogues where Chandler or others try to describe his profession, highlighting how the confusion about his job role builds throughout the series.
Filter for words like job or work using the str.contains() method.
Don't forget to use .size() at the end to count the number of rows.
Store the result in the chandler_job_explanations variable.
The result should match the following output:
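The str.contains-then-size pattern might look like this (toy data; the keyword regex and grouping by season are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "season": [1, 1, 2, 2],
    "dialogue": [
        "What exactly is your job again?",
        "He works with numbers... somehow.",
        "Nice weather today.",
        "So your work is... transponster?",
    ],
})

# Rows whose dialogue mentions "job" or "work" (case-insensitive),
# counted per season with size().
mask = df["dialogue"].str.contains("job|work", case=False)
chandler_job_explanations = df[mask].groupby("season").size()
print(chandler_job_explanations)
```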
F.R.I.E.N.D.S often took us down memory lane with flashbacks. Identify episodes with the most references to past events.
Store the result in the flashback_mentions variable.
Don't forget to use .size() at the end to count the number of rows.
The result should match the following output:
"How you doin'?" Joey's catchphrase is legendary. Find the number of times Joey uses his famous line compared to others.
Don't forget to use .size() at the end to count the number of rows.
Store the result in the joey_catchphrases variable.
The result should match the following output:
Ah, Ross and his weddings, always a spectacle! Dive into the data to find out which wedding episode had the most dialogue. Was it Emily’s, Rachel’s, or perhaps Carol’s? Enter the episode number.
How does the .filter() method differ from .loc[] in the context of pandas group operations?
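A minimal side-by-side of the two, on toy data, may help frame the question: .loc[] selects individual rows by a boolean mask, while GroupBy.filter() keeps or drops whole groups:

```python
import pandas as pd

df = pd.DataFrame({
    "character":  ["Joey", "Joey", "Ross"],
    "word_count": [4, 6, 50],
})

# .loc[]: row-level selection with a boolean mask.
long_lines = df.loc[df["word_count"] > 5]

# GroupBy.filter(): group-level selection; the condition is evaluated
# once per group, and matching groups survive intact.
chatty_chars = df.groupby("character").filter(
    lambda g: g["word_count"].sum() > 20
)
print(long_lines)
print(chatty_chars)
```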
Monica’s cleanliness is legendary. Calculate the number of times Monica mentions "clean," "dust," or "soap" in each season. Who knew cleaning could be this fun to analyze?
Store the result in the cleaning_mentions variable.
The result should match the following output:
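A sketch of the filtering and counting (toy data; column names and the keyword list placement are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "character": ["Monica", "Monica", "Ross"],
    "season":    [1, 2, 2],
    "dialogue": [
        "I need to clean this right now.",
        "Is that dust on the table?",
        "Pivot!",
    ],
})

# Monica's lines mentioning any cleaning word, counted per season.
monica = df[df["character"] == "Monica"]
mask = monica["dialogue"].str.contains("clean|dust|soap", case=False)
cleaning_mentions = monica[mask].groupby("season").size()
print(cleaning_mentions)
```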
Identify the longest dialogue witnessed in each season. Was it during one of Ross's scientific explanations or Monica's detailed anecdotes?
Store the result in the max_dialogue_length variable.
The result should match the following output:
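The per-season maximum reduces to a groupby with .max() (toy data; column names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "season":          [1, 1, 2],
    "dialogue_length": [12, 48, 30],
})

# Longest dialogue length observed in each season.
max_dialogue_length = df.groupby("season")["dialogue_length"].max()
print(max_dialogue_length)
```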
Ross and Monica’s routine dance is unforgettable. Find out which season had the most dance- or music-related dialogues using the .apply() method.
Store the result in the dance_music_dialogues variable.
For this activity, use the friends_info_df dataset.
The result should match the following output:
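One way .apply() could count themed dialogues per season (toy data; the keyword regex is an illustrative assumption, not the course's list):

```python
import pandas as pd

df = pd.DataFrame({
    "season": [1, 1, 2],
    "dialogue": [
        "Let's do the routine dance!",
        "Smelly Cat is my best song.",
        "No music tonight.",
    ],
})

# Per season, apply a function that counts dialogues mentioning
# dance- or music-related words.
keywords = "dance|music|song|routine"
dance_music_dialogues = df.groupby("season")["dialogue"].apply(
    lambda s: s.str.contains(keywords, case=False).sum()
)
print(dance_music_dialogues)
print(dance_music_dialogues.idxmax())  # season with the most such lines
```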
When Chandler spends time in a box as penance, the conversation around him varies dramatically. Evaluate the average number of words spoken by each character in this episode to see who talks most while he's boxed up.
Note: To find the solution for chandler_box, use the friends_info_df dataset.
Filter with the exact season and episode numbers that you get from chandler_box.
Store the result in the avg_words_per_character variable.
The result should match the following output:
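Once the episode is isolated, the per-character average is a groupby with .mean(). A sketch, assuming the frame below already stands in for friends_info_df filtered to the box episode:

```python
import pandas as pd

# Toy stand-in, already filtered (by the assumed season/episode lookup)
# to "The One with Chandler in a Box".
episode_df = pd.DataFrame({
    "character":  ["Joey", "Joey", "Monica"],
    "word_count": [12, 8, 30],
})

# Average words per line for each character in that episode.
avg_words_per_character = episode_df.groupby("character")["word_count"].mean()
print(avg_words_per_character)
```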
Why is the .transform() method important in data processing with pandas?
Normalize the dialogue lengths for each character by subtracting the mean and dividing by the standard deviation using .transform().
Remember to use the previously calculated dialogue_length column.
The result should match the following output:
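The z-score via .transform() can be sketched like this; because transform() returns one value per original row, the result aligns with the dataframe (toy data; column names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "character":       ["Ross", "Ross", "Ross", "Joey", "Joey", "Joey"],
    "dialogue_length": [10, 20, 30, 4, 6, 8],
})

# Per-character mean and std, broadcast back to every row,
# so the z-score can be stored as a new column.
grouped = df.groupby("character")["dialogue_length"]
df["normalized_length"] = (
    df["dialogue_length"] - grouped.transform("mean")
) / grouped.transform("std")
print(df)
```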
Why is the .pivot_table() function beneficial when analyzing dialogue frequency by characters and episodes in "F.R.I.E.N.D.S"?
Let's take a seat at Central Perk! Analyze how the dynamic of conversations at the famous coffee shop changes from season to season. Using a pivot table, we'll summarize the number of dialogues each character has in Central Perk across all seasons.
Store the result in the central_perk_pivot variable.
Note: Scene Directions is included as a character in the dataset and is therefore considered in the solution, as you can see in the output image.
The result should match the following output:
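A pivot-table sketch on toy data (the location column and its values are assumptions about how scenes are tagged):

```python
import pandas as pd

df = pd.DataFrame({
    "character": ["Rachel", "Rachel", "Gunther", "Rachel"],
    "season":    [1, 2, 2, 2],
    "location":  ["Central Perk", "Central Perk", "Central Perk", "Apartment"],
    "dialogue":  ["Hi!", "One coffee.", "Rachel!", "Pass the salt."],
})

# Keep Central Perk scenes, then pivot: rows = characters,
# columns = seasons, values = number of dialogue lines.
perk = df[df["location"] == "Central Perk"]
central_perk_pivot = perk.pivot_table(
    index="character", columns="season",
    values="dialogue", aggfunc="count", fill_value=0,
)
print(central_perk_pivot)
```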
F.R.I.E.N.D.S gave us some memorable Thanksgiving episodes. Dive into these special episodes to see how dialogue contributions vary among the characters. Create a pivot table to display the count of dialogues per character for each Thanksgiving episode by season.
Store the result in the thanksgiving_pivot variable.
The result should match the following output:
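A sketch of one possible approach, assuming Thanksgiving episodes can be spotted by their titles (how the course actually identifies them may differ):

```python
import pandas as pd

df = pd.DataFrame({
    "character": ["Monica", "Chandler", "Monica"],
    "season":    [5, 5, 6],
    "episode_title": [
        "The One with All the Thanksgivings",
        "The One with All the Thanksgivings",
        "The One Where Ross Got High",
    ],
    "dialogue":  ["Turkey time!", "I'm thankful.", "It tastes like feet."],
})

# Keep Thanksgiving episodes (identified by title here, an assumption),
# then count dialogues per character per season.
tg = df[df["episode_title"].str.contains("Thanksgiving", case=False)]
thanksgiving_pivot = tg.pivot_table(
    index="character", columns="season",
    values="dialogue", aggfunc="count", fill_value=0,
)
print(thanksgiving_pivot)
```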