Practice Data Cleaning and String Handling with City Bike data
Data Cleaning with Pandas

This lab focuses on New York City's Citi Bike network, cleaning and analyzing subscription data with Pandas' .str methods. We transform messy strings into usable formats using techniques like .str.capitalize(), .str.contains(), and more. By enhancing data quality, we reveal insights into the Citi Bike system, user behaviors, and usage patterns. Join us to master string data manipulation, unveiling the stories hidden within Citi Bike's anonymized data.
Project Created by

Jawad haider

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

Capitalize the column `first name`

If you explore the DataFrame, you'll see that the column first name is "inconsistent" with its capitalization. Some names are capitalized (Alexis, Jodi), but some others are not (misty, matrick).

Create a new series capital_first_name that contains the results of the column first name correctly capitalized.
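
As a minimal sketch of this activity, assuming a toy DataFrame (the sample names below are made up; the lab's real data is not shown here), `.str.capitalize()` uppercases the first character and lowercases the rest:

```python
import pandas as pd

# Hypothetical sample rows standing in for the lab's DataFrame.
df = pd.DataFrame({"first name": ["Alexis", "misty", "JODI"]})

# Uppercase the first character of each name and lowercase the rest.
capital_first_name = df["first name"].str.capitalize()
print(capital_first_name.tolist())
```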

codevalidated

Convert the column `last name` to lowercase

The column last name is also messily formatted. Some middle letters are capitalized, as in HarRISs and DaniEEIs, and some first letters are not capitalized. Convert all the values to lowercase and store the result in the variable lower_last_name.
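
One way this could look, assuming a small stand-in DataFrame (the third sample value is invented for illustration):

```python
import pandas as pd

# Hypothetical sample rows; "smith" is an invented value.
df = pd.DataFrame({"last name": ["HarRISs", "DaniEEIs", "smith"]})

# Lowercase every character in each last name.
lower_last_name = df["last name"].str.lower()
print(lower_last_name.tolist())
```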

codevalidated

Convert `last name` to uppercase. Store your answer in the variable `last_name_upper`
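
The uppercase counterpart might be sketched as follows, again on invented sample rows rather than the lab's data:

```python
import pandas as pd

# Hypothetical sample rows standing in for the lab's DataFrame.
df = pd.DataFrame({"last name": ["HarRISs", "DaniEEIs", "smith"]})

# Uppercase every character in each last name.
last_name_upper = df["last name"].str.upper()
print(last_name_upper.tolist())
```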

codevalidated

How many users in the column `usertype` are `Customer`?

Let's count all the Customers in the column usertype. Store the total in the customer_counts variable.
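
A rough sketch of the counting step, assuming the column holds the plain strings `Customer` and `Subscriber` (the sample rows are invented). A boolean mask sums to the number of `True` values:

```python
import pandas as pd

# Hypothetical sample rows; the real usertype values are assumed, not confirmed.
df = pd.DataFrame(
    {"usertype": ["Subscriber", "Customer", "Subscriber", "Customer", "Customer"]}
)

# str.contains returns a boolean Series; summing it counts the matches.
is_customer = df["usertype"].str.contains("Customer", regex=False)
customer_counts = int(is_customer.sum())
print(customer_counts)
```

A plain equality test, `df["usertype"] == "Customer"`, would give the same mask when the values match exactly.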

multiplechoice

How many users in the column `usertype` are `Subscribers`?

Now that you have the total number of Customers in usertype from the previous question, find how many Subscribers there are in total.

You can subtract the number of Customers from the total length of the DataFrame to find the remainder, which are the Subscribers.
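
The subtraction could be sketched like this. It assumes the column contains only two categories and no missing values, which the invented sample below satisfies; the variable name `subscriber_counts` is hypothetical:

```python
import pandas as pd

# Hypothetical sample rows with exactly two user types and no missing values.
df = pd.DataFrame(
    {"usertype": ["Subscriber", "Customer", "Subscriber", "Customer", "Customer"]}
)

customer_counts = int((df["usertype"] == "Customer").sum())

# Everything that is not a Customer is assumed to be a Subscriber.
subscriber_counts = len(df) - customer_counts
print(subscriber_counts)
```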

codevalidated

Find the words in Column `pin` which contain the substring `lol` and store your selection in the variable `word_having_lols`
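
A possible shape for this selection, assuming `pin` holds strings (the sample values are invented). Note that `.str.contains()` is case-sensitive by default:

```python
import pandas as pd

# Hypothetical sample values for the pin column.
df = pd.DataFrame({"pin": ["lol123", "abc", "xlolx", "LOL9"]})

# Build a boolean mask, then use it to select the matching values.
has_lol = df["pin"].str.contains("lol", regex=False)
word_having_lols = df.loc[has_lol, "pin"]
print(word_having_lols.tolist())
```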

codevalidated

Find the names in the Column `first name` which start with the letter `Z`

Find all the names in the first name column that start with the letter "Z". Store the result in the variable starts_with_z.

Be careful! It's capital Z, not lowercase z.
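
As an illustration of the case sensitivity mentioned above, here is a small sketch on invented names. `.str.startswith("Z")` does not match the lowercase `zack`:

```python
import pandas as pd

# Hypothetical sample names.
df = pd.DataFrame({"first name": ["Zoe", "zack", "Alexis", "Zed"]})

# startswith is case-sensitive, so only capital-Z names are kept.
mask = df["first name"].str.startswith("Z")
starts_with_z = df.loc[mask, "first name"]
print(starts_with_z.tolist())

# Summing the mask gives the number of matching names.
z_count = int(mask.sum())
print(z_count)
```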

input

How many first names start with the letter 'Z'?

codevalidated

Find the names in the Column `last name` which end with 't' and store your result in the variable `ends_with_t`
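
A comparable sketch for the suffix check, using invented last names. Like `startswith`, `.str.endswith()` is case-sensitive:

```python
import pandas as pd

# Hypothetical sample last names.
df = pd.DataFrame({"last name": ["Scott", "Smith", "bennett", "WesT"]})

# Keep only the values whose final character is a lowercase "t".
mask = df["last name"].str.endswith("t")
ends_with_t = df.loc[mask, "last name"]
print(ends_with_t.tolist())
```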

input

How many values in the column `last name` end with the letter 't'?

codevalidated

Join the `bikeid` values in the column `bikeid` with a `<space>`

Use the str.join() method to join the characters within each value of the column bikeid with a space, and store the output in the variable spaced_bikeids.
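
One reading of this task is sketched below, assuming `bikeid` is stored as integers (the sample IDs are invented). The `.str` accessor needs strings, so the values are cast first; `.str.join(" ")` then places the separator between the characters of each string:

```python
import pandas as pd

# Hypothetical sample IDs; an integer dtype is an assumption.
df = pd.DataFrame({"bikeid": [16013, 285]})

# Cast to str so the .str accessor is available, then space out the digits.
spaced_bikeids = df["bikeid"].astype(str).str.join(" ")
print(spaced_bikeids.tolist())
```

If the column is already a string column in the lab data, the `astype(str)` call is harmless.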

codevalidated

Create a new Column named `name length` having all the lengths of names from the Column `first name`
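
A brief sketch of adding the length column, on invented names:

```python
import pandas as pd

# Hypothetical sample names.
df = pd.DataFrame({"first name": ["Alexis", "misty", "Jo"]})

# str.len() returns the number of characters in each string.
df["name length"] = df["first name"].str.len()
print(df["name length"].tolist())
```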

multiplechoice

Determine whether the column `pin` is alphanumeric or contains digits only
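
The two checks could be compared as follows, assuming `pin` is a string column (the sample values and the variable names `all_digits` and `all_alnum` are invented):

```python
import pandas as pd

# Hypothetical sample values for the pin column.
df = pd.DataFrame({"pin": ["1234", "ab12", "77x"]})

# True only if every value consists solely of digits.
all_digits = bool(df["pin"].str.isdigit().all())

# True only if every value consists solely of letters and digits.
all_alnum = bool(df["pin"].str.isalnum().all())

print(all_digits, all_alnum)
```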

multiplechoice

Verify whether the column `tripduration` has any non-numeric values or contains digits only
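
A hedged sketch of one way to run this check. The dtype of `tripduration` in the lab data is not stated, so both sample Series below are assumptions; casting to `str` lets `.str.isdigit()` work either way:

```python
import pandas as pd

# Hypothetical samples: one clean integer column, one with a stray value.
clean = pd.Series([695, 1200, 301], name="tripduration")
dirty = pd.Series(["695", "12a", "301"], name="tripduration")

# Cast to str first; isdigit() is True only for all-digit strings.
clean_is_numeric = bool(clean.astype(str).str.isdigit().all())
dirty_is_numeric = bool(dirty.astype(str).str.isdigit().all())

print(clean_is_numeric, dirty_is_numeric)
```

Note that a float column would fail this test, because a string such as `695.0` contains a decimal point.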

multiplechoice

Check if any name in the Column `first name` has digit(s) or number(s) in it
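
This check might be sketched with a regular expression, where `\d` matches any digit. The sample names and the variable name `any_digit` are invented:

```python
import pandas as pd

# Hypothetical sample names, one of which contains a digit.
df = pd.DataFrame({"first name": ["Alexis", "m1sty", "Jodi"]})

# Flag names containing at least one digit anywhere in the string.
has_digit = df["first name"].str.contains(r"\d", regex=True)
any_digit = bool(has_digit.any())
print(has_digit.tolist(), any_digit)
```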

codevalidated

Split the emails in the `emails` column at `@` to find the Domain names and store them in the variable `email_domains`

Once you split all the emails in the column emails on @, the second element of each result (index 1) is the domain of the email.
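
The split-and-index step can be sketched as below, on invented addresses. `.str.split("@")` yields a list per row, and `.str[1]` picks the second element of each list:

```python
import pandas as pd

# Hypothetical sample addresses.
df = pd.DataFrame({"emails": ["a.b@example.com", "jo@school.edu"]})

# Split on "@" and take the part after it (index 1).
email_domains = df["emails"].str.split("@").str[1]
print(email_domains.tolist())
```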

codevalidated

Replace the emails having `.edu` with `.org` and store the output in the variable `edu_to_org`
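
A minimal sketch of the replacement, on invented addresses. Passing `regex=False` treats `.edu` as literal text, so the dot is not interpreted as a regex wildcard:

```python
import pandas as pd

# Hypothetical sample addresses.
df = pd.DataFrame({"emails": ["jo@school.edu", "a.b@example.com"]})

# Literal (non-regex) replacement of ".edu" with ".org".
edu_to_org = df["emails"].str.replace(".edu", ".org", regex=False)
print(edu_to_org.tolist())
```

This replaces `.edu` wherever it appears in the string, not only at the end, which is fine for the sample here but worth checking against real data.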

codevalidated

Replace the numeric and the St values in the `end station name` column with `<space>` so that we can filter the address without street numbers. Store your result in the variable `clean_address`
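
One possible approach is sketched below with a regular expression. The sample station names and the exact pattern are assumptions: `\d+` matches runs of digits and `\bSt\b` matches `St` as a whole word, so names such as `Stanton` would be left alone:

```python
import pandas as pd

# Hypothetical sample station names.
df = pd.DataFrame({"end station name": ["W 52 St & 11 Ave", "Broadway & E 14 St"]})

# Replace digit runs and the standalone word "St" with a single space.
clean_address = df["end station name"].str.replace(r"\d+|\bSt\b", " ", regex=True)
print(clean_address.tolist())
```

The result keeps the surrounding spaces, so each value contains runs of whitespace that may need collapsing in a later step.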


This project is part of

Data Cleaning with Pandas