Data Structures- Hashmaps, Sets, Hash Tables, Hashing and Collisions
Observing hashmaps with python dictionaries
What is a Hashtable/Hashmap?
A hashtable is a data structure that with a collection of key-value pairs, where each key maps to a value, and the keys must be unique and hashable.
- In Python there is a built in hashtable known as a dictionary.
The primary purpose of a hashtable is to provide efficient lookup, insertion, and deletion operations. When an element is to be inserted into the hashtable, a hash function is used to map the key to a specific index in the underlying array that is used to store the key-value pairs. The value is then stored at that index. When searching for a value, the hash function is used again to find the index where the value is stored.
The key advantage of a hashtable over other data structures like arrays and linked lists is its average-case time complexity for lookup, insertion, and deletion operations.
- The typical time complexity of a hashtable is O(1).
What is Hashing and Collision?
Hashing is the process of mapping a given key to a value in a hash table or hashmap, using a hash function. The hash function takes the key as input and produces a hash value or hash code, which is then used to determine the index in the underlying array where the value is stored. The purpose of hashing is to provide a quick and efficient way to access data, by eliminating the need to search through an entire data structure to find a value.
However, it is possible for two different keys to map to the same hash value, resulting in a collision. When a collision occurs, there are different ways to resolve it, depending on the collision resolution strategy used.
Python's dictionary implementation is optimized to handle collisions efficiently, and the performance of the dictionary is generally very good, even in the presence of collisions. However, if the number of collisions is very high, the performance of the dictionary can degrade, so it is important to choose a good hash function that minimizes collisions when designing a Python dictionary.
What is a Set?
my_set = set([1, 2, 3, 2, 1, 'b', '1'])
print(my_set)
# What do you notice in the output?
#
# The output only contains unique characters. For example, if 1 appears in the set twice, it will only get printed once. Also if there is 1 and "1" in the set, both will get printed because the first 1 is a integer while the second "1" is a string.
# Why do you think Sets are in the same tech talk as Hashmaps/Hashtables?
#
# Dictionaries cannot have duplicate keys either.
lover_album = {
"title": "Lover",
"artist": "Taylor Swift",
"year": 2019,
"genre": ["Pop", "Synth-pop"],
"tracks": {
1: "I Forgot That You Existed",
2: "Cruel Summer",
3: "Lover",
4: "The Man",
5: "The Archer",
6: "I Think He Knows",
7: "Miss Americana & The Heartbreak Prince",
8: "Paper Rings",
9: "Cornelia Street",
10: "Death By A Thousand Cuts",
11: "London Boy",
12: "Soon You'll Get Better (feat. Dixie Chicks)",
13: "False God",
14: "You Need To Calm Down",
15: "Afterglow",
16: "Me! (feat. Brendon Urie of Panic! At The Disco)",
17: "It's Nice To Have A Friend",
18: "Daylight"
}
}
# What data structures do you see?
#
# Strings, Integers, Lists, Dictionaries
# Printing the dictionary
print(lover_album)
print(lover_album.get('tracks'))
# or
print(lover_album['tracks'])
print(lover_album.get('tracks')[4])
# or
print(lover_album['tracks'][4])
lover_album["producer"] = set(['Taylor Swift', 'Jack Antonoff', 'Joel Little', 'Taylor Swift', 'Louis Bell', 'Frank Dukes'])
# What can you change to make sure there are no duplicate producers?
#
# you can add set() around the value to remove duplicate producers
# Printing the dictionary
print(lover_album)
lover_album["tracks"].update({19: "All Of The Girls You Loved Before"})
# How would add an additional genre to the dictionary, like electropop?
#
lover_album["genre"].append("electropop")
# Printing the dictionary
print(lover_album)
# for k,v in lover_album.items(): # iterate using a for loop for key and value
# if k == "tracks":
# print("tracks:")
# print(str(k) + ": " + str(v))
# Write your own code to print tracks in readable format
for k,v in lover_album["tracks"].items():
print(k , ":" , v)
# def search():
# search = input("What would you like to know about the album?")
# if lover_album.get(search.lower()) == None:
# print("Invalid Search")
# else:
# print(lover_album.get(search.lower()))
# search()
# This is a very basic code segment, how can you improve upon this code?
def search_v2():
search = input("What would you like to know about the album? Type 'title' to get the title of the album. Type 'artist' to get the name of the artist. Type 'year' to get the album release date. Type 'genre' to get the genre of the album. Type 'tracks' to get the list of songs on the album. Type 'tracks number' to get a specific song from the album. Type 'producer' to get a list of producers on the album.")
if search == "tracks number":
song = input("Give a number for witch song you want from the album.")
print(lover_album['tracks'][int(song)])
elif search == "add track":
new_track = input("Give a new track to add to the album.")
lover_album["tracks"].update({len(lover_album["tracks"])+1: new_track})
print(new_track, "was added to the album at position", len(lover_album["tracks"]))
for k,v in lover_album["tracks"].items():
print(k, ":", v)
else:
if lover_album.get(search.lower()) == None:
print("Invalid Search")
elif search == "title":
print("The name of this album is", lover_album.get("title"))
elif search == "artist":
print("The artist who created this album is", lover_album.get("artist"))
elif search == "year":
print("The year this album was released was", lover_album.get("year"))
elif search == "genre":
print("The genres of this album are", ', '.join(lover_album.get("genre")))
elif search == "tracks":
print("Here are the tracks from the album:")
for k,v in lover_album["tracks"].items():
print(k, ":", v)
elif search == "producer":
print("The people who produced this album are", ', '.join(lover_album.get("producer")))
search_v2()
search_v2()
Hacks
- Answer ALL questions in the code segments
- Create a diagram or comparison illustration (Canva).
- What are the pro and cons of using this data structure?
Pros | Cons |
---|---|
Allows insertion of key value pair. | Allows insertion of key value pair. |
HashMap is non synchronized. | HashMap cannot be shared between multiple threads without proper synchronization. |
HashMap is a fail-fast iterator. | HashMap is a fail-fast iterator. |
Faster access of elements due to hashing technology. | HashMap is a fail-fast iterator. |
Lists | Dictionaries |
---|---|
Can work with any data | Dictionaries use key, value pairs |
Index starts at 0 | Keys can't be changed |
Can append, delete, etc | Dictionaries don't have to be ordered |
Lists use [] | Dictionaries use {} |
- Expand upon the code given to you, possible improvements in comments
-
Build your own album showing features of a python dictionary
-
For Mr. Yeung's class: Justify your favorite Taylor Swift song, answer may effect seed
heros_and_villains_album = {
"title": "Heros and Villains",
"artist": "Metro Boomin",
"year": 2022,
"genre": ["Hip-Hop", "Rap"],
"producers": ["Metro Boomin", "TM88", "Johan Lenox", "DaHeala", "Allen Ritter", "Honorable C.N.O.T.E.", "Oz"],
"tracks": {
1: "Trance (feat. Young Thug, Travis Scott)",
2: "Superhero (feat. Chris Brown, Future)",
3: "Feel The Fiyaaaah (feat. Takeoff, A$AP Rocky)",
4: "I Can't Save You (feat. Future, Don Toliver)",
5: "Metro Spider (feat. Young Thug)",
6: "Niagara Falls (feat. 21 Savage, Travis Scott)",
7: "Lock On Me (feat. Future, Travis Scott)",
8: "Umbrella (feat. Young Nudy, 21 Savage)",
9: "Creepin' (feat. 21 Savage, The Weeknd)",
10: "Walk Em Down (feat. Mustafa the Poet, 21 Savage)",
11: "Raindrops (feat. Future, Don Toliver)",
12: "All the Money (feat. Gunna)",
13: "On Time (feat. John Legend)",
14: "Around Me (feat. Don Toliver)"
},
}
def search_v3():
# asks for what details you would like to know
search = input("What would you like to know about the album? Type 'title' to get the title of the album. Type 'artist' to get the name of the artist. Type 'year' to get the album release date. Type 'genre' to get the genre of the album. Type 'tracks' to get the list of songs on the album. Type 'tracks number' to get a specific song from the album. Type 'producers' to get a list of producers on the album.")
# if the input is blank, it stops the code
if search != "":
if search == "tracks number":
# asks for an input
song = input("Give a number for which song you want from the album.")
# if input is valid, prints the song name at that position. If it isn't valid, it exits and re-runs the function.
if song <= len(heros_and_villains_album["tracks"]):
print("The song at position", song, "is", heros_and_villains_album['tracks'][int(song)])
else:
search_v3()
elif search == "add track":
# asks for an input
new_track = input("Give a new track to add to the album.")
# adds new track to the album
heros_and_villains_album["tracks"].update({len(heros_and_villains_album["tracks"])+1: new_track})
print(new_track, "was added to the album at position", len(heros_and_villains_album["tracks"]))
else:
# checks if the input is not blank, but not valid either.
if heros_and_villains_album.get(search.lower()) == None:
print("Invalid Search")
# if the user is asking for the title, it prints it
elif search == "title":
print("The name of this album is", heros_and_villains_album.get("title"))
# if the user is asking for the artist, it prints it
elif search == "artist":
print("The artist who created this album is", heros_and_villains_album.get("artist"))
# if the user is asking for the year the album was released, it prints it
elif search == "year":
print("The year this album was released was", heros_and_villains_album.get("year"))
# if the user is asking for the genres of the album, it prints it
elif search == "genre":
print("The genres of this album are", ', '.join(heros_and_villains_album.get("genre")))
# if the user is asking for the tracks in the album, it prints it
elif search == "tracks":
print("Here are the tracks from the album:")
# uses a for loop to print the keys (song position) and the values (song name)
for k,v in heros_and_villains_album["tracks"].items():
print(k, ":", v)
# if the user is asking for the producers, it prints it
elif search == "producers":
print("The people who produced this album are", ', '.join(heros_and_villains_album.get("producers")))
# repeats the function
search_v3()
search_v3()
My favorite Taylor Swift song is "You Belong With Me" because of the catchy beat and amazing lyrics. My favorite part is
She wears high heels
I wear sneakers
She's Cheer Captain, and I'm on the bleachers
Dreaming about the day when you wake up and find
That what you're looking for has been here the whole time
I love this song and I listen to it all the time.