Hanukkah of Data 5784 - Day 8: The Collector
The Task
We're given the clue that the Collector has an entire set of Noah's collectibles.
Solution
```python
import pandas as pd


def solve():
    # Load the four Noah's Market tables.
    customers = pd.read_csv('noahs-customers.csv')
    orders = pd.read_csv('noahs-orders.csv')
    order_items = pd.read_csv('noahs-orders_items.csv')
    products = pd.read_csv('noahs-products.csv')

    # Each merge joins on the columns the two frames have in common.
    data = customers.merge(orders).merge(order_items).merge(products)

    # Collectibles are the products whose SKU starts with 'COL'.
    is_collectible = data['sku'].str.startswith('COL')

    # Count distinct collectible SKUs per customer (repeat purchases count once).
    grouped = (data[is_collectible][['customerid', 'sku', 'phone']].
               drop_duplicates().
               groupby(['customerid', 'phone']).
               count())

    # Phone number of the (first) customer with the largest count.
    return (grouped[grouped['sku'] == grouped['sku'].max()].
            reset_index().
            iloc[0]['phone'])


# ---------------------------------------------------------------------------------------------
assert solve() == '212-547-3518'
```
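The chained `merge` calls work because, with no `on=` argument, pandas joins on every column name the two frames share, using an inner join by default. The miniature tables below are hypothetical stand-ins for the real CSVs (the `desc` column and all values are made up) and only illustrate how the join keys line up.

```python
import pandas as pd

# Hypothetical miniature versions of the four tables (not the real data).
customers = pd.DataFrame({'customerid': [1], 'phone': ['555-0001']})
orders = pd.DataFrame({'orderid': [10], 'customerid': [1]})
order_items = pd.DataFrame({'orderid': [10, 10], 'sku': ['COL0001', 'TOY0001']})
products = pd.DataFrame({'sku': ['COL0001', 'TOY0001'], 'desc': ['Figurine', 'Ball']})

# No `on=` given, so each merge uses the overlapping column names:
# customerid, then orderid, then sku. The default how='inner' drops unmatched rows.
data = customers.merge(orders).merge(order_items).merge(products)

print(data.columns.tolist())  # ['customerid', 'phone', 'orderid', 'sku', 'desc']
print(len(data))              # 2
```

The implicit keys keep the solution short, but if two tables ever share an unrelated column name, it silently becomes part of the join; passing `on=` explicitly guards against that.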
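Create a predicate for collectibles, a boolean mask that is `True` for every row whose SKU starts with `COL`: `is_collectible = data['sku'].str.startswith('COL')`.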
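A small sketch on hypothetical toy rows (the SKUs are invented, not taken from Noah's catalog) shows what the mask looks like and how indexing with it filters the frame:

```python
import pandas as pd

# Hypothetical toy rows; the real data comes from the merged Noah's CSVs.
data = pd.DataFrame({
    'customerid': [1, 1, 2],
    'sku': ['COL0037', 'TOY1234', 'COL0041'],
})

# Boolean Series: True where the SKU starts with the collectible prefix.
is_collectible = data['sku'].str.startswith('COL')
print(is_collectible.tolist())  # [True, False, True]

# Indexing with the mask keeps only the collectible rows.
print(data[is_collectible]['sku'].tolist())  # ['COL0037', 'COL0041']
```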
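Group by customer and count the number of distinct collectibles each customer bought. Dropping duplicate customer/SKU rows first means that buying the same collectible twice counts only once: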
```python
grouped = (data[is_collectible][['customerid', 'sku', 'phone']].
           drop_duplicates().
           groupby(['customerid', 'phone']).
           count())
```
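To see why `drop_duplicates()` matters, here is a sketch on hypothetical purchases (phone numbers and SKUs are invented) where one customer bought the same collectible twice. It also shows `nunique()`, an alternative I'm swapping in for comparison rather than what the solution uses, which counts distinct SKUs per group in one step:

```python
import pandas as pd

# Hypothetical collectible purchases: customer 1 bought COL0001 twice.
collectibles = pd.DataFrame({
    'customerid': [1, 1, 1, 2],
    'phone': ['555-0001', '555-0001', '555-0001', '555-0002'],
    'sku': ['COL0001', 'COL0001', 'COL0002', 'COL0001'],
})

# Same shape as the post: drop repeat (customer, sku) rows, then count SKUs.
grouped = (collectibles[['customerid', 'sku', 'phone']]
           .drop_duplicates()
           .groupby(['customerid', 'phone'])
           .count())
print(grouped['sku'].tolist())  # [2, 1]

# Alternative: count distinct SKUs per group directly.
distinct = collectibles.groupby(['customerid', 'phone'])['sku'].nunique()
print(distinct.tolist())  # [2, 1]
```

Without `drop_duplicates()`, customer 1 would be counted as owning 3 collectibles instead of 2.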
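Return the phone number of the customer with the most collectibles: keep the rows whose count equals the maximum, reset the index so `phone` becomes an ordinary column again, and take the first match with `grouped[grouped['sku'] == grouped['sku'].max()].reset_index().iloc[0]['phone']`.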
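The same selection can be sketched with `idxmax()`, which returns the index label (here a `(customerid, phone)` tuple) of the largest count. Since the clue says the Collector has the *entire* set, a stricter check compares each customer's SKUs against the full catalog of collectibles. Everything below is hypothetical toy data; with the real files, the catalog would presumably be the `COL` SKUs in the products table.

```python
import pandas as pd

# Hypothetical catalog of collectible SKUs and toy purchase rows.
catalog = {'COL0001', 'COL0002', 'COL0003'}
purchases = pd.DataFrame({
    'customerid': [1, 1, 1, 2, 2],
    'phone': ['555-0001'] * 3 + ['555-0002'] * 2,
    'sku': ['COL0001', 'COL0002', 'COL0003', 'COL0001', 'COL0003'],
})

counts = purchases.groupby(['customerid', 'phone'])['sku'].nunique()

# idxmax returns the (customerid, phone) label of the largest count.
best_id, best_phone = counts.idxmax()
print(best_phone)  # 555-0001

# Stricter reading of the clue: keep customers whose SKUs cover the catalog.
complete = [int(cid) for cid, skus in purchases.groupby('customerid')['sku']
            if set(skus) >= catalog]
print(complete)  # [1]
```

Like `iloc[0]` in the solution, `idxmax()` silently picks the first customer if several tie for the maximum; the superset check instead lists every customer who qualifies.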
Conclusion
That's the end of Hanukkah of Data 5784. I had a lot of fun and learned quite a bit about pandas, but I have a long way to go before I'm proficient with Python's data science ecosystem!