
Hanukkah of Data 5784 - Day 8: The Collector

The Task

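The clue for this puzzle is that the Collector owns an entire set of Noah's collectibles, so we are looking for the customer who has bought the largest number of distinct collectible products.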

Solution

import pandas as pd

def solve():
    customers   = pd.read_csv('noahs-customers.csv')
    orders      = pd.read_csv('noahs-orders.csv')
    order_items = pd.read_csv('noahs-orders_items.csv')
    products    = pd.read_csv('noahs-products.csv')
    # Join customers -> orders -> order items -> products on their shared key columns
    data        = customers.merge(orders).merge(order_items).merge(products)

    is_collectible = data['sku'].str.startswith('COL')

    grouped = (data[is_collectible][['customerid','sku','phone']].
               drop_duplicates().
               groupby(['customerid','phone']).
               count())

    return (grouped[grouped['sku'] == grouped['sku'].max()].
            reset_index().
            iloc[0]['phone'])

# ---------------------------------------------------------------------------------------------

assert solve() == '212-547-3518'
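The merge chain in solve() works because, when no on= argument is given, pandas joins two frames on every column name they share (here customerid, then orderid, then sku), using an inner join by default. A minimal sketch with tiny, made-up stand-ins for the four CSV files; the rows, phone numbers, and desc values are hypothetical, and only a few columns are kept:

```python
import pandas as pd

# Hypothetical, made-up rows standing in for the four Noah's CSV files.
customers = pd.DataFrame({'customerid': [1, 2], 'phone': ['555-0001', '555-0002']})
orders = pd.DataFrame({'orderid': [10, 11], 'customerid': [1, 2]})
order_items = pd.DataFrame({'orderid': [10, 10, 11],
                            'sku': ['COL0001', 'DLI0002', 'COL0001']})
products = pd.DataFrame({'sku': ['COL0001', 'DLI0002'],
                         'desc': ['Made-up figurine', 'Made-up bagel']})

# With no `on=` argument, each merge joins on the column names the two frames share:
# customers/orders on customerid, then orderid, then sku. The default is an inner join.
data = customers.merge(orders).merge(order_items).merge(products)
print(data[['customerid', 'phone', 'orderid', 'sku', 'desc']])
```

Each order-item row ends up carrying the customer's phone number and the product's description, which is what lets the later steps filter on sku and report phone.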

Create a boolean mask that flags collectible products, whose SKUs start with COL:

is_collectible = data['sku'].str.startswith('COL')
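To see what the mask looks like, here is a small sketch on made-up SKUs (the values are hypothetical; only the COL prefix comes from the solution). Series.str.startswith returns a boolean Series aligned with the input, which can then be used directly to select rows:

```python
import pandas as pd

# Made-up SKUs; only the 'COL' prefix convention is taken from the solution.
skus = pd.Series(['COL0001', 'DLI0002', 'COL0003', 'TOY0004'])

# One True/False value per row, aligned with the original index.
is_collectible = skus.str.startswith('COL')

# Boolean indexing keeps only the rows where the mask is True.
print(skus[is_collectible].tolist())  # ['COL0001', 'COL0003']
```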

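Keep only the collectible rows, drop duplicate (customer, SKU) pairs so that repeat purchases are counted once, then group by customer and count the distinct collectibles each one owns: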

grouped = (data[is_collectible][['customerid','sku','phone']].
           drop_duplicates().
           groupby(['customerid','phone']).
           count())
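The drop_duplicates() step matters because a customer may buy the same collectible more than once. A minimal sketch with hypothetical purchase rows compares a plain count, a de-duplicated count, and groupby(...).nunique(), which is an alternative (not the one used above) that counts distinct SKUs in one step:

```python
import pandas as pd

# Made-up purchase rows: customer 1 bought COL0001 twice.
purchases = pd.DataFrame({
    'customerid': [1, 1, 1, 2],
    'phone': ['555-0001', '555-0001', '555-0001', '555-0002'],
    'sku': ['COL0001', 'COL0001', 'COL0002', 'COL0001'],
})

# Without de-duplication, the repeat purchase is counted twice.
raw_counts = purchases.groupby(['customerid', 'phone'])['sku'].count()

# Dropping duplicate (customerid, phone, sku) rows first counts each collectible once.
distinct_counts = (purchases.drop_duplicates()
                   .groupby(['customerid', 'phone'])['sku']
                   .count())

# nunique counts distinct SKUs per group directly.
nunique_counts = purchases.groupby(['customerid', 'phone'])['sku'].nunique()

print(raw_counts.tolist(), distinct_counts.tolist(), nunique_counts.tolist())
```

Either form gives the number of different collectibles per customer; nunique just avoids the intermediate de-duplicated frame.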

Return the phone number of the customer who owns the most distinct collectibles:

return (grouped[grouped['sku'] == grouped['sku'].max()].
        reset_index().
        iloc[0]['phone'])
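Taking the maximum finds the biggest collector, but the clue says "an entire set". A stricter check compares each customer's distinct count with the number of collectible SKUs in the product catalog. This is a sketch on made-up data, assuming every collectible in the catalog has a COL-prefixed SKU; it also shows Series.idxmax, which returns the (customerid, phone) label of the largest count:

```python
import pandas as pd

# Made-up catalog: three collectibles plus one non-collectible product.
products = pd.DataFrame({'sku': ['COL0001', 'COL0002', 'COL0003', 'DLI0004']})

# Made-up collectible purchases: customer 1 has all three, customer 2 has two.
purchases = pd.DataFrame({
    'customerid': [1, 1, 1, 2, 2],
    'phone': ['555-0001'] * 3 + ['555-0002'] * 2,
    'sku': ['COL0001', 'COL0002', 'COL0003', 'COL0001', 'COL0003'],
})

# Assumption: collectibles are exactly the catalog SKUs starting with 'COL'.
catalog_size = products['sku'].str.startswith('COL').sum()

per_customer = purchases.groupby(['customerid', 'phone'])['sku'].nunique()

# Customers whose distinct collectibles cover the whole catalog.
complete = per_customer[per_customer == catalog_size]

# idxmax returns the index label of the largest value; with a MultiIndex it is a tuple.
customerid, phone = per_customer.idxmax()
print(complete.index.tolist(), phone)
```

If the complete-set filter and the maximum agree, the answer does not depend on how "entire set" is interpreted.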

Conclusion

That's the end of Hanukkah of Data 5784. I had a lot of fun and learned quite a bit about pandas, but I have a long way to go before I'm proficient with Python's data science ecosystem!