Hanukkah of Data 5784 - Day 8 The Collector

:: programming, python, puzzle

The Task

We’re given the clue that the Collector owns a complete set of Noah’s collectibles. The goal is to find that customer and return their phone number.

Solution

import pandas as pd

def solve():
    customers   = pd.read_csv('noahs-customers.csv')
    orders      = pd.read_csv('noahs-orders.csv')
    order_items = pd.read_csv('noahs-orders_items.csv')
    products    = pd.read_csv('noahs-products.csv')
    data        = customers.merge(orders).merge(order_items).merge(products)

    is_collectible = data['sku'].str.startswith('COL')

    grouped = (data[is_collectible][['customerid','sku','phone']].
               drop_duplicates().
               groupby(['customerid','phone']).
               count())

    return (grouped[grouped['sku'] == grouped['sku'].max()].
            reset_index().
            iloc[0]['phone'])

# ---------------------------------------------------------------------------------------------

assert solve() == '212-547-3518'
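The solution picks the customer with the highest count, which relies on the clue that someone owns the whole set. As a rough sketch of how that clue could be checked directly, the toy example below compares each customer's distinct collectible count against the number of collectibles in the catalogue. The data frames and SKUs here are made up for illustration and are not taken from the puzzle data.

```python
import pandas as pd

# Hypothetical toy catalogue and purchases (not the real puzzle data).
products = pd.DataFrame({'sku': ['COL1', 'COL2', 'COL3', 'BKY1']})
purchases = pd.DataFrame({
    'customerid': [1, 1, 1, 2, 2],
    'sku':        ['COL1', 'COL2', 'COL3', 'COL1', 'BKY1'],
})

# How many collectibles exist in the catalogue?
n_collectibles = products['sku'].str.startswith('COL').sum()

# Distinct collectibles bought by each customer.
is_col = purchases['sku'].str.startswith('COL')
per_customer = purchases[is_col].groupby('customerid')['sku'].nunique()

# Customers whose distinct count matches the full catalogue.
complete = per_customer[per_customer == n_collectibles].index.tolist()
```

In this toy data only customer 1 bought every collectible, so `complete` holds just that one id.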

Create a predicate for collectibles:

is_collectible = data['sku'].str.startswith('COL')
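To make the predicate concrete, here is a small sketch on made-up SKUs showing that `str.startswith` yields a boolean mask that can be used to filter rows. The `na=False` argument is my own addition for the toy data, which includes a missing value; the puzzle data may not need it.

```python
import pandas as pd

# Hypothetical SKUs, including one missing value.
toy = pd.DataFrame({'sku': ['COL0001', 'BKY0002', 'COL0003', None]})

# na=False treats missing SKUs as "not a collectible" instead of NaN.
mask = toy['sku'].str.startswith('COL', na=False)

# Boolean indexing keeps only the rows where the mask is True.
collectibles = toy[mask]
```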

Drop repeat purchases of the same item, then group by customer and count the distinct collectibles each one bought:

grouped = (data[is_collectible][['customerid','sku','phone']].
           drop_duplicates().
           groupby(['customerid','phone']).
           count())
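The `drop_duplicates` call matters because a customer may buy the same collectible more than once. The sketch below, on made-up data, shows the same pattern next to `nunique`, which I believe gives an equivalent distinct count without the explicit de-duplication step.

```python
import pandas as pd

# Hypothetical purchases: customer 1 bought COL1 twice, customer 2 bought COL1 twice.
toy = pd.DataFrame({
    'customerid': [1, 1, 1, 2, 2],
    'phone':      ['555-0001'] * 3 + ['555-0002'] * 2,
    'sku':        ['COL1', 'COL1', 'COL2', 'COL1', 'COL1'],
})

# Pattern from the post: remove repeated rows, then count per customer.
counted = toy.drop_duplicates().groupby(['customerid', 'phone']).count()

# Alternative: count distinct SKUs per customer directly.
distinct = toy.groupby(['customerid', 'phone'])['sku'].nunique()
```

Both approaches give 2 for customer 1 and 1 for customer 2 on this toy data.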

Return the phone number of the customer with the most collectibles:

return (grouped[grouped['sku'] == grouped['sku'].max()].
        reset_index().
        iloc[0]['phone'])
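An alternative to filtering on the maximum and taking the first row is `idxmax`, which returns the index label of the largest value. A minimal sketch, assuming the counts are indexed by `(customerid, phone)` as in the grouping step; the numbers and phone values below are invented.

```python
import pandas as pd

# Hypothetical per-customer counts with a (customerid, phone) index.
counts = pd.Series(
    [2, 5, 3],
    index=pd.MultiIndex.from_tuples(
        [(1, '555-0001'), (2, '555-0002'), (3, '555-0003')],
        names=['customerid', 'phone'],
    ),
    name='sku',
)

# idxmax returns the index label of the maximum, here a (customerid, phone) tuple.
customerid, phone = counts.idxmax()
```

If several customers tie for the maximum, `idxmax` returns the first one, which matches the behaviour of `iloc[0]` in the original code.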

Conclusion

That’s the end of Hanukkah of Data 5784. I had a lot of fun, and learned quite a bit about pandas, but I have a long way to go to be proficient with Python’s data science ecosystem!