Hanukkah of Data 5784 - Day 4 The Early Bird

:: programming, python, puzzle

The Task

We’re given a clue that the customer likes to “claim the first pastries that came out of the oven”. I looked at the products file, and noticed SKUs starting with “BKY”. I assume these are bakery items, and didn’t worry about whether, for example, a bagel is a “pastry”.

>>> products[products['sku'].str.startswith('BKY')].head()
         sku                     desc  wholesale_cost        dims_cm
211  BKY1573             Sesame Bagel            1.02   11.9|4.7|0.9
221  BKY1679             Sesame Twist            1.02    6.9|6.2|1.6
280  BKY2114     Poppyseed Mandelbrot            1.05   13.4|0.7|0.4
345  BKY2596  Poppyseed Linzer Cookie            1.04   16.0|7.5|5.0
516  BKY4004      Raspberry Sufganiah            1.02  18.7|10.6|3.3

The Solution

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
from advent import Counter
import pandas as pd

def solve():
    customers   = pd.read_csv('noahs-customers.csv')
    orders      = pd.read_csv('noahs-orders.csv', parse_dates=['shipped'])
    order_items = pd.read_csv('noahs-orders_items.csv')
    data        = orders.merge(order_items)

    bakery_orders = (data[data['sku'].str.startswith('BKY')][['orderid','customerid','ordered','shipped']].
                     drop_duplicates().
                     sort_values('shipped'))

    prev_date      = None
    first_customer = Counter()

    for index, row in bakery_orders.iterrows():
        shipped_date = row['shipped'].date()

        if shipped_date != prev_date:
            first_customer[row['customerid']] += 1
            prev_date = shipped_date

    return customers[customers['customerid'] == first_customer.most_common()[0][0]].iloc[0]['phone']

# ---------------------------------------------------------------------------------------------

assert solve() == '607-231-3605'

First, let’s get order items that are from the bakery:

1
2
3
bakery_orders = (data[data['sku'].str.startswith('BKY')][['orderid','customerid','ordered','shipped']].
                 drop_duplicates().
                 sort_values('shipped'))

I discovered there are quite a few customers that picked up the first bakery order of the day, so my idea was to count those occurrences, and choose the customer who most frequently got the first bakery order.

The algorithm is as follows:

  • Iterate over the bakery_orders (which are in order of shipped timestamp)
  • Each time we encounter a new date, add the customer ID to the counter
  • After iterating, lookup the customer by id, and report the phone number
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
prev_date      = None
first_customer = Counter()

for index, row in bakery_orders.iterrows():
    shipped_date = row['shipped'].date()

    if shipped_date != prev_date:
        first_customer[row['customerid']] += 1
        prev_date = shipped_date

return customers[customers['customerid'] == first_customer.most_common()[0][0]].iloc[0]['phone']