Hanukkah of Data 5784-Day 4 The Early Bird
The Task
We're given a clue that the customer likes to "claim the first pastries that came out of the oven". I looked at the products file, and noticed SKUs starting with "BKY". I assume these are bakery items, and didn't worry about whether, for example, a bagel is a "pastry".
>>> products[products['sku'].str.startswith('BKY')].head()
sku desc wholesale_cost dims_cm
211 BKY1573 Sesame Bagel 1.02 11.9|4.7|0.9
221 BKY1679 Sesame Twist 1.02 6.9|6.2|1.6
280 BKY2114 Poppyseed Mandelbrot 1.05 13.4|0.7|0.4
345 BKY2596 Poppyseed Linzer Cookie 1.04 16.0|7.5|5.0
516 BKY4004 Raspberry Sufganiah 1.02 18.7|10.6|3.3
The Solution
from advent import Counter
import pandas as pd
def solve():
customers = pd.read_csv('noahs-customers.csv')
orders = pd.read_csv('noahs-orders.csv', parse_dates=['shipped'])
order_items = pd.read_csv('noahs-orders_items.csv')
data = orders.merge(order_items)
bakery_orders = (data[data['sku'].str.startswith('BKY')][['orderid','customerid','ordered','shipped']].
drop_duplicates().
sort_values('shipped'))
prev_date = None
first_customer = Counter()
for index, row in bakery_orders.iterrows():
shipped_date = row['shipped'].date()
if shipped_date != prev_date:
first_customer[row['customerid']] += 1
prev_date = shipped_date
return customers[customers['customerid'] == first_customer.most_common()[0][0]].iloc[0]['phone']
# ---------------------------------------------------------------------------------------------
assert solve() == '607-231-3605'
First, let's get order items that are from the bakery:
bakery_orders = (data[data['sku'].str.startswith('BKY')][['orderid','customerid','ordered','shipped']].
drop_duplicates().
sort_values('shipped'))
I discovered there are quite a few customers that picked up the first bakery order of the day, so my idea was to count those occurrences, and choose the customer who most frequently got the first bakery order.
The algorithm is as follows:
- Iterate over the
bakery_orders(which are in order of shipped timestamp) - Each time we encounter a new date, add the customer ID to the counter
- After iterating, lookup the customer by id, and report the phone number
prev_date = None
first_customer = Counter()
for index, row in bakery_orders.iterrows():
shipped_date = row['shipped'].date()
if shipped_date != prev_date:
first_customer[row['customerid']] += 1
prev_date = shipped_date
return customers[customers['customerid'] == first_customer.most_common()[0][0]].iloc[0]['phone']