Hanukkah of Data 5784 - Day 4 The Early Bird
The Task
We’re given a clue that the customer likes to “claim the first pastries that came out of the oven”. I looked at the products file, and noticed SKUs starting with “BKY”. I assume these are bakery items, and didn’t worry about whether, for example, a bagel is a “pastry”.
>>> products[products['sku'].str.startswith('BKY')].head()
sku desc wholesale_cost dims_cm
211 BKY1573 Sesame Bagel 1.02 11.9|4.7|0.9
221 BKY1679 Sesame Twist 1.02 6.9|6.2|1.6
280 BKY2114 Poppyseed Mandelbrot 1.05 13.4|0.7|0.4
345 BKY2596 Poppyseed Linzer Cookie 1.04 16.0|7.5|5.0
516 BKY4004 Raspberry Sufganiah 1.02 18.7|10.6|3.3
The Solution
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
from advent import Counter import pandas as pd def solve(): customers = pd.read_csv('noahs-customers.csv') orders = pd.read_csv('noahs-orders.csv', parse_dates=['shipped']) order_items = pd.read_csv('noahs-orders_items.csv') data = orders.merge(order_items) bakery_orders = (data[data['sku'].str.startswith('BKY')][['orderid','customerid','ordered','shipped']]. drop_duplicates(). sort_values('shipped')) prev_date = None first_customer = Counter() for index, row in bakery_orders.iterrows(): shipped_date = row['shipped'].date() if shipped_date != prev_date: first_customer[row['customerid']] += 1 prev_date = shipped_date return customers[customers['customerid'] == first_customer.most_common()[0][0]].iloc[0]['phone'] # --------------------------------------------------------------------------------------------- assert solve() == '607-231-3605' |
First, let’s get order items that are from the bakery:
1 2 3 |
bakery_orders = (data[data['sku'].str.startswith('BKY')][['orderid','customerid','ordered','shipped']]. drop_duplicates(). sort_values('shipped')) |
I discovered there are quite a few customers that picked up the first bakery order of the day, so my idea was to count those occurrences, and choose the customer who most frequently got the first bakery order.
The algorithm is as follows:
- Iterate over the
bakery_orders
(which are in order of shipped timestamp) - Each time we encounter a new date, add the customer ID to the counter
- After iterating, lookup the customer by id, and report the phone number
1 2 3 4 5 6 7 8 9 10 11 |
prev_date = None first_customer = Counter() for index, row in bakery_orders.iterrows(): shipped_date = row['shipped'].date() if shipped_date != prev_date: first_customer[row['customerid']] += 1 prev_date = shipped_date return customers[customers['customerid'] == first_customer.most_common()[0][0]].iloc[0]['phone'] |