Over the Christmas break I went on a holiday with a group of friends, and we stumbled on a bit of a dilemma:

Who owes what?

Of course its easy to determine that before your trip, with the plane tickets and accommodation. But what about during your trip? The taxi fares, the meals, souvenirs, etc. You wouldn’t want to pick it through with a fine comb whilst trying to enjoy your holiday!

As with these kind of problems we used spreadsheet, and I’ll share here the logic that we did using Pandas. Now in hindsight doing this using Pandas may not be the wisest way. However with the requirements I had in mind, it was by far the most straight forward solution I could think of.

Requirements:

  • Information presented in a flat file (in this case csv) for easy manipulation on a phone or tablet.
  • Should be “obvious” for a general user to stare at the flat file and understand the format, and how to populate new entries.

Again, this wasn’t designed to be used by your average Joe, more for myself and perhaps if other people for some inexplicable reason are interested in this. As a side note, if I was to implement this in a production setting, the library would probably look similar to Advanced Python Scheduler; though this would violate the requirements I set for myself.

Data Format

Since I demanded that the data be a flat file, its quite simple.

description, amount, creditor, debitor
deposit, 38, Chapman, Tim
breakfast, 10, Dave, Chapman
Taxi, 20, Josh
breakfast, 25, Josh
Sea Kayaking, 250, Chapman
ATV tour, 350, Chapman
Taxi, 5, Josh, Tim
Groceries, 23.5, Dave

Where the creditor would be the person to paid for the item, and debitor the person who owes money. I’ve placed it so that if debitor field is blank, then everyone owes money (and is to be split evenly). In future, I would increase the debitor field to accept any number of people.

Massaging the Data

Next would be completely fill in the table to have full information. Since when debitor is omitted it means everyone else is required to be a debitor, and that the amount is split amongst everyone. The secret here is to get a list of all persons and use permutations from the module itertools to “fill in” the DataFrame using an outer join/merge.

{% highlight python %} s = info.creditor.append(info.debitor).dropna() person = list(set(s)) pls = pd.DataFrame(list(permutations(person,2)), columns=[‘creditor’,‘debitor’])

infonan = info[-info[‘debitor’].isin(person)].drop(‘debitor’, 1) infonan[‘amount’] = infonan[‘amount’]/len(person) #split the amounts evenly ppt = info[info[‘debitor’].isin(person)].append(pd.merge(infonan, pls, how=‘outer’, on=[‘creditor’])) {% endhighlight %}

But we’re not done yet! After this is done, we would want a final DataFrame that consists of the aggregated amounts of what each person owes to everyone. This is done using a self-join. In SQL-like code it would be:

select 
    coalesce(a.creditor, b.debitor) as creditor,
    coalesce(b.creditor, a.debitor) as debitor,
    coalesce(a.amount, 0) as cr,
    coalesce(b.amount, 0) as dr
from 
    data as a 
    full outer join data as b 
        on a.creditor = b.debitor and a.debitor = b.creditor

Which would result in something like this:

creditor debitor cr dr
0 Chapman Dave 150.000 15.875
1 Chapman Josh 150.000 11.250
2 Chapman Tim 188.000 0.000
3 Dave Chapman 15.875 150.000
4 Dave Josh 5.875 11.250

Determining Who Owes What

Finally how do we determine who owes what? Well the pseudo code is like this:

p = largest creditor (value)
Do for every row
    If the creditor or debitor is not p then
        subtract the creditor's amount from amount owing to p
        add the debitor's amount to the amount owing to p
    Else
        Remove any debits amounts from p's credit amounts
    End If
End Do

The ending result would look like this:

creditor debitor cr
0 Chapman Dave 134.625
1 Chapman Josh 160.375
2 Chapman Tim 165.875

And thats it, you’ve figured out who owes what.

Notebook and gist