Connecting Tech Pros Worldwide Help | Site Map

tricky time interval billing problem

pruebauno@latinmail.com
Guest
 
Posts: n/a
#1: Dec 5 '05
I am currently working on a tricky problem at work. I googled around a
bit, but "time intervals" did not come up with anything useful.
Although I have some rough idea of how I could solve it, I still would
value some input.

I have information of (It has only couple dozen entries.)
ServiceNum, DollarCost

and a input database table in the form (several GBytes):
ClientNumber (CN), BeginDate(BD), EndDate(ED),
ServiceNumber_Purchased(SN)
--Date intervals are always [closed, closed]

The output is:
ClientNumber(CN), BeginDate(BD), Enddate(ED), DollarCost(DC)

With the following requirements:
1) The input dates can be overlapping dates.
The output has to be non-overlapping "broken up" dates

Example: (assuming SN 1=$10 and SN 2=$20)
input (CN, BD, ED ,SN):
10, 1/1/2001,1/1/2005,1
10, 1/1/2002,1/1/2004,2

should result in:
output (CN, BD, ED, DC):
10, 1/1/2001, 12/31/2001, $10
10, 1/1/2002, 1/1/2004, $30
10, 1/2,2004, 1/1/2005, $10

2) if the same service is purchased twice for an interval it should be
billed only once
Example:
input:
10, 1/1/2001, 1/1/2005, 1
10, 1/1/2004, 1/1/2007, 1

output:
10, 1/1/2001, 1/1/2007, $10

Here are my thoughts about a solution so far:

1. fetch the input data sorted by clientNumber, Begindate, Enddate and
ServiceNumber

2. read the row, store as temporary good interval, then read another
row

3. if new begin-date is bigger than previous good interval end-date (or
previously saved end-date), output previous good interval, start new
good interval

4. else create new good interval entry with good interval begin-date
and current row begin-date, store good interval end-date into a list
with bisect module or something (so we always compare against smallest
end-date).

5. Use "bitwise or" on a service bitfield to add services and caluate
the total

As you can see my algorithm is a bit scetchy right now, but at this
point I am more interested to determine if the bisect module would be
the best way to approach the date interval problem or if I should use
some other data structure instead (stack, queue, set,...). I am working
on a Python proof of concept now. It is likely that the company will
want a C++ version of it later (yikes).

nes

Steve Holden
Guest
 
Posts: n/a
#2: Dec 6 '05

re: tricky time interval billing problem


pruebauno@latinmail.com wrote:[color=blue]
> I am currently working on a tricky problem at work. I googled around a
> bit, but "time intervals" did not come up with anything useful.
> Although I have some rough idea of how I could solve it, I still would
> value some input.
>
> I have information of (It has only couple dozen entries.)
> ServiceNum, DollarCost
>
> and a input database table in the form (several GBytes):
> ClientNumber (CN), BeginDate(BD), EndDate(ED),
> ServiceNumber_Purchased(SN)
> --Date intervals are always [closed, closed]
>
> The output is:
> ClientNumber(CN), BeginDate(BD), Enddate(ED), DollarCost(DC)
>
> With the following requirements:
> 1) The input dates can be overlapping dates.
> The output has to be non-overlapping "broken up" dates
>
> Example: (assuming SN 1=$10 and SN 2=$20)
> input (CN, BD, ED ,SN):
> 10, 1/1/2001,1/1/2005,1
> 10, 1/1/2002,1/1/2004,2
>
> should result in:
> output (CN, BD, ED, DC):
> 10, 1/1/2001, 12/31/2001, $10
> 10, 1/1/2002, 1/1/2004, $30
> 10, 1/2,2004, 1/1/2005, $10
>
> 2) if the same service is purchased twice for an interval it should be
> billed only once
> Example:
> input:
> 10, 1/1/2001, 1/1/2005, 1
> 10, 1/1/2004, 1/1/2007, 1
>
> output:
> 10, 1/1/2001, 1/1/2007, $10
>
> Here are my thoughts about a solution so far:
>
> 1. fetch the input data sorted by clientNumber, Begindate, Enddate and
> ServiceNumber
>
> 2. read the row, store as temporary good interval, then read another
> row
>
> 3. if new begin-date is bigger than previous good interval end-date (or
> previously saved end-date), output previous good interval, start new
> good interval
>
> 4. else create new good interval entry with good interval begin-date
> and current row begin-date, store good interval end-date into a list
> with bisect module or something (so we always compare against smallest
> end-date).
>
> 5. Use "bitwise or" on a service bitfield to add services and caluate
> the total
>
> As you can see my algorithm is a bit scetchy right now, but at this
> point I am more interested to determine if the bisect module would be
> the best way to approach the date interval problem or if I should use
> some other data structure instead (stack, queue, set,...). I am working
> on a Python proof of concept now. It is likely that the company will
> want a C++ version of it later (yikes).
>
> nes
>[/color]
First of all, you need to use ordering to ensure that the database gives
you the most convenient order for processing, as this will make your
computations much easier. So I'd suggest sorting by clientNumber,
ServiceNumber, Begindate and Enddate. That way you can consider each
service separately.

Then all you need to do is accumulate the entries where the clientNumber
and ServiceNumber are the same and the start date of the second is less
than or equal to the end date of the first, repeatedly until either
there's no date overlap or a new service and/or client is started.

This shouldn't need any intermediate storage of results: if the row
you've just read can be used to extend the current range then extend it,
otherwise emit the current range and replace it with the new row.

Or is there something I've failed to understand about how you want to
process the data?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

pruebauno@latinmail.com
Guest
 
Posts: n/a
#3: Dec 6 '05

re: tricky time interval billing problem


> First of all, you need to use ordering to ensure that the database gives[color=blue]
> you the most convenient order for processing, as this will make your
> computations much easier. So I'd suggest sorting by clientNumber,
> ServiceNumber, Begindate and Enddate. That way you can consider each
> service separately.
>
> Then all you need to do is accumulate the entries where the clientNumber
> and ServiceNumber are the same and the start date of the second is less
> than or equal to the end date of the first, repeatedly until either
> there's no date overlap or a new service and/or client is started.
>
> This shouldn't need any intermediate storage of results: if the row
> you've just read can be used to extend the current range then extend it,
> otherwise emit the current range and replace it with the new row.
>
> Or is there something I've failed to understand about how you want to
> process the data?
>
> regards
> Steve[/color]

Hi Steve, (btw I am the one that sent you the quotes at Pycon)

Hm, that would be an algorithm for requirement number 2. I do not
understand how it would help with respect to requirement 1. Notice that
by coincidence, in my examples the input data is already sorted as you
specified.

The real data is of course more messy than my analogy and I will post
it here so that nobody can accuse me of "feature creep". I hope I don't
totally confuse you now. Feel free to ignore it. The real output is not
100% correct either (that is why I am rewriting the program). Some of
the database in all its gory, ..eh glory:

INPUT
43756352|D|01/01/1999|09/30/2003|DCUD2B00|DCUD2B00|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756353|D|01/01/1999|09/30/2003|DCUD2B00|DCUD2B00|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756351|D|01/01/1999|09/02/2002|DCUD2B00|DCUD2B00|Y|A|43756350|M|83516
|00374 |9048327561|0001|
43756354|D|04/02/1999|09/30/2003|DCUD2B00|DCUD2B00|Y|A|43756350|W|83516
|00374 |9048327561|0001|
43756351|M|01/01/1999|03/31/1999|MARTPPG2|MARTPPG2|Y|A|43756350|M|83516
|00374 |9048327561|0001|
43756352|M|01/01/1999|03/31/1999|MARTPPG2|MARTPPG2|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756353|M|01/01/1999|03/31/1999|MARTPPG2|MARTPPG2|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756351|M|04/01/1999|07/31/2002|MBCPG002|MBCPG002|Y|A|43756350|M|83516
|00374 |9048327561|0001|
43756352|M|04/01/1999|07/31/2002|MBCPG002|MBCPG002|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756353|M|04/01/1999|07/31/2002|MBCPG002|MBCPG002|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756354|M|04/02/1999|07/31/2002|MBCPG002|MBCPG002|Y|A|43756350|W|83516
|00374 |9048327561|0001|
43756352|M|08/01/2002|09/30/2003|MBP07305|MBP07305|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756353|M|08/01/2002|09/30/2003|MBP07305|MBP07305|Y|A|43756350|D|83516
|00374 |9048327561|0001|
43756354|M|08/01/2002|09/30/2003|MBP07305|MBP07305|Y|A|43756350|W|83516
|00374 |9048327561|0001|
43756351|M|08/01/2002|09/02/2002|MBP07305|MBP07305|Y|A|43756350|M|83516
|00374 |9048327561|0001|

OUTPUT
43756350|9048327561|DCUD2B00|D|A|01/01/1999|04/01/1999|83516 |00374
|0001|Y|A|DCUD2B00|
43756350|9048327561|DCUD2B00|D|E|04/02/1999|09/30/2003|83516 |00374
|0001|Y|A|DCUD2B00|
43756350|9048327561|MARTPPG2|M|A|01/01/1999|03/31/1999|83516 |00374
|0001|Y|A|MARTPPG2|
43756350|9048327561|MBCPG002|M|A|04/01/1999|07/31/2002|83516 |00374
|0001|Y|A|MBCPG002|
43756350|9048327561|MBP07305|M|A|08/01/2002|09/02/2002|83516 |00374
|0001|Y|A|MBP07305|
43756350|9048327561|MBP07305|M|E|09/03/2002|09/30/2003|83516 |00374
|0001|Y|A|MBP07305|

CHEAT SHEET:
| (M) | (H,W) | (D,O,S) ||
+==============================
| - | - | - || O |
+-----------------------------------------+
| - | - | X || G |
+-----------------------------------------+
| - | X | - || F |
+-----------------------------------------+
| - | X | X || E |
+-----------------------------------------+
| X | - | - || C |
+-----------------------------------------+
| X | - | X || D |
+-----------------------------------------+
| X | X | - || B |
+-----------------------------------------+
| X | X | X || A |
+-----------------------------------------+

regards,
Nes

Closed Thread