472,142 Members | 1,232 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,142 software developers and data experts.

How to find anomalous usage

My app contains utility meter usage. One of the things we have to deal with
is when a usage is clearly incorrect. Perhaps someone wrote the meter
reading down incorrectly or made a factor of 10 error when entering the
reading, etc. At other times the usage is zero or somehow was entered as a
negative number.

So I'm thinking about adding functionality to search for such anomalies. For
instance, show months where the meter reading is 25% higher than the average
for the prior 12 months. Or show months for a particular meter where there
is a difference of 20% between adjacent monthly usage. Here's a data example

Meter 5678

Jan-06 100
Feb-06 105
Mar-06 75
Apr-06 90
May-06 101
Jun-06 900
Jul-06 89
So you can see from this data that 900 is clearly incorrect and probably
should be 90. The 75 usage in Mar-06 would show up on a search where there
is a difference between adjacent months of 25% or more. We'll probably also
code the functionality to search for zero usage and negative usage.

Bear in mind that we have several thousand meters and around a 100,000
monthly meter usages spanning several years.

I'm looking for an approach to implement this functionality. Searching row
by row through the tables would probably take a very long time. Is there a
clever way to handle this through SQL alone or mostly through SQL? Or does
anyone have any other suggestions? It would seem that this could be a very
slow process.
Thanks.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 4 '06 #1
10 1932
On Wed, 04 Oct 2006 14:23:13 GMT, "rdemyan via AccessMonster.com"
<u6836@uwewrote:

I would compare the readings against a scaled version of the common
trend. The trend would be an average over all meters, showing for
example that the usage in winter months is higher than in summer
months. The scaling is to account for a larger home putting up higher
numbers than a smaller one.

I would not worry about speed until it's proven to be an issue.

-Tom.

>My app contains utility meter usage. One of the things we have to deal with
is when a usage is clearly incorrect. Perhaps someone wrote the meter
reading down incorrectly or made a factor of 10 error when entering the
reading, etc. At other times the usage is zero or somehow was entered as a
negative number.

So I'm thinking about adding functionality to search for such anomalies. For
instance, show months where the meter reading is 25% higher than the average
for the prior 12 months. Or show months for a particular meter where there
is a difference of 20% between adjacent monthly usage. Here's a data example

Meter 5678

Jan-06 100
Feb-06 105
Mar-06 75
Apr-06 90
May-06 101
Jun-06 900
Jul-06 89
So you can see from this data that 900 is clearly incorrect and probably
should be 90. The 75 usage in Mar-06 would show up on a search where there
is a difference between adjacent months of 25% or more. We'll probably also
code the functionality to search for zero usage and negative usage.

Bear in mind that we have several thousand meters and around a 100,000
monthly meter usages spanning several years.

I'm looking for an approach to implement this functionality. Searching row
by row through the tables would probably take a very long time. Is there a
clever way to handle this through SQL alone or mostly through SQL? Or does
anyone have any other suggestions? It would seem that this could be a very
slow process.
Thanks.
Oct 4 '06 #2
"rdemyan via AccessMonster.com" <u6836@uwewrote in
news:6743ad47a3f7b@uwe:
My app contains utility meter usage. One of the things we have to
deal with is when a usage is clearly incorrect. Perhaps someone wrote
the meter reading down incorrectly or made a factor of 10 error when
entering the reading, etc. At other times the usage is zero or
somehow was entered as a negative number.

So I'm thinking about adding functionality to search for such
anomalies. For instance, show months where the meter reading is 25%
higher than the average for the prior 12 months. Or show months for
a particular meter where there is a difference of 20% between adjacent
monthly usage. Here's a data example

Meter 5678

Jan-06 100
Feb-06 105
Mar-06 75
Apr-06 90
May-06 101
Jun-06 900
Jul-06 89
So you can see from this data that 900 is clearly incorrect and
probably should be 90. The 75 usage in Mar-06 would show up on a
search where there is a difference between adjacent months of 25% or
more. We'll probably also code the functionality to search for zero
usage and negative usage.

Bear in mind that we have several thousand meters and around a 100,000
monthly meter usages spanning several years.

I'm looking for an approach to implement this functionality.
Searching row by row through the tables would probably take a very
long time. Is there a clever way to handle this through SQL alone or
mostly through SQL? Or does anyone have any other suggestions? It
would seem that this could be a very slow process.
Thanks.
OTTOMH

SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
LEFT JOIN
[SELECT Avg(Meter.Reading) AS Average, StDev(Meter.Reading) AS StDev
FROM Meter]. sq
ON m.Reading*1000 <sq.Average
WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
ORDER BY (m.Reading-sq.Average)/sq.StDev

You, of course, would have to modify this for your own situation. I have
suggested that a Score >= 2 would be suspect but your own experience
would be the best guide here.

No, I don't really expect that you will be able to use this, but hope
springs eternal.

--
Lyle Fairfield
Oct 4 '06 #3
Interesting, Lyle. I'll see what I can do with this and report back. You
show 2 but that can be easily changed by the user on the form (however, I'll
have to think about what that really means in terms us mere mortals can
understand).

One definate thing I will want to add is the ability to select a specific
time frame.

Lyle Fairfield wrote:
>My app contains utility meter usage. One of the things we have to
deal with is when a usage is clearly incorrect. Perhaps someone wrote
[quoted text clipped - 34 lines]
>>
Thanks.

OTTOMH

SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
LEFT JOIN
[SELECT Avg(Meter.Reading) AS Average, StDev(Meter.Reading) AS StDev
FROM Meter]. sq
ON m.Reading*1000 <sq.Average
WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
ORDER BY (m.Reading-sq.Average)/sq.StDev

You, of course, would have to modify this for your own situation. I have
suggested that a Score >= 2 would be suspect but your own experience
would be the best guide here.

No, I don't really expect that you will be able to use this, but hope
springs eternal.
--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 4 '06 #4
Tom:

It's a good point about summer and winter months. This is also a function of
geographic area. In Seattle, electricity usage is fairly constant throughout
the year (no summer air conditioning). Water is also fairly constant (not
much irrigation needed in the Pacific Northwest). Heating, though will vary
substnatially.

In Hawaii, cooling occurs year round but will vary with the cooling degree
days. No heating degree days there so heating is not an issue.

In writing this, I realize that I may want to incorporate weather data in
determining what an "anomaly" is for those utilities that show variance due
to weather in the particular geographical area. My data tables contain all
the necessary weather data so this should be doable.

Tom van Stiphout wrote:
>I would compare the readings against a scaled version of the common
trend. The trend would be an average over all meters, showing for
example that the usage in winter months is higher than in summer
months. The scaling is to account for a larger home putting up higher
numbers than a smaller one.

I would not worry about speed until it's proven to be an issue.

-Tom.
>>My app contains utility meter usage. One of the things we have to deal with
is when a usage is clearly incorrect. Perhaps someone wrote the meter
[quoted text clipped - 32 lines]
>>
Thanks.
--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 4 '06 #5
On Wed, 04 Oct 2006 15:43:31 GMT, Lyle Fairfield
<ly***********@aim.comwrote:

I'll have to study this some more. The way I'm calculating z-scores
for a project is quite a bit more involved.

I think Abs(Score) >= 2 is worth another look.

-Tom.

<clip>
>
OTTOMH

SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
LEFT JOIN
[SELECT Avg(Meter.Reading) AS Average, StDev(Meter.Reading) AS StDev
FROM Meter]. sq
ON m.Reading*1000 <sq.Average
WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
ORDER BY (m.Reading-sq.Average)/sq.StDev

You, of course, would have to modify this for your own situation. I have
suggested that a Score >= 2 would be suspect but your own experience
would be the best guide here.

No, I don't really expect that you will be able to use this, but hope
springs eternal.
Oct 5 '06 #6
Tom van Stiphout <no*************@cox.netwrote in
news:vf********************************@4ax.com:
I think Abs(Score) >= 2 is worth another look.

-Tom.
I think you are right. Abs() is a good idea.

--
Lyle Fairfield
Oct 5 '06 #7
Lyle:

I got the following to produce results, but I need to test it further.

SELECT m.USAGE, (m.USAGE-sq.Average)/sq.StDev AS ZScore
FROM [MONTHLY_METER_USAGE] AS m LEFT JOIN (SELECT Avg(USAGE) AS Average,
StDev(USAGE) AS StDev
FROM [MONTHLY_METER_USAGE]
WHERE METER_ID = '000001'
AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#) AS sq
ON m.USAGE <sq.Average
WHERE ((m.USAGE-sq.Average)/sq.StDev)>=2
AND m.METER_ID = '000001'
AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
ORDER BY (m.USAGE-sq.Average)/sq.StDev;


Lyle Fairfield wrote:
>I think Abs(Score) >= 2 is worth another look.

-Tom.

I think you are right. Abs() is a good idea.
--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 5 '06 #8
rdemyan via AccessMonster.com wrote:
Lyle:

I got the following to produce results, but I need to test it further.

SELECT m.USAGE, (m.USAGE-sq.Average)/sq.StDev AS ZScore
FROM [MONTHLY_METER_USAGE] AS m LEFT JOIN (SELECT Avg(USAGE) AS Average,
StDev(USAGE) AS StDev
FROM [MONTHLY_METER_USAGE]
WHERE METER_ID = '000001'
AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#) AS sq
ON m.USAGE <sq.Average
WHERE ((m.USAGE-sq.Average)/sq.StDev)>=2
AND m.METER_ID = '000001'
AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
ORDER BY (m.USAGE-sq.Average)/sq.StDev;
If you use Tom's revision:
WHERE Abs(((m.USAGE-sq.Average)/sq.StDev))>=2
you will identify scores that are unusually low as well as scores that
are unusually high,

Oct 5 '06 #9
Got this to work nicely but had to add a Having clause because there is the
possibility that StDev can be zero and dividing by zero, of course, leads to
an error.

Lyle Fairfield wrote:
>Lyle:
[quoted text clipped - 11 lines]
>AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
ORDER BY (m.USAGE-sq.Average)/sq.StDev;

If you use Tom's revision:
WHERE Abs(((m.USAGE-sq.Average)/sq.StDev))>=2
you will identify scores that are unusually low as well as scores that
are unusually high,
--
Message posted via http://www.accessmonster.com

Oct 7 '06 #10

rdemyan via AccessMonster.com wrote:
Got this to work nicely but had to add a Having clause because there is the
possibility that StDev can be zero and dividing by zero, of course, leads to
an error.
Good point!

Oct 7 '06 #11

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

6 posts views Thread by Peter Hansen | last post: by
108 posts views Thread by Bryan Olson | last post: by
50 posts views Thread by sabarish | last post: by
4 posts views Thread by isha123 | last post: by
58 posts views Thread by sh.vipin | last post: by
reply views Thread by leo001 | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.