How to find anomalous usage

My app stores utility meter usage data. One of the things we have to deal with
is usage that is clearly incorrect. Perhaps someone wrote the meter
reading down incorrectly or made a factor-of-10 error when entering the
reading. At other times the usage is zero or was somehow entered as a
negative number.

So I'm thinking about adding functionality to search for such anomalies. For
instance, show months where the meter reading is 25% higher than the average
for the prior 12 months. Or show months for a particular meter where adjacent
monthly usage differs by 20% or more. Here's a data example:

Meter 5678

Jan-06 100
Feb-06 105
Mar-06 75
Apr-06 90
May-06 101
Jun-06 900
Jul-06 89

So you can see from this data that 900 is clearly incorrect and probably
should be 90. The 75 usage in Mar-06 would show up on a search where there
is a difference between adjacent months of 25% or more. We'll probably also
code the functionality to search for zero usage and negative usage.
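
The checks described above can be sketched outside Access to see how they behave on the sample data. This is a minimal Python sketch, not the app's implementation: the function name `flag_anomalies`, its parameters, and the reason strings are all made up for illustration, and the thresholds are the 25% and 20% figures from the post.

```python
from statistics import mean

# Sample readings for meter 5678, copied from the post.
readings = [
    ("Jan-06", 100), ("Feb-06", 105), ("Mar-06", 75), ("Apr-06", 90),
    ("May-06", 101), ("Jun-06", 900), ("Jul-06", 89),
]

def flag_anomalies(rows, avg_limit=0.25, step_limit=0.20, window=12):
    """Return {month: [reasons]} for readings that break any rule (hypothetical helper)."""
    flags = {}
    for i, (month, usage) in enumerate(rows):
        reasons = []
        # Rule 1: zero or negative usage.
        if usage <= 0:
            reasons.append("zero or negative")
        # Rule 2: more than avg_limit above the average of up to `window` prior months.
        prior = [u for _, u in rows[max(0, i - window):i]]
        if prior and usage > (1 + avg_limit) * mean(prior):
            reasons.append("above prior average")
        # Rule 3: differs from the previous month by step_limit or more.
        if i > 0:
            prev = rows[i - 1][1]
            if prev > 0 and abs(usage - prev) / prev >= step_limit:
                reasons.append("jump from previous month")
        if reasons:
            flags[month] = reasons
    return flags

flags = flag_anomalies(readings)
for month, reasons in flags.items():
    print(month, reasons)
```

On this data the adjacent-month rule flags Mar-06, Apr-06, Jun-06 and Jul-06, while the prior-average rule flags only Jun-06. Jul-06 is flagged because it is compared against the bad 900 reading, so the month after a spike will usually need a second look too.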

Bear in mind that we have several thousand meters and around 100,000
monthly meter usage records spanning several years.

I'm looking for an approach to implement this functionality. Searching row
by row through the tables would probably take a very long time. Is there a
clever way to handle this through SQL alone or mostly through SQL? Or does
anyone have any other suggestions? It would seem that this could be a very
slow process.
Thanks.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 4 '06 #1
On Wed, 04 Oct 2006 14:23:13 GMT, "rdemyan via AccessMonster.com"
<u6836@uwe> wrote:

I would compare the readings against a scaled version of the common
trend. The trend would be an average over all meters, showing for
example that the usage in winter months is higher than in summer
months. The scaling is to account for a larger home putting up higher
numbers than a smaller one.
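
One way to read this suggestion, sketched in Python rather than Access: scale each meter by its own typical level, build a common trend across meters, and flag readings that stray far from it. The data, the `trend_outliers` name and the 50% tolerance are hypothetical. Note one deliberate swap: the sketch uses the median instead of the average suggested above, so that a single bad reading does not distort the trend it is being compared against.

```python
from statistics import median

# Hypothetical usage for three meters of different sizes over the same months.
usage = {
    "A": {"Jan": 100, "Feb": 90, "Mar": 80, "Apr": 70},
    "B": {"Jan": 200, "Feb": 180, "Mar": 160, "Apr": 140},
    "C": {"Jan": 50, "Feb": 45, "Mar": 400, "Apr": 35},
}
months = ["Jan", "Feb", "Mar", "Apr"]

def trend_outliers(usage, months, tolerance=0.5):
    # Scale each meter by its own median so large and small sites are comparable.
    # Assumes every meter has a non-zero median.
    scaled = {
        meter: {m: vals[m] / median(vals.values()) for m in months}
        for meter, vals in usage.items()
    }
    # Common trend: the typical scaled usage across all meters for each month.
    trend = {m: median(s[m] for s in scaled.values()) for m in months}
    # Flag readings whose scaled value is more than `tolerance` away from the trend.
    return [
        (meter, m)
        for meter, s in scaled.items()
        for m in months
        if abs(s[m] - trend[m]) / trend[m] > tolerance
    ]

print(trend_outliers(usage, months))
```

Here only meter C in March stands out, even though its other readings are much smaller than those of meters A and B.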

I would not worry about speed until it's proven to be an issue.

-Tom.

Oct 4 '06 #2
"rdemyan via AccessMonster.com" <u6836@uwe> wrote in
news:6743ad47a3f7b@uwe:
> [quoted text clipped - 36 lines]

OTTOMH

SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
LEFT JOIN
[SELECT Avg(Meter.Reading) AS Average, StDev(Meter.Reading) AS StDev
FROM Meter]. sq
ON m.Reading*1000 <> sq.Average
WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
ORDER BY (m.Reading-sq.Average)/sq.StDev

You, of course, would have to modify this for your own situation. I have
suggested that a Score >= 2 would be suspect but your own experience
would be the best guide here.
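
For anyone who wants to see what the z-score does on the sample meter before wiring it into a query, here is a small Python sketch. It is an illustration of the same arithmetic, not a replacement for the SQL; the names `z_scores` and `suspects` are made up. Python's `statistics.stdev` is the sample standard deviation, which is also what Access `StDev()` computes.

```python
from statistics import mean, stdev

# Usage for meter 5678, Jan-06 through Jul-06, from the original post.
usage = [100, 105, 75, 90, 101, 900, 89]

def z_scores(values):
    """Z-score of each value: distance from the mean in standard deviations."""
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation, as Access StDev() uses
    return [(v - avg) / sd for v in values]

scores = z_scores(usage)

# Readings at or above the threshold of 2 suggested in the post.
suspects = [v for v, z in zip(usage, scores) if z >= 2]

for v, z in zip(usage, scores):
    print(v, round(z, 2))
print("suspect readings:", suspects)
```

Only the 900 reading clears the threshold. One thing to keep in mind when choosing the cutoff: a bad reading inflates the standard deviation it is measured against, and with n readings a z-score cannot exceed (n - 1) / sqrt(n). With seven readings that ceiling is about 2.27, so a threshold of 3 would never fire on a window this short.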

No, I don't really expect that you will be able to use this, but hope
springs eternal.

--
Lyle Fairfield
Oct 4 '06 #3
Interesting, Lyle. I'll see what I can do with this and report back. You
show a threshold of 2, but that can easily be made adjustable by the user on
the form (however, I'll have to think about what that really means in terms
us mere mortals can understand).

One definite thing I will want to add is the ability to select a specific
time frame.

Oct 4 '06 #4
Tom:

It's a good point about summer and winter months. This is also a function of
geographic area. In Seattle, electricity usage is fairly constant throughout
the year (no summer air conditioning). Water is also fairly constant (not
much irrigation needed in the Pacific Northwest). Heating, though, will vary
substantially.

In Hawaii, cooling occurs year round but will vary with the cooling degree
days. No heating degree days there so heating is not an issue.

In writing this, I realize that I may want to incorporate weather data in
determining what an "anomaly" is for those utilities that show variance due
to weather in the particular geographical area. My data tables contain all
the necessary weather data so this should be doable.
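
A rough sketch of what weather normalization could look like, in Python for illustration only. The rows, the `weather_adjusted_outliers` name and the 25% tolerance are hypothetical; the idea is to compare usage per heating degree day against its typical value rather than comparing raw usage, so a cold month with high usage is not mistaken for an error.

```python
from statistics import median

# Hypothetical rows: (month, heating degree days, heating usage).
rows = [
    ("Oct", 300, 155), ("Nov", 500, 260), ("Dec", 800, 410),
    ("Jan", 700, 90), ("Feb", 650, 330), ("Mar", 100, 60),
]

def weather_adjusted_outliers(rows, tolerance=0.25):
    # Usage per degree day; months with no degree days are skipped
    # because the ratio is undefined for them.
    intensity = {m: use / dd for m, dd, use in rows if dd > 0}
    typical = median(intensity.values())
    # Flag months whose usage per degree day is far from the typical value.
    return [
        m for m, r in intensity.items()
        if abs(r - typical) / typical > tolerance
    ]

print(weather_adjusted_outliers(rows))
```

In this made-up data January is flagged: its raw usage of 90 is close to March's 60, but for a month with 700 heating degree days it is far too low. For a location with no heating degree days at all, such as the Hawaii case, the same ratio would be built on cooling degree days instead.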

Oct 4 '06 #5
On Wed, 04 Oct 2006 15:43:31 GMT, Lyle Fairfield
<ly***********@aim.com> wrote:

I'll have to study this some more. The way I'm calculating z-scores
for a project is quite a bit more involved.

I think Abs(Score) >= 2 is worth another look.

-Tom.

Oct 5 '06 #6
Tom van Stiphout <no*************@cox.net> wrote in
news:vf********************************@4ax.com:

> I think Abs(Score) >= 2 is worth another look.
>
> -Tom.

I think you are right. Abs() is a good idea.

--
Lyle Fairfield
Oct 5 '06 #7
Lyle:

I got the following to produce results, but I need to test it further.

SELECT m.USAGE, (m.USAGE-sq.Average)/sq.StDev AS ZScore
FROM [MONTHLY_METER_USAGE] AS m
LEFT JOIN (
    SELECT Avg(USAGE) AS Average, StDev(USAGE) AS StDev
    FROM [MONTHLY_METER_USAGE]
    WHERE METER_ID = '000001'
    AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
) AS sq
ON m.USAGE <> sq.Average
WHERE ((m.USAGE-sq.Average)/sq.StDev)>=2
AND m.METER_ID = '000001'
AND m.USAGE_END_DATE >= #03/01/2005# AND m.USAGE_END_DATE <= #02/28/2006#
ORDER BY (m.USAGE-sq.Average)/sq.StDev;


Oct 5 '06 #8
rdemyan via AccessMonster.com wrote:
> [quoted text clipped - 15 lines]

If you use Tom's revision:
WHERE Abs((m.USAGE-sq.Average)/sq.StDev)>=2
you will identify usage values that are unusually low as well as those that
are unusually high.

Oct 5 '06 #9
Got this to work nicely, but I had to add a HAVING clause because StDev can be
zero, and dividing by zero, of course, leads to an error.
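
The same guard, sketched in Python to show the logic rather than the HAVING clause itself. The function name `abs_z_outliers` and the sample lists are hypothetical. It combines the Abs() revision with a check that skips the calculation when the standard deviation is zero or cannot be computed.

```python
from statistics import mean, stdev

def abs_z_outliers(values, threshold=2.0):
    """Return values whose |z-score| meets the threshold (hypothetical helper)."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two readings
    sd = stdev(values)
    if sd == 0:
        return []  # all readings identical: nothing stands out, nothing to divide by
    avg = mean(values)
    # Abs() catches unusually low readings as well as unusually high ones.
    return [v for v in values if abs(v - avg) / sd >= threshold]

# Constant usage: the standard deviation is zero, so the guard returns early.
print(abs_z_outliers([50, 50, 50, 50]))

# A zero reading among otherwise steady usage shows up on the low side.
print(abs_z_outliers([100, 102, 98, 101, 99, 100, 103, 97, 100, 0]))
```

Returning nothing for a constant series is a design choice in this sketch. A meter that reports exactly the same usage every month may deserve its own check, but the z-score has nothing to say about it.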

Oct 7 '06 #10

rdemyan via AccessMonster.com wrote:
> Got this to work nicely but had to add a Having clause because there is the
> possibility that StDev can be zero and dividing by zero, of course, leads to
> an error.

Good point!

Oct 7 '06 #11
