473,770 Members | 1,901 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How to find anomalous usage

My app contains utility meter usage. One of the things we have to deal with
is when a usage is clearly incorrect. Perhaps someone wrote the meter
reading down incorrectly or made a factor of 10 error when entering the
reading, etc. At other times the usage is zero or somehow was entered as a
negative number.

So I'm thinking about adding functionality to search for such anomalies. For
instance, show months where the meter reading is 25% higher than the average
for the prior 12 months. Or show months for a particular meter where there
is a difference of 20% between adjacent monthly usage. Here's a data example

Meter 5678

Jan-06 100
Feb-06 105
Mar-06 75
Apr-06 90
May-06 101
Jun-06 900
Jul-06 89
So you can see from this data that 900 is clearly incorrect and probably
should be 90. The 75 usage in Mar-06 would show up on a search where there
is a difference between adjacent months of 25% or more. We'll probably also
code the functionality to search for zero usage and negative usage.

Bear in mind that we have several thousand meters and around a 100,000
monthly meter usages spanning several years.

I'm looking for an approach to implement this functionality. Searching row
by row through the tables would probably take a very long time. Is there a
clever way to handle this through SQL alone or mostly through SQL? Or does
anyone have any other suggestions? It would seem that this could be a very
slow process.
Thanks.

--
Message posted via AccessMonster.c om
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 4 '06 #1
10 2099
On Wed, 04 Oct 2006 14:23:13 GMT, "rdemyan via AccessMonster.c om"
<u6836@uwewrote :

I would compare the readings against a scaled version of the common
trend. The trend would be an average over all meters, showing for
example that the usage in winter months is higher than in summer
months. The scaling is to account for a larger home putting up higher
numbers than a smaller one.

I would not worry about speed until it's proven to be an issue.

-Tom.

>My app contains utility meter usage. One of the things we have to deal with
is when a usage is clearly incorrect. Perhaps someone wrote the meter
reading down incorrectly or made a factor of 10 error when entering the
reading, etc. At other times the usage is zero or somehow was entered as a
negative number.

So I'm thinking about adding functionality to search for such anomalies. For
instance, show months where the meter reading is 25% higher than the average
for the prior 12 months. Or show months for a particular meter where there
is a difference of 20% between adjacent monthly usage. Here's a data example

Meter 5678

Jan-06 100
Feb-06 105
Mar-06 75
Apr-06 90
May-06 101
Jun-06 900
Jul-06 89
So you can see from this data that 900 is clearly incorrect and probably
should be 90. The 75 usage in Mar-06 would show up on a search where there
is a difference between adjacent months of 25% or more. We'll probably also
code the functionality to search for zero usage and negative usage.

Bear in mind that we have several thousand meters and around a 100,000
monthly meter usages spanning several years.

I'm looking for an approach to implement this functionality. Searching row
by row through the tables would probably take a very long time. Is there a
clever way to handle this through SQL alone or mostly through SQL? Or does
anyone have any other suggestions? It would seem that this could be a very
slow process.
Thanks.
Oct 4 '06 #2
"rdemyan via AccessMonster.c om" <u6836@uwewro te in
news:6743ad47a3 f7b@uwe:
My app contains utility meter usage. One of the things we have to
deal with is when a usage is clearly incorrect. Perhaps someone wrote
the meter reading down incorrectly or made a factor of 10 error when
entering the reading, etc. At other times the usage is zero or
somehow was entered as a negative number.

So I'm thinking about adding functionality to search for such
anomalies. For instance, show months where the meter reading is 25%
higher than the average for the prior 12 months. Or show months for
a particular meter where there is a difference of 20% between adjacent
monthly usage. Here's a data example

Meter 5678

Jan-06 100
Feb-06 105
Mar-06 75
Apr-06 90
May-06 101
Jun-06 900
Jul-06 89
So you can see from this data that 900 is clearly incorrect and
probably should be 90. The 75 usage in Mar-06 would show up on a
search where there is a difference between adjacent months of 25% or
more. We'll probably also code the functionality to search for zero
usage and negative usage.

Bear in mind that we have several thousand meters and around a 100,000
monthly meter usages spanning several years.

I'm looking for an approach to implement this functionality.
Searching row by row through the tables would probably take a very
long time. Is there a clever way to handle this through SQL alone or
mostly through SQL? Or does anyone have any other suggestions? It
would seem that this could be a very slow process.
Thanks.
OTTOMH

SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
LEFT JOIN
[SELECT Avg(Meter.Readi ng) AS Average, StDev(Meter.Rea ding) AS StDev
FROM Meter]. sq
ON m.Reading*1000 <sq.Average
WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
ORDER BY (m.Reading-sq.Average)/sq.StDev

You, of course, would have to modify this for your own situation. I have
suggested that a Score >= 2 would be suspect but your own experience
would be the best guide here.

No, I don't really expect that you will be able to use this, but hope
springs eternal.

--
Lyle Fairfield
Oct 4 '06 #3
Interesting, Lyle. I'll see what I can do with this and report back. You
show 2 but that can be easily changed by the user on the form (however, I'll
have to think about what that really means in terms us mere mortals can
understand).

One definate thing I will want to add is the ability to select a specific
time frame.

Lyle Fairfield wrote:
>My app contains utility meter usage. One of the things we have to
deal with is when a usage is clearly incorrect. Perhaps someone wrote
[quoted text clipped - 34 lines]
>>
Thanks.

OTTOMH

SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
LEFT JOIN
[SELECT Avg(Meter.Readi ng) AS Average, StDev(Meter.Rea ding) AS StDev
FROM Meter]. sq
ON m.Reading*1000 <sq.Average
WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
ORDER BY (m.Reading-sq.Average)/sq.StDev

You, of course, would have to modify this for your own situation. I have
suggested that a Score >= 2 would be suspect but your own experience
would be the best guide here.

No, I don't really expect that you will be able to use this, but hope
springs eternal.
--
Message posted via AccessMonster.c om
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 4 '06 #4
Tom:

It's a good point about summer and winter months. This is also a function of
geographic area. In Seattle, electricity usage is fairly constant throughout
the year (no summer air conditioning). Water is also fairly constant (not
much irrigation needed in the Pacific Northwest). Heating, though will vary
substnatially.

In Hawaii, cooling occurs year round but will vary with the cooling degree
days. No heating degree days there so heating is not an issue.

In writing this, I realize that I may want to incorporate weather data in
determining what an "anomaly" is for those utilities that show variance due
to weather in the particular geographical area. My data tables contain all
the necessary weather data so this should be doable.

Tom van Stiphout wrote:
>I would compare the readings against a scaled version of the common
trend. The trend would be an average over all meters, showing for
example that the usage in winter months is higher than in summer
months. The scaling is to account for a larger home putting up higher
numbers than a smaller one.

I would not worry about speed until it's proven to be an issue.

-Tom.
>>My app contains utility meter usage. One of the things we have to deal with
is when a usage is clearly incorrect. Perhaps someone wrote the meter
[quoted text clipped - 32 lines]
>>
Thanks.
--
Message posted via AccessMonster.c om
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 4 '06 #5
On Wed, 04 Oct 2006 15:43:31 GMT, Lyle Fairfield
<ly***********@ aim.comwrote:

I'll have to study this some more. The way I'm calculating z-scores
for a project is quite a bit more involved.

I think Abs(Score) >= 2 is worth another look.

-Tom.

<clip>
>
OTTOMH

SELECT m.Reading, (m.Reading-sq.Average)/sq.StDev AS ZScore FROM Meter m
LEFT JOIN
[SELECT Avg(Meter.Readi ng) AS Average, StDev(Meter.Rea ding) AS StDev
FROM Meter]. sq
ON m.Reading*1000 <sq.Average
WHERE ((m.Reading-sq.Average)/sq.StDev)>=2
ORDER BY (m.Reading-sq.Average)/sq.StDev

You, of course, would have to modify this for your own situation. I have
suggested that a Score >= 2 would be suspect but your own experience
would be the best guide here.

No, I don't really expect that you will be able to use this, but hope
springs eternal.
Oct 5 '06 #6
Tom van Stiphout <no************ *@cox.netwrote in
news:vf******** *************** *********@4ax.c om:
I think Abs(Score) >= 2 is worth another look.

-Tom.
I think you are right. Abs() is a good idea.

--
Lyle Fairfield
Oct 5 '06 #7
Lyle:

I got the following to produce results, but I need to test it further.

SELECT m.USAGE, (m.USAGE-sq.Average)/sq.StDev AS ZScore
FROM [MONTHLY_METER_U SAGE] AS m LEFT JOIN (SELECT Avg(USAGE) AS Average,
StDev(USAGE) AS StDev
FROM [MONTHLY_METER_U SAGE]
WHERE METER_ID = '000001'
AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#) AS sq
ON m.USAGE <sq.Average
WHERE ((m.USAGE-sq.Average)/sq.StDev)>=2
AND m.METER_ID = '000001'
AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
ORDER BY (m.USAGE-sq.Average)/sq.StDev;


Lyle Fairfield wrote:
>I think Abs(Score) >= 2 is worth another look.

-Tom.

I think you are right. Abs() is a good idea.
--
Message posted via AccessMonster.c om
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 5 '06 #8
rdemyan via AccessMonster.c om wrote:
Lyle:

I got the following to produce results, but I need to test it further.

SELECT m.USAGE, (m.USAGE-sq.Average)/sq.StDev AS ZScore
FROM [MONTHLY_METER_U SAGE] AS m LEFT JOIN (SELECT Avg(USAGE) AS Average,
StDev(USAGE) AS StDev
FROM [MONTHLY_METER_U SAGE]
WHERE METER_ID = '000001'
AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#) AS sq
ON m.USAGE <sq.Average
WHERE ((m.USAGE-sq.Average)/sq.StDev)>=2
AND m.METER_ID = '000001'
AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
ORDER BY (m.USAGE-sq.Average)/sq.StDev;
If you use Tom's revision:
WHERE Abs(((m.USAGE-sq.Average)/sq.StDev))>=2
you will identify scores that are unusually low as well as scores that
are unusually high,

Oct 5 '06 #9
Got this to work nicely but had to add a Having clause because there is the
possibility that StDev can be zero and dividing by zero, of course, leads to
an error.

Lyle Fairfield wrote:
>Lyle:
[quoted text clipped - 11 lines]
>AND USAGE_END_DATE >= #03/01/2005# AND USAGE_END_DATE <= #02/28/2006#
ORDER BY (m.USAGE-sq.Average)/sq.StDev;

If you use Tom's revision:
WHERE Abs(((m.USAGE-sq.Average)/sq.StDev))>=2
you will identify scores that are unusually low as well as scores that
are unusually high,
--
Message posted via http://www.accessmonster.com

Oct 7 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
1780
by: Peter Hansen | last post by:
Greetings. Im trying to write a program that can be run from the command line. If I want to search for example after a file with the ending .pdf, I should be able to write in the command line: python name of my program / the libary to search and what kind of file it is example a .pdf file So if my program name was test.py and the library name was library1 and the test type i wanted to find was, a .pdf file I should write python...
2
6310
by: davidw | last post by:
I want to check chose performance related parameters in my code. Thanks!
108
6459
by: Bryan Olson | last post by:
The Python slice type has one method 'indices', and reportedly: This method takes a single integer argument /length/ and computes information about the extended slice that the slice object would describe if applied to a sequence of length items. It returns a tuple of three integers; respectively these are the /start/ and /stop/ indices and the /step/ or stride length of the slice. Missing or out-of-bounds indices are handled in a manner...
50
4210
by: sabarish | last post by:
Hi to all. find out the biggest among two numbers without using any conditional statements and any relational operators.
7
1941
by: Nadav | last post by:
Hi, 1. I am writing some kind of a CLI Linker 2. I am using the unmanaged meta-data API. I wonder browsing "cor.h" I have encountered the 'ICeeGen' this interface is retrieved by the 'ICorModule' interface, I didn't find any API that provide me the means to get an 'ICorModule' interface, how should I get this interface? What object should i create? what method should I call? Any comments remarks or pointers will be appreciated. --...
23
14062
by: thebjorn | last post by:
For the purpose of finding someone's age I was looking for a way to find how the difference in years between two dates, so I could do something like: age = (date.today() - born).year but that didn't work (the timedelta class doesn't have a year accessor). I looked in the docs and the cookbook, but I couldn't find anything, so
4
13041
by: isha123 | last post by:
how can i find the cpu usage of my windows XP SP2 machine from command line? And also how can i execute this command with Perl? or is there anyother way to find CPU usage for XP machine using Perl?
3
5926
by: dittytwo | last post by:
Hi all I have been looking around the web and can't seem to find a solution the solution that i have found and manipulated seems to bring back the whole list of currently running process's which hogs a lot of CPU time in its self.. basic idea of my program is i have multiple threads doing things and i have a thread that gathers data within this thread i can currently get all sorts of information using ...
58
4931
by: sh.vipin | last post by:
is there any way to find out number of bytes freed on a particular free() call in C
0
9454
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10259
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10101
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10038
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8933
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7456
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5482
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3609
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2849
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.