Okay, I have a table with the following fields (amongst others):
[ID] (an autonumber)
[Date] (YYYY-MM-DD HH:MM:SS)
[Title] (255 character text)
[xcount] (Numeric)
What I want to do is delete the duplicate posts. They are duplicate ONLY if [title] is the same, and the [Date] is within 3 days of each other. I want to delete the lower numbered posts, or those with the lowest [xcount].
Anyone have a good idea for me?
Thanks!
TIger
7 2120
I was just going to give you a suggestion of how you might go about this, but then I realize there is a specific problem in your problem specification.
You are saying that two rows are "duplicate" if the ID is the same and the date field is within 3 days of each other. I have a problem deciding when to decide that rows are duplicates.
Take this case: Let us say you have 10 rows in your table with the same title, and with different values in the date field, but where these values are 2 days apart from each other. In other words, the total date range here spans about 20 days. Now, which rows can be considered duplicates to which other rows?
So your basic problem is not really well defined.
Sorry.
I get about 500 records per day. These are press releases.
Sometimes, I have a person who has family members in 2 seperate states, so I get 2 news releases. I only want to keep one. I don't care WHICH one. Those would be duplicates.
Sometimes, the news agency has an issue and re-releases the same news 2 days in a row. Those would be duplicates.
Sometimes "Mike Smith" graduates school, then 2 months later he gets promoted, those would not be duplicates, but would have the same title.
Here's an example. I would like the system to look at this and come up with 2 posts, one for "arrived for duty" and one for "graduated BMT": http://www.bmtgrad.com/HOMETOWNNEWS/...ip+D.+Smith%22
[xcount] is if they have any comments. If someone has commented, I would like to keep THAT post.
That make more sense?
There is still the conceptual problem of knowing when to delete what, since even as you describe it, you can still have a long series of rows in your table that have adjacent entries with dates within 3 days of each other, making it difficult to know what is a duplicate or not.
I think what you might better want to do is to not think in terms of "deleting" what you consider to be duplicate, but to instead refuse to "insert" a new row if it can be considered a duplicate. This would remove the problem.
Let us say a new entry is to be added and it has the same title to another entry that was made 2 days ago, and does not include comments or whatever that would make it a non-duplicate. Then you simply do not do the insert. It is never entered and does not need to be deleted. Then 2 days later, another entry is to be made with the same title. But now a simple query finds that the latest entry with the same title is now from 4 days ago, and so you decide this is not a duplicate, and you do the insert.
Maybe this helps? It sometimes is easier to "not insert in the first place" than to decide what to "delete afterwards".
I appreciate the help on this. I'm down to only about 2,000 duplicate posts by deleting a few a day! :-P
The problem on this solution is that I use an automated script to make the [TITLE] Field. All records are inserted with the same title. [NEWS RELEASE], as that is how they arrive via email.
I then run a script which renames them based upon the first several words of the release. "Mr John B Smith has graduated from Johnson College", "Mr Micheal Jones has been promoted to senior sales engineer" My script goes through and finds everything after "Mr, Mrs or Ms" and takes the words up to "Has" so the title would be "John B Smith", "Michael Jones", etc.
Arrgh, If my head wasn't shaven, I'd pull my hair out.
@r035198x
That does help a lot, but how would I write in the part about them being posted within 72 hours.
Tony
First transform the problem to that of selecting records such that their date is within three days of another record. You then run the delete against that result set.
Sign in to post your reply or Sign up for a free account.
Similar topics
by: Iain |
last post by:
Hi
I have inherited a web app with the following table structure, and need to
produce a table without any duplicates. Email seems like the best unique
identifier - so only one of each e-mail...
|
by: Rubinho |
last post by:
I've a list with duplicate members and I need to make each entry
unique.
I've come up with two ways of doing it and I'd like some input on what
would be considered more pythonic (or at least...
|
by: M B HONG 20 |
last post by:
Hi all -
I was wondering if Javascript has a way to easily remove duplicates
from a string. For example, if I had a string:
"car truck car truck truck tree post post tree"
it should turn...
|
by: Drew |
last post by:
I have a permission tracking app that I am working on, and I have made the
insert page for it. I am having issues on how to prevent duplicates from
getting entered.
Currently the interface for...
|
by: makthar |
last post by:
In your query use DISTINCT
SELECT DISTINCT CITY FROM <tablename> WHERE STATE='<state
name>'. This will bring only one of each city from the
table.
>-----Original Message-----
>I'm getting a...
|
by: tyrfboard |
last post by:
I've been searching for awhile now on how to remove duplicates from a
table within an Access db and have found plenty of articles on finding
or deleting duplicates. All I want to do is remove them...
|
by: Niyazi |
last post by:
Hi all,
What is fastest way removing duplicated value from string array using vb.net?
Here is what currently I am doing but the the array contains over 16000
items. And it just do it in 10 or...
|
by: sjlung |
last post by:
I apologise if this is a trivial question but I have appended three
tables in access and within this table, there are duplicate entries. I
have tried to set my reference number for this table to be...
|
by: vsgdp |
last post by:
I have a container of pointers. It is possible for two pointers to point to
the same element. I want to remove duplicates.
I am open to which container is best for this.
I thought of using...
|
by: Mokita |
last post by:
Hello,
I am working with Taverna to build a workflow. Taverna has a beanshell where I can program in java. I am having some problems in writing a script, where I want to eliminate the duplicates...
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: BarryA |
last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers,...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
| |