Matching line with 3 or more repetitions of the same date

numberwhun
3,509 Recognized Expert Moderator Specialist
Hello everyone!

I have a data file that contains miscellaneous information on each line. Unfortunately, I cannot go into detail about the file layout because it is sensitive information, but I can say that each line has dates in multiple positions. Some areas of a line contain multiple dates strung together, with no separation (i.e., no spaces) between the fields:

e.g.: 05/26/200706/03/200707/24/2007

As you can see, there are 3 dates above. Other lines have the same kind of consecutive date fields, only with one or more spaces between the dates:

e.g.: 05/26/2007 06/03/2007 07/24/2007

This is where it gets a little hairy. In some files, there is an arbitrary string of dates. The string of dates would look as follows:

e.g.: 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007

There may be 2, 5, 7, or more dates, all strung together, but they are all the same date throughout the string on that line.

I am trying to write a regex that will match this arbitrary string of identical dates, but unfortunately, it matches any string of multiple dates. Here is what I have so far:

    if ($line =~ m/(\d+\/\d+\/\d+\s*){2,}/)
    {
        print("Line Number $. ==>  $line \n");
    }

Can anyone please tell me how I can match this string format:

07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007

and not any other, whether there are two dates or a dozen or so?

Regards,

Jeff
Jul 25 '07 #1
miller
1,089 Recognized Expert Top Contributor
    while (<DATA>) {
        if (m{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}}) {
            print $_;
        }
    }

    __DATA__
    05/26/200706/03/200707/24/2007
    05/26/2007 06/03/2007 07/24/2007
    07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007

- Miller
Jul 25 '07 #2
numberwhun
3,509 Recognized Expert Moderator Specialist
Miller,

WOW!! I have been at this all day and my brain is sore. Thank you! Thank you!

Now, any chance of you telling me which part of that regex says that the dates have to be identical?

Regards,

Jeff
Jul 25 '07 #3
numberwhun
3,509 Recognized Expert Moderator Specialist
Ok, ran this against some data I put together really quickly and I think there may still be an issue.

The dates aren't the only thing on each line. There is text before and after the dates. They are just some of the fields. I say fields as the file being examined is kind of like a flat file db where each line is a record.

The test data I just ran against is:

### Begin data###

This is a line of text.
This is another line of text.
This is the thirs line of text. 07/24/2007 07/24/2007 07/24/2007 07/24/2007
07/24/2007 07/24/2007 07/24/2007 07/24/2007
This is the fifth line.
07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007

### End Data ###

Now, when I ran the regex against this, it printed:

Line Number 6 ==> 07/24/2007 07/24/2007 07/24/2007 07/24/2007

Problem is, the data printed is from line #4.

Regards,

Jeff
Jul 25 '07 #4
miller
1,089 Recognized Expert Top Contributor
numberwhun wrote: "Now, any chance of you telling me which part of that regex says that the dates have to be identical?"

No problem at all.

    while (<DATA>) {
        if (m{
            (\d{2}/\d{2}/\d{4})   # Capture date of the format MM/DD/YYYY or DD/MM/YYYY
            (?:                   # Non-capturing group
                \s*               # Arbitrary Spacing
                \1                # Repetition of \1 date
            ){2,}                 # 2 or more copies of group.  Implies 3 or more repetitions of \1
        }xms) {
            print $_;
        }
    }

    __DATA__
    05/26/200706/03/200707/24/2007
    05/26/2007 06/03/2007 07/24/2007
    07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007

- Miller
Jul 25 '07 #5
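
To make the role of the backreference concrete, here is a minimal sketch that runs the pattern from this post next to the pattern from the original question. The helper names has_repeated_date and has_any_dates, and the $min parameter, are hypothetical and introduced only for illustration. Note that {2,} demands at least three dates in total, so a line holding the same date only twice needs a smaller minimum.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 if $line contains at
# least $min consecutive copies of the same date, 0 otherwise.
sub has_repeated_date {
    my ($line, $min) = @_;
    my $extra = $min - 1;    # copies required after the first, captured date
    return $line =~ m{(\d{2}/\d{2}/\d{4})(?:\s*\1){$extra,}} ? 1 : 0;
}

# The pattern from the original question: any run of two or more dates,
# whether or not they are identical.
sub has_any_dates {
    my ($line) = @_;
    return $line =~ m/(\d+\/\d+\/\d+\s*){2,}/ ? 1 : 0;
}

my @samples = (
    '05/26/2007 06/03/2007 07/24/2007',    # three different dates
    '07/24/2007 07/24/2007',               # one date, twice
    '07/24/200707/24/2007 07/24/2007',     # one date, three times
);

for my $line (@samples) {
    printf "%-34s any=%d min3=%d min2=%d\n",
        $line,
        has_any_dates($line),
        has_repeated_date($line, 3),
        has_repeated_date($line, 2);
}
```

The original pattern accepts all three samples because nothing ties one date to the next. With \1, the line of three different dates is rejected, and the two-date line is accepted only when the minimum is lowered to 2.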
miller
1,089 Recognized Expert Top Contributor
numberwhun wrote: "Now, when I ran the regex against this, it printed: Line Number 6 ==> 07/24/2007 07/24/2007 07/24/2007 07/24/2007. Problem is, the data printed is from line #4."

I can't really help without knowing how you're calculating the line numbers of your data.

    while (<DATA>) {
        if (m{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}}xms) {
            print "Line number $. => $_";
        }
    }

    __DATA__
    This is a line of text.
    This is another line of text.
    This is the thirs line of text. 07/24/2007 07/24/2007 07/24/2007
    07/24/2007 07/24/2007 07/24/2007 07/24/2007
    This is the fifth line.
    07/24/2007 07/24/2007 07/24/2007

Outputs:

    >perl scratch.pl
    Line number 3 => This is the thirs line of text. 07/24/2007 07/24/2007 07/24/2007
    Line number 4 => 07/24/2007 07/24/2007 07/24/2007 07/24/2007
    Line number 6 => 07/24/2007 07/24/2007 07/24/2007

- Miller
Jul 25 '07 #6
numberwhun
3,509 Recognized Expert Moderator Specialist
Believe it or not, I am letting Perl do that for me with the $. variable, which is supposed to contain the line number of the most recently read string of data.

Actually, in looking at what you did, I see you added an "xms" as options to the regex. I didn't think of that, but that seemed to work somewhat. Now it outputs the 3 lines, he he he, but all with "Line Number 6". I am so disturbed right now. I have to jet home shortly, but will post my findings later. Thanks for your help Miller!!!

Regards,

Jeff
Jul 25 '07 #7
numberwhun
3,509 Recognized Expert Moderator Specialist
I changed something. I was cycling through the file using a foreach loop:

    foreach my $line (<FILE>)
    {
        # code from above
    }

On a hunch, I changed from the foreach to a while loop and it seems to work fine. Now I am curious as to why the foreach behaves so strangely. I know that I have seen inheritance issues with a foreach and that has caused some problems, but with line numbers? That's just odd. If you have any thoughts, I would be open to them.

Thanks for all your assistance!

Regards,

Jeff
Jul 25 '07 #8
miller
1,089 Recognized Expert Top Contributor
The xms options shouldn't affect your line numbering. I simply carried them over from my commented version of the regex.

My guess is that you're using a foreach as your file iterator instead of a while loop. A foreach evaluates <FILE> in list context, so the entire file is slurped before the first iteration, and the $. variable is therefore stuck at the last line number for every pass through the loop.

Always use while loops for line-by-line file processing. If you need filtering, use the next statement inside the while loop.

- Miller
Jul 25 '07 #9
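
The difference described above can be reproduced with a small sketch. It assumes an in-memory filehandle as a stand-in for the real file, and the sub names numbers_with_while and numbers_with_foreach are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

my $data = "first\nsecond\nthird\n";    # stand-in for the real data file

# Reads one line per iteration, so $. advances together with the loop.
sub numbers_with_while {
    open my $fh, '<', \$data or die "open failed: $!";
    my @seen;
    while (my $line = <$fh>) {
        chomp $line;
        push @seen, sprintf '%d:%s', $., $line;
    }
    close $fh;
    return @seen;
}

# <$fh> is evaluated in list context: every line is read before the loop
# body runs for the first time, so $. already holds the last line number.
sub numbers_with_foreach {
    open my $fh, '<', \$data or die "open failed: $!";
    my @seen;
    foreach my $line (<$fh>) {
        chomp $line;
        push @seen, sprintf '%d:%s', $., $line;
    }
    close $fh;
    return @seen;
}

print "while:   @{[ numbers_with_while() ]}\n";
print "foreach: @{[ numbers_with_foreach() ]}\n";
```

The while version should report 1, 2 and 3, while the foreach version reports 3 for every line, which mirrors the "Line Number 6" symptom on a six-line file.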
miller
1,089 Recognized Expert Top Contributor
numberwhun wrote: "I changed something. I was cycling through the file using a foreach loop:"

Exactly. :) See explanation above.

- M
Jul 25 '07 #10
numberwhun
3,509 Recognized Expert Moderator Specialist
I have said it before and I will say it again. I learn something new every day being a Perl coder, and I LOVE IT!!

Thanks again Miller!!

Regards,

Jeff
Jul 25 '07 #11
KevinADC
4,059 Recognized Expert Specialist
I didn't read the entire thread so I may have missed something, but this regexp looks like it needs the string anchors ^ and $ so that it matches the repeated date, and only the repeated date, from start to finish of the line:

    if (m{^(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}$}) {

Otherwise it is only matching a substring, and a line with a different date at the beginning or end will still return true.
Jul 26 '07 #12
miller
1,089 Recognized Expert Top Contributor
Yes, if you read the entire thread, you'll see that anchors are not desired. He only wishes to match a substring with the given properties.

- Miller
Jul 26 '07 #13
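
As a rough illustration of the trade-off in the last two posts, the sketch below applies both variants to three lines. The sub name classify and the third sample line are assumptions made for this example; the first two lines come from the test data earlier in the thread.

```perl
use strict;
use warnings;

# Both patterns are taken from the thread; only the anchors differ.
my $unanchored = qr{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}};
my $anchored   = qr{^(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}$};

# Hypothetical helper: returns (unanchored_result, anchored_result) as 1 or 0.
sub classify {
    my ($line) = @_;
    return ( $line =~ $unanchored ? 1 : 0, $line =~ $anchored ? 1 : 0 );
}

my @samples = (
    'This is the thirs line of text. 07/24/2007 07/24/2007 07/24/2007',
    '07/24/2007 07/24/2007 07/24/2007 07/24/2007',
    '05/26/2007 07/24/2007 07/24/2007 07/24/2007',    # different leading date
);

for my $line (@samples) {
    my ($sub, $whole) = classify($line);
    print "unanchored=$sub anchored=$whole  $line\n";
}
```

The unanchored pattern accepts all three lines, including the one with surrounding text that the original poster needs to find. The anchored pattern accepts only the line that consists of nothing but the repeated date.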
KevinADC
4,059 Recognized Expert Specialist
miller wrote: "Yes, if you read the entire thread, you'll see that anchors are not desired. He only wishes to match a substring with the given properties."

OK, gotcha. :)

......
Jul 26 '07 #14
