Bytes | Software Development & Data Engineering Community

Matching line with 3 or more repetitions of the same date

numberwhun
3,509 Recognized Expert Moderator Specialist
Hello everyone!

I have a data file that contains miscellaneous information on each line. (Unfortunately, I cannot go into detail about the file layout, as it is sensitive information.) I can say, however, that each line contains dates in multiple positions. In some places several dates are strung together because there is no separation between the fields (i.e., no spaces):

e.g.: 05/26/200706/03/200707/24/2007

As you can see, there are 3 dates above. In other places the date fields follow each other in the same way, but with one or more spaces between them:

e.g.: 05/26/2007 06/03/2007 07/24/2007

This is where it gets a little hairy. In some files, there is a string of dates of arbitrary length. It looks like this:

e.g.: 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007

There may be 2, 5, 7, or more dates strung together, but within a given line they are all the same date.

I am trying to write a regex that will match this arbitrary string of identical dates, but unfortunately, my attempt matches any string of multiple dates. Here is what I have so far:

    if ($line =~ m/(\d+\/\d+\/\d+\s*){2,}/)
    {
        print("Line Number $. ==>  $line \n");
    }
Can anyone please tell me how I can match this string format:

07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007

and not any other, whether there are two dates or a dozen or so?
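Why the attempt above matches any run of dates: the {2,} quantifier repeats the pattern inside the group, not the text the group captured, so each repetition is free to match a different date. A minimal sketch illustrating this (the sub name loose_match and the sample strings are illustrative only, not from the original data):

```perl
use strict;
use warnings;

# The pattern from the question. {2,} re-applies \d+/\d+/\d+\s* on each
# repetition, so any two or more consecutive dates satisfy it.
sub loose_match {
    my ($line) = @_;
    return $line =~ m/(\d+\/\d+\/\d+\s*){2,}/ ? 1 : 0;
}

print loose_match('05/26/2007 06/03/2007 07/24/2007'), "\n";   # 1 (dates differ, still matches)
print loose_match('07/24/2007 07/24/2007 07/24/2007'), "\n";   # 1
print loose_match('This is a line of text.'), "\n";            # 0
```

Nothing in the group ties one repetition to the previous one; that link has to come from a backreference.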

Regards,

Jeff
Jul 25 '07 #1
miller
1,089 Recognized Expert Top Contributor
    while (<DATA>) {
        if (m{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}}) {
            print $_;
        }
    }

    __DATA__
    05/26/200706/03/200707/24/2007
    05/26/2007 06/03/2007 07/24/2007
    07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007
- Miller
Jul 25 '07 #2
numberwhun
3,509 Recognized Expert Moderator Specialist
Miller,

WOW!! I have been at this all day and my brain is sore. Thank you! Thank you!

Now, any chance of you telling me which part of that regex says that the dates have to be identical?

Regards,

Jeff
Jul 25 '07 #3
numberwhun
3,509 Recognized Expert Moderator Specialist
OK, I ran this against some test data I put together quickly, and I think there may still be an issue.

The dates aren't the only thing on each line. There is text before and after the dates; they are just some of the fields. I say "fields" because the file being examined is essentially a flat-file database in which each line is a record.

The test data I just ran against is:

### Begin data###

This is a line of text.
This is another line of text.
This is the third line of text. 07/24/2007 07/24/2007 07/24/2007 07/24/2007
07/24/2007 07/24/2007 07/24/2007 07/24/2007
This is the fifth line.
07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007

### End Data ###

Now, when I ran the regex against this, it printed:

Line Number 6 ==> 07/24/2007 07/24/2007 07/24/2007 07/24/2007

Problem is, the data printed is from line #4.

Regards,

Jeff
Jul 25 '07 #4
miller
1,089 Recognized Expert Top Contributor
> Now, any chance of you telling me which part of that regex says that the dates have to be identical?

No problem at all.

    while (<DATA>) {
        if (m{
            (\d{2}/\d{2}/\d{4})   # Capture a date of the format MM/DD/YYYY or DD/MM/YYYY
            (?:                   # Non-capturing group
                \s*               # Optional whitespace (zero or more)
                \1                # The exact date text captured by group 1
            ){2,}                 # 2 or more copies of the group, i.e. 3 or more occurrences of the date
        }xms) {
            print $_;
        }
    }

    __DATA__
    05/26/200706/03/200707/24/2007
    05/26/2007 06/03/2007 07/24/2007
    07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007
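The key piece is the backreference \1: it matches the exact text captured by the first group, not just any date-shaped string. Because \s* allows zero spaces, the same pattern also covers dates run together with no separator. A small sketch comparing the thread's pattern with a {1,} variant (an illustration added here, not from the thread) that would accept two or more identical dates instead of three:

```perl
use strict;
use warnings;

# Miller's pattern: one MM/DD/YYYY date captured in group 1, then at least
# two more copies of that exact text (\1), each preceded by optional whitespace.
my $three_or_more = qr{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}};

# Illustrative variant: {1,} requires only one more copy, i.e. 2+ in total.
my $two_or_more = qr{(\d{2}/\d{2}/\d{4})(?:\s*\1){1,}};

for my $line (
    '05/26/2007 06/03/2007 07/24/2007',   # three different dates
    '07/24/2007 07/24/2007',              # two identical dates
    '07/24/200707/24/200707/24/2007',     # three identical, no spaces
) {
    my $three = $line =~ $three_or_more ? 'yes' : 'no';
    my $two   = $line =~ $two_or_more   ? 'yes' : 'no';
    print "$line  3+: $three  2+: $two\n";
}
```

Run as-is, this should report "no/no" for the mixed dates, "no/yes" for the pair, and "yes/yes" for the run-together triple.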
- Miller
Jul 25 '07 #5
miller
1,089 Recognized Expert Top Contributor
> Now, when I ran the regex against this, it printed:
> Line Number 6 ==> 07/24/2007 07/24/2007 07/24/2007 07/24/2007
> Problem is, the data printed is from line #4.

I can't really help without knowing how you're calculating the line numbers of your data.

    while (<DATA>) {
        if (m{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}}xms) {
            print "Line number $. => $_";
        }
    }

    __DATA__
    This is a line of text.
    This is another line of text.
    This is the third line of text. 07/24/2007 07/24/2007 07/24/2007
    07/24/2007 07/24/2007 07/24/2007 07/24/2007
    This is the fifth line.
    07/24/2007 07/24/2007 07/24/2007

Outputs:

    >perl scratch.pl
    Line number 3 => This is the third line of text. 07/24/2007 07/24/2007 07/24/2007
    Line number 4 => 07/24/2007 07/24/2007 07/24/2007 07/24/2007
    Line number 6 => 07/24/2007 07/24/2007 07/24/2007
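Because the pattern is not anchored, it finds the run of identical dates anywhere in a record, with text before or after it. If you also want to know which date repeated and how often, one possible approach (a sketch; the sub name repeated_date_run is hypothetical) wraps the whole run in an outer group so it can be inspected afterwards. Note that the backreference becomes \2, since the date is now the second capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, count) for the first run of three or
# more identical MM/DD/YYYY dates in a record, or an empty list if none.
sub repeated_date_run {
    my ($record) = @_;
    # Group 1 = the whole run, group 2 = the repeated date.
    return unless $record =~ m{((\d{2}/\d{2}/\d{4})(?:\s*\2){2,})};
    my ($run, $date) = ($1, $2);
    # Count the date's occurrences inside the matched run only.
    my $count = () = $run =~ m{\Q$date\E}g;
    return ($date, $count);
}

my ($date, $count) = repeated_date_run(
    'This is the third line of text. 07/24/2007 07/24/2007 07/24/2007 trailing text');
print "$date x $count\n";   # 07/24/2007 x 3
```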
- Miller
Jul 25 '07 #6
numberwhun
3,509 Recognized Expert Moderator Specialist
Believe it or not, I am letting Perl do that for me with the $. variable, which is supposed to contain the current line number of the most recently read filehandle.

Actually, looking at what you did, I see you added the "xms" modifiers to the regex. I hadn't thought of that, and it seemed to help somewhat: now it outputs the 3 lines, he he he, but all with "Line Number 6". I am so disturbed right now. I have to head home shortly, but will post my findings later. Thanks for your help, Miller!!!

Regards,

Jeff
Jul 25 '07 #7
numberwhun
3,509 Recognized Expert Moderator Specialist
I changed something. I was cycling through the file using a foreach loop:

    foreach my $line (<FILE>)
    {
        if ($line =~ m{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}}xms)
        {
            print("Line Number $. ==>  $line \n");
        }
    }
  5.  
On a hunch, I switched from the foreach to a while loop, and it seems to work fine now. I am curious why the foreach behaves so strangely, though. I know I have seen inheritance issues with foreach cause problems before, but with line numbers? That's just odd. If you have any thoughts, I would be open to them.

Thanks for all your assistance!

Regards,

Jeff
Jul 25 '07 #8
miller
1,089 Recognized Expert Top Contributor
The xms modifiers shouldn't affect your line numbering. I simply carried them over from my commented version of the regex, where the x modifier is what allows the whitespace and comments.

My guess is that you're using a foreach as your file iterator instead of a while loop. A foreach evaluates <FILE> in list context, which slurps the entire file before the loop body runs even once, so the $. variable is stuck at the last line number for every iteration.
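To see the difference in isolation, here is a minimal sketch that reads the same three lines both ways and records $. on each iteration. It uses an in-memory filehandle (open on a scalar reference) purely so the example is self-contained; the data is made up.

```perl
use strict;
use warnings;

# Stand-in for the real data file (made-up contents), opened as an
# in-memory filehandle so the example is self-contained.
my $data = "first\nsecond\nthird\n";

# foreach: <$fh> is evaluated in list context, so every line is read
# before the loop body runs; $. already holds the last line number.
open my $fh, '<', \$data or die "open: $!";
my @foreach_numbers;
foreach my $line (<$fh>) {
    push @foreach_numbers, $.;
}
close $fh;

# while: <$fh> is evaluated in scalar context, one line per iteration,
# so $. tracks the line just read.
open $fh, '<', \$data or die "open: $!";
my @while_numbers;
while (my $line = <$fh>) {
    push @while_numbers, $.;
}
close $fh;

print "foreach: @foreach_numbers\n";   # foreach: 3 3 3
print "while:   @while_numbers\n";     # while:   1 2 3
```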

Always use a while loop for line-by-line file processing. If you need filtering, use next inside the while loop.
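A minimal sketch of that pattern, using next to skip records that don't contain a run of three or more identical dates. As before, the in-memory filehandle and the sample records are stand-ins for the real file:

```perl
use strict;
use warnings;

# Made-up sample records; in practice this would be the data file.
my $data = <<'END';
A record with no dates.
Record text 07/24/2007 07/24/2007 07/24/2007 more text
Another record 05/26/2007 06/03/2007 07/24/2007
END

open my $fh, '<', \$data or die "open: $!";
my @hits;
while (my $line = <$fh>) {
    # Skip records without three or more identical dates in a row.
    next unless $line =~ m{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}};
    push @hits, $.;
    print "Line number $. => $line";
}
close $fh;
```

Only the second record should be printed, reported as line number 2.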

- Miller
Jul 25 '07 #9
miller
1,089 Recognized Expert Top Contributor
> I changed something. I was cycling through the file using a foreach loop:

Exactly. :) See the explanation above.

- M
Jul 25 '07 #10
