numberwhun 3,509
Recognized Expert Moderator Specialist
Hello everyone!
I have a data file that contains miscellaneous information on each line. Unfortunately, I cannot go into detail about the file layout because it is sensitive, but I can say that each line has dates in multiple positions. In some places a line contains multiple dates run together, with no separation between the fields (i.e., no spaces):
ie: 05/26/200706/03/200707/24/2007
As you can see, there are 3 dates above. Other lines have the same kind of consecutive date fields, but with one or more spaces between the dates:
ie: 05/26/2007 06/03/2007 07/24/2007
This is where it gets a little hairy. In some files, there is an arbitrary string of dates. The string of dates would look as follows:
ie: 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007
There may be 2, 5, 7, or more dates, all strung together, but they are all the same date through the string on that line.
I am trying to write a regex that will match this arbitrary string of identical dates, but unfortunately it matches any string of multiple dates. Here is what I have so far:
if ($line =~ m/(\d+\/\d+\/\d+\s*){2,}/)
{
    print("Line Number $. ==> $line \n");
}
Can anyone please tell me how I can match this string format:
07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007
and not any other, whether there are two dates or a dozen or so?
Regards,
Jeff
miller 1,089
Recognized Expert Top Contributor
while (<DATA>) {
    if (m{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}}) {
        print $_;
    }
}

__DATA__
05/26/200706/03/200707/24/2007
05/26/2007 06/03/2007 07/24/2007
07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007
- Miller
numberwhun 3,509
Recognized Expert Moderator Specialist
Miller,
WOW!! I have been at this all day and my brain is sore. Thank you! Thank you!
Now, any chance of you telling me which part of that regex says that the dates have to be identical?
Regards,
Jeff
numberwhun 3,509
Recognized Expert Moderator Specialist
Ok, ran this against some data I put together really quickly and I think there may still be an issue.
The dates aren't the only thing on each line. There is text before and after the dates. They are just some of the fields. I say fields as the file being examined is kind of like a flat file db where each line is a record.
The test data I just ran against is:
### Begin data###
This is a line of text.
This is another line of text.
This is the thirs line of text. 07/24/2007 07/24/2007 07/24/2007 07/24/2007
07/24/2007 07/24/2007 07/24/2007 07/24/2007
This is the fifth line.
07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007
### End Data ###
Now, when I ran the regex against this, it printed:
Line Number 6 ==> 07/24/2007 07/24/2007 07/24/2007 07/24/2007
Problem is, the data printed is from line #4.
Regards,
Jeff
miller 1,089
Recognized Expert Top Contributor
Now, any chance of you telling me which part of that regex says that the dates have to be identical?
No problem at all.
while (<DATA>) {
    if (m{
        (\d{2}/\d{2}/\d{4})  # Capture date of the format MM/DD/YYYY or DD/MM/YYYY
        (?:                  # Non-capturing group
            \s*              # Arbitrary spacing
            \1               # Repetition of the \1 date
        ){2,}                # 2 or more copies of group. Implies 3 or more repetitions of \1
    }xms) {
        print $_;
    }
}

__DATA__
05/26/200706/03/200707/24/2007
05/26/2007 06/03/2007 07/24/2007
07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007 07/24/2007
- Miller
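A side note on the {2,} quantifier: the pattern above needs three or more identical dates in a row, while the original question also mentions runs of just two. The following is a minimal sketch of a variant that accepts two or more, assuming that is what is wanted. The helper name repeated_date and the sample lines are illustrative, not part of the code posted in this thread.

```perl
use strict;
use warnings;

# Assumed variant of the pattern above: "+" means one or more extra copies,
# so a run of two identical dates is already enough to match.
my $same_date_run = qr{
    (\d{2}/\d{2}/\d{4})   # capture one date into group 1
    (?: \s* \1 )+         # then the exact same text again, one or more times
}x;

# Hypothetical helper: return the repeated date, or undef when the line
# contains no run of identical dates.
sub repeated_date {
    my ($line) = @_;
    return $line =~ $same_date_run ? $1 : undef;
}

for my $line (
    '05/26/200706/03/200707/24/2007',
    '05/26/2007 06/03/2007 07/24/2007',
    '07/24/2007 07/24/2007',
    'Some text 07/24/2007 07/24/2007 07/24/2007 more text',
) {
    my $date = repeated_date($line);
    printf "%-55s => %s\n", $line, defined $date ? "repeats $date" : 'no match';
}
```

The backreference \1 is what forces the dates to be identical: it matches only the exact text that group 1 captured, not any date-shaped string. The pattern is not anchored, so a line that holds other dates as well as a run of identical ones would still match.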
miller 1,089
Recognized Expert Top Contributor
Now, when I ran the regex against this, it printed:
Line Number 6 ==> 07/24/2007 07/24/2007 07/24/2007 07/24/2007
Problem is, the data printed is from line #4.
Can't really help without knowing how you're calculating the line numbers of your data.
while (<DATA>) {
    if (m{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}}xms) {
        print "Line number $. => $_";
    }
}

__DATA__
This is a line of text.
This is another line of text.
This is the thirs line of text. 07/24/2007 07/24/2007 07/24/2007
07/24/2007 07/24/2007 07/24/2007 07/24/2007
This is the fifth line.
07/24/2007 07/24/2007 07/24/2007
Outputs:
>perl scratch.pl
Line number 3 => This is the thirs line of text. 07/24/2007 07/24/2007 07/24/2007
Line number 4 => 07/24/2007 07/24/2007 07/24/2007 07/24/2007
Line number 6 => 07/24/2007 07/24/2007 07/24/2007
- Miller
numberwhun 3,509
Recognized Expert Moderator Specialist
Believe it or not, I am letting Perl do that for me with the $. variable, which is supposed to contain the line number of the most recently read string of data.
Actually, in looking at what you did, I see you added an "xms" as options to the regex. I didn't think of that, but that seemed to work somewhat. Now it outputs the 3 lines, he he he, but all with "Line Number 6". I am so disturbed right now. I have to jet home shortly, but will post my findings later. Thanks for your help Miller!!!
Regards,
Jeff
numberwhun 3,509
Recognized Expert Moderator Specialist
I changed something. I was cycling through the file using a foreach loop:
foreach my $line (<FILE>)
{
    # code from above
}
On a hunch, I changed from the foreach to a while loop and it seems to work fine. Now I am curious about why the foreach behaves so strangely. I know I have seen inheritance issues with foreach, and that has caused some problems, but with line numbers? That's just odd. If you have any thoughts, I would be open to them.
Thanks for all your assistance!
Regards,
Jeff
miller 1,089
Recognized Expert Top Contributor
The xms options shouldn't affect your line numbering. I simply carried them over from my commented version of the regex.
My guess is that you're using a foreach as your file iterator instead of a while loop. A foreach evaluates <FILE> in list context, so the entire file is slurped before the first iteration runs, and the $. variable is therefore stuck at the last line number for the whole loop.
Always use while loops when looping over the lines of a file. If you need filtering, use next inside the while loop.
- Miller
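The difference can be reproduced with a small sketch. It reads from an in-memory filehandle so that it is self-contained; the sample text and the sub names line_numbers_while and line_numbers_foreach are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Sample data (assumed for this sketch); lines 2 and 4 hold repeated dates.
my $data = <<'END';
This is a line of text.
text 07/24/2007 07/24/2007 07/24/2007
This is another line.
07/24/2007 07/24/2007 07/24/2007 07/24/2007
END

my $re = qr{(\d{2}/\d{2}/\d{4})(?:\s*\1){2,}};

# while reads one line per iteration, so $. is the number of the current line.
sub line_numbers_while {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my @numbers;
    while (my $line = <$fh>) {
        next unless $line =~ $re;    # filtering with next, as suggested above
        push @numbers, $.;
    }
    close $fh;
    return @numbers;
}

# foreach evaluates <$fh> in list context: every line is read before the
# loop body runs, so $. already holds the last line number.
sub line_numbers_foreach {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my @numbers;
    foreach my $line (<$fh>) {
        next unless $line =~ $re;
        push @numbers, $.;
    }
    close $fh;
    return @numbers;
}

print "while:   @{[ line_numbers_while() ]}\n";      # prints "while:   2 4"
print "foreach: @{[ line_numbers_foreach() ]}\n";    # prints "foreach: 4 4"
```

Both loops find the same two lines. Only the value of $. differs, which matches the "Line Number 6" output reported for every match earlier in the thread.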
miller 1,089
Recognized Expert Top Contributor
I changed something. I was cycling through the file using a foreach loop:
Exactly. :) See explanation above.
- M