More regex questions: how to avoid capturing leading empty lines

GS
How can one avoid capturing leading empty or blank lines?

The data I deal with looks like this:

"will be paid on the dates you specified.

xyz supplier [123445797891]
amount: $100.52 when: September 07, 2007 reference #: 0415
from: operating account [236424735]
abc, Jane'S CHOICE [0089456881545]
amount: $487.61 when: September 08, 2007 reference #: 0416
from: finess [0236454514]

"
The RegexOptions used are:
Multiline, ExplicitCapture, IgnoreCase, Singleline (dotall), IgnorePatternWhitespace
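For reference, here is a small sketch of what two of these options change. Singleline lets "." cross line breaks, and IgnorePatternWhitespace turns an unescaped "#" into the start of a comment, which is why the pattern below has to write it as "\#". The OptionDemo class and its methods are hypothetical names used only for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class illustrating two of the options used in the question.
public static class OptionDemo
{
    // Singleline ("dotall"): '.' also matches '\n'; without it, '.' stops at line breaks.
    public static bool DotMatchesNewline(RegexOptions options) =>
        Regex.IsMatch("a\nb", "a.b", options);

    // IgnorePatternWhitespace: an unescaped '#' starts a comment, so this pattern
    // is effectively "reference\s*" and matches text that contains no '#' at all.
    public static bool UnescapedHashIsComment() =>
        Regex.IsMatch("reference 0415", @"reference\s*#", RegexOptions.IgnorePatternWhitespace);

    // Escaped as "\#", as in the question's pattern, '#' is a literal character again.
    public static bool EscapedHashIsLiteral() =>
        Regex.IsMatch("reference #: 0415", @"reference\s*\#", RegexOptions.IgnorePatternWhitespace) &&
        !Regex.IsMatch("reference 0415", @"reference\s*\#", RegexOptions.IgnorePatternWhitespace);
}
```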

The regular expression used for capturing:
(?<AcctName>^\w*,{0,1}(\s\w*('s){0,1},{0,1})*)\s\[(?<AcctNbr>\d*)\].{4,8}amount:\s\$(?<Amt>\b[0-9][0-9,]*\.\d\d)\s*when:\s*(?<Dt2Pay>[ADFJMNOS][aceopu][bcglnprtvy][ya-v]{0,9}\s\d{1,2},\s\d\d\d\d\b)\s*reference\s*\#\:\s*(?<RefNbr>\d*)\s*.{2,4}\s*from:\s(?<FromAcctName>\w{1,}(\s\w*)*)\s\[(?<FromAcctNbr>\d*)\]

The replacement pattern passed to Match.Result(strGrps):
${AcctName} ${Amt} ${Dt2Pay} ${RefNbr} PCF ${FromAcctName} ${FromAcctNbr}
The result is:
"
xyz supplier 100.52 September 07, 2007 0415 PCF operating account 236424735
abc, PRESIDENT'S CHOICE 487.61 September 08, 2007 0416 PCF finess 0236454514"
However, the desired result is lines with tab-delimited columns and no
extra leading lines:
"xyz supplier 100.52 September 07, 2007 0415 PCF operating account 236424735
abc, PRESIDENT'S CHOICE 487.61 September 08, 2007 0416 PCF finess 0236454514"
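A likely cause, sketched below: with Multiline, "^" also matches at the start of the empty line that follows "will be paid on the dates you specified.", "\w*" is allowed to match nothing there, and the "\s" inside the repeated name group then consumes that empty line's line break. The first AcctName capture therefore begins with a newline, which shows up as the extra leading line. The sketch runs only the AcctName/AcctNbr part of the pattern, with the same options, against the sample data. The class and member names, and the suggested "Fixed" variant (require "\w+" at the line start and allow only spaces or tabs between the words of a name), are illustrative assumptions rather than anything posted in this thread.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo: only the AcctName/AcctNbr part of the question's pattern,
// run with the question's options against its sample text.
public static class LeadingBlankLineDemo
{
    const RegexOptions Opts = RegexOptions.Singleline | RegexOptions.Multiline |
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture |
        RegexOptions.IgnorePatternWhitespace;

    public const string Sample =
        "will be paid on the dates you specified.\n" +
        "\n" +
        "xyz supplier [123445797891]\n" +
        "amount: $100.52 when: September 07, 2007 reference #: 0415\n" +
        "from: operating account [236424735]\n" +
        "abc, Jane'S CHOICE [0089456881545]\n" +
        "amount: $487.61 when: September 08, 2007 reference #: 0416\n" +
        "from: finess [0236454514]\n";

    // As posted: \w* may match nothing, so '^' succeeds on the empty line and
    // the \s in the repeated group then swallows that line's '\n'.
    public const string Original =
        @"(?<AcctName>^\w*,{0,1}(\s\w*('s){0,1},{0,1})*)\s\[(?<AcctNbr>\d*)\]";

    // Assumed fix: at least one word character at the line start (\w+), and only
    // space or tab ([\x20\t]) between the words of the name, never a line break.
    public const string Fixed =
        @"(?<AcctName>^\w+,{0,1}([\x20\t]\w*('s){0,1},{0,1})*)\s\[(?<AcctNbr>\d*)\]";

    // Returns the AcctName capture of every match, in order.
    public static List<string> AccountNames(string pattern)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(Sample, pattern, Opts))
            names.Add(m.Groups["AcctName"].Value);
        return names;
    }
}
```

With the original fragment the first name comes back as "\nxyz supplier"; with the variant it is "xyz supplier". Because "[\x20\t]" excludes "\r" as well as "\n", the same reasoning should hold for Windows "\r\n" text taken from a TextBox.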

What do I have to adjust in the regular expression?

Or do I have to change the code I use?

// compile
string strRegex = textBoxRegex.Text;
bool bCompiled = false;

try
{
    RegexOptions regexOptn = RegexOptions.Singleline
        | RegexOptions.Multiline | RegexOptions.IgnoreCase
        | RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace;
    myRegex = new Regex(strRegex, regexOptn); // try compile with options
    bCompiled = true;
    bMatched = false;
    setStatusText("Regex Compiled.");
}
catch (Exception ex)
{
    setMsg("Error in regex compilation or combination of regex options. " + ex.Message);
}

// match
MatchCollection myMatch = null;
if (bCompiled)
{
    myMatch = myRegex.Matches(textBoxInput.Text);
}

// capturing result
bool bSuccess = false;
if (myMatch == null || myMatch.Count == 0)
{
    setStatusText("No match Found");
    return bSuccess;
}

// group references are entered comma-separated; put tabs between the columns
string strMatchGrpVarName = textBoxGroupName.Text.Replace(",", "\t");
string mybuf = "";
int i = 0;
foreach (Match match in myMatch)   // foreach already advances; no NextMatch() call needed
{
    i++;
    if (i == 1)
        mybuf = match.Result(strMatchGrpVarName);
    else
        mybuf += csCrLf + match.Result(strMatchGrpVarName);
    if (bSingle) break;
}
MessageBox.Show("count=" + myMatch.Count + csCrLf + mybuf);
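A tidier version of the matching and formatting step could look like the sketch below. FormatMatches and its parameters are hypothetical names; it assumes the same inputs as the form code (pattern text, options, input text, a comma-separated Result template and the single-match flag). The Trim() on each Match.Result string is a defensive workaround that strips a line break a group may have captured; it does not fix the pattern itself, and it would also strip leading or trailing tabs if the first or last column were empty.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: a self-contained version of the form's match/format step.
public static class MatchFormatter
{
    public static string FormatMatches(string input, string pattern, RegexOptions options,
                                       string commaSeparatedTemplate, bool singleMatch)
    {
        // Throws ArgumentException if the pattern or option combination is invalid.
        var regex = new Regex(pattern, options);

        // "${A},${B}" becomes "${A}\t${B}", so each output row is tab-delimited.
        string template = commaSeparatedTemplate.Replace(",", "\t");

        var rows = new List<string>();
        foreach (Match match in regex.Matches(input))  // foreach already walks every match
        {
            // Trim() removes stray line breaks (and any leading/trailing tabs)
            // that a group may have captured.
            rows.Add(match.Result(template).Trim());
            if (singleMatch) break;
        }
        return string.Join("\r\n", rows);
    }
}
```

In the form it might be called as FormatMatches(textBoxInput.Text, textBoxRegex.Text, regexOptn, textBoxGroupName.Text, bSingle). Joining a list once also avoids the first-iteration special case in the original loop.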

Thank you for your time and expertise.
Aug 9 '07 #1
If you use the caret (^) character with RegexOptions.Multiline, it will
match at the beginning of a line. You can use that in your individual
matches to specify the start of a line before the match.
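One caveat worth keeping in mind: under Multiline, "^" matches after every "\n", including at the start of an empty line, so "^" by itself does not skip blank lines (which may explain why adding it in the follow-up below changed nothing). The sketch compares "^" with "^(?=\S)", which additionally requires a non-whitespace character at the line start. CaretDemo and LineStarts are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the index of every match of a pattern under Multiline.
public static class CaretDemo
{
    public static List<int> LineStarts(string input, string pattern)
    {
        var positions = new List<int>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Multiline))
            positions.Add(m.Index);
        return positions;
    }
}
```

For the input "a\n\nb", "^" reports positions 0, 2 and 3 (position 2 is the empty line), while "^(?=\S)" reports only 0 and 3.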

--
HTH,

Kevin Spencer
Microsoft MVP

DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net

Aug 10 '07 #2
GS
Thank you. I tried the following, but I still get the extra empty or blank line:

^(?<AcctName>^\w*,{0,1}(\s\w*('s){0,1},{0,1})*)\s(?:\[)(?<AcctNbr>\d*)\].{4,8}^\s*(?:amount):\s\$(?<Amt>\b[0-9][0-9,]*\.\d\d)\s*when:\s*(?<Dt2Pay>[ADFJMNOS][aceopu][bcglnprtvy][ya-v]{0,9}\s\d{1,2},\s\d\d\d\d\b)\s*reference\s*\#\:\s*(?<RefNbr>\d*)\s*.{2,4}^\s*(?:from\:\s)(?<FromAcctName>\w{1,}(\s\w*)*)\s\[(?<FromAcctNbr>\d*)\]

For now I work around it with a kludge: the user can choose to remove all
empty and blank lines. When the user checks the Remove Blank Lines check box,
the application performs one more match/Result pass to remove any blank or
empty lines. It is kludgy and crude, but it works.
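If the post-processing workaround is kept, it does not need a second regex pass. A sketch using plain string splitting, where RemoveBlankLines is a hypothetical helper and not part of the posted application:

```csharp
using System;
using System.Linq;

// Hypothetical helper: drops empty and whitespace-only lines from a block of text.
public static class BlankLineFilter
{
    public static string RemoveBlankLines(string text)
    {
        // Split on both Windows ("\r\n") and Unix ("\n") line endings.
        var lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
                        .Where(line => !string.IsNullOrWhiteSpace(line));
        return string.Join("\r\n", lines);
    }
}
```

This keeps every non-blank line unchanged, including its tab separators; only lines that are empty or contain nothing but whitespace are removed.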

Aug 11 '07 #3
