More regex question: how to avoid capturing leading empty lines

GS
How can one avoid capturing leading empty or blank lines?

The data I deal with looks like this:

"will be paid on the dates you specified.

xyz supplier [123445797891]
amount: $100.52 when: September 07, 2007 reference #: 0415
from: operating account [236424735]
abc, Jane'S CHOICE [0089456881545]
amount: $487.61 when: September 08, 2007 reference #: 0416
from: finess [0236454514]

"
The regex options are:
multi-line, explicit capture, ignore case, dotall, ignore pattern white space

The regex expression used for capturing:
(?<AcctName>^\w*,{0,1}(\s\w*('s){0,1},{0,1})*)
\s\[(?<AcctNbr>\d*)\]
.{4,8}amount:\s\$(?<Amt>\b[0-9][0-9,]*\.\d\d)
\s*when:\s*(?<Dt2Pay>[ADFJMNOS][aceopu][bcglnprtvy][ya-v]{0,9}\s\d{1,2},\s\d\d\d\d\b)
\s*reference\s*\#\:\s*(?<RefNbr>\d*)
\s*.{2,4}\s*from:\s(?<FromAcctName>\w{1,}(\s\w*)*)
\s\[(?<FromAcctNbr>\d*)\]

The expression used in Result(strGrps):
${AcctName} ${Amt} ${Dt2Pay} ${RefNbr} PCF ${FromAcctName} ${FromAcctNbr}
The result is:
"
xyz supplier 100.52 September 07, 2007 0415 PCF operating account 236424735
abc, PRESIDENT'S CHOICE 487.61 September 08, 2007 0416 PCF finess 0236454514"
However, the desired result is lines with tab-delimited columns and without
the extra leading line:
"xyz supplier 100.52 September 07, 2007 0415 PCF operating account 236424735
abc, PRESIDENT'S CHOICE 487.61 September 08, 2007 0416 PCF finess 0236454514"

What do I have to adjust in the regex expression?

Or do I have to change the code used?

// compile
string strRegex = textBoxRegex.Text;
bool bCompiled = false;

try
{
    RegexOptions regexOptn = RegexOptions.Singleline
        | RegexOptions.Multiline | RegexOptions.IgnoreCase
        | RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace;
    myRegex = new Regex(strRegex, regexOptn); // try compile with options
    bCompiled = true;
    bMatched = false;
    setStatusText("Regex Compiled.");
}
catch (Exception ex)
{
    setMsg("Error in regex compilation or combination of regex options. " + ex.Message);
}

// match
MatchCollection myMatch = null;
if (bCompiled) {
    myMatch = myRegex.Matches(textBoxInput.Text);
}

// capturing result
// (myMatch stays null when compilation failed, so check for null first)
if (myMatch != null && myMatch.Count > 0) {
    string strMatchGrpVarName = textBoxGroupName.Text.Replace(",", " ");
    int i = 0;
    bool bSuccess = false;

    if (myMatch.Count <= 0) { setStatusText("No match Found"); return bSuccess; }
    string mybuf = "";
    //int iCapBeg = myMatch.Captures.
    foreach (Match match in myMatch)
    {
        i++;
        if (i == 1) {
            mybuf = match.Result(strMatchGrpVarName);
            if (bSingle) break;
        } else {
            mybuf += csCrLf + match.Result(strMatchGrpVarName);
        }
        // foreach already advances to the next match, so no NextMatch() call is needed
        if (bSingle) break;
    }
    MessageBox.Show("count=" + strMatchGrpVarName.Length + csCrLf + mybuf);
}

Thank you for your time and expertise.
Aug 9 '07 #1
2 Replies


If you use the caret (^) character with RegexOptions.Multiline, it will
match at the beginning of each line rather than only at the start of the
input. You can use that in your individual matches to specify the start of
a line before the match.
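
A minimal sketch of how the anchor behaves around a blank line, assuming a shortened input and a shortened name pattern (AnchorDemo, Sample, Loose and Strict are hypothetical names for illustration, not the poster's full expression). With Multiline, ^ also matches at the start of an empty line, and \s matches line breaks, so a name group that starts with \w* can begin on the blank line and swallow the newline. Requiring \w+ right after ^ and allowing only non-newline whitespace between words is one possible way to avoid that:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical, shortened input: one blank line, then two account lines.
    public const string Sample = "\nxyz supplier [123]\nabc, Jane'S CHOICE [456]";

    // Shortened name pattern in the style of the question: \w* may match
    // nothing and \s may match a newline.
    public const string Loose = @"^(?<Name>\w*,?(\s\w*('s)?,?)*)\s\[";

    // Variant: at least one word character right after ^, and only
    // non-newline whitespace ([^\S\r\n]) between words.
    public const string Strict = @"^(?<Name>\w+,?([^\S\r\n]+\w+('s)?,?)*)[^\S\r\n]\[";

    // Returns the value of the Name group for every match in Sample.
    public static string[] Names(string pattern)
    {
        RegexOptions options = RegexOptions.Multiline | RegexOptions.IgnoreCase
            | RegexOptions.ExplicitCapture;
        MatchCollection matches = Regex.Matches(Sample, pattern, options);
        string[] names = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            names[i] = matches[i].Groups["Name"].Value;
        }
        return names;
    }

    public static void Main()
    {
        // Show line breaks as a visible "\n" so a captured one stands out.
        foreach (string name in Names(Loose))
            Console.WriteLine("loose:  [" + name.Replace("\n", "\\n") + "]");
        foreach (string name in Names(Strict))
            Console.WriteLine("strict: [" + name.Replace("\n", "\\n") + "]");
    }
}
```

In this sketch the first loose name begins with the line break of the blank line, while the strict names start at the first word. Whether the same change is enough for the full expression depends on the real input, so treat it as a starting point.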

--
HTH,

Kevin Spencer
Microsoft MVP

DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net

Aug 10 '07 #2

GS
Thank you. I tried the following, but I still get the extra empty or blank
line:

^(?<AcctName>^\w*,{0,1}(\s\w*('s){0,1},{0,1})*)
\s(?:\[)(?<AcctNbr>\d*)\]
.{4,8}^\s*(?:amount):\s\$(?<Amt>\b[0-9][0-9,]*\.\d\d)
\s*when:\s*(?<Dt2Pay>[ADFJMNOS][aceopu][bcglnprtvy][ya-v]{0,9}\s\d{1,2},\s\d\d\d\d\b)
\s*reference\s*\#\:\s*(?<RefNbr>\d*)
\s*.{2,4}^\s*(?:from\:\s)(?<FromAcctName>\w{1,}(\s\w*)*)
\s\[(?<FromAcctNbr>\d*)\]

Right now I kludge around it by giving the user the option of removing all
empty and blank lines. When the user checks the Remove Blank Line check box,
the application performs one more match pass to remove any blank or empty
lines. It is kludgy and crude, but it works.
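
A rough sketch of what such a clean-up pass could look like, assuming the combined result is held in a single string (BlankLineCleanup and RemoveBlankLines are hypothetical names, not the application's actual code). It deletes every line that is empty or holds only spaces and tabs:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlankLineCleanup
{
    // Removes lines that are empty or contain only non-newline whitespace.
    // ^            start of a line (Multiline)
    // [^\S\r\n]*   spaces or tabs, but no line breaks
    // (\r?\n|\z)   the line break itself, or the very end of the text
    public static string RemoveBlankLines(string text)
    {
        return Regex.Replace(text, @"^[^\S\r\n]*(\r?\n|\z)", "",
            RegexOptions.Multiline);
    }

    public static void Main()
    {
        // Hypothetical result buffer with a leading blank line and a
        // whitespace-only line in the middle.
        string buffer = "\nxyz supplier\n\n \t\nabc, PRESIDENT'S CHOICE";
        Console.WriteLine(RemoveBlankLines(buffer));
    }
}
```

If only the captured account name carries the stray line break, calling TrimStart() on that group's value before building the output line may be a lighter alternative to a second regex pass.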

Aug 11 '07 #3
