Bytes IT Community

Regular expression to read multiple groups of text from a text file.

I have an EDI file whose structure is shown below. The file contains multiple records; each record consists of a header (e.g. EDI.DD.0000000001.20130809), then the contents (multiple paragraphs of text), and then a footer (e.g. End of Report or No EDI Activity). I need to read the entire file with a regular expression that uses three groups.

I am using the following regular expression to read the file:

(?<header>[A-Z]{3}.[A-Z]{2}.[0-9]{10}.[0-9]{8}) | (?<footer> \b(End\sof\sReport|No\sEDI\sActivity)\b) | (?<content>(?<=\k<header>).*(?=\k<footer>))

That expression captures the header and footer into their respective groups correctly, but it does not capture the text between them into the "content" group.
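A likely reason the content group stays empty: the three groups are alternatives separated by |, so every match is exactly one of them, and inside the content alternative \k<header> refers to a group that has not captured anything in that match. In .NET's default (non-ECMAScript) mode a backreference to a group with no capture never matches, so that alternative cannot succeed. The sketch below runs the posted pattern on a small, made-up two-record sample and counts which groups succeed. It assumes RegexOptions.IgnorePatternWhitespace so that the spaces around each | are ignored (the post does not say which options were used); OriginalPatternDemo and CountGroups are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class OriginalPatternDemo {
    // The pattern from the question, unchanged. Assumption: it is used with
    // IgnorePatternWhitespace, so the spaces around each '|' are not literal.
    const string Pattern =
        @"(?<header>[A-Z]{3}.[A-Z]{2}.[0-9]{10}.[0-9]{8}) | (?<footer> \b(End\sof\sReport|No\sEDI\sActivity)\b) |" +
        @"(?<content>(?<=\k<header>).*(?=\k<footer>))";

    // Counts how many matches filled the header, footer and content groups.
    public static int[] CountGroups(string input) {
        int header = 0, footer = 0, content = 0;
        foreach (Match m in Regex.Matches(input, Pattern, RegexOptions.IgnorePatternWhitespace)) {
            if (m.Groups["header"].Success) header++;
            if (m.Groups["footer"].Success) footer++;
            if (m.Groups["content"].Success) content++;
        }
        return new[] { header, footer, content };
    }

    static void Main() {
        string sample = "EDI.DD.0000000001.20130809\r\n\r\nSome text\r\n\r\nEnd of Report\r\n\r\n" +
                        "EDI.DD.0006578987.20130809\r\n\r\nNo EDI Activity\r\n";
        int[] counts = CountGroups(sample);
        // Each match is a single alternative; in the content alternative the
        // header group has never captured, so \k<header> fails every time.
        Console.WriteLine("header: {0}, footer: {1}, content: {2}", counts[0], counts[1], counts[2]);
    }
}
```

Run as a console program, this prints "header: 2, footer: 2, content: 0": headers and footers are found, but no match ever fills the content group.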

In the file below, the header and footer lines mark where each record starts and ends. I am using ASP.NET 3.5.

Thanks for your help in advance.

//------------------Start of EDI File---------------------//

EDI.DD.0000000001.20130809

ORIGINATOR INFORMATION Company Name: UNITED HEALTHCAR Identification: 9024125001 Originating DFI: 002100002

RECEIVER INFORMATION Receiver Name: HEALTH & WELLNESS DFI Account Number: 0000000000000001 Receiving DFI ID: 434343430 ID Number: Transaction Type: 22 Deposit

ORIGINATOR INFORMATION Company Name: BLUE CHOICE Identification: 9024125001

End of Report

EDI.DD.0006578987.20130809

No EDI Activity

EDI.SV.0000000555.20130809

ORIGINATOR INFORMATION Company Name: Univ of Florida Identification: A426004813 Originating DFI: 004200001

TRANSACTION INFORMATION
Entry Description: vndr pymnt Entry Class Code: CTX Service Class Code: ACH Entries Mixed

REMITTANCE ADVICE ACCOUNTS
RECEIVABLE OPEN ITEM REFERENCE
Seller's Invoice Number: 10016 Pmt Action Code: Amount Paid: $800.00 Amount of Invoice: Amount of Discount:

End of Report

//--------------------End of file------------------------//
Mar 14 '14 #1


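Put the header, contents and footer groups in sequence rather than as alternatives separated by |, so that a single match covers a whole record. Note: (?<contents>.*?) is a lazy (non-greedy) match: .*? consumes as few characters as possible, so each match stops at the first footer after its header instead of running on to the last footer in the file. The RegexOptions.Singleline option makes . match newline characters as well, so the contents group can span multiple lines.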
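To see what the lazy quantifier and Singleline each contribute, the sketch below builds the same three-group pattern (with the dots in the header escaped) and counts matches on a short two-record string under three settings. LazyVsGreedyDemo, CountRecords and the sample text are illustrative, not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyVsGreedyDemo {
    // Two records in one string, separated by CRLF line breaks (made-up sample).
    public const string Sample =
        "EDI.DD.0000000001.20130809\r\nfirst body\r\nEnd of Report\r\n" +
        "EDI.DD.0006578987.20130809\r\nNo EDI Activity\r\n";

    // Counts record matches for a given contents sub-pattern and options.
    public static int CountRecords(string contents, RegexOptions options) {
        string pattern = @"(?<header>[A-Z]{3}\.[A-Z]{2}\.[0-9]{10}\.[0-9]{8})" +
                         "(?<contents>" + contents + ")" +
                         "(?<footer>End of Report|No EDI Activity)";
        return Regex.Matches(Sample, pattern, options).Count;
    }

    static void Main() {
        // Lazy + Singleline: one match per record.
        Console.WriteLine(CountRecords(".*?", RegexOptions.Singleline)); // 2
        // Greedy + Singleline: .* runs to the last footer and swallows the second record.
        Console.WriteLine(CountRecords(".*", RegexOptions.Singleline));  // 1
        // Lazy without Singleline: . cannot cross '\n', so no header reaches a footer.
        Console.WriteLine(CountRecords(".*?", RegexOptions.None));       // 0
    }
}
```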

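    using System;
    using System.IO;
    using System.Text.RegularExpressions;

    class Program {
        static void Main(string[] args) {
            // Assumption: the path to the EDI file is passed as the first argument.
            string fileData = File.ReadAllText(args[0]);

            // Header, then the shortest run of any characters (newlines included
            // because of Singleline), then the first footer that follows it.
            Regex re = new Regex(
                @"(?<header>[A-Z]{3}\.[A-Z]{2}\.[0-9]{10}\.[0-9]{8})" +
                @"(?<contents>.*?)" +
                @"(?<footer>End of Report|No EDI Activity)",
                RegexOptions.Singleline);

            MatchCollection matches = re.Matches(fileData);

            foreach (Match m in matches) {
                Console.WriteLine("Header: {0}, Body Length: {1}, Footer: {2}",
                    m.Groups["header"].Value,
                    m.Groups["contents"].Value.Length,
                    m.Groups["footer"].Value);
            }
        }
    }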
Run results:
    Header: EDI.DD.0000000001.20130809, Body Length: 356, Footer: End of Report
    Header: EDI.DD.0006578987.20130809, Body Length: 4, Footer: No EDI Activity
    Header: EDI.SV.0000000555.20130809, Body Length: 403, Footer: End of Report
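The body lengths include the line breaks around each body; the 4 reported for the No EDI Activity record is presumably the two CRLF pairs between its header and footer. If you want the records as objects with that surrounding whitespace removed, one possible sketch, using the same pattern with escaped dots, is below. EdiRecord and EdiParser are hypothetical names, and trimming with String.Trim() is a design choice, not something the answer above does.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical record type; the names are illustrative, not from the thread.
class EdiRecord {
    public string Header { get; set; }
    public string Body { get; set; }
    public string Footer { get; set; }
}

static class EdiParser {
    static readonly Regex RecordRegex = new Regex(
        @"(?<header>[A-Z]{3}\.[A-Z]{2}\.[0-9]{10}\.[0-9]{8})" +
        @"(?<contents>.*?)" +
        @"(?<footer>End of Report|No EDI Activity)",
        RegexOptions.Singleline);

    // Parses every record, trimming the whitespace around each body.
    public static List<EdiRecord> Parse(string fileData) {
        var records = new List<EdiRecord>();
        foreach (Match m in RecordRegex.Matches(fileData)) {
            records.Add(new EdiRecord {
                Header = m.Groups["header"].Value,
                Body = m.Groups["contents"].Value.Trim(),
                Footer = m.Groups["footer"].Value
            });
        }
        return records;
    }

    static void Main() {
        string sample = "EDI.DD.0000000001.20130809\r\n\r\nORIGINATOR INFORMATION\r\n\r\nEnd of Report\r\n\r\n" +
                        "EDI.DD.0006578987.20130809\r\n\r\nNo EDI Activity\r\n";
        foreach (EdiRecord r in EdiParser.Parse(sample))
            Console.WriteLine("{0} [{1}] {2}", r.Header, r.Body, r.Footer);
    }
}
```

For this short sample it prints the first record's body in brackets and an empty [] for the No EDI Activity record.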
Mar 19 '14 #2
