Bytes | Software Development & Data Engineering Community
Regex: Could this pattern be more efficient?

I have an Intel hex file I need to parse. I want to run a regex on each
line to get the separate sections.
The format is like this:
:llaaaatt[d...]cc
where:
: - starts the record
ll - is the length of the data section ([d...]) in bytes, in hex
aaaa - is the address of the data in hex
tt - is the record type in hex
[d...] - are the data bytes in hex; this is a variable-length section
cc - is the checksum in hex

So I need a pattern that separates all the sections. I can get most of
them, but I'm not sure about the variable data section. It starts at
index 9 and is ll bytes (2 x ll hex characters) long.

I'm thinking something like this (note: I don't need section tt):
@":(?<ll>(\w{2}))(?<aaaa>(\w{4}))\w{2}(?<d>(\w+))(?<cc>(\w{2}))";

It works, but I'm very new to Regex and not sure if I could do this a better
way. Do you see any improvements that could be made?
Thanks for reading!
Steve
Apr 18 '06 #1
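
Because the data section starts at a fixed index and its length follows from ll, the fields can also be sliced out without a regex. The following is only a minimal sketch under that assumption; the helper name TrySplitRecord and its out parameters are hypothetical and do not come from the thread, and the sketch does not check that every character is a hex digit.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not from the thread): slices one record by fixed offsets.
// It checks only the overall layout, not that every character is a hex digit.
static bool TrySplitRecord(string line, out string address, out string data, out string checksum)
{
    address = data = checksum = "";

    // Shortest record: ':' + ll (2) + aaaa (4) + tt (2) + cc (2) = 11 characters.
    if (line.Length < 11 || line[0] != ':') return false;

    // ll is a byte count in hex; each data byte occupies two hex characters.
    if (!int.TryParse(line.Substring(1, 2), NumberStyles.AllowHexSpecifier,
                      CultureInfo.InvariantCulture, out int byteCount)) return false;

    int dataChars = 2 * byteCount;
    if (line.Length != 11 + dataChars) return false;

    address  = line.Substring(3, 4);
    data     = line.Substring(9, dataChars);       // data starts at index 9
    checksum = line.Substring(9 + dataChars, 2);
    return true;
}

if (TrySplitRecord(":10010000214601360121470136007EFE09D2190140",
                   out string addr, out string bytes, out string sum))
{
    Console.WriteLine($"address={addr} data={bytes} checksum={sum}");
}
```

Comparing the line length against 11 + 2 * ll catches truncated or padded records before any substring is taken.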
In article <O7**************@TK2MSFTNGP04.phx.gbl>,
sklett <sk****@mddirect.com> wrote:

: I have an Intel hex file I need to parse. I want to run a regex on each
: line to get the separate sections.
: the format is like this:
: :llaaaatt[d...]cc
: where:
: : - starts the record
: ll - is the length of the data section([d...]) in hex
: aaaa - is the address of the data in hex
: tt - is the type in hex
: [d...] are the data bytes in hex, this is a variable length section
: cc - checksum in hex
:
: So I need a pattern that will separate all the sections. I can get
: most of them, but the variable data section I'm not sure. it basically
: will start at index 9 and be ll long.
:
: I'm thinking something like this (*note: I don't need section tt):
: @":(?<ll>(\w{2}))(?<aaaa>(\w{4}))\w{2}(?<d>(\w+))(?<cc>(\w{2}))";
:
: It works, but I'm very new to Regex and not sure if I could do this a
: better way. Do you see any improvements that could be made?

If you upcase your input, you could use

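Regex pattern = new Regex(
    @"
      ^
      :
      (?<ll>   [0-9A-F]{2})      # byte count
      (?<aaaa> [0-9A-F]{4})      # address
      (?<tt>   0[0124])          # record type; use 0[0-5] to also accept types 03 and 05
      (?<dd>   ([0-9A-F]{2})+)   # data bytes; + requires at least one, so use * to accept
                                 # empty records such as the end-of-file record :00000001FF
      (?<cc>   [0-9A-F]{2})      # checksum
      $
    ",
    RegexOptions.IgnorePatternWhitespace |
    RegexOptions.ExplicitCapture);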

Note that you'd still need to verify the checksum.
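
As a rough illustration of that checksum step: in Intel HEX, the checksum byte is chosen so that all bytes of the record, checksum included, sum to zero modulo 256. The sketch below assumes the line has already passed a pattern like the one above; the helper name ChecksumIsValid is hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: true when all record bytes, checksum included,
// sum to zero modulo 256. Assumes the line is ':' followed by hex digit pairs.
static bool ChecksumIsValid(string record)
{
    // ':' plus an even number of hex characters gives an odd total length.
    if (record.Length < 3 || record.Length % 2 == 0) return false;

    int total = 0;
    for (int i = 1; i + 1 < record.Length; i += 2)
    {
        total += byte.Parse(record.Substring(i, 2), NumberStyles.AllowHexSpecifier,
                            CultureInfo.InvariantCulture);
    }
    return (total & 0xFF) == 0;
}

Console.WriteLine(ChecksumIsValid(":10010000214601360121470136007EFE09D2190140"));
Console.WriteLine(ChecksumIsValid(":10010000214601360121470136007EFE09D2190141"));
```

The first sample record carries a correct checksum; the second is the same record with its last digit altered.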

The technique here is to specify "bookends" to bracket the portion
whose length you don't know ahead of time, and the data field has
to be whatever is in between.

The left bookend is the beginning of string, the colon, the length,
the address, and the type -- all with known lengths.

Then the plus quantifier in the dd subpattern (which matches one or
more of the preceding pattern -- pairs of hex digits in this case)
allows enough elasticity to grab only the variable-length portion of
the record.

Finally, the right bookend is the last byte in the record.
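
Since the bookends let dd stretch to any number of byte pairs, the ll field still has to be compared with what dd actually captured. Here is one possible way to do that, as a sketch only. It is not the exact pattern posted above: as labeled assumptions, it uses * so that records without data (such as :00000001FF) match, accepts any two-digit record type, and uses RegexOptions.IgnoreCase instead of upcasing the input. The names hexRecord and TryExtractData are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Assumptions: '*' instead of '+', any record type, IgnoreCase instead of upcasing.
var hexRecord = new Regex(@"
    ^ :
    (?<ll>   [0-9A-F]{2} )
    (?<aaaa> [0-9A-F]{4} )
    (?<tt>   [0-9A-F]{2} )
    (?<dd>   (?: [0-9A-F]{2} )* )
    (?<cc>   [0-9A-F]{2} )
    $",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);

// Hypothetical helper: succeeds only when the line matches and the declared
// byte count (ll) agrees with the number of bytes captured by dd.
bool TryExtractData(string line, out string data)
{
    data = "";
    Match m = hexRecord.Match(line);
    if (!m.Success) return false;

    int declared = int.Parse(m.Groups["ll"].Value, NumberStyles.AllowHexSpecifier,
                             CultureInfo.InvariantCulture);
    string captured = m.Groups["dd"].Value;
    if (captured.Length != 2 * declared) return false;

    data = captured;
    return true;
}

if (TryExtractData(":10010000214601360121470136007EFE09D2190140", out string payload))
{
    Console.WriteLine(payload);
}
```

The greedy dd group first swallows every remaining byte pair, then backtracks by one pair so that cc can match; that backtracking is what makes the last byte act as the right bookend.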

I hope this helps.

Greg
Apr 20 '06 #2
Very cool, Greg! Thank you for this thorough explanation and example, I
appreciate it!
Have a great weekend.
"Greg Bacon" <gb****@hiwaay.net> wrote in message
news:12*************@corp.supernews.com...
[snip]

Apr 21 '06 #3
