Regular Expressions Help

Hi,

I have this single line of text and need to extract data from it:

* 15 FETCH (FLAGS (\Seen) BODYSTRUCTURE ((("TEXT" "PLAIN" ("CHARSET"
"ISO-8859-1" "FORMAT" "flowed") NIL NIL "7BIT" 44 1 NIL NIL NIL
NIL)("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL "7BIT" 330 10 NIL
NIL NIL NIL) "ALTERNATIVE" ("BOUNDARY"
"------------040101030402070003040902") NIL NIL NIL)("IMAGE" "JPEG"
("NAME" "Blue hills.jpg") NIL NIL "BASE64" 39084 NIL ("INLINE"
("FILENAME" "Blue hills.jpg")) NIL NIL)("IMAGE" "JPEG" ("NAME"
"Sunset.jpg") NIL NIL "BASE64" 97556 NIL ("INLINE" ("FILENAME"
"Sunset.jpg")) NIL NIL)("IMAGE" "JPEG" ("NAME" "Water lilies.jpg") NIL
NIL "BASE64" 114830 NIL ("INLINE" ("FILENAME" "Water lilies.jpg")) NIL
NIL)("IMAGE" "JPEG" ("NAME" "Winter.jpg") NIL NIL "BASE64" 144632 NIL
("INLINE" ("FILENAME" "Winter.jpg")) NIL NIL) "MIXED" ("BOUNDARY"
"------------090206040706060704050905") NIL NIL NIL))

This line can be divided into similar parts:

* 15 FETCH (FLAGS (\Seen) BODYSTRUCTURE
(
(
("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1" "FORMAT" "flowed") NIL NIL
"7BIT" 44 1 NIL NIL NIL
NIL)
("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL
"7BIT" 330 10 NIL NIL NIL
NIL) "ALTERNATIVE" ("BOUNDARY" "------------040101030402070003040902")
NIL NIL NIL)
("IMAGE" "JPEG" ("NAME" "Blue hills.jpg") NIL NIL
"BASE64" 39084 NIL ("INLINE" ("FILENAME" "Blue hills.jpg")) NIL
NIL)
("IMAGE" "JPEG" ("NAME" "Sunset.jpg") NIL NIL
"BASE64" 97556 NIL ("INLINE" ("FILENAME" "Sunset.jpg")) NIL
NIL)
("IMAGE" "JPEG" ("NAME" "Water lilies.jpg") NIL NIL
"BASE64" 114830 NIL ("INLINE" ("FILENAME" "Water lilies.jpg")) NIL
NIL)
("IMAGE" "JPEG" ("NAME" "Winter.jpg") NIL NIL
"BASE64" 144632 NIL ("INLINE" ("FILENAME" "Winter.jpg")) NIL
NIL) "MIXED" ("BOUNDARY" "------------090206040706060704050905") NIL
NIL NIL))

From each of these lines I need to extract some data (arguments),
namely arguments 1, 2, 3 and 6.

In the data given, these are 1=TEXT, 2=PLAIN, 3=("CHARSET" "ISO-8859-1"
"FORMAT" "flowed") and 6=7BIT;
1=IMAGE, 2=JPEG, 3=("NAME" "Blue hills.jpg"), 6=BASE64;
etc.

Is this possible to do with regex?
Can someone help me out?
Thanks in advance.
Nov 17 '05 #1

I would do two string splits.

string[] lines = YourLine.Split(new string[] { ")(" }, StringSplitOptions.None);

Then split each element of lines again with .Split(' ') (because to
me each line looks like a row of fields delimited by spaces).

You now have a two-dimensional (jagged) array from which you can easily
extract the values. If you want to make it cleaner, strip the remaining
( and ) characters from the array values.
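A minimal C# sketch of the two-split approach, assuming the whole FETCH response is available as one string; the class and method names are hypothetical, and RemoveEmptyEntries is an addition so that repeated spaces don't produce empty fields. It also shows a limitation with this data: quoted values containing spaces, such as "Blue hills.jpg", and the parameter list in argument 3 are split into several fields, so field positions won't reliably match the argument numbers asked for (and the first chunk still starts with the "* 15 FETCH ..." prefix).

```csharp
using System;

static class SplitSketch
{
    // First split on ")(" (roughly one chunk per body part), then split
    // each chunk on spaces and trim leftover parentheses from each field.
    public static string[][] SplitParts(string rawLine)
    {
        string[] chunks = rawLine.Split(new string[] { ")(" }, StringSplitOptions.None);
        string[][] fields = new string[chunks.Length][];
        for (int i = 0; i < chunks.Length; i++)
        {
            string[] tokens = chunks[i].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            for (int j = 0; j < tokens.Length; j++)
            {
                tokens[j] = tokens[j].Trim('(', ')');
            }
            fields[i] = tokens;
        }
        return fields;
    }
}
```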

--
Texeme
http://www.texeme.com
Nov 17 '05 #2
PS: someone who knows LISP could probably do this much more easily ;)
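Following up on the LISP remark: the BODYSTRUCTURE value is a nested parenthesized list, so a small recursive parser is a natural alternative to regex. This is a hedged C# sketch, not a complete IMAP parser: it assumes quoted strings contain no escaped quotes and that no {n} literals appear, and the class and method names are hypothetical. It parses the list after the BODYSTRUCTURE keyword, then treats a list starting with a string as a single part and a list starting with lists as a multipart (nested parts first, then a subtype such as "ALTERNATIVE" or "MIXED").

```csharp
using System;
using System.Collections.Generic;

static class BodyStructureParser
{
    // Parses one value at pos: a parenthesized list (as List<object>), or a
    // quoted string / bare atom such as NIL or 44 (both as string).
    // Assumption: no escaped quotes and no IMAP {n} literals.
    static object ParseValue(string s, ref int pos)
    {
        while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++;
        if (pos >= s.Length) throw new FormatException("Unexpected end of input");

        if (s[pos] == '(')
        {
            pos++; // consume '('
            var items = new List<object>();
            while (true)
            {
                while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++;
                if (pos >= s.Length) throw new FormatException("Unclosed list");
                if (s[pos] == ')') { pos++; return items; }
                items.Add(ParseValue(s, ref pos));
            }
        }

        if (s[pos] == '"')
        {
            int end = s.IndexOf('"', pos + 1);
            if (end < 0) throw new FormatException("Unclosed string");
            string text = s.Substring(pos + 1, end - pos - 1);
            pos = end + 1;
            return text;
        }

        int start = pos;
        while (pos < s.Length && !char.IsWhiteSpace(s[pos]) && s[pos] != '(' && s[pos] != ')') pos++;
        return s.Substring(start, pos - start);
    }

    // A single part starts with its media type (a string); a multipart
    // starts with nested parts (lists) followed by its subtype string.
    static void CollectParts(List<object> node, List<object[]> result)
    {
        if (node.Count > 0 && node[0] is string)
        {
            // Arguments 1, 2, 3 and 6 sit at zero-based indices 0, 1, 2, 5
            if (node.Count >= 6)
                result.Add(new object[] { node[0], node[1], node[2], node[5] });
            return;
        }
        foreach (object child in node)
        {
            if (!(child is List<object> childList)) break; // reached the subtype
            CollectParts(childList, result);
        }
    }

    // Returns one {type, subtype, parameter list, encoding} array per part.
    public static List<object[]> ExtractParts(string fetchLine)
    {
        const string key = "BODYSTRUCTURE";
        int pos = fetchLine.IndexOf(key, StringComparison.Ordinal);
        if (pos < 0) throw new FormatException("No BODYSTRUCTURE found");
        pos += key.Length;
        if (!(ParseValue(fetchLine, ref pos) is List<object> tree))
            throw new FormatException("BODYSTRUCTURE is not a list");
        var result = new List<object[]>();
        CollectParts(tree, result);
        return result;
    }
}
```

Called on the full line from the question, this should yield six parts (TEXT/PLAIN, TEXT/HTML and four IMAGE/JPEG), with argument 3 kept as a nested List<object> rather than flattened.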

Nov 17 '05 #3
In message <bc*************************@posting.google.com>, Xarky
<be*********@yahoo.com> writes:
> Hi,
>
> I have this single line of text, and need to extract data from it
> Is this possible to do with regex?


Yes.

No doubt a regex guru could write a single monster expression which
would pull all of the values out in a useful way.

I'm not a regex guru, so I'll tell you how I'd approach it. You seem to
have repeating groups, each group containing a set of data you want to
extract. As a first step, I'd work out a regex that matches each of
those, i.e.:

("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1" "FORMAT" "flowed") NIL NIL
"7BIT" 44 1 NIL NIL NIL
NIL)

("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL
"7BIT" 330 10 NIL NIL NIL
NIL) "ALTERNATIVE" ("BOUNDARY" "------------040101030402070003040902")
NIL NIL NIL)
("IMAGE" "JPEG" ("NAME" "Blue hills.jpg") NIL NIL
"BASE64" 39084 NIL ("INLINE" ("FILENAME" "Blue hills.jpg")) NIL NIL)
("IMAGE" "JPEG" ("NAME" "Sunset.jpg") NIL NIL
"BASE64" 97556 NIL ("INLINE" ("FILENAME" "Sunset.jpg")) NIL
NIL)
("IMAGE" "JPEG" ("NAME" "Water lilies.jpg") NIL NIL
"BASE64" 114830 NIL ("INLINE" ("FILENAME" "Water lilies.jpg")) NIL NIL)
("IMAGE" "JPEG" ("NAME" "Winter.jpg") NIL NIL
"BASE64" 144632 NIL ("INLINE" ("FILENAME" "Winter.jpg")) NIL
NIL) "MIXED" ("BOUNDARY" "------------090206040706060704050905") NIL NIL
NIL))

I would then iterate through those matches and use another regex to
parse the values out of each of them.

The difficult bit is working out how to match the start and end of each
group, which needs more knowledge of what can occur in the file. The
obvious approach that occurs to me is to match ("TEXT" or ("IMAGE",
followed by any sequence of characters that does not contain ("TEXT" or
("IMAGE".
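One way to express "("TEXT" or ("IMAGE", followed by anything that is not the start of another group" is a tempered token: a negative lookahead checked before each character is consumed. A plain character class such as [^x] only excludes single characters, so it cannot exclude a multi-character sequence like ("TEXT". A minimal C# sketch, assuming TEXT and IMAGE are the only media types that occur (others, such as APPLICATION, would need adding to the alternation); the class name is hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GroupSplitter
{
    // ("TEXT" or ("IMAGE", then a "tempered" run: each further character is
    // consumed only if another ("TEXT" or ("IMAGE" does not start there.
    const string GroupPattern =
        @"\(""(?:TEXT|IMAGE)""(?:(?!\(""(?:TEXT|IMAGE)"").)*";

    public static List<string> SplitGroups(string sourceText)
    {
        var groups = new List<string>();
        // Singleline lets '.' cross line breaks if the text has been wrapped
        foreach (Match m in Regex.Matches(sourceText, GroupPattern, RegexOptions.Singleline))
        {
            groups.Add(m.Value);
        }
        return groups;
    }
}
```

Each match runs up to, but not including, the next ("TEXT" or ("IMAGE", so trailing multipart data such as "ALTERNATIVE" ends up attached to the preceding group; that is harmless when only the first six arguments are needed.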

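So, and this is air code, you want something along the lines of:

using System.Collections.Generic;
using System.Text.RegularExpressions;

class Groups
{
    // One BodyPart per ("TEXT" ... or ("IMAGE" ... group in the source text
    public List<BodyPart> GroupsCollection = new List<BodyPart>();

    // ("TEXT" or ("IMAGE", then everything up to the next ("TEXT" or ("IMAGE"
    const string GROUP_PATTERN =
        @"\(""(?:TEXT|IMAGE)""(?:(?!\(""(?:TEXT|IMAGE)"").)*";

    public Groups(string sourceText)
    {
        foreach (Match m in Regex.Matches(sourceText, GROUP_PATTERN, RegexOptions.Singleline))
        {
            this.GroupsCollection.Add(new BodyPart(m.Value));
        }
    }
}

// Named BodyPart rather than Group to avoid confusion with
// System.Text.RegularExpressions.Group
class BodyPart
{
    public string Arg1;
    public string Arg2;
    public string Arg3;
    public string Arg6;

    // A flat parenthesized list, a quoted string, NIL or a number; the list
    // must be one token so that argument 3 counts as a single match
    const string PARAM_PATTERN = @"\([^()]*\)|""[^""]*""|NIL|\d+";

    public BodyPart(string groupText)
    {
        MatchCollection matches = Regex.Matches(groupText, PARAM_PATTERN);
        this.Arg1 = matches[0].Value;
        this.Arg2 = matches[1].Value;
        this.Arg3 = matches[2].Value;
        this.Arg6 = matches[5].Value; // argument 6 is the sixth match
    }
}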

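GROUP_PATTERN needs to be something along the lines of "x[^x]*" where x
matches the start of a group; since [^x] only excludes single characters,
a multi-character x such as ("TEXT" needs a negative lookahead instead,
e.g. x(?:(?!x).)*. PARAM_PATTERN needs to match quoted text, the string
"NIL" or a number, and it must also match a parenthesized list as a single
token, otherwise the list in argument 3 is counted as several arguments.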
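To see why PARAM_PATTERN has to treat a parenthesized list as one token: with a pattern that matches only quoted text or NIL, each quoted string inside argument 3 becomes its own match, so every later argument shifts. A small hypothetical C# comparison (the class and pattern names are illustrative; the list alternative \([^()]*\) assumes argument 3 contains no nested parentheses, which holds for the sample data):

```csharp
using System.Text.RegularExpressions;

static class ParamPatternDemo
{
    // Quoted text or NIL only: a parameter list contributes one match per
    // quoted string inside it, shifting every later argument.
    public const string NaivePattern = @"""[^""]*""|NIL";

    // A flat parenthesized list is matched as a single token, so argument 3
    // stays one match; numbers are matched as tokens too.
    public const string ListAwarePattern = @"\([^()]*\)|""[^""]*""|NIL|\d+";

    // Returns the sixth match (argument 6), or null if there are fewer.
    public static string SixthArgument(string groupText, string pattern)
    {
        MatchCollection matches = Regex.Matches(groupText, pattern);
        return matches.Count > 5 ? matches[5].Value : null;
    }
}
```

With the TEXT/PLAIN group from the question, the naive pattern returns "flowed" (quotes included) as argument 6, while the list-aware pattern returns "7BIT".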

That's how I'd do it, anyway.

--
Steve Walker
Nov 17 '05 #4
