Regular Expressions Help

Hi,

I have this single line of text and need to extract data from it:

* 15 FETCH (FLAGS (\Seen) BODYSTRUCTURE ((("TEXT" "PLAIN" ("CHARSET"
"ISO-8859-1" "FORMAT" "flowed") NIL NIL "7BIT" 44 1 NIL NIL NIL
NIL)("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL "7BIT" 330 10 NIL
NIL NIL NIL) "ALTERNATIVE" ("BOUNDARY"
"------------040101030402070003040902") NIL NIL NIL)("IMAGE" "JPEG"
("NAME" "Blue hills.jpg") NIL NIL "BASE64" 39084 NIL ("INLINE"
("FILENAME" "Blue hills.jpg")) NIL NIL)("IMAGE" "JPEG" ("NAME"
"Sunset.jpg") NIL NIL "BASE64" 97556 NIL ("INLINE" ("FILENAME"
"Sunset.jpg")) NIL NIL)("IMAGE" "JPEG" ("NAME" "Water lilies.jpg") NIL
NIL "BASE64" 114830 NIL ("INLINE" ("FILENAME" "Water lilies.jpg")) NIL
NIL)("IMAGE" "JPEG" ("NAME" "Winter.jpg") NIL NIL "BASE64" 144632 NIL
("INLINE" ("FILENAME" "Winter.jpg")) NIL NIL) "MIXED" ("BOUNDARY"
"------------090206040706060704050905") NIL NIL NIL))

This line can be divided into similar parts:

* 15 FETCH (FLAGS (\Seen) BODYSTRUCTURE
(
(
("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1" "FORMAT" "flowed") NIL NIL
"7BIT" 44 1 NIL NIL NIL
NIL)
("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL
"7BIT" 330 10 NIL NIL NIL
NIL) "ALTERNATIVE" ("BOUNDARY" "------------040101030402070003040902")
NIL NIL NIL)
("IMAGE" "JPEG" ("NAME" "Blue hills.jpg") NIL NIL
"BASE64" 39084 NIL ("INLINE" ("FILENAME" "Blue hills.jpg")) NIL
NIL)
("IMAGE" "JPEG" ("NAME" "Sunset.jpg") NIL NIL
"BASE64" 97556 NIL ("INLINE" ("FILENAME" "Sunset.jpg")) NIL
NIL)
("IMAGE" "JPEG" ("NAME" "Water lilies.jpg") NIL NIL
"BASE64" 114830 NIL ("INLINE" ("FILENAME" "Water lilies.jpg")) NIL
NIL)
("IMAGE" "JPEG" ("NAME" "Winter.jpg") NIL NIL
"BASE64" 144632 NIL ("INLINE" ("FILENAME" "Winter.jpg")) NIL
NIL) "MIXED" ("BOUNDARY" "------------090206040706060704050905") NIL
NIL NIL))

Now, from each of these parts, I need to extract some data (arguments):
specifically, arguments 1, 2, 3 and 6.

In the data given, these are 1=TEXT, 2=PLAIN, 3=("CHARSET" "ISO-8859-1"
"FORMAT" "flowed"), and 6=7BIT for the first part;
1=IMAGE, 2=JPEG, 3=("NAME" "Blue hills.jpg"), 6=BASE64 for the first image;
and so on.

Is this possible to do with regex?
Can someone help me out?
Thanks in advance.
Nov 17 '05 #1

I would do two string splits.

string[] lines = yourLine.Split(new string[] { ")(" }, StringSplitOptions.None);

Then, for each element in lines, do another Split(' '), because to
me each line looks like a row of fields delimited by spaces.

You now have a two-dimensional (jagged) array from which you can easily
extract the values. If you want to make it cleaner, delete all the remaining
( and ) characters from the array values.
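
A minimal sketch of the two-split idea, assuming the whole response is held in one string. `SplitSketch` and `SplitTwice` are hypothetical names chosen for illustration; only `String.Split` and `String.Trim` come from .NET.

```csharp
using System;
using System.Collections.Generic;

static class SplitSketch
{
    // Splits the response on ")(" and then splits each chunk on spaces,
    // stripping any leftover parentheses from the individual values.
    public static List<string[]> SplitTwice(string line)
    {
        var rows = new List<string[]>();
        string[] chunks = line.Split(new string[] { ")(" }, StringSplitOptions.None);
        foreach (string chunk in chunks)
        {
            string[] fields = chunk.Split(new char[] { ' ' },
                                          StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < fields.Length; i++)
            {
                fields[i] = fields[i].Trim('(', ')');
            }
            rows.Add(fields);
        }
        return rows;
    }
}
```

One caveat with this approach: values that contain spaces, such as "Blue hills.jpg", are cut into two fields, and nested lists such as argument 3 are flattened, so the position of the later arguments can shift from row to row.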

--
Texeme
http://www.texeme.com
Nov 17 '05 #2
P.S. Someone who knows LISP could probably do it much more easily ;)
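
The LISP remark fits because the response is a nested parenthesized list, so a small recursive reader is enough to turn it into a tree. The following is only a sketch: `SExpr`, `ParseList` and `Parse` are hypothetical names, and it assumes that quoted strings contain no escaped quotes and that the server sends no IMAP literals.

```csharp
using System;
using System.Collections.Generic;

static class SExpr
{
    // Parses one parenthesized list starting at text[pos], which must be '('.
    // Quoted strings and bare atoms (NIL, numbers) become strings;
    // nested lists become List<object>.
    public static List<object> ParseList(string text, ref int pos)
    {
        var items = new List<object>();
        pos++; // skip the opening '('
        while (pos < text.Length && text[pos] != ')')
        {
            char c = text[pos];
            if (char.IsWhiteSpace(c))
            {
                pos++;
            }
            else if (c == '(')
            {
                items.Add(ParseList(text, ref pos));
            }
            else if (c == '"')
            {
                int end = text.IndexOf('"', pos + 1);
                if (end < 0) throw new FormatException("Unterminated quoted string.");
                items.Add(text.Substring(pos + 1, end - pos - 1));
                pos = end + 1;
            }
            else
            {
                // Bare atom: read up to the next whitespace or parenthesis.
                int start = pos;
                while (pos < text.Length && !char.IsWhiteSpace(text[pos])
                       && text[pos] != '(' && text[pos] != ')')
                {
                    pos++;
                }
                items.Add(text.Substring(start, pos - start));
            }
        }
        if (pos >= text.Length) throw new FormatException("Missing closing parenthesis.");
        pos++; // skip the closing ')'
        return items;
    }

    // Convenience wrapper: parses the first list found in the text.
    public static List<object> Parse(string text)
    {
        int pos = text.IndexOf('(');
        if (pos < 0) throw new FormatException("No list found.");
        return ParseList(text, ref pos);
    }
}
```

Once a body part has been parsed this way, arguments 1, 2, 3 and 6 are simply the elements at indexes 0, 1, 2 and 5 of its list, and argument 3 is itself a nested list.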

--
Texeme
http://www.texeme.com
Nov 17 '05 #3
In message <bc*************************@posting.google.com>, Xarky
<be*********@yahoo.com> writes
Hi,

I have this single line of text, and need to extract data from it
Is this possible to do with regex?


Yes.

No doubt a regex guru could write a single monster expression which
would pull all of the values out in a useful way.

I'm not a regex guru, so I'll tell you how I'd approach it. You seem to
have repeating groups, each group containing a set of data you want to
extract. As a first step, I'd work out a regex which matches each of
those. i.e.

("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1" "FORMAT" "flowed") NIL NIL
"7BIT" 44 1 NIL NIL NIL
NIL)

("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL
"7BIT" 330 10 NIL NIL NIL
NIL) "ALTERNATIVE" ("BOUNDARY" "------------040101030402070003040902")
NIL NIL NIL)
("IMAGE" "JPEG" ("NAME" "Blue hills.jpg") NIL NIL
"BASE64" 39084 NIL ("INLINE" ("FILENAME" "Blue hills.jpg")) NIL NIL)
("IMAGE" "JPEG" ("NAME" "Sunset.jpg") NIL NIL
"BASE64" 97556 NIL ("INLINE" ("FILENAME" "Sunset.jpg")) NIL
NIL)
("IMAGE" "JPEG" ("NAME" "Water lilies.jpg") NIL NIL
"BASE64" 114830 NIL ("INLINE" ("FILENAME" "Water lilies.jpg")) NIL NIL)
("IMAGE" "JPEG" ("NAME" "Winter.jpg") NIL NIL
"BASE64" 144632 NIL ("INLINE" ("FILENAME" "Winter.jpg")) NIL
NIL) "MIXED" ("BOUNDARY" "------------090206040706060704050905") NIL NIL
NIL))

I would then iterate through those matches and use another regex to
parse the values out of each of them.

The difficult bit is working out how to match the start and end of each
group, which needs more knowledge of what can occur in the file. The
obvious thing that occurs to me is to match ("TEXT" | ("IMAGE" followed
by any sequence of characters which are not ("TEXT" | ("IMAGE".

So, and this is air code, you want something along the lines of

using System.Collections;
using System.Text.RegularExpressions;

class Groups
{
    ArrayList groupsCollection = new ArrayList();
    // Left empty in this air code: must match one whole group.
    const string GROUP_PATTERN = "";
    public Groups(string sourceText)
    {
        foreach (Match m in Regex.Matches(sourceText, GROUP_PATTERN))
        {
            Group group = new Group(m.Value);
            this.groupsCollection.Add(group);
        }
    }
}
class Group
{
    string arg1;
    string arg2;
    string arg3;
    string arg6;
    // Left empty in this air code: must match one argument.
    const string PARAM_PATTERN = "";
    public Group(string groupText)
    {
        MatchCollection matches = Regex.Matches(
            groupText, PARAM_PATTERN);
        // Each element is a Match; its text is in .Value.
        this.arg1 = matches[0].Value;
        this.arg2 = matches[1].Value;
        this.arg3 = matches[2].Value;
        this.arg6 = matches[5].Value;
    }
}

GROUP_PATTERN needs to be something along the lines of "x[^x]*" where x
matches the start of a group. PARAM_PATTERN needs to match groups of
quoted text or the string "NIL".

That's how I'd do it, anyway.
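
As a rough sketch of what the two patterns might look like for the sample data (not a general BODYSTRUCTURE parser): the group pattern below starts at ("TEXT" or ("IMAGE" and runs up to the next such start, and the parameter pattern also accepts a parenthesized list and a number, since argument 3 is a list rather than a quoted string. `BodyPartRegex` and `Extract` are hypothetical names, and the patterns assume that quoted strings contain no parentheses or escaped quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class BodyPartRegex
{
    // Start of a group, then any characters up to (not including) the next start.
    const string GroupPattern =
        @"\(""(?:TEXT|IMAGE)""(?:(?!\(""(?:TEXT|IMAGE)"").)*";

    // One top-level argument: a parenthesized list (with at most one nested
    // level), a quoted string, NIL, or a number.
    const string ParamPattern =
        @"\((?:[^()]|\([^()]*\))*\)|""[^""]*""|\bNIL\b|\d+";

    // Returns arguments 1, 2, 3 and 6 of every group found in sourceText.
    public static List<string[]> Extract(string sourceText)
    {
        var result = new List<string[]>();
        foreach (Match m in Regex.Matches(sourceText, GroupPattern, RegexOptions.Singleline))
        {
            // Skip the group's own opening parenthesis so that the list
            // alternative only matches nested lists such as argument 3.
            MatchCollection args = Regex.Matches(m.Value.Substring(1), ParamPattern);
            if (args.Count < 6) continue; // not a complete group
            result.Add(new string[]
            {
                args[0].Value, args[1].Value, args[2].Value, args[5].Value
            });
        }
        return result;
    }
}
```

The values come back exactly as matched, so quoted arguments still carry their quotes; calling `Trim('"')` on them would remove those if plain text is wanted.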

--
Steve Walker
Nov 17 '05 #4
