Bytes | Software Development & Data Engineering Community

Extract text with C# and RegExp

Hello,
I have some HTML text with custom tags that look like HTML comments,
such as:

"text text text <p>text</p> text test test
text text text <p>text</p> text test test
<!-- @MyTag@ -->extract this<!-- /@MyTag@ -->
text text text <p>text</p> text test test
<!-- @MyTag@ -->and this<!-- /@MyTag@ -->
text text text <p>text</p> text test test"

My regexp should extract the first part of the text up to the first
opening tag (<!-- @MyTag@ -->), then the text between the tags ("extract
this" and "and this"). Finding the right pattern has given me a headache.
Any help? Thanks!

Alberto

Jul 18 '07 #1
* Alberto Sartori wrote, On 18-7-2007 15:10:
[quoted question snipped]

<!-- @(?<tagname>\w+)@ -->(?<content>.*?)<!-- /@\k<tagname>@ -->

should do the trick.

<!-- @            looks for the beginning of an opening tag
(?<tagname>\w+)   looks for the name of the tag and captures it
@ -->             matches the end of the opening tag

(?<content>.*?)   captures the contents of the tag (lazily, so it stops
                  at the first matching end tag)

<!-- /@           looks for the beginning of an end tag
\k<tagname>       ensures it's the same tag name as the one captured before
@ -->             matches the end of the end tag

The tagname is captured in a group named 'tagname' and the content of
the tag in a group named 'content'.
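
Since the sample text contains more than one tag pair, here is a minimal sketch of how the pattern might be run over the whole input with Regex.Matches. The names TagExtractor and ExtractContents are made up for illustration, and RegexOptions.Singleline is an added assumption (it lets the content span several lines); neither comes from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // The pattern from the post, written as a verbatim string
    // so the backslashes need no escaping.
    public const string Pattern =
        @"<!-- @(?<tagname>\w+)@ -->(?<content>.*?)<!-- /@\k<tagname>@ -->";

    // Hypothetical helper: returns the content of every tag pair, in order.
    public static List<string> ExtractContents(string html)
    {
        var contents = new List<string>();

        // Assumption: Singleline makes '.' match newlines too,
        // in case the tagged content spans more than one line.
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.Singleline))
        {
            contents.Add(m.Groups["content"].Value);
        }
        return contents;
    }

    public static void Main()
    {
        string html =
            "text text text <p>text</p> text test test\n" +
            "<!-- @MyTag@ -->extract this<!-- /@MyTag@ -->\n" +
            "text text text <p>text</p> text test test\n" +
            "<!-- @MyTag@ -->and this<!-- /@MyTag@ -->\n";

        foreach (string content in ExtractContents(html))
        {
            Console.WriteLine(content);
        }
    }
}
```

With the sample input this should list "extract this" and "and this", one per line.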

Once you've gotten a match in your text you can reference the contents
like this:

    Match m = Regex.Match(...);
    if (m.Success)
    {
        string tagname = m.Groups["tagname"].Value;
        string content = m.Groups["content"].Value;
    }

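
The question also asked for the text that comes before the first opening tag. One possible way to get it, sketched here under assumptions, is to take the substring up to Match.Index of the first match. The names LeadingText and BeforeFirstTag are hypothetical, and returning the whole input when nothing matches is an assumed fallback, not something the original post specifies.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingText
{
    // Same pattern as in the post, as a verbatim string.
    private const string Pattern =
        @"<!-- @(?<tagname>\w+)@ -->(?<content>.*?)<!-- /@\k<tagname>@ -->";

    // Hypothetical helper: returns everything before the first complete
    // tag pair, or the whole input if no pair is found (assumed fallback).
    public static string BeforeFirstTag(string html)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.Singleline);

        // m.Index is the position where the first opening tag starts.
        return m.Success ? html.Substring(0, m.Index) : html;
    }

    public static void Main()
    {
        string html =
            "intro text\n" +
            "<!-- @MyTag@ -->extract this<!-- /@MyTag@ -->\n" +
            "more text";

        Console.WriteLine(BeforeFirstTag(html));
    }
}
```

Note that this treats only a complete, correctly closed pair as a tag; an opening tag without its matching end tag would be left in the leading text.
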
Kind regards,

Jesse
Jul 18 '07 #2
