Bytes | Software Development & Data Engineering Community
Shelton/C# should be able to match my HTM_TXT.EXE .


Hi Tom, You showed: <<
private const string PHONE_LIST =
"495.1000__424.1111___(206)564-5555_1.800.325.3333";

static void Main( string[] args ) {
foreach (string phoneNumber in Regex.Split (PHONE_LIST, "_+")) {
Console.WriteLine (phoneNumber); } }

Output:
495.1000
424.1111
(206)564-5555
1.800.325.3333 >>

Thanks Tom, that's very interesting,
but not enough to switch me away from LoopTo(),
RegEx simply isn't as flexible.

#define LoopTo( StopCond ) \
while ( Ch && ( Ch = ( uchar ) * ++ P ) \
&& ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )

It's a very simple matter to convert HTML to plain text,
following these rules:

These are valid HTML tags: &#x20; <! Comment --> <Alpha> </Alpha>
But, due to the leading space, < Alpha> is not.
Things like &Unknown are sent through untranslated, for obvious reasons.

Pass HTM_TXT.EXE a .HTML file and it spits out a .TXT file.
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

If RegEx is as powerful as you say,
you should be able to produce something that works at least as well,
and which is just as readable, or more, to me.

Jul 21 '05

Hi Tom_Shelton, You showed me five things,

1. You can do a simple RegEx_Search_and_Replace.
2. Like Kelsey, You can't remove lines with just tags.
3. Like Kelsey, You can't handle cases like &#x20; and &#o40;
4. Like Kelsey, You couldn't do a decent job of translating my home page.
( to say nothing of translating HTML_Only e-mails )
5. Unlike Kelsey, You can't ftp my page.

Jul 21 '05 #11
Jeff_Relf wrote:
Hi Tom_Shelton, You showed me five things,

1. You can do a simple RegEx_Search_and_Replace.
2. Like Kelsey, You can't remove lines with just tags.
3. Like Kelsey, You can't handle cases like &#x20; and &#o40;
4. Like Kelsey, You couldn't do a decent job of translating my home page.
( to say nothing of translating HTML_Only e-mails )
5. Unlike Kelsey, You can't ftp my page.

http://www.codeproject.com/asp/removehtml.asp
Removing HTML from the text in ASP
Jul 21 '05 #12

Hi Mogul ( Bailo and Tom_Shelton ), Re:
http://www.developer.com/net/csharp/...0918_2230091_1
<< The HTML parser consists of the following four classes... >>

You told me: << Bottom line:
I can take those classes and manipulate the HTML
-- live from the Web -- to do almost anything. >>

Reeeally... Prove it, translate this
( Download it by: View_Source --> File --> Save_Page_As ):
http://www.Cotse.NET/users/jeffrelf/index.htm
to produce something as good as this:
http://www.Cotse.NET/users/jeffrelf/index.TXT

Here's a Much tougher test file for you, AA.HTM:
http://www.Cotse.NET/users/jeffrelf/AA.HTM
Notice how HTM_TXT.EXE removed the lines with just whitespace and tags:
http://www.Cotse.NET/users/jeffrelf/AA.TXT

http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

Jul 21 '05 #13

Hi Mogul ( Bailo and Tom_Shelton, Kelsey, Spooky ), Re:
Shelton being unable to handle chevrons, >, quoted inside tags,
or remove lines with just whitespace and tags,
or translate tags like &#x20; and &#o40;,
....Even though he claims he could do it with one hand tied behind his back,

You showed this link: << Removing HTML from the text in ASP >>
http://www.codeproject.com/asp/removehtml.asp

Reeeally... Prove it then, translate this: AA.HTM:
http://www.Cotse.NET/users/jeffrelf/AA.HTM

Notice how, in my result, AA.TXT,
HTM_TXT.EXE handled chevrons, >, quoted inside tags,
and removed lines with just whitespace and tags:
http://www.Cotse.NET/users/jeffrelf/AA.TXT

http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

And remember, HTM_TXT.CPP simply demos 38 lines of code in X.CPP,
it's not as trivial as Shelton's RegEx_Search_and_Replace
nor is it as complicated as your Big_Ass SourceForge project:
http://www.Cotse.NET/users/jeffrelf/X.EXE
http://www.Cotse.NET/users/jeffrelf/X.CPP
http://www.Cotse.NET/users/jeffrelf/X.VCPROJ

Long live #define, long live LoopTo(), death to C# !

#define LoopTo( StopCond ) \
while ( Ch && ( Ch = ( uchar ) * ++ P ) \
&& ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )

Jul 21 '05 #14
Jeff_Relf wrote:
Hi Mogul ( Bailo and Tom_Shelton, Kelsey, Spooky ), Re:
Shelton being unable to handle chevrons, >, quoted inside tags,
or remove lines with just whitespace and tags,
or translate tags like &#x20; and &#o40;,


Jeff,

Your arguments are pointless.

In the worst case, c# can implement your code structure as is.

In other cases, we're merely showing you models that may achieve what
you are doing more efficiently.
Jul 21 '05 #15

Hi Mogul, Re: Shelton and Kelsey being unable to match my code,

You told me: << Your arguments are pointless.
In the worst case, c# can implement your code structure as is.
In other cases, we're merely showing you models that
may achieve what you are doing more efficiently. >>

C# can Not implement my code as is, because it has no #define,
( nor memmove(), natively ).

Kelsey and Shelton claim to have superior methods, e.g. RegEx, String, STL.
Yet they can't match the 38 lines of X.CPP code I demo in HTM_TXT.CPP

The challenge is simple, given AA.HTM, View_Source --> File --> Save_Page_As
http://www.Cotse.NET/users/jeffrelf/AA.HTM
Produce a result this good or better:
http://www.Cotse.NET/users/jeffrelf/AA.TXT

That means handling <> chevrons quoted inside tags,
removing lines with just whitespace and tags,
and preserving blank lines that didn't have tags, i.e. preserving whitespace.

Jul 21 '05 #16
Jeff_Relf wrote:
C# can Not implement my code as is, because it has no #define,
( nor memmove(), natively ).


http://msdn.microsoft.com/library/de...clrfdefine.asp

C# Programmer's Reference
#define

#define lets you define a symbol, such that, by using the symbol as the
expression passed to the #if directive, the expression will evaluate to
true.

#define symbol

where:

symbol
The name of the symbol to define.
Jul 21 '05 #17
Jeff_Relf wrote:
That means handling <> chevrons quoted inside tags,
removing lines with just whitespace and tags,
and preserving blank lines that didn't have tags, i.e. preserving whitespace.


The bottom line is that any of the other designs are far more flexible
in adding these additional requirements.

Yes, anyone can code a very optimized program to do one very specific thing.

But it's hard to build a program flexible enough to handle more and more
cases and requirements.

In the case of using Regex for this problem, it's far easier to provide
a system that allows passing in a regex string as needs change.

All you ( Relf ) do is say:

"Oh, but c# can't do x".

Then it's shown that yes, it can do x.

Then you say "oh, I meant x(2)". and so on.

The bottom line is:

Q: can c# implement a fast, flexible string parser in seven easy steps,
with far more robustness than whatever it is you're doing?

A: YES!!!!!

Jul 21 '05 #18

Hi Mogul, Re: This link of yours: << C# Programmer's Reference
#define lets you define a symbol, such that,
by using the symbol as the expression passed to the #if directive,
the expression will evaluate to true. >>
http://msdn.microsoft.com/library/de...clrfdefine.asp

Get serious, for once, Bailo: <<
While the compiler does not have a separate preprocessor,
the directives described in this section are processed as if there was one;
these directives are used to aid in conditional compilation.
Unlike C and C++ directives,
you cannot use these directives to create macros. >>
http://msdn.microsoft.com/library/de...clrfdefine.asp

Jul 21 '05 #19
Jeff_Relf wrote:
Hi Mogul, Re: This link of yours: << C# Programmer's Reference
#define lets you define a symbol, such that,
by using the symbol as the expression passed to the #if directive,
the expression will evaluate to true. >>
http://msdn.microsoft.com/library/de...clrfdefine.asp

Get serious, for once, Bailo: <<
While the compiler does not have a separate preprocessor,
the directives described in this section are processed as if there was one;
these directives are used to aid in conditional compilation.
Unlike C and C++ directives,
you cannot use these directives to create macros. >>
http://msdn.microsoft.com/library/de...clrfdefine.asp


http://www.codeproject.com/csharp/prepro.asp

The Code Project - A Macro Preprocessor in C# - C# Programming
This library supplies the same macro substitution facilities as the
C/C++ preprocessor.
Jul 21 '05 #20
