
Shelton/C# should be able to match my HTM_TXT.EXE .


Hi Tom, You showed: <<
private const string PHONE_LIST =
    "495.1000__424.1111___(206)564-5555_1.800.325.3333";

static void Main( string[] args ) {
    foreach (string phoneNumber in Regex.Split (PHONE_LIST, "_+")) {
        Console.WriteLine (phoneNumber);
    }
}

Output:
495.1000
424.1111
(206)564-5555
1.800.325.3333 >>
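
For comparison, the same split can be sketched in C++ with the standard <regex> header. This is only an illustration, not code from anyone in the thread; the function name SplitOnUnderscores is made up here, and edge cases (such as a trailing separator) may not behave exactly like .NET's Regex.Split.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on runs of one or more underscores,
// in the spirit of Regex.Split(text, "_+").
std::vector<std::string> SplitOnUnderscores(const std::string& text) {
  static const std::regex separator("_+");
  // Submatch index -1 asks the iterator for the pieces BETWEEN the matches.
  std::sregex_token_iterator first(text.begin(), text.end(), separator, -1);
  std::sregex_token_iterator last;  // default-constructed end iterator
  return std::vector<std::string>(first, last);
}
```

Called on the PHONE_LIST string above, this should yield the same four numbers shown in the output.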

Thanks Tom, that's very interesting,
but not enough to switch me away from LoopTo(),
RegEx simply isn't as flexible.

#define LoopTo( StopCond ) \
while ( Ch && ( Ch = ( uchar ) * ++ P ) \
&& ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )
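
To make the macro easier to follow, here is a small sketch of how LoopTo might be used. The macro is copied from the post; the typedef, the two helper functions and their names are assumptions added for illustration and are not taken from HTM_TXT.CPP. The macro expects Ch, Ch2 and P to be in scope, with Ch holding the current (nonzero) character at P.

```cpp
#include <string>

typedef unsigned char uchar;

#define LoopTo( StopCond ) \
  while ( Ch && ( Ch = ( uchar ) * ++ P ) \
    && ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )

// Hypothetical helper: collect the characters after *P up to, but not
// including, the next '>' (or the end of the string).
std::string UpToTagEnd(const char* P) {
  uchar Ch = (uchar) *P, Ch2 = 0;
  std::string seen;
  LoopTo( Ch == '>' ) seen += (char) Ch;
  (void) Ch2;  // the macro assigns Ch2 even when StopCond ignores it
  return seen;
}

// Hypothetical helper: same idea, but the stop condition uses the one-character
// lookahead Ch2, stopping at a '-' that is directly followed by '>'.
std::string UpToDashGt(const char* P) {
  uchar Ch = (uchar) *P, Ch2 = 0;
  std::string seen;
  LoopTo( Ch == '-' && Ch2 == '>' ) seen += (char) Ch;
  return seen;
}
```

Because the macro pre-increments P, the character P starts on is never passed to the loop body; UpToTagEnd("<Alpha>rest") therefore collects "Alpha".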

It's a very simple matter to convert HTML to plain text,
following these rules:

These are valid HTML entities and tags: &#x20; <! Comment --> <Alpha> </Alpha>
But, due to the leading space, < Alpha> is not.
Things like &Unknown are sent through untranslated, for obvious reasons.
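
The rules above can be sketched roughly as follows. This is not the code in HTM_TXT.CPP; the function names, the tiny entity table, and the choice to leave an unterminated '<' in the output are all assumptions made for the sake of a short example. Numeric entities such as &#x20; are not handled here.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Assumed rule: '<' opens a tag only when immediately followed by a letter,
// '/' or '!'. So "<Alpha>" is a tag, "< Alpha>" is plain text.
static bool StartsTag(const std::string& s, std::size_t i) {
  if (i + 1 >= s.size()) return false;
  unsigned char next = static_cast<unsigned char>(s[i + 1]);
  return std::isalpha(next) || next == '/' || next == '!';
}

// Hypothetical converter: drop tags, translate a few known entities,
// and pass everything else (including unknown entities) through untouched.
std::string StripTags(const std::string& s) {
  static const std::map<std::string, char> known = {
      {"amp", '&'}, {"lt", '<'}, {"gt", '>'}, {"quot", '"'}};
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == '<' && StartsTag(s, i)) {
      std::size_t close = s.find('>', i);  // the search may cross line breaks
      if (close != std::string::npos) { i = close; continue; }
    } else if (s[i] == '&') {
      std::size_t semi = s.find(';', i);
      if (semi != std::string::npos) {
        auto it = known.find(s.substr(i + 1, semi - i - 1));
        if (it != known.end()) { out += it->second; i = semi; continue; }
      }
    }
    out += s[i];  // ordinary text, "< Alpha>", or an unknown "&Name;"
  }
  return out;
}
```

Whitespace, including newlines, is copied through unchanged, and a tag that spans several lines is skipped as a unit because the search for '>' does not stop at a line break.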

Pass HTM_TXT.EXE a .HTML file and it spits out a .TXT file.
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

If RegEx is as powerful as you say,
you should be able to produce something that works at least as well,
and which is just as readable, or more, to me.

Jul 21 '05

Hi Stefan_Simek,

Your HTM_TXT.EXE preserves whitespace reasonably well,
as your Index.TXT was exactly the same, byte for byte, as mine.
As I tried to say before, my code does not look for the <pre> tag,
it just always preserves whitespace.

Re: using (StreamWriter sw = new StreamWriter(args[1], false
, Encoding.GetEncoding(1250))) sw.Write(output);

Well done !

Re: This .NET 1.1 stuff:
http://www.kascomp.sk/tmp/htm_txt.cs
http://www.kascomp.sk/tmp/htm_txt.exe
http://www.kascomp.sk/tmp/build.bat

Now that's much more like it, well done,
I liked that much more than the .ZIP you showed before.

Re: Your .NET framework 2.0.5 ( beta2 ) code with anonymous delegates,

That was very interesting, but it complicated the installation.

Re: System.Web.HttpUtility.HtmlDecode( m.Value );,

You wrote: << I see no reason for writing my own entity parser
as long as there's one provided by the framework. >>

You have a point there, but I like playing with the lower_level stuff.
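
As an illustration of the lower-level approach, a hand-written decoder for numeric entities such as &#x20; might look like the sketch below. It is an assumption-laden example, not code from X.CPP or from the .NET framework: the function names are invented, and only 7-bit ASCII code points are decoded, with everything else copied through unchanged.

```cpp
#include <cstddef>
#include <string>

// Value of digit `c` in the given base (10 or 16), or -1 if it is not one.
static int DigitValue(char c, int base) {
  int v = -1;
  if (c >= '0' && c <= '9') v = c - '0';
  else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
  else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
  return v < base ? v : -1;
}

// Hypothetical decoder for "&#NN;" and "&#xHH;" references in the ASCII range.
std::string DecodeNumericEntities(const std::string& s) {
  std::string out;
  std::size_t i = 0;
  while (i < s.size()) {
    if (s.compare(i, 2, "&#") == 0) {
      std::size_t j = i + 2;
      int base = 10;
      if (j < s.size() && (s[j] == 'x' || s[j] == 'X')) { base = 16; ++j; }
      const std::size_t first = j;
      long code = 0;
      int d = -1;
      // Stop accumulating once the value leaves the ASCII range.
      while (j < s.size() && code < 128 && (d = DigitValue(s[j], base)) >= 0) {
        code = code * base + d;
        ++j;
      }
      if (j > first && j < s.size() && s[j] == ';' && code > 0 && code < 128) {
        out += static_cast<char>(code);
        i = j + 1;
        continue;
      }
    }
    out += s[i++];  // not a reference this sketch understands: copy as-is
  }
  return out;
}
```

A framework routine such as HtmlDecode covers far more cases (named entities, non-ASCII code points); the sketch only shows the kind of scanning involved.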

You wrote: << Sure it's not like C. It's been a few years since 1978,
and the ways of programming have evolved by now... >>

C# is a mutation, and a rather recent one at that,
it doesn't meet my needs... no #define.

Jul 21 '05 #51
Jeff_Relf wrote:
Hi Stefan_Simek,

Your HTM_TXT.EXE preserves whitespace reasonably well,

Am I missing something here?

As a test I saved this page:

http://www.howstuffworks.com/search.php

as a text file.

I compiled htm_txt.cs in mono and ran it against search.php.html, it
produced a text file search.php.txt This output file has only one line
in it -- what appears to be the title.

Both the html source and txt output are attached.



HowStuffWorks - Search
Jul 21 '05 #52

Hi John, Re: Your attempt to convert
http://www.howstuffworks.com/search.php

You can't just do a Save_Page_As, you Must do a View_Source first.

This is what the .HTM file should look like:
http://www.Cotse.NET/users/jeffrelf/BB.HTM
and my HTM_TXT.EXE translates it to this:
http://www.Cotse.NET/users/jeffrelf/BB.TXT

Notice that my HTM_TXT is faithful to the raw whitespace and the <BR> tag,
leaving/printing as many as were called for,
while Stefan_Simek's HTM_TXT won't allow more than two blank lines in a row.
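
The difference being described can be illustrated with a small sketch of a blank-line limiter. This is not Stefan_Simek's code, just a guess at the kind of post-processing that would cap runs of blank lines; the function name and the maxBlank parameter are invented for the example. A converter that is "faithful to the raw whitespace" simply skips this step.

```cpp
#include <string>

// Hypothetical filter: copy `text`, keeping at most `maxBlank` consecutive
// empty lines. Only '\n' line ends and truly empty lines are considered.
std::string LimitBlankLines(const std::string& text, int maxBlank) {
  std::string out;
  int newlines = 1;  // treat the start of the text as following a line end
  for (char c : text) {
    if (c == '\n') {
      // One newline ends the current line; each further one adds a blank line.
      if (++newlines > maxBlank + 1) continue;  // drop the surplus
    } else {
      newlines = 0;
    }
    out += c;
  }
  return out;
}
```

With maxBlank set to 2, a run of four blank lines between two paragraphs comes out as two, while shorter runs are left alone.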

But I had to radically modify my HTM_TXT to properly handle multilined tags:
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

And, because HTM_TXT is merely a demo of code from X.CPP
( my custom e-mail client and newsreader ) these files were also affected:
http://www.Cotse.NET/users/jeffrelf/X.EXE
http://www.Cotse.NET/users/jeffrelf/X.CPP
http://www.Cotse.NET/users/jeffrelf/X.VCPROJ

Jul 21 '05 #53

By the way John, Re:
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.kascomp.sk/tmp/htm_txt.cs

Both HTM_TXT.CPP and htm_txt.cs could be called obfuscated.
In fact, I consider my own code to be more readable,
( perhaps because I wrote it ).

Any code that does useful work is going to take time to understand.

Jul 21 '05 #54
Jeff_Relf wrote:
Any code that does useful work is going to take time to understand.


Very true...which is why you should not dismiss all .NET code as being
written by "script kiddies".
Jul 21 '05 #55
Jeff_Relf wrote:
But I had to radically modify my HTM_TXT to properly handle multilined tags:

And, because HTM_TXT is merely a demo of code from X.CPP


I see, so you admit that all you do is write sample code...but
completely unextensible.

The c# folks just proved how inflexible c++ code is because you have to
set up a static starting point and then produce a result.

Whereas us c# people have been able to dynamically change our code very
quickly as you move the goal posts around to suit yourself.
Jul 21 '05 #56
In article <2s********************@speakeasy.net>, Journey To The Center of The Earth wrote:
Jeff_Relf wrote:
But I had to radically modify my HTM_TXT to properly handle multilined tags:

And, because HTM_TXT is merely a demo of code from X.CPP


I see, so you admit that all you do is write sample code...but
completely unextensible.

The c# folks just proved how inflexible c++ code is because you have to
set up a static starting point and then produce a result.

Whereas us c# people have been able to dynamically change our code very
quickly as you move the goal posts around to suit yourself.


That's why I gave up... Relf will never be satisfied. My attempt met
every one of the criteria that he laid out in the original post - except
for entity translation, and the only reason I didn't handle that was
because I was unclear on what he wanted to have happen (and I did ask
for clarification, but never did see a response). But, when he saw that
I did it in like 3-4 lines of code - suddenly, several more criteria are
added.

It was the same thing with the phone number parsing... With every post
of an answer he had to change the format.

Relf doesn't want to use C#. That's fine, we all make choices and he
has made his. I have no intention of producing another line of code for
him.

--
Tom Shelton
Jul 21 '05 #57

Hi Tom_Shelton ( and Bellow ),
Re: The Piss_Poor job you did of converting HTML to plain text,

You thought you could match my HTM_TXT.CPP while your dishes dried,
....how naive !

Not even Stefan_Simek could faithfully preserve blank lines
or honor all <br> tags.

Try converting BB.HTM Tom:
http://www.Cotse.NET/users/jeffrelf/BB.HTM
Make it look like this:
http://www.Cotse.NET/users/jeffrelf/BB.TXT

http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

The best Simek could do was
to leave a lot of blank lines where the tags used to be
and then consolidate multiple blank lines into one.

i.e. Simek's htm_txt.cs produces blank lines where I do Not want them,
and omits blank lines where I Do want them.

Does it surprise you that I didn't give you my full specs at first ?
It shouldn't... I didn't want to overwhelm you any more than I already was.

You told Bailo: << That's why I gave up... Relf will never be satisfied.
My attempt met every one of the criteria that he laid out
in the original post - except for entity translation... >>

Unlike Kelsey and Simek, you never figured out how to download an HTML page
( you must do a View_Source before the Save_Page_As ).

You wrote: << It was the same thing with the phone number parsing,
...With every post of an answer he had to change the format. >>

Right... you couldn't understand what my code was doing,
and, therefore, what I required of it.
Nor did you have the patience to have me explain it.
....No surprises there... huh ?

You concluded: << Relf doesn't want to use C#.
That's fine, we all make choices and he has made his.
I have no intention of producing another line of code for him. >>

Using COM and other bloatware is fine in a pinch ( Slo-o-ow ),
but it's not the hallmark of a serious coder.
....You're a Half_Assed coder, Shelton... end of story.

#define is too dangerous for C# kiddies such as you.

#define LOOP while ( 1 )

#define Loop( N ) int J = - 1, LLL = N ; while ( ++ J < LLL )

#define LoopTo( StopCond ) \
while ( Ch && ( Ch = ( uchar ) * ++ P ) \
&& ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )

#define LoopXx( Xx ) Xx##P P = 0, B ; int J = -1 ; \
Xx##A BB = Xx.BB, EE = Xx.PP + 1, PP = BB - 1 ; \
if ( BB ) while ( ++ J, B = P = * ++ PP, PP < EE )
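
For readers unfamiliar with this style, here is a short sketch of how the Loop( N ) macro above behaves. The macro is copied from the post; the two functions and their names are assumptions added for illustration. Note that the macro declares J and LLL in the enclosing scope, so a second Loop in the same scope needs its own braces.

```cpp
#define Loop( N ) int J = - 1, LLL = N ; while ( ++ J < LLL )

// Hypothetical example: J counts 0, 1, ..., n - 1 inside the loop body.
int SumBelow(int n) {
  int total = 0;
  Loop( n ) total += J;
  return total;
}

// Hypothetical example: the inner Loop sits in its own braces, so its
// J and LLL are fresh variables that shadow the outer pair.
int CountCells(int rows, int cols) {
  int count = 0;
  Loop( rows ) { Loop( cols ) ++ count; }
  return count;
}
```

Because the macro expands to a declaration followed by a while statement, writing it directly after an unbraced if would attach only the declaration to the if, which is the sort of hazard the post alludes to.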

Jul 21 '05 #58

Hi Bellow, You told me: <<
I see, so you admit that all you do is write sample code
...but completely unextensible. >>

HTM_TXT.EXE demonstrates 47 lines of code from X.CPP,
and X is used daily by me, as it's the best e-mail client and newsreader
I've ever known... by miles.

And you know it's totally flexible, as you see how I change it all the time.

Jul 21 '05 #59
