Shelton/C# should be able to match my HTM_TXT.EXE .

Hi Tom, You showed: <<
private const string PHONE_LIST =
"495.1000__424.1111___(206)564-5555_1.800.325.3333";

static void Main( string[] args ) {
foreach (string phoneNumber in Regex.Split (PHONE_LIST, "_+")) {
Console.WriteLine (phoneNumber); } }

Output:
495.1000
424.1111
(206)564-5555
1.800.325.3333 >>

Thanks Tom, that's very interesting,
but not enough to switch me away from LoopTo(),
RegEx simply isn't as flexible.

#define LoopTo( StopCond ) \
while ( Ch && ( Ch = ( uchar ) * ++ P ) \
&& ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )

It's a very simple matter to convert HTML to plain text,
following these rules:

These are valid HTML tags: &#x20; <! Comment --> <Alpha> </Alpha>
But, due to the leading space, < Alpha> is not.
Things like &Unknown are sent through untranslated, for obvious reasons.
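
For illustration, the tag rules above can be tried against a small pattern. This is only a sketch and not the logic in HTM_TXT.CPP: the TagRules class and its pattern are made up here, and it assumes a tag always runs to the next '>' (so a comment containing '>' would be cut short).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagRules
{
    // Hypothetical pattern: '<' must be followed at once by a letter, '/' or '!'.
    // A space right after '<' means "not a tag", so "< Alpha>" is left alone.
    private static readonly Regex Tag = new Regex(@"<[A-Za-z/!][^>]*>");

    public static string StripTags(string html)
    {
        return Tag.Replace(html, string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(StripTags("<Alpha>text</Alpha>"));  // tags removed
        Console.WriteLine(StripTags("a<! Comment -->b"));     // comment removed
        Console.WriteLine(StripTags("< Alpha> stays"));       // leading space: kept
    }
}
```

Because [^>] also matches line breaks, a tag split across two lines is removed as one unit.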

Pass HTM_TXT.EXE a .HTML file and it spits out a .TXT file.
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

If RegEx is as powerful as you say,
you should be able to produce something that works at least as well,
and which is just as readable, or more, to me.

Jul 21 '05 #1
58 Replies


In article <Je************************@Cotse.NET>, Jeff_Relf wrote:

> Hi Tom, You showed: <<
> private const string PHONE_LIST =
> "495.1000__424.1111___(206)564-5555_1.800.325.3333";
>
> static void Main( string[] args ) {
> foreach (string phoneNumber in Regex.Split (PHONE_LIST, "_+")) {
> Console.WriteLine (phoneNumber); } }
>
> Output:
> 495.1000
> 424.1111
> (206)564-5555
> 1.800.325.3333 >>
>
> Thanks Tom, that's very interesting,

I'm glad you find it so.

> but not enough to switch me away from LoopTo(),

Fine by me. I'm not trying to get you to switch.

> RegEx simply isn't as flexible.

I have to disagree with that...

> #define LoopTo( StopCond ) \
> while ( Ch && ( Ch = ( uchar ) * ++ P ) \
> && ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )
>
> It's a very simple matter to convert HTML to plain text,
> following these rules:
>
> These are valid HTML tags: &#x20; <! Comment --> <Alpha> </Alpha>
> But, due to the leading space, < Alpha> is not.
> Things like &Unknown are sent through untranslated, for obvious reasons.
>
> Pass HTM_TXT.EXE a .HTML file and it spits out a .TXT file.
> http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
> http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
> http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ
>
> If RegEx is as powerful as you say,
> you should be able to produce something that works at least as well,
> and which is just as readable, or more, to me.

I might take you up on this if I have time...

--
Tom Shelton
Jul 21 '05 #2

On 2005-04-15, Jeff_Relf <Me@Privacy.NET> wrote:

> Hi Tom, You showed: <<

<snip>

> These are valid HTML tags: &#x20; <! Comment --> <Alpha> </Alpha>
> But, due to the leading space, < Alpha> is not.
> Things like &Unknown are sent through untranslated, for obvious reasons.
>
> Pass HTM_TXT.EXE a .HTML file and it spits out a .TXT file.
> http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
> http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
> http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ
>
> If RegEx is as powerful as you say,
> you should be able to produce something that works at least as well,
> and which is just as readable, or more, to me.


Here is the code. I wrote it quickly, since I had a few minutes while the
dishwasher finished. It doesn't handle &#x20; yet. What do you want done
with that: stripped or converted? And is &#x20; the only entity we are
looking for? I'll add that in once I know the answer.

using System;
using System.IO;
using System.Text.RegularExpressions;

public sealed class App
{
    public static void Main(string[] args)
    {
        try
        {
            // TODO: IMPROVE COMMAND LINE PARSING
            if (args.Length > 0)
            {
                // original text buffer
                string htmlText = null;

                // read in the file
                using (StreamReader sr = new StreamReader (args [0]))
                {
                    htmlText = sr.ReadToEnd ();
                }

                // strip the tags: '<' followed at once by '/', a letter,
                // or "! ", then the shortest run up to the next '>'
                // ([a-zA-Z], not [a-zA-z], which would also admit [ \ ] ^ _ `)
                htmlText = Regex.Replace (
                    htmlText,
                    @"<(?:/|[a-zA-Z]+|! +)[\s\S]*?>",
                    string.Empty);

                // write the file...
                using (StreamWriter sw = new StreamWriter (Path.ChangeExtension (args [0], "TXT")))
                {
                    sw.Write (htmlText);
                }
            }
            else
            {
                Console.WriteLine ("No File Name Supplied");
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine (ex.ToString ());
        }
    }
}

Someone more expert in regex might be able to do it better :) I have
never been an expert, and I haven't done it for quite some time - but
there you have it.
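
On the open &#x20; question, one possible reading of the rules is "convert what is recognised, pass the rest through untranslated". The sketch below assumes that reading. It is not taken from HTM_TXT.CPP: the EntityDecoder class, its four-entry table of named entities, and the requirement of a closing ';' are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EntityDecoder
{
    // Assumed, deliberately tiny table; any other name is passed through.
    private static readonly Dictionary<string, string> Named =
        new Dictionary<string, string>
        {
            { "amp", "&" }, { "lt", "<" }, { "gt", ">" }, { "quot", "\"" }
        };

    // Matches &#xHH; (hex), &#DDD; (decimal) or &name; with a required ';'.
    private static readonly Regex Entity = new Regex(
        @"&(?:#[xX](?<hex>[0-9A-Fa-f]+)|#(?<dec>[0-9]+)|(?<name>[A-Za-z][A-Za-z0-9]*));");

    public static string Decode(string text)
    {
        return Entity.Replace(text, m =>
        {
            if (m.Groups["name"].Success)
            {
                string value;
                // Unknown names such as &Unknown; are returned unchanged.
                return Named.TryGetValue(m.Groups["name"].Value, out value) ? value : m.Value;
            }

            bool isHex = m.Groups["hex"].Success;
            string digits = isHex ? m.Groups["hex"].Value : m.Groups["dec"].Value;
            NumberStyles style = isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;

            int code;
            bool parsed = int.TryParse(digits, style, CultureInfo.InvariantCulture, out code);
            // Only real Unicode scalar values are converted; anything else stays as typed.
            bool valid = parsed && code > 0 && code <= 0x10FFFF && (code < 0xD800 || code > 0xDFFF);
            return valid ? char.ConvertFromUtf32(code) : m.Value;
        });
    }

    public static void Main()
    {
        Console.WriteLine(Decode("a&#x20;b &lt;&lt; quote &gt;&gt; &Unknown;"));
    }
}
```

Decoding would have to run after the tags are stripped; otherwise a decoded &lt; could be mistaken for the start of a tag.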

Here is the modified input (modified to see if it followed the rules):

<Html>< head><title>Jeff Relf's Home Page</title></head><body><pre>
Welcome to my home page: http://www.Cotse.NET/users/jeffrelf/
e-mail: Ho*******@JeffRelf.Cotse.NET

I'm Jeff Relf and I was born in North Seattle at the start of 1960
to a Mormon family consisting of two parents, four brothers and one sister.

My first programming was on HP and TI calculators starting around 1974.
( e.g. in 1976, the HP 67 with it's magnetic cards ).
I broke and lost many of them.
<! Comment -->
I've been programming professionally since the start of 1982,
but I play much more than I work ( like a starving artist, if you will ).
..+.+.+
Here's my favorite pointing device:
<a
href="http://www.cotse.net/users/jeffrelf/MS_Trackball_Explorer.JPG">http://www.Cotse.NET/users/jeffrelf/MS_Trackball_Explorer.JPG</a>
This link shows my ramblings on Usenet ( Google Groups ):
<a href="http://google.com/groups?as_uauthors=Jeff_Relf&amp;scoring=d">http://Google.COM/groups?as_uauthors=Jeff_Relf&amp;scoring=d</a>
This link shows a Picture of me from around the start of 2003:
<a href="http://www.cotse.net/users/jeffrelf/Jeff_Relf.JPG">http://www.Cotse.NET/users/jeffrelf/Jeff_Relf.JPG</a>
Here's my main man:
<a href="http://www.cotse.net/users/jeffrelf/Bill.JPG">http://www.Cotse.NET/users/jeffrelf/Bill.JPG</a>
Here's Einstein, another guy I like:
<a href="http://www.loyno.edu/%7Ebrans/einstein.jpg">http://www.loyno.edu/~brans/einstein.jpg</a>
This link shows my daughter, Jenny, at her wedding ( Mid 2003 ):
<a href="http://www.cotse.net/users/jeffrelf/Jenny.JPG">http://www.Cotse.NET/users/jeffrelf/Jenny.JPG</a>
Here's my son, Jeff, at the Moab MUni Fest, summer 2001:
<a href="http://www.cotse.net/users/jeffrelf/JR.JPG">http://www.Cotse.NET/users/jeffrelf/JR.JPG</a>
Here's my ex, Janeen, at the Moab MUni Fest, summer 2001:
<a href="http://www.cotse.net/users/jeffrelf/Janeen.JPG">http://www.Cotse.NET/users/jeffrelf/Janeen.JPG</a>
..+.+.+
Here are my style sheet overrides for Moz FireFox 1.0,
userChrome.CSS and userContent.CSS:
<a href="http://www.cotse.net/users/jeffrelf/userContent.CSS">http://www.Cotse.NET/users/jeffrelf/userContent.CSS</a>
<a href="http://www.cotse.net/users/jeffrelf/userChrome.CSS">http://www.Cotse.NET/users/jeffrelf/userChrome.CSS</a>
They make pages look like this:
<a href="http://www.cotse.net/users/jeffrelf/CSS.PNG">http://www.Cotse.NET/users/jeffrelf/CSS.PNG</a>
..+.+.+
Here is my HTML to TXT converter:
<a href="http://www.cotse.net/users/jeffrelf/HTM_TXT.EXE">http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE</a>
<a href="http://www.cotse.net/users/jeffrelf/HTM_TXT.CPP">http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP</a>
<a href="http://www.cotse.net/users/jeffrelf/HTM_TXT.VCPROJ">http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ</a>
If, for example, you pass it a file named index.HTM ,
it creates a file called index.TXT .
..+.+.+
Here is my CPU/hard_disk benchmarker:
<a href="http://www.cotse.net/users/jeffrelf/Tom.EXE">http://www.Cotse.NET/users/jeffrelf/Tom.EXE</a>
<a href="http://www.cotse.net/users/jeffrelf/Tom.CPP">http://www.Cotse.NET/users/jeffrelf/Tom.CPP</a>
<a href="http://www.cotse.net/users/jeffrelf/Tom.VCPROJ">http://www.Cotse.NET/users/jeffrelf/Tom.VCPROJ</a>
( Tom is someone I met in Comp.OS.Linux.Advocacy )
For example, if you run Tom.EXE, you'll see that a 1.7 GHz Vaio notebook,
with a Centrino and 2 megs of L2 cache, is much faster than my
cheap Compaq Presario desktop with 2.7 GHz,
256 RAM, 128K L2 cache, 8 megs of actual VRAM ( 64 by emulation )
6 USB ports, at least 6 audio jacks, and a 40 gig hard disk.
( It cost 229 USD after rebates... a close out )

A lower clock rate is very desirable for notebooks
as it has little effect on actual performacne
and the power consumed rises with the square of the clock rate.

Tom.EXE can test both the the speed of your CPU/cache while task-switching
as well as the speed of your hard disk while swaping out to virtual memory.
You run it in a DOS shell,
passing it how many threads you want it to simultaneously hold.
If you don't pass a parameter it will create as many simultaneous threads
as possible, thereby invoking all of your virtual memory
( making the swap file very active for about 3-7 minutes ).

These are the numbers I get on my box:
&gt;&gt;&gt; Tom 2000
2,000 Simultaneous threads acheived.
58.7 Seconds per 100,000 simultaneously spawned/completed.
&gt;&gt;&gt; Tom
14,032 Simultaneous threads acheived.
17.3 Minutes per 100,000 simultaneously spawned/completed.

The 100,000 simultaneous threads standard refers to
a conversation Ingo Molnar had with Linus Torvalds
( see news:Je************************@Cotse.NET ).
..+.+.+
Here's X, my hand-rolled newsreader, Dialer, SMTP-AUTH and POP3 client:
X's settings are here:
<a href="http://www.cotse.net/users/jeffrelf/X.TXT">http://www.Cotse.NET/users/jeffrelf/X.TXT</a>
The .EXE file ( for MS_Win_XP ):
<a href="http://www.cotse.net/users/jeffrelf/X.EXE">http://www.Cotse.NET/users/jeffrelf/X.EXE</a>
The .CPP file ( MS_VC_7.1 ):
<a href="http://www.cotse.net/users/jeffrelf/X.CPP">http://www.Cotse.NET/users/jeffrelf/X.CPP</a>
The next two files are optional.
This contains my project settings ( VS.NET 2003, Microsoft C++ 7.1 )
( you can edit it with notepad to change the directories used ):
<a href="http://www.cotse.net/users/jeffrelf/X.VCPROJ">http://www.Cotse.NET/users/jeffrelf/X.VCPROJ</a>
This is just X's icon:
<a href="http://www.cotse.net/users/jeffrelf/X.RES">http://www.Cotse.NET/users/jeffrelf/X.RES</a>

I use X a lot, it's the only newsreader I use.
I never have to screw around with views or sorting
because my X ( in VS.NET ) allows me to slickly/quickly search
the full text of a newsgroup in one file.

Cola.TXT ( Comp.OS.Linux.Advocacy ) is a 3.8 MB example of that:
<a href="http://www.cotse.net/users/jeffrelf/Cola.TXT">http://www.Cotse.NET/users/jeffrelf/Cola.TXT</a>
This is what Cola.TXT looks like when browsing it in VS.NET:
<a href="http://www.cotse.net/users/jeffrelf/Cola_TXT.PNG">http://www.Cotse.NET/users/jeffrelf/Cola_TXT.PNG</a>
X is using Bit_Stream.TTF, a good monospaced font ( free ):
<a href="http://www.cotse.net/users/jeffrelf/Bit_Stream.TTF">http://www.Cotse.NET/users/jeffrelf/Bit_Stream.TTF</a>

X automatically excludes second-level quoting ( i.e. the " &gt; &gt; " )
HTML, PGP sigs, and other spam.

I have that newsgroup set up tp retain 2,000 unread articles,
800 read articles that were from/to me
( up to 5 levels away, per the Referencs header ),
and 200 read ( a.k.a. " Deleted " ) articles not To/From me.

A VBA macro assigned to F1 ( and a toolbar icon )
in Visual studio.NET appears to delete an article
but actually moves it to the end of the file
and prefixes it with the word " Deleted ".
X's VBA macros for VS.NET 2003:
<a href="http://www.cotse.net/users/jeffrelf/X.VB">http://www.Cotse.NET/users/jeffrelf/X.VB</a>

Then, when X.EXE updates Cola.TXT the articles are sorted like this:
1. To/From Me Unread. Most recent First ( as found on the server ).
2. Unread not To/From Me. Most recent First.
3. Deleted not To/From Me. Most recent Last.
4. To/From Me Deleted. Most recent Last.

Although I plan on adding an editor to X,
it currently works best with VS.NET .
In VS, proper updating/viewing of In.TXT and Cola.TXT
requires that these two boxes be checked
in VS's Options --&gt; Environment --&gt; Documents:
1. Detect when file is changed outside the environment.
2. Auto-load changes
( if not not currently modified inside the environment ).

X's update to Cola.TXT is an element in VS' undo/redo list,
so you have to be careful when undoing things.
The update produces a report like this in X:
1 FromMe, 5 ToMe, 55 New.
1,987 UnRead, 500 Read, 500 Read FromTos.

The 5 most recent MIDs from the References header ( youngest first )
are tagged ( e.g. AAE__ ) for quicker/slicker searching in VS.NET:
One click on one of my toolbar's up/down arrows
finds what's under the text cursor ( repeatedly so, of course ).
Another pair of arrows only repeats the last search, up or down.

An article from a newsgroup begins something like this:
.+.+.+
Comp.OS.Linux.Advocacy
AA**********************@newsread3.news.pas.earthlink.net
AAW__cZlsd.509206$D%.175553@attbi_s51
AA*************************@comcast.com
ABH__Z%isd.508650$D%.378252@attbi_s51 .
Re: Firefox stability
John_Bailo ( 4, 11.39 A, rVH4W _BP7FiP D Earthlink.NET, KNode_0_7_7 ),
This is the second line of the article's body.

X.CPP also handles SMTP-AUTH ( which works from anyone's LAN ).
Just put what you want to e-mail in the cut buffer
( the Del_Art VBA script, assigned to a toolbar icon,
does that automatically for me ).
The format of the e-mail must be like this:
.+.+.+
Jack Black &lt;sd*****@yahoo.com&gt;
Jane Peters &lt;df****@yahoo.com&gt;
Albert &lt;df****@yahoo.com&gt; .
Re: This is the title.
Hi Jack ( and Jane, Albert ), Blah blah.
.+.+.+

When sending e-mails to multiple people,
the second address must be indeneted with exactly two spaces
and the last addess must end with a Space_Space_Dot .

Hit E-Mail in X's toolbar to send it.
Hitting In checks your POP3 account for e-mail,
appending your e-mails to the end of In.TXT .

Attachements are automatically placed in the shortcut's directory.
X will not send attachments
( because I prefer to use links to my web site instead ).
..+.+.+
This link shows a .CPP to .HTML converter with syntax highlighting,
I didn't wrap it in a main() because I call it from WinMain()
( too complex to show here ):
<a href="http://google.com/groups?selm=_Jeff_Relf_2004_Jun_24_ll6m%40NCPlus.NET">http://Google.COM/groups?selm=_Jeff_Relf_2004_Jun_24_ll6m%40NCPlus.NET</a>
..+.+.+
Here's my Games.EXE program:
<a href="http://www.cotse.net/users/jeffrelf/Games.EXE">http://www.Cotse.NET/users/jeffrelf/Games.EXE</a>
There's no help file.
It makes heavy use of my trackball ( wheel, 5 buttons ).
e.g. For quick exiting,
One of my buttons is set to Cntrl-F4 ( Close window ).
Games.EXE plays a kind of Random_Chess, Rapid Monopoly,
Random_Mario_Brothers, Reversie, and Solitaire.
The chess game is always one player ( white ),
and it's designed to be very very mindless.
Pieces can be quickly added or removed.
By default you have no king... so you can't lose.
Source code:
<a href="http://www.cotse.net/users/jeffrelf/Games.CPP">http://www.Cotse.NET/users/jeffrelf/Games.CPP</a>
Bitmaps:
<a href="http://www.cotse.net/users/jeffrelf/Games.RES">http://www.Cotse.NET/users/jeffrelf/Games.RES</a>
Project file ( VS.NET 2003, Microsoft C++ 7.1 ):
<a href="http://www.cotse.net/users/jeffrelf/Games.VCPROJ">http://www.Cotse.NET/users/jeffrelf/Games.VCPROJ</a>
..+.+.+
Here is my Dif.EXE program:
<a href="http://www.cotse.net/users/jeffrelf/Dif.EXE">http://www.Cotse.NET/users/jeffrelf/Dif.EXE</a>
It compares two plain-text files via a shortcut ( .LNK ), e.g.:
Dif.EXE AA.CPP BB.CPP
Hit escape or Cntrl-F4 to exit. Use the wheel to move lines.
Hold down the middle mouse button as you move the wheel
to page up and down.
The screen's output is sent to
a file called AA.TXT in the starting directory.
Source code:
<a href="http://www.cotse.net/users/jeffrelf/Dif.CPP">http://www.Cotse.NET/users/jeffrelf/Dif.CPP</a>
Project file ( VS.NET ):
<a href="http://www.cotse.net/users/jeffrelf/Dif.VCPROJ">http://www.Cotse.NET/users/jeffrelf/Dif.VCPROJ</a>
..+.+.+
Here are some quotes from Einstein: &lt;&lt;

People like us, who believe in physics,
know that the distinction between past, present,
and future is only a stubbornly persistent illusion. &gt;&gt;, &lt;&lt;

every true theorist is a kind of tamed metaphysicist,
no matter how pure a ' positivist '
he may fancy himself.
[ ... He believes in ] a conceptual system built on
premises of great simplicity. &gt;&gt;, &lt;&lt;

But the scientist is possessed by
the sense of universal causation. The future, to him,
is every whit as necessary and determined as the past. &gt;&gt;, &lt;&lt;

A human being is a part of a whole,
called by us ' Universe ',
a part limited in time and space.
He experiences himself, his thoughts and feelings
as something separated from the rest ...
a kind of optical delusion of his consciousness. &gt;&gt;, &lt;&lt;

Everything is determined,
the beginning as well as the end,
by forces over which we have no control.
It is determined for the insects as well as the star.
Human beings, vegetables, or cosmic dust,
we all dance to a mysterious tune
intoned in the distance by an invisible piper. &gt;&gt;, &lt;&lt;

If [ God ] is omnipotent, then every occurrence,
including every human action, every human thought,
and Every human feeling and aspiration is also His work ;
how is it possible to think of holding men responsible for
their deeds and thoughts before such an almighty Being ?
In giving out punishment and rewards he would,
to a certain extent, be passing judgment on Himself.
How can this be combined with
the goodness and righteousness ascribed to Him ? &gt;&gt;, &lt;&lt;

The more a man is imbued with
the ordered regularity of all events
the firmer becomes his conviction that
there is no room left by the side of
this ordered regularity for causes of a different
[ ? Supernatural ] nature. &gt;&gt;

These quotes are from Stephen Hawking: &lt;&lt;

one has to find a consistent solution
of the equations of physics &gt;&gt;, &lt;&lt;

It would imply that we were completely determined:
we couldn't change our minds. So much for free will. &gt;&gt;, &lt;&lt;

In summary, the title of this essay was a question:
' Is everything determined ? '
The answer is yes, it is.
But it might as well not be,
because we can never know what is determined. &gt;&gt;, &lt;&lt;

The boundary condition of the universe
is that it has no boundary.
The universe would be completely self-contained
and not affected by anything outside itself.
It would neither be created nor destroyed.
It would just Be .
What place, then, for a creator ? &gt;&gt;, &lt;&lt;

In relativity, there is no real distinction between
the space and time coordinates, just as there is
no difference between two space coordinates. &gt;&gt;

Hermann Weyl, Einstein's colleague said:
" The world doesn't happen, it simply is. "

From http://www.newadvent.org/cathen/02053a.htm : &lt;&lt;

The theory of Democritus [ Athens, 460 - 370 BC ]
may be summed up in the following propositions :
- All bodies are composed of atoms
and spaces between the atoms. &gt;&gt;, &lt;&lt;
- There is no purpose or design in nature,
and in this sense all is ruled by chance.
[ Jeff Relf adds: " Chance " is a mysterious,
yet still absolute, material determinism ]
- All activity is reduced to local motion. &gt;&gt;

Hawking wrote:
<a href="http://www.generationterrorists.com/quotes/abhotswh.html">http://www.GenerationTerrorists.COM/quotes/abhotswh.html</a>
&lt;&lt; There are something like ten million million million
million million million million million million million
million million million million
( 1 with eighty zeroes after it ) particles
in the region of the universe that we can observe.
Where did they all come from ?
The answer is that, in quantum theory,
particles can be created out of energy
in the form of particle/antiparticle parts.
But that just raises the question of
where the energy came from.
The answer is that
the total energy of the universe is exactly zero.
The matter in the universe
is made out of positive energy.
However,
the matter is all attracting itself by gravity.
Two pieces of matter that
are close to each other have less energy than
the same two pieces a long way apart,
because you have to expend energy
to separate them against the gravitational
force that is pulling them together.
Thus in a sense,
the gravitational field has negative energy.
In the case of a universe that is
approximately uniform in space,
one can show that this negative gravitational energy
exactly cancels the positive energy
represented by the matter.
So the total energy of the universe is zero.
Now twice zero is also zero.
Thus the universe can
double the amount of positive matter energy
and also double the negative gravitational energy
without violation of the conservation of energy. &gt;&gt;, &lt;&lt;

It is said that there's no such thing as a free lunch.
But the universe is the ultimate free lunch. &gt;&gt;
</pre></body></html>

Here is the output:

< head>Jeff Relf's Home Page
Welcome to my home page: http://www.Cotse.NET/users/jeffrelf/
e-mail: Ho*******@JeffRelf.Cotse.NET

I'm Jeff Relf and I was born in North Seattle at the start of 1960
to a Mormon family consisting of two parents, four brothers and one sister.

My first programming was on HP and TI calculators starting around 1974.
( e.g. in 1976, the HP 67 with it's magnetic cards ).
I broke and lost many of them.

I've been programming professionally since the start of 1982,
but I play much more than I work ( like a starving artist, if you will ).
..+.+.+
Here's my favorite pointing device:
http://www.Cotse.NET/users/jeffrelf/...l_Explorer.JPG
This link shows my ramblings on Usenet ( Google Groups ):
http://Google.COM/groups?as_uauthors...&amp;scoring=d
This link shows a Picture of me from around the start of 2003:
http://www.Cotse.NET/users/jeffrelf/Jeff_Relf.JPG
Here's my main man:
http://www.Cotse.NET/users/jeffrelf/Bill.JPG
Here's Einstein, another guy I like:
http://www.loyno.edu/~brans/einstein.jpg
This link shows my daughter, Jenny, at her wedding ( Mid 2003 ):
http://www.Cotse.NET/users/jeffrelf/Jenny.JPG
Here's my son, Jeff, at the Moab MUni Fest, summer 2001:
http://www.Cotse.NET/users/jeffrelf/JR.JPG
Here's my ex, Janeen, at the Moab MUni Fest, summer 2001:
http://www.Cotse.NET/users/jeffrelf/Janeen.JPG
..+.+.+
Here are my style sheet overrides for Moz FireFox 1.0,
userChrome.CSS and userContent.CSS:
http://www.Cotse.NET/users/jeffrelf/userContent.CSS
http://www.Cotse.NET/users/jeffrelf/userChrome.CSS
They make pages look like this:
http://www.Cotse.NET/users/jeffrelf/CSS.PNG
..+.+.+
Here is my HTML to TXT converter:
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ
If, for example, you pass it a file named index.HTM ,
it creates a file called index.TXT .
..+.+.+
Here is my CPU/hard_disk benchmarker:
http://www.Cotse.NET/users/jeffrelf/Tom.EXE
http://www.Cotse.NET/users/jeffrelf/Tom.CPP
http://www.Cotse.NET/users/jeffrelf/Tom.VCPROJ
( Tom is someone I met in Comp.OS.Linux.Advocacy )
For example, if you run Tom.EXE, you'll see that a 1.7 GHz Vaio notebook,
with a Centrino and 2 megs of L2 cache, is much faster than my
cheap Compaq Presario desktop with 2.7 GHz,
256 RAM, 128K L2 cache, 8 megs of actual VRAM ( 64 by emulation )
6 USB ports, at least 6 audio jacks, and a 40 gig hard disk.
( It cost 229 USD after rebates... a close out )

A lower clock rate is very desirable for notebooks
as it has little effect on actual performacne
and the power consumed rises with the square of the clock rate.

Tom.EXE can test both the the speed of your CPU/cache while task-switching
as well as the speed of your hard disk while swaping out to virtual memory.
You run it in a DOS shell,
passing it how many threads you want it to simultaneously hold.
If you don't pass a parameter it will create as many simultaneous threads
as possible, thereby invoking all of your virtual memory
( making the swap file very active for about 3-7 minutes ).

These are the numbers I get on my box:
&gt;&gt;&gt; Tom 2000
2,000 Simultaneous threads acheived.
58.7 Seconds per 100,000 simultaneously spawned/completed.
&gt;&gt;&gt; Tom
14,032 Simultaneous threads acheived.
17.3 Minutes per 100,000 simultaneously spawned/completed.

The 100,000 simultaneous threads standard refers to
a conversation Ingo Molnar had with Linus Torvalds
( see news:Je************************@Cotse.NET ).
..+.+.+
Here's X, my hand-rolled newsreader, Dialer, SMTP-AUTH and POP3 client:
X's settings are here:
http://www.Cotse.NET/users/jeffrelf/X.TXT
The .EXE file ( for MS_Win_XP ):
http://www.Cotse.NET/users/jeffrelf/X.EXE
The .CPP file ( MS_VC_7.1 ):
http://www.Cotse.NET/users/jeffrelf/X.CPP
The next two files are optional.
This contains my project settings ( VS.NET 2003, Microsoft C++ 7.1 )
( you can edit it with notepad to change the directories used ):
http://www.Cotse.NET/users/jeffrelf/X.VCPROJ
This is just X's icon:
http://www.Cotse.NET/users/jeffrelf/X.RES

I use X a lot, it's the only newsreader I use.
I never have to screw around with views or sorting
because my X ( in VS.NET ) allows me to slickly/quickly search
the full text of a newsgroup in one file.

Cola.TXT ( Comp.OS.Linux.Advocacy ) is a 3.8 MB example of that:
http://www.Cotse.NET/users/jeffrelf/Cola.TXT
This is what Cola.TXT looks like when browsing it in VS.NET:
http://www.Cotse.NET/users/jeffrelf/Cola_TXT.PNG
X is using Bit_Stream.TTF, a good monospaced font ( free ):
http://www.Cotse.NET/users/jeffrelf/Bit_Stream.TTF

X automatically excludes second-level quoting ( i.e. the " &gt; &gt; " )
HTML, PGP sigs, and other spam.

I have that newsgroup set up tp retain 2,000 unread articles,
800 read articles that were from/to me
( up to 5 levels away, per the Referencs header ),
and 200 read ( a.k.a. " Deleted " ) articles not To/From me.

A VBA macro assigned to F1 ( and a toolbar icon )
in Visual studio.NET appears to delete an article
but actually moves it to the end of the file
and prefixes it with the word " Deleted ".
X's VBA macros for VS.NET 2003:
http://www.Cotse.NET/users/jeffrelf/X.VB

Then, when X.EXE updates Cola.TXT the articles are sorted like this:
1. To/From Me Unread. Most recent First ( as found on the server ).
2. Unread not To/From Me. Most recent First.
3. Deleted not To/From Me. Most recent Last.
4. To/From Me Deleted. Most recent Last.

Although I plan on adding an editor to X,
it currently works best with VS.NET .
In VS, proper updating/viewing of In.TXT and Cola.TXT
requires that these two boxes be checked
in VS's Options --&gt; Environment --&gt; Documents:
1. Detect when file is changed outside the environment.
2. Auto-load changes
( if not not currently modified inside the environment ).

X's update to Cola.TXT is an element in VS' undo/redo list,
so you have to be careful when undoing things.
The update produces a report like this in X:
1 FromMe, 5 ToMe, 55 New.
1,987 UnRead, 500 Read, 500 Read FromTos.

The 5 most recent MIDs from the References header ( youngest first )
are tagged ( e.g. AAE__ ) for quicker/slicker searching in VS.NET:
One click on one of my toolbar's up/down arrows
finds what's under the text cursor ( repeatedly so, of course ).
Another pair of arrows only repeats the last search, up or down.

An article from a newsgroup begins something like this:
.+.+.+
Comp.OS.Linux.Advocacy
AA**********************@newsread3.news.pas.earthlink.net
AAW__cZlsd.509206$D%.175553@attbi_s51
AA*************************@comcast.com
ABH__Z%isd.508650$D%.378252@attbi_s51 .
Re: Firefox stability
John_Bailo ( 4, 11.39 A, rVH4W _BP7FiP D Earthlink.NET, KNode_0_7_7 ),
This is the second line of the article's body.

X.CPP also handles SMTP-AUTH ( which works from anyone's LAN ).
Just put what you want to e-mail in the cut buffer
( the Del_Art VBA script, assigned to a toolbar icon,
does that automatically for me ).
The format of the e-mail must be like this:
.+.+.+
Jack Black &lt;sd*****@yahoo.com&gt;
Jane Peters &lt;df****@yahoo.com&gt;
Albert &lt;df****@yahoo.com&gt; .
Re: This is the title.
Hi Jack ( and Jane, Albert ), Blah blah.
.+.+.+

When sending e-mails to multiple people,
the second address must be indeneted with exactly two spaces
and the last addess must end with a Space_Space_Dot .

Hit E-Mail in X's toolbar to send it.
Hitting In checks your POP3 account for e-mail,
appending your e-mails to the end of In.TXT .

Attachements are automatically placed in the shortcut's directory.
X will not send attachments
( because I prefer to use links to my web site instead ).
..+.+.+
This link shows a .CPP to .HTML converter with syntax highlighting,
I didn't wrap it in a main() because I call it from WinMain()
( too complex to show here ):
http://Google.COM/groups?selm=_Jeff_...m%40NCPlus.NET
..+.+.+
Here's my Games.EXE program:
http://www.Cotse.NET/users/jeffrelf/Games.EXE
There's no help file.
It makes heavy use of my trackball ( wheel, 5 buttons ).
e.g. For quick exiting,
One of my buttons is set to Cntrl-F4 ( Close window ).
Games.EXE plays a kind of Random_Chess, Rapid Monopoly,
Random_Mario_Brothers, Reversie, and Solitaire.
The chess game is always one player ( white ),
and it's designed to be very very mindless.
Pieces can be quickly added or removed.
By default you have no king... so you can't lose.
Source code:
http://www.Cotse.NET/users/jeffrelf/Games.CPP
Bitmaps:
http://www.Cotse.NET/users/jeffrelf/Games.RES
Project file ( VS.NET 2003, Microsoft C++ 7.1 ):
http://www.Cotse.NET/users/jeffrelf/Games.VCPROJ
..+.+.+
Here is my Dif.EXE program:
http://www.Cotse.NET/users/jeffrelf/Dif.EXE
It compares two plain-text files via a shortcut ( .LNK ), e.g.:
Dif.EXE AA.CPP BB.CPP
Hit escape or Cntrl-F4 to exit. Use the wheel to move lines.
Hold down the middle mouse button as you move the wheel
to page up and down.
The screen's output is sent to
a file called AA.TXT in the starting directory.
Source code:
http://www.Cotse.NET/users/jeffrelf/Dif.CPP
Project file ( VS.NET ):
http://www.Cotse.NET/users/jeffrelf/Dif.VCPROJ
..+.+.+
Here are some quotes from Einstein: &lt;&lt;

People like us, who believe in physics,
know that the distinction between past, present,
and future is only a stubbornly persistent illusion. &gt;&gt;, &lt;&lt;

every true theorist is a kind of tamed metaphysicist,
no matter how pure a ' positivist '
he may fancy himself.
[ ... He believes in ] a conceptual system built on
premises of great simplicity. &gt;&gt;, &lt;&lt;

But the scientist is possessed by
the sense of universal causation. The future, to him,
is every whit as necessary and determined as the past. &gt;&gt;, &lt;&lt;

A human being is a part of a whole,
called by us ' Universe ',
a part limited in time and space.
He experiences himself, his thoughts and feelings
as something separated from the rest ...
a kind of optical delusion of his consciousness. &gt;&gt;, &lt;&lt;

Everything is determined,
the beginning as well as the end,
by forces over which we have no control.
It is determined for the insects as well as the star.
Human beings, vegetables, or cosmic dust,
we all dance to a mysterious tune
intoned in the distance by an invisible piper. &gt;&gt;, &lt;&lt;

If [ God ] is omnipotent, then every occurrence,
including every human action, every human thought,
and Every human feeling and aspiration is also His work ;
how is it possible to think of holding men responsible for
their deeds and thoughts before such an almighty Being ?
In giving out punishment and rewards he would,
to a certain extent, be passing judgment on Himself.
How can this be combined with
the goodness and righteousness ascribed to Him ? &gt;&gt;, &lt;&lt;

The more a man is imbued with
the ordered regularity of all events
the firmer becomes his conviction that
there is no room left by the side of
this ordered regularity for causes of a different
[ ? Supernatural ] nature. &gt;&gt;

These quotes are from Stephen Hawking: &lt;&lt;

one has to find a consistent solution
of the equations of physics &gt;&gt;, &lt;&lt;

It would imply that we were completely determined:
we couldn't change our minds. So much for free will. &gt;&gt;, &lt;&lt;

In summary, the title of this essay was a question:
' Is everything determined ? '
The answer is yes, it is.
But it might as well not be,
because we can never know what is determined. &gt;&gt;, &lt;&lt;

The boundary condition of the universe
is that it has no boundary.
The universe would be completely self-contained
and not affected by anything outside itself.
It would neither be created nor destroyed.
It would just Be .
What place, then, for a creator ? &gt;&gt;, &lt;&lt;

In relativity, there is no real distinction between
the space and time coordinates, just as there is
no difference between two space coordinates. &gt;&gt;

Hermann Weyl, Einstein's colleague said:
" The world doesn't happen, it simply is. "

From http://www.newadvent.org/cathen/02053a.htm : &lt;&lt;

The theory of Democritus [ Athens, 460 - 370 BC ]
may be summed up in the following propositions :
- All bodies are composed of atoms
and spaces between the atoms. &gt;&gt;, &lt;&lt;
- There is no purpose or design in nature,
and in this sense all is ruled by chance.
[ Jeff Relf adds: " Chance " is a mysterious,
yet still absolute, material determinism ]
- All activity is reduced to local motion. &gt;&gt;

Hawking wrote:
http://www.GenerationTerrorists.COM/.../abhotswh.html
&lt;&lt; There are something like ten million million million
million million million million million million million
million million million million
( 1 with eighty zeroes after it ) particles
in the region of the universe that we can observe.
Where did they all come from ?
The answer is that, in quantum theory,
particles can be created out of energy
in the form of particle/antiparticle parts.
But that just raises the question of
where the energy came from.
The answer is that
the total energy of the universe is exactly zero.
The matter in the universe
is made out of positive energy.
However,
the matter is all attracting itself by gravity.
Two pieces of matter that
are close to each other have less energy than
the same two pieces a long way apart,
because you have to expend energy
to separate them against the gravitational
force that is pulling them together.
Thus in a sense,
the gravitational field has negative energy.
In the case of a universe that is
approximately uniform in space,
one can show that this negative gravitational energy
exactly cancels the positive energy
represented by the matter.
So the total energy of the universe is zero.
Now twice zero is also zero.
Thus the universe can
double the amount of positive matter energy
and also double the negative gravitational energy
without violation of the conservation of energy. &gt;&gt;, &lt;&lt;

It is said that there's no such thing as a free lunch.
But the universe is the ultimate free lunch. &gt;&gt;

--
Tom Shelton
Jul 21 '05 #3

P: n/a

Hi Tom, You showed: << Regex.Replace ( htmlText,
@"<(?:/{1}|[a-zA-z]+|! +)[\s\S]*?>", string.Empty); >>

Parsing my homepage, the result should be like this:
http://www.Cotse.NET/users/jeffrelf/index.TXT

Giving me a link to an index.TXT, like that, instead of a dump,
would be kinder to the others here.

Re: The raw input:
http://www.Cotse.NET/users/jeffrelf/index.htm

Because you did a Save_Page_As, instead of an FTP download,
your copy of index.htm is Radically different from mine,
for example, contrary to what you showed, my copy says this:

http://Google.COM/groups?as_uauthors=Jeff_Relf&scoring=d</a>
and future is only a stubbornly persistent illusion. >>, <<

Here are some translations you haven't made: <<
http://Google.COM/groups?as_uauthors...&amp;scoring=d
Jack Black &lt;sd*****@yahoo.com&gt;
and future is only a stubbornly persistent illusion. &gt;&gt;, &lt;&lt; >>

&amp; becomes simply &, for example, likewise:
lt < gt > amp & quot " apos '

&#x20; ( hex ) and &#o40; ( octal ) all become a space, ascii 32.
For simplicity's sake, I don't translate &# numbers greater than 255.
A RegEx_Search_and_Replace won't do it.

Another thing my HTM_TXT.EXE does is remove lines that contained only tags,
because HTML_Only e-mail from Hotmail.COM contains many lines of just tags,
Again, a RegEx_Search_and_Replace won't do it.

For example, all of these lines would be deleted:
<HTML>
<HEAD>
<style type=3D'text/css'>
</style>
</HEAD>
<BODY>
<TABLE>
</TABLE>
</BODY>
</HTML>

Of course, I know what you're going to say, RegEx is good enough,
....but it's just not good enough for me.

Jul 21 '05 #4
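
The entity rules in the post above (five named entities, numeric forms in hex and in the poster's own &#o octal form, values above 255 left alone, unknown names passed through) can be sketched in a few lines of C++. This is only an illustration, not the code in HTM_TXT.CPP: the names digit_value, decode_entity and decode_entities are made up for the example, and treating a plain &#32; as decimal is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Value of one digit character, or -1 if it is not a hex digit.
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode the text between '&' and ';'. Returns false for anything
// unrecognised, so the caller can pass it through untranslated.
static bool decode_entity(const std::string& body, char& out) {
    static const std::map<std::string, char> named = {
        {"lt", '<'}, {"gt", '>'}, {"amp", '&'}, {"quot", '"'}, {"apos", '\''}};
    std::map<std::string, char>::const_iterator it = named.find(body);
    if (it != named.end()) { out = it->second; return true; }

    if (body.size() < 2 || body[0] != '#') return false;
    int base = 10;                       // assumption: "&#32;" is decimal
    std::size_t start = 1;
    if (body[1] == 'x' || body[1] == 'X') { base = 16; start = 2; }
    else if (body[1] == 'o' || body[1] == 'O') { base = 8; start = 2; }
    if (start >= body.size()) return false;

    int value = 0;
    for (std::size_t i = start; i < body.size(); ++i) {
        int d = digit_value(body[i]);
        if (d < 0 || d >= base) return false;
        value = value * base + d;
        if (value > 255) return false;   // larger code points are left alone
    }
    if (value == 0) return false;
    out = static_cast<char>(value);
    return true;
}

// Replace every recognised "&...;" in text; leave everything else as is.
std::string decode_entities(const std::string& text) {
    std::string result;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '&') {
            std::size_t semi = text.find(';', i + 1);
            char decoded = 0;
            // Only accept a ';' close by, so a stray '&' stays a plain '&'.
            if (semi != std::string::npos && semi - i <= 8 &&
                decode_entity(text.substr(i + 1, semi - i - 1), decoded)) {
                result += decoded;
                i = semi;
                continue;
            }
        }
        result += text[i];
    }
    return result;
}
```

Calling decode_entities on a line such as "Jack Black &lt;sd@x&gt;" would give back the line with real chevrons, while "&Unknown;" and "&#300;" come back untouched.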

P: n/a
Jeff_Relf wrote:
Hi Tom, You showed: << Regex.Replace ( htmlText,
@"<(?:/{1}|[a-zA-z]+|! +)[\s\S]*?>", string.Empty); >>


Why not just put the text into a WebRequest and then show the resulting
text, since any HTML browser will "remove the tags" already.
Jul 21 '05 #5

P: n/a

Hi Mogul, Re: Converting HTML to plain_text,

You asked me: << Why not just put the text into a WebRequest
and then show the resulting text,
since any HTML browser will remove the tags already. >>

What ? You mean use COM to automate IE ?
That's very low_rent, I don't normally run IE,
I don't want to wait for it to load,
and I don't want to worry about IE zombies floating around.

Jul 21 '05 #6

P: n/a
Mogul wrote:
Jeff_Relf wrote:
Hi Tom, You showed: << Regex.Replace ( htmlText,
@"<(?:/{1}|[a-zA-z]+|! +)[\s\S]*?>", string.Empty); >>

Why not just put the text into a WebRequest and then show the resulting
text, since any HTML browser will "remove the tags" already.

http://www.developer.com/net/csharp/...0918_2230091_1

Parsing HTML in Microsoft C#
By Jeff Heaton
Because of this, I found it necessary to write my own HTML parser. In
this article, I will show you how my HTML parser was constructed, and
how you can use this parser with your own applications. I will begin by
showing you the main components that make up the HTML parser. I will
conclude this article by showing a simple example that uses the HTML parser.

The HTML parser consists of the following four classes:

* Attribute—The attribute class is used to hold an individual
attribute inside an HTML tag.
* AttributeList—The attribute list holds an individual HTML tag and
all of its attributes.
* Parse—Holds general text parsing routines.
* ParseHTML—The main class that you will interface with; the
ParseHTML class is fed the HTML that you would like to parse.

I will now show you how each of these classes functions, and how you
will use them. I will begin with the Attribute class.
Jul 21 '05 #7
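
To make the "attribute" and "attribute list" idea from the quoted article concrete, here is a rough C++ sketch of the same split: one small record per attribute, and one record holding a tag and its attributes. It is not Jeff Heaton's code or API; Attribute, Tag and parse_tag are hypothetical names, and the parser ignores details such as self-closing tags.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Attribute { std::string name, value; };

struct Tag {
    std::string name;
    std::vector<Attribute> attributes;
};

// Parse one tag, e.g. <a href="x" title='y z'>, into name + attributes.
Tag parse_tag(const std::string& text) {
    Tag tag;
    std::size_t i = 0, n = text.size();
    auto skip_space = [&] {
        while (i < n && std::isspace((unsigned char)text[i])) ++i;
    };
    auto read_word = [&] {
        std::size_t start = i;
        while (i < n && !std::isspace((unsigned char)text[i]) &&
               text[i] != '=' && text[i] != '>') ++i;
        return text.substr(start, i - start);
    };

    if (i < n && text[i] == '<') ++i;
    tag.name = read_word();
    for (;;) {
        skip_space();
        if (i >= n || text[i] == '>') break;
        Attribute attr;
        attr.name = read_word();
        if (attr.name.empty()) { ++i; continue; }   // stray '=', skip it
        skip_space();
        if (i < n && text[i] == '=') {
            ++i;
            skip_space();
            if (i < n && (text[i] == '"' || text[i] == '\'')) {
                char q = text[i++];                  // quoted value: may hold '>'
                std::size_t start = i;
                while (i < n && text[i] != q) ++i;
                attr.value = text.substr(start, i - start);
                if (i < n) ++i;
            } else {
                attr.value = read_word();
            }
        }
        tag.attributes.push_back(attr);
    }
    return tag;
}
```

With this, parse_tag applied to <a href="x > y" title='z'> yields the name "a" and two attributes, the first with the value "x > y".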

P: n/a

Hi Mogul, Re:
http://www.developer.com/net/csharp/...0918_2230091_1
<< The HTML parser consists of the following four classes... >>

That's overkill in my book.
First Tom wants to do a RegEx_Search_and_Replace, which is too weak for me,
and now you want to add a bunch of classes,
....sheesh, where's the middle ground ?

Jul 21 '05 #8

P: n/a
Jeff_Relf wrote:
Hi Mogul, Re:
http://www.developer.com/net/csharp/...0918_2230091_1
<< The HTML parser consists of the following four classes... >>

That's overkill in my book.
First Tom wants to do a RegEx_Search_and_Replace, which is too weak for me,
and now you want to add a bunch of classes,
...sheesh, where's the middle ground ?


Bottom line: I can take those classes and manipulate the HTML -- live
from the Web -- to do almost anything.

Jul 21 '05 #9

P: n/a
On 2005-04-16, Jeff_Relf <Me@Privacy.NET> wrote:

Hi Tom, You showed: << Regex.Replace ( htmlText,
@"<(?:/{1}|[a-zA-z]+|! +)[\s\S]*?>", string.Empty); >>

Parsing my homepage, the result should be like this:
http://www.Cotse.NET/users/jeffrelf/index.TXT

Giving me a link to a index.TXT, like that, instead of a dump,
would be kinder to the others here.

Yeah... I realized that after I posted. Sorry.

Re: The raw input:
http://www.Cotse.NET/users/jeffrelf/index.htm

Because you did a Save_Page_As, instead of an FTP download,
your copy of index.htm is Radically different from mine,
for example, contrary to what you showed, my copy says this:

http://Google.COM/groups?as_uauthors=Jeff_Relf&scoring=d</a>
and future is only a stubbornly persistent illusion. >>, <<

Ok? So? I can do that if you want.

Here are some translations you haven't made: <<
http://Google.COM/groups?as_uauthors...&amp;scoring=d
Jack Black &lt;sd*****@yahoo.com&gt;
and future is only a stubbornly persistent illusion. &gt;&gt;, &lt;&lt; >>

I didn't do entity translations... I asked you if that is what you
wanted. Apparently, yes. Easy to do - but, I find this whole thing
pointless, so I doubt I'll do it.

I only did the other since I had about 10 minutes to spare.

&amp; becomes simply &, for example, likewise:
lt < gt > amp & quot " apos '

&#x20; ( hex ) and &#o40; ( octal ) all become a space, ascii 32.
For simplicity's sake, I don't translate &# numbers greater than 255.
A RegEx_Search_and_Replace won't do it.

Ahh, yes it will.

Another thing my HTM_TXT.EXE does is remove lines that contained only tags,
because HTML_Only e-mail from Hotmail.COM contains many lines of just tags,
Again, a RegEx_Search_and_Replace won't do it.

Ahhh, yes it will.

For example, all of these lines would be deleted:
<HTML>
<HEAD>
<style type=3D'text/css'>
</style>
</HEAD>
<BODY>
<TABLE>
</TABLE>
</BODY>
</HTML>

Of course, I know what you're going to say, RegEx is good enough,
...but it's just not good enough for me.


fine.

--
Tom Shelton
Jul 21 '05 #10

P: n/a

Hi Tom_Shelton, You showed me five things,

1. You can do a simple RegEx_Search_and_Replace.
2. Like Kelsey, You can't remove lines with just tags.
3. Like Kelsey, You can't handle cases like &#x20; and &#o40;
4. Like Kelsey, You couldn't do a decent job of translating my home page.
( to say nothing of translating HTML_Only e-mails )
5. Unlike Kelsey, You can't ftp my page.

Jul 21 '05 #11

P: n/a
Jeff_Relf wrote:
Hi Tom_Shelton, You showed me five things,

1. You can do a simple RegEx_Search_and_Replace.
2. Like Kelsey, You can't remove lines with just tags.
3. Like Kelsey, You can't handle cases like &#x20; and &#o40;
4. Like Kelsey, You couldn't do a decent job of translating my home page.
( to say nothing of translating HTML_Only e-mails )
5. Unlike Kelsey, You can't ftp my page.

http://www.codeproject.com/asp/removehtml.asp
Removing HTML from the text in ASP
Jul 21 '05 #12

P: n/a

Hi Mogul ( Bailo and Tom_Shelton ), Re:
http://www.developer.com/net/csharp/...0918_2230091_1
<< The HTML parser consists of the following four classes... >>

You told me: << Bottom line:
I can take those classes and manipulate the HTML
-- live from the Web -- to do almost anything. >>

Reeeally... Prove it, translate this
( Download it by: View_Source --> File --> Save_Page_As ):
http://www.Cotse.NET/users/jeffrelf/index.htm
to produce something as good as this:
http://www.Cotse.NET/users/jeffrelf/index.TXT

Here's a Much tougher test file for you, AA.HTM:
http://www.Cotse.NET/users/jeffrelf/AA.HTM
Notice how HTM_TXT.EXE removed the lines with just whitespace and tags:
http://www.Cotse.NET/users/jeffrelf/AA.TXT

http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

Jul 21 '05 #13

P: n/a

Hi Mogul ( Bailo and Tom_Shelton, Kelsey, Spooky ), Re:
Shelton being unable to handle chevrons, >, quoted inside tags,
or remove lines with just whitespace and tags,
or translate tags like &#x20; and &#o40;,
....Even though he claims he could do it with one hand tied behind his back,

You showed this link: << Removing HTML from the text in ASP >>
http://www.codeproject.com/asp/removehtml.asp

Reeeally... Prove it then, translate this: AA.HTM:
http://www.Cotse.NET/users/jeffrelf/AA.HTM

Notice how, in my result, AA.TXT,
HTM_TXT.EXE handled chevrons, >, quoted inside tags,
and removed lines with just whitespace and tags:
http://www.Cotse.NET/users/jeffrelf/AA.TXT

http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

And remember, HTM_TXT.CPP simply demos 38 lines of code in X.CPP,
it's not as trivial as Shelton's RegEx_Search_and_Replace
nor is it as complicated as your Big_Ass SourceForge project:
http://www.Cotse.NET/users/jeffrelf/X.EXE
http://www.Cotse.NET/users/jeffrelf/X.CPP
http://www.Cotse.NET/users/jeffrelf/X.VCPROJ

Long live #define, long live LoopTo(), death to C# !

#define LoopTo( StopCond ) \
while ( Ch && ( Ch = ( uchar ) * ++ P ) \
&& ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )

Jul 21 '05 #14
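
For readers who have not opened X.CPP, here is one way the LoopTo macro above might be used. The macro itself is copied from the post; everything around it is an assumption for illustration: it presumes P, Ch and Ch2 are declared as shown, and text_before_double_dash is a made-up helper, not a function from X.CPP.

```cpp
#include <cassert>
#include <string>

typedef unsigned char uchar;

// The macro as posted. It expects three variables in scope:
//   P   - pointer to the current character
//   Ch  - the current character (0 ends the scan)
//   Ch2 - the character after Ch, available to StopCond as lookahead
#define LoopTo( StopCond ) \
  while ( Ch && ( Ch = ( uchar ) * ++ P ) \
  && ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )

// Hypothetical helper: collect the characters after text[0] up to,
// but not including, the first "--" pair.
std::string text_before_double_dash(const char* text) {
    const char* P = text;
    uchar Ch = (uchar)*P, Ch2 = 0;
    std::string collected;
    // The loop body runs once per character that does not satisfy StopCond.
    LoopTo( Ch == '-' && Ch2 == '-' ) collected += (char)Ch;
    return collected;
}
```

Note that the macro advances P before it tests anything, so the character P starts on (here the opening '<') is never seen by the loop body. When the loop ends, P sits on the character that met the stop condition, or on the terminating 0.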

P: n/a
Jeff_Relf wrote:
Hi Mogul ( Bailo and Tom_Shelton, Kelsey, Spooky ), Re:
Shelton being unable to handle chevrons, >, quoted inside tags,
or remove lines with just whitespace and tags,
or translate tags like &#x20; and &#o40;,


Jeff,

Your arguments are pointless.

In the worst case, C# can implement your code structure as is.

In other cases, we're merely showing you models that may achieve what
you are doing more efficiently.
Jul 21 '05 #15

P: n/a

Hi Mogul, Re: Shelton and Kelsey being unable to match my code,

You told me: << Your arguments are pointless.
In the worst case, C# can implement your code structure as is.
In other cases, we're merely showing you models that
may achieve what you are doing more efficiently. >>

C# can Not implement my code as is, because it has no #define,
( nor memmove(), natively ).

Kelsey and Shelton claim to have superior methods, e.g. RegEx, String, STL.
Yet they can't match the 38 lines of X.CPP code I demo in HTM_TXT.CPP

The challenge is simple, given AA.HTM, View_Source --> File --> Save_Page_As
http://www.Cotse.NET/users/jeffrelf/AA.HTM
Produce a result this good or better:
http://www.Cotse.NET/users/jeffrelf/AA.TXT

That means handling <> chevrons quoted inside tags,
removing lines with just whitespace and tags,
and preserving blank lines that didn't have tags, i.e. preserving whitespace.

Jul 21 '05 #16
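
The three requirements in the challenge above (chevrons quoted inside tags, dropping lines that held only whitespace and tags, keeping blank lines that never had a tag) fit in one small state machine. The following C++ is a sketch under stated assumptions, not a port of HTM_TXT.CPP: html_to_text is a hypothetical name, a tag is taken to start with '<' followed by a letter, '/' or '!', and entity translation is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

std::string html_to_text(const std::string& html) {
    std::string out, line;            // finished text, and the line being built
    bool in_tag = false, in_comment = false;
    bool line_had_tag = false;        // did the current line contain any tag?
    char quote = 0;                   // open quote character inside a tag, if any

    // Emit the current line unless it is blank AND contained a tag.
    auto end_line = [&](const char* eol) {
        bool blank = line.find_first_not_of(" \t\r") == std::string::npos;
        if (!(blank && line_had_tag)) out += line + eol;
        line.clear();
        line_had_tag = in_tag;        // a tag still open carries onto the next line
    };

    for (std::size_t i = 0; i < html.size(); ++i) {
        char ch = html[i];
        if (in_tag) {
            if (in_comment) {
                if (ch == '>' && i >= 2 && html[i - 1] == '-' && html[i - 2] == '-')
                    in_tag = in_comment = false;
            } else if (quote) {
                if (ch == quote) quote = 0;          // '>' inside quotes is ignored
            } else if (ch == '"' || ch == '\'') {
                quote = ch;
            } else if (ch == '>') {
                in_tag = false;
            }
            if (ch == '\n') end_line("\n");
        } else if (ch == '<' && i + 1 < html.size() &&
                   (std::isalpha((unsigned char)html[i + 1]) ||
                    html[i + 1] == '/' || html[i + 1] == '!')) {
            in_tag = line_had_tag = true;
            in_comment = html.compare(i, 4, "<!--") == 0;
        } else if (ch == '\n') {
            end_line("\n");
        } else {
            line += ch;               // includes "< " that does not start a tag
        }
    }
    if (!line.empty()) end_line("");  // last line without a trailing newline
    return out;
}
```

Fed the ten tag-only lines listed earlier in the thread, this returns an empty string, while a line such as "a < b" passes through unchanged because of the space after the chevron.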

P: n/a
Jeff_Relf wrote:
C# can Not implement my code as is, because it has no #define,
( nor memmove(), natively ).


http://msdn.microsoft.com/library/de...clrfdefine.asp

C# Programmer's Reference
#define

#define lets you define a symbol, such that, by using the symbol as the
expression passed to the #if directive, the expression will evaluate to
true.

#define symbol

where:

symbol
The name of the symbol to define.
Jul 21 '05 #17

P: n/a
Jeff_Relf wrote:
That means handling <> chevrons quoted inside tags,
removing lines with just whitespace and tags,
and preserving blank lines that didn't have tags, i.e. preserving whitespace.


The bottom line is that any of the other designs are far more flexible
in adding these additional requirements.

Yes, anyone can code a very optimized program to do one very specific thing.

But it's hard to build a program flexible enough to handle more and more
cases and requirements.

In the case of using Regex for this problem, it's far easier to provide
a system that allows the passing in a regex string as needs change.

All you ( Relf ) do is say:

"Oh, but c# can't do x".

Then it's shown that yes, it can do x.

Then you say "oh, I meant x(2)". and so on.

The bottom line is:

Q: can c# implement a fast, flexible string parser in seven easy steps,
with far more robustness than whatever it is you're doing?

A: YES!!!!!

Jul 21 '05 #18
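
The point about passing the regex in as a string can be shown in a few lines. The sketch below uses C++ std::regex rather than the .NET Regex class discussed in the thread, purely so that it sits next to the thread's C++ code; strip_matches and split are hypothetical helper names.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Remove every match of `pattern` from `text`. The pattern is data,
// so the rules can change without touching this function.
std::string strip_matches(const std::string& text, const std::string& pattern) {
    return std::regex_replace(text, std::regex(pattern), "");
}

// Split on a separator pattern, as in the phone-list example
// that opened the thread.
std::vector<std::string> split(const std::string& text, const std::string& separator) {
    std::regex re(separator);
    // Index -1 asks the iterator for the pieces between matches.
    std::sregex_token_iterator first(text.begin(), text.end(), re, -1), last;
    return std::vector<std::string>(first, last);
}
```

For instance, strip_matches(page, "<[^>]*>") removes simple tags, and split applied to the phone list with the separator "_+" gives back the four numbers. Neither handles quoted chevrons or tag-only lines; that is the part the thread is arguing about.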

P: n/a

Hi Mogul, Re: This link of yours: << C# Programmer's Reference
#define lets you define a symbol, such that,
by using the symbol as the expression passed to the #if directive,
the expression will evaluate to true. >>
http://msdn.microsoft.com/library/de...clrfdefine.asp

Get serious, for once, Bailo: <<
While the compiler does not have a separate preprocessor,
the directives described in this section are processed as if there was one;
these directives are used to aid in conditional compilation.
Unlike C and C++ directives,
you cannot use these directives to create macros. >>
http://msdn.microsoft.com/library/de...clrfdefine.asp

Jul 21 '05 #19
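
For comparison, a language without function-like macros can still express the same kind of scan by passing the stop condition in as a callable. The sketch below stays in C++ so it can be read beside the LoopTo macro; scan_to and tag_name are hypothetical names, and nothing here is taken from X.CPP or from any C# library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Skip text[pos], then call body(ch) for each following character until
// stop(ch, next) is true or the text ends. Returns the index where it stopped.
template <typename Stop, typename Body>
std::size_t scan_to(const std::string& text, std::size_t pos, Stop stop, Body body) {
    while (pos < text.size() && ++pos < text.size()) {
        char ch = text[pos];
        char next = (pos + 1 < text.size()) ? text[pos + 1] : '\0';
        if (stop(ch, next)) break;
        body(ch);
    }
    return pos;
}

// Example use: the name of a tag such as "<Alpha href=x>".
std::string tag_name(const std::string& tag) {
    std::string name;
    scan_to(tag, 0,
            [](char ch, char) { return ch == '>' || ch == ' '; },
            [&name](char ch) { name += ch; });
    return name;
}
```

The trade-off is the one argued over in the thread: the macro version reads as a bare loop with the condition inline, while this version spells out the two callables but needs no preprocessor.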

P: n/a
Jeff_Relf wrote:
Hi Mogul, Re: This link of yours: << C# Programmer's Reference
#define lets you define a symbol, such that,
by using the symbol as the expression passed to the #if directive,
the expression will evaluate to true. >>
http://msdn.microsoft.com/library/de...clrfdefine.asp

Get serious, for once, Bailo: <<
While the compiler does not have a separate preprocessor,
the directives described in this section are processed as if there was one;
these directives are used to aid in conditional compilation.
Unlike C and C++ directives,
you cannot use these directives to create macros. >>
http://msdn.microsoft.com/library/de...clrfdefine.asp


http://www.codeproject.com/csharp/prepro.asp

The Code Project - A Macro Preprocessor in C# - C# Programming
This library supplies the same macro substitution facilities as the
C/C++ preprocessor.
Jul 21 '05 #20

P: n/a

Yet again, Bailo, you've posted a link to something you didn't read,
Sure it's called: << The Code Project
- A Macro Preprocessor in C# - C# Programming This library supplies
the same macro substitution facilities as the C/C++ preprocessor. >>
http://www.codeproject.com/csharp/prepro.asp

But, I'm sad to say, it's mislabeled,
....it's not like C's preprocessor, it's more like a call to RegEx,
with the result sent to an interpreter.

Jul 21 '05 #21

P: n/a
Jeff_Relf wrote:
it's more like a call to RegEx,
with the result sent to an interpreter.


Superb!

Just exactly what is called for to do the work!

Jul 21 '05 #22

P: n/a

Hi Bailo, Ya told me: << The bottom line is:
Q: can c# implement a fast, flexible string parser in seven easy steps,
with far more robustness than whatever it is you're doing ?
A: YES ! ! ! ! ! >>

That's Your asinine question.

My question was could any of the self-proclaimed programmers reading me now
possibly use their So_Called Bad_Ass tools like String, RegEx and the STL
to transform AA.HTM into something as good as the AA.TXT HTM_TXT.EXE produced
with its mostly plain_C... memmove, fprintf, #define LoopTo(), etc. ?

http://www.Cotse.NET/users/jeffrelf/AA.HTM
http://www.Cotse.NET/users/jeffrelf/AA.TXT

http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

I know the answer already... it's No.

Jul 21 '05 #23

P: n/a

Hi Bailo Re: This Pre_Interpreter_Expander: <<
A Macro Preprocessor in C# - C# Programming This library supplies
the same macro substitution facilities as the C/C++ preprocessor. >>
http://www.codeproject.com/csharp/prepro.asp

You told me: << Superb !
Just exactly what is called for to do the work ! >>

I don't care enough about it to look it up.

The thing about #define LoopTo(), for example,
is that it makes my code more readable.

Your Pre_Interpreter_Expander can't do that,
all it can do is send stuff to an interpreter... It's chicken wire at best.

More readable code is what I want, not C#.

Jul 21 '05 #24

P: n/a
Jeff_Relf wrote:
http://www.Cotse.NET/users/jeffrelf/AA.HTM
http://www.Cotse.NET/users/jeffrelf/AA.TXT


Any idiot can subtract value.

Adding value is what is needed here.
Jul 21 '05 #25

P: n/a
Jeff_Relf wrote:
The thing about #define LoopTo(), for example,
is that it makes my code more readable.


Well -- given that I have yet to see a single person claim to be able to
'read' your code, I find that unlikely.

Jeff -- don't get me wrong.

I think the underlying effort is good.

Like you, I think that /readability/ of Web pages is the most neglected
facet of them. It's somewhere at about 98 on a list of 100. The top
entry being "looks cool". Yet, the Web is in fact a literary medium.

I still think that text with well organized hyperlinks is better than
the totally stripped down format that you offer in AA.TXT ( which I
actually find quite *unreadable* ).

One must walk a balance between some nostalgia for the printed page --
which is not what a computer screen is -- and the overbearing,
javascripted, animated gifted format that we are stuck with.

I think as an exercise, what you are doing is good. However, as a
be-all and end-all, I can't really give it my blessing. The effort is
to find the deep reality of what this media, The Web, is really about.
It took almost three decades of filmmaking for directors to start
making films and not recordings of plays. In the same way, Art
Directors will make the Web a magazine...and writers will make the Web a
book.

But it is neither.
Jul 21 '05 #26

P: n/a

Hi Bailo,
Re: AA.HTM --> AA.TXT, in news:rI********************@speakeasy.net

You told me: << Any idiot can subtract value.
Adding value is what is needed here. >>

Like so many others, I don't want HTML_Only e-mails,
but Hotmail.COM and vendors send HTML_Only e-mails to me,
so I convert them to plain text.

Besides, You, with your not_so_amazing C#, can't convert HTML like I can,
....and that's the bigger point, in my book.

Re: My comment that #define's like LoopTo() make my code more readable,

You replied: << Well -- given that I have yet to see a single person
claim to be able to read your code, I find that unlikely. >>

None of the self_proclaimed coders here could read my code if they tried,
....Hence they never spent 10 seconds trying.

That fact, along with the paucity of quality code shown,
tells me They lack competence... not me.

#define's like LoopTo() make my code more readable To_Me,
and to other qualified coders... none of which are here.

You added: << Jeff -- don't get me wrong.
I think the underlying effort is good.

Like you, I think that readability of Web pages
is the most neglected facet of them.
It's somewhere at about 98 on a list of 100.
The top entry being looks cool .
Yet, the Web is in fact a literary medium.

I still think that text with well organized hyperlinks is better than
the totally stripped down format that you offer in AA.TXT
( which I actually find quite unreadable ). >>

Unlike AA.HTM, I can Edit AA.TXT,
that makes it infinitely more readable... Right_There .

Editable mediums are the future, not Read_Only HTML.
Plain text is best, followed by .DOC files.

You concluded: <<
It took almost three decades of filmmaking for directors to start
making films and not recordings of plays. In the same way, Art
Directors will make the Web a magazine
...and writers will make the Web a book.

But it is neither. >>

Then go watch some more of those movies then, and let us be.

Jul 21 '05 #27

P: n/a

"Jeff_Relf" <Me@Privacy.NET> wrote in message
news:Je************************@Cotse.NET...
Re: My comment that #define's like LoopTo() make my code more readable,

You replied: << Well -- given that I have yet to see a single person
claim to be able to read your code, I find that unlikely. >>


Maybe you think your code is readable, but your postings in the newsgroups
are not very readable.

Please use a newsreader program that indents the text you are replying to
and puts a > in front of the line.

/Søren
Jul 21 '05 #28

P: n/a

Hi Soren_Reinke, You asked me: <<
Please use a newsreader program that indents
the text you are replying to and puts a > in front of the line. >>

Maybe you find automated contextualizing, such as: > > > > >
to be more readable, but I, for one, do not... I call that spam,
....and my newsreader, X.CPP, duly removes all such lines.

X.CPP also removes PGP sigs, vcards and other spam_like attachments.
Further, This automated crap from you is unforgivable: <<
"Jeff_Relf" <Me@Privacy.NET> wrote in message
news:Je************************@Cotse.NET... >>

What is that bullshit anyways ? !
Why didn't you just tell me: <<
Hello Jeff, Re: Your superior contextualizing, >>

Handcrafted contextualizing, handcrafted quoting, rules... always will.
Now do us all a favor and get lost... you Shit_Twag.

Jul 21 '05 #29

P: n/a

"Jeff_Relf" <Me@Privacy.NET> wrote in message
news:Je************************@Cotse.NET...

Hi Soren_Reinke, You asked me: <<
Please use a newsreader program that indents
the text you are replying to and puts a > in front of the line. >>

Maybe you find automated contextualizing, such as: > > > > >
to be more readable, but I, for one, do not... I call that spam,
...and my newsreader, X.CPP, duly removes all such lines.

Then rewrite your newsreader so it follows the netiquette.
See http://www.netmeister.org/news/learn2quote3.html#ss3.1

3.1 Which character should I use to mark the quoted text?
Use the "Greater-Than" character (">"). This character is recognized as a
quotation mark by almost every newsreader and is mentioned in the netiquette
as such for technical reasons (Son-Of-RFC 1036 and successors).

X.CPP also removes PGP sigs, vcards and other spam_like attachments.
Further, This automated crap from you is unforgivable: <<
"Jeff_Relf" <Me@Privacy.NET> wrote in message
news:Je************************@Cotse.NET... >>

What is that bullshit anyways ? !
Why didn't you just tell me: <<
Hello Jeff, Re: Your superior contextualizing, >>

Handcrafted contextualizing, handcrafted quoting, rules... always will.
Now do us all a favor and get lost... you Shit_Twag.


Sure sure, mister flamebait.
Jul 21 '05 #30
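
The quoting convention cited above (mark quoted text with the ">" character) is simple to automate. A minimal C++ sketch follows; quote_reply is a hypothetical name, and stacking nested quotes as ">>" without a space is one common habit assumed here, not something the cited page is quoted as requiring.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Prefix every line of `original` with the '>' quote marker.
std::string quote_reply(const std::string& original) {
    std::string quoted;
    std::size_t start = 0;
    while (start < original.size()) {
        std::size_t end = original.find('\n', start);
        if (end == std::string::npos) end = original.size();
        std::string line = original.substr(start, end - start);
        // Empty lines and already-quoted lines get a bare '>',
        // so nested quotes stack up as ">>" (assumed convention).
        quoted += (line.empty() || line[0] == '>') ? ">" : "> ";
        quoted += line + "\n";
        start = end + 1;
    }
    return quoted;
}
```

A newsreader that strips such lines, as described in the thread, would do the reverse: drop every line whose first character is '>'.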

P: n/a
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Søren Reinke wrote:
"Jeff_Relf" <Me@Privacy.NET> wrote in message
news:Je************************@Cotse.NET...


Bye, Relffeeder, bye! *PLONK*

Greetings,
Johannes

- --
PLEASE verify my signature. Some forging troll is claiming to be me.
My GPG key id is 0xCC727E2E (dated 2004-11-03). You can get it from
wwwkeys.pgp.net or random.sks.keyserver.penguin.de.
Also: Messages from "Comcast Online" are ALWAYS forged.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCY6JSCseFG8xyfi4RAnBTAJ9qeFODE+kq/y+UlNtrZnaNGq5I7wCfe7Rl
mUl8DNMaFC2XXRTP6IJO6Oc=
=M2q/
-----END PGP SIGNATURE-----
Jul 21 '05 #31

P: n/a
In article <Je************************@Cotse.NET>, Jeff_Relf wrote:

Hi Tom, You showed: << Regex.Replace ( htmlText,
@"<(?:/{1}|[a-zA-z]+|! +)[\s\S]*?>", string.Empty); >>

Parsing my homepage, the result should be like this:
http://www.Cotse.NET/users/jeffrelf/index.TXT


Hmmm - other than entity translations - mine is exactly the same... I
told you I didn't do entity translations...

--
Tom Shelton
Jul 21 '05 #32
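
One detail in the pattern quoted above: the class [a-zA-z] ends in a lowercase z, so its second range runs from 'A' to 'z' and takes in the six ASCII characters that sit between 'Z' and 'a', including '[' and '_'. The likely intent was [a-zA-Z]. The check below uses C++ std::regex as a stand-in for the .NET engine (the ASCII ordering is the same); class_matches is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` matches the character class `cls`.
bool class_matches(const std::string& cls, const std::string& text) {
    return std::regex_match(text, std::regex(cls));
}
```

In practice the effect on tag stripping is small, since the rest of the pattern accepts anything up to the next '>', but it does mean something like <_x> would be removed as if it were a tag.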

P: n/a

Hi Tom, Re: How my index.htm should become like this:
http://www.Cotse.NET/users/jeffrelf/index.TXT

You told me: <<
Hmmm - other than entity translations - mine is exactly the same
...I told you I didn't do entity translations. >>

But your copy of index.htm is very different from mine,
to see what it looks like unaltered, you must do:
View_Source --> File --> Save_Page_As

At any rate, try handling <> chevrons embedded inside quoted strings,
removing lines with just whitespace and tags,
and preserving blank lines that didn't have tags, i.e. preserving whitespace.

If String and RegEx are as powerful as you claim,
you should be able to do that while your clothes are in the dryer.

This is the input, View_Source --> File --> Save_Page_As
http://www.Cotse.NET/users/jeffrelf/AA.HTM
This is what HTM_TXT.EXE outputs:
http://www.Cotse.NET/users/jeffrelf/AA.TXT

http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

Jul 21 '05 #33

P: n/a
Jeff_Relf wrote:
Hi Tom, Re: How my index.htm should become like this:
http://www.Cotse.NET/users/jeffrelf/index.TXT

You told me: <<
Hmmm - other than entity translations - mine is exactly the same
...I told you I didn't do entity translations. >>


Jeff,

I find it funny you questioning people's productivity when you're a
deadbeat who can spend 24 hours a day twiddling your bits while the rest
of us work hard at paying jobs all day, and then dust your code off in
like 5 minutes in the evening.

Jul 21 '05 #34

P: n/a

Hi Bailo, Re: Shelton's claim that C# allows him to breezily write code
that matches HTM_TXT.EXE's functionality,

You told me: << I find it funny you questioning people's productivity
when you're a deadbeat who can spend 24 hours a day twiddling your bits
while the rest of us work hard at paying jobs all day,
and then dust your code off in like 5 minutes in the evening. >>

No one here has Dusted_Off my code, or otherwise matched it,
lots say they can... but no one has demonstrated it to me.
Are you serious programmers... I don't think so.

Now playing: Sole's Da_Baddest_Poet

Jul 21 '05 #35

P: n/a
Jeff_Relf wrote:
and then dust your code off in like 5 minutes in the evening. >>

No one here has Dusted_Off my code, or otherwise matched it,
lots say they can... but no one has demonstrated it to me.
Are you serious programmers... I don't think so.


Yes, in fact they did.

Simply by using Regex.

Why? Because writing a program that way, allows any valid Regex
expression to be used within the program. So, we wrote code that is
essentially open and can be immediately changed by other users.

Your code is closed, single purpose. It is not extensible and serves no
purpose.

Jul 21 '05 #36

P: n/a

Hi John, I don't care what you think,
no one matched the Results that I produced.

Kelsey's code trashed my home page,
and Shelton couldn't even download my home page.

The challenge was simple, you udder moron ( mooo ),
given AA.HTM, produce something as good as HTM_TXT.EXE's AA.TXT.

You concluded: << Your code is closed, single purpose.
It is not extensible and serves no purpose. >>

My HTM --> TXT code works, John. Yours doesn't even exist.
X.CPP, which HTM_TXT.EXE partially demonstrates, has serious utility.
You, Kelsey and Shelton judge code by how easy it is for you to write,
....but I judge code by what it does.

Jul 21 '05 #37

P: n/a
Jeff_Relf wrote:
Hi John, I don't care what you think,
no one matched the Results that I produced.

Kelsey's code trashed my home page,
and Shelton couldn't even download my home page.

The challenge was simple, you udder moron ( mooo ),
given AA.HTM, produce something as good as HTM_TXT.EXE's AA.TXT.

You concluded: << Your code is closed, single purpose.
It is not extensible and serves no purpose. >>

My HTM --> TXT code works, John. Yours doesn't even exist.
X.CPP, which HTM_TXT.EXE partially demonstrates, has serious utility.
You, Kelsey and Shelton judge code by how easy it is for you to write,
...but I judge code by what it does.


what about this?

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace htm_txt
{
    class Program
    {
        // Usage: htm_txt input.html output.txt
        // (System.Web.HttpUtility needs a reference to System.Web.dll.)
        static void Main(string[] args)
        {
            string input;

            using (StreamReader sr = new StreamReader(args[0]))
                input = sr.ReadToEnd();

            // One pass over entities, tags and comments.
            // The Singleline option is required in order to match
            // comments and/or tags spanning multiple lines.
            string output = Regex.Replace(input,
                @"(?'entity'&((\w+)|(#[0-9]+)|(#x[0-9a-fA-F]+));?)|(?'tag'<[/?]?\w+(([^\][^""])|(\""[^\][^""]*?\""))*?>)|(?'comment'<!--.*?-->)",
                delegate (Match m)
                {
                    // convert entities
                    if (m.Groups["entity"].Success)
                    {
                        if (m.Value == "&nbsp;")
                            return " ";
                        return System.Web.HttpUtility.HtmlDecode(m.Value);
                    }

                    // convert the <br> tag into a \n
                    if (m.Groups["tag"].Success && m.Value.ToLower() == "<br>")
                        return "\n";

                    // clear the rest
                    return "";
                }, RegexOptions.Singleline);

            // strip excess empty lines
            output = Regex.Replace(output, @"(([ \t]*\r?\n){2})([ \t]*\r?\n)+", "$1");
            // strip empty lines at beginning and end
            output = Regex.Replace(output, @"(^([ \t]*\r?\n)+)|([ \t]*\r?\n)+$", "");

            // write output
            using (StreamWriter sw = new StreamWriter(args[1]))
                sw.Write(output);
        }
    }
}
Jul 21 '05 #38
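
The two clean-up passes at the end of the program above can be tried in isolation. The sketch below re-expresses them with C++ std::regex, chosen only to sit beside the thread's C++ code; tidy_blank_lines is a hypothetical name, and it assumes an engine where ^ and $ match only at the very start and end of the text, as they do in the C# version without the Multiline option.

```cpp
#include <cassert>
#include <regex>
#include <string>

std::string tidy_blank_lines(const std::string& text) {
    // Three or more line ends in a row: keep only the first two.
    static const std::regex excess(R"((([ \t]*\r?\n){2})([ \t]*\r?\n)+)");
    // Empty lines at the very beginning or the very end: drop them.
    static const std::regex edges(R"((^([ \t]*\r?\n)+)|([ \t]*\r?\n)+$)");
    std::string out = std::regex_replace(text, excess, "$1");
    return std::regex_replace(out, edges, "");
}
```

Note what this does not do: it collapses runs of empty lines wherever they occur, so it cannot tell a blank line the author wrote from one left behind by a removed tag, which is the distinction asked for earlier in the thread.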

P: n/a

Hi Stefan_Simek, The title of the post you replied to was: <<
I judge code by what it does. >>

Can you show me links like my AA.HTM, HTM_TXT.EXE and AA.TXT ?

This is the input, View_Source --> File --> Save_Page_As
http://www.Cotse.NET/users/jeffrelf/AA.HTM
This is what HTM_TXT.EXE outputs:
http://www.Cotse.NET/users/jeffrelf/AA.TXT

http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ

Jul 21 '05 #39

P: n/a
Stefan Simek wrote:
Jeff_Relf wrote:


*plonk*

Jul 21 '05 #40

P: n/a
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stefan Simek wrote:
Jeff_Relf wrote:


Freaking Relffeeder. *PLONK*

Greetings,
Johannes

- --
PLEASE verify my signature. Some forging troll is claiming to be me.
My GPG key id is 0xCC727E2E (dated 2004-11-03). You can get it from
wwwkeys.pgp.net or random.sks.keyserver.penguin.de.
Also: Messages from "Comcast Online" are ALWAYS forged.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCZXCfCseFG8xyfi4RAtObAKCf3t+Wq6/cEl3Zw+u2MbM1FLud0QCgl1pu
i2LNKe6YeOcuG1YzUxD75lc=
=rbsL
-----END PGP SIGNATURE-----
Jul 21 '05 #41

P: n/a
Jeff_Relf wrote:
no one matched the Results that I produced.


And you did not match the results of anyone else's code, either.
Jul 21 '05 #42

P: n/a

Hi John, Re: My comment that no one produced Results as good as mine,

You told me: <<
And you did not match the results of anyone else's code, either. >>

An' yer saying exactly what... that you preferred their Results,
....or, more likely, that you preferred the inane simplicity of their source ?

I'm glad I'm not working beside you... because your attitude sucks.

P.S. I watched that Veronica_Mars AVI you told me about,
veronica_mars.1x18.weapons_of_class_destruction.hdtv_xvid-fov.[BT].avi.torrent
The penguin was not just the perp... he was also a Whining_Wimp.

The Mac expert was a Sexy_Super_Hero.
The Windows user was a Dumb_Brute.
The FBI investigators were the Keystone_Cops.

Speaking of WiFi illegalities,
I've been benching my free LANs a lot, one is often 3.7/.37 MBps, Comcast,
so my self imposed cap of .13 MBps upstream would not be noticed at all,
neither by me when I'm surfing, nor by the LAN's owner.

Jul 21 '05 #43

P: n/a

Oops, make that 3.7/.37 Mbps, and .13 Mbps upstream, not MBps.

Jul 21 '05 #44

P: n/a
Jeff_Relf wrote:
Oops, make that 3.7/.37 Mbps, and .13 Mbps upstream, not MBps.


Yeah, Comcast is even advertising 4M on their flyers ( got one in the
mail today ).

But there's something about cable networks...that makes me prefer dsl
Jul 21 '05 #45

P: n/a

Hi John, Re: My free LANs, You told me: <<
Yeah, Comcast is even advertising 4M on their flyers
( got one in the mail today ).

But there's something about cable networks
...that makes me prefer dsl >>

I've been benching a lot, the one I just did was 4 Mbps/.36 Mbps !
I think it's the college kids next door... they must be asleep.
I need something to download, I hate seeing all that bandwidth being unused.

Jul 21 '05 #46

P: n/a
Can you take this OT stuff off the .NET groups, please?

Regards

Richard Blewett - DevelopMentor
http://www.dotnetconsult.co.uk/weblog
http://www.dotnetconsult.co.uk
Jul 21 '05 #47

P: n/a
Jeff_Relf wrote:
Hi Stefan_Simek, The title of the post you replied to was: <<
I judge code by what it does. >>

Can you show me links like my AA.HTM, HTM_TXT.EXE and AA.TXT ?

This is the input, View_Source --> File --> Save_Page_As
http://www.Cotse.NET/users/jeffrelf/AA.HTM
http://www.triaxis.sk/temp/AA.HTM
This is what HTM_TXT.EXE outputs:
http://www.Cotse.NET/users/jeffrelf/AA.TXT
http://www.triaxis.sk/temp/AA.TXT
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.EXE
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.CPP
http://www.Cotse.NET/users/jeffrelf/HTM_TXT.VCPROJ
http://www.triaxis.sk/temp/HTM_TXT.ZIP

Jul 21 '05 #48

P: n/a

Re: http://www.triaxis.sk/temp/AA.TXT
Well, Howdy_Do_Dee, Stefan_Simek... well done !

But you're not preserving whitespace like I do,
you're simply collapsing whitespace,
so it doesn't handle the <pre> tag.

For example, HTM_TXT.EXE handles <pre> by turning this:
http://www.Cotse.NET/users/jeffrelf/index.htm
( View_Source --> File --> Save_Page_As )
into this:
http://www.Cotse.NET/users/jeffrelf/index.TXT

What I do is remove lines that have nothing but whitespace and tags
but leave all lines that had no tags, even if they're just blank lines.
I doubt that your RegEx could do that.
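
The line rule just described can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in HTM_TXT.CPP: the name strip_tag_only_lines and the helper starts_tag are hypothetical, a tag is taken to start at a '<' followed directly by a letter, '/' or '!' (so "< Alpha>" is left alone), and a tag is assumed to end at the first '>' that follows, which is too simple for comments containing '>'.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: can this character follow '<' in a valid tag?
static bool starts_tag(char next) {
    return std::isalpha(static_cast<unsigned char>(next)) != 0 ||
           next == '/' || next == '!';
}

// Assumed name. Strips tags line by line. A line is dropped only when it
// held at least part of a tag and nothing but whitespace is left over;
// lines that never had a tag are kept, even if they are blank.
std::string strip_tag_only_lines(const std::string& html) {
    std::istringstream in(html);
    std::string line, out;
    bool in_tag = false;  // a tag may span several lines
    while (std::getline(in, line)) {
        std::string text;
        bool had_tag = in_tag;  // a tag continued from the previous line counts
        for (std::size_t i = 0; i < line.size(); ++i) {
            const char c = line[i];
            if (in_tag) {
                if (c == '>') in_tag = false;
            } else if (c == '<' && i + 1 < line.size() &&
                       starts_tag(line[i + 1])) {
                in_tag = had_tag = true;
            } else {
                text += c;
            }
        }
        const bool blank = std::all_of(
            text.begin(), text.end(),
            [](unsigned char ch) { return std::isspace(ch) != 0; });
        if (had_tag && blank) continue;  // whitespace-and-tags line: remove
        out += text;
        out += '\n';
    }
    return out;
}
```

Given "<html>\n<p>Hello</p>\n\n  <br>  \nWorld\n", this sketch returns "Hello\n\nWorld\n": the two tag-only lines disappear while the untagged blank line survives. Entity decoding is left out to keep the rule itself visible.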

On a much more minor point,
your  » and  © are each using two 7-bit char UTF encoding,
which can be further decoded to single 8-bit chars.

Given that Ch is the name of the first 7-bit char and Ch2 is the second,
UTF decoding goes something like this:
if ( ( Ch == 194 || Ch == 195 ) && ( Ch2 & 0xC0 ) == 0x80 )
return ( Ch & 3 ) << 6 | Ch2 & 0x3F ;
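
Spelled out as a complete function, that decoding step might look like the sketch below. The function name utf8_to_latin1 is made up for illustration and the loop around the test is an assumption, not the code in HTM_TXT.CPP. It only handles the two-byte sequences whose lead byte is 194 or 195 (0xC2 or 0xC3), which cover U+0080 through U+00FF; every other byte is copied through untouched.

```cpp
#include <cstddef>
#include <string>

// Assumed name. Collapses two-byte UTF-8 sequences for U+0080..U+00FF
// into single 8-bit chars; everything else is copied as-is.
std::string utf8_to_latin1(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char ch = static_cast<unsigned char>(in[i]);
        const unsigned char ch2 =
            i + 1 < in.size() ? static_cast<unsigned char>(in[i + 1]) : 0;
        if ((ch == 194 || ch == 195) && (ch2 & 0xC0) == 0x80) {
            // Low 2 bits of the lead byte + low 6 bits of the second byte.
            out += static_cast<char>(((ch & 3) << 6) | (ch2 & 0x3F));
            ++i;  // the second byte has been consumed
        } else {
            out += static_cast<char>(ch);
        }
    }
    return out;
}
```

For example, the byte pair 0xC2 0xBB becomes the single byte 0xBB and 0xC2 0xA9 becomes 0xA9, while plain ASCII text passes through unchanged.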

Re: http://www.triaxis.sk/temp/HTM_TXT.ZIP

That's a lot more files/directories than I showed you,
which, in my opinion, makes your project much less readable/maintainable.

Re: Double_clicking: C:\__\Stefan_Simek\bin\Release\htm_txt.EXE

I got a message saying that I didn't have the .NET's 2.0.5 framework.
I'm running on a Win_XP system I bought about 6 months ago at Office_Depot.
I have MS_Office_XP and Visual_Studio_NET_2003 installed on it,
with whatever the default install did.

Re: C:\__\Stefan_Simek\Program.cs

Although I'm sure it's not true, my copy of Visual_Studio_NET_2003 tells me
there's a bunch of syntax errors in it, including mismatched braces {}.

Re:
Regex.Replace( input
, @"(?'entity'&((\w+)|(#[0-9]+)|(#x[0-9a-fA-F]+));?)|
(?'tag'<[/?]?\w+(([^\][^""])|(\""[^\][^""]*?\""))*?>)
|(?'comment'<!--.*?-->)",
delegate ( Match m ) { // convert entities
if ( m.Groups["entity"].Success ) {
if ( m.Value == "&nbsp;") return " ";
return System.Web.HttpUtility.HtmlDecode( m.Value ); }
if ( m.Groups[ "tag" ].Success && m.Value.ToLower() == "<br>" )
return "\n";
// clear the rest
return ""; }, RegexOptions.Singleline );

Hmm... return System.Web.HttpUtility.HtmlDecode( m.Value ); ?
That's quite bizarre, nothing like C... but it works, I see.

Jul 21 '05 #49

P: n/a
Jeff_Relf wrote:
Re: http://www.triaxis.sk/temp/AA.TXT
Well, Howdy_Do_Dee, Stefan_Simek... well done !
Thx ;)

But you're not preserving whitespace like I do,
you're simply collapsing whitespace,
so it doesn't handle the <pre> tag.
??

For example, HTM_TXT.EXE handles <pre> by turning this:
http://www.Cotse.NET/users/jeffrelf/index.htm
( View_Source --> File --> Save_Page_As )
into this:
http://www.Cotse.NET/users/jeffrelf/index.TXT

What I do is remove lines that have nothing but whitespace and tags
but leave all lines that had no tags, even if they're just blank lines.
I doubt that your RegEx could do that.
Well, my conversion of your index.htm is byte-to-byte equivalent except
for two empty lines at the end of the document. And I see no special
treatment of the <pre> tag in your code, unless I'm blind...

On a much more minor point,
your  » and  © are each using two 7-bit char UTF encoding,
which can be further decoded to single 8-bit chars.
The output encoding is UTF-8 by default in .NET. You can change it to
anything else by changing the line

using (StreamWriter sw = new StreamWriter(args[1]))

to

using (StreamWriter sw = new StreamWriter(args[1], false,
Encoding.GetEncoding(1250_or_whatever)))

Given that Ch is the name of the first 7-bit char and Ch2 is the second,
UTF decoding goes something like this:
if ( ( Ch == 194 || Ch == 195 ) && ( Ch2 & 0xC0 ) == 0x80 )
return ( Ch & 3 ) << 6 | Ch2 & 0x3F ;

Re: http://www.triaxis.sk/temp/HTM_TXT.ZIP

That's a lot more files/directories than I showed you,
which, in my opinion, makes your project much less readable/maintainable.
???
I thought that VS6.0 for example generated .dsw, .dsp, .ncb and more
files as well. The *only* file required is the Program.cs. I've provided
a link to the .NET 1.1 source, build command and exe at the end.

Re: Double_clicking: C:\__\Stefan_Simek\bin\Release\htm_txt.EXE

I got a message saying that I didn't have the .NET's 2.0.5 framework.
I'm running on a Win_XP system I bought about 6 months ago at Office_Depot.
I have MS_Office_XP and Visual_Studio_NET_2003 installed on it,
with whatever the default install did.
Well, as the error message says, it requires the .NET framework 2.0.5
(beta2).

Re: C:\__\Stefan_Simek\Program.cs

Although I'm sure it's not true, my copy of Visual_Studio_NET_2003 tells me
there's a bunch of syntax errors in it, including mismatched braces {}.
Because anonymous delegates were not supported back in 1.1.

Re:
Regex.Replace( input
, @"(?'entity'&((\w+)|(#[0-9]+)|(#x[0-9a-fA-F]+));?)|
(?'tag'<[/?]?\w+(([^\][^""])|(\""[^\][^""]*?\""))*?>)
|(?'comment'<!--.*?-->)",
delegate ( Match m ) { // convert entities
if ( m.Groups["entity"].Success ) {
if ( m.Value == "&nbsp;") return " ";
return System.Web.HttpUtility.HtmlDecode( m.Value ); }
if ( m.Groups[ "tag" ].Success && m.Value.ToLower() == "<br>" )
return "\n";
// clear the rest
return ""; }, RegexOptions.Singleline );

Hmm... return System.Web.HttpUtility.HtmlDecode( m.Value ); ?
I see no reason for writing my own entity parser as long as there's one
provided by the framework.
That's quite bizarre, nothing like C... but it works, I see.
Sure it's not like C. It's been a few years since 1978, and the ways of
programming have evolved by now...


See the following for a .NET 1.1 version, with added command line
checking and exception handling:
http://www.kascomp.sk/tmp/htm_txt.cs
http://www.kascomp.sk/tmp/htm_txt.exe
http://www.kascomp.sk/tmp/build.bat
Jul 21 '05 #50


This discussion thread is closed
