Hi there,
I have an HTML file like this:
----------------------------------------
<body>
<h1>Home Page</h1>
<p>
Welcome<br>To<br>
<br> My Home Page
</p>
--------------------------------------------
I want to know the exact number of string tokens in the text.
The count should discard the new lines (but completely discarding them would
merge the words separated by new lines), collapse runs of extra white space,
etc.
Any help will be appreciated.
I wrote this function initially, but it only works when words are separated by a single space.
public int count_body(string s)
{
    char[] sp = {' '};
    int count = s.Split(sp).Length;
    return count;
}
Thanks!!
kman,
You are better off using an HTML parser for something like this. You
can use MSHTML through interop (Microsoft's HTML parser), and then access
the innerText property to get just the text for the document. You can then
parse apart that text easily (it should be broken properly, even with the BR
tags in between, which you won't see in the innerText).
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
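Once the plain text has been extracted (by whatever parser is used), the counting step suggested above could look roughly like the sketch below. It assumes the text is already free of tags; the class and method names (TextTokenCounter, CountTokens) are made up for illustration and are not part of MSHTML or any library.

```csharp
using System;

public static class TextTokenCounter
{
    // Counts whitespace-separated tokens in plain text, for example text
    // that an HTML parser has already extracted from the document.
    // Hypothetical helper for illustration only.
    public static int CountTokens(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return 0;
        }

        // A null separator array makes Split treat every white-space
        // character (spaces, tabs, new lines) as a delimiter.
        // RemoveEmptyEntries drops the empty strings produced by runs of
        // consecutive delimiters, so extra white space is not counted.
        string[] tokens = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return tokens.Length;
    }
}
```

Compared with splitting on a single ' ' character, this does not merge words separated only by new lines and does not count empty entries between repeated spaces.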
Try using regular expressions
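A minimal sketch of the regular-expression route, assuming simple markup like the sample in the question. Replacing tags with a pattern such as <[^>]+> is a rough approximation, not real HTML parsing: it will miscount on comments, script blocks, entities such as &nbsp;, or a '>' inside an attribute value. The class and method names here are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class HtmlTokenCounter
{
    // Approximate token count for simple HTML. Hypothetical helper,
    // not a substitute for a real HTML parser.
    public static int CountBody(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return 0;
        }

        // Replace each tag with a space rather than removing it, so that
        // "Welcome<br>To" becomes "Welcome To" instead of "WelcomeTo".
        string text = Regex.Replace(html, "<[^>]+>", " ");

        // Every run of non-white-space characters is one token; new lines
        // and repeated spaces between tokens are ignored.
        return Regex.Matches(text, @"\S+").Count;
    }
}
```

For the sample in the question, the tokens left after the tags are replaced are Home, Page, Welcome, To, My, Home, Page, so CountBody returns 7.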