Simplified a bit, I'm parsing HTML documents to get sentences, e.g.:
my $html = get($URL);
# remove all HTML TAGs...blah blah blah
my @sentences = split(/\./, $html);
Then I try to determine the number of characters in each sentence.
However, although the sentences look fine when I print them,
length($sentences[0]) returns values in the hundreds for short sentences. Most
documentation I found says "length() returns the number of chars"; however,
some says "length() returns the number of bytes". To get the number of
chars in this case, can I just divide by 8 or something?
Thanks for your help.
Mitchua
"Mitchua" <mi*****@yahoo.com> wrote in message
news:V5********************@news02.bloor.is.net.cable.rogers.com...
To get the number of chars in this case, can I just divide by 8 or something?
Would something like sprintf("%20s", $sentence[0]) work to crop the sentence
to 20 characters?
--Mitchua
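On the cropping question: in "%20s" the 20 is a minimum field width, so sprintf pads short strings but never shortens long ones. A precision ("%.20s") or substr() truncates instead. A minimal sketch, with made-up sample strings that are not from the original post:

```perl
use strict;
use warnings;

# Hypothetical sample data, only for illustration.
my $sentence = "The quick brown fox jumps over the lazy dog";

# "%20s" sets a MINIMUM width: short strings are left-padded with spaces,
# long strings pass through unchanged.
my $padded    = sprintf("%20s", "short");
my $unchanged = sprintf("%20s", $sentence);

# A precision (".20") truncates the string to at most 20 characters.
my $cropped = sprintf("%.20s", $sentence);

# substr() with offset 0 and length 20 gives the same result.
my $first20 = substr($sentence, 0, 20);

print "[$padded]\n[$unchanged]\n[$cropped]\n[$first20]\n";
```

If the goal is only to crop, substr() is probably the more direct choice; "%.20s" is handy when the string is already going through a format.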
Mitchua wrote:
Would something like sprintf("%20s", $sentence[0]) work to crop the sentence to 20 characters?
--Mitchua
perldoc -f length:
"length EXPR
length Returns the length in characters of the value of EXPR..."
BUT length() returns the length in bytes when the bytes pragma is used, e.g.:
$x = chr(400);
print "Length is ", length $x, "\n"; # "Length is 1"
printf "Contents are %vd\n", $x; # "Contents are 400"
{
use bytes;
print "Length is ", length $x, "\n"; # "Length is 2"
printf "Contents are %vd\n", $x; # "Contents are 198.144"
}
perldoc bytes for more info.
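A related case, sketched here as an assumption about what the input might look like: if a string still holds undecoded UTF-8 bytes, length() counts each byte as one character until the string is decoded. The sample word and variable names below are made up for illustration; the thread does not say what encoding the original pages used.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "café" as raw UTF-8 bytes (the e-acute takes two bytes).
my $bytes = "caf\xC3\xA9";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

print "undecoded: ", length($bytes), "\n";   # 5
print "decoded:   ", length($chars), "\n";   # 4
```

Note that this only ever changes the count by a few units per non-ASCII character, so it would not by itself explain lengths in the hundreds for a short sentence.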
Cheers,
--
Rich sc*********@yahoo.co.uk
"Mitchua" <mi*****@yahoo.com> wrote in
news:V5********************@news02.bloor.is.net.cable.rogers.com:
To get the number of chars in this case, can I just divide by 8 or something?
Only if your characters are 8 bytes wide!
Do you have an example of input data that exhibits this length()
discrepancy? Can you include the output of something like:
print "[[[$string]]] ", length($string), "\n";
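A slightly extended version of that diagnostic, as a sketch only. The thread never establishes what was in the strings; leftover whitespace from tag removal is just one possibility, and the sample string below is invented to show what the output would look like in that case.

```perl
use strict;
use warnings;

# Hypothetical sample: a short sentence surrounded by whitespace debris.
my $string = "\n\n    Hello world   \t\n";

# The bracketed print suggested above: delimiters expose leading/trailing junk.
print "[[[$string]]] ", length($string), "\n";

# Escape every non-printable-ASCII character so nothing is invisible.
(my $visible = $string) =~ s/([^\x20-\x7E])/sprintf("\\x{%X}", ord $1)/ge;
print "[[[$visible]]]\n";

# Collapse whitespace runs to one space, then trim both ends, and measure again.
(my $clean = $string) =~ s/\s+/ /g;
$clean =~ s/^ | $//g;
print "[[[$clean]]] ", length($clean), "\n";
```

For this sample the raw length is 22 and the cleaned length is 11, so the difference is entirely whitespace.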
--
Eric
$_ = reverse sort qw p ekca lre Js reh ts
p, $/.r, map $_.$", qw e p h tona e; print
"Eric J. Roode" <RE***********@comcast.net> wrote in message
news:Xn*************************@206.127.4.25...
Do you have an example of input data that exhibits this length() discrepancy?
Check out Rich's reply. My problem was that I was using length($sentence)
instead of length $sentence. Once I changed that, it was all good. Thanks
for the reply.
Mitchua
"Mitchua" <mi*****@yahoo.com> wrote in
news:7X********************@news04.bloor.is.net.cable.rogers.com:
Check out Rich's reply. My problem was that I was using length($sentence) instead of length $sentence. Once I changed that, it was all good. Thanks for the reply.
Hmmm. I fail to see how that could possibly make a difference. But hey,
whatever works is good.
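For what it's worth, a quick check supports that doubt: on a plain scalar the two spellings give the same number. The sketch below uses an invented sample string. The one place the forms parse differently is when more of an expression follows, because length is a named unary operator that binds more loosely than the "." operator.

```perl
use strict;
use warnings;

# Hypothetical sample string, 16 characters long.
my $sentence = "A short sentence";

# On a simple scalar, both forms are the same call.
my $with_parens    = length($sentence);
my $without_parens = length $sentence;
print "$with_parens $without_parens\n";   # 16 16

# With a trailing expression the parse differs:
my $whole_expr = length $sentence . "!!";    # length($sentence . "!!"), i.e. 18
my $call_first = length($sentence) . "!!";   # the string "16!!"
print "$whole_expr $call_first\n";           # 18 16!!
```

So if the change really fixed something, some other part of the statement most likely changed along with the parentheses.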
--
Eric
$_ = reverse sort qw p ekca lre Js reh ts
p, $/.r, map $_.$", qw e p h tona e; print