How to get length of string? length() problems

Simplified a bit, I'm parsing HTML documents to get sentences e.g.
my $html = get($URL);
# remove all HTML TAGs...blah blah blah
@sentences = split(/\./, $html);
then I'm trying to determine the number of characters in the sentence.
However, although when I print the sentences they look fine, when I use
length($sentences[0]) I get values in the hundreds for small sentences. Most
documentation I found said "length() returns the number of chars" however,
some said "length() returns the number of bytes". To get the number of
chars in this case, can I just divide by 8 or something?

Thanks for your help.
Mitchua
Jul 19 '05 #1
"Mitchua" <mi*****@yahoo.com> wrote in message
news:V5********************@news02.bloor.is.net.cable.rogers.com...
> Simplified a bit, I'm parsing HTML documents to get sentences e.g.
> my $html = get($URL);
> # remove all HTML TAGs...blah blah blah
> @sentences = split(/\./, $html);
> then I'm trying to determine the number of characters in the sentence.
> However, although when I print the sentences they look fine, when I use
> length($sentences[0]) I get values in the hundreds for small sentences. Most
> documentation I found said "length() returns the number of chars" however,
> some said "length() returns the number of bytes". To get the number of
> chars in this case, can I just divide by 8 or something?


Would something like sprintf("%20s", $sentences[0]) work to crop the sentence
to 20 characters?

--Mitchua
Jul 19 '05 #2
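
On the sprintf question: in Perl's format strings the number in "%20s" is a minimum field width, so it pads short strings and leaves long ones untouched. Cropping needs a precision ("%.20s") or substr(). A minimal sketch, using a made-up sample sentence rather than anything from the thread:

```perl
use strict;
use warnings;

# Hypothetical sample sentence, not taken from the thread.
my $sentence = "The quick brown fox jumps over the lazy dog";

# "%20s" sets a minimum field width: short strings are padded on the left,
# longer strings pass through unchanged.
my $padded = sprintf("%20s", $sentence);

# "%.20s" sets a precision, which for strings acts as a maximum length.
my $cropped = sprintf("%.20s", $sentence);

# substr() with offset 0 and length 20 gives the same result.
my $first20 = substr($sentence, 0, 20);

print length($padded), "\n";    # 43 (nothing was cropped)
print length($cropped), "\n";   # 20
print "[$first20]\n";           # [The quick brown fox ]
```

Both forms count characters, so they only give sensible results once length() itself reports the expected number.
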
Mitchua wrote:
> "Mitchua" <mi*****@yahoo.com> wrote in message
> news:V5********************@news02.bloor.is.net.cable.rogers.com...
> > Simplified a bit, I'm parsing HTML documents to get sentences e.g.
> > my $html = get($URL);
> > # remove all HTML TAGs...blah blah blah
> > @sentences = split(/\./, $html);
> > then I'm trying to determine the number of characters in the sentence.
> > However, although when I print the sentences they look fine, when I use
> > length($sentences[0]) I get values in the hundreds for small sentences.
> > Most documentation I found said "length() returns the number of chars"
> > however, some said "length() returns the number of bytes". To get the
> > number of chars in this case, can I just divide by 8 or something?
>
> Would something like sprintf("%20s", $sentences[0]) work to crop the
> sentence to 20 characters?
>
> --Mitchua


perldoc -f length:

"length EXPR
length Returns the length in characters of the value of EXPR..."

BUT length() returns the length in bytes when the bytes pragma is in effect, e.g.:

$x = chr(400);
print "Length is ", length $x, "\n";     # "Length is 1"
printf "Contents are %vd\n", $x;         # "Contents are 400"
{
    use bytes;                           # byte semantics inside this block only
    print "Length is ", length $x, "\n"; # "Length is 2"
    printf "Contents are %vd\n", $x;     # "Contents are 198.144"
}

See perldoc bytes for more info.

Cheers,
--
Rich
sc*********@yahoo.co.uk
Jul 19 '05 #3
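
Rich's example builds a character above 255 inside the program. A related case, and only a guess at what a fetched page might contain, is text that arrives as undecoded UTF-8 bytes: length() then counts bytes until the string is decoded. The sketch below uses the core Encode module, and the sample string is hypothetical. This effect multiplies the count by a small factor at most, so on its own it would not turn a short sentence into hundreds of characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the UTF-8 bytes for "café", written with escapes so the
# example does not depend on the encoding of this source file.
my $raw = "caf\xC3\xA9";

# Undecoded, length() counts each byte as one character.
my $byte_count = length($raw);

# After decoding, the two bytes C3 A9 become the single character U+00E9.
my $text       = decode('UTF-8', $raw);
my $char_count = length($text);

print "bytes: $byte_count, characters: $char_count\n";   # bytes: 5, characters: 4
```
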
"Mitchua" <mi*****@yahoo.com> wrote in
news:V5********************@news02.bloor.is.net.cable.rogers.com:
> Simplified a bit, I'm parsing HTML documents to get sentences e.g.
> my $html = get($URL);
> # remove all HTML TAGs...blah blah blah
> @sentences = split(/\./, $html);
> then I'm trying to determine the number of characters in the sentence.
> However, although when I print the sentences they look fine, when I
> use length($sentences[0]) I get values in the hundreds for small
> sentences. Most documentation I found said "length() returns the
> number of chars" however, some said "length() returns the number of
> bytes". To get the number of chars in this case, can I just divide by
> 8 or something?


Only if your characters are 8 bytes wide!

Do you have an example of input data that exhibits this length()
discrepancy? Can you include the output of something like:

print "[[[$string]]] ", length($string), "\n";

--
Eric
$_ = reverse sort qw p ekca lre Js reh ts
p, $/.r, map $_.$", qw e p h tona e; print

Jul 19 '05 #4
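
One possible explanation for a short-looking sentence with a length in the hundreds, which the thread never confirms, is whitespace left behind after the tags are stripped: runs of spaces, tabs, and newlines all count toward length() even though a browser would collapse them. A sketch with a made-up string, using the bracketed print suggested above:

```perl
use strict;
use warnings;

# Hypothetical fragment left after stripping tags: the visible text is short,
# but layout whitespace from the HTML source is still attached.
my $string = "\n\n" . (" " x 120) . "Hello world" . ("\t" x 40) . "\n";

# The brackets make leading and trailing padding visible.
print "[[[$string]]] ", length($string), "\n";

# Collapse each whitespace run to a single space, then trim both ends.
(my $clean = $string) =~ s/\s+/ /g;
$clean =~ s/^\s+|\s+$//g;

print "[[[$clean]]] ", length($clean), "\n";   # [[[Hello world]]] 11
```

Here the raw string is 174 characters long, while the cleaned one is 11.
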

"Eric J. Roode" <RE***********@comcast.net> wrote in message
news:Xn*************************@206.127.4.25...
> "Mitchua" <mi*****@yahoo.com> wrote in
> news:V5********************@news02.bloor.is.net.cable.rogers.com:
> > Simplified a bit, I'm parsing HTML documents to get sentences e.g.
> > my $html = get($URL);
> > # remove all HTML TAGs...blah blah blah
> > @sentences = split(/\./, $html);
> > then I'm trying to determine the number of characters in the sentence.
> > However, although when I print the sentences they look fine, when I
> > use length($sentences[0]) I get values in the hundreds for small
> > sentences. Most documentation I found said "length() returns the
> > number of chars" however, some said "length() returns the number of
> > bytes". To get the number of chars in this case, can I just divide by
> > 8 or something?
>
> Only if your characters are 8 bytes wide!
>
> Do you have an example of input data that exhibits this length()
> discrepancy?


Check out Rich's reply. My problem was that I was using length($sentence)
instead of length $sentence. Once I changed that, it was all good. Thanks
for the reply.

Mitchua
Jul 19 '05 #5
"Mitchua" <mi*****@yahoo.com> wrote in
news:7X********************@news04.bloor.is.net.cable.rogers.com:
> Check out Rich's reply. My problem was that I was using
> length($sentence) instead of length $sentence. Once I changed that,
> it was all good. Thanks for the reply.


Hmmm. I fail to see how that could possibly make a difference. But hey,
whatever works is good.

--
Eric
$_ = reverse sort qw p ekca lre Js reh ts
p, $/.r, map $_.$", qw e p h tona e; print
Jul 19 '05 #6
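
For a plain scalar argument the two spellings are equivalent, as the small check below suggests. Parentheses only change the result when something follows the argument, because length is a named unary operator and binds more loosely than the concatenation operator. The sample value is hypothetical, and the thread does not show the code that actually changed.

```perl
use strict;
use warnings;

# Hypothetical sample value, not taken from the thread.
my $sentence = "A short sentence";

# Equivalent for a plain scalar argument.
my $with_parens    = length($sentence);
my $without_parens = length $sentence;
print "$with_parens $without_parens\n";   # 16 16

# With a trailing expression the parse differs.
my $whole  = length $sentence . "!!";     # parsed as length($sentence . "!!")
my $joined = length($sentence) . "!!";    # length first, then concatenation
print "$whole $joined\n";                 # 18 16!!
```
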
