Bytes IT Community

parsing contents of variable for a specific value...

P: n/a
Scaramouche
I have the contents of an HTML page stored in a variable. I would like
to parse out the value of the TITLE tag,
i.e. <TITLE>this_value_is_what_i_want</TITLE>

String title=null;
int startidx=0, endidx=0;

startidx = urlContent.indexOf("<TITLE>");
startidx += 7;
endidx = urlContent.indexOf("</TITLE>");
title = urlContent.substring(startidx, endidx);
System.out.println(title); // This doesn't work for me; it generates an out-of-bounds error message.

Is there a way of doing this with Java 2? I'm trying to stay away from XML since
it's new to me.

Thanks
Jul 17 '05 #1
8 Replies


P: n/a
nos
I would suggest you first check the results of the urlContent.indexOf()
calls to see whether you are getting -1, which is what indexOf returns
when the string is not found (some HTML pages use lowercase tags).
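
The check suggested above can be sketched as follows. This is only an illustration, not code from the thread: the class name TitleExtractor and the helpers extractTitle and indexOfIgnoreCase are hypothetical names made up for the example. It compares the tags case-insensitively with String.regionMatches and returns null instead of calling substring when either tag is missing, since substring throws a StringIndexOutOfBoundsException when the end index is -1.

```java
final class TitleExtractor {

    /** Case-insensitive indexOf; returns -1 when needle is not found at or after 'from'. */
    static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the text between the first pair of title tags, or null if either tag is missing. */
    static String extractTitle(String html) {
        String openTag = "<title>";
        String closeTag = "</title>";

        int open = indexOfIgnoreCase(html, openTag, 0);
        if (open == -1) {
            return null; // no opening tag, so there is nothing to extract
        }
        int start = open + openTag.length();

        // Look for the closing tag only after the opening one.
        int end = indexOfIgnoreCase(html, closeTag, start);
        if (end == -1) {
            return null; // guard: substring(start, -1) would throw
        }
        return html.substring(start, end).trim();
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<html><Title>My Page</Title></html>")); // prints "My Page"
        System.out.println(extractTitle("<html>no title here</html>"));          // prints "null"
    }
}
```

Returning null for a missing tag is just one possible choice; a caller could equally treat it as an empty title or throw its own exception.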
Jul 17 '05 #2

P: n/a
Scaramouche
Thank you for taking the time to try and help out.
I thought the same thing, and when I checked, endidx did contain -1. This
is somewhat confusing since the spelling and case of the closing tag (</TITLE>)
are correct. I thought the slash might be throwing it off, but since it's
a plain string I don't think that's it.
Thanks again!
Jul 17 '05 #3

P: n/a
Greg
Hmm, strange. I compiled and ran your code and it worked just fine.

Here's the exact program I used:

public class Tester {

    public static void main(String[] args) {
        String urlContent = "<TITLE>this_value_is_what_i_want</TITLE>";
        String title = null;
        int startidx = 0, endidx = 0;

        startidx = urlContent.indexOf("<TITLE>");
        startidx += 7; // skip past the 7 characters of "<TITLE>"
        endidx = urlContent.indexOf("</TITLE>");
        System.out.println("startidx: " + startidx + " endidx: " + endidx);
        title = urlContent.substring(startidx, endidx);
        System.out.println(title);
    }
}

And here's the output:

startidx: 7 endidx: 32
this_value_is_what_i_want

Seems OK to me.


Jul 17 '05 #4

P: n/a
nos
I wonder if it might be something to do with an embedded <cr><lf>.
Jul 17 '05 #5

P: n/a
Scaramouche
My variable (htmlContent) contains the source of an entire web page. As I
go through it I get the output below. If I try to assign the value I'm
after to a String variable, I get a NullPointerException.

======
while ((htmlContent = input.readLine()) != null)
{
    System.out.println(htmlContent);
    startIdx = htmlContent.indexOf("<title>");
    startIdx += 7;
    endIdx = htmlContent.indexOf("</title>");
}
// String myTitle = htmlContent.substring(startIdx, endIdx);
System.out.println(startIdx + " " + endIdx);
------output------
6 -1
======

Not sure what I'm doing wrong.
Jul 17 '05 #6

P: n/a
nos
OK, you have to check the particular HTML page,
but it is perfectly fine to have "<title>" on one line and "</title>" on
another line.

I did not mean to imply that it will usually be lowercase, but your program
should be able to handle both.
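
One reading of the loop in the earlier post (an assumption, since the full program isn't shown) is that each readLine() call overwrites htmlContent, so startIdx and endIdx only describe the last line read. A line without either tag gives -1 + 7 = 6 and -1, which would match the "6 -1" output, and once the loop ends htmlContent is null, which would explain the NullPointerException from substring. A minimal sketch of one way around this is to collect all lines into a single string first and search afterwards. The names WholePageTitle, readAll and titleOf are hypothetical, and for brevity the sketch matches lowercase tags only, as in the post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class WholePageTitle {

    /** Reads every line from the reader into one string, keeping a line break after each line. */
    static String readAll(Reader source) throws IOException {
        BufferedReader input = new BufferedReader(source);
        StringBuilder page = new StringBuilder();
        String line;
        while ((line = input.readLine()) != null) {
            page.append(line).append('\n'); // accumulate instead of overwriting
        }
        return page.toString();
    }

    /** Returns the title text, or null if either tag is missing. */
    static String titleOf(String page) {
        int open = page.indexOf("<title>");
        if (open == -1) {
            return null;
        }
        int start = open + "<title>".length();
        int end = page.indexOf("</title>", start);
        if (end == -1) {
            return null;
        }
        return page.substring(start, end).trim();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real page source; the two tags sit on different lines.
        String sample = "<html><head>\n<title>\nMy Page\n</title>\n</head></html>";
        String page = readAll(new StringReader(sample));
        System.out.println(titleOf(page)); // prints "My Page"
    }
}
```

Because the search runs on the whole page, it no longer matters whether the opening and closing tags share a line.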
Jul 17 '05 #7

P: n/a
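Use Java regular expressions (the java.util.regex package) with the DOTALL
option, so that '.' also matches line breaks (MULTILINE only changes how
^ and $ match). Don't search line by line for a specific string; parse the
whole document in a single pass.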
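
A regular-expression version might look like the sketch below. It is an illustration under assumptions, not code from the thread: the class name RegexTitle, the method findTitle and the pattern itself are made up for the example. It uses Pattern.DOTALL, which lets '.' match line terminators, together with Pattern.CASE_INSENSITIVE; Pattern.MULTILINE, by contrast, only changes where ^ and $ match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTitle {

    // "\\b[^>]*" tolerates attributes on the opening tag.
    // "(.*?)" is reluctant, so the match stops at the first closing tag.
    // DOTALL lets '.' cross line breaks; CASE_INSENSITIVE accepts <TITLE> and <title>.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed title text, or null when no title element is found. */
    static String findTitle(String page) {
        Matcher m = TITLE.matcher(page);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String page = "<html><head>\n<TITLE>\n  this_value_is_what_i_want\n</TITLE>\n</head></html>";
        System.out.println(findTitle(page)); // prints "this_value_is_what_i_want"
    }
}
```

A pattern like this is fine for pulling one simple element out of a page, but it is not a general HTML parser.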
Jul 17 '05 #8

P: n/a
hiwa,
I'm not familiar with that package (java.util.regex), so I'll do some
research on it.
Thank you.
Jul 17 '05 #9
