Bytes | Software Development & Data Engineering Community

Parsing JavaScript from HTML file

I want to parse an HTML file in Java that also contains JavaScript, and I want to fetch the contents of the JavaScript tag as well. The tag is SCRIPT. Please help with suggestions / solutions.

I have tried using the Java HTMLEditorKit API, but it does not work for the SCRIPT tag.

Regards,
Sagar
Mar 25 '08 #1
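For comparison, the SCRIPT bodies can also be pulled out without HTMLEditorKit at all. The sketch below swaps in a plain java.util.regex approach, which is not what this thread uses; ScriptExtractor and scriptBodies are hypothetical names. It assumes reasonably well-formed markup with no </script> inside HTML comments or CDATA sections, so it is a quick extraction aid rather than an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the body of every <script> element in an HTML string.
final class ScriptExtractor {

    // (?i) ignores case (SCRIPT vs script); (?s) lets '.' match newlines so
    // multi-line bodies are captured. [^>]* skips the opening tag's attributes,
    // and the lazy (.*?) stops at the first </script> after it.
    private static final Pattern SCRIPT =
            Pattern.compile("(?is)<script\\b[^>]*>(.*?)</script\\s*>");

    private ScriptExtractor() {}

    static List<String> scriptBodies(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            bodies.add(m.group(1)); // group 1 = text between the tags
        }
        return bodies;
    }
}
```

The bodies come back in document order. The case-insensitive flag matters here because the page in this thread writes the tag as SCRIPT.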
JosAH
I haven't tried it, but define an HTML.UnknownTag with "SCRIPT" as its
identifier. Then try to get an HTMLDocument.Iterator from the HTMLDocument
for this tag. With that iterator you should be able to get the text between tags
of the SCRIPT type.

kind regards,

Jos
Mar 25 '08 #2
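A related route through the same javax.swing.text.html package is to skip HTMLDocument and listen on an HTMLEditorKit.ParserCallback directly. As far as I can tell from the JDK's Swing parser, the body of a SCRIPT element is handed to handleComment rather than handleText, so a callback that only overrides handleText may see nothing inside SCRIPT. The sketch below listens to both, so it does not depend on that detail. ScriptCollector and collect are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: records the text found between <SCRIPT> and </SCRIPT>.
final class ScriptCollector extends HTMLEditorKit.ParserCallback {

    private final List<String> scripts = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a SCRIPT element

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.SCRIPT) {
            current = new StringBuilder();
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.SCRIPT && current != null) {
            scripts.add(current.toString());
            current = null;
        }
    }

    // Assumption: the Swing parser reports script bodies as comments...
    @Override
    public void handleComment(char[] data, int pos) {
        if (current != null) {
            current.append(data);
        }
    }

    // ...but text events inside SCRIPT are captured too, in case they arrive this way.
    @Override
    public void handleText(char[] data, int pos) {
        if (current != null) {
            current.append(data);
        }
    }

    static List<String> collect(String html) {
        ScriptCollector cb = new ScriptCollector();
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cb.scripts;
    }
}
```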
You can use the Jericho HTML Parser:
http://jerichohtml.sourceforge.net/doc/index.html

The author says that ASP, JSP, PSP, PHP and Mason server tags are explicitly recognised by the parser.
Mar 26 '08 #3
I have tried that, Jos, but it does not work.


<SCRIPT> document.write ( "<A onclick=\"OpenMSDigestWin()\" target=\"MSDigestWin\" HREF = \"" + msdigestLink + "dna_reading_frame=1&access_method=Accession+Number&hide_protein_sequence=2&open_reading_frame=1&coverage_map=0+138+13+32+14+38+6+25+14+5+12+77+12+34+13+4+14+97+9+23+27&search_cycle=1&accession_num=P02769\">22%</A>" );</SCRIPT>


I want to parse this code and get the text between the anchor tags, i.e. 22%. I am trying to use HTMLEditorKit.ParserCallback, but it does not work for the code above.

Please help me with solutions / suggestions.

Regards,
Mar 28 '08 #4
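Once the script body is in hand, the text between <A ...> and </A> can be pulled out as a second step. The sketch below is a hypothetical helper (AnchorText.anchorTexts) built on a regular expression; it assumes no > character appears inside the anchor's attribute values, which holds for the document.write line above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: returns the link text of every <A ...>...</A> in a string.
final class AnchorText {

    // <a\b[^>]*> = opening anchor tag up to its first '>', (.*?) = link text,
    // </a\s*> = closing tag; (?is) = case-insensitive, '.' also matches newlines.
    private static final Pattern ANCHOR =
            Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a\\s*>");

    private AnchorText() {}

    static List<String> anchorTexts(String script) {
        List<String> texts = new ArrayList<>();
        Matcher m = ANCHOR.matcher(script);
        while (m.find()) {
            texts.add(m.group(1));
        }
        return texts;
    }
}
```

Applied to the body of the SCRIPT element above, this should yield a single entry, 22%. The escaped quotes (\") inside the JavaScript string do not get in the way because the pattern only looks for the > that ends the opening tag.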
I have built my own method that parses the SCRIPT tag. Using HTMLEditorKit.ParserCallback, I divert the control flow to it whenever a SCRIPT tag occurs.

This method should handle any <SCRIPT> tag in an HTML file. I also wanted the <A> placed inside the SCRIPT tag, and I am able to get that as well. Thanks to everyone who contributed to my queries.

Below is the code.

// State kept between calls (fields in the poster's class). br is assumed to be
// a BufferedReader over the HTML file, opened elsewhere; the post does not show it.
// Needs: import java.io.BufferedReader; import java.io.IOException;
private BufferedReader br;

private boolean checkLineForScript = false;
private String tempGlobalString = "";

// Called from the ParserCallback whenever a SCRIPT tag occurs.
// (Hypothetical method name: the post shows only the method body.)
private void parseScriptTag()
{
    String parsedScriptTag = "", tempString = "", fullScriptTag = "",
           scriptStart = "", scriptEnd = "", str = "", actualDataToken = "";

    int getScriptTagStartPosition = 0, getScripttagEndPosition = -1,
        tempPosition = -1, lastindex = 0, anchorTagEndPosition = -1,
        anchorTagStartPosition = -1;

    try
    {
        if (checkLineForScript)
        {
            // The previous call left a second <SCRIPT ... on the same line.
            getScriptTagStartPosition = tempGlobalString.indexOf("<SCRIPT");
            lastindex = tempGlobalString.length();
            getScripttagEndPosition = tempGlobalString.indexOf("</SCRIPT>");

            if (getScriptTagStartPosition > getScripttagEndPosition)
            {
                // No </SCRIPT> on this line: read on until it turns up.
                fullScriptTag = tempGlobalString.substring(getScriptTagStartPosition, lastindex);
                in: while ((scriptEnd = br.readLine()) != null)
                {
                    getScripttagEndPosition = scriptEnd.indexOf("</SCRIPT>");
                    if (getScripttagEndPosition == -1)
                    {
                        fullScriptTag = fullScriptTag + scriptEnd;
                        continue in;
                    }
                    else
                    {
                        // Append (not prepend) the text before </SCRIPT>.
                        tempString = scriptEnd.substring(0, getScripttagEndPosition);
                        fullScriptTag = fullScriptTag + tempString;
                        anchorTagEndPosition = fullScriptTag.indexOf("</A>");
                        if (anchorTagEndPosition > -1)
                        {
                            // The "> nearest before </A> closes the opening <A ...> tag.
                            anchorTagStartPosition = fullScriptTag.lastIndexOf("\">", anchorTagEndPosition);
                            if (anchorTagStartPosition > -1)
                                actualDataToken = fullScriptTag.substring(anchorTagStartPosition + 2,
                                                                          anchorTagEndPosition);
                            //System.out.println(actualDataToken);
                        }
                        System.out.println(fullScriptTag);
                        break in;
                    }
                }
            }
            else if (getScriptTagStartPosition > -1
                     && getScriptTagStartPosition < getScripttagEndPosition)
            {
                // <SCRIPT ... </SCRIPT> on one line.
                parsedScriptTag = tempGlobalString.substring(getScriptTagStartPosition, getScripttagEndPosition);
                anchorTagEndPosition = parsedScriptTag.indexOf("</A>");
                if (anchorTagEndPosition > -1)
                {
                    anchorTagStartPosition = parsedScriptTag.lastIndexOf("\">", anchorTagEndPosition);
                    if (anchorTagStartPosition > -1)
                        actualDataToken = parsedScriptTag.substring(anchorTagStartPosition + 2,
                                                                    anchorTagEndPosition);
                    //System.out.println(actualDataToken);
                }
                System.out.println(parsedScriptTag);
            }
        }
        else
        {
            // Scan forward line by line until one SCRIPT element has been read.
            out: while ((scriptStart = br.readLine()) != null)
            {
                getScriptTagStartPosition = scriptStart.indexOf("<SCRIPT");
                lastindex = scriptStart.length();
                getScripttagEndPosition = scriptStart.indexOf("</SCRIPT>");

                if (getScriptTagStartPosition > getScripttagEndPosition)
                {
                    // <SCRIPT without </SCRIPT>: keep reading until the end tag.
                    fullScriptTag = scriptStart.substring(getScriptTagStartPosition, lastindex);
                    in: while ((scriptStart = br.readLine()) != null)
                    {
                        getScripttagEndPosition = scriptStart.indexOf("</SCRIPT>");
                        if (getScripttagEndPosition == -1)
                        {
                            fullScriptTag = fullScriptTag + scriptStart;
                            continue in;
                        }
                        else
                        {
                            // Append (not prepend) the text before </SCRIPT>.
                            tempString = scriptStart.substring(0, getScripttagEndPosition);
                            fullScriptTag = fullScriptTag + tempString;
                            anchorTagEndPosition = fullScriptTag.indexOf("</A>");
                            if (anchorTagEndPosition > -1)
                            {
                                anchorTagStartPosition = fullScriptTag.lastIndexOf("\">", anchorTagEndPosition);
                                if (anchorTagStartPosition > -1)
                                    actualDataToken = fullScriptTag.substring(anchorTagStartPosition + 2,
                                                                              anchorTagEndPosition);
                                //System.out.println(actualDataToken);
                            }
                            //System.out.println(fullScriptTag);
                            break out;
                        }
                    }
                }
                else if (getScriptTagStartPosition > -1
                         && getScriptTagStartPosition < getScripttagEndPosition)
                {
                    // Whole SCRIPT element on one line.
                    parsedScriptTag = scriptStart.substring(getScriptTagStartPosition, getScripttagEndPosition);
                    anchorTagEndPosition = parsedScriptTag.indexOf("</A>");
                    if (anchorTagEndPosition > -1)
                    {
                        anchorTagStartPosition = parsedScriptTag.lastIndexOf("\">", anchorTagEndPosition);
                        if (anchorTagStartPosition > -1)
                            actualDataToken = parsedScriptTag.substring(anchorTagStartPosition + 2,
                                                                        anchorTagEndPosition);
                        //System.out.println(actualDataToken);
                    }
                    //System.out.println(parsedScriptTag);
                    break out;
                }
            }

            // Check whether another <SCRIPT starts after </SCRIPT> on the same line.
            // scriptStart is null at end of file, so guard before using it.
            if (scriptStart != null && getScripttagEndPosition > -1)
            {
                lastindex = scriptStart.length();
                str = scriptStart.substring(getScripttagEndPosition, lastindex);
                tempPosition = str.indexOf("<SCRIPT");
                lastindex = str.length();
            }
        }
        if (tempPosition > 0)
        {
            checkLineForScript = true;
            tempGlobalString = str.substring(tempPosition, lastindex);
        }
        else
            checkLineForScript = false;
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}
Mar 29 '08 #5
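The line-by-line version above has to carry state (checkLineForScript, tempGlobalString) between calls because a SCRIPT element can span several lines. A simpler sketch, assuming the file is small enough to read into memory, joins all lines first and then walks the SCRIPT elements using the same markers as the post (the "> nearest before </A>, and the </A> itself), matching tags case-insensitively. ScriptAnchorScanner and anchorTextsInScripts are hypothetical names; like the original, it assumes the link text itself contains no tags.

```java
import java.io.BufferedReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: link texts of <A>...</A> elements that sit inside SCRIPT elements.
final class ScriptAnchorScanner {

    private ScriptAnchorScanner() {}

    // indexOf that ignores case, so <SCRIPT>, <script> and </a> all match.
    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i <= s.length() - needle.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    static List<String> anchorTextsInScripts(BufferedReader br) {
        // Read everything first, so a SCRIPT spanning lines needs no extra state.
        // br.lines() rethrows read errors as UncheckedIOException.
        String html = br.lines().collect(Collectors.joining("\n"));
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = indexOfIgnoreCase(html, "<script", pos);
            if (start < 0) break;
            int bodyStart = html.indexOf('>', start);      // end of the opening tag
            if (bodyStart < 0) break;
            int end = indexOfIgnoreCase(html, "</script>", bodyStart);
            if (end < 0) break;                            // unterminated SCRIPT: stop
            String body = html.substring(bodyStart + 1, end);

            // Text between the "> nearest before each </A> and that </A>.
            int from = 0;
            int close;
            while ((close = indexOfIgnoreCase(body, "</a>", from)) >= 0) {
                int open = body.lastIndexOf("\">", close);
                if (open >= from) {
                    result.add(body.substring(open + 2, close));
                }
                from = close + "</a>".length();
            }
            pos = end + "</script>".length();
        }
        return result;
    }
}
```

Anchors outside SCRIPT elements are skipped, since only the text between <script ...> and </script> is searched.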


