removing tags from html file

12
When I read the HTML file, I get the tags as well. After storing it as a text file, the same tags are still there, and when I extract a particular sentence the tags appear in the result too. I don't want them. How can I get my result without the tags?
With regards
Apr 16 '08 #1
5 1385
sukatoa
539 512MB
Can you post the content in that text file?

Update us,
sukatoa
Apr 16 '08 #2
nomad
664 Expert 512MB
Copy and paste the contents into a plain-text application such as Notepad. You can then copy and paste that content back into the application you are using.

nomad
Apr 16 '08 #3
litun
12
<h1>this document contains information about me cse students. students are now doing their project.
they are<I> working </I>in different individual project. generally the <U>document</U> is just like a progess report .</h>
I want to get the word "working", which is in italics, and retrieve the sentence containing that italic word.
Apr 17 '08 #4
sukatoa
539 512MB
You may use split(). For example, given
<I> working </I>
you can split the whole string with the regex "<I>" and then with "</I>" (calling the function twice). It returns the split pieces as an array of Strings.

You can now search for "working". That string should be in the 2nd element of the String array, and so on.
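
A minimal sketch of the split() idea, for illustration only. The class and method names (ItalicSplit, italicWord, sentenceContaining) are made up for this example, and it assumes a single pair of plain <I> ... </I> tags with no attributes and sentences that end with a full stop. It uses one case-insensitive regex for both the opening and closing tag instead of two separate split() calls.

```java
final class ItalicSplit {

    // Text between the first <I> and </I> (case-insensitive), or null if no tag is found.
    static String italicWord(String html) {
        // Splits on <i>, </i>, <I> and </I>; index 1 is the text after the first tag.
        String[] parts = html.split("(?i)</?i>");
        return parts.length >= 2 ? parts[1].trim() : null;
    }

    // First sentence (delimited by '.') containing the word, with all tags stripped.
    static String sentenceContaining(String html, String word) {
        // Naive tag removal: drops anything that looks like <...>.
        String plain = html.replaceAll("<[^>]*>", "");
        for (String sentence : plain.split("\\.")) {
            if (sentence.contains(word)) {
                return sentence.trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<h1>students are now doing their project.\n"
                + "they are<I> working </I>in different individual project.</h>";
        String word = italicWord(html);
        System.out.println(word);
        System.out.println(sentenceContaining(html, word));
    }
}
```

Regex-based stripping like this is fragile on real-world HTML (nested tags, attributes, comments), so it is only reasonable for small, predictable input such as the sample above.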


Based on your example,


Please correct me if I'm wrong,
sukatoa
Apr 17 '08 #5
JosAH
11,448 Expert 8TB
There's no need to read that file and fiddle with those tags yourself. Have a look
at the HTMLEditorKit. It can create an HTMLDocument for you. Given this document
and a Reader, the kit can read the content into the document. The document can
give you an iterator over a certain HTML.Tag. You just want to iterate over (i.e. get
the content of) the <i> ... </i> tags. Read the API documentation for these classes
and interfaces.
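
A rough sketch of that approach, assuming the HTML is already available as a String. The class and method names (ItalicExtractor, italicText) are invented for this example; the Swing calls used are HTMLEditorKit.createDefaultDocument(), HTMLEditorKit.read(Reader, Document, int) and HTMLDocument.getIterator(HTML.Tag).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

final class ItalicExtractor {

    // Collects the text of every <i> ... </i> run in the given HTML.
    static List<String> italicText(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        List<String> result = new ArrayList<>();
        try {
            // Let the kit parse the HTML into the document.
            kit.read(new StringReader(html), doc, 0);
            // Walk over every run of text marked with the <i> tag.
            for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.I);
                    it.isValid(); it.next()) {
                int start = it.getStartOffset();
                int length = it.getEndOffset() - start;
                result.add(doc.getText(start, length).trim());
            }
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>they are <i>working</i> in projects.</p></body></html>";
        System.out.println(italicText(html));
    }
}
```

Once the italic word is known, the sentence around it can be looked up in the plain text of the document, for example via doc.getText(0, doc.getLength()).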

kind regards,

Jos
Apr 17 '08 #6
