Extract all Img Src tags using Java Regular Expression

1 New Member
Hi,

I have a huge string containing HTML tags, some of which are <img src="URL"> tags. I need to extract the URLs from all occurrences of these tags in the input string. This is what I am doing:

Pattern p = null;
Matcher m = null;
String word0 = null;
String word1 = null;

p = Pattern.compile(".*<img[^>]*src=\"([^\"]*)", Pattern.CASE_INSENSITIVE);
m = p.matcher(txt);
while (m.find())
{
    word0 = m.group(1);
    System.out.println(word0.toString());
}

The problem with this code is that it prints only the last URL. For example, if there are 5 <img src="URL"> tags, this code prints only the URL contained within the 5th <img src> tag. Please tell me how to solve this.

Thanking you in advance
Jan 23 '08 #1
BigDaddyLH
1,216 Recognized Expert Top Contributor
Usually when someone wants to extract tags from XML or HTML, it makes sense to parse the input using a proper XML/HTML parser. Have you considered that? For example, what about HTML comments -- they may contain what looks like an image tag...
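A minimal sketch of the parser approach, assuming the HTML parser bundled with the JDK (javax.swing.text.html) is acceptable for the input. The class name ImgSrcParser and the sample string are made up for illustration. The parser reports comments through a separate callback, so an img tag inside a comment is not collected here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original thread.
class ImgSrcParser {
    // Collects the src attribute of every <img> element the parser reports.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no closing tag, so it arrives as a "simple" tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        urls.add(src.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared inside the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }

    public static void main(String[] args) {
        String txt = "<p><img src=\"a.png\"></p>"
                + "<!-- <img src=\"hidden.png\"> -->"
                + "<IMG SRC='b.png'>";
        for (String url : extract(txt)) {
            System.out.println(url);
        }
    }
}
```

A dedicated HTML library would probably cope better with badly broken markup; this only shows the general shape of the approach.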
Jan 23 '08 #2
adeel809
1 New Member
Change while (m.find()) to if (m.find()) and you get the src of the first img tag.
Change the while to a for loop and you can get whichever one you want.
Jan 31 '13 #3
Anas Mosaad
185 New Member
Because regex quantifiers are greedy: they take the longest match they can. How is that related to your case? It's the .* at the beginning of your expression. It grabs as much as it can, which is everything up to the start of your last img tag (on each line, since . does not match line terminators by default). If you moved it to the end of your expression, it would match only the first one. If you want to get all images, just drop it, so that you have something like this:
p = Pattern.compile("<img[^>]*src=[\"']([^\"']*)",
                Pattern.CASE_INSENSITIVE);
P.S.: I added support for ' as well as " as a valid delimiter of the src URL.
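To see the difference side by side, here is a small sketch that runs both the pattern from the question and the one without the leading .* over the same input. The class name ImgSrcRegex and the sample string are assumptions for illustration. The character class is written as [^\"'] because a ^ anywhere after the first position in a character class is a literal caret.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class ImgSrcRegex {
    // Pattern from the question: the greedy leading ".*" swallows the earlier tags.
    static final Pattern GREEDY = Pattern.compile(
            ".*<img[^>]*src=\"([^\"]*)", Pattern.CASE_INSENSITIVE);

    // Same idea without ".*", accepting either quote character.
    static final Pattern PER_TAG = Pattern.compile(
            "<img[^>]*src=[\"']([^\"']*)", Pattern.CASE_INSENSITIVE);

    // Returns group 1 of every match that find() locates in txt.
    static List<String> extract(Pattern p, String txt) {
        List<String> urls = new ArrayList<>();
        Matcher m = p.matcher(txt);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String txt = "<p><img src=\"a.png\"> text <IMG alt='x' SRC='b.png'> "
                + "<img src=\"c.png\"></p>";
        System.out.println(extract(GREEDY, txt));  // prints [c.png]
        System.out.println(extract(PER_TAG, txt)); // prints [a.png, b.png, c.png]
    }
}
```

This is still only a sketch: it does not handle unquoted src values, and src= would also match the tail of an attribute such as data-src=.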

@adeel809, an if will match only once. He will never be able to get all the images that way.

@BigDaddyLH, this is a very simple case that doesn't require that much sophistication.
Feb 2 '13 #4
ibilal
1 New Member
Just change
word0=m.group(1);
to
word0=m.group();
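For reference, Matcher.group() with no argument returns the whole matched text, while group(1) returns only the first capturing group. The sketch below (class name GroupDemo and the sample tag are made up) prints both for one match. As far as I can tell, switching to group() changes what is printed for each match, not how many matches find() returns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class GroupDemo {
    static final Pattern IMG_SRC = Pattern.compile(
            "<img[^>]*src=[\"']([^\"']*)", Pattern.CASE_INSENSITIVE);

    // Returns { whole match, first capturing group } for the first match,
    // or an empty array when nothing matches.
    static String[] firstMatch(String txt) {
        Matcher m = IMG_SRC.matcher(txt);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] parts = firstMatch("<img alt=\"logo\" src=\"logo.png\">");
        System.out.println(parts[0]); // prints <img alt="logo" src="logo.png
        System.out.println(parts[1]); // prints logo.png
    }
}
```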
May 19 '17 #5
