Extract all Img Src tags using Java Regular Expression

Hi,

I have a huge string containing HTML tags, some of which are <img src="URL"> tags. I need to extract the URLs from all occurrences of these tags in the input string. This is what I am doing:

Pattern p = null;
Matcher m = null;
String word0 = null;
String word1 = null;

p = Pattern.compile(".*<img[^>]*src=\"([^\"]*)", Pattern.CASE_INSENSITIVE);
m = p.matcher(txt);
while (m.find())
{
    word0 = m.group(1);
    System.out.println(word0.toString());
}

The problem with this code is that it prints only the last URL. For example, if there are 5 <img src="URL"> tags, it prints only the URL contained within the 5th <img src> tag. Please tell me how to solve this.

Thanking you in advance
Jan 23 '08 #1
BigDaddyLH
Usually when someone wants to extract tags from XML or HTML, it makes sense to parse the input using a proper XML/HTML parser. Have you considered that? For example, what about HTML comments -- they may contain what looks like an image tag...
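A minimal sketch of that parser route, assuming you are happy to use the HTML parser that ships with the JDK in javax.swing.text.html (a dedicated HTML library may cope better with messy markup). The class name ImgSrcParserDemo and the method extractSrcs are made up for illustration. The parser reports comments through a separate callback, so an img tag inside a comment is not expected to reach handleSimpleTag.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not taken from the thread.
class ImgSrcParserDemo {

    // Collects the src attribute of every <img> element the parser reports.
    static List<String> extractSrcs(String html) {
        final List<String> urls = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no closing tag, so it arrives here as a "simple" tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        urls.add(src.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared inside the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }

    public static void main(String[] args) {
        String txt = "<p><img src=\"a.png\"> text <IMG SRC='b.jpg'></p>";
        for (String url : extractSrcs(txt)) {
            System.out.println(url);
        }
    }
}
```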
Jan 23 '08 #2
Change while (m.find()) to if (m.find()) and you get the src of the first img tag.
Change the while to a for loop and you get as many as you want.
Jan 31 '13 #3
Anas Mosaad
This happens because regex quantifiers are greedy: they match as much as they can. How is that related to your case? It's the .* at the beginning of your expression. It grabs the largest possible match, which is everything up to the start of your last img tag. If you moved the .* to the end of your expression, it would match only the first one. If you want to get all the images, just drop it, giving something like this:
// Inside a character class only a leading ^ negates, so the class is [^\"'] rather than [^\"^'].
p = Pattern.compile("<img[^>]*src=[\"']([^\"']*)",
                Pattern.CASE_INSENSITIVE);
P.S. I added support for ' as well as " as a valid delimiter for the src URL.
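To see the difference, here is a small self-contained sketch that runs the original pattern and one without the leading .* over the same single-line input. The class name ImgSrcRegexDemo, the method extract and the sample HTML are made up for illustration. Note that . does not match line terminators by default, so with multi-line input the original pattern would return the last img tag of each line rather than of the whole string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the thread.
class ImgSrcRegexDemo {

    // Pattern from the question: the greedy ".*" runs to the end of the line first.
    static final Pattern WITH_PREFIX = Pattern.compile(
            ".*<img[^>]*src=\"([^\"]*)", Pattern.CASE_INSENSITIVE);

    // Without ".*": each find() resumes right after the previous match.
    static final Pattern WITHOUT_PREFIX = Pattern.compile(
            "<img[^>]*src=[\"']([^\"']*)", Pattern.CASE_INSENSITIVE);

    // Returns capture group 1 of every match of the given pattern.
    static List<String> extract(Pattern pattern, String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String txt = "<img src=\"a.png\"> text <img src=\"b.jpg\">";
        System.out.println(extract(WITH_PREFIX, txt));    // prints [b.jpg]
        System.out.println(extract(WITHOUT_PREFIX, txt)); // prints [a.png, b.jpg]
    }
}
```

This is still a regex over markup, so it will also pick up attributes such as data-src and anything inside HTML comments.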

@adeel809, an if will match only once. He will never be able to get all the images that way.

@BigDaddyLH, this is a very simple case that doesn't require that much sophistication.
Feb 2 '13 #4
ibilal
Just change
word0=m.group(1);
to
word0=m.group();
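For anyone trying this, a quick sketch of what the two calls return may help: group() gives the whole matched text, while group(1) gives only the part captured by the first pair of parentheses. The class name GroupDemo and the method wholeAndCaptured are made up for illustration, and the pattern is the one from the question without the leading .*.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the thread.
class GroupDemo {

    // Returns { whole match, first capture group } for the first img tag, or an empty array.
    static String[] wholeAndCaptured(String html) {
        Matcher m = Pattern.compile("<img[^>]*src=\"([^\"]*)", Pattern.CASE_INSENSITIVE)
                .matcher(html);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] parts = wholeAndCaptured("<img src=\"a.png\">");
        System.out.println(parts[0]); // prints <img src="a.png
        System.out.println(parts[1]); // prints a.png
    }
}
```

So group() alone does not fix the "only the last URL" problem; it only changes how much of each match is printed.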
May 19 '17 #5