REGEX Question: Get filenames and alt-tags from html

I have 2 problems with regular expressions:

1) I want to get the image filenames from a website:

e.g. src="http://bytes.com/images/world.jpg", src=images/world.jpg and src='images/world.jpg' should result in "world"

    $imgfilenames=preg_match_all('/^[0-9A-Za-z_ ](.jpg|.gif|.JPG|.GIF|.png|.PNG)$/i', $html, $matches);
2) I want to get the alt descriptions from images in a website

    $alttags=preg_match_all('/<img[^>]*alt="([^"]*)"/i', $html, $matches);
But both my expressions don't work because I don't really get it ;-).
May 28 '10 #1
Atli
5,058 Expert 4TB
Your first expression is close, except that the first character class matches just a single character. If you want a class to cover more than one character, it needs to be followed by one of these quantifiers:
  • * - Any number of characters, (including none).
  • ? - 0 or 1 characters.
  • + - One or more characters.
  • {min,max} - At least min and at most max characters.
For example:
    [a-zA-Z0-9]+
This matches one or more alphanumeric characters. If you remove the + it only matches one (which is what your expression does).
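To show the quantifier in context, here is a rough sketch of how the filename pattern could look. It is only one possible approach, and it makes a few assumptions that are not in the original question: the sample `$html` string is made up, filenames consist only of letters, digits and underscores, and the `^` and `$` anchors are dropped (they tie the match to the start and end of the whole string, so they cannot match in the middle of a page). The dot is escaped because an unescaped `.` matches any character.

```php
<?php
// Made-up sample input covering the three quoting styles from the question.
$html = '<img src="http://bytes.com/images/world.jpg" alt="World">'
      . '<img src=images/map.gif>'
      . "<img src='images/Logo_2.PNG' alt='Logo'>";

// src=, an optional quote, an optional path ending in "/", then the captured
// base name (one or more characters, thanks to the +), then an escaped dot
// and a known extension. The i flag makes the whole match case-insensitive,
// so the extensions only need to be listed once.
$pattern = '/src=["\']?(?:[^"\'\s>]*\/)?([0-9A-Za-z_]+)\.(?:jpe?g|gif|png)/i';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the first capture group of every match: the base names.
$names = $matches[1];

print_r($names);
```

For this sample input, `$names` ends up holding `world`, `map` and `Logo_2`. Note that `preg_match_all` itself returns the number of matches, so the results have to be read from `$matches`, not from the return value.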

See regular-expressions.info for more details.

P.S.
It is usually better to parse HTML documents using a DOM parser, or one of the XML parsers. Regular expressions are poorly suited to parsing large chunks of HTML, because HTML is not a "regular" language.
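As an illustration of the DOM route, here is a minimal sketch using PHP's bundled DOMDocument class. It assumes the DOM extension is enabled (it usually is), and the sample `$html` string and the `$images` array layout are invented for the example. It collects both the filename and the alt text of every image in one pass.

```php
<?php
// Made-up sample input: double-quoted, single-quoted and unquoted src values.
$html = '<p><img src="http://bytes.com/images/world.jpg" alt="A world map">'
      . "<img src='images/logo.png' alt='Site logo'>"
      . '<img src=images/plain.gif></p>';

// Real-world HTML is rarely valid, so collect parser warnings internally
// instead of having them printed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$images = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // Take the path part of the URL, then the file name without its extension.
    $path = (string) parse_url($img->getAttribute('src'), PHP_URL_PATH);
    $images[] = [
        'name' => pathinfo($path, PATHINFO_FILENAME),
        // getAttribute() returns an empty string when the attribute is missing.
        'alt'  => $img->getAttribute('alt'),
    ];
}

print_r($images);
```

The parser deals with the quoting styles itself, so no pattern has to anticipate them. An image without an alt attribute simply gets an empty string.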
May 29 '10 #2
