Using Regular Expression to extract specific data

10 New Member
Hi everyone,
This is about a project of mine. My plan is as follows: first, print the source code of an HTML page in the DOS window; then use Perl regular expressions to extract specific data from that source; and finally save the data in MySQL tables. Would this work?
Jul 19 '07 #1
numberwhun
3,509 Recognized Expert Moderator Specialist
Um...sure!? Perl's motto is TMTOWTDI ("There's More Than One Way To Do It"), and believe me, there are many ways to do anything you want to, but some are a little more streamlined than others.

Please remember that when posting a question (which in this forum typically involves code), you should post what you have written and tried so far, so that those trying to help you have something to go on. If you don't, we can only assume you haven't tried anything yet, and you may not get many replies, as this isn't a code-writing service.

That said, there are modules to assist you not only in downloading the HTML data but also in parsing the HTML into a tree so you can grab what you need. Once you have the data, you can use another module (DBI) to insert it into your database.
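
As a rough sketch of that flow (one way among many, not the definitive one): the only module named above is DBI, so LWP::UserAgent for the download and HTML::TreeBuilder for the parsing are just one possible choice for the other two steps. The `td class="price"` markup, the `prices` table, and the connection details are made-up placeholders, and the script assumes LWP, HTML::TreeBuilder, DBI, and DBD::mysql are installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;       # downloads the page
use HTML::TreeBuilder;    # parses HTML into a tree of elements
use DBI;                  # database access (DBD::mysql is the MySQL driver)

# Download a page and return its HTML, or die with the HTTP status line.
sub fetch_html {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new( timeout => 10 );
    my $res = $ua->get($url);
    die "Fetch failed: " . $res->status_line . "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Return the text of every <td class="price"> cell.
# The tag and class are hypothetical; adjust them to the real page.
sub extract_prices {
    my ($html) = @_;
    my $tree   = HTML::TreeBuilder->new_from_content($html);
    my @prices = map { $_->as_text }
                 $tree->look_down( _tag => 'td', class => 'price' );
    $tree->delete;    # release the parse tree
    return @prices;
}

# Insert each value through a placeholder so DBI handles the quoting.
# The table and column names are hypothetical.
sub store_prices {
    my ( $dbh, @prices ) = @_;
    my $sth = $dbh->prepare('INSERT INTO prices (price) VALUES (?)');
    $sth->execute($_) for @prices;
    return scalar @prices;
}

# Run the whole pipeline only when a URL is given on the command line.
if ( my $url = shift @ARGV ) {
    # Placeholder DSN and credentials; replace with your own.
    my $dbh = DBI->connect( 'DBI:mysql:database=mydb;host=localhost',
        'user', 'password', { RaiseError => 1, AutoCommit => 1 } );
    my @prices = extract_prices( fetch_html($url) );
    printf "Stored %d rows\n", store_prices( $dbh, @prices );
    $dbh->disconnect;
}

1;
```

Run it with the page URL as the only argument. Walking the parsed tree tends to survive changes in attribute order or whitespace that would break a hand-written regular expression, though a regex can still be fine for very regular pages.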

Regards,

Jeff
Jul 19 '07 #2

