Bytes | Software Development & Data Engineering Community

Simple question on string extraction

Hi,

I'm having to modify a PHP script even though I have little knowledge of PHP
itself. The script extracts specific strings from an html file, and I need
it to extract some further information.

Specifically, each file represents an article written by an author. The
author's name is typically preceded by a 'By' or a 'by', then it goes on
till there's a carriage return.

So for example, the file might contain something like this:

The Need For Regeneration

by <b>John Smith</b>

We have seen the waste that has been produced....

(rest of article)

or

How To Make Lots and Lots of Money Writing PHP

by The Supreme Coder

The first thing you need to know about making money is...

(rest of article)

So I need code that will start searching the file from the beginning for the
words 'by ' or 'By ', then grab everything that follows that until it gets
to a new line and assign that to a variable. In the examples I have given
above, it would grab '<b>John Smith</b>' and 'The Supreme Coder'. I've seen
a function called preg_match which might do the job, but it uses regular
expressions which I have little knowledge of.

Would any person be so kind as to post what arguments I would need to call
this function with?

TIA,

--
Akin

aknak at aksoto dot idps dot co dot uk
Aug 30 '05 #1
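
For what it's worth, a preg_match() call along the lines asked about could look like the sketch below. The function name extract_author_regex is made up for illustration, and the code assumes PHP 7.1 or later (for the type declarations) and that the byline starts its own line, as in both examples. The pattern would also pick up an ordinary sentence that happens to begin a line with "By ...", so treat it as a starting point rather than a finished answer.

```php
<?php
// Hypothetical helper: returns the text that follows the first line-leading
// "by " / "By " up to the end of that line, or null when nothing matches.
function extract_author_regex(string $text): ?string
{
    // ^          start of a line (the m modifier makes ^ and $ work per line)
    // [ \t]*     optional indentation before the word
    // by[ \t]+   the word "by" plus at least one space or tab
    //            (the i modifier lets this match "By" as well)
    // (.+?)      the author text, captured as group 1
    // [ \t\r]*$  optional trailing spaces or a "\r", then the end of the line
    $pattern = '/^[ \t]*by[ \t]+(.+?)[ \t\r]*$/im';

    if (preg_match($pattern, $text, $matches) === 1) {
        return $matches[1];
    }
    return null;
}
```

Called on the first example this should give '<b>John Smith</b>', and on the second 'The Supreme Coder'. If the markup isn't wanted, strip_tags() can be applied to the result afterwards.
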
I've been doing something similar myself, but wanted to avoid the chance of
getting an accidental early string match.

The strpos() function will let you locate a string within another string
(I'm assuming here that you've got the whole html page as a single string),
and, if required, you can specify a starting position.

So something like

$p1 = strpos($rec,"</head>");

would let you get beyond the html header, then

$p2 = strpos($rec," by ",$p1);

would let you find the first occurrence of " by " beyond position $p1 (or
maybe "by<", depending whether there's a space there or not)

then you can search for <b> and </b> in the same way, adjust your sums a
bit, and get

$author = substr($rec,$start,$length);

where $start will probably be something like $p1+3 and $length something
like $p2-$p1-2, or whatever it turns out to be, and whichever way round $p1
and $p2 end up.
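
Put together, those steps might come out roughly like the sketch below. It is only one way of doing the sums: the function name extract_author_strpos is invented here, PHP 7.1 or later is assumed, and it assumes the byline starts its own line and is not the very first line of the page. It uses stripos() so that 'By' and 'by' are both found, and it compares against false with === because a match at position 0 would otherwise look like a failure.

```php
<?php
// Hypothetical helper following the strpos()/substr() steps described above.
// $rec is assumed to hold the whole HTML page as a single string.
function extract_author_strpos(string $rec): ?string
{
    // Skip past the HTML head if there is one, so a "by" in the <title>
    // cannot produce an accidental early match.
    $p1 = stripos($rec, '</head>');
    $offset = ($p1 === false) ? 0 : $p1 + strlen('</head>');

    // Find the first "by " that begins a line after that point.
    $marker = "\nby ";
    $p2 = stripos($rec, $marker, $offset);
    if ($p2 === false) {
        return null;
    }
    $start = $p2 + strlen($marker);

    // The author text runs up to the next line break, or to the end of the
    // string when there is no further line break.
    $end = strpos($rec, "\n", $start);
    $length = ($end === false) ? strlen($rec) - $start : $end - $start;

    // trim() drops a trailing "\r" left over from Windows line endings.
    return trim(substr($rec, $start, $length));
}
```
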
Hope this helps. As an alternative you might try the explode function using
" by " as the string to split $rec on, and then check each array element.
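
A rough sketch of the explode() route follows. The function name extract_author_explode is made up, and PHP 7.1 or later is assumed. It splits on "\nby " rather than " by ", on the assumption that the byline starts its own line as in the original examples, and since explode() is case-sensitive it tries both spellings.

```php
<?php
// Hypothetical helper: split the page on a byline marker with explode() and
// keep the first line of whatever follows the marker.
function extract_author_explode(string $rec): ?string
{
    // Both spellings are tried in turn; "by" is checked first, so it wins
    // even if a "By" line appears earlier in the text.
    foreach (["\nby ", "\nBy "] as $marker) {
        // With a limit of 2, element 0 is the text before the first marker
        // and element 1 is everything after it.
        $parts = explode($marker, $rec, 2);
        if (count($parts) === 2) {
            // Keep only the first line of the remainder.
            $lines = explode("\n", $parts[1], 2);
            return trim($lines[0]);
        }
    }
    return null;
}
```
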


"Epetruk" <no****@blackho le.com> wrote in message
Aug 30 '05 #2
