
extract character strings from displayed web page.

I'm an experienced C programmer, but I have never worked with any sort
of internet programming. I would like to write a program to search for
certain character strings in a currently displayed web page, and then
get the string that immediately follows the one that I searched for. It
seems like an easy thing to do; after all, the stuff that I want is
staring me right in the face, but I have no idea where that stuff is
stored or how to access it.
Thanks

Ron
Nov 13 '05 #1
ot****@cs.com (A Causal) wrote:
> I'm an experienced C programmer, but I have never worked with any sort
> of internet programming. I would like to write a program to search for
> certain character strings in a currently displayed web page, and then
> get the string that immediately follows the one that I searched for. It
> seems like an easy thing to do; after all, the stuff that I want is
> staring me right in the face, but I have no idea where that stuff is
> stored or how to access it.


As this is highly OS/application dependent, you should ask this in a
newsgroup dedicated to that, as comp.lang.c is only about portable
ISO-C. See http://www.angelfire.com/ms3/bchambl...me_to_clc.html
Regards
--
Irrwahn
(ir*******@freenet.de)
Nov 13 '05 #2
Greetings.

In article <29**************************@posting.google.com >, A Causal
wrote:
> I'm an experienced C programmer, but I have never worked with any sort
> of internet programming. I would like to write a program to search for
> certain character strings in a currently displayed web page, and then
> get the string that immediately follows the one that I searched for. It
> seems like an easy thing to do; after all, the stuff that I want is
> staring me right in the face, but I have no idea where that stuff is
> stored or how to access it.


Interfacing with web browsers or the HTTP protocol is not something which is
built into C, so there is no standard answer to your query. It will depend
on your particular compiler, operating system, and/or whatever third-party
libraries you use. If you assume that the user has already saved the HTML
file to disk, however, then it's just a regular text file which C can
process.
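
[Editor's sketch] For the saved-to-disk case described above, a minimal example in portable C might look like the following. The helper names `slurp_file` and `token_after` are hypothetical, and "the string that immediately follows" is assumed to mean the next whitespace-delimited token. No attempt is made to interpret HTML markup, so a tag that follows the search key is returned as-is.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a NUL-terminated heap buffer.
   Returns NULL on failure; the caller frees the result. */
char *slurp_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }
    /* Always leave one byte free for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (cap - len - 1 == 0) {          /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    fclose(fp);
    return buf;
}

/* Find the first occurrence of key in text and copy the
   whitespace-delimited token that follows it into out
   (at most outsz - 1 characters). Returns 1 if key was found, else 0. */
int token_after(const char *text, const char *key, char *out, size_t outsz)
{
    const char *p = strstr(text, key);
    size_t i = 0;

    if (p == NULL || outsz == 0)
        return 0;
    p += strlen(key);
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;                               /* skip whitespace after the key */
    while (*p != '\0' && !isspace((unsigned char)*p) && i + 1 < outsz)
        out[i++] = *p++;
    out[i] = '\0';
    return 1;
}
```

A caller would load the page with `slurp_file("page.html")`, pass the buffer to `token_after()` together with the search key, and `free()` the buffer afterwards. Reading the whole file first keeps the search simple, at the cost of holding the page in memory.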

Note that C isn't particularly well-suited for intensive text processing,
though; unless it's being integrated in a much larger C program, it would
be better and faster to write the sort of application you describe using
some regexp-based tool such as sed or perl.

Regards,
Tristan

--
_
_V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
(7_\\ http://www.nothingisreal.com/ >< To finish what you
Nov 13 '05 #3
Tristan Miller <ps********@nothingisreal.com> spoke thus:
> Note that C isn't particularly well-suited for intensive text processing,
> though; unless it's being integrated in a much larger C program, it would
> be better and faster to write the sort of application you describe using
> some regexp-based tool such as sed or perl.


Why do you say that? (note that this is an honest question, not a challenge)

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 13 '05 #4
Greetings.

In article <bm**********@chessie.cirr.com>, Christopher Benson-Manica wrote:
>> Note that C isn't particularly well-suited for intensive text processing,
>> though; unless it's being integrated in a much larger C program, it would
>> be better and faster to write the sort of application you describe using
>> some regexp-based tool such as sed or perl.
>
> Why do you say that? (note that this is an honest question, not a
> challenge)


It's simply a question of specialization of tools. It's certainly possible
to drive in a nail using the blunt end of a screwdriver, though it would be
faster and less accident-prone to use a hammer. Likewise, building
applications (even small ones) which deal almost exclusively with text
processing is usually more efficient (with respect to development time and
ease of debugging, not necessarily execution speed) when using a language
specifically devoted to that task. A program to do regular expression
search-and-replacement on multiple files is literally four characters long
in sed (not counting the filenames and regular expressions themselves); the
corresponding program in C would necessarily be several lines long, even if
one used a third-party regexp library. You would need to include the
regexp and stdio headers, define the main function, declare a file pointer,
open each file in argv[] for reading (including error checking), loop
through each line of the file, do the regexp replacement, write out the new
line, close the file, and finally return from main. Sure, the compiled C
program might run a hundred times faster than the corresponding interpreted
sed or perl code, but if it's just a one-off program, you've just wasted
five minutes to write the C program plus 0.00001 seconds to run it versus
spending five seconds to write the sed program plus 0.001 seconds to run
it.
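
[Editor's sketch] To make the comparison concrete, the core of the C program described above could look roughly like this. It assumes the POSIX `<regex.h>` interface (`regcomp`, `regexec`, `regfree`), which is not part of ISO C; the function names `replace_line` and `replace_in_stream` are hypothetical. Lines longer than the buffer are processed in pieces, and an empty match ends replacement for the current line so that the loop always makes progress.

```c
#include <regex.h>   /* POSIX, not ISO C */
#include <stdio.h>

/* Write line to out with every non-empty match of re replaced by repl. */
void replace_line(const regex_t *re, const char *line,
                  const char *repl, FILE *out)
{
    regmatch_t m;
    const char *p = line;
    int flags = 0;

    while (regexec(re, p, 1, &m, flags) == 0 && m.rm_eo > m.rm_so) {
        fwrite(p, 1, (size_t)m.rm_so, out);  /* text before the match */
        fputs(repl, out);                    /* the replacement */
        p += m.rm_eo;                        /* continue after the match */
        flags = REG_NOTBOL;                  /* p is no longer at line start */
    }
    fputs(p, out);                           /* remainder of the line */
}

/* Apply the replacement to every line read from in, writing to out.
   Returns 0 on success, -1 if the pattern is invalid or reading fails. */
int replace_in_stream(const char *pattern, const char *repl,
                      FILE *in, FILE *out)
{
    regex_t re;
    char line[4096];

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (fgets(line, sizeof line, in) != NULL)
        replace_line(&re, line, repl, out);
    regfree(&re);
    return ferror(in) ? -1 : 0;
}
```

A `main` would still have to loop over `argv`, `fopen()` each file with error checking, call `replace_in_stream(pattern, repl, fp, stdout)`, and `fclose()` the file, which is the remaining bookkeeping listed above.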

Regards,
Tristan

Nov 13 '05 #5
Tristan Miller <ps********@nothingisreal.com> spoke thus:
> It's simply a question of specialization of tools. It's certainly possible
> to drive in a nail using the blunt end of a screwdriver, though it would be
> faster and less accident-prone to use a hammer.


You've clearly never hit your thumb with a hammer ;) Seriously, on reading
your original post, I thought you were speaking of execution efficiency, which
you were not. No complaints from me in that case... Would you say, then,
that C is pretty good for text processing as far as execution efficiency is
concerned?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 13 '05 #6
Greetings.

In article <bm**********@chessie.cirr.com>, Christopher Benson-Manica wrote:
> You've clearly never hit your thumb with a hammer ;) Seriously, on
> reading your original post, I thought you were speaking of execution
> efficiency, which you were not. No complaints from me in that case...
> Would you say, then, that C is pretty good for text processing as far
> as execution efficiency is concerned?


Optimization for speed and memory use is compiler-dependent, but generally
speaking, yes, a well-written algorithm in compiled C will be faster at
running text processing applications than the same application executed in
interpreted sed. With C, you're simply "closer to the hardware", plus
there's no need to load in a potentially huge interpreter every time you
want to run your application.

In this day and age, however, you aren't going to gain that much in text
processing even if you do use C. The bottleneck in the application is more
likely to be the inherent sloth of I/O rather than inefficient code. I
work in natural-language processing, and I can attest that even those
researchers who routinely process text corpora ranging into the gigabytes
don't flinch at using high-level text- or logic-oriented languages like
Perl or Prolog to munge the data. We tend to use C more for plain old
number-crunching, as with the large co-occurrence matrices the
aforementioned mungers may produce.

Regards,
Tristan

Nov 13 '05 #7
ot****@cs.com (A Causal) wrote:
# I'm an experienced C programmer, but I have never worked with any sort
# of internet programming. I would like to write a program to search for
# certain character strings in a currently displayed web page, and then
# get the string that immediately follows the one that I searched for. It
# seems like an easy thing to do; after all, the stuff that I want is
# staring me right in the face, but I have no idea where that stuff is
# stored or how to access it.

You'll need some library to open the socket and fetch the page; this is not
part of standard C. You'll also need to decide exactly what you mean by 'string'
and 'after' if you are fetching (as normal) an HTML page; you can also find
libraries to parse HTML if you need that. If you don't need to parse the HTML,
you can just read the socket stream with stdio and use a state table or
strstr() or other such techniques to scan the input.
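
[Editor's sketch] As one illustration of the state-machine approach, the function below reads a stdio stream and keeps only the characters that fall outside `<...>` markup, so that the result can then be searched with strstr(). The name `strip_tags` is hypothetical, and the two-state machine is deliberately naive: it does not handle comments, scripts, character entities, or a `>` inside an attribute value. The `FILE *` is assumed to be an already opened stream, such as a saved page; on POSIX systems a connected socket can be wrapped with `fdopen()`, which is outside standard C.

```c
#include <stdio.h>

enum scan_state { IN_TEXT, IN_TAG };

/* Copy the characters outside <...> from in to out (at most outsz - 1
   characters, NUL-terminated). Returns the number of characters stored. */
size_t strip_tags(FILE *in, char *out, size_t outsz)
{
    enum scan_state st = IN_TEXT;
    size_t len = 0;
    int c;

    if (outsz == 0)
        return 0;
    while (len + 1 < outsz && (c = fgetc(in)) != EOF) {
        switch (st) {
        case IN_TEXT:
            if (c == '<')
                st = IN_TAG;           /* entering a tag: stop copying */
            else
                out[len++] = (char)c;
            break;
        case IN_TAG:
            if (c == '>')
                st = IN_TEXT;          /* tag closed: resume copying */
            break;
        }
    }
    out[len] = '\0';
    return len;
}
```

For the input `<td>Price:</td><td><b>42.50</b></td>` this stores `Price:42.50`, which shows why 'string' and 'after' need defining first: once the markup is dropped, the two table cells run together.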

You can also use something other than C. Scripting languages can do this kind
of stuff in half a dozen lines.

--
Derk Gwen http://derkgwen.250free.com/html/index.html
What kind of convenience store do you run here?
Nov 13 '05 #8
