extract character strings from displayed web page.

I'm an experienced C programmer, but I have never worked with any sort
of internet programming. I would like to write a program to search for
certain character strings in a currently displayed web page, and then
get the string that immediately follows the one that I searched for. It
seems like an easy thing to do; after all, the stuff that I want is
staring me right in the face, but I have no idea where that stuff is
stored or how to access it.
Thanks

Ron
Nov 13 '05 #1
As this is highly OS- and application-dependent, you should ask in a
newsgroup dedicated to your platform; comp.lang.c is only about portable
ISO C. See http://www.angelfire.com/ms3/bchambl...me_to_clc.html
Regards
--
Irrwahn
(ir*******@freenet.de)
Nov 13 '05 #2
Greetings.

Interfacing with web browsers or with the HTTP protocol is not something
built into C, so there is no standard answer to your query. It will depend
on your particular compiler, operating system, and/or whatever third-party
libraries you use. If you assume that the user has already saved the HTML
file to disk, however, then it's just a regular text file which C can
process.

Note that C isn't particularly well-suited for intensive text processing,
though; unless it's being integrated in a much larger C program, it would
be better and faster to write the sort of application you describe using
some regexp-based tool such as sed or perl.

Regards,
Tristan

--
_
_V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
(7_\\ http://www.nothingisreal.com/ >< To finish what you
Nov 13 '05 #3
Tristan Miller <ps********@nothingisreal.com> spoke thus:
Note that C isn't particularly well-suited for intensive text processing,
though; unless it's being integrated in a much larger C program, it would
be better and faster to write the sort of application you describe using
some regexp-based tool such as sed or perl.


Why do you say that? (note that this is an honest question, not a challenge)

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org   | don't, I need to know. Flames welcome.
Nov 13 '05 #4
Greetings.

In article <bm**********@chessie.cirr.com>, Christopher Benson-Manica wrote:
Note that C isn't particularly well-suited for intensive text processing,
though; unless it's being integrated in a much larger C program, it would
be better and faster to write the sort of application you describe using
some regexp-based tool such as sed or perl.


Why do you say that? (note that this is an honest question, not a
challenge)


It's simply a question of specialization of tools. It's certainly possible
to drive in a nail using the blunt end of a screwdriver, though it would be
faster and less accident-prone to use a hammer. Likewise, building
applications (even small ones) that deal almost exclusively with text
processing is usually more efficient (with respect to development time and
ease of debugging, not necessarily execution speed) when using a language
specifically devoted to that task.

A program to do regular-expression search-and-replace on multiple files is
literally four characters long in sed (not counting the filenames and
regular expressions themselves); the corresponding program in C would
necessarily be several lines long, even if one used a third-party regexp
library. You would need to include the regexp and stdio headers, define the
main function, declare a file pointer, open each file in argv[] for reading
(including error checking), loop through each line of the file, do the
regexp replacement, write out the new line, close the file, and finally
return from main.

Sure, the compiled C program might run a hundred times faster than the
corresponding interpreted sed or perl code, but if it's just a one-off
program, you've wasted five minutes writing the C program and 0.00001
seconds running it, versus five seconds writing the sed program and 0.001
seconds running it.

Regards,
Tristan

Nov 13 '05 #5
Tristan Miller <ps********@nothingisreal.com> spoke thus:
It's simply a question of specialization of tools. It's certainly possible
to drive in a nail using the blunt end of a screwdriver, though it would be
faster and less accident-prone to use a hammer.


You've clearly never hit your thumb with a hammer ;) Seriously, on reading
your original post, I thought you were speaking of execution efficiency, which
you were not. No complaints from me in that case... Would you say, then,
that C is pretty good for text processing as far as execution efficiency is
concerned?

Nov 13 '05 #6
Greetings.

In article <bm**********@chessie.cirr.com>, Christopher Benson-Manica wrote:
You've clearly never hit your thumb with a hammer ;) Seriously, on
reading your original post, I thought you were speaking of execution
efficiency, which you were not. No complaints from me in that case...
Would you say, then, that C is pretty good for text processing as far as
execution efficiency is concerned?


Optimization for speed and memory use is compiler-dependent, but generally
speaking, yes, a well-written algorithm in compiled C will be faster at
running text processing applications than the same application executed in
interpreted sed. With C, you're simply "closer to the hardware", plus
there's no need to load in a potentially huge interpreter every time you
want to run your application.

In this day and age, however, you aren't going to gain that much in text
processing even if you do use C. The bottleneck in the application is more
likely to be the inherent sloth of I/O rather than inefficient code. I
work in natural-language processing, and I can attest that even those
researchers who routinely process text corpora ranging into the gigabytes
don't flinch at using high-level text- or logic-oriented languages like
Perl or Prolog to munge the data. We tend to use C more for plain old
number-crunching, as with the large co-occurrence matrices the
aforementioned mungers may produce.

Regards,
Tristan

Nov 13 '05 #7
You'll need some library to open the socket and fetch the page; this is not
part of standard C. You'll also need to decide exactly what you mean by 'string'
and 'after' if you are fetching (as is normal) an HTML page; you can also find
libraries to parse HTML if you need that. If you don't need to parse the HTML,
you can just read the socket stream with stdio and use a state table,
strstr(), or other such techniques to scan the input.

You can also use something other than C. Scripting languages can do this kind
of stuff in half a dozen lines.

--
Derk Gwen http://derkgwen.250free.com/html/index.html
What kind of convenience store do you run here?
Nov 13 '05 #8
