Bytes | Software Development & Data Engineering Community

Is C# appropriate for this ?

I have a task which involves downloading data from some web pages. There
is a lot of messing about with forms (clicking buttons, selecting icons
etc.) as well as parsing HTML to extract tables from the resulting web page.

I am torn as to whether to do this in PHP (or *shudder* Perl) or C#. I
know the C# language is "internet aware" (but so are the other
languages I mentioned).

Obviously, this being a C# ng, no prizes for guessing where the bias
will lie - BUT, I would be very grateful if anyone could point out where
C# may have an edge over the other languages - as well as any "gotchas"
(or drawbacks) I may need to be aware of ...

Jul 7 '06 #1
Bit byte,

Yes, you can definitely do this.

I would recommend that you look into using MSHTML for this. This is the
document object model that Microsoft uses in Internet Explorer for parsing
pages that are downloaded from the web.

While you can use this through C#, you really don't have to use it if
you don't want to. MSHTML is a COM component, and therefore, accessible by
any language that offers access to COM components (.NET does, as do a good
number of other development technologies).

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

"Bit byte" <fl**@flop.comwrote in message
news:V4********************@bt.com...
>I have a task which involves downloading data from some web pages. There is
> a lot of messing about with forms (clicking buttons, selecting icons etc.)
> as well as parsing HTML to extract tables from the resulting web page.
>
> I am torn as to whether to do this in PHP (or *shudder* Perl) or C#. I
> know the C# language is "internet aware" (but so are the other languages
> I mentioned).
>
> Obviously, this being a C# ng, no prizes for guessing where the bias will
> lie - BUT, I would be very grateful if anyone could point out where C# may
> have an edge over the other languages - as well as any "gotchas" (or
> drawbacks) I may need to be aware of ...

Jul 7 '06 #2
On Fri, 07 Jul 2006 03:34:12 +0100, Bit byte wrote:
> I have a task which involves downloading data from some web pages. There
> is a lot of messing about with forms (clicking buttons, selecting icons
> etc.) as well as parsing HTML to extract tables from the resulting web
> page.
>
> I am torn as to whether to do this in PHP (or *shudder* Perl) or C#. I
> know the C# language is "internet aware" (but so are the other
> languages I mentioned).
>
> Obviously, this being a C# ng, no prizes for guessing where the bias
> will lie - BUT, I would be very grateful if anyone could point out where
> C# may have an edge over the other languages - as well as any "gotchas"
> (or drawbacks) I may need to be aware of ...

The C# language knows nothing about the internet at all; it knows as much
as an Eskimo knows about curry. PHP and Perl are old-fashioned interpreters,
so they execute slowly compared to C#.
Jul 7 '06 #3
Bit byte wrote:
> I have a task which involves downloading data from some web pages. There
> is a lot of messing about with forms (clicking buttons, selecting icons
> etc.) as well as parsing HTML to extract tables from the resulting web page.
>
> I am torn as to whether to do this in PHP (or *shudder* Perl) or C#. I
> know the C# language is "internet aware" (but so are the other
> languages I mentioned).
>
> Obviously, this being a C# ng, no prizes for guessing where the bias
> will lie - BUT, I would be very grateful if anyone could point out where
> C# may have an edge over the other languages - as well as any "gotchas"
> (or drawbacks) I may need to be aware of ...

I still haven't bitten the bullet and done more than read a PHP book,
but "anything Perl can do, C# can do better." That may be a slight
exaggeration, as Perl does have libraries for just about everything,
but it's probably not, as the FCL is awfully comprehensive. I do know
that lately all the "screen scraping" applets that I used to do in
Perl, I now do in C#:

* The FCL WebRequest is just as easy to use as the Perl
libraries that synchronously download web pages.

* The FCL Regex can do anything the Perl regex can do ... and more.

* It's just as easy to upload files (via FTP) in C# as in Perl, and
you don't even have to create a temp file.

And the kicker is that the C# is much easier to read.
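
A minimal sketch of the first two points, in current C# rather than anything taken from the post. The names TableScraper, Download and ExtractRows are made up for illustration. The regex approach is an assumption that only holds for simple, well-formed tables with no nesting; WebRequest still works but is marked obsolete in recent .NET versions, where HttpClient is the suggested replacement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: download a page, then pull table rows out of the HTML.
public static class TableScraper
{
    // One <tr>...</tr> row; Singleline lets "." span line breaks.
    private static readonly Regex RowPattern = new Regex(
        @"<tr\b[^>]*>(.*?)</tr>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // One <td> or <th> cell inside a row.
    private static readonly Regex CellPattern = new Regex(
        @"<t[dh]\b[^>]*>(.*?)</t[dh]>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining tag nested inside a cell.
    private static readonly Regex TagPattern = new Regex(@"<[^>]+>");

    // Synchronously fetch a page as text using the FCL WebRequest class.
    public static string Download(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Return each table row as a list of cell texts with markup stripped.
    public static List<List<string>> ExtractRows(string html)
    {
        var rows = new List<List<string>>();
        foreach (Match row in RowPattern.Matches(html))
        {
            var cells = new List<string>();
            foreach (Match cell in CellPattern.Matches(row.Groups[1].Value))
            {
                cells.Add(TagPattern.Replace(cell.Groups[1].Value, "").Trim());
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

Typical use would be TableScraper.ExtractRows(TableScraper.Download(url)). For messy real-world HTML a proper parser is likely to be more robust than regular expressions.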

--
Be the first to review my new book!

.NET 2.0 for Delphi Programmers         www.midnightbeach.com/.net
Delphi skills make .NET easy to learn   In print, in stores.
Jul 7 '06 #4
Bhagat Gurtu wrote:
> On Fri, 07 Jul 2006 03:34:12 +0100, Bit byte wrote:
>> I have a task which involves downloading data from some web pages. There
>> is a lot of messing about with forms (clicking buttons, selecting icons
>> etc.) as well as parsing HTML to extract tables from the resulting web
>> page.
>>
>> I am torn as to whether to do this in PHP (or *shudder* Perl) or C#. I
>> know the C# language is "internet aware" (but so are the other
>> languages I mentioned).
>>
>> Obviously, this being a C# ng, no prizes for guessing where the bias
>> will lie - BUT, I would be very grateful if anyone could point out where
>> C# may have an edge over the other languages - as well as any "gotchas"
>> (or drawbacks) I may need to be aware of ...
>
> The C# language knows nothing about the internet at all; it knows as much
> as an Eskimo knows about curry. PHP and Perl are old-fashioned interpreters,
> so they execute slowly compared to C#.

Hi Bhagat,

I was wondering if you could clarify your point about C# not knowing the
internet at all. I'm wondering what you mean by this.

Thanks.

--
Hope this helps,
Tom Spink
Jul 7 '06 #5
On Fri, 07 Jul 2006 05:51:18 +0100, Tom Spink wrote:

> Hi Bhagat,
>
> I was wondering if you could clarify your point about C# not knowing the
> internet at all. I'm wondering what you mean by this.

The C# language is not concerned with the internet; it is just like any
other computer language specification. One is confusing the language itself
with the library of classes for using the internet. You do not need C# to
use that library of classes; you can use ASP.NET, VB.NET and other
.NET languages.
Jul 7 '06 #6

Bhagat Gurtu wrote:
> On Fri, 07 Jul 2006 05:51:18 +0100, Tom Spink wrote:
>
>> Hi Bhagat,
>>
>> I was wondering if you could clarify your point about C# not knowing the
>> internet at all. I'm wondering what you mean by this.
>
> The C# language is not concerned with the internet; it is just like any
> other computer language specification. One is confusing the language itself
> with the library of classes for using the internet. You do not need C# to
> use that library of classes; you can use ASP.NET, VB.NET and other
> .NET languages.

How can somebody preach about C# being a language and knowing nothing
about the internet etc, and then go on to call ASP.Net a .net
language?? ASP.Net is just another part of the .Net FCL, it's not a
language in the slightest.

Jul 7 '06 #7
Bit byte,
You can use the classes in System.Net to handle much of this. Clicking
buttons, etc. is nothing more than a form post, so if you can figure out the
target you can handle that as well. If you need to do any heavy-duty HTML
parsing to scrape out specific contents, take a look at Simon Mourier's
HtmlAgilityPack, which is written in (Gasp!) -- C#.
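
To illustrate the "form post" point, here is a minimal sketch, assuming the form uses the default application/x-www-form-urlencoded encoding. FormPoster, EncodeForm and Post are hypothetical names; the target URL and the field names are things you would have to read off the page's form markup. HttpWebRequest is part of System.Net but is marked obsolete in recent .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: submit a form the way a browser's submit button would.
public static class FormPoster
{
    // Build the request body: name=value pairs, percent-encoded, joined by '&'.
    public static string EncodeForm(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0)
            {
                body.Append('&');
            }
            body.Append(Uri.EscapeDataString(field.Key));
            body.Append('=');
            body.Append(Uri.EscapeDataString(field.Value));
        }
        return body.ToString();
    }

    // POST the encoded fields to the form's target and return the response text.
    public static string Post(string targetUrl,
                              IEnumerable<KeyValuePair<string, string>> fields)
    {
        byte[] payload = Encoding.UTF8.GetBytes(EncodeForm(fields));

        var request = (HttpWebRequest)WebRequest.Create(targetUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = payload.Length;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(payload, 0, payload.Length);
        }

        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Sites that rely on cookies, hidden fields or client-side script would need more than this, for example a CookieContainer on the request and the hidden field values copied from the downloaded page.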
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com


"Bit byte" wrote:
> I have a task which involves downloading data from some web pages. There
> is a lot of messing about with forms (clicking buttons, selecting icons
> etc.) as well as parsing HTML to extract tables from the resulting web page.
>
> I am torn as to whether to do this in PHP (or *shudder* Perl) or C#. I
> know the C# language is "internet aware" (but so are the other
> languages I mentioned).
>
> Obviously, this being a C# ng, no prizes for guessing where the bias
> will lie - BUT, I would be very grateful if anyone could point out where
> C# may have an edge over the other languages - as well as any "gotchas"
> (or drawbacks) I may need to be aware of ...

Jul 7 '06 #8
ma**********@button-it.co.uk wrote:
> Bhagat Gurtu wrote:
>> On Fri, 07 Jul 2006 05:51:18 +0100, Tom Spink wrote:
>>
>>> Hi Bhagat,
>>>
>>> I was wondering if you could clarify your point about C# not knowing the
>>> internet at all. I'm wondering what you mean by this.
>>
>> The C# language is not concerned with the internet; it is just like any
>> other computer language specification. One is confusing the language itself
>> with the library of classes for using the internet. You do not need C# to
>> use that library of classes; you can use ASP.NET, VB.NET and other
>> .NET languages.
>
> How can somebody preach about C# being a language and knowing nothing about
> the internet etc, and then go on to call ASP.Net a .net language?? ASP.Net is
> just another part of the .Net FCL, it's not a language in the slightest.

I find Bhagat's main point to be clear enough. And correct, AFAIK - any
knowledge of the internet is in applications, not the language itself. If you
disagree, perhaps you could post an example of "internet aware" C# syntax, or a
reference to the relevant portion of the language spec?

I hadn't noticed any particular "preaching" in this thread, not sure what you
mean by that. There may have been a bit of trivial carping though, now that you
mention it . . .

Regards,
-rick-
Jul 7 '06 #9

Rick Lones wrote:
> ma**********@button-it.co.uk wrote:
>> Bhagat Gurtu wrote:
>>> On Fri, 07 Jul 2006 05:51:18 +0100, Tom Spink wrote:
>>>> Hi Bhagat,
>>>>
>>>> I was wondering if you could clarify your point about C# not knowing the
>>>> internet at all. I'm wondering what you mean by this.
>>>
>>> The C# language is not concerned with the internet; it is just like any
>>> other computer language specification. One is confusing the language itself
>>> with the library of classes for using the internet. You do not need C# to
>>> use that library of classes; you can use ASP.NET, VB.NET and other
>>> .NET languages.
>>
>> How can somebody preach about C# being a language and knowing nothing about
>> the internet etc, and then go on to call ASP.Net a .net language?? ASP.Net is
>> just another part of the .Net FCL, it's not a language in the slightest.
>
> I find Bhagat's main point to be clear enough. And correct, AFAIK - any
> knowledge of the internet is in applications, not the language itself. If you
> disagree, perhaps you could post an example of "internet aware" C# syntax, or a
> reference to the relevant portion of the language spec?
>
> I hadn't noticed any particular "preaching" in this thread, not sure what you
> mean by that. There may have been a bit of trivial carping though, now that you
> mention it . . .
>
> Regards,
> -rick-

I would have to own up and admit that my post was pointless really:
although Bhagat's post wasn't completely accurate, you could easily see
what he meant.

My apologies

Jul 7 '06 #10

"Jon Shemitz" <jo*@midnightbeach.comwrote in message
news:44***************@midnightbeach.com...
> Bit byte wrote:
>> I have a task which involves downloading data from some web pages. There
>> is a lot of messing about with forms (clicking buttons, selecting icons
>> etc.) as well as parsing HTML to extract tables from the resulting web
>> page.
>>
>> I am torn as to whether to do this in PHP (or *shudder* Perl) or C#. I
>> know the C# language is "internet aware" (but so are the other
>> languages I mentioned).
>>
>> Obviously, this being a C# ng, no prizes for guessing where the bias
>> will lie - BUT, I would be very grateful if anyone could point out where
>> C# may have an edge over the other languages - as well as any "gotchas"
>> (or drawbacks) I may need to be aware of ...
>
> I still haven't bitten the bullet and done more than read a PHP book,
> but "anything Perl can do, C# can do better." That may be a slight
> exaggeration, as Perl does have libraries for just about everything,
> but it's probably not, as the FCL is awfully comprehensive. I do know
> that lately all the "screen scraping" applets that I used to do in
> Perl, I now do in C#:
>
> * The FCL WebRequest is just as easy to use as the Perl
> libraries that synchronously download web pages.
>
> * The FCL Regex can do anything the Perl regex can do ... and more.
>
> * It's just as easy to upload files (via FTP) in C# as in Perl, and
> you don't even have to create a temp file.
>
> And the kicker is that the C# is much easier to read.

Hmm, I don't think C# is much easier to read. I think that the .Net IDE
makes it easier because of IntelliSense, syntax coloring, regions, etc. I
think that if Perl had all this, it *could* be just as easy to read. Also,
like most other topics relating to *easier to read*, it all depends on the
developer who wrote it. I know people who can write C# that is almost
unreadable (and you'd think it went through a blender).

:) Just a side-note .. I agree that C# is better because (and this is the
best point I'm making)...I like C# better :P

Mythran

Jul 7 '06 #11
