Bytes | Software Development & Data Engineering Community

Using Python to create a web crawler/spider

Hello everyone, and thank you for your time. I am completely new to the field, so I apologize for any ignorance. I am trying to write a Python program that will go to a university web page and retrieve the ISBNs of all the books being used next semester that fit certain criteria. I believe Python can do this, but as I said, I know almost nothing about computer programming. From what I've seen, I believe this can be done very easily. If someone could just give me a starting point, or at least let me know whether Python can do this, I'd appreciate it. Again, thank you for your time!
Jun 1 '10 #1
dwblas
Since you're new to programming, I would suggest starting with a text-mode web browser such as Links:
http://www.jikos.cz/~mikulas/links/download/binaries/
http://links.sourceforge.net/docs/ma...nks-usage.html
Use "links -dump URL > page.txt" to download a page and save it as a text file (-dump prints the rendered page to standard output), which you can then parse to extract whatever data you want.
Jun 2 '10 #2
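The dump-and-parse approach can be sketched in Python. This is a minimal sketch, assuming Links is installed and on the PATH and that the book list prints ISBNs as plain digits, optionally hyphenated. The names dump_page, extract_isbns and isbn_is_valid are hypothetical helpers, not part of any library. The check-digit test filters out look-alike numbers such as phone numbers, though a stray number can still pass by chance.

```python
import re
import subprocess
from pathlib import Path

# ISBN-like run: a digit, 8-15 digits or hyphens, then a digit or X (ISBN-10 check digit).
# Assumption: ISBNs appear as plain text, optionally hyphenated, not space-separated.
CANDIDATE = re.compile(r"\b\d[\d-]{8,15}[\dXx]\b", re.ASCII)


def dump_page(url: str, out_file: str) -> str:
    """Run `links -dump URL`, save the rendered text to out_file, and return it."""
    result = subprocess.run(
        ["links", "-dump", url], capture_output=True, text=True, check=True
    )
    Path(out_file).write_text(result.stdout, encoding="utf-8")
    return result.stdout


def isbn_is_valid(code: str) -> bool:
    """Check the ISBN-10 (mod 11) or ISBN-13 (mod 10) check digit."""
    if len(code) == 10 and code[:9].isdigit() and (code[9].isdigit() or code[9] == "X"):
        digits = [int(c) for c in code[:9]] + [10 if code[9] == "X" else int(code[9])]
        # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
        return sum((10 - i) * d for i, d in enumerate(digits)) % 11 == 0
    if len(code) == 13 and code.isdigit() and code.startswith(("978", "979")):
        # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(code)) % 10 == 0
    return False


def extract_isbns(text: str) -> list[str]:
    """Return the distinct, checksum-valid ISBNs in text, in order of appearance."""
    found = []
    for match in CANDIDATE.finditer(text):
        code = match.group().replace("-", "").upper()
        if isbn_is_valid(code) and code not in found:
            found.append(code)
    return found
```

For example, extract_isbns(dump_page("http://www.example.edu/booklist", "page.txt")) would save the dump and return the ISBNs it finds; the URL is a placeholder. Filtering by "certain criteria" (course, semester) depends on how the page is laid out and is not covered here.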
Glenton
Python can almost certainly do this.

You'll need the urllib and urllib2 libraries, and possibly the regular expression library (re). And a bunch of hours. Good luck!
Jun 2 '10 #3
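For reference, in Python 3 the urlopen function from urllib2 lives in urllib.request, and helpers such as urljoin and urlparse live in urllib.parse. The sketch below covers the "spider" part of the question: a breadth-first crawl that stays on one site. It is a minimal sketch, not a finished crawler. The names crawl, fetch and LinkCollector are hypothetical, the start URL is whatever you supply, max_pages and the one-second delay are arbitrary limits, and it does not check robots.txt (the standard library's urllib.robotparser can do that). Each page's HTML in the returned dict can then be searched with the re module.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url: str) -> str:
    """Download one page with urllib.request.urlopen and decode it to text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, max_pages=50, fetch_page=fetch, delay=1.0):
    """Breadth-first crawl restricted to start_url's host.

    Returns a dict mapping each successfully fetched URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:  # URLError, HTTPError and timeouts all subclass OSError
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" parts before comparing.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return pages
```

The fetch_page parameter exists so the crawl logic can be exercised without a network connection (pass any function that maps a URL to HTML); in normal use you would call crawl with just the start URL.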


