scraping image by beautifulsoup

Hi all,
I am trying to scrape a question from chegg.com and save it as an HTML file.
The question pages sometimes contain images.
An image link is either "internal" (a full URL), such as https://media.cheggcdn.com/media/eb7...0307/phpDbKTCI (see the question at https://www.chegg.com/homework-help/...t2-u-q69085812),
or "external" (starting with // and no protocol), such as //d2vlcm61l7u1fs.cloudfront.net/media%2Fb2b%2Fb2b8dcb5-ae0d-4ad1-9156-eda0dd651978%2FphpX4CpFQ.png (see the question at https://www.chegg.com/homework-help/...s-ch-q10531553).
When the link is external, the images do not appear in the saved page.
Error in the browser console:
GET file://d2vlcm61l7u1fs.cloudfront.net/media%2F078%2F078e768f-d236-48fa-aff9-3365467e00d3%2FphpjRcT9F.png net::ERR_INVALID_URL
....
My code:
from bs4 import BeautifulSoup

# `scraper` is not defined in the posted snippet; it is assumed to be a
# requests-style session object created earlier.
url = ''
headers = {
    'authority': 'www.chegg.com',
    ....
    ...
}
a = scraper.get(url, headers=headers)
b = a.content  # was `r.content`, but the response is stored in `a`
soup = BeautifulSoup(b, "html.parser")
c = soup.find("div", {"class": "rKMzl"})
with open("d.html", "w", encoding='utf-8') as file:
    file.write(str(c))
Any suggestions would be appreciated.
Jul 27 '22 #1
dev7060
636 Expert 512MB
errors console
GET file://xxxxxx.xxxxxxxxx.xxx/xxxxxxxxxxx.png net::ERR_INVALID_URL
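file:// points at local storage. A link that starts with // has no protocol of its own, so it takes the protocol of the page that contains it; when the saved HTML file is opened from disk, that protocol is file://. The image URL probably needs an explicit protocol such as https: in front of it to be valid.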
Jul 28 '22 #2
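
One way to apply the suggestion above is to resolve every image src against the URL of the page it was scraped from before saving the HTML. The following is only a minimal sketch: the helper names absolutize_src and fix_image_links are hypothetical (they do not appear in the thread), and it assumes the images are plain img tags with a src attribute, which may not hold for every Chegg page.

```python
from urllib.parse import urljoin


def absolutize_src(src, page_url):
    """Resolve an image URL against the page it came from.

    A protocol-relative URL such as //host/path inherits the scheme of
    page_url (https here); an already absolute URL is returned unchanged.
    """
    return urljoin(page_url, src)


def fix_image_links(tag, page_url):
    """Rewrite the src of every <img> inside a BeautifulSoup tag in place.

    `tag` is assumed to be a BeautifulSoup object or Tag, for example the
    result of soup.find(...) in the question's code.
    """
    for img in tag.find_all("img", src=True):
        img["src"] = absolutize_src(img["src"], page_url)
    return tag
```

With the code from the question, this would mean calling fix_image_links(c, url) just before file.write(str(c)), so that the saved d.html carries https:// links instead of ones the browser resolves as file://.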
