Scraping images with BeautifulSoup

Hi all,
I am trying to scrape a question from chegg.com and save it as an HTML file. Some questions contain images.
The image link is either "internal", i.e. a full URL such as https://media.cheggcdn.com/media/eb7...0307/phpDbKTCI (see the question at https://www.chegg.com/homework-help/...t2-u-q69085812),
or "external", i.e. a URL that starts with // and has no scheme, such as //d2vlcm61l7u1fs.cloudfront.net/media%2Fb2b%2Fb2b8dcb5-ae0d-4ad1-9156-eda0dd651978%2FphpX4CpFQ.png (see the question at https://www.chegg.com/homework-help/...s-ch-q10531553).
When the link is external, the images do not appear in the saved page.
The browser console shows this error:
GET file://d2vlcm61l7u1fs.cloudfront.net/media%2F078%2F078e768f-d236-48fa-aff9-3365467e00d3%2FphpjRcT9F.png net::ERR_INVALID_URL
....
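My code:
    from bs4 import BeautifulSoup

    # `scraper` is a session object created earlier (not shown in this post)
    url = ''
    headers = {
        'authority': 'www.chegg.com',
        ....
        ...
    }
    a = scraper.get(url, headers=headers)
    b = a.content  # the post had `r.content`, but the response is stored in `a`
    soup = BeautifulSoup(b, "html.parser")
    # Select the question container by its class name
    c = soup.find("div", {"class": "rKMzl"})
    # Save only that fragment as a standalone HTML file
    with open("d.html", "w", encoding='utf-8') as file:
        file.write(str(c))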
Any suggestions would be appreciated.
1 Week Ago #1
dev7060
581 Expert 512MB
errors console
GET file://xxxxxx.xxxxxxxxx.xxx/xxxxxxxxxxx.png net::ERR_INVALID_URL
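file:// is the scheme a browser uses to look on the local file system. A URL that starts with // has no scheme of its own, so when the saved page is opened from disk it probably inherits file:// from the page. The image URL likely needs an explicit scheme such as https:// to be valid.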
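To illustrate the point above, here is a minimal sketch of one way to give the scheme-less image links an explicit scheme before the fragment is saved. It is not taken from the original post: the names `absolutize`, `fix_image_sources` and `ASSUMED_PAGE_URL` are hypothetical, the local file path is made up, and it assumes the page was served over HTTPS. Only `urljoin` from the standard library is used, and the second function expects a BeautifulSoup tag such as the `c` returned by `soup.find(...)`.

```python
from urllib.parse import urljoin

# Assumption: the scraped page was served over HTTPS from this address.
ASSUMED_PAGE_URL = "https://www.chegg.com/"


def absolutize(src, page_url=ASSUMED_PAGE_URL):
    """Resolve an image src against the URL of the page it came from."""
    return urljoin(page_url, src)


def fix_image_sources(fragment, page_url=ASSUMED_PAGE_URL):
    """Rewrite the src of every <img> under a BeautifulSoup tag, in place.

    Returns the number of src attributes that were changed.
    """
    changed = 0
    for img in fragment.find_all("img"):
        src = img.get("src")
        if not src:
            continue  # no src attribute, nothing to resolve
        fixed = absolutize(src, page_url)
        if fixed != src:
            img["src"] = fixed
            changed += 1
    return changed


# Why the saved copy fails: opened from disk, the page's own URL uses
# file://, so a src starting with // inherits that scheme.
# (The local path below is hypothetical.)
broken = urljoin("file:///home/user/d.html",
                 "//d2vlcm61l7u1fs.cloudfront.net/img.png")

# Resolved against the original HTTPS page instead, the link gets https://.
fixed = absolutize("//d2vlcm61l7u1fs.cloudfront.net/img.png")

print(broken)  # file://d2vlcm61l7u1fs.cloudfront.net/img.png
print(fixed)   # https://d2vlcm61l7u1fs.cloudfront.net/img.png
```

In the script from the question, this would presumably be called as `fix_image_sources(c, url)` just before `file.write(str(c))`, with `url` set to the real question address. Links that already carry a scheme are left unchanged, and other attributes that may hold image URLs (for example `srcset`) are not handled in this sketch.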
1 Week Ago #2
