BeautifulSoup extract certain information after located text?

From html page:

    <div class="peoples-info">
    <ul>
    <li><strong>Gender:</strong> F</li>
    <li><strong>Birthdate:</strong> 00/00/2000</li>
    <li><strong>Family Phone:</strong> 000-000-0000</li>
    <li><strong>Personal Phone:</strong> 000-000-0000</li>
    </ul>
    </div>
    </div>
    <div>

I wanted to extract the values using BeautifulSoup's find_next function, but I could only get it to work on tables, for example:

    for gender in soup.find('td', text='gender:'):
        print(gender.find_next("td").text)

This does not work on the div above when I replace "td" with "li", because the label and the value sit in the same <li> element, with only the label wrapped in <strong>. Is there a way to extract only the values, such as phone numbers and birthdates (for example "000-000-0000"), without their labels? Thanks!
Aug 8 '21 #1
SioSio
Here is a brute-force way:
    # Locate the container, then print each <li>'s text minus its <strong> label.
    peoplesinfo = soup.find('div', class_='peoples-info')
    for element in peoplesinfo.find_all("li"):
        el = element.find_all("strong")
        # strip() drops the space left between the label and the value
        print(element.text.replace(el[0].text, '').strip())
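
A less brute-force option may be to read the text node that directly follows each <strong> label, using its next_sibling attribute. The following is only a minimal sketch: the sample HTML is copied from the question, extract_info is a hypothetical helper name, and it assumes every value is plain text placed immediately after the closing </strong> tag.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (the stray trailing tags are omitted).
html = """
<div class="peoples-info">
<ul>
<li><strong>Gender:</strong> F</li>
<li><strong>Birthdate:</strong> 00/00/2000</li>
<li><strong>Family Phone:</strong> 000-000-0000</li>
<li><strong>Personal Phone:</strong> 000-000-0000</li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_info(soup):
    """Hypothetical helper: map each <strong> label to the text after it."""
    info = {}
    container = soup.find("div", class_="peoples-info")
    for label in container.find_all("strong"):
        # "Gender:" -> "Gender"
        key = label.get_text(strip=True).rstrip(":")
        # Assumption: the value is the text node right after </strong>.
        value = label.next_sibling
        info[key] = str(value).strip() if value is not None else ""
    return info


info = extract_info(soup)
print(info["Birthdate"])       # 00/00/2000
print(info["Personal Phone"])  # 000-000-0000
```

Keeping the values in a dict keyed by label makes it possible to pick out only the phone numbers or the birthdate, rather than printing every item. If a value were wrapped in its own tag, next_sibling would return that tag instead of a string, so this sketch would need adjusting.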
Aug 16 '21 #2
