BeautifulSoup extract certain information after located text?

1 New Member
From an HTML page:

    <div class="peoples-info">
    <ul>
    <li><strong>Gender:</strong> F</li>
    <li><strong>Birthdate:</strong> 00/00/2000</li>
    <li><strong>Family Phone:</strong> 000-000-0000</li>
    <li><strong>Personal Phone:</strong> 000-000-0000</li>
    </ul>
    </div>
    </div>
    <div>
I wanted to extract the values using BeautifulSoup's find_next function, but I could only get it to work with tables, for example:

    for gender in soup.find('td', text='gender:'):
        print(gender.find_next("td").text)
This does not work for the div when I replace "td" with "li". Also, the label and the value are on the same line, with only the formatting differing between them. Is there a way to extract only the values, such as phone numbers and birthdates, without their labels (for example, just "000-000-0000")? Thanks!
Aug 8 '21 #1
SioSio
272 Contributor
Here is a brute-force way:
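    # Find the container, then print each <li>'s text with its <strong> label removed.
    peoplesinfo = soup.find('div', class_='peoples-info')
    for element in peoplesinfo.find_all("li"):
        el = element.find_all("strong")
        # strip() drops the space left between the label and the value
        print(element.text.replace(el[0].text, '').strip())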
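A slightly more targeted variant, offered only as a sketch and not as part of the original reply: since each value is the text node that directly follows its <strong> label, it can probably be read through the label's next_sibling instead of deleting the label from the whole line. The sample HTML below is the snippet from the question with the stray trailing tags dropped, the "html.parser" backend is an assumption, and extract_fields is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Sample markup from the question (stray trailing tags omitted).
html = """
<div class="peoples-info">
<ul>
<li><strong>Gender:</strong> F</li>
<li><strong>Birthdate:</strong> 00/00/2000</li>
<li><strong>Family Phone:</strong> 000-000-0000</li>
<li><strong>Personal Phone:</strong> 000-000-0000</li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_fields(soup):
    """Hypothetical helper: map each <strong> label to the text that follows it."""
    info = {}
    container = soup.find("div", class_="peoples-info")
    for strong in container.find_all("strong"):
        # Label text without surrounding whitespace or the trailing colon.
        label = strong.get_text(strip=True).rstrip(":")
        # The value is the text node right after </strong>, if there is one.
        value = strong.next_sibling
        info[label] = value.strip() if isinstance(value, str) else ""
    return info


fields = extract_fields(soup)
print(fields["Birthdate"])
print(fields["Personal Phone"])
```

Keeping the results in a dict keyed by label means a single value such as the birthdate can be looked up by name rather than by its position in the list.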
Aug 16 '21 #2
