BeautifulSoup parent tag

I have some HTML that I want to extract text from. Here's an example of the HTML:

<p>TEXT I WANT <i> &#8211; </i></p>

Now, there are, obviously, lots of <p> tags in this document. So, find('p') is not a good way to get at the text I want to extract. However, that <i> tag is the only one in the document. So, I thought I could just find the <i> and then go to the parent.

I've tried:

up = soup.select('p i').parent

and

up = soup.select('i')
print(up.parent)

I've also tried .parents, find_all('i'), and find('i'), but I always get:

AttributeError: 'list' object has no attribute 'parent'

What am I doing wrong?

4 Answers

select() and find_all() return a list of matches, and a list has no .parent attribute. find('i') returns the first matching element, or None.

Thus, use:

try:
    up = soup.find('i').parent
except AttributeError:
    up = None  # no <i> element

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<p>TEXT I WANT <i> &#8211; </i></p>')
>>> soup.find('i').parent
<p>TEXT I WANT <i> – </i></p>
>>> soup.find('i').parent.text
u'TEXT I WANT  \u2013 '
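
To make the difference concrete, here is a minimal sketch comparing what each call returns. It assumes bs4 is installed and passes the built-in "html.parser" explicitly, which the demo above does not do; the extra paragraph in the sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup: the first <p> is a hypothetical stand-in for "lots of <p> tags".
html = "<p>other paragraph</p><p>TEXT I WANT <i> &#8211; </i></p>"
soup = BeautifulSoup(html, "html.parser")

matches = soup.select("i")   # a list-like collection of every matching tag
first = soup.find("i")       # a single Tag, or None if nothing matches

# The collection itself has no .parent; index into it to get a Tag first.
parent_via_select = matches[0].parent
parent_via_find = first.parent

print(len(matches))
print(parent_via_find.name)
```

Both routes end at the same <p> element; the only difference is whether you have to pick an item out of the collection before asking for its parent.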

This works:

i_tag = soup.find('i')
my_text = str(i_tag.previous_sibling).strip()

output:

'TEXT I WANT'

As mentioned in other answers, find_all() returns a list, whereas find() returns the first match or None.

If you are unsure whether an <i> tag is present, you can wrap the lookup in a try/except block.
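
A rough sketch of that guarded version, assuming bs4 with the built-in "html.parser". The helper name text_before_i is hypothetical and only used for illustration.

```python
from bs4 import BeautifulSoup

def text_before_i(html):
    """Return the stripped text node just before the first <i>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    i_tag = soup.find("i")
    try:
        sibling = i_tag.previous_sibling
    except AttributeError:
        # find() returned None, so there is no <i> in the document
        return None
    if sibling is None:
        # the <i> is the first child of its parent; nothing precedes it
        return None
    return str(sibling).strip()

print(text_before_i("<p>TEXT I WANT <i> &#8211; </i></p>"))
print(text_before_i("<p>no italics here</p>"))
```

Note that if the node before the <i> is another tag rather than plain text, str() gives back its markup, so this only suits HTML shaped like the example in the question.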

Both select() and find_all() return a list of elements, so loop over the result:

for el in soup.select('i'):
    print(el.parent.text)

soup.select() returns a Python list, so you have to unpack it or index into it, e.g.:

>>> [up] = soup.select('i')
>>> print(up.parent)

or

>>> up = soup.select('i')
>>> print(up[0].parent)
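
If you only ever want the first match, a possible alternative is select_one(), which takes the same CSS selector as select() but returns a single tag or None. This is a sketch under the assumption that your bs4 version provides select_one() (4.4.0 or later) and that "html.parser" is an acceptable parser for your document:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>TEXT I WANT <i> &#8211; </i></p>", "html.parser")

# select_one() returns the first match for the CSS selector, or None.
i_tag = soup.select_one("p i")

# Guard against None so a missing <i> does not raise AttributeError.
parent_text = i_tag.parent.get_text() if i_tag is not None else None
print(parent_text)
```

Unlike the [up] = ... unpacking form, this does not raise when the selector matches zero or several elements; it silently takes the first match or gives None.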
