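Updated to use a more general approach (see the edit history for the original answer):
You can extract the outer div's own text by keeping only those of its children that are instances of NavigableString.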
from bs4 import BeautifulSoup, NavigableString
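# Outer div with one child tag plus a direct text node (tag names assumed)
html = '''<div>
<div>this is the text i do NOT want</div>
this is the text i want here
</div>'''
soup = BeautifulSoup(html, 'html.parser')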
outer = soup.div
inner_text = [element for element in outer if isinstance(element, NavigableString)]
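This yields a list of the strings that are direct children of the outer div element (the u prefix appears under Python 2 only).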
>>> inner_text
[u'\n', u'\nthis is the text i want here\n']
>>> ''.join(inner_text)
u'\n\nthis is the text i want here\n'
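The joined string still carries the newlines that surround the text in the markup. A minimal sketch of trimming them, assuming the same sample markup as above (the tag names are an assumption) and using only `str.strip` on each piece:

```python
from bs4 import BeautifulSoup, NavigableString

# Assumed sample markup: an outer div with one child div and a direct text node.
html = "<div>\n<div>this is the text i do NOT want</div>\nthis is the text i want here\n</div>"
soup = BeautifulSoup(html, "html.parser")
outer = soup.div

# Keep only the direct string children, strip surrounding whitespace,
# and drop the pieces that were whitespace only.
pieces = [s.strip() for s in outer if isinstance(s, NavigableString)]
clean_text = " ".join(p for p in pieces if p)
print(clean_text)  # this is the text i want here
```

Joining with a space rather than an empty string keeps separate text nodes from running together when the div has more than one.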
For your second example:
# Outer div containing only a direct text node (tag name assumed)
html = '''<div>
this is the text i want here
</div>'''
soup2 = BeautifulSoup(html, 'html.parser')
outer = soup2.div
inner_text = [element for element in outer if isinstance(element, NavigableString)]
>>> inner_text
[u'\nthis is the text i want here\n']
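This also works in other cases, for example when the outer div's text comes before any child tag, between child tags, as several separate text elements, or is absent altogether.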
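Those cases can be checked with a small sketch. The helper name `direct_text` and the sample markup are hypothetical, made up for illustration; the filtering itself is the same `isinstance` test used above.

```python
from bs4 import BeautifulSoup, NavigableString


def direct_text(tag):
    """Return the strings that are direct children of tag (hypothetical helper)."""
    return [str(child) for child in tag.children if isinstance(child, NavigableString)]


def outer_div(markup):
    """Parse markup and return its first div."""
    return BeautifulSoup(markup, "html.parser").div


# Text before any child tag
before = direct_text(outer_div("<div>before<p>child</p></div>"))
# Text between two child tags
between = direct_text(outer_div("<div><p>a</p>middle<p>b</p></div>"))
# Several text elements separated by a child tag
several = direct_text(outer_div("<div>one<p>a</p>two</div>"))
# No direct text at all
nothing = direct_text(outer_div("<div><p>only child</p></div>"))

print(before)   # ['before']
print(between)  # ['middle']
print(several)  # ['one', 'two']
print(nothing)  # []
```

Note that `isinstance(child, NavigableString)` also matches its subclasses such as `Comment`, so HTML comments directly inside the div would be included unless filtered out separately.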