news 2025/8/13 13:36:06

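I am trying to train on data using the NLTK library, following a step-by-step process. The first step worked, but the second step fails with the following error:

TypeError: a bytes-like object is required, not 'list'

I have tried my best to fix it, but I keep getting the same error.

Here is my code: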

from bs4 import BeautifulSoup
import urllib.request

response = urllib.request.urlopen('http://php.net/')
html = response.read()
soup = BeautifulSoup(html, "html5lib")
text = soup.get_text(strip=True)
print(text)

Here is my error:

C:\python\lib\site-packages\bs4\__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html5lib"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 8 of the file E:/secure secure/chatbot-master/nltk.py. To get rid of this warning, change code that looks like this:

BeautifulSoup(YOUR_MARKUP})

to this:

BeautifulSoup(YOUR_MARKUP,"html5lib")

markup_type=markup_type))

Traceback (most recent call last):
  File "E:/secure secure/chatbot-master/nltk.py", line 8, in <module>
    soup = BeautifulSoup(html)
  File "C:\python\lib\site-packages\bs4\__init__.py", line 228, in __init__
    self._feed()
  File "C:\python\lib\site-packages\bs4\__init__.py", line 289, in _feed
    self.builder.feed(self.markup)
  File "C:\python\lib\site-packages\bs4\builder\_html5lib.py", line 72, in feed
    doc = parser.parse(markup, **extra_kwargs)
  File "C:\python\lib\site-packages\html5lib\html5parser.py", line 236, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "C:\python\lib\site-packages\html5lib\html5parser.py", line 89, in _parse
    parser=self, **kwargs)
  File "C:\python\lib\site-packages\html5lib\tokenizer.py", line 40, in __init__
    self.stream = HTMLInputStream(stream, encoding, parseMeta, useChardet)
  File "C:\python\lib\site-packages\html5lib\inputstream.py", line 148, in HTMLInputStream
    return HTMLBinaryInputStream(source, encoding, parseMeta, chardet)
  File "C:\python\lib\site-packages\html5lib\inputstream.py", line 416, in __init__
    self.rawStream = self.openStream(source)
  File "C:\python\lib\site-packages\html5lib\inputstream.py", line 453, in openStream
    stream = BytesIO(source)
TypeError: a bytes-like object is required, not 'list'
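
The last frame of the traceback, BytesIO(source), can be reproduced on its own. The sketch below assumes, purely for illustration, that the value reaching the parser was a list of byte chunks (for example something like the result of readlines()). The posted code calls response.read(), which returns bytes, so this is only a guess at how a list could get there. The helper to_bytes and the sample input are hypothetical.

```python
import io


def to_bytes(markup):
    """Return markup as a single bytes object (hypothetical helper)."""
    if isinstance(markup, (list, tuple)):
        # A list of byte chunks is joined back into one bytes object.
        markup = b"".join(markup)
    if isinstance(markup, str):
        markup = markup.encode("utf-8")
    return markup


# Hypothetical input: a list of chunks instead of one bytes object.
lines = [b"<html>", b"<body>hello</body>", b"</html>"]

error_message = None
try:
    io.BytesIO(lines)  # same call as the last traceback frame
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# After normalising the input, BytesIO accepts it.
stream = io.BytesIO(to_bytes(lines))
```

Printing type(html) just before the BeautifulSoup call is a quick way to check whether this assumption applies to a given script.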

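Have you seen this post: stackoverflow.com/questions/16206380/? You could try get_text: crummy.com/software/BeautifulSoup/bs4/doc/#get-text

I tried running your script and it returns the text just fine. Can you post the detailed error message?

I get an error like this when I run it:

TypeError: a bytes-like object is required, not 'list'

The script runs fine. Please edit the question and add the error message.

I tried to paste the full error, but it would not post @sid2491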

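You can do this by implementing a simple tag stripper.

# Python 2 code: it relies on unicode() and the print statement.
from bs4 import BeautifulSoup, NavigableString

def strip_tags(html, invalid_tags):
    # The parser is named explicitly to avoid the "No parser" warning.
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.findAll(True):
        if tag.name in invalid_tags:
            s = ""
            for c in tag.contents:
                # Recurse into child tags so nested invalid tags are stripped too.
                if not isinstance(c, NavigableString):
                    c = strip_tags(unicode(c), invalid_tags)
                s += unicode(c)
            # Replace the tag with the text collected from its contents.
            tag.replaceWith(s)
    return soup

# Illustrative sample: the original markup was lost when the page was extracted.
html = "Love, <b>Hate</b>, and <i>Happiness</i>"
invalid_tags = ['b', 'i', 'u']
print strip_tags(html, invalid_tags)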

The result is:

Love, Hate, and Happiness
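
For readers who do not have Beautiful Soup installed, the same idea can be sketched with Python 3's standard html.parser module. This is a different technique from the answer above (an event-based parser rather than a parse tree), and the class name TagStripper, the helper strip_tags, and the sample markup are all hypothetical. Comments and doctype declarations are dropped in this minimal version.

```python
from html import escape
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit markup, leaving out the listed tags but keeping their text."""

    def __init__(self, invalid_tags):
        super().__init__(convert_charrefs=True)
        self.invalid_tags = set(invalid_tags)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.invalid_tags:
            # Keep the start tag exactly as it appeared in the input.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, unchanged.
        if tag not in self.invalid_tags:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.invalid_tags:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape text.
        self.parts.append(escape(data, quote=False))


def strip_tags(html, invalid_tags):
    parser = TagStripper(invalid_tags)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


sample = "<p>Love, <b>Hate</b>, and <i>Happines<u>s</u></i></p>"
print(strip_tags(sample, ["b", "i", "u"]))
```

Because the parser only reports events, nested invalid tags need no recursion: each start or end tag is simply kept or skipped as it is encountered.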

Your code works as is.

UserWarning: No parser was explicitly specified is what you get when your statement is soup = BeautifulSoup(html).

The TypeError: a bytes-like object is required, not 'list' error is probably caused by a dependency problem.
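
If a dependency mismatch is suspected, a first step is to see which versions are actually installed. The following is a minimal sketch using importlib.metadata from the standard library (Python 3.8 or later). The package list and the helper name installed_versions are assumptions made for illustration.

```python
from importlib import metadata

# Distribution names as published on PyPI (assumed relevant to this error).
PACKAGES = ("beautifulsoup4", "html5lib", "lxml")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running this inside the same interpreter or virtual environment as the failing script matters, since a different environment may have different packages.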

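The bs4 documentation says that if you do not specify a parser (as in BeautifulSoup(markup)), it uses the best HTML parser installed on the system:

If you don’t specify anything, you’ll get the best HTML parser that’s installed. Beautiful Soup ranks lxml’s parser as being the best, then html5lib’s, then Python’s built-in parser.

On my system, BeautifulSoup(html, "html.parser") works well, with decent speed and no warnings. html.parser ships with Python's standard library.

The documentation also summarizes the advantages and disadvantages of each parser library.

Try BeautifulSoup(html, "html.parser"). It should work.

If you need speed, you can try BeautifulSoup(html, "lxml"). If you do not have lxml's HTML parser, you may need to install it with pip install lxml on Windows.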
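
The ranking quoted above can be made explicit in code, so the parser no longer depends on whatever happens to be installed silently. This is only a rough sketch: pick_parser and make_soup are hypothetical helpers, and the lookup approximates the documented order rather than reproducing Beautiful Soup's internal selection.

```python
import importlib.util

# Order taken from the bs4 documentation quoted above:
# lxml first, then html5lib, then Python's built-in parser.
PREFERRED = ("lxml", "html5lib")


def pick_parser():
    """Return the name of the first preferred parser that is importable."""
    for name in PREFERRED:
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"  # always available in the standard library


def make_soup(markup):
    # Imported lazily so that pick_parser() works without beautifulsoup4.
    from bs4 import BeautifulSoup
    # Naming the parser explicitly also avoids the UserWarning.
    return BeautifulSoup(markup, pick_parser())


print(pick_parser())
```

Note that different parsers can build different trees from the same invalid markup, so pinning one parser name in the script is the more predictable choice when results must match across machines.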

For anyone looking for an answer that works in Python 3:

invalidTags = ['br','b','font']

def stripTags(html, invalid_tags):
    soup = BeautifulSoup(html,"lxml")
    for tag in soup.findAll(True):
        if tag.name in invalid_tags:
            s ="::"
            for c in tag.contents: