

news 2025/8/14 21:55:45


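The Tesseract OCR engine is available as an open-source project, originally hosted on Google Code; its project home is now https://github.com/tesseract-ocr.
It supports Chinese OCR and ships a command-line tool. The corresponding Python package is pytesseract. With these tools we can recognize the text in an image.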
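
Since the engine ships a command-line tool, it can also be driven directly without pytesseract once it is installed. The following is a minimal sketch, assuming the `tesseract` binary is on PATH and the `chi_sim` language data is available; the helper names `build_tesseract_command` and `run_tesseract_cli` are hypothetical and exist only for illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="chi_sim"):
    """Build the argument list for a one-off command-line OCR run."""
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_tesseract_cli(image_path, lang="chi_sim"):
    """Run the tesseract binary and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

Calling `run_tesseract_cli('data/ocr.png')` should be roughly equivalent to typing `tesseract data/ocr.png stdout -l chi_sim` in a shell.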

1. Install tesseract

yum install tesseract

2. Install pytesseract

pip install pytesseract

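3. Download the Chinese trained data from https://github.com/tesseract-ocr/tessdata: get "chi_sim.traineddata" and copy it into the directory where the trained data is stored. On my installation that directory is: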

/usr/share/tesseract/tessdata
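
A missing or misplaced language file is a common cause of errors later on, so it can be worth checking that the copy succeeded before running OCR. This is a small sketch using only the standard library; the helper name `has_traineddata` is hypothetical, and the directory is the one from this installation, so it may differ on other systems.

```python
import os


def has_traineddata(tessdata_dir, lang="chi_sim"):
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


if __name__ == "__main__":
    # Directory used in this article; adjust it for other installations.
    tessdata_dir = "/usr/share/tesseract/tessdata"
    if has_traineddata(tessdata_dir):
        print("chi_sim.traineddata found")
    else:
        print("chi_sim.traineddata is missing from", tessdata_dir)
```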

4. Recognize Chinese text from Python

import pytesseract
from PIL import Image

# Open the image and run OCR with the Simplified Chinese model
image = Image.open('data/ocr.png')
print(pytesseract.image_to_string(image, lang='chi_sim'))

5. Dealing with low recognition accuracy: boosting the image contrast before running OCR can help

from PIL import Image, ImageEnhance

image = Image.open('data/tesseract.png')
# Boost the contrast by a factor of 4 before passing the image to OCR
enhancer = ImageEnhance.Contrast(image)
image = enhancer.enhance(4)
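
The contrast boost can be combined with grayscale conversion and a simple black/white threshold, which sometimes helps further on noisy images. This is only a sketch: the helper name `preprocess_for_ocr` and the threshold value of 128 are assumptions that do not come from the original steps, and the best values depend on the image.

```python
from PIL import Image, ImageEnhance


def preprocess_for_ocr(image, contrast=4.0, threshold=128):
    """Convert to grayscale, boost contrast, then binarize to black/white."""
    gray = image.convert("L")
    boosted = ImageEnhance.Contrast(gray).enhance(contrast)
    # Pixels brighter than the threshold become white (255), the rest black (0).
    return boosted.point(lambda p: 255 if p > threshold else 0)
```

The result can be passed to `pytesseract.image_to_string(..., lang='chi_sim')` in place of the original image.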

To install tesseract on Windows and configure the environment, see this article: https://segmentfault.com/a/1190000014086067

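A problem encountered when testing on Windows: pytesseract.pytesseract.TesseractError: (1, u'Error opening data file C:\\Progra...... Passing the tessdata directory explicitly through the config argument works around it: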

import platform
import pytesseract
from PIL import Image, ImageEnhance

image = Image.open('data/tesseract.png')
enhancer = ImageEnhance.Contrast(image)
image = enhancer.enhance(4)

if platform.system() == 'Windows':
    # Point tesseract at the tessdata directory explicitly on Windows
    tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    print(pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config))
else:
    print(pytesseract.image_to_string(image, lang='chi_sim'))
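
The platform check can be folded into a small helper so that the OCR call itself stays the same on every system. This is a sketch under assumptions: the helper name `tessdata_config` is hypothetical, and the default Windows path is simply the one used in the snippet above, so it has to be adjusted to the actual installation.

```python
import platform

# Assumed install location on Windows, taken from the path used above.
WINDOWS_TESSDATA_DIR = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"


def tessdata_config(system=None, tessdata_dir=WINDOWS_TESSDATA_DIR):
    """Return the extra config string for pytesseract, or '' outside Windows."""
    system = system or platform.system()
    if system == "Windows":
        return '--tessdata-dir "{}"'.format(tessdata_dir)
    return ""
```

With this in place, a single call such as `pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_config())` covers both branches, since an empty config string is the default.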

If you need higher accuracy, you can try Baidu's API: https://cloud.baidu.com/doc/OCR/OCR-Python-SDK.html#.E9.85.8D.E7.BD.AEAipOcr

Reference: https://blog.csdn.net/hk_jh/article/details/8961449
