Comparison of Chinese Word Segmentation Tools
A comparison of five Chinese word segmentation tools: jieba, SnowNLP, thulac (from Tsinghua University's Natural Language Processing and Social Humanities Computing Lab), StanfordCoreNLP, and pyltp (Harbin Institute of Technology's Language Cloud / LTP). Environment: Windows 10, Anaconda, Python 3.7.

1. Installation

Jieba: pip install jieba
SnowNLP: pip install snownlp
thulac: pip install thulac
StanfordCoreNLP: pip install stanfordcorenlp, then download the CoreNLP distribution and unzip it, and unzip the Chinese model package into the CoreNLP folder
pyltp: pip install pyltp failed with a "c++14 missing" error; building it by hand also failed, and retrying the install on CentOS failed as well

2. Running

Define a test sentence and segment it with jieba, using the POS-tagged interface:

    a = 'Jimmy你怎么看'

    import jieba.posseg as pseg

    ws = pseg.cut(a)        # generator of (word, POS flag) pairs
    for i in ws:
        print(i)

Then segment the same sentence with thulac (the default model also tags parts of speech):

    import thulac

    thu1 = thulac.thulac()  # load the default model
    text = thu1.cut(a)      # list of [word, tag] pairs
    print(text)
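The excerpt cuts off before showing SnowNLP's run step, so here is a minimal sketch of segmenting the same sentence with it, assuming the SnowNLP class's documented words list and tags generator:

    from snownlp import SnowNLP

    a = 'Jimmy你怎么看'
    s = SnowNLP(a)
    print(s.words)            # segmented words as a plain list
    for word, tag in s.tags:  # (word, POS tag) pairs, generated lazily
        print(word, tag)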
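Likewise, a sketch of running StanfordCoreNLP once the distribution and the Chinese model jar have been unzipped as described in step 1; the folder name below is a placeholder for wherever CoreNLP was extracted:

    from stanfordcorenlp import StanfordCoreNLP

    # placeholder path: point it at the unzipped CoreNLP folder
    nlp = StanfordCoreNLP(r'./stanford-corenlp-full-2018-10-05', lang='zh')

    a = 'Jimmy你怎么看'
    print(nlp.word_tokenize(a))  # segmented tokens
    print(nlp.pos_tag(a))        # (token, POS tag) pairs
    nlp.close()                  # stop the background Java server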
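pyltp never installed successfully here, but for completeness this is roughly what its segmentation call would have looked like had the build succeeded; it assumes the pretrained cws.model from the separately downloaded LTP model package:

    from pyltp import Segmentor

    segmentor = Segmentor()
    segmentor.load('cws.model')  # path to the LTP segmentation model (assumed downloaded)
    words = segmentor.segment('Jimmy你怎么看')
    print(list(words))
    segmentor.release()          # free the underlying C++ model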