[Speech Recognition] From Beginner to Expert: A Comprehensive Resource Collection

2018-11-05

Getting Started

Four Frontier Directions in Speech Recognition Research

https://blog.csdn.net/haima1998/article/details/79094341

Introductory Deep Learning Papers (Speech Recognition)

https://blog.csdn.net/youyuyixiu/article/details/53764218

On the Three Key Technologies of Speech Recognition

https://blog.csdn.net/qq_34231800/article/details/80189617

Deep Learning and Speech Recognition: An Introduction to Common Acoustic Models

https://blog.csdn.net/dujiajiyi_xue5211314/article/details/53943313

Interesting Open-Source Software: The Kaldi Speech Recognition Toolkit

https://blog.csdn.net/AMDS123/article/details/70313780

Neural Networks: CNN Architecture and Its Application to Speech Recognition

https://blog.csdn.net/xmdxcsj/article/details/54695995

An Overview of Speech Recognition

https://blog.csdn.net/shichaog/article/details/72528637

End-to-End Speech Recognition

https://blog.csdn.net/xmdxcsj/article/details/70300546

Applications of Attention in Speech Recognition

https://blog.csdn.net/quheDiegooo/article/details/76842201

Speech Synthesis Technology

https://blog.csdn.net/wja8a45TJ1Xa/article/details/78599509?locationNum=8&fps=1

A Survey of Deep Learning in Speech Synthesis Research

https://blog.csdn.net/weixin_37598106/article/details/81513816

Tacotron, an End-to-End Deep Learning TTS Model (Chinese Speech Synthesis)

https://blog.csdn.net/yunnangf/article/details/79585089

TACOTRON: End-to-End Speech Synthesis

https://blog.csdn.net/Left_Think/article/details/74905928

An Introduction to Speaker (Voiceprint) Recognition Technology

https://www.cnblogs.com/wuxian11/p/6498699.html

Speaker Recognition Technology: Current State, Limitations, and Trends

https://blog.csdn.net/jojozhangju/article/details/78637221

Speaker Recognition

https://www.jianshu.com/p/513dadeef1fd

An Introduction to Deep Speaker

https://blog.csdn.net/Lauyeed/article/details/79936632

Papers

Speech Recognition: DNN

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition(2012), George E. Dahl et al.

https://ieeexplore.ieee.org/document/5740583/?part=1

Deep Neural Networks for Acoustic Modeling in Speech Recognition(2012), Geoffrey Hinton et al.

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6296526

Speech Recognition: CNN

Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition(2012), Ossama Abdel-Hamid et al.

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6288864

Deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al.

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6639347

Analysis of CNN-based speech recognition system using raw speech as input(2015), Dimitri Palaz et al.

https://infoscience.epfl.ch/record/210029/files/Palaz_INTERSPEECH_2015.pdf

Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition(2016), Yanmin Qian et al.

https://pdfs.semanticscholar.org/8043/cbfed66c98d2255ea79254de620837478099.pdf

Very deep multilingual convolutional neural networks for LVCSR(2016), Tom Sercu et al.

https://arxiv.org/pdf/1509.08967.pdf

Advances in Very Deep Convolutional Neural Networks for LVCSR(2016), Tom Sercu et al.

https://arxiv.org/pdf/1604.01792.pdf

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention(2016), Dong Yu et al.

https://pdfs.semanticscholar.org/716e/60cbbdacf01b3148e91a555358a96308b770.pdf?_ga=2.38333155.198966451.1540996486-1278087525.1535180761

Speech Recognition: LSTM

Long short-term memory recurrent neural network architectures for large scale acoustic modeling(2014), Hasim Sak et al.

https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43905.pdf

Deep LSTM for Large Vocabulary Continuous Speech Recognition(2017), Xu Tian et al.

https://arxiv.org/pdf/1703.07090.pdf

English Conversational Telephone Speech Recognition by Humans and Machines(2017), George Saon et al.

https://arxiv.org/pdf/1703.02136.pdf

Speech Recognition: CTC

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks(2006), Alex Graves et al.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.6306&rep=rep1&type=pdf

Towards End-to-End Speech Recognition with Recurrent Neural Networks(2014), Alex Graves et al.

http://proceedings.mlr.press/v32/graves14.pdf

First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs(2014), Andrew L. Maas et al.

https://arxiv.org/pdf/1408.2873.pdf

Deep Speech: Scaling up end-to-end speech recognition(2014), Awni Y. Hannun et al.

https://arxiv.org/pdf/1412.5567.pdf

Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification(2015), Kyuyeon Hwang et al.

https://arxiv.org/pdf/1511.06841.pdf

Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition(2015), Hasim Sak et al.

https://arxiv.org/pdf/1507.06947.pdf

Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning(2016), Suyoun Kim et al.

https://arxiv.org/pdf/1609.06773.pdf

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin(2016), Dario Amodei et al.

http://proceedings.mlr.press/v48/amodei16.pdf

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System(2016), Ronan Collobert et al.

https://arxiv.org/pdf/1609.03193.pdf

Multi-task Learning with CTC and Segmental CRF for Speech Recognition(2017), Liang Lu et al.

https://arxiv.org/pdf/1702.06378.pdf

Residual Convolutional CTC Networks for Automatic Speech Recognition(2017), Yisen Wang et al.

https://arxiv.org/pdf/1702.07793.pdf

Speech Recognition: Sequence Transduction

Sequence Transduction with Recurrent Neural Networks(2012), Alex Graves et al.

https://arxiv.org/pdf/1211.3711.pdf

Speech Recognition: Attention

End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results(2014), Jan Chorowski et al.

https://arxiv.org/pdf/1412.1602.pdf

Attention-Based Models for Speech Recognition(2015), Jan Chorowski et al.

https://arxiv.org/pdf/1506.07503.pdf

End-to-end attention-based large vocabulary speech recognition(2016), Dzmitry Bahdanau et al.

https://arxiv.org/pdf/1508.04395.pdf

Listen, attend and spell: A neural network for large vocabulary conversational speech recognition(2016), William Chan et al.

https://arxiv.org/pdf/1508.01211.pdf

End-to-end attention-based distant speech recognition with Highway LSTM(2016), Hassan Taherian.

https://arxiv.org/pdf/1610.05361.pdf

Direct Acoustics-to-Word Models for English Conversational Speech Recognition(2017), Kartik Audhkhasi et al.

https://arxiv.org/pdf/1703.07754.pdf

Speech Recognition: Multichannel

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition(2017), Tara N. Sainath et al.

http://www.ee.columbia.edu/~ronw/pubs/taslp2017-multichannel.pdf

Multichannel End-to-end Speech Recognition(2017), Tsubasa Ochiai et al.

https://arxiv.org/pdf/1703.04783.pdf

Speech Synthesis: SampleRNN

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model(2016), Soroush Mehri et al.

https://arxiv.org/pdf/1612.07837.pdf

Speech Synthesis: WaveNet

WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al.

https://arxiv.org/pdf/1609.03499.pdf

Speech Synthesis: Deep Voice

Deep Voice: Real-time Neural Text-to-Speech(2017), Sercan O. Arik et al.

https://arxiv.org/pdf/1702.07825.pdf

Speech Synthesis: Deep Voice 2

Deep Voice 2: Multi-Speaker Neural Text-to-Speech(2017), Sercan Arik et al.

https://arxiv.org/pdf/1705.08947.pdf

Speech Synthesis: Tacotron

Tacotron: Towards End-to-End Speech Synthesis(2017), Yuxuan Wang et al.

https://pdfs.semanticscholar.org/f258/f0d3260e7fbdd961993086aaafa2afc714c9.pdf

Speech Synthesis: Tacotron 2

Natural tts synthesis by conditioning wavenet on mel spectrogram predictions(2018), Jonathan Shen et al.

https://sigport.org/sites/default/files/docs/ICASSP%202018%20-%20Tacotron%202.pdf

Speech Synthesis: Voiceloop

Voiceloop: Voice Fitting and Synthesis via a Phonological Loop(2018), Yaniv Taigman et al.

https://arxiv.org/pdf/1707.06588.pdf

Speaker Recognition: x-vector (extracting speech embeddings with a TDNN)

Deep Neural Network Embeddings for Text-Independent Speaker Verification(2017), David Snyder et al.

http://danielpovey.com/files/2017_interspeech_embeddings.pdf

Baidu End-to-End Speaker Recognition: Triplet Loss

Deep Speaker: an End-to-End Neural Speaker Embedding System(2017), Chao Li et al.

https://arxiv.org/pdf/1705.02304.pdf

Speaker Recognition: 3D Convolutional Networks

Text-independent speaker verification using 3d convolutional neural networks(2018), Amirsina Torfi et al.

https://arxiv.org/pdf/1705.09422.pdf

End-to-End Speaker Recognition: GE2E

Generalized End-to-End Loss for Speaker Verification(2018), Li Wan et al.

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8462665

Code

kaldi

A widely used speech toolkit

https://github.com/kaldi-asr/kaldi

A TensorFlow implementation of Baidu's DeepSpeech architecture

Speech recognition: Baidu DeepSpeech, TensorFlow implementation

https://github.com/mozilla/DeepSpeech

Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition based on DeepMind's WaveNet and tensorflow

Speech recognition: DeepMind's WaveNet, TensorFlow implementation

https://github.com/buriburisuri/speech-to-text-wavenet

End-to-end automatic speech recognition system implemented in TensorFlow.

End-to-end speech recognition: TensorFlow implementation

https://github.com/zzw922cn/Automatic_Speech_Recognition

A PyTorch Implementation of End-to-End Models for Speech-to-Text

End-to-end speech recognition: PyTorch implementation

https://github.com/awni/speech

A PaddlePaddle implementation of DeepSpeech2 architecture for ASR.

Speech recognition: DeepSpeech2, PaddlePaddle implementation

https://github.com/PaddlePaddle/DeepSpeech

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

Speech synthesis: Tacotron, TensorFlow implementation

https://github.com/Kyubyong/tacotron

Tacotron 2 - PyTorch implementation with faster-than-realtime inference

Speech synthesis: Tacotron2, PyTorch implementation

https://github.com/NVIDIA/tacotron2

Deep neural networks for voice conversion (voice style transfer) in Tensorflow

Speech synthesis: Deep-voice (voice conversion), TensorFlow implementation

https://github.com/andabi/deep-voice-conversion

A method to generate speech across multiple speakers

Speech synthesis: Facebook, PyTorch implementation

https://github.com/facebookresearch/loop

Speaker embedding (verification and recognition) using PyTorch

Speaker recognition: PyTorch implementation

https://github.com/qqueing/DeepSpeaker-pytorch

Deep Learning & 3D Convolutional Neural Networks for Speaker Verification

Speaker recognition: 3D convolution, TensorFlow implementation

https://github.com/astorfi/3D-convolutional-speaker-recognition

Products and Applications

Baidu Speech official website

http://yuyin.baidu.com/

Tencent AI Open Platform

https://ai.qq.com/product/aaiasr.shtml

iFLYTEK Open Platform

https://xfyun.cn/services/voicedictation

Bing Speech

https://azure.microsoft.com/zh-cn/services/cognitive-services/speech/

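Originally published: 2018-11-5

Author: Liu Bin (刘斌)

This article comes from Zhuanzhi (专知), a partner of the Yunqi Community (云栖社区). Follow Zhuanzhi for related information.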


Original link: https://yq.aliyun.com/articles/665231

Copyright of reposted content belongs to the author and the source website.
