From db1971b21587770ce75abeed093d83dd15231a5a Mon Sep 17 00:00:00 2001
From: Ming
Date: Fri, 19 Aug 2022 15:19:00 +0800
Subject: [PATCH] update readme and add aistudio demo, test=doc (#2270)

---
 README.md    | 185 +++++++++++++++++++++++++++++++++++++++++++--------
 README_cn.md |   6 +-
 2 files changed, 162 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index e35289e2b2c..3d1859f1f09 100644
--- a/README.md
+++ b/README.md
@@ -180,62 +180,191 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 ## Installation
 
 We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7* and *paddlepaddle>=2.3.1*.
-Up to now, **Linux** supports CLI for the all our tasks, **Mac OSX** and **Windows** only supports PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).
+
+### **Dependencies**
+
++ gcc >= 4.8.5
++ paddlepaddle >= 2.3.1
++ python >= 3.7
++ OS support: Linux (recommended), Windows, Mac OSX
+
+PaddleSpeech depends on paddlepaddle. To install it, please refer to the official [paddlepaddle](https://www.paddlepaddle.org.cn/en) website and choose the build that matches your machine. Here is an example for the CPU version:
+
+```bash
+pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+There are two quick ways to install PaddleSpeech: pip installation and source code compilation (recommended).
+### pip install
+
+```shell
+pip install pytest-runner
+pip install paddlespeech
+```
+
+### source code compilation
+
+```shell
+git clone https://github.com/PaddlePaddle/PaddleSpeech.git
+cd PaddleSpeech
+pip install pytest-runner
+pip install .
+```
+
+For other installation issues, such as conda environments, librosa dependencies, gcc problems or kaldi installation, please refer to the [installation document](./docs/source/install.md).
 If you encounter problems during installation, you can leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) and find related problems
 
 ## Quick Start
 
-Developers can have a try of our models with [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio/text.
+Developers can try our models with the [PaddleSpeech Command Line](./paddlespeech/cli/README.md) or the Python API. Change `--input` to test your own audio or text; audio input is supported in 16k wav format.
+
+**You can also try it quickly in AI Studio 👉🏻 [PaddleSpeech API Demo](https://aistudio.baidu.com/aistudio/projectdetail/4353348?sUid=2470186&shared=1&ts=1660876445786)**
+
+
+Download the test audio samples:
 
-**Audio Classification**
 ```shell
-paddlespeech cls --input input.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
 ```
 
-**Speaker Verification**
+### Automatic Speech Recognition
+
+<details><summary>(Click to expand) Open Source Speech Recognition</summary>
+
+**Command-line usage**
+
+```shell
+paddlespeech asr --lang zh --input zh.wav
 ```
-paddlespeech vector --task spk --input input_16k.wav
+
+**Python API usage**
+
+```python
+>>> from paddlespeech.cli.asr.infer import ASRExecutor
+>>> asr = ASRExecutor()
+>>> result = asr(audio_file="zh.wav")
+>>> print(result)
+我认为跑步最重要的就是给我带来了身体健康
 ```
+</details>
+
+### Text-to-Speech
+
+<details><summary>Open Source Speech Synthesis</summary>
+
+Outputs audio in 24k sample rate wav format.
+
+
+**Command-line usage**
 
-**Automatic Speech Recognition**
 ```shell
-paddlespeech asr --lang zh --input input_16k.wav
+paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav
 ```
-- web demo for Automatic Speech Recognition is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [ASR Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR)
 
-**Speech Translation** (English to Chinese)
-(not support for Mac and Windows now)
+**Python API usage**
+
+```python
+>>> from paddlespeech.cli.tts.infer import TTSExecutor
+>>> tts = TTSExecutor()
+>>> tts(text="今天天气十分不错。", output="output.wav")
+```
+- You can also try the [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) on [Huggingface Spaces](https://huggingface.co/spaces)
+
+</details>
+
+### Audio Classification
+
+<details><summary>An open-domain sound classification tool</summary>
+
+A sound classification model covering the 527 categories of the AudioSet dataset
+
+**Command-line usage**
+
 ```shell
-paddlespeech st --input input_16k.wav
+paddlespeech cls --input zh.wav
 ```
 
-**Text-to-Speech**
+**Python API usage**
+
+```python
+>>> from paddlespeech.cli.cls.infer import CLSExecutor
+>>> cls = CLSExecutor()
+>>> result = cls(audio_file="zh.wav")
+>>> print(result)
+Speech 0.9027186632156372
+```
+
+</details>
+
+### Voiceprint Extraction
+
+<details><summary>Industrial-grade voiceprint extraction tool</summary>
+
+**Command-line usage**
+
 ```shell
-paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --output output.wav
+paddlespeech vector --task spk --input zh.wav
 ```
-- web demo for Text to Speech is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)
 
-**Text Postprocessing**
-- Punctuation Restoration
-  ```bash
-  paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
-  ```
+**Python API usage**
 
-**Batch Process**
+```python
+>>> from paddlespeech.cli.vector import VectorExecutor
+>>> vec = VectorExecutor()
+>>> result = vec(audio_file="zh.wav")
+>>> print(result)  # 187-dimensional vector
+[ -0.19083306 9.474295 -14.122263 -2.0916545 0.04848729
+ 4.9295826 1.4780062 0.3733844 10.695862 3.2697146
+ -4.48199 -0.6617882 -9.170393 -11.1568775 -1.2358263 ...]
 ```
-echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
+
+</details>
+
+### Punctuation Restoration
+
+<details><summary>Quickly restores text punctuation; works with ASR models</summary>
+
+**Command-line usage**
+
+```shell
+paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
 ```
 
-**Shell Pipeline**
-- ASR + Punctuation Restoration
+**Python API usage**
+
+```python
+>>> from paddlespeech.cli.text.infer import TextExecutor
+>>> text_punc = TextExecutor()
+>>> print(text_punc(text="今天的天气真不错啊你下午有空吗我想约你一起去吃饭"))
+今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。
 ```
-paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+
+</details>
+
+### Speech Translation
+
+<details><summary>End-to-end English-to-Chinese speech translation tool</summary>
+
+Uses pre-compiled kaldi-related tools, so it can currently only be tried on Ubuntu.
+
+**Command-line usage**
+
+```shell
+paddlespeech st --input en.wav
 ```
 
-For more command lines, please see: [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
+**Python API usage**
 
-If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
+```python
+>>> from paddlespeech.cli.st.infer import STExecutor
+>>> st = STExecutor()
+>>> print(st(audio_file="en.wav"))
+['我 在 这栋 建筑 的 古老 门上 敲门 。']
+```
+</details>
@@ -243,6 +372,8 @@ If you want to try more functions like training and tuning, please have a look a
 
 Developers can have a try of our speech server with [PaddleSpeech Server Command Line](./paddlespeech/server/README.md).
 
+**You can also try it quickly in AI Studio (recommended): [SpeechServer](https://aistudio.baidu.com/aistudio/projectdetail/4354592?sUid=2470186&shared=1&ts=1660877827034)**
+
 **Start server**
 
 ```shell
diff --git a/README_cn.md b/README_cn.md
index 1c6a949fd78..e18823151f6 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -225,7 +225,7 @@ pip install .
 
 安装完成后,开发者可以通过命令行或者Python快速开始,命令行模式下改变 `--input` 可以尝试用自己的音频或文本测试,支持16k wav格式音频。
 
-你也可以在`aistudio`中快速体验 👉🏻[PaddleSpeech API Demo ](https://aistudio.baidu.com/aistudio/projectdetail/4281335?shared=1)。
+你也可以在`aistudio`中快速体验 👉🏻[一键预测,快速上手Speech开发任务](https://aistudio.baidu.com/aistudio/projectdetail/4353348?sUid=2470186&shared=1&ts=1660878142250)。
 
 测试音频示例下载
 ```shell
@@ -373,7 +373,9 @@ python API 一键预测
 
 ## 快速使用服务
 
-安装完成后,开发者可以通过命令行一键启动语音识别,语音合成,音频分类三种服务。
+安装完成后,开发者可以通过命令行一键启动语音识别,语音合成,音频分类等多种服务。
+
+你可以在 AI Studio 中快速体验:[SpeechServer一键部署](https://aistudio.baidu.com/aistudio/projectdetail/4354592?sUid=2470186&shared=1&ts=1660878208266)
 
 **启动服务**
 ```shell
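
The Quick Start in this patch says the CLI and Python examples accept 16k wav audio, and that TTS writes 24k wav output. Before passing your own recordings to `paddlespeech asr`, `cls`, `vector` or `st`, it can help to check the file header first. The sketch below is not part of PaddleSpeech: it is a minimal helper that assumes only Python's standard `wave` module, and the names `wav_info`, `is_16k_wav` and `EXPECTED_RATE` are hypothetical. It does not resample; converting audio is left to your own tooling.

```python
import sys
import wave

# Hypothetical helper, not part of PaddleSpeech. It assumes the 16 kHz wav
# input stated in the Quick Start and only reads the WAV header.
EXPECTED_RATE = 16000


def wav_info(path):
    """Return (sample_rate, channels, sample_width_bytes, duration_seconds)."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        width = wf.getsampwidth()
        duration = wf.getnframes() / rate
    return rate, channels, width, duration


def is_16k_wav(path):
    """Return True if the WAV header reports a 16 kHz sample rate."""
    return wav_info(path)[0] == EXPECTED_RATE


if __name__ == "__main__":
    # Example: python check_wav.py zh.wav en.wav  (script name is hypothetical)
    for p in sys.argv[1:]:
        rate, ch, width, dur = wav_info(p)
        note = "ok" if rate == EXPECTED_RATE else "resample to 16 kHz first"
        print(f"{p}: {rate} Hz, {ch} ch, {8 * width}-bit, {dur:.2f} s ({note})")
```

For instance, `wav_info("zh.wav")` on the downloaded sample returns its sample rate, channel count, sample width in bytes and duration in seconds; only the header is read, so the check is cheap even for long recordings.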