update readme and add aistudio demo, test=doc (#2270)
iftaken authored and yt605155624 committed Aug 19, 2022
1 parent 9203007 commit db1971b
Showing 2 changed files with 162 additions and 29 deletions.
185 changes: 158 additions & 27 deletions README.md
@@ -180,69 +180,200 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
## Installation

We strongly recommend installing PaddleSpeech on **Linux** with *python>=3.7* and *paddlepaddle>=2.3.1*.
Currently, the CLI supports all of our tasks on **Linux**; on **Mac OSX** and **Windows**, it supports only Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).

### **Dependency Introduction**

+ gcc >= 4.8.5
+ paddlepaddle >= 2.3.1
+ python >= 3.7
+ OS support: Linux (recommended), Windows, Mac OSX

PaddleSpeech depends on paddlepaddle. To install it, refer to the official [paddlepaddle](https://www.paddlepaddle.org.cn/en) website and choose the build that matches your machine. Here is an example for the CPU version.

```bash
pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```

There are two quick ways to install PaddleSpeech: installing with pip, or building from source (recommended).

### pip install

```shell
pip install pytest-runner
pip install paddlespeech
```

### source code compilation

```shell
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech
pip install pytest-runner
pip install .
```

For other installation issues, such as conda environments, librosa dependencies, gcc problems, and kaldi installation, refer to the [installation document](./docs/source/install.md). If you run into problems during installation, you can search for related reports or leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150).
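
Before installing, it can help to confirm that the environment meets the minimum versions listed under Dependency Introduction. The following is a minimal sketch, not part of PaddleSpeech: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the check assumes that the installed `paddle` package exposes a `__version__` string.

```python
import sys

# Minimum versions taken from the dependency list in this README.
MIN_PYTHON = (3, 7)
MIN_PADDLE = (2, 3, 1)


def parse_version(text):
    """Turn a dotted version such as '2.3.1' into a tuple of ints.

    Parsing stops at the first component that does not start with a digit,
    and non-digit suffixes inside a component (e.g. 'rc0') are dropped.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= tuple(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("python >= 3.7 is required")
    try:
        import paddle  # installed by the paddlepaddle package
    except ImportError:
        problems.append("paddlepaddle is not installed")
    else:
        # Assumption: paddle.__version__ is a dotted version string.
        if not meets_minimum(paddle.__version__, MIN_PADDLE):
            problems.append("paddlepaddle >= 2.3.1 is required")
    return problems
```

Calling `check_environment()` and printing the returned list gives a quick summary of anything that still needs to be installed or upgraded.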


<a name="quickstart"></a>
## Quick Start

Developers can try our models with the [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio/text.
Developers can try our models with the [PaddleSpeech Command Line](./paddlespeech/cli/README.md) or the Python API. Change `--input` to test your own audio or text; audio input in 16k WAV format is supported.

**You can also quickly experience it in AI Studio 👉🏻 [PaddleSpeech API Demo](https://aistudio.baidu.com/aistudio/projectdetail/4353348?sUid=2470186&shared=1&ts=1660876445786)**


Test audio sample download

**Audio Classification**
```shell
paddlespeech cls --input input.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```
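
Because the examples expect 16k WAV input, it may be useful to check the sample rate of your own recording before passing it to `--input`. The sketch below uses only the Python standard library `wave` module; the helper names `wav_sample_rate` and `is_16k_wav` are hypothetical and not part of PaddleSpeech. Note that `wave` reads uncompressed PCM WAV files only and raises `wave.Error` for other encodings.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16k_wav(path):
    """Return True if the WAV file at `path` has a 16000 Hz sample rate."""
    return wav_sample_rate(path) == 16000
```

For example, `is_16k_wav("zh.wav")` can be called on the downloaded sample before running the command-line examples.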

**Speaker Verification**
### Automatic Speech Recognition

<details><summary>&emsp;(Click to expand) Open Source Speech Recognition</summary>

**command line experience**

```shell
paddlespeech asr --lang zh --input zh.wav
```
paddlespeech vector --task spk --input input_16k.wav

**Python API experience**

```python
>>> from paddlespeech.cli.asr.infer import ASRExecutor
>>> asr = ASRExecutor()
>>> result = asr(audio_file="zh.wav")
>>> print(result)
我认为跑步最重要的就是给我带来了身体健康
```
</details>

### Text-to-Speech

<details><summary>&emsp;Open Source Speech Synthesis</summary>

The output is a WAV file with a 24k sample rate.


**command line experience**

**Automatic Speech Recognition**
```shell
paddlespeech asr --lang zh --input input_16k.wav
paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```
- A web demo for Automatic Speech Recognition is integrated into [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See the demo: [ASR Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR)

**Speech Translation** (English to Chinese)
(not supported on Mac and Windows yet)
**Python API experience**

```python
>>> from paddlespeech.cli.tts.infer import TTSExecutor
>>> tts = TTSExecutor()
>>> tts(text="今天天气十分不错。", output="output.wav")
```
- You can try the [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) on [Huggingface Spaces](https://huggingface.co/spaces)

</details>
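
Several sentences can be synthesized in one pass by looping over the Python API. The following is a rough sketch under stated assumptions: each input line has the form `id text` (as in the batch command `echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts`), the output file naming `<id>.wav` is a choice made here for illustration, and `synthesize_batch` is a hypothetical helper. The `tts` argument is any callable that accepts `text=` and `output=`, which is how `TTSExecutor` is called in the example above.

```python
import os


def synthesize_batch(lines, tts, output_dir="."):
    """Synthesize one WAV file per 'id text' line and return the output paths.

    `tts` is a callable invoked as tts(text=..., output=...).
    The '<id>.wav' naming is an assumption made for this sketch.
    """
    outputs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, _, text = line.partition(" ")
        text = text.strip()
        if not text:
            raise ValueError("expected 'id text', got: %r" % line)
        output = os.path.join(output_dir, utt_id + ".wav")
        tts(text=text, output=output)
        outputs.append(output)
    return outputs


# Possible usage (requires paddlespeech to be installed):
# from paddlespeech.cli.tts.infer import TTSExecutor
# synthesize_batch(["1 欢迎光临。", "2 谢谢惠顾。"], TTSExecutor())
```

Passing the executor in as an argument keeps the model loaded once for the whole batch and makes the helper easy to test with a stand-in callable.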

### Audio Classification

<details><summary>&emsp;An open-domain sound classification tool</summary>

A sound classification model covering the 527 categories of the AudioSet dataset.

**command line experience**

```shell
paddlespeech st --input input_16k.wav
paddlespeech cls --input zh.wav
```

**Text-to-Speech**
**Python API experience**

```python
>>> from paddlespeech.cli.cls.infer import CLSExecutor
>>> cls = CLSExecutor()
>>> result = cls(audio_file="zh.wav")
>>> print(result)
Speech 0.9027186632156372
```

</details>

### Voiceprint Extraction

<details><summary>&emsp;Industrial-grade voiceprint extraction tool</summary>

**command line experience**

```shell
paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --output output.wav
paddlespeech vector --task spk --input zh.wav
```
- A web demo for Text to Speech is integrated into [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See the demo: [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)

**Text Postprocessing**
- Punctuation Restoration
```bash
paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
```
**Python API experience**

**Batch Process**
```python
>>> from paddlespeech.cli.vector import VectorExecutor
>>> vec = VectorExecutor()
>>> result = vec(audio_file="zh.wav")
>>> print(result) # 187-dimensional vector
[ -0.19083306 9.474295 -14.122263 -2.0916545 0.04848729
4.9295826 1.4780062 0.3733844 10.695862 3.2697146
-4.48199 -0.6617882 -9.170393 -11.1568775 -1.2358263 ...]
```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts

</details>
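
A common way to use voiceprint embeddings is to compare two of them with cosine similarity: the closer the score is to 1, the more likely the two recordings come from the same speaker. The sketch below is a generic illustration, not PaddleSpeech's own scoring code. The names `cosine_similarity` and `same_speaker` are hypothetical, and the decision threshold is left as a required argument because a suitable value depends on the model and data.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold):
    """Return True if the similarity reaches the caller-supplied threshold."""
    return cosine_similarity(a, b) >= threshold
```

The embeddings returned by two `vec(audio_file=...)` calls could be passed as `a` and `b`, assuming they are one-dimensional sequences of floats as in the printed output above.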

### Punctuation Restoration

<details><summary>&emsp;Quick recovery of text punctuation, works with ASR models</summary>

**command line experience**

```shell
paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
```

**Shell Pipeline**
- ASR + Punctuation Restoration
**Python API experience**

```python
>>> from paddlespeech.cli.text.infer import TextExecutor
>>> text_punc = TextExecutor()
>>> result = text_punc(text="今天的天气真不错啊你下午有空吗我想约你一起去吃饭")
>>> print(result)
今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。
```
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc

</details>
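
The shell pipeline `paddlespeech asr --input ./zh.wav | paddlespeech text --task punc` can also be written with the Python executors shown above. This is a minimal sketch: `transcribe_with_punctuation` is a hypothetical helper, and it assumes the executors are called with `audio_file=` and `text=` exactly as in the examples in this README.

```python
def transcribe_with_punctuation(audio_file, asr, punc):
    """Run speech recognition, then restore punctuation on the transcript.

    `asr` is called as asr(audio_file=...) and `punc` as punc(text=...),
    matching how ASRExecutor and TextExecutor are used in this README.
    """
    raw_text = asr(audio_file=audio_file)
    return punc(text=raw_text)


# Possible usage (requires paddlespeech to be installed):
# from paddlespeech.cli.asr.infer import ASRExecutor
# from paddlespeech.cli.text.infer import TextExecutor
# print(transcribe_with_punctuation("zh.wav", ASRExecutor(), TextExecutor()))
```

Creating the executors once and reusing them across files avoids reloading the models for every call.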

### Speech Translation

<details><summary>&emsp;End-to-end English to Chinese Speech Translation Tool</summary>

This tool relies on pre-compiled kaldi tools and is currently supported only on Ubuntu.

**command line experience**

```shell
paddlespeech st --input en.wav
```

For more command lines, please see: [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
**Python API experience**

If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
```python
>>> from paddlespeech.cli.st.infer import STExecutor
>>> st = STExecutor()
>>> result = st(audio_file="en.wav")
>>> print(result)
['我 在 这栋 建筑 的 古老 门上 敲门 。']
```

</details>


<a name="quickstartserver"></a>
## Quick Start Server

Developers can try our speech server with the [PaddleSpeech Server Command Line](./paddlespeech/server/README.md).

**You can try it quickly in AI Studio (recommended): [SpeechServer](https://aistudio.baidu.com/aistudio/projectdetail/4354592?sUid=2470186&shared=1&ts=1660877827034)**

**Start server**

```shell
6 changes: 4 additions & 2 deletions README_cn.md
@@ -225,7 +225,7 @@ pip install .

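After installation, developers can get started quickly from the command line or Python. In command-line mode, change `--input` to test your own audio or text; 16k WAV audio is supported.

You can also try it quickly in `aistudio` 👉🏻[PaddleSpeech API Demo ](https://aistudio.baidu.com/aistudio/projectdetail/4281335?shared=1)
You can also try it quickly in `aistudio` 👉🏻[One-click prediction: get started quickly with Speech development tasks](https://aistudio.baidu.com/aistudio/projectdetail/4353348?sUid=2470186&shared=1&ts=1660878142250)

Test audio sample download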
```shell
@@ -373,7 +373,9 @@ Python API one-click prediction

<a name="快速使用服务"></a>
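## Quick Start Server
After installation, developers can start three services (speech recognition, speech synthesis, and audio classification) from the command line with a single command.
After installation, developers can start a variety of services, such as speech recognition, speech synthesis, and audio classification, from the command line with a single command.

You can try it quickly in AI Studio: [SpeechServer one-click deployment](https://aistudio.baidu.com/aistudio/projectdetail/4354592?sUid=2470186&shared=1&ts=1660878208266)

**Start server**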
```shell
