Kuon is an in-development voice assistant that accepts both voice and text input and produces text and voice output. It can integrate with the OpenAI API for dialogue modeling. The various functional modules are independent and can be freely combined.
Compared with previous branches, Kuon has been extensively restructured. The main focus was separating the functional modules and replacing the previous all-in-one approach with an interface-based one, which greatly improves speed. This will also allow Kuon to run on different devices in the future.
- Communication using text input, with text and voice output
- Communication using voice input, with text and voice output
- Check function to test whether each component is working properly
- Development of Kuon's character using anime subtitles as prompts (in progress)
- Train a better VITS model
- Text commands
- Voice commands
- Assistants mode
Since the various functions have been separated as much as possible, the configuration has also changed from a single file to several per-module files, but setup is not difficult.
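For illustration, here is a minimal sketch of reading the per-module configuration files described in the sections below. It assumes the PyYAML package; the keys inside each file are documented in the respective sections, not here.

```python
# Minimal sketch: read the per-module YAML config files with PyYAML.
# File paths follow the sections below; the keys inside each file are
# documented there, not invented here.
import yaml

def load_config(path: str) -> dict:
    """Load one module's YAML configuration file."""
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)

asr_cfg = load_config("kuonasr/config.yaml")  # ASR service settings
tts_cfg = load_config("kuontts/config.yaml")  # TTS service settings
llm_cfg = load_config("llm/config.yaml")      # dialogue model settings
```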
- Create a conda virtual environment:

```bash
conda create -n kuon python=3.10
conda activate kuon
```
- Install PyTorch (not necessary if using the interface approach):

```bash
# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
- Install the basic packages:

```bash
pip install -r requirements.txt
```

  For audio playback, the `playsound` package is used. If you are using Windows, you need to modify the package source in `Lib\site-packages\playsound.py` and remove the part that decodes using UTF-16. (A quick playback check is sketched after this list.)
- Alternatively, create the conda environment from an exported configuration file (instead of the previous steps):

```bash
conda env create -n kuon -f environment.yml
```

  To export the current environment:

```bash
conda env export > environment.yml
```
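As a quick sanity check that playback works after the `playsound` fix noted above, a minimal sketch (`test.mp3` is a placeholder for any short local clip):

```python
# Quick playback sanity check for the playsound package (see the note
# in the install list above). test.mp3 is a placeholder file name.
from playsound import playsound

playsound("test.mp3")  # blocks until playback finishes
```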
This functionality has been separated into the AutomaticSpeechRecognition repository. The relevant code has been added to the `kuonasr` folder in this repository. By default, the funasr interface is used. For service deployment, or if you want to use a different method, please refer to the instructions in the AutomaticSpeechRecognition repository.

Before using it, modify the configuration file `kuonasr/config.yaml` according to the location of the funasr service:
```yaml
funasr:
  url: ws://172.0.0.1:1234
```
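To illustrate what the interface approach looks like from the client side, here is a hypothetical sketch using the `websockets` package. This is not the actual funasr wire protocol; the real client lives in `kuonasr`, and the AutomaticSpeechRecognition repository documents the service.

```python
# Hypothetical illustration of the interface approach: the client only
# needs the websocket URL from kuonasr/config.yaml, not a local model.
# This is NOT the real funasr protocol; see the AutomaticSpeechRecognition
# repository for the actual client and service code.
import asyncio
import websockets  # pip install websockets

async def recognize(url: str, audio: bytes) -> str:
    async with websockets.connect(url) as ws:
        await ws.send(audio)    # send audio bytes to the service
        return await ws.recv()  # receive the transcribed text

if __name__ == "__main__":
    text = asyncio.run(recognize("ws://172.0.0.1:1234", b"...pcm bytes..."))
    print(text)
```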
You can use the following script to fetch the latest code (usually not necessary):
```bash
cd script
python asr_update.py
```
This functionality has been separated into the TextToSpeech repository, which is divided into a `fineturn` part for training models and a `kuontts` part for inference. The inference code has been added to the `kuontts` folder in this repository.

By default, the interface approach is used, so there is no need to install any environment or include any models here. You need to modify the request URL and speaker in the `kuontts/config.yaml` file to match the deployed service. For service deployment and other issues, please refer to the instructions in the TextToSpeech repository.

If you want to use the offline approach instead, change `online` to `offline` in `kuontts/config.yaml` and place the model in `kuontts/offline/OUTPUT_MODEL`. For now, I have trained a model for Paimon that can be used.
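As an illustration of the online approach, here is a sketch of what a request to the deployed service might look like. The endpoint path and JSON field names are placeholders, not the documented API; use the actual URL and speaker values from `kuontts/config.yaml` and the TextToSpeech repository.

```python
# Hypothetical sketch of the online (interface) approach: POST text to
# the deployed TTS service and save the returned audio. The endpoint
# path and JSON field names are placeholders, not the documented API.
import requests

def synthesize(url: str, text: str, speaker: str) -> bytes:
    resp = requests.post(url, json={"text": text, "speaker": speaker})
    resp.raise_for_status()
    return resp.content  # raw audio bytes

audio = synthesize("http://127.0.0.1:7860/tts", "Hello!", "paimon")
with open("out.wav", "wb") as f:
    f.write(audio)
```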
If you need to update the code, you can execute:
```bash
cd script
python tts_update.py
```
The dialogue model is integrated through the OpenAI API. Specify the target service in the `llm/config.yaml` file:
```yaml
url : http://172.0.0.1:1234/v1/chat/completions
key : qmdr-xxxx
model : gpt-3.5-turbo-16k
timeout : 120 # forgetting time for conversation history
preset : ./llm/kuon.json
```
The `timeout` parameter sets how long before conversation history is forgotten. The `preset` parameter points to the role-playing prompt.
Other open-source models can now be deployed behind an OpenAI-compatible API; after deploying one, modify this configuration file accordingly. I tried qwen-7b, but its role-playing performance was very poor. For now, gpt-4 is the best option.
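For reference, a minimal sketch of a client for this configuration. The request body follows the standard OpenAI chat-completions format; pruning messages older than `timeout` seconds is an assumed reading of the forgetting behavior, and in the real program the role-play prompt from `preset` would also be prepended to the messages.

```python
# Minimal sketch of an OpenAI-compatible chat call plus history expiry.
# The request body is the standard /v1/chat/completions format; dropping
# messages older than `timeout` seconds is an assumed reading of the
# "forgetting time", not necessarily Kuon's exact implementation.
import time
import requests

URL = "http://172.0.0.1:1234/v1/chat/completions"
KEY = "qmdr-xxxx"
MODEL = "gpt-3.5-turbo-16k"
TIMEOUT = 120  # seconds before conversation history is forgotten

history = []  # (timestamp, message) pairs

def chat(user_text: str) -> str:
    now = time.time()
    # Forget messages older than TIMEOUT seconds.
    history[:] = [(t, m) for t, m in history if now - t < TIMEOUT]
    history.append((now, {"role": "user", "content": user_text}))
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {KEY}"},
        json={"model": MODEL, "messages": [m for _, m in history]},
    )
    reply = resp.json()["choices"][0]["message"]["content"]
    history.append((time.time(), {"role": "assistant", "content": reply}))
    return reply
```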
Before use, you can test each function to confirm that it works properly. There are tests for voice input, speech-to-text, the dialogue model, text-to-speech, and playback. Each test is independent and can be skipped.
```bash
python check.py
```
By default, all logs are printed. You can modify the log filter section of the configuration file so that only logs at or above a given level are output, for example:
```yaml
log_filter : True
log_filter_level : WARNING
```
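One plausible mapping of these options onto Python's standard logging module, as a sketch (the option names come from the config file; the mapping itself is an assumption):

```python
# Sketch: map the log_filter options onto Python's logging levels.
# Option names mirror the config file; the mapping is an assumption.
import logging

log_filter = True
log_filter_level = "WARNING"

level = getattr(logging, log_filter_level) if log_filter else logging.DEBUG
logging.basicConfig(level=level)

logging.debug("hidden when filtering at WARNING")
logging.warning("shown at WARNING and above")
```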
Modify some default options in the configuration file in the root directory:
```yaml
# Whether to enable voice output
voice_output_sw : True
# Whether to enable text output
text_output_sw : True
# Whether to enable voice input at startup
audio_input_sw : False
```
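A hypothetical sketch of how these switches could gate behavior; the variable names mirror the config keys above, and the output calls are stand-ins for the real modules:

```python
# Hypothetical sketch: gate Kuon's outputs on the config switches.
# Names mirror the config keys above; print/pass are stand-ins for the
# real text and TTS output modules.
voice_output_sw = True
text_output_sw = True
audio_input_sw = False

def respond(reply_text: str) -> None:
    if text_output_sw:
        print(reply_text)  # text output
    if voice_output_sw:
        pass  # hand reply_text to the TTS module (see the TTS section)

respond("Hello from Kuon!")
print("voice input enabled at startup:", audio_input_sw)
```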
Start the program:
```bash
python kuon.py
```