Commit

Allow default inline lexicon, --ssml reads entire input
synesthesiam committed Nov 10, 2021
1 parent c742711 commit cd4f7e3
Showing 7 changed files with 273 additions and 55 deletions.
1 change: 1 addition & 0 deletions CHANGELOG
@@ -7,6 +7,7 @@
### Changed

- Moved English data files to separate Python package so core can be updated without large download
- With --ssml, input from stdin is assumed to be one document instead of lines (override with --stdin-format lines)

### Fixed

94 changes: 94 additions & 0 deletions README.md
@@ -321,11 +321,105 @@ A subset of [SSML](https://www.w3.org/TR/speech-synthesis11/) is supported:
* `<phoneme ph="...">` - supply phonemes for inner text
    * `ph` - phonemes for each word of inner text, separated by whitespace
    * `alphabet` - if "ipa", phonemes are intelligently split ("aːˈb" -> "aː", "ˈb")
* `<lexicon id="...">` - inline pronunciation lexicon
    * `id` - unique id of lexicon (used in `<lookup ref="...">`)
    * One or more `<lexeme>` child elements with:
        * `<grapheme role="...">WORD</grapheme>` - word text (optional [role](#word-roles))
        * `<phoneme>P H O N E M E S</phoneme>` - word pronunciation (phonemes separated by whitespace)
* `<lookup ref="...">` - use inline pronunciation lexicon for child elements
    * `ref` - id from a `<lexicon id="...">`

#### Word Roles

During phonemization, word roles are used to disambiguate pronunciations. Unless manually specified, a word's role is derived from its part of speech tag as `gruut:<TAG>`. For initialisms and `spell-out`, the role `gruut:letter` is used to indicate that e.g., "a" should be spoken as `/eɪ/` instead of `/ə/`.

For `en-us`, the following additional roles are available from the part-of-speech tagger:

* `gruut:CD` - number
* `gruut:DT` - determiner
* `gruut:IN` - preposition or subordinating conjunction
* `gruut:JJ` - adjective
* `gruut:NN` - noun
* `gruut:PRP` - personal pronoun
* `gruut:RB` - adverb
* `gruut:VB` - verb
* `gruut:VBD` - verb (past tense)

#### Inline Lexicons

Inline [pronunciation lexicons](https://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/) are supported via the `<lexicon>` and `<lookup>` tags. gruut diverges slightly from the [SSML standard](https://www.w3.org/TR/speech-synthesis11/) here by only allowing lexicons to be defined within the SSML document itself. Additionally, the `id` attribute of the `<lexicon>` element can be left off to indicate a "default" inline lexicon that does not require a corresponding `<lookup>` tag.

For example, the following document will yield three different pronunciations for the word "tomato":

``` xml
<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                           http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">

  <lexicon xml:id="test" alphabet="ipa">
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        <!-- Individual phonemes are separated by whitespace -->
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
    <lexeme>
      <grapheme role="fake-role">
        tomato
      </grapheme>
      <phoneme>
        <!-- Made up pronunciation for fake word role -->
        t ə m ˈi t oʊ
      </phoneme>
    </lexeme>
  </lexicon>

  <w>tomato</w>
  <lookup ref="test">
    <w>tomato</w>
    <w role="fake-role">tomato</w>
  </lookup>
</speak>
```

The first "tomato" will be looked up in the U.S. English lexicon (`/t ə m ˈeɪ t oʊ/`). Within the `<lookup>` tag's scope, the second and third "tomato" words will be looked up in the inline lexicon. The third "tomato" word has a [role](#word-roles) attached, which selects the made-up pronunciation defined for `fake-role`.

Even further from the SSML standard, gruut allows you to leave off the `<lexicon>` id entirely. With no `id`, a `<lookup>` tag is no longer needed, allowing you to override the pronunciation of any word in the document:

``` xml
<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                           http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">

  <!-- No id means change all words without a lookup -->
  <lexicon>
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
  </lexicon>

  <w>tomato</w>
</speak>
```

This will yield a pronunciation of `/t ə m ˈɑ t oʊ/` for every instance of "tomato" in the document, except those inside a `<lookup>` element.
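
The lookup behavior described above can be sketched in a few lines of Python. This is only an illustration built on the standard library's `xml.etree.ElementTree`, not gruut's actual implementation: the function names `parse_inline_lexicons` and `lookup`, the empty-string key for the default lexicon, and the fallback from a role-specific entry to the role-less entry are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as ElementTree spells them
SSML = "{http://www.w3.org/2001/10/synthesis}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical key for a <lexicon> that has no id (the "default" lexicon)
DEFAULT_LEXICON = ""


def parse_inline_lexicons(ssml_text):
    """Return {lexicon id: {(word, role): [phonemes]}} for every <lexicon>."""
    lexicons = {}
    for lexicon in ET.fromstring(ssml_text).iter(SSML + "lexicon"):
        lexicon_id = lexicon.get(XML_ID) or lexicon.get("id") or DEFAULT_LEXICON
        entries = lexicons.setdefault(lexicon_id, {})
        for lexeme in lexicon.iter(SSML + "lexeme"):
            grapheme = lexeme.find(SSML + "grapheme")
            phoneme = lexeme.find(SSML + "phoneme")
            if grapheme is None or phoneme is None:
                continue
            word = "".join(grapheme.itertext()).strip()
            role = grapheme.get("role", "")
            # Phonemes are separated by whitespace
            entries[(word, role)] = "".join(phoneme.itertext()).split()
    return lexicons


def lookup(lexicons, word, role="", lexicon_id=DEFAULT_LEXICON):
    """Find a pronunciation; falling back to the role-less entry is an assumption."""
    entries = lexicons.get(lexicon_id, {})
    return entries.get((word, role)) or entries.get((word, ""))


DOC = """<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <lexicon xml:id="test" alphabet="ipa">
    <lexeme>
      <grapheme>tomato</grapheme>
      <phoneme>t ə m ˈɑ t oʊ</phoneme>
    </lexeme>
    <lexeme>
      <grapheme role="fake-role">tomato</grapheme>
      <phoneme>t ə m ˈi t oʊ</phoneme>
    </lexeme>
  </lexicon>
</speak>
"""

lexicons = parse_inline_lexicons(DOC)
print(lookup(lexicons, "tomato", lexicon_id="test"))
print(lookup(lexicons, "tomato", role="fake-role", lexicon_id="test"))
```

Keying entries by `(word, role)` keeps the role-specific and role-less pronunciations of the same word side by side, which is what lets the third "tomato" in the first document select a different entry from the second.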

## Intended Audience

gruut is useful for transforming raw text into phonetic pronunciations, similar to [phonemizer](https://github.com/bootphon/phonemizer). Unlike phonemizer, gruut looks up words in a pre-built lexicon (pronunciation dictionary) or guesses word pronunciations with a pre-trained grapheme-to-phoneme model. Phonemes for each language come from a [carefully chosen inventory](https://en.wikipedia.org/wiki/Template:Language_phonologies).
34 changes: 33 additions & 1 deletion gruut/__main__.py
@@ -6,6 +6,7 @@
import logging
import os
import sys
from enum import Enum
from pathlib import Path

import jsonlines
@@ -21,6 +22,20 @@
# Path to gruut base directory
_DIR = Path(__file__).parent


class StdinFormat(str, Enum):
    """Format of standard input"""

    AUTO = "auto"
    """Choose based on SSML state"""

    LINES = "lines"
    """Each line is a separate sentence/document"""

    DOCUMENT = "document"
    """Entire input is one document"""


# -----------------------------------------------------------------------------


@@ -62,7 +77,18 @@ def main():
        lines = args.text
    else:
        # Use stdin
        lines = sys.stdin
        stdin_format = StdinFormat.LINES

        if (args.stdin_format == StdinFormat.AUTO) and args.ssml:
            # Assume SSML input is entire document
            stdin_format = StdinFormat.DOCUMENT

        if stdin_format == StdinFormat.DOCUMENT:
            # One big line
            lines = [sys.stdin.read()]
        else:
            # Multiple lines
            lines = sys.stdin

        if os.isatty(sys.stdin.fileno()):
            print("Reading input from stdin...", file=sys.stderr)
@@ -175,6 +201,12 @@ def get_args() -> argparse.Namespace:
    parser.add_argument(
        "--ssml", action="store_true", help="Input text is SSML",
    )
    parser.add_argument(
        "--stdin-format",
        choices=[str(v.value) for v in StdinFormat],
        default=StdinFormat.AUTO,
        help="Format of stdin text (default: auto)",
    )

    # Disable features
    parser.add_argument(
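
The stdin handling in the `gruut/__main__.py` hunks can be tried out in isolation with a small standalone sketch. The helper name `read_texts` and its parameters are hypothetical, and the sketch reads from any text stream so that it can be exercised with `io.StringIO` instead of a real terminal. One further assumption: an explicit `document` value is honored even without `--ssml`, which the changelog wording suggests but the hunk above does not show.

```python
import io
from enum import Enum


class StdinFormat(str, Enum):
    """Format of standard input (same values as the enum in the diff)"""

    AUTO = "auto"
    LINES = "lines"
    DOCUMENT = "document"


def read_texts(stream, stdin_format="auto", ssml=False):
    """Split a text stream into the list of texts to process (hypothetical helper)."""
    fmt = StdinFormat(stdin_format)

    if fmt == StdinFormat.AUTO:
        # SSML is assumed to be one document; plain text is read line by line
        fmt = StdinFormat.DOCUMENT if ssml else StdinFormat.LINES

    if fmt == StdinFormat.DOCUMENT:
        # One big text
        return [stream.read()]

    # One text per line, without the trailing newline
    return [line.rstrip("\n") for line in stream]


sample = "<speak>\n  <s>Hello</s>\n</speak>\n"
as_document = read_texts(io.StringIO(sample), ssml=True)
as_lines = read_texts(io.StringIO(sample), stdin_format="lines", ssml=True)
print(len(as_document), len(as_lines))
```

Because `StdinFormat` subclasses `str`, the plain strings produced by argparse's `choices` compare equal to the enum members, so `StdinFormat("auto")` and `"auto" == StdinFormat.AUTO` both behave as expected.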
3 changes: 3 additions & 0 deletions gruut/const.py
@@ -152,6 +152,9 @@ class InterpretAs(str, Enum):
    TIME = "time"
    """Word should be interpreted as a time on the clock"""

    WORD = "word"
    """Interpret as regular word"""


class InterpretAsFormat(str, Enum):
    """Supported options for format attribute of <say-as>"""