Speech accents #34

Open
Fogapod opened this issue Nov 2, 2023 · 1 comment · May be fixed by #42

Fogapod commented Nov 2, 2023

Summary

The accent system modifies speech before it is sent to chat in order to
simulate speech defects or status effects. Text replacement rules are
defined using a special format.

Motivation

While it is possible to type any accent manually, it is handy to have
an automatic system. Additionally, accents can act as limitations, much
like vision, hearing and other impairments.

A custom format should simplify accent creation by focusing on rules.

The result of this should at least have feature parity with
Unitystation accents, otherwise it is not worth the effort.

Guide-level explanation

Accents modify player speech in chat. Multiple accents can be applied on
top of each other, making the message much less comprehensible.

Accents can be acquired in multiple ways: accent(s) selected during
character creation, worn items (clown mask), status effects
(alcohol consumption, low health) and maybe others.

Replacements are found in multiple passes. Each pass inside an accent
has a name and consists of multiple rules which are combined into a
single regex. A rule says which pattern to replace with which tag. The
simplest example of a rule is: replace hello with Literal("bonjour").
Literal is one of the tags; it replaces the original with the given string.
Note that hello is actually a regex pattern, so more complex things can
be matched.

Some of the tags are:

  • Original: does not replace (leaves original match as is)
  • Literal: puts given string
  • Any: selects random inner replacement with equal weights
  • Upper: converts inner result to uppercase
  • Lower: converts inner result to lowercase
  • Concat: runs left and right inner tags and adds them together

Some tags take others as an argument. For example, Upper:
Upper(Literal("bonjour")) will result in hello being replaced with
BONJOUR.
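
To make the nesting concrete, here is a minimal sketch of how such a tag tree could be evaluated. It is not the proposed design: the proposal uses a Tag trait, while this sketch uses a plain enum, and the `generate` signature and the `pick` callback (standing in for a random source so the example stays deterministic) are assumptions made for illustration only.

```rust
/// Hypothetical, simplified tag tree (the real design is a `Tag` trait).
#[derive(Debug, Clone)]
pub enum Tag {
    /// Leaves the matched text as is.
    Original,
    /// Replaces the match with the given string.
    Literal(String),
    /// Picks one of the inner tags with equal weights.
    Any(Vec<Tag>),
    /// Uppercases the result of the inner tag.
    Upper(Box<Tag>),
    /// Lowercases the result of the inner tag.
    Lower(Box<Tag>),
    /// Runs the left and right inner tags and joins their results.
    Concat(Box<Tag>, Box<Tag>),
}

impl Tag {
    /// Produces the replacement for `matched`.
    /// `pick(n)` is an assumed callback that chooses an index in `0..n` for `Any`.
    pub fn generate(&self, matched: &str, pick: &mut dyn FnMut(usize) -> usize) -> String {
        match self {
            Tag::Original => matched.to_string(),
            Tag::Literal(s) => s.clone(),
            Tag::Any(items) => {
                if items.is_empty() {
                    // Nothing to choose from: produce an empty replacement.
                    return String::new();
                }
                // Clamp the chosen index so a misbehaving picker cannot panic.
                let i = pick(items.len()) % items.len();
                items[i].generate(matched, pick)
            }
            Tag::Upper(inner) => inner.generate(matched, pick).to_uppercase(),
            Tag::Lower(inner) => inner.generate(matched, pick).to_lowercase(),
            Tag::Concat(left, right) => {
                let mut out = left.generate(matched, pick);
                out.push_str(&right.generate(matched, pick));
                out
            }
        }
    }
}
```

With this sketch, `Tag::Upper(Box::new(Tag::Literal("bonjour".to_string())))` applied to the match hello yields BONJOUR, mirroring the example above.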

It is possible to define multiple intensity levels of an accent in the
same file. You can make the accent get progressively worse as intensity goes
higher. Intensity can be either randomly assigned or get worse as an effect
progresses (you get more drunk).

RON example:

// This accent adds honks at the end of your messages (regex anchor $)
// On intensity 1+ it adds more honks and UPPERCASES EVERYTHING YOU SAY
(
    accent: {
        // `ending` pass. all regexes inside pass are merged. make sure to avoid overlaps
        "ending": (
            rules: {
                // 1 or 2 honks on default intensity of 0
                "$": {"Any": [
                    {"Literal": " HONK!"},
                    {"Literal": " HONK HONK!"},
                ]},
            },
        ),
    },
    intensities: {
        1: Extend({
            // merges with `ending` pass from accent body (intensity 0 implicitly)
            "ending": (
                rules: {
                    // overwrite "$" to be 2 to 3 honks
                    "$": {"Any": [
                        {"Literal": " HONK HONK!"},
                        {"Literal": " HONK HONK HONK!"},
                    ]},
                },
            ),
            // gets placed at the end as new pass because `main` did not exist previously
            "main": (
                rules: {
                    // uppercase everything you say
                    ".+": {"Upper": {"Original": ()}},
                },
            ),
        }),
    },
)

Reference-level explanation

General structure

An accent consists of 2 parts:

  • accent: intensity 0
  • intensities: a map from level to an enum of Extend or Replace, each containing an intensity definition in the same shape as accent

An accent is executed sequentially from top to bottom.

Regex patterns

Every pattern is compiled into a regex, meaning it has to be valid
Rust regex syntax. While some features (such as look-around and
backreferences) are missing, the regex crate guarantees linear-time matching.

By default every regex is compiled with the (?mi) flags (a flag can be
opted out of inside the pattern, for example by writing (?-m)).

Regexes inside each pass are merged, which significantly improves performance
(~54x improvement for scotsman with 600+ rules) but does not handle overlaps.
If you have overlapping regexes, those must be placed into separate passes.
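
One plausible way to merge the rules of a pass, shown here only as a sketch, is to wrap every pattern in its own named capture group and join the groups with `|`. Whether the implementation merges rules exactly like this is an assumption; the function name `merge_pass_patterns` and the group names `r0`, `r1`, ... are hypothetical.

```rust
/// Builds one combined pattern for a pass (assumed merging strategy).
/// Every rule gets its own named group `r{index}`, so after a match the
/// group that participated tells which rule (and therefore which tag) applies.
pub fn merge_pass_patterns(patterns: &[&str]) -> String {
    let groups: Vec<String> = patterns
        .iter()
        .enumerate()
        .map(|(i, p)| format!("(?P<r{i}>{p})"))
        .collect();
    // Default flags from this section: multi-line and case-insensitive.
    format!("(?mi){}", groups.join("|"))
}
```

For the rules hello and bye+ this produces `(?mi)(?P<r0>hello)|(?P<r1>bye+)`. It also illustrates why overlaps are not handled: a single regex only reports non-overlapping matches, so text consumed by one alternative can never be matched by another rule in the same pass.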

Case mimicking

Messages look much better if you copy the original letter case. If the user was
SCREAMING, you want your replacement to scream as well. If the user
Capitalized something, you ideally want to preserve that. Best-effort case
mimicking is enabled for Literal. This currently includes:

  • do nothing if input is full lowercase
  • if input is all uppercase, convert output to full uppercase
  • if input and output have same lengths, copy case for each letter

This is currently ASCII only!!
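
The three rules above can be sketched as a small standalone function. This is an illustration under assumptions, not the actual implementation: the name `mimic_case` is hypothetical, and treating input without any uppercase letters as "full lowercase" is a choice made for this sketch.

```rust
/// Best-effort ASCII case mimicking following the three listed rules.
/// `original` is the matched text, `replacement` is the Literal output.
pub fn mimic_case(original: &str, replacement: &str) -> String {
    let has_upper = original.chars().any(|c| c.is_ascii_uppercase());
    let has_lower = original.chars().any(|c| c.is_ascii_lowercase());

    if !has_upper {
        // Rule 1: input is fully lowercase (or has no letters): do nothing.
        return replacement.to_string();
    }
    if !has_lower {
        // Rule 2: input is fully uppercase: uppercase the whole output.
        return replacement.to_ascii_uppercase();
    }
    if original.chars().count() == replacement.chars().count() {
        // Rule 3: mixed case and equal lengths: copy case letter by letter.
        return original
            .chars()
            .zip(replacement.chars())
            .map(|(o, r)| {
                if o.is_ascii_uppercase() {
                    r.to_ascii_uppercase()
                } else if o.is_ascii_lowercase() {
                    r.to_ascii_lowercase()
                } else {
                    r
                }
            })
            .collect();
    }
    // Mixed case with different lengths: no rule applies, keep the output as is.
    replacement.to_string()
}
```

Under these rules HELLO turns bonjour into BONJOUR and Hello turns world into World, while Hello with bonjour stays lowercase because the lengths differ and none of the listed rules covers capitalization alone.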

Regex templating

Regex provides a powerful templating feature for free. It allows
capturing parts of a match into named or numbered groups and reusing them
as parts of the replacement.
For example, Original is Literal("$0") where $0 expands to the entire
regex match.

Tag trait

There are multiple default tags, but when they are not enough, Tag can be
implemented for a custom type, which automatically allows deserializing it
by implementation name. An implementation of Tag could look like this (not final):

use sayit::{
    Accent,
    Match,
    Tag,
};

// Deserialize is only required with `deserialize` crate feature
#[derive(Clone, Debug, serde::Deserialize)]
// transparent allows using `true` directly instead of `(true)`
#[serde(transparent)]
pub struct StringCase(bool);

// `typetag` is only required with `deserialize` crate feature
#[typetag::deserialize]
impl Tag for StringCase {
    fn generate<'a>(&self, m: &Match<'a>) -> std::borrow::Cow<'a, str> {
        if self.0 {
            m.get_match().to_uppercase()
        } else {
            m.get_match().to_lowercase()
        }.into()
    }
}

// construct accent that will uppercase all instances of "a" and lowercase all "b"
let accent = ron::from_str::<Accent>(
    r#"
(
    accent: {
        "main": (
            rules: {
                "a": {"StringCase": true},
                "b": {"StringCase": false},
            }
        ),
    }
)
"#,
)
.expect("accent did not parse");

assert_eq!(accent.say_it("abab ABAB Hello", 0), "AbAb AbAb Hello");

Intensities

Default intensity is 0 and it is always present in an accent. Higher
intensities can be declared in the optional top-level intensities map,
keyed by intensity level. This map is sparse, meaning you can skip levels.
The highest possible level is selected.
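
A minimal sketch of the sparse lookup follows. It reads "the highest possible level" as "the highest declared level that does not exceed the requested one", which is an assumption about the intended behaviour; the function name `resolve_intensity` is hypothetical.

```rust
use std::collections::BTreeMap;

/// Returns the level that would be used for `requested`.
/// `declared` holds the levels from the `intensities` map; level 0 is the
/// accent body, which always exists and is used as the fallback.
pub fn resolve_intensity<T>(declared: &BTreeMap<u64, T>, requested: u64) -> u64 {
    declared
        .range(..=requested) // all declared levels up to and including `requested`
        .next_back() // the highest of them, if any
        .map(|(level, _)| *level)
        .unwrap_or(0)
}
```

With levels 1 and 5 declared, a requested intensity of 3 resolves to 1, and anything from 5 upwards resolves to 5.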

There are 2 ways to define an intensity:

Replace starts from scratch and only has its own set of rules.
Extend recursively looks at lower intensities down to 0 and merges them
together. If a pattern conflicts with an existing pattern on a lower level, it
is replaced (its relative position remains the same). All new rules are
added at the end of the merged words and patterns arrays.
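
The Extend merge described above can be sketched on plain ordered lists. This is only an illustration: rules are reduced to (pattern, replacement) string pairs instead of regexes and tags, and the names `Rule` and `extend_rules` are hypothetical.

```rust
/// One rule as (pattern, replacement); plain strings stand in for regex and tag.
pub type Rule = (String, String);

/// Merges the rules of a higher intensity into the rules of a lower one.
/// A pattern that already exists is replaced in place, keeping its position;
/// a pattern that is new is appended at the end.
pub fn extend_rules(lower: &[Rule], upper: &[Rule]) -> Vec<Rule> {
    let mut merged: Vec<Rule> = lower.to_vec();
    for (pattern, replacement) in upper {
        match merged.iter().position(|(p, _)| p == pattern) {
            Some(i) => merged[i].1 = replacement.clone(),
            None => merged.push((pattern.clone(), replacement.clone())),
        }
    }
    merged
}
```

Applying this level by level, starting from intensity 0, gives the recursive behaviour of Extend, while Replace would simply use its own rule list without calling the merge.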

Drawbacks

Accent system as a whole

Some people might find accents annoying.
Impacts server performance by ~0.0001%

Tag system performance

This is mostly mitigated by merging regexes.

A list of regular expressions will never be as performant as static
replacements. There are some potential optimizations, like merging
patterns without any regex escape codes or some smart way to run
replacements in parallel, but a list of static strings can still be
replaced more efficiently.

Another aspect of the tag system is layers, which add some overhead unless
compiled down, but even then some tags might need nesting.

While these can be partially mitigated, it would increase code
complexity significantly.

Memory footprint

Compiled regexes are pretty large. The Scotsman accent alone in the CLI tool on a release build shows up as ~130 MB, although I am not sure I measured it correctly.

Executable size / extra dependencies

The library was made as minimal as possible, with 37 dependencies and a ~1.1M
.rlib size. A further size decrease is possible by disabling regex optimizations.

~~Due to complexity of deserializable trait and dependency on regex there
are ~40 total dependencies in current WIP implementation and .rlib
release file is ~1.2M (unsure if it's correct way to measure binary
size).~~

Regex rule overlaps

This has been solved by regex passes.

It is harder (or maybe even impossible) to detect overlaps between regex
patterns as opposed to static strings. Users must be careful to not
overwrite other rules.

Patterns overwrite words

This has been solved by regex passes.

This problem is essentially the same as previous one. Rules are executed
top to bottom, words first and then patterns. It makes it hard or in
some cases even impossible to adequately combine words and single/double
character replacements.

Extreme verbosity

Even the simplest tags like {"Literal": "..."} are extremely verbose. Ideally I would want
to deserialize String -> Literal, Vec<Box<dyn Tag>> -> Any, Map<u64, Box<dyn Tag>> -> Weights
but I did not find a way to do this yet. Not sure if it is possible.

Additionally there is a lot of nesting. I tried my best to keep accent as flat as possible
but there is simply too much going on.

Rationale and alternatives

Accent system as a whole

The alternative to having accents is typing everything by hand all the
time and hoping players roleplay status effects.

Tag system

As for the tag system, it potentially allows expressing very complex
patterns, including arbitrary code via custom Tag impls that could in theory
even make an HTTP request or run an LLM (let's not do that).

While being powerful and extensible, tag syntax remains readable.

Regex patterns

While being slower than static strings, regex is a powerful tool that
can simplify many accents.

Prior art

Other games

SS13

As far as I know, BYOND stations usually use JSON files with rules.

This works but has limitations.

Unitystation

Unitystation uses some proprietary Unity yaml asset format which they
use to define lists of replacements - words and patterns. After all
replacements custom code optionally runs.

Accent code: https://github.com/unitystation/unitystation/blob/be67b387b503f57c540b3311028ca4bf965dbfb0/UnityProject/Assets/Scripts/ScriptableObjects/SpeechModifier.cs
Folder with accents (see .asset files): https://github.com/unitystation/unitystation/tree/develop/UnityProject/Assets/ScriptableObjects/Speech

This is the same system as BYOND and it has limitations.

SS14

Space Station 14 does not have any format. They define all accents with
pure C#.

Spanish accent: https://github.com/space-wizards/space-station-14/blob/effcc5d8277cd28f9739359e50fc268ada8f4ea6/Content.Server/Speech/EntitySystems/SpanishAccentSystem.cs#L5

This is simplest to implement but results in repetitive code and is
harder to read. This code is also hard to keep uniform across different
accents.

There is a helper method that handles word replacements with localization and case mimicking: https://github.com/space-wizards/space-station-14/blob/a0d159bac69169434a38500b386476c7affccf3d/Content.Server/Speech/EntitySystems/ReplacementAccentSystem.cs

Similar behaviour might be possible with custom Tag implementation that looks up localized string at creation time and seeds internal Literal with it.

Unresolved questions

  • Tag trait!!!
  • How to integrate this with SSNT
  • Custom trait options/message passing/generic over settings - likely
    impossible
  • Do benefits of the tag system outweigh the complexity that comes with it
  • Minimal set of replacement tags
  • Maybe a way to completely redefine accent / extend it like default
    Unitystation behaviour where custom code runs after all rules
    (this is likely covered by passes/custom Tag implementations)
  • How complex should string case mimicking be
  • The optimal way to do repetitions
  • Reusing data: you might want to add 2 items to array of 1000 words in
    next intensity level or use said array between multiple rules
  • Do tags need to have access to some state/context? Not now.

Future possibilities

Accent system could possibly be reused for speech jumbling system:
turning speech into junk for non-speakers. One (bad) example might be
robot communications visible as ones and zeros for humans.

Fogapod commented Nov 2, 2023

I am currently working on proof of concept for this tag system at https://git.based.computer/fogapod/sayit
