Summary
The accent system modifies speech before it is sent to chat in order to
simulate speech defects or status effects. Text replacement rules are
defined using a custom format.
Motivation
While it is possible to type any accent manually, it is handy to have
an automatic system. Additionally, accents can act as limitations, similar to
vision, hearing and other impairments.
A custom format should simplify accent creation by focusing on rules.
The result should at least have feature parity with
Unitystation accents, otherwise it is not worth the effort.
Guide-level explanation
Accents modify player speech in chat. Multiple accents can be applied on
top of each other, making the message much less comprehensible.
Accents can be acquired in multiple ways: accent(s) selected during
character creation, worn items (clown mask), status effects
(alcohol consumption, low health) and possibly others.
Replacements are found in multiple passes. Each pass inside an accent
has a name and consists of multiple rules which are combined into a
single regex. A rule says which pattern to replace with which tag. The simplest
example of a rule is: replace hello with Literal("bonjour"). Literal is one of the tags; it replaces the original with the given string.
Note that hello is actually a regex pattern, so more complex things can
be matched.
Some of the tags are:
Original: does not replace (leaves original match as is)
Literal: puts given string
Any: selects random inner replacement with equal weights
Upper: converts inner result to uppercase
Lower: converts inner result to lowercase
Concat: runs left and right inner tags and adds them together
Some tags take others as an argument. For example, Upper: Upper(Literal("bonjour")) will result in hello being replaced with BONJOUR.
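The nesting of tags can be pictured as a small tree that is evaluated recursively. The following is a minimal sketch under stated assumptions, not the library's actual types: the `Tag` enum, its `generate` method and the `pick` callback are hypothetical names, and `Any` takes its random choice from the caller so the example stays deterministic.

```rust
/// Hypothetical, simplified tag tree; variant names mirror the list above.
#[derive(Debug, Clone)]
enum Tag {
    /// Leaves the matched text as is.
    Original,
    /// Replaces the match with the given string.
    Literal(String),
    /// Picks one inner tag; the index comes from the caller-supplied `pick`.
    Any(Vec<Tag>),
    /// Converts the inner result to uppercase.
    Upper(Box<Tag>),
    /// Converts the inner result to lowercase.
    Lower(Box<Tag>),
    /// Runs the left and right inner tags and joins their results.
    Concat(Box<Tag>, Box<Tag>),
}

impl Tag {
    /// Produces the replacement for `matched`.
    /// `pick(n)` must return an index in `0..n` (and `Any` must not be empty);
    /// a real implementation would use a random number generator instead.
    fn generate<F: FnMut(usize) -> usize>(&self, matched: &str, pick: &mut F) -> String {
        match self {
            Tag::Original => matched.to_string(),
            Tag::Literal(s) => s.clone(),
            Tag::Any(options) => {
                let i = pick(options.len());
                options[i].generate(matched, pick)
            }
            Tag::Upper(inner) => inner.generate(matched, pick).to_uppercase(),
            Tag::Lower(inner) => inner.generate(matched, pick).to_lowercase(),
            Tag::Concat(left, right) => {
                let mut out = left.generate(matched, pick);
                out.push_str(&right.generate(matched, pick));
                out
            }
        }
    }
}
```

With this sketch, `Tag::Upper(Box::new(Tag::Literal("bonjour".to_string())))` evaluated against the match hello yields BONJOUR, matching the example above.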
It is possible to define multiple intensity levels of an accent in the
same file. You can make an accent get progressively worse as intensity goes
higher. Intensity can either be randomly assigned or increase as an effect
progresses (you get more drunk).
RON example:
// This accent adds honks at the end of your messages (regex anchor $)
// On intensity 1+ it adds more honks and UPPERCASES EVERYTHING YOU SAY
(
    accent: {
        // `ending` pass. all regexes inside pass are merged. make sure to avoid overlaps
        "ending": (
            rules: {
                // 1 or 2 honks on default intensity of 0
                "$": {"Any": [
                    {"Literal": " HONK!"},
                    {"Literal": " HONK HONK!"},
                ]},
            },
        ),
    },
    intensities: {
        1: Extend({
            // merges with `ending` pass from accent body (intensity 0 implicitly)
            "ending": (
                rules: {
                    // overwrite "$" to be 2 to 3 honks
                    "$": {"Any": [
                        {"Literal": " HONK HONK!"},
                        {"Literal": " HONK HONK HONK!"},
                    ]},
                },
            ),
            // gets placed at the end as new pass because `main` did not exist previously
            "main": (
                rules: {
                    // uppercase everything you say
                    ".+": {"Upper": {"Original": ()}},
                },
            ),
        }),
    },
)
Reference-level explanation
General structure
An accent consists of 2 parts:
accent: intensity 0
intensities: a map from level to an enum of Extend or Replace, containing an intensity definition inside, same as accent
An accent is executed sequentially from top to bottom.
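To illustrate what sequential execution means, here is a minimal sketch in which literal string pairs stand in for regex rules. `Pass`, `run_pass` and `run_accent` are hypothetical names for this example only. Within one pass every rule sees the original text in a single left-to-right scan; a later pass sees the output of the earlier one.

```rust
/// One pass; literal (from, to) pairs stand in for regex rules in this sketch.
struct Pass {
    rules: Vec<(&'static str, &'static str)>,
}

/// Applies all rules of one pass in a single left-to-right scan.
/// At each position the first rule that matches wins, and replaced text
/// is never rescanned by other rules of the same pass.
fn run_pass(pass: &Pass, input: &str) -> String {
    let mut out = String::new();
    let mut rest = input;
    'scan: while !rest.is_empty() {
        for (from, to) in &pass.rules {
            if !from.is_empty() && rest.starts_with(*from) {
                out.push_str(to);
                rest = &rest[from.len()..];
                continue 'scan;
            }
        }
        // No rule matched here: copy one character and move on.
        let c = rest.chars().next().unwrap();
        out.push(c);
        rest = &rest[c.len_utf8()..];
    }
    out
}

/// Runs passes top to bottom, feeding each pass the output of the previous one.
fn run_accent(passes: &[Pass], message: &str) -> String {
    passes
        .iter()
        .fold(message.to_string(), |text, pass| run_pass(pass, &text))
}
```

With the rules th -> z and z -> zz in one pass, "the zoo" becomes "ze zzoo". Splitting them into two passes gives "zze zzoo", because the second pass also sees the z produced by the first.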
Regex patterns
Every pattern is compiled into a regex, meaning it has to be valid Rust regex syntax. While some
features are missing, the regex crate provides excellent linear performance.
By default every regex is compiled with (?mi) flags (this can be opted out of by
writing (?-m)).
Regexes inside each pass are merged, which significantly improves performance
(~54x improvement for scotsman with 600+ rules) but does not handle overlaps.
If you have overlapping regexes, they must be placed into separate passes.
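One plausible way to merge the rules of a pass is a single alternation with one named group per rule. This is only a sketch of the idea: whether the library merges patterns exactly like this is an assumption, and `merge_pass` and the `r<index>` group names are made up for illustration.

```rust
/// Hypothetical helper: joins the patterns of one pass into a single alternation.
/// Each rule is wrapped in a named group `r<index>` so that the rule which
/// matched can be identified afterwards.
fn merge_pass(patterns: &[&str]) -> String {
    let alternatives: Vec<String> = patterns
        .iter()
        .enumerate()
        .map(|(i, p)| format!("(?P<r{i}>{p})"))
        .collect();
    // `(?mi)` mirrors the default flags described above.
    format!("(?mi){}", alternatives.join("|"))
}
```

With an alternation, text consumed by one alternative is not offered to the others, and the regex crate's leftmost-first semantics prefer the earlier alternative at the same position. That is why overlapping rules cannot coexist in one merged pass.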
Case mimicking
Messages look much better if you copy the original letter case. If the user was
SCREAMING, you want your replacement to scream as well. If the user
Capitalized something, you ideally want to preserve that. Best-effort case
mimicking is enabled for Literal. This currently includes:
do nothing if input is full lowercase
if input is all uppercase, convert output to full uppercase
if input and output have same lengths, copy case for each letter
This is currently ASCII only!!
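The three rules above can be sketched as a single function. `mimic_case` is a hypothetical name, and the fallback for mixed-case input of a different length (leave the replacement untouched) is an assumption, since the text does not specify it.

```rust
/// Best-effort ASCII case mimicking following the three rules listed above.
fn mimic_case(original: &str, replacement: &str) -> String {
    let has_upper = original.chars().any(|c| c.is_ascii_uppercase());
    let has_lower = original.chars().any(|c| c.is_ascii_lowercase());

    if !has_upper {
        // Rule 1: input is full lowercase (or has no letters), do nothing.
        return replacement.to_string();
    }
    if !has_lower {
        // Rule 2: input is all uppercase, convert output to uppercase.
        return replacement.to_ascii_uppercase();
    }
    if original.len() == replacement.len() && original.is_ascii() && replacement.is_ascii() {
        // Rule 3: same length, copy case letter by letter.
        return original
            .chars()
            .zip(replacement.chars())
            .map(|(o, r)| {
                if o.is_ascii_uppercase() {
                    r.to_ascii_uppercase()
                } else if o.is_ascii_lowercase() {
                    r.to_ascii_lowercase()
                } else {
                    r
                }
            })
            .collect();
    }
    // Assumed fallback: mixed case with different lengths stays untouched.
    replacement.to_string()
}
```

For example, HELLO with bonjour gives BONJOUR, and Hello with salut gives Salut because both have five letters.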
Regex templating
Regex provides a powerful templating feature for free. It allows
capturing parts of regex into named or numbered groups and reusing them
as parts of replacement.
For example, Original is Literal("$0") where $0 expands to entire
regex match.
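To show what template expansion does, here is a std-only sketch that handles numbered groups only. It is not the regex crate's implementation: the crate performs the expansion itself and also supports named groups and the ${name} form. `expand` and its `groups` slice (index 0 is the whole match) are hypothetical.

```rust
/// Simplified template expansion: `$N` inserts capture group N, `$$` is a literal `$`.
/// `groups[0]` is the entire match; missing groups expand to an empty string.
fn expand(template: &str, groups: &[&str]) -> String {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        if chars.peek() == Some(&'$') {
            // Escaped dollar sign.
            chars.next();
            out.push('$');
            continue;
        }
        // Collect the digits that follow the `$`.
        let mut digits = String::new();
        while let Some(d) = chars.peek().copied().filter(|d| d.is_ascii_digit()) {
            digits.push(d);
            chars.next();
        }
        match digits.parse::<usize>() {
            Ok(i) => out.push_str(groups.get(i).copied().unwrap_or("")),
            // No usable group number after `$`: keep the dollar sign as is.
            Err(_) => out.push('$'),
        }
    }
    out
}
```

Under this sketch, the template $0 reproduces the whole match, which is the behaviour described for Original.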
Tag trait
There are multiple default tags, but when they are not enough, Tag can be
implemented for a custom type, which automatically allows deserializing it by the implementation
name. An implementation of Tag could look like this (not final):
use sayit::{
    Accent,
    Match,
    Tag,
};

// Deserialize is only required with `deserialize` crate feature
#[derive(Clone, Debug, serde::Deserialize)]
// transparent allows using `true` directly instead of `(true)`
#[serde(transparent)]
pub struct StringCase(bool);

// `typetag` is only required with `deserialize` crate feature
#[typetag::deserialize]
impl Tag for StringCase {
    fn generate<'a>(&self, m: &Match<'a>) -> std::borrow::Cow<'a, str> {
        if self.0 {
            m.get_match().to_uppercase()
        } else {
            m.get_match().to_lowercase()
        }.into()
    }
}

// construct accent that will uppercase all instances of "a" and lowercase all "b"
let accent = ron::from_str::<Accent>(
    r#"
(
    accent: {
        "main": (
            rules: {
                "a": {"StringCase": true},
                "b": {"StringCase": false},
            }
        ),
    }
)
"#,
)
.expect("accent did not parse");

assert_eq!(accent.say_it("abab ABAB Hello", 0), "AbAb AbAb Hello");
Intensities
The default intensity is 0 and it is always present in an accent. Higher
intensities can be declared in the optional intensities top-level struct.
The key is the intensity. This map is sparse, meaning you can skip levels.
The highest possible level is selected.
There are 2 ways to define an intensity:
Replace starts from scratch and only has its own set of rules.
Extend recursively looks at lower intensities down to 0 and merges them
together. If a pattern conflicts with an existing pattern on a lower level, it
is replaced (its relative position remains the same). All new rules are
added at the end of the merged words and patterns arrays.
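The Replace and Extend semantics can be sketched as follows. This is a simplified model, not the library's data structures: rules are flattened to (pattern, replacement) string pairs instead of passes with tags, `Accent`, `Intensity` and `rules_for` are hypothetical names, and it assumes that "highest possible level" means the highest declared level that does not exceed the requested one.

```rust
use std::collections::BTreeMap;

/// A rule is simplified to a (pattern, replacement) pair in this sketch.
type Rules = Vec<(String, String)>;

/// How a level above 0 is defined, mirroring `Replace` and `Extend`.
enum Intensity {
    Replace(Rules),
    Extend(Rules),
}

/// Hypothetical container: `base` is intensity 0, `levels` is the sparse map.
struct Accent {
    base: Rules,
    levels: BTreeMap<u64, Intensity>,
}

impl Accent {
    /// Returns the rules in effect for `level`.
    /// Assumption: the highest declared level not above `level` is selected.
    fn rules_for(&self, level: u64) -> Rules {
        if level == 0 {
            return self.base.clone();
        }
        match self.levels.range(1..=level).next_back() {
            // Nothing declared at or below `level`: fall back to intensity 0.
            None => self.base.clone(),
            // Replace starts from scratch.
            Some((_, Intensity::Replace(rules))) => rules.clone(),
            Some((found, Intensity::Extend(rules))) => {
                // Resolve everything below recursively, then merge on top.
                let mut merged = self.rules_for(*found - 1);
                for (pattern, replacement) in rules {
                    match merged.iter().position(|(p, _)| p == pattern) {
                        // Conflict: replace in place, keeping the relative position.
                        Some(i) => merged[i].1 = replacement.clone(),
                        // New rule: append at the end.
                        None => merged.push((pattern.clone(), replacement.clone())),
                    }
                }
                merged
            }
        }
    }
}
```

In this model, requesting intensity 3 when only levels 1 and 5 are declared resolves to level 1, and an Extend at level 1 that redefines "$" keeps that rule at its original position while new rules go to the end.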
Drawbacks
Accent system as a whole
Some people might find accents annoying.
Impacts server performance by ~0.0001%
Tag system performance
This is mostly mitigated by merging regexes.
A list of regular expressions will never be as performant as static replacements. There are some potential optimizations, like merging patterns without any regex escape codes or some smart way to run replacements in parallel, but a list of static strings can be replaced efficiently.
Another aspect of the tag system is layers, which add some overhead unless compiled down, but even then some tags might need nesting.
While these can be partially mitigated, it would increase code complexity significantly.
Memory footprint
Compiled regexes are pretty large. The Scotsman accent alone in the CLI tool on a release build shows up as ~130 MB, although I am not sure I measured it correctly.
Executable size / extra dependencies
The library was made as minimal as possible, with 37 dependencies and ~1.1M
.rlib size. A further size decrease is possible by disabling regex optimizations.
~~Due to the complexity of the deserializable trait and the dependency on regex there
are ~40 total dependencies in the current WIP implementation and the .rlib
release file is ~1.2M (unsure if it's the correct way to measure binary size).~~
Regex rule overlaps
This has been solved by regex passes.
It is harder (or maybe even impossible) to detect overlaps between regex patterns as opposed to static strings. Users must be careful to not overwrite other rules.
Patterns overwrite words
This has been solved by regex passes.
This problem is essentially the same as previous one. Rules are executed top to bottom, words first and then patterns. It makes it hard or in some cases even impossible to adequately combine words and single/double character replacements.
Extreme verbosity
Even the simplest tags like {"Literal": "..."} are extremely verbose. Ideally I would want
to deserialize String -> Literal, Vec<Box<dyn Tag>> -> Any, Map<u64, Box<dyn Tag>> -> Weights
but I did not find a way to do this yet. Not sure if it is possible.
Additionally there is a lot of nesting. I tried my best to keep accents as flat as possible
but there is simply too much going on.
Rationale and alternatives
Accent system as a whole
The alternative, not having accents, means typing everything by hand all the
time and hoping players roleplay status effects.
Tag system
As for the tag system, it potentially allows expressing very complex
patterns, including arbitrary code via custom Tag impls that could in theory
even make an HTTP request or run an LLM (let's not do that).
While being powerful and extensible, tag syntax remains readable.
Regex patterns
While being slower than static strings, regex is a powerful tool that
can simplify many accents.
Unresolved questions
Tag trait!!!
How to integrate this with SSNT
Custom trait options/message passing/generic over settings - likely impossible
Do the benefits of the tag system outweigh the complexity that comes with it
Minimal set of replacement tags
Maybe a way to completely redefine an accent / extend it like the default Unitystation behaviour where custom code runs after all rules
this is likely covered by passes/custom Tag implementations
How complex should string case mimicking be
The optimal way to do repetitions
Reusing data: you might want to add 2 items to an array of 1000 words in the next intensity level or use said array between multiple rules
Do tags need to have access to some state/context
not now
Future possibilities
Accent system could possibly be reused for speech jumbling system:
turning speech into junk for non-speakers. One (bad) example might be
robot communications visible as ones and zeros for humans.
Prior art
Other games
SS13
As far as I know, byond stations usually use json files with rules.
This works but has limitations.
Unitystation
Unitystation uses some proprietary Unity yaml asset format which they
use to define lists of replacements - words and patterns. After all
replacements custom code optionally runs.
Accent code: https://github.com/unitystation/unitystation/blob/be67b387b503f57c540b3311028ca4bf965dbfb0/UnityProject/Assets/Scripts/ScriptableObjects/SpeechModifier.cs
Folder with accents (see .asset files): https://github.com/unitystation/unitystation/tree/develop/UnityProject/Assets/ScriptableObjects/Speech
This is the same system as byond and it has limitations.
SS14
Space Station 14 does not have any format. They define all accents with
pure C#.
Spanish accent: https://github.com/space-wizards/space-station-14/blob/effcc5d8277cd28f9739359e50fc268ada8f4ea6/Content.Server/Speech/EntitySystems/SpanishAccentSystem.cs#L5
This is the simplest to implement but results in repetitive code and is
harder to read. This code is also hard to keep uniform across different
accents.
There is a helper method that handles word replacements with localization and case mimicking: https://github.com/space-wizards/space-station-14/blob/a0d159bac69169434a38500b386476c7affccf3d/Content.Server/Speech/EntitySystems/ReplacementAccentSystem.cs
Similar behaviour might be possible with a custom Tag implementation that looks up a localized string at creation time and seeds an internal Literal with it.