On Asterisk 13.5+ combined with LumenVox ASR, we're noticing that UniMRCP-based speech recognition is failing with the following error:
ERROR Adhearsion::Translator::Asterisk: <Nokogiri::XML::SyntaxError> The value following "version" in the XML declaration must be a quoted string.
The reason is that Asterisk 13.5+ now escapes several characters, including ' " and ?, with a backslash (\) in all VarSet (channel variable set) events. So ALL channel variables, including the $RECOG_RESULT variable that conveys NLSML results from speech recognition, are now subject to a different encoding than before.
On top of that, although Adhearsion enables the UniMRCP uer option (URI-encoded results), the single quote ' is one of the characters that is not typically URI-encoded. The single quotes in a LumenVox response therefore arrive unencoded, and Asterisk 13.5+'s new escaping replaces each ' with \'.
| Character | New 2-character AMI representation in Asterisk >= 13.5 |
| --- | --- |
| \a (0x07) Alert (Beep, Bell) | \a (0x5c 0x61) |
| \b (0x08) Backspace | \b (0x5c 0x62) |
| \f (0x0C) Formfeed Page Break | \f (0x5c 0x66) |
| \n (0x0A) Newline (Line Feed) | \n (0x5c 0x6E) |
| \r (0x0D) Carriage Return | \r (0x5c 0x72) |
| \t (0x09) Horizontal Tab | \t (0x5c 0x74) |
| \v (0x0B) Vertical Tab | \v (0x5c 0x76) |
| \ (0x5C) Backslash | \\ (0x5c 0x5c) |
| ' (0x27) Apostrophe or single quotation mark | \' (0x5c 0x27) |
| " (0x22) Double quotation mark | \" (0x5c 0x22) |
| ? (0x3F) Question mark | \? (0x5c 0x3F) |
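As a rough illustration of what reversing this mapping could look like, here is a minimal Ruby sketch. The constant `AMI_ESCAPES` and the method `unescape_ami_value` are hypothetical names made up for this example; they are not part of RubyAMI or Adhearsion. It assumes the escape set is exactly the characters listed above.

```ruby
# Hypothetical helper, not an existing RubyAMI/Adhearsion API.
# Maps the second character of each 2-character escape back to the original byte.
AMI_ESCAPES = {
  'a'  => "\a", 'b' => "\b", 'f' => "\f", 'n' => "\n",
  'r'  => "\r", 't' => "\t", 'v' => "\v",
  '\\' => '\\', "'" => "'", '"' => '"', '?' => '?'
}.freeze

# Scans left to right in a single pass, so an escaped backslash (\\)
# followed by a plain "n" is not mistaken for an escaped newline.
def unescape_ami_value(value)
  value.gsub(/\\([abfnrtv\\'"?])/) { AMI_ESCAPES.fetch(Regexp.last_match(1)) }
end

puts unescape_ami_value("<?xml version=\\'1.0\\' encoding=\\'ISO-8859-1\\' ?>")
```

Backslashes followed by any other character are left untouched, which limits (but does not eliminate) the risk of altering data that was never escaped in the first place.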
Some Strategies for Resolution
1. We could just always attempt to unescape \, in all versions of Asterisk.
   Cons: This would be a change in behavior, and could potentially corrupt data in Asterisk < 13.5.
2. We could activate auto-unescaping based on RubyAMI::Stream#version being >= 2.8.0, since the issue was introduced as AMI_VERSION moved from 2.7.0 to 2.8.0.
   Pro: 0-configuration, "It just works" solution.
   Cons:
   - A complex, stateful solution.
   - Introduces the concept of separate modes of Asterisk compatibility.
3. We could decide whether or not to unescape based on a config value of some sort being enabled.
   Pro:
   - Straightforward to implement & test.
   - We can decide whether to default the option to ON or OFF.
   Cons:
   - Introduces the concept of separate modes of Asterisk compatibility.
   - NOT 0-configuration. Rather, if you hit this error, you may have to do a web search for it and learn that you need to flip this configuration option ON to resolve it.
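To make the difference between options 2 and 3 concrete, here is a minimal sketch of the decision logic. The method name `unescape_ami_values?` and its `override:` parameter are hypothetical. It also assumes the AMI version is available as a dotted string such as "2.8.0"; whether RubyAMI::Stream#version returns exactly that should be checked against RubyAMI itself.

```ruby
require 'rubygems' # provides Gem::Version (normally already loaded)

# First AMI version that escapes VarSet values, per this issue.
ESCAPING_AMI_VERSION = Gem::Version.new('2.8.0')

# Hypothetical decision helper, not an existing RubyAMI/Adhearsion API.
# ami_version: dotted version string (assumed format), or nil if unknown.
# override:    nil  => auto-detect from the version (option 2)
#              true/false => explicit config value wins (option 3)
def unescape_ami_values?(ami_version, override: nil)
  return override unless override.nil?
  return false if ami_version.nil?
  return false unless Gem::Version.correct?(ami_version)

  Gem::Version.new(ami_version) >= ESCAPING_AMI_VERSION
end

puts unescape_ami_values?('2.7.0')
puts unescape_ami_values?('2.8.0')
puts unescape_ami_values?('2.7.0', override: true)
```

Comparing with Gem::Version rather than as plain strings avoids ordering mistakes such as "2.10.0" sorting before "2.8.0".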
My leaning is towards option 2 (originally 3). But I'm very interested in other points of view on the matter. 👀
My preference is option 2. That's exactly why we detect the AMI version, so we can take care of protocol issues like this and not make the consumer worry about it.
Decoded (Asterisk 13.5+):
<?xml version=\'1.0\' encoding=\'ISO-8859-1\' ?><result><interpretation grammar="builtin:grammar/number" confidence="0.96"><input mode="speech">seven</input><instance>7</instance></interpretation></result>
... ❌ malformed with \'
In contrast, here's how that variable would be received prior to Asterisk 13.5:
Decoded:
<?xml version='1.0' encoding='ISO-8859-1' ?><result><interpretation grammar="builtin:grammar/number" confidence="0.92"><input mode="speech">seven</input><instance>7</instance></interpretation></result>
... ✅ valid NLSML
The back-slashing of the characters in question was introduced with the change in ASTERISK-24934, "[patch] Asterisk manager output does not escape control characters".
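For illustration, the path from the raw variable value to valid NLSML might look roughly like the sketch below. The method `decode_recog_result` is hypothetical, and the `raw` string is a shortened, made-up sample rather than real LumenVox output. It only handles the \' escape, and it assumes the URI-encoded payload can be decoded with Ruby's standard `URI.decode_www_form_component`.

```ruby
require 'uri'

# Hypothetical helper, not an existing RubyAMI/Adhearsion API.
def decode_recog_result(raw)
  # Step 1: undo the AMI backslash escaping (only \' is handled, for brevity).
  unescaped = raw.gsub("\\'", "'")
  # Step 2: URI-decode the result, as it was sent with the uer option enabled.
  URI.decode_www_form_component(unescaped)
end

# Made-up, abbreviated sample: URI-encoded XML with backslash-escaped quotes.
raw = "%3C%3Fxml%20version%3D\\'1.0\\'%20%3F%3E%3Cresult%2F%3E"
puts decode_recog_result(raw)
```

Unescaping before URI-decoding seems the safer order here, since Asterisk applies its escaping to the already URI-encoded string. Decoding first could expose characters (for example a literal backslash sent as %5C) that would then be misread as escape sequences.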
Cc: @gfaza @lpradovera @bklang