From 8cf83b00dba626b5225d2b99855323b7776f2f78 Mon Sep 17 00:00:00 2001
From: Dirk Schnelle-Walka
@@ -167,7 +167,7 @@
-          Interfaces are described with the help of UML diagrams. We expect the reader to be familiar with that notation, although most concepts are easy to understand and do not require in-depth
@@ -373,7 +373,7 @@
@@ -408,15 +408,14 @@
This sequence supports the major use cases stated above.
-
-          This interface describes the data that is sent from the IPA Client to the IPA Service. The following table
-          details the data that should be considered for this interface in
-          the method processInput
+          href="#ipaservice">IPA Service and reused inside the
+          IPA. The following table details the corresponding data elements.
-          The following request to processInput is a copy of Example Weather
-          Information for Interface Client Input.
-
-
-
-          In return the the external IPA may send back the following
-          response via ExternalClientResponse to the Dialog.
-
-{
-  "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
-  "requestId": "42",
-  "callResult": "success",
-  "interpretation": [
-    {
-      "intent": "check-weather",
-      "intentConfidence": 0.9,
-      "entities": [
-        {
-          "location": "Berlin",
-          "entityConfidence": 1.0
-        },
-        {
-          "date": "2022-12-02",
-          "entityConfidence": 0.94
-        },
-      ]
-    },
-    ...
-  ]
-}
-
-          The external speech recognizer converts the obtained audio into
-          text like How will be the weather tomorrow. The NLU
-          then extracts the following from that decoded utterance, other
-          multimodal input and metadata.
-
-This is illustrated in the following figure.
-
-
-
-          The following request to processInput is a copy of Example Flight
-          Reservation for Interface Client Input.
-
-
-
-          In return the the IPA may send back the following response When
-          do you want to fly from Berlin to San Francisco? via ClientResponse
-          to the Client. In this case, empty entities, like date
-          indicate that there are still slots to be filled and no service
-          call can be made right now.
-{
-  "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
-  "requestId": "42",
-  "callResult": "success",
-  "interpretation": [
-    {
-      "intent": "book-flight",
-      "intentConfidence": 0.87,
-      "entities": [
-        {
-          "origin": "Berlin",
-          "entityConfidence": 1.0
-        },
-        {
-          "destination": "San Francisco",
-          "entityConfidence": 0.827
-        },
-        {
-          "date": "",
-        },
-        ...
-      ]
-    },
-    ...
-  ]
-}
-
-          The external speech recognizer converts the obtained audio into
-          text like I want to fly to San Francisco. The NLU then
-          extracts the following from that decoded utterance, other
-          multimodal input and metadata.
-
-          This is illustrated in the following figure.
-
-
-          Further steps will be needed to convert both location entities
-          to origin and destination in the actual reply.
-          This may be either done by the flight reservation IPA directly
-          or by calling external services beforehand to determine the
-          nearest airports from these locations.
-
+          name | type | description | required
+          --- | --- | --- | ---
+          error code | data item | unique error code that could be transformed into an IPA response matching the language and conversation | yes
+          error message | data item | human-readable error message for logging and debugging | yes
+          component id | data item | id of the component that has produced or handled the error | yes
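As a sketch only (the field names below are assumptions chosen for illustration, not mandated by the table), an error message carrying these three required items could be modeled and serialized like this:

```python
import json
from dataclasses import dataclass, asdict

# Illustrative model of the error data above; the names errorCode,
# errorMessage and componentId are assumptions, not normative.
@dataclass
class IPAError:
    errorCode: str      # unique code, mappable to a localized IPA response
    errorMessage: str   # human-readable text for logging and debugging
    componentId: str    # id of the component that produced or handled the error

    def to_json(self) -> str:
        return json.dumps(asdict(self))

err = IPAError("ASR-0042", "audio decoding failed", "speech-recognizer")
```

The error code stays machine-readable so the Dialog can map it to a response matching the language of the conversation, while the message is only for logs.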
          The following sections provide examples in JSON format to
          illustrate the usage of the above-mentioned data structures
          and interfaces. JSON is chosen only because it is easy to
          understand and read. This specification does not make any
          assumptions about the underlying programming languages or
          data formats. The examples are just meant to illustrate how
          responses may be generated with the provided data. It is not
          required that implementations follow exactly the described
          behavior, nor that JSON is used at all.
+
+
+          The following IPARequest sends endpointed
+          audio data with the user's current location to query for
+          tomorrow's weather with the utterance What will the
+          weather be like tomorrow.
+
+{
+  "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
+  "requestId": "42",
+  "audio": {
+    "type": "Endpointed",
+    "data": "ZmhhcGh2cGF3aGZwYWhuZ...zI0MDc4NDY1NiB5dGhvaGF3",
+    "encoding": "PCM-16BIT"
+  },
+  "multimodal": {
+    "location": {
+      "latitude": 52.51846213843821,
+      "longitude": 13.378722525448833
+    },
+    ...
+  },
+  "meta": {
+    "timestamp": "2022-12-01T18:45:00.000Z",
+    ...
+  }
+}
+
          In this example, endpointed audio data is transferred as a
          value. There are other ways to send the audio data to the IPA,
          e.g., as a reference. Sending it by value is chosen here as it
          is easier to illustrate the usage.
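Assembling such an IPARequest with audio sent by value can be sketched as follows (Python is used purely for illustration; build_ipa_request and its parameters are assumptions, not part of the specification):

```python
import base64
import json
import uuid

def build_ipa_request(pcm_audio: bytes, latitude: float, longitude: float,
                      timestamp: str) -> str:
    """Assemble an IPARequest as in the example above. The audio is sent
    by value as base64 text; a real client might send a reference
    (e.g., a URL) instead."""
    request = {
        "sessionId": str(uuid.uuid4()),
        "requestId": "42",
        "audio": {
            "type": "Endpointed",
            "data": base64.b64encode(pcm_audio).decode("ascii"),
            "encoding": "PCM-16BIT",
        },
        "multimodal": {
            "location": {"latitude": latitude, "longitude": longitude},
        },
        "meta": {"timestamp": timestamp},
    }
    return json.dumps(request)

req = build_ipa_request(b"\x00\x01\x02\x03", 52.51846213843821,
                        13.378722525448833, "2022-12-01T18:45:00.000Z")
```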
+
+
+          In return an external IPA may send back the following ExternalIPAResponse
+          to the Dialog.
+
+{
+  "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
+  "requestId": "42",
+  "callResult": "success",
+  "interpretation": [
+    {
+      "intent": "check-weather",
+      "intentConfidence": 0.9,
+      "entities": [
+        {
+          "location": "Berlin",
+          "entityConfidence": 1.0
+        },
+        {
+          "date": "2022-12-02",
+          "entityConfidence": 0.94
+        }
+      ]
+    },
+    ...
+  ]
+}
+
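A Dialog consuming such an ExternalIPAResponse might select the interpretation the NLU is most confident about roughly as follows (a sketch only; the field names follow the example above):

```python
import json

# Parse the (abbreviated) ExternalIPAResponse from the example above.
response = json.loads("""{
  "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
  "requestId": "42",
  "callResult": "success",
  "interpretation": [
    {
      "intent": "check-weather",
      "intentConfidence": 0.9,
      "entities": [
        {"location": "Berlin", "entityConfidence": 1.0},
        {"date": "2022-12-02", "entityConfidence": 0.94}
      ]
    }
  ]
}""")

# Pick the interpretation with the highest intent confidence.
best = max(response["interpretation"], key=lambda i: i["intentConfidence"])
```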
+
+          The external speech recognizer converts the obtained audio into
+          text like What will the weather be like tomorrow. The NLU
+          then extracts the following from that decoded utterance, other
+          multimodal input and metadata.
+
+This is illustrated in the following figure.
+
          The following request to callService may be made to call
-         the weather information service. Although calling the weather
-         service is not a direct functionality of the IPA, it may help to
-         understand how the entered data may be processed to obtain a
-         spoken reply to the user's input.
+         the weather information service to actually obtain the requested
+         information. Although calling the weather service is not a
+         direct functionality of the IPA, it may help to understand how
+         the entered data may be processed to obtain a spoken reply to
+         the user's input.
@@ -1119,13 +1002,13 @@
      }
    ]
  },
- ...
+ ...
  ]
}
          In return the external service may send back the following
-         response via ExternalClientResponse to the Dialog
+         response ExternalIPAResponse to the Dialog
@@ -1148,57 +1031,112 @@
+
      }
    ]
  },
- ...
+ ...
  ]
}
-          This information is the used to actually create a reply to the
-          user as described in ExternalClientResponse
-          to the Client.
-
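How the extracted entities might feed such a weather service call, and how the result may be turned into a reply, can be sketched as follows (fetch_weather is a hypothetical stand-in; the specification does not define how external services are invoked):

```python
# Hypothetical stand-in for the external weather service call; a real
# implementation would issue, e.g., an HTTP request here.
def fetch_weather(location: str, date: str) -> dict:
    return {"location": location, "date": date,
            "forecast": "snow showers", "low": -1, "high": 0}

# Entities as extracted by the NLU in the example above.
entities = {"location": "Berlin", "date": "2022-12-02"}
weather = fetch_weather(entities["location"], entities["date"])

# Turn the service result into a spoken reply for the user.
reply = (f"Tomorrow there will be {weather['forecast']} in "
         f"{weather['location']} with temperatures between "
         f"{weather['high']} and {weather['low']} degrees")
```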
+
+          In return the IPA may send back the following response Tomorrow
+          there will be snow showers in Berlin with temperatures
+          between 0 and -1 degrees via an IPAResponse to the
+          Client.
+
+{
+  "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
+  "requestId": "42",
+  "audio": {
+    "type": "Endpointed",
+    "data": "Uvrs4hcGh2cGF3aGZwYWhuZ...vI0MDc4DGY1NiB5dGhvaRD2",
+    "encoding": "PCM-16BIT"
+  },
+  "multimodal": {
+    "text": "Tomorrow there will be snow showers in Berlin with temperatures between 0 and -1 degrees.",
+    ...
+  },
+  "meta": {
+    ...
+  }
+}
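On the Client side, handling such an IPAResponse could be sketched like this (handle_ipa_response and its behavior are illustrative; actual audio playback is left as a stub):

```python
import base64
import json

def handle_ipa_response(raw: str) -> str:
    """Extract what the client should present to the user.
    Returns the textual form; audio playback is only stubbed."""
    response = json.loads(raw)
    audio = response.get("audio")
    if audio and audio.get("encoding") == "PCM-16BIT":
        pcm = base64.b64decode(audio["data"])  # hand PCM bytes to an audio sink
    return response["multimodal"]["text"]

raw = json.dumps({
    "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
    "requestId": "42",
    "audio": {"type": "Endpointed",
              "data": base64.b64encode(b"pcm-bytes").decode("ascii"),
              "encoding": "PCM-16BIT"},
    "multimodal": {"text": "Tomorrow there will be snow showers in Berlin."},
    "meta": {}
})
text = handle_ipa_response(raw)
```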
          Errors may occur anywhere in the processing chain of the IPA.
-         The following gives an overview of how they are suggested to be
-         handled.
+Errors may occur along the processing path.
-
+          The following IPARequest sends endpointed
+          audio data with the user's current location to book a flight
+          with the utterance I want to fly to San Francisco.
+{
+  "sessionId": "0c27895c-644d-11ed-81ce-0242ac120002",
+  "requestId": "15",
+  "audio": {
+    "type": "Endpointed",
+    "data": "ZmhhcGh2cGF3aGZwYWhuZ...zI0MDc4NDY1NiB5dGhvaGF3",
+    "encoding": "PCM-16BIT"
+  },
+  "multimodal": {
+    "location": {
+      "latitude": 52.51846213843821,
+      "longitude": 13.378722525448833
+    },
+    ...
+  },
+  "meta": {
+    "timestamp": "2022-11-14T19:50:00.000Z",
+    ...
+  }
+}
+
Error messages carry the following information
-          name | type | description | required
-          --- | --- | --- | ---
-          error code | data item | unique error code that could be transformed into a IPA response matching the language and conversation | yes
-          error message | data item | human-readable error message for logging and debugging | yes
-          component id | data item | id of the component that has produced or handled the error | yes
+          The external speech recognizer converts the obtained audio into
+          text like I want to fly to San Francisco. The NLU then
+          extracts the following from that decoded utterance, other
+          multimodal input and metadata.
+
+          This is illustrated in the following figure.
+
+
+          Further steps will be needed to convert both location entities
+          to origin and destination in the actual reply.
+          This may either be done by the flight reservation IPA directly
+          or by calling external services beforehand to determine the
+          nearest airports to these locations.
+
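Such a conversion of location entities to origin and destination airports might be sketched as follows (the lookup table and nearest_airport are illustrative only; a real IPA would query a geo service):

```python
# Illustrative lookup of the nearest airport for a location entity.
AIRPORTS = {"Berlin": "BER", "San Francisco": "SFO"}

def nearest_airport(location: str) -> str:
    # A real IPA might call an external geo service here instead.
    return AIRPORTS.get(location, "UNKNOWN")

origin = nearest_airport("Berlin")
destination = nearest_airport("San Francisco")
```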
+
+
+          In return the IPA may send back the following response When
+          do you want to fly from Berlin to San Francisco? via IPAResponse
+          to the Client.
+{
+  "sessionId": "0c27895c-644d-11ed-81ce-0242ac120002",
+  "requestId": "15",
+  "audio": {
+    "type": "Endpointed",
+    "data": "Uvrs4hcGh2cGF3aGZwYWhuZ...vI0MDc4DGY1NiB5dGhvaRD2",
+    "encoding": "PCM-16BIT"
+  },
+  "multimodal": {
+    "text": "When do you want to fly from Berlin to San Francisco?",
+    ...
+  },
+  "meta": {
+    ...
+  }
+}
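The follow-up question arises because the date slot is still empty. Detecting such unfilled slots in an interpretation could be sketched like this (field names follow the flight-booking example; unfilled_slots is illustrative):

```python
# Interpretation as in the flight-booking example above; the empty
# date value marks a slot that still has to be filled.
interpretation = {
    "intent": "book-flight",
    "intentConfidence": 0.87,
    "entities": [
        {"origin": "Berlin", "entityConfidence": 1.0},
        {"destination": "San Francisco", "entityConfidence": 0.827},
        {"date": ""},
    ],
}

def unfilled_slots(interpretation: dict) -> list:
    """Collect entity names whose value is still empty."""
    slots = []
    for entity in interpretation["entities"]:
        for name, value in entity.items():
            if name != "entityConfidence" and value == "":
                slots.append(name)
    return slots

missing = unfilled_slots(interpretation)
```

An empty result would mean all slots are filled and the service call can be made; otherwise the Dialog prompts for the missing slots, as in the question above.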