From 8cf83b00dba626b5225d2b99855323b7776f2f78 Mon Sep 17 00:00:00 2001 From: Dirk Schnelle-Walka Date: Mon, 2 Dec 2024 17:49:52 +0100 Subject: [PATCH] issue #51 reworked high level interfaces --- .../paInterfaces/paInterfaces.htm | 618 ++++++++---------- 1 file changed, 278 insertions(+), 340 deletions(-) diff --git a/voice interaction drafts/paInterfaces/paInterfaces.htm b/voice interaction drafts/paInterfaces/paInterfaces.htm index 2f4d4b7..c7bd421 100644 --- a/voice interaction drafts/paInterfaces/paInterfaces.htm +++ b/voice interaction drafts/paInterfaces/paInterfaces.htm @@ -23,7 +23,7 @@

Intelligent
Latest version
- Last modified: November 29, 2024 https://github.com/w3c/voiceinteraction/blob/master/voice%20interaction%20drafts/paInterfaces/paInterfaces.htm (GitHub repository)
HTML @@ -167,7 +167,7 @@

- Interfaces are described with the help of UML diagrams. We expect the reader to be familiar with that notation, although most concepts are easy to understand and do not require in-depth @@ -373,7 +373,7 @@

processing.

- 4. High Level Interfaces + 4. High Level Data Structures and Interfaces

@@ -408,15 +408,14 @@

This sequence supports the major use cases stated above.

-

- 4.1 Interface Client Input +

+ 4.1 IPARequest

- This interface describes the data that is sent from the IPA Client to the IPA Service. The following table - details the data that should be considered for this interface in - the method processInput + IPA Service and reused inside the + IPA. The following table details the corresponding data elements.

@@ -462,13 +461,18 @@

The session id can be created by the IPA Service. In case a session id is - provided, it must be used for subsequent calls. + IPA Service. It may not be known to + the client for the first request. In this case, this field is + simply left empty. The IPA Service may + maintain a session id, e.g., to serve multiple clients + and allow them to be distinguished. In these cases a session id + is provided that must be used for subsequent calls of the IPA Client.

The IPA Client maintains request - id for each request that is being sent via this interface. + id for each request that is being sent. These ids must be unique within a session.
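The id handling described above can be sketched on the client side as follows. This is a minimal illustration, not part of the specification: the class `ClientSession` and its methods are hypothetical names, and the use of a simple counter for request ids is an assumption (any scheme that keeps ids unique within a session would do).

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClientSession:
    """Hypothetical client-side bookkeeping for session and request ids."""

    # Not known before the first response of the IPA Service.
    session_id: Optional[str] = None
    # Assumption: a running counter is enough to keep request ids unique
    # within one session.
    _counter: itertools.count = field(default_factory=lambda: itertools.count(1))

    def next_request_ids(self) -> dict:
        """Return the id fields of the next request; sessionId stays empty until known."""
        return {
            "sessionId": self.session_id or "",
            "requestId": str(next(self._counter)),
        }

    def accept_response(self, response: dict) -> None:
        """Adopt a session id provided by the service for all subsequent calls."""
        provided = response.get("sessionId")
        if provided:
            self.session_id = provided
```

A client would call `next_request_ids()` before each request and pass every response to `accept_response()`, so that a session id provided by the IPA Service is reused from the second request on.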

@@ -517,18 +521,15 @@

timestamp and location.

-

- The IPA Service may maintain a session - id, e.g., to serve multiple clients and allow them to be - distinguished. -

+

+ 4.2 IPAResponse +

- As a return value this interface describes the data that is sent - from the IPA Service to the IPA Service to the IPA Client. The following table - details the data that should be considered for this interface in - the ClientResponse. + details the corresponding data elements.

@@ -572,147 +573,16 @@

the Interface Service Call.

-

The following sections will provide examples using the JSON - format to illustrate the interfaces. JSON is only chosen as it - is easy to understand and read. This specification does not make - any assumptions about the underlying programming languages or - data format. They are just meant to be an illustration of how - responses may be generated with the provided data. It is not - required that implementations follow exactly the described - behavior.

- -

- 4.1.2 Example Weather Information for - Interface Client Input -

- -

- The following request to processInput sends endpointed - audio data with the user's current location to query for - tomorrow's weather with the utterance What will the - weather be like tomorrow".

-
-{
-	"sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
-	"requestId": "42",
-	"audio": {
-		"type": "Endpointed",
-		"data": "ZmhhcGh2cGF3aGZwYWhuZ...zI0MDc4NDY1NiB5dGhvaGF3",
-		"encoding": "PCM-16BIT"
-	}
-	"multimodal": {
-		"location": {
-			"latitude": 52.51846213843821,
-			"longitude": 13.37872252544883338.897957
-		}
-		...
-	},
-	"meta": {
-		"timestamp": "2022-12-01T18:45:00.000Z"
-		...
-	}
-}
- -

In this example endpointed audio data is transfered as a - value. There are other ways to send the audio data to the IPA, - e.g., as a reference. This way is chosen as it is easier to - illustrate the usage.

- -

- In return the the IPA may send back the following response Tomorrow - there will be snow showers in Berlin with temperatures - between 0 and -1 degrees via ClientResponse to the - Client.

-
-{
-	"sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
-	"requestId": "42",
-	"audio": {
-		"type": "Endpointed",
-		"data": "Uvrs4hcGh2cGF3aGZwYWhuZ...vI0MDc4DGY1NiB5dGhvaRD2",
-		"encoding": "PCM-16BIT"
-	}
-	"multimodal": {
-		"text": "Tomorrow there will be snow showers in Berlin with temperatures between 0 and -1 degrees."
-		...
-	},
-	"meta": {
-		...
-	}
-}
- -

- 4.1.3 Example Flight Reservation for - Interface Client Input -

- -

- The following request to processInput sends endpointed - audio data with the user's current location to book a flight - with the utterance I want to fly to San Francisco.

-
-{
-	"sessionId": "0c27895c-644d-11ed-81ce-0242ac120002",
-	"requestId": "15",
-	"audio": {
-		"type": "Endpointed",
-		"data": "ZmhhcGh2cGF3aGZwYWhuZ...zI0MDc4NDY1NiB5dGhvaGF3",
-		"encoding": "PCM-16BIT"
-	}
-	"multimodal": {
-		"location": {
-			"latitude": 52.51846213843821,
-			"longitude": 13.37872252544883338.897957
-		}
-		...
-	},
-	"meta": {
-		"timestamp": "2022-11-14T19:50:00.000Z"
-		...
-	}
-}
-

- In return the the IPA may send back the following response When - do you want to fly from Berlin to San Francisco? via ClientResponse - to the Client

-
-{
-	"sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
-	"requestId": "42",
-	"audio": {
-		"type": "Endpointed",
-		"data": "Uvrs4hcGh2cGF3aGZwYWhuZ...vI0MDc4DGY1NiB5dGhvaRD2",
-		"encoding": "PCM-16BIT"
-	}
-	"multimodal": {
-		"text": "When do you want to fly from Berlin to San Francisco?"
-		...
-	},
-	"meta": {
-		...
-	}
-}
-

- 4.2 External Client Input +

+ 4.3 ExternalIPAResponse

- This interface describes the data that is sent from t the Provider Selection - Service. The input is a copy of the data that is sent from - the IPA Client to the IPA Service in Interface Client Input. This - interface mainly differs in the return value. The following - table details the data that should be considered for this - interface in the method processInput. -

- -

- As a return value this interface describes the data that is sent - from the Provider - Selection Service and the NLU and and the NLU and Dialog Management. The following table details the data that should be considered for this interface in the method ExternalClientResponse. @@ -864,137 +734,9 @@

-

- 4.2.1 Example Weather Information for - Interface External Client Input -

- -

- The following request to processInput is a copy of Example Weather - Information for Interface Client Input. -

- -

- In return the the external IPA may send back the following - response via ExternalClientResponse to the Dialog. -

-
-{
-    "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
-    "requestId": "42",
-    "callResult": "success",
-    "interpretation": [
-        {
-            "intent": "check-weather",
-            "intentConfidence": 0.9,
-            "entities": [
-                {
-                    "location": "Berlin",
-                    "entityConfidence": 1.0
-                },
-                {
-                    "date": "2022-12-02",
-                    "entityConfidence": 0.94
-                },
-            ]
-        },
-        ...	
-    ]
-}
- -

- The external speech recognizer converts the obtained audio into - text like How will be the weather tomorrow. The NLU - then extracts the following from that decoded utterance, other - multimodal input and metadata. -

-
    -
  • intent: check-weather from, e.g., utterance part How - will the weather…
  • -
  • entity: date from utterance part …tomorrow…
  • -
  • entity: location, e.g., from the multimodal input of - location
  • -
-

This is illustrated in the following figure.

- Processing Input of the check weather example - -

- 4.2.2 Example Flight Reservation for - Interface External Client Input -

- -

- The following request to processInput is a copy of Example Flight - Reservation for Interface Client Input. -

- -

- In return the the IPA may send back the following response When - do you want to fly from Berlin to San Francisco? via ClientResponse - to the Client. In this case, empty entities, like date - indicate that there are still slots to be filled and no service - call can be made right now.

-
-{
-    "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
-    "requestId": "42",
-    "callResult": "success",
-    "interpretation": [
-        {
-            "intent": "book-flight",
-            "intentConfidence": 0.87,
-            "entities": [
-                {
-                    "origin": "Berlin",
-                    "entityConfidence": 1.0
-                },
-                {
-                    "destination": "San Francisco",   
-                    "entityConfidence": 0.827
-                },
-                {
-                    "date": "",   
-                },
-                ...
-            ]
-        },
-        ...	
-    ]
-}
- -

- The external speech recognizer converts the obtained audio into - text like I want to fly to San Francisco. The NLU then - extracts the following from that decoded utterance, other - multimodal input and metadata.

-
    -
  • intent: book-fligh from, e.g., utterance part I - want to fly…
  • -
  • entity: location from utterance part …San - Francisco…
  • -
  • entity: location, e.g., from the multimodal input of - location
  • -
-

- This is illustrated in the following figure. Processing Input of the flight reservation example -

-

- Further steps will be needed to convert both location entities - to origin and destination in the actual reply. - This may be either done by the flight reservation IPA directly - or by calling external services beforehand to determine the - nearest airports from these locations. -

- 4.3 External Service Call + 4.4 External Service Call

This interface describes the data that is sent from the

This call is optional depending on the result of the next dialog step if an external service should be called or not.

-

- 4.3.1 Example Weather Information for - Interface Service Call + +

+ 4.5 Error Handling +

+

Errors may occur anywhere in the processing chain of the IPA. + The following gives an overview of how they are suggested to be + handled.

+ +

Along the processing path errors may occur

+
+   1. in the response of a call to another component
+   2. inside this component to be further processed by + subsequent components
+ +

Error messages carry the following information

+ name | type | description | required
+ error code | data item | unique error code that could be transformed into an IPA response matching the language and conversation | yes
+ error message | data item | human-readable error message for logging and debugging | yes
+ component id | data item | id of the component that has produced or handled the error | yes
+ +
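As a rough sketch of how the three required data items might be carried between components, consider the following. The names `ErrorInfo`, `REPLIES`, `DEFAULT_REPLY` and `user_reply`, as well as the error code `service-unavailable`, are hypothetical and only serve as illustration; the specification does not define concrete error codes or a lookup mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorInfo:
    error_code: str      # unique code, mapped to a user-facing reply elsewhere
    error_message: str   # human-readable, for logging and debugging only
    component_id: str    # component that produced or handled the error

    def __post_init__(self) -> None:
        # All three items are marked as required in the table above.
        for name in ("error_code", "error_message", "component_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")


# Hypothetical lookup from (error code, language) to a reply for the user.
REPLIES = {
    ("service-unavailable", "en"): "Sorry, I cannot reach that service right now.",
    ("service-unavailable", "de"): "Der Dienst ist gerade leider nicht erreichbar.",
}
DEFAULT_REPLY = "Sorry, something went wrong."


def user_reply(error: ErrorInfo, language: str) -> str:
    """Turn an error code into a reply that matches the conversation language."""
    return REPLIES.get((error.error_code, language), DEFAULT_REPLY)
```

Keeping the error message out of the user-facing reply reflects the split in the table: the code drives what the user hears, while the message is meant for logs.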

+ 4.6 Examples +

+ +

The following sections will provide examples using the JSON + format to illustrate the usage of the above-mentioned data + structures and interfaces. JSON is only chosen as it is easy to + understand and read. This specification does not make any + assumptions about the underlying programming languages or data + format. The examples are just meant to be an illustration of how + responses may be generated with the provided data. It is not + required that implementations follow exactly the described + behavior. It is also not required that JSON is used at all.

+ +

+ 4.6.1 Example Weather Information for + the High Level Interfaces and Data Structures

+

+ The following IPARequest sends endpointed + audio data with the user's current location to query for + tomorrow's weather with the utterance "What will the + weather be like tomorrow?". +

+
+{
+    "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
+    "requestId": "42",
+    "audio": {
+        "type": "Endpointed",
+        "data": "ZmhhcGh2cGF3aGZwYWhuZ...zI0MDc4NDY1NiB5dGhvaGF3",
+        "encoding": "PCM-16BIT"
+    },
+    "multimodal": {
+        "location": {
+            "latitude": 52.51846213843821,
+            "longitude": 13.378722525448833
+        }
+        ...
+    },
+    "meta": {
+        "timestamp": "2022-12-01T18:45:00.000Z"
+        ...
+    }
+}
+ +

In this example endpointed audio data is transferred as a + value. There are other ways to send the audio data to the IPA, + e.g., as a reference. This way is chosen as it is easier to + illustrate the usage.
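Sending the audio as a value can be sketched as follows. The function `build_ipa_request` is a hypothetical helper, and encoding the raw audio bytes as base64 text is an assumption made for this illustration; the specification does not prescribe how the `data` field is encoded. The field names are taken from the JSON example above.

```python
import base64
import json


def build_ipa_request(session_id: str, request_id: str, pcm_bytes: bytes,
                      latitude: float, longitude: float, timestamp: str) -> dict:
    """Assemble an IPARequest that carries endpointed audio by value."""
    return {
        "sessionId": session_id,
        "requestId": request_id,
        "audio": {
            "type": "Endpointed",
            # Assumption: binary audio is transported as base64 text.
            "data": base64.b64encode(pcm_bytes).decode("ascii"),
            "encoding": "PCM-16BIT",
        },
        "multimodal": {
            "location": {"latitude": latitude, "longitude": longitude},
        },
        "meta": {"timestamp": timestamp},
    }


request = build_ipa_request(
    "0d770c02-2a13-11ed-a261-0242ac120002", "42",
    b"\x00\x01",  # stand-in for real 16-bit PCM samples
    52.51846213843821, 13.378722525448833,
    "2022-12-01T18:45:00.000Z",
)
payload = json.dumps(request)  # serialized form that would be sent to the IPA Service
```

Sending a reference instead would replace the `data` value with a pointer to the audio, at the cost of a second lookup on the service side.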

+ +

+ In return an external IPA may send back the following ExternalIPAResponse + to the Dialog. +

+
+{
+    "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
+    "requestId": "42",
+    "callResult": "success",
+    "interpretation": [
+        {
+            "intent": "check-weather",
+            "intentConfidence": 0.9,
+            "entities": [
+                {
+                    "location": "Berlin",
+                    "entityConfidence": 1.0
+                },
+                {
+                    "date": "2022-12-02",
+                    "entityConfidence": 0.94
+                }
+            ]
+        },
+        ... 
+    ]
+}
+ +

+ The external speech recognizer converts the obtained audio into + text like How will the weather be tomorrow. The NLU + then extracts the following from that decoded utterance, other + multimodal input and metadata. +

+
+   • intent: check-weather from, e.g., utterance part How will the weather…
+   • entity: date from utterance part …tomorrow…
+   • entity: location, e.g., from the multimodal input of location
+

This is illustrated in the following figure.

+ Processing Input of the check weather example + +
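A consumer of the ExternalIPAResponse shown above might pick one interpretation and read its entities as sketched below. Choosing the interpretation with the highest `intentConfidence` is an assumption of this sketch, not a rule of the specification, and the helper names `best_interpretation` and `entity_values` are hypothetical. The sample data repeats the weather example.

```python
from typing import Optional

WEATHER_RESPONSE = {
    "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
    "requestId": "42",
    "callResult": "success",
    "interpretation": [
        {
            "intent": "check-weather",
            "intentConfidence": 0.9,
            "entities": [
                {"location": "Berlin", "entityConfidence": 1.0},
                {"date": "2022-12-02", "entityConfidence": 0.94},
            ],
        },
    ],
}


def best_interpretation(response: dict) -> Optional[dict]:
    """Return the interpretation with the highest intentConfidence, if any."""
    if response.get("callResult") != "success":
        return None
    candidates = response.get("interpretation", [])
    return max(candidates, key=lambda i: i.get("intentConfidence", 0.0), default=None)


def entity_values(interpretation: dict) -> dict:
    """Flatten the entities into {name: value}, dropping the confidence fields."""
    values = {}
    for entity in interpretation.get("entities", []):
        for key, value in entity.items():
            if key != "entityConfidence":
                values[key] = value
    return values
```

With the sample data, `entity_values(best_interpretation(WEATHER_RESPONSE))` yields the location and date that a subsequent service call would need.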

The following request to callService may be made to call - the weather information service. Although calling the weather - service is not a direct functionality of the IPA, it may help to - understand how the entered data may be processed to obtain a - spoken reply to the user's input. + the weather information service to actually obtain the requested + information. Although calling the weather service is not a + direct functionality of the IPA, it may help to understand how + the entered data may be processed to obtain a spoken reply to + the user's input.

@@ -1119,13 +1002,13 @@ 

} ] }, - ... + ... ] }

In return the the external service may send back the following - response via ExternalClientResponse to the Dialog + response ExternalIPAResponse to the Dialog

@@ -1148,57 +1031,112 @@ 

} ] }, - ... + ... ] }

+

- This information is the used to actually create a reply to the - user as described in ExternalClientResponse - to the Client. -

+ In return the IPA may send back the following response Tomorrow + there will be snow showers in Berlin with temperatures + between 0 and -1 degrees via an IPAResponse to the + Client.

+
+{
+    "sessionId": "0d770c02-2a13-11ed-a261-0242ac120002",
+    "requestId": "42",
+    "audio": {
+        "type": "Endpointed",
+        "data": "Uvrs4hcGh2cGF3aGZwYWhuZ...vI0MDc4DGY1NiB5dGhvaRD2",
+        "encoding": "PCM-16BIT"
+    },
+    "multimodal": {
+        "text": "Tomorrow there will be snow showers in Berlin with temperatures between 0 and -1 degrees."
+        ...
+    },
+    "meta": {
+        ...
+    }
+}
-

4.4.Error Handling

-

Errors may occur anywhere in the processing chain of the IPA. - The following gives an overview of how they are suggested to be - handled.

+

+ 4.6.2 Example Flight Reservation for + the High Level Interfaces and Data Structures +

-

Along the processing path errors may occur

-
    -
  1. in the response of a call to another component
  2. -
  3. inside this component to be further processed by - subsequent components
  4. -
+

+ The following IPARequest sends endpointed + audio data with the user's current location to book a flight + with the utterance I want to fly to San Francisco.

+
+{
+    "sessionId": "0c27895c-644d-11ed-81ce-0242ac120002",
+    "requestId": "15",
+    "audio": {
+        "type": "Endpointed",
+        "data": "ZmhhcGh2cGF3aGZwYWhuZ...zI0MDc4NDY1NiB5dGhvaGF3",
+        "encoding": "PCM-16BIT"
+    },
+    "multimodal": {
+        "location": {
+            "latitude": 52.51846213843821,
+            "longitude": 13.378722525448833
+        }
+        ...
+    },
+    "meta": {
+        "timestamp": "2022-11-14T19:50:00.000Z"
+        ...
+    }
+}
-

Error messages carry the following information

- name | type | description | required
- error code | data item | unique error code that could be transformed into a IPA response matching the language and conversation | yes
- error message | data item | human-readable error message for logging and debugging | yes
- component id | data item | id of the component that has produced or handled the error | yes
+

+ The external speech recognizer converts the obtained audio into + text like I want to fly to San Francisco. The NLU then + extracts the following from that decoded utterance, other + multimodal input and metadata.

+
+   • intent: book-flight from, e.g., utterance part I want to fly…
+   • entity: location from utterance part …San Francisco…
+   • entity: location, e.g., from the multimodal input of location
+

+ This is illustrated in the following figure. Processing Input of the flight reservation example +

+

+ Further steps will be needed to convert both location entities + to origin and destination in the actual reply. + This may be either done by the flight reservation IPA directly + or by calling external services beforehand to determine the + nearest airports from these locations. +

+ +

+ In return the IPA may send back the following response When + do you want to fly from Berlin to San Francisco? via an IPAResponse + to the Client.

+
+{
+    "sessionId": "0c27895c-644d-11ed-81ce-0242ac120002",
+    "requestId": "15",
+    "audio": {
+        "type": "Endpointed",
+        "data": "Uvrs4hcGh2cGF3aGZwYWhuZ...vI0MDc4DGY1NiB5dGhvaRD2",
+        "encoding": "PCM-16BIT"
+    },
+    "multimodal": {
+        "text": "When do you want to fly from Berlin to San Francisco?"
+        ...
+    },
+    "meta": {
+        ...
+    }
+}

5. Low Level Interfaces