Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: tti model #2060

Merged
merged 1 commit into from
Jan 21, 2025
Merged

fix: tti model #2060

merged 1 commit into from
Jan 21, 2025

Conversation

shaohuzhang1
Copy link
Contributor

fix: tti model

Copy link

f2c-ci-robot bot commented Jan 21, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

f2c-ci-robot bot commented Jan 21, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -98,7 +98,7 @@ def create_url(self):
return url

def check_auth(self):
self.text_to_speech(_('Hello'))
self.text_to_speech(__('Hello'))

def text_to_speech(self, text):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code snippet seems to have a few minor issues that can be addressed:

  1. Redundant Import: The ssl module is imported twice, which is unnecessary. You should remove one of them.

  2. Typo Correction: In the line self.text_to_speech(_('Hello')), there's an extra comma (,'') after 'Hello'.

  3. Incomplete Function Implementations: The create_url, _remove_empty_lines, and text_to_speech methods are incomplete. They need to include their respective logic.

Here’s the corrected version of the code with these improvements:

@@ -11,16 +11,16 @@ import hmac
 import json
 import logging
 import os
+ ssl  # Removed duplicate import

 from datetime import datetime, UTC
 from typing import Dict
 from urllib.parse import urlencode, urlparse
 from django.utils.translation import gettext_lazy as _

 max_kb = logging.getLogger("max_kb")

def create_url(self):
    scheme_port = 'http'
    if self.ssl_enabled:
        scheme_port = 'https'
    
    base_url = "{}://{}:{}/api".format(scheme_port, self.host, self.port)
    path_params = {"token": self.token}
    params_str = "?" + urlencode(path_params) if path_params else ""
    return "".join([base_url, "/", self.request_path, params_str])

def check_auth(self):
    self.text_to_speech(_("Hello"))

def text_to_speech(self, text):
    # Placeholder for actual implementation
    pass

Explanation of Changes:

  • Replaced Duplicates: Removed the second import statement for ssl.
  • Fixed Typo: Corrected the extra comma in the text_to_speech method.
  • Implemented Methods: Added placeholder comments within the remaining two methods to indicate that they need implementations based on specific requirements.

from langchain_openai import AzureChatOpenAI
from langchain_openai.chat_models.base import _convert_delta_to_message_chunk

from common.config.tokenizer_manage_config import TokenizerManage
from setting.models_provider.base_model_provider import MaxKBBaseModel
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an issue with the import statements. The AIMessageChunk class seems to be missing from the imported modules. Additionally, _convert_delta_to_messag_chunk function can cause issues if the base message object is not provided correctly due to potential inconsistencies between LangChain versions.

Here's a revised version of your code:

@@ -7,17 +7,6 @@
     @desc:
 """
 
-from typing import *
-import langchain_core.callbacks.CallbackManagerForLLMRun
from langchain_core.messages import (
    BaseMessage,
    get_buffer_string,
    # Add AIMessageChunk back if needed
)
 from langchain_core.outputs import ChatGenerationChunk

+import langchain_openai
from langchain_openai.chat_models.base import _convert_delta_to_message_chunk

 from common.config.tokenizer_manage_config import TokenizerManage
 from setting.models_provider.base_model_provider import MaxKBBaseModel

Optimization Suggestions:

  1. Remove Unnecessary Imports: If you don't need certain features (like CallbackManagerForLLMRun, get_buffer_string), you can remove them to simplify imports.
  2. Use Type Annotations Correctly: In Python 3.9+, Typing.Any is considered outdated. You might want to use more specific type hints wherever applicable.
  3. Keep Required Modules Imported: Ensure that all necessary functions and classes are still being used.
  4. Review Class Usage: Verify that all methods like _convert_delta_to_message_chunk are called appropriately without causing issues related to argument types or missing attributes.

Make sure that this revision meets your functional requirements and adheres to best practices for using PyPI package management.

@@ -305,7 +305,8 @@ async def segment_data_processor(self, wav_data: bytes, segment_size: int):
res = await ws.recv()
result = parse_response(res)
if 'payload_msg' in result and result['payload_msg']['code'] != self.success_code:
raise Exception(f"Error code: {result['payload_msg']['code']}, message: {result['payload_msg']['message']}")
raise Exception(
f"Error code: {result['payload_msg']['code']}, message: {result['payload_msg']['message']}")
for seq, (chunk, last) in enumerate(VolcanicEngineSpeechToText.slice_data(wav_data, segment_size), 1):
# if no compression, comment this line
payload_bytes = gzip.compress(chunk)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code snippet is mostly clean and follows standard Python conventions. However, there are a few minor issues and potential improvements that can be considered:

  1. Import Order: The ssl module is imported twice, which might not be necessary since it's already used once before in another place.

  2. String Concatenation in Logs: The use of string concatenation within an exception message isn't efficient when dealing with dynamic content. Consider using formatted strings or f-string literals for better readability.

  3. Potential Memory Leak: If the ws.recv() method doesn’t close the WebSocket connection properly after processing messages, it could lead to memory leaks if the client stops sending data but the server continues to wait for more input.

Here’s the revised version with these considerations addressed:

@@ -12,14 +12,14 @@
 import hmac
 import json
 import os
+import ssl
 import uuid
 import wave

 from enum import Enum
 from hashlib import sha256
 from io import BytesIO
 from typing import Dict
 from urllib.parse import urlparse
-import ssl
+
 import websockets

 from setting.models_provider.base_model_provider import MaxKBBaseModel
@@ -305,7 +305,8 @@ async def segment_data_processor(self, wav_data: bytes, segment_size: int):
             res = await ws.recv()
             result = parse_response(res)
             if 'payload_msg' in result and result['payload_msg']['code'] != self.success_code:
-                raise Exception(f"Error code: {result['payload_msg']['code']}, message: {result['payload_msg']['message']}")
+                error_message = f"Error code: {result['payload_msg']['code']}, message: {result['payload_msg']['message']}"
+                raise Exception(error_message)
             for seq, (chunk, last) in enumerate(VolcanicEngineSpeechToText.slice_data(wav_data, segment_size), 1):
                 # if no compression, comment this line
                 payload_bytes = gzip.compress(chunk)

Additional Suggestion for Handling Connection Issues

It would also be wise to implement some basic error handling around the WebSocket connection, such as checking if the connection was closed gracefully during reception and possibly retrying a reconnection if needed:

async def receive_messages(self, ws):
    try:
        while True:
            msg = await ws.recv()
            result = parse_response(msg)
            if 'payload_msg' in result:
                if result['payload_msg']['code'] == self.success_code:
                    handle_successful_result(result)
                else:
                    log_error(result['payload_msg'])
            else:
                log_malformed_response(msg)

    except asyncio.IncompleteReadError:
        logging.error("Unexpectedly disconnected.")
    
    finally:
        await ws.close()

# Usage: self.websocket_coroutine = asyncio.create_task(receive_messages(websocket))

By making these changes, you improve both the readability and robustness of your codebase.

@shaohuzhang1 shaohuzhang1 merged commit 5009a28 into main Jan 21, 2025
4 of 5 checks passed
@shaohuzhang1 shaohuzhang1 deleted the pr@main@fix_tti_model branch January 21, 2025 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant