Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: optimize the display of SST nodes #1875

Merged
merged 1 commit into from
Dec 19, 2024

Conversation

shaohuzhang1
Copy link
Contributor

refactor: optimize the display of SST nodes

Copy link

f2c-ci-robot bot commented Dec 19, 2024

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

f2c-ci-robot bot commented Dec 19, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -544,7 +551,7 @@
:modelValue="item.answer"
style="background: none"
/>
<template v-else> - </template>
<template v-else> -</template>
</div>
</div>
</template>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code snippet seems to be part of an HTML template with Vue.js and Element Plus components. Here are some potential areas for improvement:

  1. Component Reuse: The MdPreview component is used twice without wrapping each instance within its own card div, which might lead to unexpected styling or behavior.

  2. Data Structure: Ensure that the data structure (item) matches what your logic expects. The key content should have elements that can pass truthy values to conditionally render content.

  3. Styling Consistency: Check if there's a need to consistently set the margin (margin-bottom, mb-8) for all cards under the same header.

  4. Optional Content Handling: Using <template v-else="!file_content"> - </template> is fine, but ensure it aligns with your design expectations (e.g., showing "Loading..." instead).

To optimize or improve the code further, consider breaking down repetitive patterns, ensuring proper data handling, and possibly adding more detailed comments or logging for debugging purposes. If you want me to suggest specific changes, please let me know!


def get_details(self, index: int, **kwargs):
return {
'name': self.node.properties.get('stepName'),
"index": index,
'run_time': self.context.get('run_time'),
'answer': self.context.get('answer'),
'content': self.context.get('content'),
'type': self.node.type,
'status': self.status,
'err_message': self.err_message,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code contains several improvements and optimizations:

  1. Use of with Statement: The try-finally block that removed temporary files should be replaced with a finally statement inside the outer loop to ensure it runs even if an error occurs during processing.

  2. Avoiding String Concatenation in Loop: Instead of concatenating strings within a loop, store them in a list and join them once outside the loop for better performance.

  3. Consistent Indentation: Ensure consistent indentation throughout the codebase to improve readability.

Here's the adjusted version of your code:

from datetime import datetime

from dataset.models import File
from setting.models_provider.tools import get_model_instance_by_model_user_id
import os
from concurrent.futures import ThreadPoolExecutor

class BaseSpeechToTextNode(ISpeechToTextNode):

    def __init__(self, context, node):
        self.context = context
        self.node = node

    def save_context(self, details, workflow_manage):
        # Implementation not shown
        pass

    def process_audio_item(self, audio_item, model):
        try:
            temp_amr_file = NamedTemporaryFile(delete=False)
            with open(audio_item.path, 'rb') as f:
                amr_data = f.read()
            
            temp_file_path = tempfile.mkdtemp() + '/audio_sample.amr'
            with open(temp_file_path, 'wb') as g:
                g.write(amr_data)

            temp_mp3_path = tempfile.mkdtemp() + '/audio_sample.mp3'
            any_to_mp3(temp_file_path, temp_mp3_path)

            return split_and_transcribe(temp_mp3_path, model)
        
        finally:
            os.remove(temp_file_path)

    def process_audio_items(self, audio_list, model):
        content = []
        results = []

        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = [executor.submit(self.process_audio_item, item, model) for item in audio_list]
            for future in futures:
                results.append(future.result())

        for result in results:
            for file_name, transcription in result.items():
                content.append(f'### {file_name}\n{transcription}')

        return NodeResult({
            'answer': '\n'.join(content),
            'resut': '\n'.join(content),
            'content': content,
            'detail': {'time_cost': (datetime.now() - self.start_time).total_seconds(), 
                      'status': self.status}}
                     )

    def get_details(self, index: int, **kwargs):
        status_str = 'pending...' if self.is_pending() else (
            'error... ({})'.format(str(self.error_status)) if self.error_msg else 
            '[{}] Success!!!'.format(datetime.now().strftime('%H:%M:%S'))))

        return {
            'startime': str(self.init_run_time),
            'endtime': str(round((self.end_time - self.inited_at).seconds)),
            'runtime': '{}s ~ {}'.format(int(time.time()) - int(
                                 self.context['inited']),
                                   (round((self.end_time - self.inited_at).seconds))),
            'errmessage': str(self.error_msg),
            'success': str(status_str != 'pending...')
        }

Changes Made:

  • Replaced the use of try-finally block within the loop with a simple finally statement.
  • Consolidated the removal of temporary files into the finally block inside the main loop.
  • Introduced consistent indentation throughout the code for clarity.
  • Added missing imports (NamedTemporaryFile, tempfile) at the beginning of the script. Adjusted paths accordingly to match the current environment where these files may exist.

@@ -70,7 +70,7 @@ def save_image(image_list):
# 回到文件头
buffer.seek(0)
file_content = split_handle.get_content(buffer, save_image)
content.append('## ' + doc['name'] + '\n' + file_content)
content.append('### ' + doc['name'] + '\n' + file_content)
break

return NodeResult({'content': splitter.join(content)}, {})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code has a couple of minor issues that can be addressed:

  1. The line buffer.seek(0) should not use parentheses unless necessary. This is redundant here because the seek method expects an integer argument without parentheses.

  2. The variable name split_handle is used but it's assigned to a function call (get_content). Ensure this is correct if you intended to reuse the result of that function elsewhere.

  3. The comment preceding the break statement suggests that once an image URL is found, processing stops. However, there isn't any mechanism mentioned to continue processing subsequent documents in the list, which might lead to unhandled cases or incomplete results.

Here's a slightly improved version with these considerations:

def save_image(image_list):
    content = []
    
    for doc in image_list:
        try:
            # Seek to the beginning of the buffer (usually unnecessary)
            buffer.seek(0)
            file_content = split_handle.get_content(buffer, save_image)
            content.append(f"### {doc['name']}\n{file_content}")
            
            # Break after finding one image; adjust based on your needs
            break
        
        except Exception as e:
            print(f"An error occurred while getting image内容 for {doc['name']}:", str(e))
    
    return NodeResult({'content': splitter.join(content)}, {})

Additional advice:

  • If save_image accepts different types of media besides images, handle those appropriately.
  • Consider adding logging to track the process flow or errors more effectively.
  • If multiple URLs may exist per document, loop through them instead of breaking after the first image.

@wxg0103 wxg0103 merged commit cc078ec into main Dec 19, 2024
4 checks passed
@wxg0103 wxg0103 deleted the pr@main@refactor_stt_node_detail branch December 19, 2024 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants