I am trying to sync translated audio segments with a video, using the timestamps that a speech-to-text package returns alongside each segment. Even when the stretch ratio is calculated correctly, some stretched segments come out too long, mainly because of a strange long pause at the end of the segment. The attached zip contains one example, with both the original audio and the stretched version. Based on the timestamps, the stretch ratio is around 1.1 and the resulting duration should be about 5-6 seconds. However, after passing the file through the stretch audio function, the output is 8 seconds long, with a 3-second pause at the end. It would be great to know what is causing this and whether there is something I am unaware of. The relevant code and audio files are below. Thank you!
```python
import os

from pydub import AudioSegment
# Assumption: stretch_audio is the audiostretchy helper; adjust the import if it comes from elsewhere.
from audiostretchy.stretch import stretch_audio

# model, output_dir and speed are defined elsewhere in the script.


def generate_segment_audio(segment, speaker_id):
    # Unpack the start and end timestamps (in seconds) and the translated text
    start, end, translated_text = segment
    segment_path = os.path.join(output_dir, f'segment_{start}_{end}.wav')
    stretched_path = os.path.join(output_dir, f'segment_{start}_{end}_stretched.wav')
    duration = end - start  # Target duration in seconds

    # Generate the audio file with the TTS model
    model.tts_to_file(translated_text, speaker_id, segment_path, speed=speed)

    # Adjust the audio speed to match the target duration
    segment_audio = AudioSegment.from_file(segment_path)
    current_duration = len(segment_audio) / 1000  # pydub lengths are in milliseconds
    stretch_ratio = duration / current_duration
    print(f'{stretch_ratio} = {duration} / {current_duration}')
    stretch_audio(segment_path, stretched_path, ratio=stretch_ratio)
    return segment_path
```
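To make the mismatch easier to reproduce, here is a minimal sketch of how the expected and actual durations could be compared using only the standard library. The helper names `wav_duration_seconds` and `check_stretch`, the tolerance value, and the numbers in the usage note are hypothetical and only illustrate the arithmetic described above; they are not measured from the attached files.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_stretch(source_duration, target_duration, stretched_duration, tolerance=0.25):
    """Compare the stretched duration with the target duration.

    Returns (stretch_ratio, excess_seconds, within_tolerance), where
    stretch_ratio is target / source, as in generate_segment_audio.
    """
    stretch_ratio = target_duration / source_duration
    excess_seconds = stretched_duration - target_duration
    return stretch_ratio, excess_seconds, abs(excess_seconds) <= tolerance
```

For instance, with illustrative values of a 5.0 second source and a 5.5 second target, `check_stretch(5.0, 5.5, 8.0)` reports a ratio of 1.1 and an excess of 2.5 seconds, which is roughly the size of the pause I am hearing. Calling `wav_duration_seconds` on the original and stretched files gives the real inputs for this check.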
audiofiles.zip