

Fix #2787: When model.generate is passed an audio byte stream, sample-rate validation takes 200 ms #2820

Open
danielalanbates wants to merge 1 commit into modelscope:main from danielalanbates:fix/issue-2787


@danielalanbates

Fixes #2787

Summary

This PR addresses: when model.generate is passed an audio byte stream, sample-rate validation takes 200 ms.

Changes

funasr/utils/load_utils.py | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

Testing

Please review the changes carefully. The fix was verified against the existing test suite.


This PR was created with the assistance of Claude Sonnet 4.6 by Anthropic | effort: low. Happy to make any adjustments!

…ad_bytes

`validate_frame_rate` calls `pydub.AudioSegment.from_file` which takes
~200 ms.  When the caller passes raw PCM bytes (no container header) the
call always fails and the exception is silently swallowed, wasting 200 ms
on every invocation.

Add a lightweight `_is_audio_container` helper that inspects the first
few magic bytes of the input.  `load_bytes` now only calls
`validate_frame_rate` when the payload is a recognised container format
(WAV / MP3 / OGG / FLAC / MP4).  Raw PCM buffers skip the validation
entirely, eliminating the latency hit.
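Below is a minimal sketch of the kind of magic-byte check described above. The PR's full `_is_audio_container` body is not shown on this page; only the `ftyp` test appears in the review. The function name `is_audio_container` and the exact checks are therefore assumptions, based on the well-known file signatures of the listed formats.

```python
def is_audio_container(data: bytes) -> bool:
    """Return True if `data` starts with a known audio container signature.

    Hypothetical re-creation of the PR's `_is_audio_container`; the real
    helper may check different or additional signatures.
    """
    # WAV: a RIFF chunk whose form type (bytes 8..12) is "WAVE"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return True
    # MP3 with a leading ID3v2 tag
    if data[:3] == b"ID3":
        return True
    # Bare MP3 / AAC ADTS frame: 0xFF followed by a byte with the top 3 bits set
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return True
    # Ogg (Vorbis / Opus) page header
    if data[:4] == b"OggS":
        return True
    # FLAC stream marker
    if data[:4] == b"fLaC":
        return True
    # MP4 / M4A: ISO BMFF "ftyp" box type right after the 4-byte box size
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return True
    # Anything else is treated as headerless (raw PCM) data
    return False
```

The check is a heuristic, so it can err in both directions. A raw PCM buffer whose first two bytes happen to look like a frame sync is a false positive. It only pays the old validation cost, because the failure is still swallowed. A container the check does not recognise is a false negative. It skips `validate_frame_rate` and is presumably handled as if it were raw PCM, which is why the list of signatures matters.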

Also remove a stray `import pdb` debug statement.
@gemini-code-assist
Contributor

Summary of Changes

Hello @danielalanbates, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request removes an unnecessary ~200 ms delay from audio loading in model.generate when the input is an audio byte stream. Previously, raw PCM data always went through sample-rate validation (validate_frame_rate, which decodes via pydub), even though that call cannot succeed on headerless PCM. The change adds a lightweight check for known audio container headers, so the costly validation runs only when the input actually is a container format.

Highlights

  • Performance Optimization: Implemented a check to prevent unnecessary and time-consuming sample rate validation for raw PCM audio byte streams, addressing a reported 200ms delay.
  • Audio Container Detection: Introduced a new helper function, _is_audio_container, to efficiently identify common audio file formats (WAV, MP3, OGG, FLAC, MP4/M4A/AAC) based on their magic headers.
  • Conditional Validation: Modified the load_bytes function to only perform validate_frame_rate if the input byte stream is identified as an actual audio container, thereby skipping the validation for raw PCM data.
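The conditional validation can be sketched as follows. `load_bytes_sketch` is a hypothetical stand-in for `load_bytes`, not FunASR's code. The container check and the validator are injected as callables so the control flow can run without pydub. Decoding the payload as 16-bit little-endian PCM and scaling it to [-1, 1) is likewise an assumption made for illustration.

```python
import array
import sys
from typing import Callable, List


def load_bytes_sketch(
    data: bytes,
    is_container: Callable[[bytes], bool],
    validate: Callable[[bytes], bytes],
) -> List[float]:
    """Hypothetical sketch of gated validation inside a `load_bytes`-style loader.

    `is_container` and `validate` stand in for `_is_audio_container` and
    `validate_frame_rate`; here `validate` is assumed to return PCM bytes.
    """
    # Only pay for the expensive decode/resample when a container header is present.
    if is_container(data):
        try:
            data = validate(data)
        except Exception:
            pass  # keep the original bytes if decoding fails
    # Interpret the payload as signed 16-bit little-endian PCM.
    samples = array.array("h")
    usable = len(data) - len(data) % samples.itemsize  # drop a trailing odd byte
    samples.frombytes(data[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native byte order; PCM here is little-endian
    # Scale int16 samples to floats in [-1, 1).
    return [s / 32768.0 for s in samples]
```

With this shape, raw PCM input never reaches the validator, which is exactly where the ~200 ms was being spent.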


Changelog
  • funasr/utils/load_utils.py
    • Removed an unused pdb import.
    • Added a new private function _is_audio_container to check if a byte stream starts with a recognized audio container magic header.
    • Modified load_bytes to conditionally call validate_frame_rate only if _is_audio_container returns True, preventing unnecessary validation for raw PCM data.
gemini-code-assist bot left a comment


Code Review

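This PR adds an _is_audio_container heuristic check. It fixes the 200 ms delay in model.generate that occurred when raw PCM byte streams were passed in, caused by pydub/ffmpeg trying to parse headerless data. This is a very effective performance optimization. I suggest adding support for the WebM/Matroska format to the check, and improving the exception handling to follow Python best practices.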

Comment on lines +167 to +168
if len(data) >= 8 and data[4:8] == b"ftyp":
    return True

medium

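Consider also checking for the WebM/Matroska container format. WebM (typically carrying Opus-encoded audio) is very common in web applications and streaming. Its EBML header identifier is \x1a\x45\xdf\xa3.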

Suggested change
if len(data) >= 8 and data[4:8] == b"ftyp":
    return True
# WebM / MKV
if data[:4] == b"\x1a\x45\xdf\xa3":
    return True
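The suggested EBML check can be tried in isolation with the sketch below. `is_webm_or_matroska` and `EBML_MAGIC` are illustrative names, not part of the PR. The sketch also shows why the suggestion needs no length guard: slicing a short `bytes` object never raises.

```python
# EBML header element ID; WebM and Matroska files both begin with it.
EBML_MAGIC = b"\x1a\x45\xdf\xa3"


def is_webm_or_matroska(data: bytes) -> bool:
    # data[:4] on a buffer shorter than 4 bytes just returns the shorter
    # slice, so the comparison is False rather than an IndexError.
    return data[:4] == EBML_MAGIC
```

The magic bytes alone do not tell WebM from Matroska. That distinction lives in the DocType element inside the EBML header ("webm" vs "matroska"). For deciding whether to run validation, either answer is enough.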

Comment on lines +178 to +181
try:
    input = validate_frame_rate(input)
except:
    pass
medium

Avoid a bare except:, because it catches every exception, including KeyboardInterrupt and SystemExit, and can mask real bugs. Catch Exception or a more specific exception type instead.

Suggested change
try:
    input = validate_frame_rate(input)
except Exception:
    pass
References
  1. PEP 8 recommends against bare except: clauses; catch specific exceptions, or at least use except Exception:.
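The difference between the two forms can be shown with a small sketch. `try_validate` is a hypothetical helper written for illustration, not FunASR code. It falls back to the original input on ordinary errors, but lets interpreter-level exits through.

```python
def try_validate(validate, data):
    """Run a best-effort validator, returning `data` unchanged on failure."""
    try:
        return validate(data)
    except Exception:
        # Ordinary errors (e.g. a decoder rejecting headerless PCM) are
        # swallowed. KeyboardInterrupt and SystemExit derive from
        # BaseException rather than Exception, so they still propagate.
        return data
```

With a bare except:, a Ctrl+C pressed during a slow decode would be silently swallowed and execution would continue. With except Exception:, it stops the program as expected.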


Development

Successfully merging this pull request may close these issues.

When model.generate is passed an audio byte stream, sample-rate validation takes 200 ms.
