Fix #2787: When audio bytes are passed to model.generate, sample-rate validation takes 200 ms. by danielalanbates · Pull Request #2820 · modelscope/FunASR

danielalanbates · 2026-02-21T07:52:39Z

Summary

This PR addresses: when an audio byte stream is passed to model.generate, sample-rate validation takes 200 ms.

Changes

funasr/utils/load_utils.py | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

Testing

Please review the changes carefully. The fix was verified against the existing test suite.

This PR was created with the assistance of Claude Sonnet 4.6 by Anthropic | effort: low. Happy to make any adjustments!

…ad_bytes

`validate_frame_rate` calls `pydub.AudioSegment.from_file` which takes ~200 ms. When the caller passes raw PCM bytes (no container header) the call always fails and the exception is silently swallowed, wasting 200 ms on every invocation.

Add a lightweight `_is_audio_container` helper that inspects the first few magic bytes of the input. `load_bytes` now only calls `validate_frame_rate` when the payload is a recognised container format (WAV / MP3 / OGG / FLAC / MP4). Raw PCM buffers skip the validation entirely, eliminating the latency hit.

Also remove a stray `import pdb` debug statement.
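The magic-byte inspection described in the commit message could look roughly like the sketch below. Only the `ftyp` test is visible in this PR's diff; the function name `is_audio_container` and the other signatures (RIFF/WAVE, OggS, fLaC, ID3, MP3 frame sync) are assumptions based on the published container formats, not the PR's actual code.

```python
def is_audio_container(data: bytes) -> bool:
    """Hypothetical stand-in for the PR's _is_audio_container helper."""
    # WAV: RIFF chunk whose form type is WAVE.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return True
    # OGG and FLAC stream markers.
    if data[:4] in (b"OggS", b"fLaC"):
        return True
    # MP3 that starts with an ID3v2 tag.
    if data[:3] == b"ID3":
        return True
    # MP3 without a tag: 0xFF followed by a byte whose top three bits are set.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return True
    # MP4 / M4A: "ftyp" box type at offset 4 (this check appears in the diff).
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return True
    return False
```

Because this is a heuristic, a raw PCM buffer whose first bytes happen to be 0xFF 0xFF (for example a 16-bit sample of -1) would match the frame-sync test and take the slower validation path.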

gemini-code-assist · 2026-02-21T07:52:53Z

Summary of Changes

Hello @danielalanbates, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request speeds up audio loading in model.generate when the input is an audio byte stream. Previously, raw PCM data went through a sample-rate validation step that cost about 200 ms and could never succeed on headerless input. The change detects real audio container formats from their magic bytes and runs the validation only for those.

Highlights

- Performance Optimization: Implemented a check to prevent unnecessary and time-consuming sample rate validation for raw PCM audio byte streams, addressing a reported 200ms delay.
- Audio Container Detection: Introduced a new helper function, _is_audio_container, to efficiently identify common audio file formats (WAV, MP3, OGG, FLAC, MP4/M4A/AAC) based on their magic headers.
- Conditional Validation: Modified the load_bytes function to only perform validate_frame_rate if the input byte stream is identified as an actual audio container, thereby skipping the validation for raw PCM data.


Changelog

funasr/utils/load_utils.py
- Removed an unused pdb import.
- Added a new private function _is_audio_container to check if a byte stream starts with a recognized audio container magic header.
- Modified load_bytes to conditionally call validate_frame_rate only if _is_audio_container returns True, preventing unnecessary validation for raw PCM data.
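As a rough illustration of the conditional call described in the changelog, the sketch below gates a stand-in validator behind a container check. The names `load_bytes_sketch`, `fake_validate`, and `looks_like_wav` are hypothetical; the real `load_bytes` and `validate_frame_rate` in funasr/utils/load_utils.py do more than this.

```python
from typing import Callable, List


def load_bytes_sketch(
    data: bytes,
    is_container: Callable[[bytes], bool],
    validate: Callable[[bytes], bytes],
) -> bytes:
    """Run the costly validation only for payloads that look like containers."""
    if is_container(data):
        try:
            data = validate(data)
        except Exception:
            # Keep the original bytes if validation fails.
            pass
    return data


calls: List[int] = []


def fake_validate(data: bytes) -> bytes:
    # Stand-in for validate_frame_rate: records the call instead of decoding.
    calls.append(len(data))
    return data


def looks_like_wav(data: bytes) -> bool:
    # Simplified container check (assumption): WAV only.
    return data[:4] == b"RIFF" and data[8:12] == b"WAVE"


raw_pcm = b"\x00\x01" * 8                      # headerless 16-bit samples
wav_like = b"RIFF\x24\x00\x00\x00WAVEfmt "     # start of a WAV header

load_bytes_sketch(raw_pcm, looks_like_wav, fake_validate)   # validator skipped
load_bytes_sketch(wav_like, looks_like_wav, fake_validate)  # validator called
```

After both calls, `calls` holds a single entry, for the WAV-like payload; the raw PCM buffer never reaches the validator.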

Activity

- Addresses issue #2787, which reported a 200 ms delay during sample-rate validation when an audio byte stream is passed to model.generate.
- The pull request was created with the assistance of Claude Sonnet 4.6 by Anthropic.
- The author verified the fix against the existing test suite.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

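By introducing the heuristic _is_audio_container check, this PR resolves the 200 ms delay that occurred when a raw PCM byte stream was passed to model.generate and pydub/ffmpeg tried to parse the headerless data. This is an effective performance optimization. Consider adding support for the WebM/Matroska format to the check and improving the exception handling to follow Python best practices.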

gemini-code-assist · 2026-02-21T07:54:20Z

funasr/utils/load_utils.py

+    if len(data) >= 8 and data[4:8] == b"ftyp":
+        return True


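Consider adding a check for the WebM/Matroska container format. WebM (which usually carries Opus-encoded audio) is very common in web applications and streaming. Its EBML header identifier is \x1a\x45\xdf\xa3.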

Suggested change

-    if len(data) >= 8 and data[4:8] == b"ftyp":
-        return True
+    if len(data) >= 8 and data[4:8] == b"ftyp":
+        return True
+    # WebM / MKV
+    if data[:4] == b"\x1a\x45\xdf\xa3":
+        return True
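The suggested EBML test has no explicit length guard, unlike the `ftyp` test. A minimal sketch of why that is safe in Python follows; `is_ebml` and `EBML_MAGIC` are hypothetical names used only for illustration.

```python
EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # EBML header ID that opens WebM / Matroska files


def is_ebml(data: bytes) -> bool:
    # Slicing never raises on short input: data[:4] is simply shorter than
    # four bytes and compares unequal, so no length check is required.
    return data[:4] == EBML_MAGIC
```

The same holds for `data[4:8] == b"ftyp"`, so the `len(data) >= 8` guard in the diff is defensive rather than strictly necessary.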

gemini-code-assist · 2026-02-21T07:54:20Z

funasr/utils/load_utils.py

+        try:
+            input = validate_frame_rate(input)
+        except:
+            pass


Avoid a bare except:, because it catches every exception, including KeyboardInterrupt and SystemExit, and can mask real problems in the program. Catch Exception or a more specific exception type instead.

Suggested change

-        try:
-            input = validate_frame_rate(input)
-        except:
-            pass
+        try:
+            input = validate_frame_rate(input)
+        except Exception:
+            pass

References

PEP 8 recommends against bare except: clauses; catch specific exceptions, or at least use except Exception:.
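The difference the reviewer points to can be seen in a small self-contained sketch. The helper names below are hypothetical and stand in for the `validate_frame_rate` call; the point is only that `except Exception:` swallows ordinary errors while letting `KeyboardInterrupt` propagate.

```python
def swallow_with_exception(fn):
    """Call fn, swallowing ordinary errors the way the suggested code does."""
    try:
        fn()
    except Exception:
        return "swallowed"
    return "ok"


def raise_value_error():
    # Stand-in for a decoder failing on bad input.
    raise ValueError("bad header")


def raise_interrupt():
    # Simulates the user pressing Ctrl+C during the call.
    raise KeyboardInterrupt


result = swallow_with_exception(raise_value_error)

try:
    swallow_with_exception(raise_interrupt)
    propagated = False
except KeyboardInterrupt:
    # KeyboardInterrupt derives from BaseException, not Exception,
    # so the handler above does not catch it.
    propagated = True
```

With a bare `except:` in place of `except Exception:`, the interrupt would be swallowed as well and the caller would never see it.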

danielalanbates wants to merge 1 commit into modelscope:main from danielalanbates:fix/issue-2787
