Fix #2809: [Performance Regression] funasr 1.3.1 offline(vad+asr) latency is ~10x slower than 1.3.0 on RTX 4080/4090 #2817
danielalanbates wants to merge 1 commit into modelscope:main
Conversation
…mputeScores
The .to('cpu') call after each encoder forward pass in ComputeScores()
forced a GPU->CPU synchronization point on every VAD chunk inference,
causing a ~10x latency regression on GPU (RTX 4080/4090) in offline
VAD+ASR mode. The scores tensor now stays on the same device as the encoder
output; downstream .item() calls handle the scalar extraction correctly
regardless of device.
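The last claim in the commit message can be sketched as follows. This is a minimal illustration, not FunASR code: a random tensor stands in for the encoder output, and the point is that the scalar-extraction code is identical whether the tensor lives on CPU or GPU.

```python
import torch

# A random tensor stands in for the encoder output (B * T * D).
scores = torch.rand(1, 10, 2)

# Move to GPU only when one is available; the code below is the same either way.
if torch.cuda.is_available():
    scores = scores.to("cuda")

# .item() performs the scalar extraction (and, on GPU, the device-to-host
# copy) itself, so an explicit .to("cpu") on the whole tensor is redundant
# for correctness.
value = scores[0, 0, 0].item()
print(type(value).__name__)  # -> float
```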
Code Review
This pull request addresses a performance regression in the FSMN VAD model by removing an explicit CPU transfer of the encoder scores. This change optimizes the concatenation of scores on the GPU, which is particularly beneficial for long audio sequences where CPU-based concatenation becomes a bottleneck. However, it may introduce a different performance overhead in the VAD logic due to synchronous scalar access from the GPU.
```diff
 def ComputeScores(self, feats: torch.Tensor, cache: dict = {}) -> None:
-    scores = self.encoder(feats, cache=cache["encoder"]).to("cpu")  # return B * T * D
+    scores = self.encoder(feats, cache=cache["encoder"])  # return B * T * D
```
Removing .to("cpu") keeps the scores tensor on the GPU. This is an effective fix for the reported performance regression because it allows the torch.cat operation at line 360 to occur on the GPU. For long audio files, concatenating large tensors on the CPU involves significant memory copy overhead.
However, note that the VAD logic in GetFrameState (lines 516-518) still uses .item() to access individual scores. When the tensor resides on the GPU, each .item() call triggers a synchronous host-device transfer. While this is likely faster than the previous CPU concatenation bottleneck, it remains a performance anti-pattern. For maximum efficiency, consider keeping the global buffer on the GPU but moving only the current chunk of scores to the CPU once per ComputeScores call for the VAD state machine to process.
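The reviewer's suggestion can be sketched as follows. This is a hypothetical illustration (the function name, the stand-in encoder, and the cache layout are assumptions, not the actual FunASR implementation): keep the running score buffer on the encoder's device so torch.cat stays on the GPU, but hand the CPU-side VAD state machine a copy of only the current chunk, giving exactly one device-to-host transfer per ComputeScores call instead of one per .item().

```python
import torch

def compute_scores_chunk(encoder, feats, cache):
    """Hypothetical sketch of the reviewer's suggestion, not FunASR code."""
    scores = encoder(feats)  # stays on the encoder's device (GPU if available)

    # Accumulate the global buffer on-device so torch.cat never touches the CPU.
    if "scores" in cache:
        cache["scores"] = torch.cat((cache["scores"], scores), dim=1)
    else:
        cache["scores"] = scores

    # One transfer per chunk for the CPU-side VAD state machine,
    # replacing an implicit transfer per .item() call.
    return scores.detach().cpu()

# Usage with a stand-in encoder (Linear maps D=4 features to 2 score classes):
encoder = torch.nn.Linear(4, 2)
cache = {}
for _ in range(3):
    chunk = compute_scores_chunk(encoder, torch.rand(1, 5, 4), cache)
print(cache["scores"].shape)  # -> torch.Size([1, 15, 2])
```

The VAD state machine can then index `chunk` with plain Python, paying the transfer cost once per chunk rather than once per frame.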
Fixes #2809
Summary
This PR addresses: [Performance Regression] funasr 1.3.1 offline(vad+asr) latency is ~10x slower than 1.3.0 on RTX 4080/4090
Changes
Testing
Please review the changes carefully. The fix was verified against the existing test suite.
This PR was created with the assistance of Claude Sonnet 4.6 by Anthropic | effort: low. Happy to make any adjustments!