| English Abstract |
As online shopping grows, consumers increasingly buy food through internet platforms, particularly livestreams and video-sharing platforms, where content often includes restaurant reviews or vendor collaborations that promote products and offer purchase options. Because such videos strongly influence consumers, online food safety management needs to be incorporated into overall risk mitigation strategies. A study that tested speech recognition models on three types of online food sales videos revealed several shortcomings in current technology. Background noise noticeably degraded recognition, often yielding scattered words rather than complete sentences. Proper nouns and domain-specific terms were frequently misrecognized, particularly in specialized fields. Unclear or connected speech also caused recognition errors, and in multi-speaker settings the system tended to favor louder voices, which could leave dialogue incomplete or incorrectly recognized. These limitations underscore the need for better handling of both complex audio environments and specialized speech contexts. Future applications should address these challenges by adjusting relevant parameters and improving the quality of the audio files used for recognition in order to raise accuracy. |