Getting good-quality audio from every capture is challenging. Could AI speech enhancement models be used to generate higher-quality audio tracks? Cleaner audio would in turn improve the accuracy of captions. Tools of this kind are already well documented, e.g. https://podcast.adobe.com/en/enhance