Real-time Voice Conversion
An app I developed after I was unable to test voice conversion applications on my laptop while abroad. Most voice conversion systems are CUDA-based, which means they need an NVIDIA GPU and significant processing power; this lightweight version avoids that requirement by running locally on Apple Silicon. It runs on an M-series MacBook Air with 300 to 400 ms of latency.
The system is an RVC-based pipeline that uses PyTorch's MPS (Metal Performance Shaders) backend to run on Apple Silicon. To work in real time, it processes audio in 100 to 250 ms chunks with overlapping context. Running inference on recent windows keeps the stream stable and reduces audio artifacts. I also built a custom low-latency audio pipeline using CoreAudio and BlackHole to handle routing, buffering, and synchronization.
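The chunking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: each call receives one new chunk, a fixed number of past samples is prepended as context, a placeholder `model` runs on the combined window, and successive outputs are joined with a linear crossfade. The names `OverlapStreamer` and `ms_to_samples` and the `context`/`fade` parameters are hypothetical, and the crossfade is one common way to hide chunk boundaries, assumed here rather than taken from the original.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

# Hypothetical stand-in for the RVC inference call: takes a window of
# samples and returns the same number of converted samples.
Model = Callable[[List[float]], List[float]]


def ms_to_samples(ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * sample_rate / 1000)


class OverlapStreamer:
    """Run a model on [context + new chunk] windows, emitting one chunk per call.

    The last `fade` samples of each result are held back and linearly
    cross-faded with the next result, which adds `fade` samples of latency.
    """

    def __init__(self, model: Model, context: int, fade: int) -> None:
        if not 0 < fade <= context:
            raise ValueError("need 0 < fade <= context")
        self.model = model
        self.context = context
        self.fade = fade
        # Most recent `context` input samples (silence before the stream starts).
        self.history: Deque[float] = deque([0.0] * context, maxlen=context)
        # Held-back tail of the previous result, waiting to be cross-faded.
        self.tail: List[float] = [0.0] * fade

    def process(self, chunk: Sequence[float]) -> List[float]:
        n = len(chunk)
        if n < self.fade:
            raise ValueError("chunk must be at least `fade` samples long")
        window = list(self.history) + list(chunk)
        y = self.model(window)  # assumed to return len(window) samples
        c, f = self.context, self.fade
        # y[c - f : c] re-renders the samples that were held back last call.
        overlap = y[c - f : c]
        mixed = [
            prev * (1 - i / f) + cur * (i / f)  # linear fade from old to new
            for i, (prev, cur) in enumerate(zip(self.tail, overlap))
        ]
        out = mixed + y[c : c + n - f]      # exactly n samples leave per call
        self.tail = y[c + n - f : c + n]    # hold back the newest `fade` samples
        self.history.extend(chunk)          # deque keeps only the last `context`
        return out
```

For example, at 48 kHz a 100 ms chunk is `ms_to_samples(100, 48000)`, or 4800 samples. Holding back `fade` samples adds `fade / sample_rate` seconds of delay, so a longer crossfade smooths boundaries at the cost of latency. In a PyTorch version, `model` would wrap a module placed on `torch.device("mps")` after checking `torch.backends.mps.is_available()`.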
This is ongoing work. I'm currently focused on reducing latency while maintaining stability, particularly around queue management and chunk scheduling.
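One common way to keep a queue from adding unbounded latency is to cap its length and drop the oldest chunk when inference falls behind. The sketch below assumes a producer thread (for example, the audio input callback) and a consumer thread running inference. `LatestChunksQueue` is a hypothetical name, and the drop-oldest policy is an assumption, not necessarily what this project does.

```python
import threading
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class LatestChunksQueue(Generic[T]):
    """Thread-safe FIFO that holds at most `capacity` chunks.

    When the producer outruns the consumer, the oldest chunk is evicted so
    the backlog, and therefore the added latency, stays bounded.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._items: Deque[T] = deque(maxlen=capacity)
        self._cond = threading.Condition()
        self.dropped = 0  # how many chunks were evicted to stay within capacity

    def put(self, item: T) -> None:
        with self._cond:
            if len(self._items) == self._items.maxlen:
                self.dropped += 1  # append below evicts the oldest chunk
            self._items.append(item)
            self._cond.notify()

    def get(self, timeout: Optional[float] = None) -> Optional[T]:
        """Return the oldest queued chunk, or None if none arrives in time."""
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout):
                return None
            return self._items.popleft()
```

Dropping a chunk causes a brief audible gap, but it keeps at most `capacity` chunks waiting, so queueing delay stays around `capacity` chunk durations instead of growing. The alternative, blocking the producer until space frees up, risks stalling the audio I/O thread, which real-time audio callbacks should avoid.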