756 points by mlhumphries 7 months ago | 15 comments
username1 7 months ago next
Great work! Is this open-source? I'd love to take a look at the code.
username1 7 months ago next
Yes, it is! You can find it on my GitHub repo. Link in the post.
username1 7 months ago next
I used a dataset of audio recordings and corresponding text transcripts. There are some free ones available online.
username1 7 months ago next
I'm using TensorFlow. I find it easier to use and better documented than PyTorch.
username2 7 months ago prev next
This is impressive. I've tried building a TTS engine before and it can be quite challenging.
username3 7 months ago next
I'm curious, what kind of data did you use for training?
username4 7 months ago next
Nice! I'll have to check it out. Are you using TensorFlow or PyTorch?
username5 7 months ago next
Interesting. I've always been a PyTorch fan but I might have to give TensorFlow a try.
username6 7 months ago prev next
What was your approach to preprocessing the audio data?
username1 7 months ago next
I used a simple preprocessing pipeline: I extracted mel spectrograms from the audio and fed them into the LSTM network.
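For readers following along, here is a minimal sketch of log-mel spectrogram extraction using only NumPy. It is not the code from the project's repo. The function names, the 16 kHz sample rate, the 512-sample frame, the 128-sample hop, and the 40 mel bands are all illustrative assumptions, and the input is assumed to be at least one frame long.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=128, n_mels=40):
    # Assumed (hypothetical) defaults; wave must have at least n_fft samples
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(mel + 1e-10)                         # small offset avoids log(0)
```

The result has one row per frame and one column per mel band, which is the time-major shape a recurrent layer such as an LSTM typically expects. In practice a library routine (for example from librosa or tf.signal) would likely replace the hand-rolled filterbank.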
username7 7 months ago next
That's a common approach. Did you normalize the data or use any data augmentation techniques?
username1 7 months ago next
Yes, I normalized the data and used a few simple data augmentation techniques like adding noise and time-shifting.
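As a rough illustration of what normalization, noise injection, and time-shifting can look like, here is a minimal NumPy sketch. It is an assumption about one reasonable way to do it, not the project's actual pipeline. The function names, the per-feature standardization, the SNR-based noise level, and the zero-padded shift are all hypothetical choices, and `max_shift` is assumed to be smaller than the clip length.

```python
import numpy as np

def normalize(features, eps=1e-8):
    # Standardize each feature column to zero mean and unit variance over time
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

def add_noise(wave, snr_db, rng):
    # Add white Gaussian noise scaled to a target signal-to-noise ratio in dB
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def time_shift(wave, max_shift, rng):
    # Shift by a random number of samples in [-max_shift, max_shift],
    # padding the vacated region with zeros (silence)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(wave)
    if shift > 0:
        shifted[shift:] = wave[:-shift]
    elif shift < 0:
        shifted[:shift] = wave[-shift:]
    else:
        shifted[:] = wave
    return shifted
```

A typical use would be to apply `add_noise` and `time_shift` to the raw waveform during training only, then extract features and call `normalize` on them. Passing a seeded `np.random.default_rng` keeps the augmentation reproducible.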
username8 7 months ago prev next
How long did it take to train the model?
username1 7 months ago next
It took about a day to train the model on a single Tesla V100 GPU. Your mileage may vary depending on your hardware.
username9 7 months ago prev next
Thanks for sharing this. I'm going to take a look at your code and try building my own TTS engine.