James Betker
c737632eae
Train and use a bespoke tokenizer
2021-12-22 15:06:14 -07:00
James Betker
a9629f7022
Try out using the GPT tokenizer rather than nv_tacotron
This results in a significant compression of the text domain; I'm curious what the
effect on speech quality will be.
2021-12-22 14:03:18 -07:00
James Betker
ced81a760b
restore nv_tacotron
2021-12-22 13:48:53 -07:00
James Betker
7bf4f9f580
duplicate nvtacotron
2021-12-22 13:48:30 -07:00
James Betker
9e8a9bf6ca
Various fixes to gpt_tts_hf
2021-12-16 23:28:44 -07:00
James Betker
31fc693a8a
dafsdf
2021-12-02 22:55:36 -07:00
James Betker
040d998922
maasd
2021-12-02 22:53:48 -07:00
James Betker
cc10e7e7e8
Add tsv loader
2021-12-02 22:43:07 -07:00
James Betker
702607556d
nv_tacotron_dataset: allow it to load conditioning signals
2021-12-02 22:14:44 -07:00
James Betker
0604060580
Finish up mods for next version of GptAsrHf
2021-11-20 21:33:49 -07:00
James Betker
9b3c3b1227
use sets instead of list ops
2021-11-07 20:45:57 -07:00
James Betker
722d3dbdc2
f
2021-11-07 18:52:05 -07:00
James Betker
18b1de9b2c
Add exclusion_lists to unsupervised_audio_dataset
2021-11-07 18:46:47 -07:00
James Betker
fd14746bf8
badtimes
2021-11-03 00:33:38 -06:00
James Betker
2fa80486de
tacotron_dataset: recover gracefully
2021-11-03 00:31:50 -06:00
James Betker
af51d00dee
Load wav files from voxpopuli instead of oggs
2021-11-02 09:32:26 -06:00
James Betker
f7d0901ce6
Decouple MEL from nv_tacotron_dataset
2021-10-31 15:01:38 -06:00
James Betker
b8b268b5f6
Misc
2021-10-31 14:29:23 -06:00
James Betker
579f0a70ee
Move UnsupervisedAudioDataset to use my new mp3 loader
2021-10-28 22:33:12 -06:00
James Betker
5d714bc566
Add deepspeech model and support for decoding with it
2021-10-27 13:09:46 -06:00
James Betker
21b6daa0ed
Introduce clip resampling
2021-10-26 10:42:23 -06:00
James Betker
c3421b7f6d
Dataset work for audio quality processor
2021-10-24 09:09:34 -06:00
James Betker
06ea6191a9
Initial implementation of audio_with_noise dataset
2021-10-21 16:45:19 -06:00
James Betker
d016a2fbad
Go back to vanilla flavor of diffusion
2021-10-17 17:32:46 -06:00
James Betker
6833048bf7
Alterations to diffusion_dvae so it can be used directly on spectrograms
2021-09-23 15:56:25 -06:00
James Betker
359e9e27a7
unsupervised_audio_dataset: try to recover from failures of audio2numpy
2021-09-17 15:25:57 -06:00
James Betker
f78ce9d924
Get diffusion_dvae ready for prime time!
2021-09-16 22:43:10 -06:00
James Betker
1197ae1928
Misc
2021-09-16 10:53:56 -06:00
James Betker
8d9857f33d
More fixes
2021-09-14 20:45:05 -06:00
James Betker
9a9c90660f
Fixes
2021-09-14 18:29:17 -06:00
James Betker
e513052fca
Add unsupervised_audio_dataset
2021-09-14 17:43:16 -06:00
James Betker
b8f2e0f452
mydvae
2021-09-06 17:45:30 -06:00
James Betker
30cd33fe44
another fix
2021-08-31 14:46:46 -06:00
James Betker
8810d3de97
fix wavfile_dataset
2021-08-31 14:45:29 -06:00
James Betker
dabd87246d
Add unet_diffusion_vocoder
2021-08-31 14:38:33 -06:00
James Betker
570ed327ed
Stop dataset - attempt #2
2021-08-18 18:29:38 -06:00
James Betker
8332923f5c
Two more tools to test the audio segmentor
2021-08-17 09:09:11 -06:00
James Betker
93e903af15
Rework wavfile dataset to be usable for things other than augments
2021-08-16 22:52:35 -06:00
James Betker
d7f30232c3
Oh yeah
2021-08-16 22:52:15 -06:00
James Betker
4c01d82265
Fix for voxpopuli
2021-08-16 22:52:05 -06:00
James Betker
1fede41b7b
Audio segmentor
2021-08-16 22:51:53 -06:00
James Betker
2d3372054d
Add support for voxpopuli to nv_tacotron_dataset
2021-08-16 17:13:40 -06:00
James Betker
3580c52eac
Fix up wavfile_dataset to be able to provide a full clip
2021-08-15 20:53:26 -06:00
James Betker
a523c4f932
Auto-normalize wav files by data type
2021-08-15 09:09:51 -06:00
James Betker
c28f657ab8
Allow usage of pre-rendered mels saved to npy files
2021-08-14 23:38:15 -06:00
James Betker
ad3391bd96
Fix nan issue when interpolating audio
2021-08-14 20:42:01 -06:00
James Betker
769f0acc53
Moar fix
2021-08-14 17:23:15 -06:00
James Betker
3d2e724083
Fix audio ranging problem
2021-08-14 17:18:55 -06:00
James Betker
d6a73acaed
Allow processing of multiple audio sources at once from nv_tacotron_dataset
2021-08-14 16:04:05 -06:00
James Betker
007976082b
GPT_asr for inference
2021-08-14 14:37:17 -06:00
James Betker
72622b4d61
Allow saving mel strips as files from the dataset implementation
2021-08-13 22:46:41 -06:00
James Betker
cfd284f425
Fix up some stuff that allows the MEL to be computed on-GPU
2021-08-13 18:35:55 -06:00
James Betker
fff1a59e08
max/min mel invalid fix
2021-08-13 09:36:31 -06:00
James Betker
4b2946e581
More fix
2021-08-12 15:51:23 -06:00
James Betker
4c76257c71
Don't require collation for nv_tacotron
2021-08-12 15:44:55 -06:00
James Betker
5b07d3b623
Found error that I was trying to fix with reload=True
2021-08-12 15:22:34 -06:00
James Betker
430b650a34
......
2021-08-12 10:31:10 -06:00
James Betker
b35d6ae028
Print some metrics from tacotron dataset when it croaks
2021-08-12 09:21:12 -06:00
James Betker
0c4d6b1916
Just offer generic re-load for nv-tacotron
2021-08-12 09:09:12 -06:00
James Betker
154f5aa73c
Fix annoying warning and add to requirements
2021-08-11 17:32:06 -06:00
James Betker
f04a7bdf63
Bug fixes for tacotron dataset on mozilla cv
- Support a max mel length (mozilla cv has some tracks that are basically unbounded..)
- Don't fail on low sample rates (mozilla cv has some of those)
2021-08-11 16:17:03 -06:00
James Betker
2d3f0cc33c
nv_tacotron_dataset - Allow training on mozilla cv
2021-08-11 13:34:31 -06:00
James Betker
d0c74278bf
Enable multiple wavfile paths to be specified, fix eps bug in mp3 splitter
2021-08-11 08:46:02 -06:00
James Betker
e19c00398e
More improvements to random_mp3_splitter
2021-08-09 21:31:12 -06:00
James Betker
74342b860b
Revert "Undo forced text padding"
This reverts commit 83ab5e6a00.
2021-08-09 11:56:34 -06:00
James Betker
d4e33bf15f
Fixes to the mp3 splitter
2021-08-09 11:55:46 -06:00
James Betker
4100469902
Add a tool to split mp3 files into arbitrary chunks of wav files
2021-08-08 23:23:13 -06:00
James Betker
83ab5e6a00
Undo forced text padding
2021-08-08 11:42:20 -06:00
James Betker
690d7e86d3
Fix nv_tacotron_dataset bug which incorrectly mapped filenames
dammit..
2021-08-08 11:38:52 -06:00
James Betker
a2afb25e42
Fix inference, always flow full text tokens through transformer
2021-08-07 20:11:10 -06:00
James Betker
b43683b772
Add lucidrains_dvae
2021-08-06 12:03:46 -06:00
James Betker
62c7570512
Constrain wav_aug a bit more
2021-08-06 08:19:38 -06:00
James Betker
f126040da2
Undo noise first
2021-08-05 23:24:38 -06:00
James Betker
908ef5495f
Add noise first to audio_aug
2021-08-05 23:22:44 -06:00
James Betker
d6007c6de1
dataset fixes
2021-08-05 23:12:59 -06:00
James Betker
d120e1aa99
Add audio augmentation to wavfile_dataset, utility to test audio similarity
2021-08-05 22:14:49 -06:00
James Betker
4017236ba9
Fix up inference for gpt_tts
2021-08-05 06:46:30 -06:00
James Betker
5037220ac7
Mods to support contrastive learning on audio files
2021-08-05 05:57:04 -06:00
James Betker
341f28dd82
It works!
2021-08-04 20:07:51 -06:00
James Betker
d9936df363
Add gpt_tts dataset and implement inference
- Adds a script which preprocesses quantized mels given a DVAE
- Adds a dataset which can consume preprocessed qmels
- Reworks GPT TTS to consume the outputs of that dataset (removes logic to add padding and start/end tokens)
- Adds inference to gpt_tts
2021-08-04 00:44:04 -06:00
James Betker
dadc54795c
Add gpt_tts
2021-07-27 20:33:30 -06:00
James Betker
49e3b310ea
Allow audio sample rate interpolation for faster training
2021-07-26 17:44:06 -06:00
James Betker
96e90e7047
Add support for a gaussian-diffusion-based wave tacotron
2021-07-26 16:27:31 -06:00
James Betker
d81386c1be
Mods to support vqvae in audio mode (1d)
2021-07-20 08:36:46 -06:00
James Betker
1ff434218e
tacotron2, ready for prime time!
2021-07-08 22:13:44 -06:00
James Betker
86fd3ad7fd
Initial checkin of nvidia tacotron model & dataset
These two are tested, full support for training to come.
2021-07-06 11:11:35 -06:00
James Betker
afa41f1804
Allow hq color jittering and corruptions that are not included in the corruption factor
2021-06-30 09:44:46 -06:00
James Betker
6fd16ea9c8
Add meta-anomaly detection, colorjitter augmentation
2021-06-29 13:41:55 -06:00
James Betker
46e9f62be0
Add unet with latent guide
This is a diffusion network that uses both a LQ image
and a reference sample HQ image that is compressed into
a latent vector to perform upsampling.
The hope is that we can steer the upsampling network
with sample images.
2021-06-26 11:02:58 -06:00
James Betker
0ded106562
Merge remote-tracking branch 'origin/master'
2021-06-25 13:16:28 -06:00
James Betker
a57ed8e960
Various mods to support better jpeg image filtering
2021-06-25 13:16:15 -06:00
James Betker
61e7ca39cd
Update image_folder_dataset.py
2021-06-25 11:48:31 -06:00
James Betker
6b32c87dcb
Try to make diffusion fid more deterministic
2021-06-14 09:27:43 -06:00
James Betker
65c474eecf
Various changes to fix testing
2021-06-11 15:31:10 -06:00
James Betker
6c6e82406e
Pass a corruption factor through the dataset into the upsampling network
The intuition is this will help guide the network to make better informed decisions
about how it performs upsampling based on how it perceives the underlying content.
(I'm giving up on letting networks detect their own quality - I'm not convinced it is
actually feasible)
2021-06-07 09:13:54 -06:00
James Betker
fb405d9ef1
CIFAR stuff
- Extract coarse labels for the CIFAR dataset
- Add simple resnet that branches lower layers based on coarse labels
- Some other cleanup
2021-06-05 14:16:02 -06:00
James Betker
e6c537824a
Allow validation for ce
2021-06-04 21:21:04 -06:00
James Betker
7c251af7a8
Support cifar100 with resnet
2021-06-04 17:29:07 -06:00
James Betker
6084915af8
Support gaussian diffusion models
Adds support for GD models, courtesy of some maths from openai.
Also:
- Fixes requirement for eval{} even when it isn't being used
- Adds support for denormalizing an imagenet norm
2021-06-02 21:47:32 -06:00
James Betker
45bc76ba92
Fixes and mods to support training classifiers on imagenet
2021-06-01 17:25:24 -06:00