Commit Graph

238 Commits

Author SHA1 Message Date
James Betker
65ffe38fce misc 2022-01-06 22:16:17 -07:00
James Betker
f4484fd155 Add "dataset_debugger" support
This allows the datasets themselves to compile statistics and report them
via tensorboard and wandb.
2022-01-06 12:38:20 -07:00
James Betker
50d267ab1a misc 2022-01-05 17:01:22 -07:00
James Betker
963c6072bb Add mel_encoder and solo embeddings to unified_voice 2022-01-04 15:15:58 -07:00
James Betker
b24a51f0aa Check in speech2speech CLIP inference tool 2021-12-29 00:19:44 -07:00
James Betker
c1bef01dfa GptAsrHf2 checkin 2021-12-28 20:48:38 -07:00
James Betker
07c2b9907c Add voice2voice clip model 2021-12-28 16:18:12 -07:00
James Betker
93624fa4b2 Don't use tqdm in ranks!=0 2021-12-28 10:06:54 -07:00
James Betker
6996dfd9d5 asr_hf2: add independent position embedders 2021-12-26 15:17:24 -07:00
James Betker
8b19c37409 UnifiedGptVoice! 2021-12-23 15:20:26 -07:00
James Betker
f9c45d70f0 Fix mel terminator 2021-12-18 17:18:06 -07:00
James Betker
5a664aa56e misc 2021-12-11 08:17:26 -07:00
James Betker
b2d8fbcfc0 build a better speech synthesis toolset 2021-12-09 22:59:56 -07:00
James Betker
3b5c3d85d8 Allow specification of wandb run name 2021-11-22 17:31:29 -07:00
James Betker
19c80bf7a7 Improve wandb logging 2021-11-22 16:40:05 -07:00
James Betker
596a62fe01 Apply fix to gpt_asr_hf and prep it for inference
The fix is that we were predicting two characters in advance, not the next character
2021-11-04 10:09:24 -06:00
James Betker
87364b890f Add custom clip_grad_norm that prints out the param names in error. 2021-11-01 11:12:20 -06:00
James Betker
b8b268b5f6 Misc 2021-10-31 14:29:23 -06:00
James Betker
e9dc37f19c Mod trainer to copy config file into experiments root 2021-10-30 17:00:24 -06:00
James Betker
2afea126d7 mod trainer to be very explicit about the fact that loading models and state together doesn't work, but allow it 2021-10-28 22:32:42 -06:00
James Betker
5d714bc566 Add deepspeech model and support for decoding with it 2021-10-27 13:09:46 -06:00
James Betker
c3421b7f6d Dataset work for audio quality processor 2021-10-24 09:09:34 -06:00
James Betker
f2a31702b5 Clean stuff up, move more things into arch_util 2021-10-20 21:19:25 -06:00
James Betker
83798887a8 Mods to support unet diffusion vocoder with conditioning 2021-10-13 21:23:18 -06:00
James Betker
33120cb35c Add norming to discretization_loss 2021-10-06 17:10:50 -06:00
James Betker
09f373e3b1 Add dvae with channel attention 2021-10-03 10:52:01 -06:00
James Betker
ac57cdc794 Add scheduling to quantizer, enable cudnn_benchmarking to be disabled 2021-09-24 17:01:36 -06:00
James Betker
c5297ccec6 Add dvae balancing heuristic 2021-09-23 21:19:36 -06:00
James Betker
6833048bf7 Alterations to diffusion_dvae so it can be used directly on spectrograms 2021-09-23 15:56:25 -06:00
James Betker
f78ce9d924 Get diffusion_dvae ready for prime time! 2021-09-16 22:43:10 -06:00
James Betker
b8f2e0f452 mydvae 2021-09-06 17:45:30 -06:00
James Betker
dabd87246d Add unet_diffusion_vocoder 2021-08-31 14:38:33 -06:00
James Betker
d05cc1f46c Misc 2021-08-24 17:12:04 -06:00
James Betker
9dfe936c16 Fix ddp for sampler 2021-08-19 16:45:34 -06:00
James Betker
570ed327ed Stop dataset - attempt #2 2021-08-18 18:29:38 -06:00
James Betker
8332923f5c Two more tools to test the audio segmentor 2021-08-17 09:09:11 -06:00
James Betker
1fede41b7b Audio segmentor 2021-08-16 22:51:53 -06:00
James Betker
a523c4f932 Auto-normalize wav files by data type 2021-08-15 09:09:51 -06:00
James Betker
cdee31c60b GPT_ASR 2021-08-13 15:02:18 -06:00
James Betker
f5a9b88ef6 tacotron cleaners: remove quotation marks
these don't really have relevance for tts or asr
2021-08-11 16:18:44 -06:00
James Betker
e19c00398e More improvements to random_mp3_splitter 2021-08-09 21:31:12 -06:00
James Betker
04d14b3acc No batch factors for eval 2021-08-09 16:02:01 -06:00
James Betker
82fc69abfa Add "pure" evaluator
Which simply computes the training loss against an eval dataset
2021-08-09 14:58:35 -06:00
James Betker
b43683b772 Add lucidrains_dvae 2021-08-06 12:03:46 -06:00
James Betker
d120e1aa99 Add audio augmentation to wavfile_dataset, utility to test audio similarity 2021-08-05 22:14:49 -06:00
James Betker
c0f61a2e15 Rework how DVAE tokens are ordered
It might make more sense to have top tokens, then bottom tokens
with top tokens having different discretized values.
2021-08-05 07:07:17 -06:00
James Betker
5037220ac7 Mods to support contrastive learning on audio files 2021-08-05 05:57:04 -06:00
James Betker
4c98b9703f Get dalle-style TTS to "work" 2021-08-03 21:08:27 -06:00
James Betker
2814307eee Alterations to support VQVAE on mel spectrograms 2021-08-01 07:54:21 -06:00
James Betker
dadc54795c Add gpt_tts 2021-07-27 20:33:30 -06:00