James Betker
|
b24a51f0aa
|
Check in speech2speech CLIP inference tool
|
2021-12-29 00:19:44 -07:00 |
|
James Betker
|
c1bef01dfa
|
GptAsrHf2 checkin
|
2021-12-28 20:48:38 -07:00 |
|
James Betker
|
07c2b9907c
|
Add voice2voice clip model
|
2021-12-28 16:18:12 -07:00 |
|
James Betker
|
93624fa4b2
|
Don't use tqdm in ranks!=0
|
2021-12-28 10:06:54 -07:00 |
|
James Betker
|
6996dfd9d5
|
asr_hf2: add independent position embedders
|
2021-12-26 15:17:24 -07:00 |
|
James Betker
|
8b19c37409
|
UnifiedGptVoice!
|
2021-12-23 15:20:26 -07:00 |
|
James Betker
|
f9c45d70f0
|
Fix mel terminator
|
2021-12-18 17:18:06 -07:00 |
|
James Betker
|
5a664aa56e
|
misc
|
2021-12-11 08:17:26 -07:00 |
|
James Betker
|
b2d8fbcfc0
|
build a better speech synthesis toolset
|
2021-12-09 22:59:56 -07:00 |
|
James Betker
|
3b5c3d85d8
|
Allow specification of wandb run name
|
2021-11-22 17:31:29 -07:00 |
|
James Betker
|
19c80bf7a7
|
Improve wandb logging
|
2021-11-22 16:40:05 -07:00 |
|
James Betker
|
596a62fe01
|
Apply fix to gpt_asr_hf and prep it for inference
The fix: we were predicting two characters in advance, not the next character
|
2021-11-04 10:09:24 -06:00 |
|
James Betker
|
87364b890f
|
Add custom clip_grad_norm that prints out the param names in error.
|
2021-11-01 11:12:20 -06:00 |
|
James Betker
|
b8b268b5f6
|
Misc
|
2021-10-31 14:29:23 -06:00 |
|
James Betker
|
e9dc37f19c
|
Mod trainer to copy config file into experiments root
|
2021-10-30 17:00:24 -06:00 |
|
James Betker
|
2afea126d7
|
mod trainer to be very explicit about the fact that loading models and state together doesn't work, but allow it
|
2021-10-28 22:32:42 -06:00 |
|
James Betker
|
5d714bc566
|
Add deepspeech model and support for decoding with it
|
2021-10-27 13:09:46 -06:00 |
|
James Betker
|
c3421b7f6d
|
Dataset work for audio quality processor
|
2021-10-24 09:09:34 -06:00 |
|
James Betker
|
f2a31702b5
|
Clean stuff up, move more things into arch_util
|
2021-10-20 21:19:25 -06:00 |
|
James Betker
|
83798887a8
|
Mods to support unet diffusion vocoder with conditioning
|
2021-10-13 21:23:18 -06:00 |
|
James Betker
|
33120cb35c
|
Add norming to discretization_loss
|
2021-10-06 17:10:50 -06:00 |
|
James Betker
|
09f373e3b1
|
Add dvae with channel attention
|
2021-10-03 10:52:01 -06:00 |
|
James Betker
|
ac57cdc794
|
Add scheduling to quantizer, enable cudnn_benchmarking to be disabled
|
2021-09-24 17:01:36 -06:00 |
|
James Betker
|
c5297ccec6
|
Add dvae balancing heuristic
|
2021-09-23 21:19:36 -06:00 |
|
James Betker
|
6833048bf7
|
Alterations to diffusion_dvae so it can be used directly on spectrograms
|
2021-09-23 15:56:25 -06:00 |
|
James Betker
|
f78ce9d924
|
Get diffusion_dvae ready for prime time!
|
2021-09-16 22:43:10 -06:00 |
|
James Betker
|
b8f2e0f452
|
mydvae
|
2021-09-06 17:45:30 -06:00 |
|
James Betker
|
dabd87246d
|
Add unet_diffusion_vocoder
|
2021-08-31 14:38:33 -06:00 |
|
James Betker
|
d05cc1f46c
|
Misc
|
2021-08-24 17:12:04 -06:00 |
|
James Betker
|
9dfe936c16
|
Fix ddp for sampler
|
2021-08-19 16:45:34 -06:00 |
|
James Betker
|
570ed327ed
|
Stop dataset - attempt #2
|
2021-08-18 18:29:38 -06:00 |
|
James Betker
|
8332923f5c
|
Two more tools to test the audio segmentor
|
2021-08-17 09:09:11 -06:00 |
|
James Betker
|
1fede41b7b
|
Audio segmentor
|
2021-08-16 22:51:53 -06:00 |
|
James Betker
|
a523c4f932
|
Auto-normalize wav files by data type
|
2021-08-15 09:09:51 -06:00 |
|
James Betker
|
cdee31c60b
|
GPT_ASR
|
2021-08-13 15:02:18 -06:00 |
|
James Betker
|
f5a9b88ef6
|
tacotron cleaners: remove quotation marks
these don't really have relevance for tts or asr
|
2021-08-11 16:18:44 -06:00 |
|
James Betker
|
e19c00398e
|
More improvements to random_mp3_splitter
|
2021-08-09 21:31:12 -06:00 |
|
James Betker
|
04d14b3acc
|
No batch factors for eval
|
2021-08-09 16:02:01 -06:00 |
|
James Betker
|
82fc69abfa
|
Add "pure" evaluator
Which simply computes the training loss against an eval dataset
|
2021-08-09 14:58:35 -06:00 |
|
James Betker
|
b43683b772
|
Add lucidrains_dvae
|
2021-08-06 12:03:46 -06:00 |
|
James Betker
|
d120e1aa99
|
Add audio augmentation to wavfile_dataset, utility to test audio similarity
|
2021-08-05 22:14:49 -06:00 |
|
James Betker
|
c0f61a2e15
|
Rework how DVAE tokens are ordered
It might make more sense to have top tokens, then bottom tokens
with top tokens having different discretized values.
|
2021-08-05 07:07:17 -06:00 |
|
James Betker
|
5037220ac7
|
Mods to support contrastive learning on audio files
|
2021-08-05 05:57:04 -06:00 |
|
James Betker
|
4c98b9703f
|
Get dalle-style TTS to "work"
|
2021-08-03 21:08:27 -06:00 |
|
James Betker
|
2814307eee
|
Alterations to support VQVAE on mel spectrograms
|
2021-08-01 07:54:21 -06:00 |
|
James Betker
|
dadc54795c
|
Add gpt_tts
|
2021-07-27 20:33:30 -06:00 |
|
James Betker
|
96e90e7047
|
Add support for a gaussian-diffusion-based wave tacotron
|
2021-07-26 16:27:31 -06:00 |
|
James Betker
|
97d7cbbc34
|
Additional work for audio xformer (which doesn't really do a great job)
|
2021-07-23 10:58:14 -06:00 |
|
James Betker
|
d81386c1be
|
Mods to support vqvae in audio mode (1d)
|
2021-07-20 08:36:46 -06:00 |
|
James Betker
|
1ff434218e
|
tacotron2, ready for prime time!
|
2021-07-08 22:13:44 -06:00 |
|