James Betker
0872e17e60
unified_voice mods
2022-02-19 20:37:35 -07:00
James Betker
7b12799370
Reformat mel_text_clip for use in eval
2022-02-19 20:37:26 -07:00
James Betker
baf7b65566
Attempt to make w2v play with DDP AND checkpointing
2022-02-18 18:47:11 -07:00
James Betker
f3776f1992
reset ctc loss from "mean" to "sum"
2022-02-17 22:00:58 -07:00
James Betker
2b20da679c
make spec_augment a parameter
2022-02-17 20:22:05 -07:00
James Betker
e1d71e1bd5
w2v_wrapper: get rid of ctc attention mask
2022-02-15 20:54:40 -07:00
James Betker
79e8f36d30
Convert CLIP models into new folder
2022-02-15 20:53:07 -07:00
James Betker
2bdb515068
A few mods to make wav2vec2 trainable with DDP on DLAS
2022-02-15 06:28:54 -07:00
James Betker
52b61b9f77
Update scripts and attempt to figure out how UnifiedVoice could be used to produce CTC codes
2022-02-13 20:48:06 -07:00
James Betker
a4f1641eea
Add & refine WER evaluator for w2v
2022-02-13 20:47:29 -07:00
James Betker
29534180b2
w2v fine tuner
2022-02-12 20:00:59 -07:00
James Betker
3252972057
ctc_code_gen mods
2022-02-12 19:59:54 -07:00
James Betker
302ac8652d
Undo mask during training
2022-02-11 09:35:12 -07:00
James Betker
618a20412a
new rev of ctc_code_gen with surrogate LM loss
2022-02-10 23:09:57 -07:00
James Betker
820a29f81e
ctc code gen mods
2022-02-10 09:44:01 -07:00
James Betker
ac9417b956
ctc_code_gen: mask out all padding tokens
2022-02-09 17:26:30 -07:00
James Betker
ddb77ef502
ctc_code_gen: use a mean() on the ConditioningEncoder
2022-02-09 14:26:44 -07:00
James Betker
9e9ae328f2
mild updates
2022-02-08 23:51:17 -07:00
James Betker
ff35d13b99
Use non-uniform noise in diffusion_tts6
2022-02-08 07:27:41 -07:00
James Betker
34fbb78671
Straight CtcCodeGenerator as an encoder
2022-02-07 15:46:46 -07:00
James Betker
65a546c4d7
Fix for tts6
2022-02-05 16:00:14 -07:00
James Betker
5ae816bead
ctc gen checkin
2022-02-05 15:59:53 -07:00
James Betker
bb3d1ab03d
More cleanup
2022-02-04 11:06:17 -07:00
James Betker
5cc342de66
Clean up
2022-02-04 11:00:42 -07:00
James Betker
8fb147e8ab
add an autoregressive ctc code generator
2022-02-04 11:00:15 -07:00
James Betker
7f4fc55344
Update SR model
2022-02-03 21:42:53 -07:00
James Betker
bc506d4bcd
Mods to unet_diffusion_tts6 to support super resolution mode
2022-02-03 19:59:39 -07:00
James Betker
4249681c4b
Mods to support an autoregressive CTC code generator
2022-02-03 19:58:54 -07:00
James Betker
8132766d38
tts6
2022-01-31 20:15:06 -07:00
James Betker
fbea6e8eac
Adjustments to diffusion networks
2022-01-30 16:14:06 -07:00
James Betker
e58dab14c3
new diffusion updates from testing
2022-01-29 11:01:01 -07:00
James Betker
935a4e853e
get rid of nil tokens in <2>
2022-01-27 22:45:57 -07:00
James Betker
a77d376ad2
rename unet diffusion tts and add 3
2022-01-27 19:56:24 -07:00
James Betker
8c255811ad
more fixes
2022-01-25 17:57:16 -07:00
James Betker
0f3ca28e39
Allow diffusion model to be trained with masking tokens
2022-01-25 14:26:21 -07:00
James Betker
d18aec793a
Revert "(re) attempt diffusion checkpointing logic"
This reverts commit b22eec8fe3.
2022-01-22 09:14:50 -07:00
James Betker
b22eec8fe3
(re) attempt diffusion checkpointing logic
2022-01-22 08:34:40 -07:00
James Betker
8f48848f91
misc
2022-01-22 08:23:29 -07:00
James Betker
851070075a
text<->cond clip
I need that universal clip...
2022-01-22 08:23:14 -07:00
James Betker
8ada52ccdc
Update LR layers to checkpoint better
2022-01-22 08:22:57 -07:00
James Betker
8e2439f50d
Decrease resolution requirements to 2048
2022-01-20 11:27:49 -07:00
James Betker
4af8525dc3
Adjust diffusion vocoder to allow training individual levels
2022-01-19 13:37:59 -07:00
James Betker
ac13bfefe8
use_diffuse_tts
2022-01-19 00:35:24 -07:00
James Betker
bcd8cc51e1
Enable collated data for diffusion purposes
2022-01-19 00:35:08 -07:00
James Betker
dc9cd8c206
Update use_gpt_tts to be usable with unified_voice2
2022-01-18 21:14:17 -07:00
James Betker
7b4544b83a
Add an experimental unet_diffusion_tts to perform experiments on
2022-01-18 08:38:24 -07:00
James Betker
37e4e737b5
a few fixes
2022-01-16 15:17:17 -07:00
James Betker
9100e7fa9b
Add a diffusion network that takes aligned text instead of MELs
2022-01-15 17:28:02 -07:00
James Betker
009a1e8404
Add a new diffusion_vocoder that should be trainable faster
This new one has a "cheating" top layer that does not feed down into the unet encoder
but does consume the outputs of the unet. This cheater only operates on half of the input,
while the rest of the unet operates on the full input. This limits the dimensionality of the last
layer, on the assumption that the last layers consume by far the most computation and memory
but do not require the full input context.
Losses are only computed on half of the aggregate input.
2022-01-11 17:26:07 -07:00
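A minimal pure-Python sketch of the "cheating" top layer described in commit 009a1e8404, assuming the half-size window is picked at random on each forward pass (the commit message does not say how the half is chosen). `cheater_crop`, `forward_with_cheater`, `loss_on_half` and the toy `unet`/`cheater` callables are hypothetical stand-ins, not names from the repository, and plain float lists replace the real tensors.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch: names and the random-window choice are assumptions,
# not taken from the diffusion_vocoder code.

Window = Tuple[int, int]


def cheater_crop(length: int, rng: random.Random) -> Window:
    """Pick a random contiguous window covering half of `length` positions."""
    if length < 2:
        raise ValueError("need at least 2 positions to take a non-empty half")
    half = length // 2
    start = rng.randint(0, length - half)  # inclusive bound keeps end <= length
    return start, start + half


def forward_with_cheater(
    x: Sequence[float],
    unet: Callable[[Sequence[float]], List[float]],
    cheater: Callable[[Sequence[float]], List[float]],
    rng: random.Random,
) -> Tuple[List[float], Window]:
    """Run the unet on the full input, then the top layer on half of its output."""
    features = unet(x)  # full input context; the cheater never feeds back into it
    start, end = cheater_crop(len(features), rng)
    out = cheater(features[start:end])  # the expensive top layer sees only half
    return out, (start, end)


def loss_on_half(out: Sequence[float], target: Sequence[float], window: Window) -> float:
    """Mean squared error computed only over the cropped half of the target."""
    start, end = window
    tgt = target[start:end]
    if len(out) != len(tgt):
        raise ValueError("prediction and target window lengths differ")
    return sum((o - t) ** 2 for o, t in zip(out, tgt)) / len(tgt)
```

With identity functions for both `unet` and `cheater`, the loss on the returned window is zero; in a real vocoder they would be the unet body and its final layer.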
James Betker
91f28580e2
fix unified_voice
2022-01-10 16:17:31 -07:00