James Betker
34ee32a90e
get rid of autocasting in tts7
2022-02-24 21:53:51 -07:00
James Betker
f458f5d8f1
abort early if losses reach nan too much, and save the model
2022-02-24 20:55:30 -07:00
James Betker
18dc62453f
Don't step if NaN losses are encountered.
2022-02-24 17:45:08 -07:00
James Betker
ea500ad42a
Use clustered masking in udtts7
2022-02-24 07:57:26 -07:00
James Betker
7c17c8e674
gurgl
2022-02-23 21:28:24 -07:00
James Betker
e6824e398f
Load dvae to cpu
2022-02-23 21:21:45 -07:00
James Betker
81017d9696
put frechet_distance on cuda
2022-02-23 21:21:13 -07:00
James Betker
9a7bbf33df
f
2022-02-23 18:03:38 -07:00
James Betker
68726eac74
.
2022-02-23 17:58:07 -07:00
James Betker
b7319ab518
Support vocoder type diffusion in audio_diffusion_fid
2022-02-23 17:25:16 -07:00
James Betker
58f6c9805b
adf
2022-02-22 23:12:58 -07:00
James Betker
03752c1cd6
Report NaN
2022-02-22 23:09:37 -07:00
James Betker
7201b4500c
default text_to_sequence cleaners
2022-02-21 19:14:22 -07:00
James Betker
ba7f54c162
w2v: new inference function
2022-02-21 19:13:03 -07:00
James Betker
896ac029ae
allow continuation of samples encountered
2022-02-21 19:12:50 -07:00
James Betker
6313a94f96
eval: integrate a n-gram language model into decoding
2022-02-21 19:12:34 -07:00
James Betker
af50afe222
pairedvoice: error out if clip is too short
2022-02-21 19:11:10 -07:00
James Betker
38802a96c8
remove timesteps from cond calculation
2022-02-21 12:32:21 -07:00
James Betker
668876799d
unet_diffusion_tts7
2022-02-20 15:22:38 -07:00
James Betker
0872e17e60
unified_voice mods
2022-02-19 20:37:35 -07:00
James Betker
7b12799370
Reformat mel_text_clip for use in eval
2022-02-19 20:37:26 -07:00
James Betker
bcba65c539
DataParallel Fix
2022-02-19 20:36:35 -07:00
James Betker
34001ad765
et
2022-02-18 18:52:33 -07:00
James Betker
baf7b65566
Attempt to make w2v play with DDP AND checkpointing
2022-02-18 18:47:11 -07:00
James Betker
f3776f1992
reset ctc loss from "mean" to "sum"
2022-02-17 22:00:58 -07:00
James Betker
2b20da679c
make spec_augment a parameter
2022-02-17 20:22:05 -07:00
James Betker
a813fbed9c
Update to evaluator
2022-02-17 17:30:33 -07:00
James Betker
e1d71e1bd5
w2v_wrapper: get rid of ctc attention mask
2022-02-15 20:54:40 -07:00
James Betker
79e8f36d30
Convert CLIP models into new folder
2022-02-15 20:53:07 -07:00
James Betker
8f767b8b4f
...
2022-02-15 07:08:17 -07:00
James Betker
29e07913a8
Fix
2022-02-15 06:58:11 -07:00
James Betker
dd585df772
LAMB optimizer
2022-02-15 06:48:13 -07:00
James Betker
2bdb515068
A few mods to make wav2vec2 trainable with DDP on DLAS
2022-02-15 06:28:54 -07:00
James Betker
52b61b9f77
Update scripts and attempt to figure out how UnifiedVoice could be used to produce CTC codes
2022-02-13 20:48:06 -07:00
James Betker
a4f1641eea
Add & refine WER evaluator for w2v
2022-02-13 20:47:29 -07:00
James Betker
e16af944c0
BSO fix
2022-02-12 20:01:04 -07:00
James Betker
29534180b2
w2v fine tuner
2022-02-12 20:00:59 -07:00
James Betker
0c3cc5ebad
use script updates to fix output size disparities
2022-02-12 20:00:46 -07:00
James Betker
15fd60aad3
Allow EMA training to be disabled
2022-02-12 20:00:23 -07:00
James Betker
3252972057
ctc_code_gen mods
2022-02-12 19:59:54 -07:00
James Betker
35170c77b3
fix sweep
2022-02-11 11:43:11 -07:00
James Betker
c6b6d120fe
fix ranking
2022-02-11 11:34:57 -07:00
James Betker
095944569c
deep_update dicts
2022-02-11 11:32:25 -07:00
James Betker
ab1f6e8ac6
deepcopy map
2022-02-11 11:29:32 -07:00
James Betker
496fb81997
use fork instead
2022-02-11 11:22:25 -07:00
James Betker
4abc094b47
fix train bug
2022-02-11 11:18:15 -07:00
James Betker
006add64c5
sweep fix
2022-02-11 11:17:08 -07:00
James Betker
102142d1eb
f
2022-02-11 11:05:13 -07:00
James Betker
40b08a52d0
dafuk
2022-02-11 11:01:31 -07:00
James Betker
f6a7f12cad
Remove broken evaluator
2022-02-11 11:00:29 -07:00
James Betker
46b97049dc
Fix eval
2022-02-11 10:59:32 -07:00
James Betker
5175b7d91a
training sweeper checkin
2022-02-11 10:46:37 -07:00
James Betker
302ac8652d
Undo mask during training
2022-02-11 09:35:12 -07:00
James Betker
618a20412a
new rev of ctc_code_gen with surrogate LM loss
2022-02-10 23:09:57 -07:00
James Betker
d1d1ae32a1
audio diffusion frechet distance measurement!
2022-02-10 22:55:46 -07:00
James Betker
23a310b488
Fix BSO
2022-02-10 20:54:51 -07:00
James Betker
1e28e02f98
BSO improvement to make it work with distributed optimizers
2022-02-10 09:53:13 -07:00
James Betker
836eb08afb
Update BSO to use the proper step size
2022-02-10 09:44:15 -07:00
James Betker
820a29f81e
ctc code gen mods
2022-02-10 09:44:01 -07:00
James Betker
ac9417b956
ctc_code_gen: mask out all padding tokens
2022-02-09 17:26:30 -07:00
James Betker
a930f2576e
Begin a migration to specifying training rate on megasamples instead of arbitrary "steps"
...
This should help me greatly in tuning models. It's also necessary now that batch size isn't really
respected; we simply step once the gradient direction becomes unstable.
2022-02-09 17:25:05 -07:00
James Betker
93ca619267
script updates
2022-02-09 14:26:52 -07:00
James Betker
ddb77ef502
ctc_code_gen: use a mean() on the ConditioningEncoder
2022-02-09 14:26:44 -07:00
James Betker
3d946356f8
batch_size_optimizer works. sweet! no more tuning batch sizes.
2022-02-09 14:26:23 -07:00
James Betker
18938248e4
Add batch_size_optimizer support
2022-02-08 23:51:31 -07:00
James Betker
9e9ae328f2
mild updates
2022-02-08 23:51:17 -07:00
James Betker
ff35d13b99
Use non-uniform noise in diffusion_tts6
2022-02-08 07:27:41 -07:00
James Betker
f44b064c5e
Update scripts
2022-02-07 19:43:18 -07:00
James Betker
34fbb78671
Straight CtcCodeGenerator as an encoder
2022-02-07 15:46:46 -07:00
James Betker
c24682c668
Record load times in fast_paired_dataset
2022-02-07 15:45:38 -07:00
James Betker
65a546c4d7
Fix for tts6
2022-02-05 16:00:14 -07:00
James Betker
5ae816bead
ctc gen checkin
2022-02-05 15:59:53 -07:00
James Betker
bb3d1ab03d
More cleanup
2022-02-04 11:06:17 -07:00
James Betker
5cc342de66
Clean up
2022-02-04 11:00:42 -07:00
James Betker
8fb147e8ab
add an autoregressive ctc code generator
2022-02-04 11:00:15 -07:00
James Betker
7f4fc55344
Update SR model
2022-02-03 21:42:53 -07:00
James Betker
de1a1d501a
Move audio injectors into their own file
2022-02-03 21:42:37 -07:00
James Betker
687393de59
Add a better split_on_silence (processing_pipeline)
...
Going to extend this a bit more going forwards to support the entire pipeline.
2022-02-03 20:00:26 -07:00
James Betker
1d29999648
Uupdates to the TTS production scripts
2022-02-03 20:00:01 -07:00
James Betker
bc506d4bcd
Mods to unet_diffusion_tts6 to support super resolution mode
2022-02-03 19:59:39 -07:00
James Betker
4249681c4b
Mods to support a autoregressive CTC code generator
2022-02-03 19:58:54 -07:00
James Betker
8132766d38
tts6
2022-01-31 20:15:06 -07:00
James Betker
fbea6e8eac
Adjustments to diffusion networks
2022-01-30 16:14:06 -07:00
James Betker
e58dab14c3
new diffusion updates from testing
2022-01-29 11:01:01 -07:00
James Betker
935a4e853e
get rid of nil tokens in <2>
2022-01-27 22:45:57 -07:00
James Betker
0152174c0e
Add wandb_step_factor argument
2022-01-27 19:58:58 -07:00
James Betker
e0e36ed98c
Update use_diffuse_tts
2022-01-27 19:57:28 -07:00
James Betker
a77d376ad2
rename unet diffusion tts and add 3
2022-01-27 19:56:24 -07:00
James Betker
7badbf1b4d
update usage scripts
2022-01-25 17:57:26 -07:00
James Betker
8c255811ad
more fixes
2022-01-25 17:57:16 -07:00
James Betker
0f3ca28e39
Allow diffusion model to be trained with masking tokens
2022-01-25 14:26:21 -07:00
James Betker
798ed7730a
i like wasting time
2022-01-24 18:12:08 -07:00
James Betker
fc09cff4b3
angry
2022-01-24 18:09:29 -07:00
James Betker
cc0d9f7216
Fix
2022-01-24 18:05:45 -07:00
James Betker
3a9e3a9db3
consolidate state
2022-01-24 17:59:31 -07:00
James Betker
dfef34ba39
Load ema to cpu memory if specified
2022-01-24 15:08:29 -07:00
James Betker
49edffb6ad
Revise device mapping
2022-01-24 15:08:13 -07:00
James Betker
33511243d5
load model state dicts into the correct device
...
it's not clear to me that this will make a huge difference, but it's a good idea anyways
2022-01-24 14:40:09 -07:00
James Betker
3e16c509f6
Misc fixes
2022-01-24 14:31:43 -07:00
James Betker
e2ed0adbd8
use_diffuse_tts updates
2022-01-24 14:31:28 -07:00
James Betker
e420df479f
Allow steps to specify which state keys to carry forward (reducing memory utilization)
2022-01-24 11:01:27 -07:00
James Betker
62475005e4
Sort data items in descending order, which I suspect will improve performance because we will hit GC less
2022-01-23 19:05:32 -07:00
James Betker
d18aec793a
Revert "(re) attempt diffusion checkpointing logic"
...
This reverts commit b22eec8fe3
.
2022-01-22 09:14:50 -07:00
James Betker
b22eec8fe3
(re) attempt diffusion checkpointing logic
2022-01-22 08:34:40 -07:00
James Betker
8f48848f91
misc
2022-01-22 08:23:29 -07:00
James Betker
851070075a
text<->cond clip
...
I need that universal clip..
2022-01-22 08:23:14 -07:00
James Betker
8ada52ccdc
Update LR layers to checkpoint better
2022-01-22 08:22:57 -07:00
James Betker
ce929a6b3f
Allow grad scaler to be enabled even in fp32 mode
2022-01-21 23:13:24 -07:00
James Betker
91b4b240ac
dont pickle unique files
2022-01-21 00:02:06 -07:00
James Betker
7fef7fb9ff
Update fast_paired_dataset to report how many audio files it is actually using
2022-01-20 21:49:38 -07:00
James Betker
ed35cfe393
Update inference scripts
2022-01-20 11:28:50 -07:00
James Betker
20312211e0
Fix bug in code alignment
2022-01-20 11:28:12 -07:00
James Betker
8e2439f50d
Decrease resolution requirements to 2048
2022-01-20 11:27:49 -07:00
James Betker
4af8525dc3
Adjust diffusion vocoder to allow training individual levels
2022-01-19 13:37:59 -07:00
James Betker
ac13bfefe8
use_diffuse_tts
2022-01-19 00:35:24 -07:00
James Betker
bcd8cc51e1
Enable collated data for diffusion purposes
2022-01-19 00:35:08 -07:00
James Betker
dc9cd8c206
Update use_gpt_tts to be usable with unified_voice2
2022-01-18 21:14:17 -07:00
James Betker
7b4544b83a
Add an experimental unet_diffusion_tts to perform experiments on
2022-01-18 08:38:24 -07:00
James Betker
b6190e96b2
fast_paired
2022-01-17 15:46:02 -07:00
James Betker
1d30d79e34
De-specify fast-paired-dataset
2022-01-16 21:20:00 -07:00
James Betker
2b36ca5f8e
Revert paired back
2022-01-16 21:10:46 -07:00
James Betker
ad3e7df086
Split the fast random into its own new dataset
2022-01-16 21:10:11 -07:00
James Betker
7331862755
Updated paired to randomly index data, offsetting memory costs and speeding up initialization
2022-01-16 21:09:22 -07:00
James Betker
37e4e737b5
a few fixes
2022-01-16 15:17:17 -07:00
James Betker
35db5ebf41
paired_voice_audio_dataset - aligned codes support
2022-01-15 17:38:26 -07:00
James Betker
3f177cd2b3
requirements
2022-01-15 17:28:59 -07:00
James Betker
b398ecca01
wer fix
2022-01-15 17:28:17 -07:00
James Betker
9100e7fa9b
Add a diffusion network that takes aligned text instead of MELs
2022-01-15 17:28:02 -07:00
James Betker
87c83e4957
update wer script
2022-01-13 17:08:49 -07:00
James Betker
009a1e8404
Add a new diffusion_vocoder that should be trainable faster
...
This new one has a "cheating" top layer, that does not feed down into the unet encoder,
but does consume the outputs of the unet. This cheater only operates on half of the input,
while the rest of the unet operates on the full input. This limits the dimensionality of this last
layer, on the assumption that these last layers consume by far the most computation and memory,
but do not require the full input context.
Losses are only computed on half of the aggregate input.
2022-01-11 17:26:07 -07:00
James Betker
d4e27ccf62
misc updates
2022-01-11 16:25:40 -07:00
James Betker
91f28580e2
fix unified_voice
2022-01-10 16:17:31 -07:00
James Betker
136744dc1d
Fixes
2022-01-10 14:32:04 -07:00
James Betker
ee3dfac2ae
unified_voice2: decouple positional embeddings and token embeddings from underlying gpt model
2022-01-10 08:14:41 -07:00
James Betker
f503d8d96b
Partially implement performers in transformer_builders
2022-01-09 22:35:03 -07:00
James Betker
ec456b6733
Revert unified_voice back to beginning
...
I'll be doing my work within unified_voice2
2022-01-09 22:34:30 -07:00
James Betker
432073c5ca
Make performer code functional
2022-01-09 22:32:50 -07:00
James Betker
f474a7ac65
unified_voice2
2022-01-09 22:32:34 -07:00
James Betker
c075fe72e2
import performer repo
2022-01-09 22:10:07 -07:00
James Betker
7de3874f15
Make dalle transformer checkpointable
2022-01-09 19:14:35 -07:00
James Betker
70b17da193
Alter unified_voice to use extensible transformer (still WIP)
2022-01-08 22:18:25 -07:00
James Betker
15d9517e26
Allow bi-directional clipping
2022-01-08 22:18:04 -07:00
James Betker
894d245062
More zero_grad fixes
2022-01-08 20:31:19 -07:00
James Betker
8bade38180
Add generic CLIP model based off of x_clip
2022-01-08 19:08:01 -07:00
James Betker
2a9a25e6e7
Fix likely defective nan grad recovery
2022-01-08 18:24:58 -07:00
James Betker
438dd9ed33
fix text-voice-clip bug
2022-01-08 08:55:00 -07:00
James Betker
34774f9948
unified_voice: begin decoupling from HF GPT
...
I'd like to try some different (newer) transformer variants. The way to get
there is softly decoupling the transformer portion of this architecture
from GPT. This actually should be fairly easy.
2022-01-07 22:51:24 -07:00
James Betker
1f6a5310b8
More fixes to use_gpt_tts
2022-01-07 22:30:55 -07:00
James Betker
68090ac3e9
Finish up the text->voice clip model
2022-01-07 22:28:45 -07:00
James Betker
65ffe38fce
misc
2022-01-06 22:16:17 -07:00