James Betker
2b20da679c
make spec_augment a parameter
2022-02-17 20:22:05 -07:00
James Betker
a813fbed9c
Update to evaluator
2022-02-17 17:30:33 -07:00
James Betker
e1d71e1bd5
w2v_wrapper: get rid of ctc attention mask
2022-02-15 20:54:40 -07:00
James Betker
79e8f36d30
Convert CLIP models into new folder
2022-02-15 20:53:07 -07:00
James Betker
8f767b8b4f
...
2022-02-15 07:08:17 -07:00
James Betker
29e07913a8
Fix
2022-02-15 06:58:11 -07:00
James Betker
dd585df772
LAMB optimizer
2022-02-15 06:48:13 -07:00
James Betker
2bdb515068
A few mods to make wav2vec2 trainable with DDP on DLAS
2022-02-15 06:28:54 -07:00
James Betker
52b61b9f77
Update scripts and attempt to figure out how UnifiedVoice could be used to produce CTC codes
2022-02-13 20:48:06 -07:00
James Betker
a4f1641eea
Add & refine WER evaluator for w2v
2022-02-13 20:47:29 -07:00
James Betker
e16af944c0
BSO fix
2022-02-12 20:01:04 -07:00
James Betker
29534180b2
w2v fine tuner
2022-02-12 20:00:59 -07:00
James Betker
0c3cc5ebad
use script updates to fix output size disparities
2022-02-12 20:00:46 -07:00
James Betker
15fd60aad3
Allow EMA training to be disabled
2022-02-12 20:00:23 -07:00
James Betker
3252972057
ctc_code_gen mods
2022-02-12 19:59:54 -07:00
James Betker
35170c77b3
fix sweep
2022-02-11 11:43:11 -07:00
James Betker
c6b6d120fe
fix ranking
2022-02-11 11:34:57 -07:00
James Betker
095944569c
deep_update dicts
2022-02-11 11:32:25 -07:00
James Betker
ab1f6e8ac6
deepcopy map
2022-02-11 11:29:32 -07:00
James Betker
496fb81997
use fork instead
2022-02-11 11:22:25 -07:00
James Betker
4abc094b47
fix train bug
2022-02-11 11:18:15 -07:00
James Betker
006add64c5
sweep fix
2022-02-11 11:17:08 -07:00
James Betker
102142d1eb
f
2022-02-11 11:05:13 -07:00
James Betker
40b08a52d0
dafuk
2022-02-11 11:01:31 -07:00
James Betker
f6a7f12cad
Remove broken evaluator
2022-02-11 11:00:29 -07:00
James Betker
46b97049dc
Fix eval
2022-02-11 10:59:32 -07:00
James Betker
5175b7d91a
training sweeper checkin
2022-02-11 10:46:37 -07:00
James Betker
302ac8652d
Undo mask during training
2022-02-11 09:35:12 -07:00
James Betker
618a20412a
new rev of ctc_code_gen with surrogate LM loss
2022-02-10 23:09:57 -07:00
James Betker
d1d1ae32a1
audio diffusion frechet distance measurement!
2022-02-10 22:55:46 -07:00
James Betker
23a310b488
Fix BSO
2022-02-10 20:54:51 -07:00
James Betker
1e28e02f98
BSO improvement to make it work with distributed optimizers
2022-02-10 09:53:13 -07:00
James Betker
836eb08afb
Update BSO to use the proper step size
2022-02-10 09:44:15 -07:00
James Betker
820a29f81e
ctc code gen mods
2022-02-10 09:44:01 -07:00
James Betker
ac9417b956
ctc_code_gen: mask out all padding tokens
2022-02-09 17:26:30 -07:00
James Betker
a930f2576e
Begin a migration to specifying training rate on megasamples instead of arbitrary "steps"
...
This should help me greatly in tuning models. It's also necessary now that batch size isn't really
respected; we simply step once the gradient direction becomes unstable.
2022-02-09 17:25:05 -07:00
James Betker
93ca619267
script updates
2022-02-09 14:26:52 -07:00
James Betker
ddb77ef502
ctc_code_gen: use a mean() on the ConditioningEncoder
2022-02-09 14:26:44 -07:00
James Betker
3d946356f8
batch_size_optimizer works. sweet! no more tuning batch sizes.
2022-02-09 14:26:23 -07:00
James Betker
18938248e4
Add batch_size_optimizer support
2022-02-08 23:51:31 -07:00
James Betker
9e9ae328f2
mild updates
2022-02-08 23:51:17 -07:00
James Betker
ff35d13b99
Use non-uniform noise in diffusion_tts6
2022-02-08 07:27:41 -07:00
James Betker
f44b064c5e
Update scripts
2022-02-07 19:43:18 -07:00
James Betker
34fbb78671
Straight CtcCodeGenerator as an encoder
2022-02-07 15:46:46 -07:00
James Betker
c24682c668
Record load times in fast_paired_dataset
2022-02-07 15:45:38 -07:00
James Betker
65a546c4d7
Fix for tts6
2022-02-05 16:00:14 -07:00
James Betker
5ae816bead
ctc gen checkin
2022-02-05 15:59:53 -07:00
James Betker
bb3d1ab03d
More cleanup
2022-02-04 11:06:17 -07:00
James Betker
5cc342de66
Clean up
2022-02-04 11:00:42 -07:00
James Betker
8fb147e8ab
add an autoregressive ctc code generator
2022-02-04 11:00:15 -07:00
James Betker
7f4fc55344
Update SR model
2022-02-03 21:42:53 -07:00
James Betker
de1a1d501a
Move audio injectors into their own file
2022-02-03 21:42:37 -07:00
James Betker
687393de59
Add a better split_on_silence (processing_pipeline)
...
Going to extend this a bit more going forwards to support the entire pipeline.
2022-02-03 20:00:26 -07:00
James Betker
1d29999648
Uupdates to the TTS production scripts
2022-02-03 20:00:01 -07:00
James Betker
bc506d4bcd
Mods to unet_diffusion_tts6 to support super resolution mode
2022-02-03 19:59:39 -07:00
James Betker
4249681c4b
Mods to support a autoregressive CTC code generator
2022-02-03 19:58:54 -07:00
James Betker
8132766d38
tts6
2022-01-31 20:15:06 -07:00
James Betker
fbea6e8eac
Adjustments to diffusion networks
2022-01-30 16:14:06 -07:00
James Betker
e58dab14c3
new diffusion updates from testing
2022-01-29 11:01:01 -07:00
James Betker
935a4e853e
get rid of nil tokens in <2>
2022-01-27 22:45:57 -07:00
James Betker
0152174c0e
Add wandb_step_factor argument
2022-01-27 19:58:58 -07:00
James Betker
e0e36ed98c
Update use_diffuse_tts
2022-01-27 19:57:28 -07:00
James Betker
a77d376ad2
rename unet diffusion tts and add 3
2022-01-27 19:56:24 -07:00
James Betker
7badbf1b4d
update usage scripts
2022-01-25 17:57:26 -07:00
James Betker
8c255811ad
more fixes
2022-01-25 17:57:16 -07:00
James Betker
0f3ca28e39
Allow diffusion model to be trained with masking tokens
2022-01-25 14:26:21 -07:00
James Betker
798ed7730a
i like wasting time
2022-01-24 18:12:08 -07:00
James Betker
fc09cff4b3
angry
2022-01-24 18:09:29 -07:00
James Betker
cc0d9f7216
Fix
2022-01-24 18:05:45 -07:00
James Betker
3a9e3a9db3
consolidate state
2022-01-24 17:59:31 -07:00
James Betker
dfef34ba39
Load ema to cpu memory if specified
2022-01-24 15:08:29 -07:00
James Betker
49edffb6ad
Revise device mapping
2022-01-24 15:08:13 -07:00
James Betker
33511243d5
load model state dicts into the correct device
...
it's not clear to me that this will make a huge difference, but it's a good idea anyways
2022-01-24 14:40:09 -07:00
James Betker
3e16c509f6
Misc fixes
2022-01-24 14:31:43 -07:00
James Betker
e2ed0adbd8
use_diffuse_tts updates
2022-01-24 14:31:28 -07:00
James Betker
e420df479f
Allow steps to specify which state keys to carry forward (reducing memory utilization)
2022-01-24 11:01:27 -07:00
James Betker
62475005e4
Sort data items in descending order, which I suspect will improve performance because we will hit GC less
2022-01-23 19:05:32 -07:00
James Betker
d18aec793a
Revert "(re) attempt diffusion checkpointing logic"
...
This reverts commit b22eec8fe3
.
2022-01-22 09:14:50 -07:00
James Betker
b22eec8fe3
(re) attempt diffusion checkpointing logic
2022-01-22 08:34:40 -07:00
James Betker
8f48848f91
misc
2022-01-22 08:23:29 -07:00
James Betker
851070075a
text<->cond clip
...
I need that universal clip..
2022-01-22 08:23:14 -07:00
James Betker
8ada52ccdc
Update LR layers to checkpoint better
2022-01-22 08:22:57 -07:00
James Betker
ce929a6b3f
Allow grad scaler to be enabled even in fp32 mode
2022-01-21 23:13:24 -07:00
James Betker
91b4b240ac
dont pickle unique files
2022-01-21 00:02:06 -07:00
James Betker
7fef7fb9ff
Update fast_paired_dataset to report how many audio files it is actually using
2022-01-20 21:49:38 -07:00
James Betker
ed35cfe393
Update inference scripts
2022-01-20 11:28:50 -07:00
James Betker
20312211e0
Fix bug in code alignment
2022-01-20 11:28:12 -07:00
James Betker
8e2439f50d
Decrease resolution requirements to 2048
2022-01-20 11:27:49 -07:00
James Betker
4af8525dc3
Adjust diffusion vocoder to allow training individual levels
2022-01-19 13:37:59 -07:00
James Betker
ac13bfefe8
use_diffuse_tts
2022-01-19 00:35:24 -07:00
James Betker
bcd8cc51e1
Enable collated data for diffusion purposes
2022-01-19 00:35:08 -07:00
James Betker
dc9cd8c206
Update use_gpt_tts to be usable with unified_voice2
2022-01-18 21:14:17 -07:00
James Betker
7b4544b83a
Add an experimental unet_diffusion_tts to perform experiments on
2022-01-18 08:38:24 -07:00
James Betker
b6190e96b2
fast_paired
2022-01-17 15:46:02 -07:00
James Betker
1d30d79e34
De-specify fast-paired-dataset
2022-01-16 21:20:00 -07:00
James Betker
2b36ca5f8e
Revert paired back
2022-01-16 21:10:46 -07:00
James Betker
ad3e7df086
Split the fast random into its own new dataset
2022-01-16 21:10:11 -07:00
James Betker
7331862755
Updated paired to randomly index data, offsetting memory costs and speeding up initialization
2022-01-16 21:09:22 -07:00
James Betker
37e4e737b5
a few fixes
2022-01-16 15:17:17 -07:00
James Betker
35db5ebf41
paired_voice_audio_dataset - aligned codes support
2022-01-15 17:38:26 -07:00