James Betker
0152174c0e
Add wandb_step_factor argument
2022-01-27 19:58:58 -07:00
James Betker
e0e36ed98c
Update use_diffuse_tts
2022-01-27 19:57:28 -07:00
James Betker
a77d376ad2
rename unet diffusion tts and add 3
2022-01-27 19:56:24 -07:00
James Betker
7badbf1b4d
update usage scripts
2022-01-25 17:57:26 -07:00
James Betker
8c255811ad
more fixes
2022-01-25 17:57:16 -07:00
James Betker
0f3ca28e39
Allow diffusion model to be trained with masking tokens
2022-01-25 14:26:21 -07:00
James Betker
798ed7730a
i like wasting time
2022-01-24 18:12:08 -07:00
James Betker
fc09cff4b3
angry
2022-01-24 18:09:29 -07:00
James Betker
cc0d9f7216
Fix
2022-01-24 18:05:45 -07:00
James Betker
3a9e3a9db3
consolidate state
2022-01-24 17:59:31 -07:00
James Betker
dfef34ba39
Load ema to cpu memory if specified
2022-01-24 15:08:29 -07:00
James Betker
49edffb6ad
Revise device mapping
2022-01-24 15:08:13 -07:00
James Betker
33511243d5
load model state dicts into the correct device
it's not clear to me that this will make a huge difference, but it's a good idea anyways
2022-01-24 14:40:09 -07:00
James Betker
3e16c509f6
Misc fixes
2022-01-24 14:31:43 -07:00
James Betker
e2ed0adbd8
use_diffuse_tts updates
2022-01-24 14:31:28 -07:00
James Betker
e420df479f
Allow steps to specify which state keys to carry forward (reducing memory utilization)
2022-01-24 11:01:27 -07:00
James Betker
62475005e4
Sort data items in descending order, which I suspect will improve performance because we will hit GC less
2022-01-23 19:05:32 -07:00
James Betker
d18aec793a
Revert "(re) attempt diffusion checkpointing logic"
This reverts commit b22eec8fe3.
2022-01-22 09:14:50 -07:00
James Betker
b22eec8fe3
(re) attempt diffusion checkpointing logic
2022-01-22 08:34:40 -07:00
James Betker
8f48848f91
misc
2022-01-22 08:23:29 -07:00
James Betker
851070075a
text<->cond clip
I need that universal clip..
2022-01-22 08:23:14 -07:00
James Betker
8ada52ccdc
Update LR layers to checkpoint better
2022-01-22 08:22:57 -07:00
James Betker
ce929a6b3f
Allow grad scaler to be enabled even in fp32 mode
2022-01-21 23:13:24 -07:00
James Betker
91b4b240ac
don't pickle unique files
2022-01-21 00:02:06 -07:00
James Betker
7fef7fb9ff
Update fast_paired_dataset to report how many audio files it is actually using
2022-01-20 21:49:38 -07:00
James Betker
ed35cfe393
Update inference scripts
2022-01-20 11:28:50 -07:00
James Betker
20312211e0
Fix bug in code alignment
2022-01-20 11:28:12 -07:00
James Betker
8e2439f50d
Decrease resolution requirements to 2048
2022-01-20 11:27:49 -07:00
James Betker
4af8525dc3
Adjust diffusion vocoder to allow training individual levels
2022-01-19 13:37:59 -07:00
James Betker
ac13bfefe8
use_diffuse_tts
2022-01-19 00:35:24 -07:00
James Betker
bcd8cc51e1
Enable collated data for diffusion purposes
2022-01-19 00:35:08 -07:00
James Betker
dc9cd8c206
Update use_gpt_tts to be usable with unified_voice2
2022-01-18 21:14:17 -07:00
James Betker
7b4544b83a
Add an experimental unet_diffusion_tts to perform experiments on
2022-01-18 08:38:24 -07:00
James Betker
b6190e96b2
fast_paired
2022-01-17 15:46:02 -07:00
James Betker
1d30d79e34
De-specify fast-paired-dataset
2022-01-16 21:20:00 -07:00
James Betker
2b36ca5f8e
Revert paired back
2022-01-16 21:10:46 -07:00
James Betker
ad3e7df086
Split the fast random into its own new dataset
2022-01-16 21:10:11 -07:00
James Betker
7331862755
Updated paired to randomly index data, offsetting memory costs and speeding up initialization
2022-01-16 21:09:22 -07:00
James Betker
37e4e737b5
a few fixes
2022-01-16 15:17:17 -07:00
James Betker
35db5ebf41
paired_voice_audio_dataset - aligned codes support
2022-01-15 17:38:26 -07:00
James Betker
3f177cd2b3
requirements
2022-01-15 17:28:59 -07:00
James Betker
b398ecca01
wer fix
2022-01-15 17:28:17 -07:00
James Betker
9100e7fa9b
Add a diffusion network that takes aligned text instead of MELs
2022-01-15 17:28:02 -07:00
James Betker
87c83e4957
update wer script
2022-01-13 17:08:49 -07:00
James Betker
009a1e8404
Add a new diffusion_vocoder that should be trainable faster
This new one has a "cheating" top layer that does not feed down into the unet encoder
but does consume the outputs of the unet. This cheater operates on only half of the input,
while the rest of the unet operates on the full input. This limits the dimensionality of the last
layer, on the assumption that the last layers consume by far the most computation and memory
but do not require the full input context.
Losses are computed on only half of the aggregate input.
2022-01-11 17:26:07 -07:00
James Betker
d4e27ccf62
misc updates
2022-01-11 16:25:40 -07:00
James Betker
91f28580e2
fix unified_voice
2022-01-10 16:17:31 -07:00
James Betker
136744dc1d
Fixes
2022-01-10 14:32:04 -07:00
James Betker
ee3dfac2ae
unified_voice2: decouple positional embeddings and token embeddings from underlying gpt model
2022-01-10 08:14:41 -07:00
James Betker
f503d8d96b
Partially implement performers in transformer_builders
2022-01-09 22:35:03 -07:00