James Betker
f95d3d2b82
move waveglow to audio/vocoders
2022-03-15 11:03:07 -06:00
James Betker
0419a64107
misc
2022-03-15 10:36:34 -06:00
James Betker
bb03cbb9fc
composable initial checkin
2022-03-15 10:35:40 -06:00
James Betker
86b0d76fb9
tts8 (incomplete, may be removed)
2022-03-15 10:35:31 -06:00
James Betker
eecbc0e678
Use wider spectrogram when asked
2022-03-15 10:35:11 -06:00
James Betker
9767260c6c
tacotron stft - loosen bounds restrictions and clip
2022-03-15 10:31:26 -06:00
James Betker
f8631ad4f7
Updates to support inputting MELs into the conditioning encoder
2022-03-14 17:31:42 -06:00
James Betker
e045fb0ad7
fix clip grad norm with scaler
2022-03-13 16:28:23 -06:00
James Betker
22c67ce8d3
tts9 mods
2022-03-13 10:25:55 -06:00
James Betker
08599b4c75
fix random_audio_crop injector
2022-03-12 20:42:29 -07:00
James Betker
8f130e2b3f
add scale_shift_norm back to tts9
2022-03-12 20:42:13 -07:00
James Betker
9bbbe26012
update audio_with_noise
2022-03-12 20:41:47 -07:00
James Betker
e754c4fbbc
sweep update
2022-03-12 15:33:00 -07:00
James Betker
73bfd4a86d
another tts9 update
2022-03-12 15:17:06 -07:00
James Betker
0523777ff7
add efficient config to tts9
2022-03-12 15:10:35 -07:00
James Betker
896accb71f
data and prep improvements
2022-03-12 15:10:11 -07:00
James Betker
1e87b934db
potentially average conditioning inputs
2022-03-10 20:37:41 -07:00
James Betker
e6a95f7c11
Update tts9: Remove torchscript provisions and add mechanism to train solely on codes
2022-03-09 09:43:38 -07:00
James Betker
726e30c4f7
Update noise augmentation dataset to include voices that are appended at the end of another clip.
2022-03-09 09:43:10 -07:00
James Betker
c4e4cf91a0
add support for the original vocoder to audio_diffusion_fid; also add a new "intelligibility" metric
2022-03-08 15:53:27 -07:00
James Betker
3e5da71b16
add grad scaler scale to metrics
2022-03-08 15:52:42 -07:00
James Betker
d2bdeb6f20
misc audio support
2022-03-08 15:52:26 -07:00
James Betker
d553808d24
misc
2022-03-08 15:52:16 -07:00
James Betker
7dabc17626
phase2 filter initial commit
2022-03-08 15:51:55 -07:00
James Betker
f56edb2122
minicoder with classifier head: spread out probability mass for 0 predictions
2022-03-08 15:51:31 -07:00
James Betker
29b2921222
move diffusion vocoder
2022-03-08 15:51:05 -07:00
James Betker
94222b0216
tts9 initial commit
2022-03-08 15:50:45 -07:00
James Betker
38fd9fc985
Improve efficiency of audio_with_noise_dataset
2022-03-08 15:50:13 -07:00
James Betker
b3def182de
move processing pipeline to "phase_1"
2022-03-08 15:49:51 -07:00
James Betker
30ddac69aa
lots of bad entries
2022-03-05 23:15:59 -07:00
James Betker
dcf98df0c2
++
2022-03-05 23:12:34 -07:00
James Betker
64d764ccd7
fml
2022-03-05 23:11:10 -07:00
James Betker
ef63ff84e2
pvd2
2022-03-05 23:08:39 -07:00
James Betker
1a05712764
pvd
2022-03-05 23:05:29 -07:00
James Betker
d1dc8dbb35
Support tts9
2022-03-05 20:14:36 -07:00
James Betker
93a3302819
Push training_state data to CPU memory before saving it
...
For whatever reason, keeping this on GPU memory just doesn't work.
When you load it, it consumes a large amount of GPU memory and that
utilization doesn't go away. Saving to CPU should fix this.
2022-03-04 17:57:33 -07:00
James Betker
6000580e2e
df
2022-03-04 13:47:00 -07:00
James Betker
382681a35d
Load diffusion_fid DVAE into the correct cuda device
2022-03-04 13:42:14 -07:00
James Betker
e1052a5e32
Move log consensus to train for efficiency
2022-03-04 13:41:32 -07:00
James Betker
ce6dfdf255
Distributed "fixes"
2022-03-04 12:46:41 -07:00
James Betker
3ff878ae85
Accumulate loss & grad_norm metrics from all entities within a distributed graph
2022-03-04 12:01:16 -07:00
James Betker
79e5692388
Fix distributed bug
2022-03-04 11:58:53 -07:00
James Betker
f87e10ffef
Make deterministic sampler work with distributed training & microbatches
2022-03-04 11:50:50 -07:00
James Betker
77c18b53b3
Cap grad booster
2022-03-04 10:40:24 -07:00
James Betker
2d1cb83c1d
Add a deterministic timestep sampler, with provisions to employ it every n steps
2022-03-04 10:40:14 -07:00
James Betker
f490eaeba7
Shuffle optimizer states back and forth between cpu memory during steps
2022-03-04 10:38:51 -07:00
James Betker
3c242403f5
adjust location of pre-optimizer step so I can visualize the new grad norms
2022-03-04 08:56:42 -07:00
James Betker
58019a2ce3
audio diffusion fid updates
2022-03-03 21:53:32 -07:00
James Betker
998c53ad4f
w2v_matcher mods
2022-03-03 21:52:51 -07:00
James Betker
9029e4f20c
Add a base-wrapper
2022-03-03 21:52:28 -07:00
James Betker
6873ad6660
Support functionality
2022-03-03 21:52:16 -07:00
James Betker
6af5d129ce
Add experimental gradient boosting into tts7
2022-03-03 21:51:40 -07:00
James Betker
7ea84f1ac3
asdf
2022-03-03 13:43:44 -07:00
James Betker
3cd6c7f428
Get rid of unused codes in vq
2022-03-03 13:41:38 -07:00
James Betker
619da9ea28
Get rid of discretization loss
2022-03-03 13:36:25 -07:00
James Betker
beb7c8a39d
asdf
2022-03-01 21:41:31 -07:00
James Betker
70fa780edb
Add mechanism to export grad norms
2022-03-01 20:19:52 -07:00
James Betker
d9f8f92840
Codified fp16
2022-03-01 15:46:04 -07:00
James Betker
45ab444c04
Rework minicoder to always checkpoint
2022-03-01 14:09:18 -07:00
James Betker
db0c3340ac
Implement guidance-free diffusion in eval
...
And a few other fixes
2022-03-01 11:49:36 -07:00
James Betker
2134f06516
Implement conditioning-free diffusion at the eval level
2022-02-27 15:11:42 -07:00
James Betker
436fe24822
Add conditioning-free guidance
2022-02-27 15:00:06 -07:00
James Betker
ac920798bb
misc
2022-02-27 14:49:11 -07:00
James Betker
ba155e4e2f
script for uploading models to the HF hub
2022-02-27 14:48:38 -07:00
James Betker
dbc74e96b2
w2v_matcher
2022-02-27 14:48:23 -07:00
James Betker
42879d7296
w2v_wrapper ramping dropout mode
...
this is an experimental feature that needs some testing
2022-02-27 14:47:51 -07:00
James Betker
c375287db9
Re-instate autocasting
2022-02-25 11:06:18 -07:00
James Betker
34ee32a90e
get rid of autocasting in tts7
2022-02-24 21:53:51 -07:00
James Betker
f458f5d8f1
abort early if losses reach nan too much, and save the model
2022-02-24 20:55:30 -07:00
James Betker
18dc62453f
Don't step if NaN losses are encountered.
2022-02-24 17:45:08 -07:00
James Betker
ea500ad42a
Use clustered masking in udtts7
2022-02-24 07:57:26 -07:00
James Betker
7c17c8e674
gurgl
2022-02-23 21:28:24 -07:00
James Betker
e6824e398f
Load dvae to cpu
2022-02-23 21:21:45 -07:00
James Betker
81017d9696
put frechet_distance on cuda
2022-02-23 21:21:13 -07:00
James Betker
9a7bbf33df
f
2022-02-23 18:03:38 -07:00
James Betker
68726eac74
.
2022-02-23 17:58:07 -07:00
James Betker
b7319ab518
Support vocoder type diffusion in audio_diffusion_fid
2022-02-23 17:25:16 -07:00
James Betker
58f6c9805b
adf
2022-02-22 23:12:58 -07:00
James Betker
03752c1cd6
Report NaN
2022-02-22 23:09:37 -07:00
James Betker
7201b4500c
default text_to_sequence cleaners
2022-02-21 19:14:22 -07:00
James Betker
ba7f54c162
w2v: new inference function
2022-02-21 19:13:03 -07:00
James Betker
896ac029ae
allow continuation of samples encountered
2022-02-21 19:12:50 -07:00
James Betker
6313a94f96
eval: integrate a n-gram language model into decoding
2022-02-21 19:12:34 -07:00
James Betker
af50afe222
pairedvoice: error out if clip is too short
2022-02-21 19:11:10 -07:00
James Betker
38802a96c8
remove timesteps from cond calculation
2022-02-21 12:32:21 -07:00
James Betker
668876799d
unet_diffusion_tts7
2022-02-20 15:22:38 -07:00
James Betker
0872e17e60
unified_voice mods
2022-02-19 20:37:35 -07:00
James Betker
7b12799370
Reformat mel_text_clip for use in eval
2022-02-19 20:37:26 -07:00
James Betker
bcba65c539
DataParallel Fix
2022-02-19 20:36:35 -07:00
James Betker
34001ad765
et
2022-02-18 18:52:33 -07:00
James Betker
baf7b65566
Attempt to make w2v play with DDP AND checkpointing
2022-02-18 18:47:11 -07:00
James Betker
f3776f1992
reset ctc loss from "mean" to "sum"
2022-02-17 22:00:58 -07:00
James Betker
2b20da679c
make spec_augment a parameter
2022-02-17 20:22:05 -07:00
James Betker
a813fbed9c
Update to evaluator
2022-02-17 17:30:33 -07:00
James Betker
e1d71e1bd5
w2v_wrapper: get rid of ctc attention mask
2022-02-15 20:54:40 -07:00
James Betker
79e8f36d30
Convert CLIP models into new folder
2022-02-15 20:53:07 -07:00
James Betker
8f767b8b4f
...
2022-02-15 07:08:17 -07:00
James Betker
29e07913a8
Fix
2022-02-15 06:58:11 -07:00
James Betker
dd585df772
LAMB optimizer
2022-02-15 06:48:13 -07:00
James Betker
2bdb515068
A few mods to make wav2vec2 trainable with DDP on DLAS
2022-02-15 06:28:54 -07:00
James Betker
52b61b9f77
Update scripts and attempt to figure out how UnifiedVoice could be used to produce CTC codes
2022-02-13 20:48:06 -07:00
James Betker
a4f1641eea
Add & refine WER evaluator for w2v
2022-02-13 20:47:29 -07:00
James Betker
e16af944c0
BSO fix
2022-02-12 20:01:04 -07:00
James Betker
29534180b2
w2v fine tuner
2022-02-12 20:00:59 -07:00
James Betker
0c3cc5ebad
use script updates to fix output size disparities
2022-02-12 20:00:46 -07:00
James Betker
15fd60aad3
Allow EMA training to be disabled
2022-02-12 20:00:23 -07:00
James Betker
3252972057
ctc_code_gen mods
2022-02-12 19:59:54 -07:00
James Betker
35170c77b3
fix sweep
2022-02-11 11:43:11 -07:00
James Betker
c6b6d120fe
fix ranking
2022-02-11 11:34:57 -07:00
James Betker
095944569c
deep_update dicts
2022-02-11 11:32:25 -07:00
James Betker
ab1f6e8ac6
deepcopy map
2022-02-11 11:29:32 -07:00
James Betker
496fb81997
use fork instead
2022-02-11 11:22:25 -07:00
James Betker
4abc094b47
fix train bug
2022-02-11 11:18:15 -07:00
James Betker
006add64c5
sweep fix
2022-02-11 11:17:08 -07:00
James Betker
102142d1eb
f
2022-02-11 11:05:13 -07:00
James Betker
40b08a52d0
dafuk
2022-02-11 11:01:31 -07:00
James Betker
f6a7f12cad
Remove broken evaluator
2022-02-11 11:00:29 -07:00
James Betker
46b97049dc
Fix eval
2022-02-11 10:59:32 -07:00
James Betker
5175b7d91a
training sweeper checkin
2022-02-11 10:46:37 -07:00
James Betker
302ac8652d
Undo mask during training
2022-02-11 09:35:12 -07:00
James Betker
618a20412a
new rev of ctc_code_gen with surrogate LM loss
2022-02-10 23:09:57 -07:00
James Betker
d1d1ae32a1
audio diffusion frechet distance measurement!
2022-02-10 22:55:46 -07:00
James Betker
23a310b488
Fix BSO
2022-02-10 20:54:51 -07:00
James Betker
1e28e02f98
BSO improvement to make it work with distributed optimizers
2022-02-10 09:53:13 -07:00
James Betker
836eb08afb
Update BSO to use the proper step size
2022-02-10 09:44:15 -07:00
James Betker
820a29f81e
ctc code gen mods
2022-02-10 09:44:01 -07:00
James Betker
ac9417b956
ctc_code_gen: mask out all padding tokens
2022-02-09 17:26:30 -07:00
James Betker
a930f2576e
Begin a migration to specifying training rate on megasamples instead of arbitrary "steps"
...
This should help me greatly in tuning models. It's also necessary now that batch size isn't really
respected; we simply step once the gradient direction becomes unstable.
2022-02-09 17:25:05 -07:00
James Betker
93ca619267
script updates
2022-02-09 14:26:52 -07:00
James Betker
ddb77ef502
ctc_code_gen: use a mean() on the ConditioningEncoder
2022-02-09 14:26:44 -07:00
James Betker
3d946356f8
batch_size_optimizer works. sweet! no more tuning batch sizes.
2022-02-09 14:26:23 -07:00
James Betker
18938248e4
Add batch_size_optimizer support
2022-02-08 23:51:31 -07:00
James Betker
9e9ae328f2
mild updates
2022-02-08 23:51:17 -07:00
James Betker
ff35d13b99
Use non-uniform noise in diffusion_tts6
2022-02-08 07:27:41 -07:00
James Betker
f44b064c5e
Update scripts
2022-02-07 19:43:18 -07:00
James Betker
34fbb78671
Straight CtcCodeGenerator as an encoder
2022-02-07 15:46:46 -07:00
James Betker
c24682c668
Record load times in fast_paired_dataset
2022-02-07 15:45:38 -07:00
James Betker
65a546c4d7
Fix for tts6
2022-02-05 16:00:14 -07:00
James Betker
5ae816bead
ctc gen checkin
2022-02-05 15:59:53 -07:00
James Betker
bb3d1ab03d
More cleanup
2022-02-04 11:06:17 -07:00
James Betker
5cc342de66
Clean up
2022-02-04 11:00:42 -07:00
James Betker
8fb147e8ab
add an autoregressive ctc code generator
2022-02-04 11:00:15 -07:00
James Betker
7f4fc55344
Update SR model
2022-02-03 21:42:53 -07:00
James Betker
de1a1d501a
Move audio injectors into their own file
2022-02-03 21:42:37 -07:00
James Betker
687393de59
Add a better split_on_silence (processing_pipeline)
...
Going to extend this a bit more going forwards to support the entire pipeline.
2022-02-03 20:00:26 -07:00
James Betker
1d29999648
Uupdates to the TTS production scripts
2022-02-03 20:00:01 -07:00
James Betker
bc506d4bcd
Mods to unet_diffusion_tts6 to support super resolution mode
2022-02-03 19:59:39 -07:00
James Betker
4249681c4b
Mods to support a autoregressive CTC code generator
2022-02-03 19:58:54 -07:00
James Betker
8132766d38
tts6
2022-01-31 20:15:06 -07:00
James Betker
fbea6e8eac
Adjustments to diffusion networks
2022-01-30 16:14:06 -07:00
James Betker
e58dab14c3
new diffusion updates from testing
2022-01-29 11:01:01 -07:00
James Betker
935a4e853e
get rid of nil tokens in <2>
2022-01-27 22:45:57 -07:00
James Betker
0152174c0e
Add wandb_step_factor argument
2022-01-27 19:58:58 -07:00
James Betker
e0e36ed98c
Update use_diffuse_tts
2022-01-27 19:57:28 -07:00
James Betker
a77d376ad2
rename unet diffusion tts and add 3
2022-01-27 19:56:24 -07:00
James Betker
7badbf1b4d
update usage scripts
2022-01-25 17:57:26 -07:00
James Betker
8c255811ad
more fixes
2022-01-25 17:57:16 -07:00
James Betker
0f3ca28e39
Allow diffusion model to be trained with masking tokens
2022-01-25 14:26:21 -07:00
James Betker
798ed7730a
i like wasting time
2022-01-24 18:12:08 -07:00
James Betker
fc09cff4b3
angry
2022-01-24 18:09:29 -07:00
James Betker
cc0d9f7216
Fix
2022-01-24 18:05:45 -07:00
James Betker
3a9e3a9db3
consolidate state
2022-01-24 17:59:31 -07:00
James Betker
dfef34ba39
Load ema to cpu memory if specified
2022-01-24 15:08:29 -07:00
James Betker
49edffb6ad
Revise device mapping
2022-01-24 15:08:13 -07:00
James Betker
33511243d5
load model state dicts into the correct device
...
it's not clear to me that this will make a huge difference, but it's a good idea anyways
2022-01-24 14:40:09 -07:00
James Betker
3e16c509f6
Misc fixes
2022-01-24 14:31:43 -07:00
James Betker
e2ed0adbd8
use_diffuse_tts updates
2022-01-24 14:31:28 -07:00
James Betker
e420df479f
Allow steps to specify which state keys to carry forward (reducing memory utilization)
2022-01-24 11:01:27 -07:00
James Betker
62475005e4
Sort data items in descending order, which I suspect will improve performance because we will hit GC less
2022-01-23 19:05:32 -07:00
James Betker
d18aec793a
Revert "(re) attempt diffusion checkpointing logic"
...
This reverts commit b22eec8fe3
.
2022-01-22 09:14:50 -07:00
James Betker
b22eec8fe3
(re) attempt diffusion checkpointing logic
2022-01-22 08:34:40 -07:00
James Betker
8f48848f91
misc
2022-01-22 08:23:29 -07:00
James Betker
851070075a
text<->cond clip
...
I need that universal clip..
2022-01-22 08:23:14 -07:00
James Betker
8ada52ccdc
Update LR layers to checkpoint better
2022-01-22 08:22:57 -07:00
James Betker
ce929a6b3f
Allow grad scaler to be enabled even in fp32 mode
2022-01-21 23:13:24 -07:00
James Betker
91b4b240ac
dont pickle unique files
2022-01-21 00:02:06 -07:00
James Betker
7fef7fb9ff
Update fast_paired_dataset to report how many audio files it is actually using
2022-01-20 21:49:38 -07:00
James Betker
ed35cfe393
Update inference scripts
2022-01-20 11:28:50 -07:00
James Betker
20312211e0
Fix bug in code alignment
2022-01-20 11:28:12 -07:00
James Betker
8e2439f50d
Decrease resolution requirements to 2048
2022-01-20 11:27:49 -07:00
James Betker
4af8525dc3
Adjust diffusion vocoder to allow training individual levels
2022-01-19 13:37:59 -07:00
James Betker
ac13bfefe8
use_diffuse_tts
2022-01-19 00:35:24 -07:00
James Betker
bcd8cc51e1
Enable collated data for diffusion purposes
2022-01-19 00:35:08 -07:00
James Betker
dc9cd8c206
Update use_gpt_tts to be usable with unified_voice2
2022-01-18 21:14:17 -07:00
James Betker
7b4544b83a
Add an experimental unet_diffusion_tts to perform experiments on
2022-01-18 08:38:24 -07:00
James Betker
b6190e96b2
fast_paired
2022-01-17 15:46:02 -07:00
James Betker
1d30d79e34
De-specify fast-paired-dataset
2022-01-16 21:20:00 -07:00
James Betker
2b36ca5f8e
Revert paired back
2022-01-16 21:10:46 -07:00
James Betker
ad3e7df086
Split the fast random into its own new dataset
2022-01-16 21:10:11 -07:00
James Betker
7331862755
Updated paired to randomly index data, offsetting memory costs and speeding up initialization
2022-01-16 21:09:22 -07:00
James Betker
37e4e737b5
a few fixes
2022-01-16 15:17:17 -07:00
James Betker
35db5ebf41
paired_voice_audio_dataset - aligned codes support
2022-01-15 17:38:26 -07:00
James Betker
3f177cd2b3
requirements
2022-01-15 17:28:59 -07:00
James Betker
b398ecca01
wer fix
2022-01-15 17:28:17 -07:00
James Betker
9100e7fa9b
Add a diffusion network that takes aligned text instead of MELs
2022-01-15 17:28:02 -07:00
James Betker
87c83e4957
update wer script
2022-01-13 17:08:49 -07:00
James Betker
009a1e8404
Add a new diffusion_vocoder that should be trainable faster
...
This new one has a "cheating" top layer, that does not feed down into the unet encoder,
but does consume the outputs of the unet. This cheater only operates on half of the input,
while the rest of the unet operates on the full input. This limits the dimensionality of this last
layer, on the assumption that these last layers consume by far the most computation and memory,
but do not require the full input context.
Losses are only computed on half of the aggregate input.
2022-01-11 17:26:07 -07:00
James Betker
d4e27ccf62
misc updates
2022-01-11 16:25:40 -07:00
James Betker
91f28580e2
fix unified_voice
2022-01-10 16:17:31 -07:00
James Betker
136744dc1d
Fixes
2022-01-10 14:32:04 -07:00