James Betker
9100e7fa9b
Add a diffusion network that takes aligned text instead of MELs
2022-01-15 17:28:02 -07:00
James Betker
87c83e4957
update wer script
2022-01-13 17:08:49 -07:00
James Betker
009a1e8404
Add a new diffusion_vocoder that should be trainable faster
...
This new one has a "cheating" top layer, that does not feed down into the unet encoder,
but does consume the outputs of the unet. This cheater only operates on half of the input,
while the rest of the unet operates on the full input. This limits the dimensionality of this last
layer, on the assumption that these last layers consume by far the most computation and memory,
but do not require the full input context.
Losses are only computed on half of the aggregate input.
2022-01-11 17:26:07 -07:00
James Betker
d4e27ccf62
misc updates
2022-01-11 16:25:40 -07:00
James Betker
91f28580e2
fix unified_voice
2022-01-10 16:17:31 -07:00
James Betker
136744dc1d
Fixes
2022-01-10 14:32:04 -07:00
James Betker
ee3dfac2ae
unified_voice2: decouple positional embeddings and token embeddings from underlying gpt model
2022-01-10 08:14:41 -07:00
James Betker
f503d8d96b
Partially implement performers in transformer_builders
2022-01-09 22:35:03 -07:00
James Betker
ec456b6733
Revert unified_voice back to beginning
...
I'll be doing my work within unified_voice2
2022-01-09 22:34:30 -07:00
James Betker
432073c5ca
Make performer code functional
2022-01-09 22:32:50 -07:00
James Betker
f474a7ac65
unified_voice2
2022-01-09 22:32:34 -07:00
James Betker
c075fe72e2
import performer repo
2022-01-09 22:10:07 -07:00
James Betker
7de3874f15
Make dalle transformer checkpointable
2022-01-09 19:14:35 -07:00
James Betker
70b17da193
Alter unified_voice to use extensible transformer (still WIP)
2022-01-08 22:18:25 -07:00
James Betker
15d9517e26
Allow bi-directional clipping
2022-01-08 22:18:04 -07:00
James Betker
894d245062
More zero_grad fixes
2022-01-08 20:31:19 -07:00
James Betker
8bade38180
Add generic CLIP model based off of x_clip
2022-01-08 19:08:01 -07:00
James Betker
2a9a25e6e7
Fix likely defective nan grad recovery
2022-01-08 18:24:58 -07:00
James Betker
438dd9ed33
fix text-voice-clip bug
2022-01-08 08:55:00 -07:00
James Betker
34774f9948
unified_voice: begin decoupling from HF GPT
...
I'd like to try some different (newer) transformer variants. The way to get
there is softly decoupling the transformer portion of this architecture
from GPT. This actually should be fairly easy.
2022-01-07 22:51:24 -07:00
James Betker
1f6a5310b8
More fixes to use_gpt_tts
2022-01-07 22:30:55 -07:00
James Betker
68090ac3e9
Finish up the text->voice clip model
2022-01-07 22:28:45 -07:00
James Betker
65ffe38fce
misc
2022-01-06 22:16:17 -07:00
James Betker
6706591d3d
Fix dataset
2022-01-06 15:24:37 -07:00
James Betker
f4484fd155
Add "dataset_debugger" support
...
This allows the datasets themselves compile statistics and report them
via tensorboard and wandb.
2022-01-06 12:38:20 -07:00
James Betker
f3cab45658
Revise audio datasets to include interesting statistics in batch
...
Stats include:
- How many indices were skipped to retrieve a given index
- Whether or not a conditioning input was actually the file itself
2022-01-06 11:15:16 -07:00
James Betker
06c1093090
Remove collating from paired_voice_audio_dataset
...
This will now be done at the model level, which is more efficient
2022-01-06 10:29:39 -07:00
James Betker
e7a705fe6e
Make gpt_asr_hf2 more efficient at inference
2022-01-06 10:27:10 -07:00
James Betker
5e1d1da2e9
Clean paired_voice
2022-01-06 10:26:53 -07:00
James Betker
525addffab
Unified: automatically clip inputs according to specified max length to improve inference time
2022-01-06 10:13:45 -07:00
James Betker
61cd351b71
update unified
2022-01-06 09:48:11 -07:00
James Betker
10fd1110be
Fix (?) use_gpt_tts for unified_voice
2022-01-05 20:09:31 -07:00
James Betker
3c4301f085
Remove dvae_arch_playground
2022-01-05 17:06:45 -07:00
James Betker
a63a17e48f
Remove deepspeech models
2022-01-05 17:05:13 -07:00
James Betker
c584ba05ee
unified_voice improvements
...
- Rename max_symbols_per_phrase to max_text_tokens
- Remove max_total_tokens (no longer necessary)
- Fix integration with MelEncoder
2022-01-05 17:03:53 -07:00
James Betker
50d267ab1a
misc
2022-01-05 17:01:22 -07:00
James Betker
0fe34f57d1
Use torch resampler
2022-01-05 15:47:22 -07:00
James Betker
38aba6f88d
Another dumdum fix
2022-01-04 15:18:25 -07:00
James Betker
963c6072bb
Add mel_encoder and solo embeddings to unified_voice
2022-01-04 15:15:58 -07:00
James Betker
2165124f19
Add GPT documentation
2022-01-01 21:00:07 -07:00
James Betker
2635412291
doh
2022-01-01 14:29:59 -07:00
James Betker
d4a6298658
more debugging
2022-01-01 14:25:27 -07:00
James Betker
d8111e0477
misc
2022-01-01 14:05:33 -07:00
James Betker
dc535b5358
better bounds
2022-01-01 14:05:22 -07:00
James Betker
fe9ea4e01a
auto-fix text_inputs too big
2022-01-01 13:25:47 -07:00
James Betker
35abefd038
More fix
2022-01-01 10:31:03 -07:00
James Betker
d5a5111890
Fix collating on by default on grand_conjoined
2022-01-01 10:30:15 -07:00
James Betker
4d9ba4a48a
can i has fix now
2022-01-01 00:48:27 -07:00
James Betker
56752f1dbc
Fix collator bug
2022-01-01 00:33:31 -07:00
James Betker
c28d8770c7
fix tensor lengths
2022-01-01 00:23:46 -07:00
James Betker
bbacffb790
dataset improvements and fix to unified_voice_Bilevel
2022-01-01 00:16:30 -07:00
James Betker
eda753e776
Allow conditioning shuffling to be disabled
2021-12-31 23:32:08 -07:00
James Betker
17fb934575
wer update
2021-12-31 16:21:39 -07:00
James Betker
f0c4cd6317
Taking another stab at a BPE tokenizer
2021-12-30 13:41:24 -07:00
James Betker
9aa06542cd
Further reduce the complexity of the MEL encoder in GptAsrHf
2021-12-30 09:10:40 -07:00
James Betker
f2cd6a7f08
For loading conditional clips, default to falling back to loading the clip itself
2021-12-30 09:10:14 -07:00
James Betker
5ae7e0d9b0
Fix gapping bug in voice2voice clip
2021-12-29 14:44:46 -07:00
James Betker
51ce1b5007
Add conditioning clips features to grand_conjoined
2021-12-29 14:44:32 -07:00
James Betker
b12f47b36d
Add some noise to voice_voice_clip
2021-12-29 13:56:30 -07:00
James Betker
c6ef0eef0b
asdf
2021-12-29 10:07:39 -07:00
James Betker
53784ec806
grand conjoined dataset: support collating
2021-12-29 09:44:37 -07:00
James Betker
8a02ba5935
Transit s2s clips back to CPU memory after processing
2021-12-29 08:54:07 -07:00
James Betker
af6d5cd526
Add resume into speech-speech
2021-12-29 08:50:49 -07:00
James Betker
0e4bcc33ab
Additional debugging
2021-12-29 00:23:27 -07:00
James Betker
b24a51f0aa
Check in speech2speech CLIP inference tool
2021-12-29 00:19:44 -07:00
James Betker
c1bef01dfa
GptAsrHf2 checkin
2021-12-28 20:48:38 -07:00
James Betker
07c2b9907c
Add voice2voice clip model
2021-12-28 16:18:12 -07:00
James Betker
a9ee5b624f
Simplify and conform gpt_asr_hf2
2021-12-28 11:54:33 -07:00
James Betker
a5b4bee719
Improve asr_eval
2021-12-28 11:45:15 -07:00
James Betker
312f631c5b
gpt_asr_hf2: remove dual positional embeddings
2021-12-28 10:57:45 -07:00
James Betker
93624fa4b2
Don't use tqdm in ranks!=0
2021-12-28 10:06:54 -07:00
James Betker
a12042ea99
Allow multi-embeddings to be disabled
2021-12-28 09:00:53 -07:00
James Betker
4a32949b0e
update inference mode for unified
2021-12-26 15:33:21 -07:00
James Betker
a698d3f525
unified_voice: introduce paired embeddings
2021-12-26 15:33:05 -07:00
James Betker
6996dfd9d5
asr_hf2: add independent position embedders
2021-12-26 15:17:24 -07:00
James Betker
5b5cbc057c
Work checkpoint for gpt asr hf2
2021-12-26 10:29:12 -07:00
James Betker
cd89e6b42e
Initialize our embeddings the same way GPT-2 initializes theirs.
2021-12-26 00:20:30 -07:00
James Betker
8d01f7685c
Get rid of absolute positional embeddings in unifiedvoice
2021-12-26 00:10:24 -07:00
James Betker
6700f8851d
moar verbosity
2021-12-25 23:23:21 -07:00
James Betker
8acf3b3097
Better dimensional asserting
2021-12-25 23:18:25 -07:00
James Betker
e959541494
Add position embeddings back into unified_voice
...
I think this may be the solution behind the days problems.
2021-12-25 23:10:56 -07:00
James Betker
64cb4a92db
Support adamw_zero
2021-12-25 21:32:01 -07:00
James Betker
776a7abfcc
Support torch DDP _set_static_graph
2021-12-25 21:20:06 -07:00
James Betker
746392f35c
Fix DS
2021-12-25 15:28:59 -07:00
James Betker
736c2626ee
build in character tokenizer
2021-12-25 15:21:01 -07:00
James Betker
b595c62893
One way decoder for decoding from mel codes
2021-12-25 12:18:00 -07:00
James Betker
ab9cafa572
Make tokenization configs more configurable
2021-12-25 12:17:50 -07:00
James Betker
52410fd9d9
256-bpe tokenizer
2021-12-25 08:52:08 -07:00
James Betker
8e26400ce2
Add inference for unified gpt
2021-12-24 13:27:06 -07:00
James Betker
ead2a74bf0
Add debug_failures flag
2021-12-23 16:12:16 -07:00
James Betker
9677f7084c
dataset mod
2021-12-23 15:21:30 -07:00
James Betker
8b19c37409
UnifiedGptVoice!
2021-12-23 15:20:26 -07:00
James Betker
5bc9772cb0
grand: support validation mode
2021-12-23 15:03:20 -07:00
James Betker
e55d949855
GrandConjoinedDataset
2021-12-23 14:32:33 -07:00
James Betker
b9de8a8eda
More fixes
2021-12-22 19:21:29 -07:00
James Betker
191e0130ee
Another fix
2021-12-22 18:30:50 -07:00
James Betker
6c6daa5795
Build a bigger, better tokenizer
2021-12-22 17:46:18 -07:00
James Betker
c737632eae
Train and use a bespoke tokenizer
2021-12-22 15:06:14 -07:00
James Betker
66bc60aeff
Re-add start_text_token
2021-12-22 14:10:35 -07:00
James Betker
a9629f7022
Try out using the GPT tokenizer rather than nv_tacotron
...
This results in a significant compression of the text domain, I'm curious what the
effect on speech quality will be.
2021-12-22 14:03:18 -07:00