1517 Commits

Author SHA1 Message Date
James Betker
ee3dfac2ae unified_voice2: decouple positional embeddings and token embeddings from underlying gpt model 2022-01-10 08:14:41 -07:00
James Betker
f503d8d96b Partially implement performers in transformer_builders 2022-01-09 22:35:03 -07:00
James Betker
ec456b6733 Revert unified_voice back to beginning
I'll be doing my work within unified_voice2
2022-01-09 22:34:30 -07:00
James Betker
432073c5ca Make performer code functional 2022-01-09 22:32:50 -07:00
James Betker
f474a7ac65 unified_voice2 2022-01-09 22:32:34 -07:00
James Betker
c075fe72e2 import performer repo 2022-01-09 22:10:07 -07:00
James Betker
7de3874f15 Make dalle transformer checkpointable 2022-01-09 19:14:35 -07:00
James Betker
70b17da193 Alter unified_voice to use extensible transformer (still WIP) 2022-01-08 22:18:25 -07:00
James Betker
15d9517e26 Allow bi-directional clipping 2022-01-08 22:18:04 -07:00
James Betker
894d245062 More zero_grad fixes 2022-01-08 20:31:19 -07:00
James Betker
8bade38180 Add generic CLIP model based off of x_clip 2022-01-08 19:08:01 -07:00
James Betker
2a9a25e6e7 Fix likely defective nan grad recovery 2022-01-08 18:24:58 -07:00
James Betker
438dd9ed33 fix text-voice-clip bug 2022-01-08 08:55:00 -07:00
James Betker
34774f9948 unified_voice: begin decoupling from HF GPT
I'd like to try some different (newer) transformer variants. The way to get
there is softly decoupling the transformer portion of this architecture
from GPT. This actually should be fairly easy.
2022-01-07 22:51:24 -07:00
James Betker
1f6a5310b8 More fixes to use_gpt_tts 2022-01-07 22:30:55 -07:00
James Betker
68090ac3e9 Finish up the text->voice clip model 2022-01-07 22:28:45 -07:00
James Betker
65ffe38fce misc 2022-01-06 22:16:17 -07:00
James Betker
6706591d3d Fix dataset 2022-01-06 15:24:37 -07:00
James Betker
f4484fd155 Add "dataset_debugger" support
This allows the datasets themselves to compile statistics and report them
via tensorboard and wandb.
2022-01-06 12:38:20 -07:00
James Betker
f3cab45658 Revise audio datasets to include interesting statistics in batch
Stats include:
- How many indices were skipped to retrieve a given index
- Whether or not a conditioning input was actually the file itself
2022-01-06 11:15:16 -07:00
James Betker
06c1093090 Remove collating from paired_voice_audio_dataset
This will now be done at the model level, which is more efficient
2022-01-06 10:29:39 -07:00
James Betker
e7a705fe6e Make gpt_asr_hf2 more efficient at inference 2022-01-06 10:27:10 -07:00
James Betker
5e1d1da2e9 Clean paired_voice 2022-01-06 10:26:53 -07:00
James Betker
525addffab Unified: automatically clip inputs according to specified max length to improve inference time 2022-01-06 10:13:45 -07:00
James Betker
61cd351b71 update unified 2022-01-06 09:48:11 -07:00
James Betker
10fd1110be Fix (?) use_gpt_tts for unified_voice 2022-01-05 20:09:31 -07:00
James Betker
3c4301f085 Remove dvae_arch_playground 2022-01-05 17:06:45 -07:00
James Betker
a63a17e48f Remove deepspeech models 2022-01-05 17:05:13 -07:00
James Betker
c584ba05ee unified_voice improvements
- Rename max_symbols_per_phrase to max_text_tokens
- Remove max_total_tokens (no longer necessary)
- Fix integration with MelEncoder
2022-01-05 17:03:53 -07:00
James Betker
50d267ab1a misc 2022-01-05 17:01:22 -07:00
James Betker
0fe34f57d1 Use torch resampler 2022-01-05 15:47:22 -07:00
James Betker
38aba6f88d Another dumdum fix 2022-01-04 15:18:25 -07:00
James Betker
963c6072bb Add mel_encoder and solo embeddings to unified_voice 2022-01-04 15:15:58 -07:00
James Betker
2165124f19 Add GPT documentation 2022-01-01 21:00:07 -07:00
James Betker
2635412291 doh 2022-01-01 14:29:59 -07:00
James Betker
d4a6298658 more debugging 2022-01-01 14:25:27 -07:00
James Betker
d8111e0477 misc 2022-01-01 14:05:33 -07:00
James Betker
dc535b5358 better bounds 2022-01-01 14:05:22 -07:00
James Betker
fe9ea4e01a auto-fix text_inputs too big 2022-01-01 13:25:47 -07:00
James Betker
35abefd038 More fix 2022-01-01 10:31:03 -07:00
James Betker
d5a5111890 Fix collating on by default on grand_conjoined 2022-01-01 10:30:15 -07:00
James Betker
4d9ba4a48a can i has fix now 2022-01-01 00:48:27 -07:00
James Betker
56752f1dbc Fix collator bug 2022-01-01 00:33:31 -07:00
James Betker
c28d8770c7 fix tensor lengths 2022-01-01 00:23:46 -07:00
James Betker
bbacffb790 dataset improvements and fix to unified_voice_Bilevel 2022-01-01 00:16:30 -07:00
James Betker
eda753e776 Allow conditioning shuffling to be disabled 2021-12-31 23:32:08 -07:00
James Betker
17fb934575 wer update 2021-12-31 16:21:39 -07:00
James Betker
f0c4cd6317 Taking another stab at a BPE tokenizer 2021-12-30 13:41:24 -07:00
James Betker
9aa06542cd Further reduce the complexity of the MEL encoder in GptAsrHf 2021-12-30 09:10:40 -07:00
James Betker
f2cd6a7f08 For loading conditional clips, default to falling back to loading the clip itself 2021-12-30 09:10:14 -07:00