1407 Commits

Author SHA1 Message Date
James Betker
8bade38180 Add generic CLIP model based off of x_clip 2022-01-08 19:08:01 -07:00
James Betker
2a9a25e6e7 Fix likely defective nan grad recovery 2022-01-08 18:24:58 -07:00
James Betker
438dd9ed33 fix text-voice-clip bug 2022-01-08 08:55:00 -07:00
James Betker
34774f9948 unified_voice: begin decoupling from HF GPT
I'd like to try some different (newer) transformer variants. The way to get
there is softly decoupling the transformer portion of this architecture
from GPT. This actually should be fairly easy.
2022-01-07 22:51:24 -07:00
James Betker
1f6a5310b8 More fixes to use_gpt_tts 2022-01-07 22:30:55 -07:00
James Betker
68090ac3e9 Finish up the text->voice clip model 2022-01-07 22:28:45 -07:00
James Betker
65ffe38fce misc 2022-01-06 22:16:17 -07:00
James Betker
6706591d3d Fix dataset 2022-01-06 15:24:37 -07:00
James Betker
f4484fd155 Add "dataset_debugger" support
This allows the datasets themselves to compile statistics and report them
via tensorboard and wandb.
2022-01-06 12:38:20 -07:00
James Betker
f3cab45658 Revise audio datasets to include interesting statistics in batch
Stats include:
- How many indices were skipped to retrieve a given index
- Whether or not a conditioning input was actually the file itself
2022-01-06 11:15:16 -07:00
James Betker
06c1093090 Remove collating from paired_voice_audio_dataset
This will now be done at the model level, which is more efficient
2022-01-06 10:29:39 -07:00
James Betker
e7a705fe6e Make gpt_asr_hf2 more efficient at inference 2022-01-06 10:27:10 -07:00
James Betker
5e1d1da2e9 Clean paired_voice 2022-01-06 10:26:53 -07:00
James Betker
525addffab Unified: automatically clip inputs according to specified max length to improve inference time 2022-01-06 10:13:45 -07:00
James Betker
61cd351b71 update unified 2022-01-06 09:48:11 -07:00
James Betker
10fd1110be Fix (?) use_gpt_tts for unified_voice 2022-01-05 20:09:31 -07:00
James Betker
3c4301f085 Remove dvae_arch_playground 2022-01-05 17:06:45 -07:00
James Betker
a63a17e48f Remove deepspeech models 2022-01-05 17:05:13 -07:00
James Betker
c584ba05ee unified_voice improvements
- Rename max_symbols_per_phrase to max_text_tokens
- Remove max_total_tokens (no longer necessary)
- Fix integration with MelEncoder
2022-01-05 17:03:53 -07:00
James Betker
50d267ab1a misc 2022-01-05 17:01:22 -07:00
James Betker
0fe34f57d1 Use torch resampler 2022-01-05 15:47:22 -07:00
James Betker
38aba6f88d Another dumdum fix 2022-01-04 15:18:25 -07:00
James Betker
963c6072bb Add mel_encoder and solo embeddings to unified_voice 2022-01-04 15:15:58 -07:00
James Betker
2165124f19 Add GPT documentation 2022-01-01 21:00:07 -07:00
James Betker
2635412291 doh 2022-01-01 14:29:59 -07:00
James Betker
d4a6298658 more debugging 2022-01-01 14:25:27 -07:00
James Betker
d8111e0477 misc 2022-01-01 14:05:33 -07:00
James Betker
dc535b5358 better bounds 2022-01-01 14:05:22 -07:00
James Betker
fe9ea4e01a auto-fix text_inputs too big 2022-01-01 13:25:47 -07:00
James Betker
35abefd038 More fix 2022-01-01 10:31:03 -07:00
James Betker
d5a5111890 Fix collating on by default on grand_conjoined 2022-01-01 10:30:15 -07:00
James Betker
4d9ba4a48a can i has fix now 2022-01-01 00:48:27 -07:00
James Betker
56752f1dbc Fix collator bug 2022-01-01 00:33:31 -07:00
James Betker
c28d8770c7 fix tensor lengths 2022-01-01 00:23:46 -07:00
James Betker
bbacffb790 dataset improvements and fix to unified_voice_Bilevel 2022-01-01 00:16:30 -07:00
James Betker
eda753e776 Allow conditioning shuffling to be disabled 2021-12-31 23:32:08 -07:00
James Betker
17fb934575 wer update 2021-12-31 16:21:39 -07:00
James Betker
f0c4cd6317 Taking another stab at a BPE tokenizer 2021-12-30 13:41:24 -07:00
James Betker
9aa06542cd Further reduce the complexity of the MEL encoder in GptAsrHf 2021-12-30 09:10:40 -07:00
James Betker
f2cd6a7f08 For loading conditional clips, default to falling back to loading the clip itself 2021-12-30 09:10:14 -07:00
James Betker
5ae7e0d9b0 Fix gapping bug in voice2voice clip 2021-12-29 14:44:46 -07:00
James Betker
51ce1b5007 Add conditioning clips features to grand_conjoined 2021-12-29 14:44:32 -07:00
James Betker
b12f47b36d Add some noise to voice_voice_clip 2021-12-29 13:56:30 -07:00
James Betker
c6ef0eef0b asdf 2021-12-29 10:07:39 -07:00
James Betker
53784ec806 grand conjoined dataset: support collating 2021-12-29 09:44:37 -07:00
James Betker
8a02ba5935 Transit s2s clips back to CPU memory after processing 2021-12-29 08:54:07 -07:00
James Betker
af6d5cd526 Add resume into speech-speech 2021-12-29 08:50:49 -07:00
James Betker
0e4bcc33ab Additional debugging 2021-12-29 00:23:27 -07:00
James Betker
b24a51f0aa Check in speech2speech CLIP inference tool 2021-12-29 00:19:44 -07:00
James Betker
c1bef01dfa GptAsrHf2 checkin 2021-12-28 20:48:38 -07:00