James Betker
438dd9ed33
fix text-voice-clip bug
2022-01-08 08:55:00 -07:00
James Betker
34774f9948
unified_voice: begin decoupling from HF GPT
I'd like to try some different (newer) transformer variants. The way to get
there is softly decoupling the transformer portion of this architecture
from GPT. This actually should be fairly easy.
2022-01-07 22:51:24 -07:00
James Betker
1f6a5310b8
More fixes to use_gpt_tts
2022-01-07 22:30:55 -07:00
James Betker
68090ac3e9
Finish up the text->voice clip model
2022-01-07 22:28:45 -07:00
James Betker
65ffe38fce
misc
2022-01-06 22:16:17 -07:00
James Betker
6706591d3d
Fix dataset
2022-01-06 15:24:37 -07:00
James Betker
f4484fd155
Add "dataset_debugger" support
This allows the datasets themselves to compile statistics and report them
via tensorboard and wandb.
2022-01-06 12:38:20 -07:00
James Betker
f3cab45658
Revise audio datasets to include interesting statistics in batch
Stats include:
- How many indices were skipped to retrieve a given index
- Whether or not a conditioning input was actually the file itself
2022-01-06 11:15:16 -07:00
James Betker
06c1093090
Remove collating from paired_voice_audio_dataset
This will now be done at the model level, which is more efficient.
2022-01-06 10:29:39 -07:00
James Betker
e7a705fe6e
Make gpt_asr_hf2 more efficient at inference
2022-01-06 10:27:10 -07:00
James Betker
5e1d1da2e9
Clean paired_voice
2022-01-06 10:26:53 -07:00
James Betker
525addffab
Unified: automatically clip inputs according to specified max length to improve inference time
2022-01-06 10:13:45 -07:00
James Betker
61cd351b71
update unified
2022-01-06 09:48:11 -07:00
James Betker
10fd1110be
Fix (?) use_gpt_tts for unified_voice
2022-01-05 20:09:31 -07:00
James Betker
3c4301f085
Remove dvae_arch_playground
2022-01-05 17:06:45 -07:00
James Betker
a63a17e48f
Remove deepspeech models
2022-01-05 17:05:13 -07:00
James Betker
c584ba05ee
unified_voice improvements
- Rename max_symbols_per_phrase to max_text_tokens
- Remove max_total_tokens (no longer necessary)
- Fix integration with MelEncoder
2022-01-05 17:03:53 -07:00
James Betker
50d267ab1a
misc
2022-01-05 17:01:22 -07:00
James Betker
0fe34f57d1
Use torch resampler
2022-01-05 15:47:22 -07:00
James Betker
38aba6f88d
Another dumdum fix
2022-01-04 15:18:25 -07:00
James Betker
963c6072bb
Add mel_encoder and solo embeddings to unified_voice
2022-01-04 15:15:58 -07:00
James Betker
2165124f19
Add GPT documentation
2022-01-01 21:00:07 -07:00
James Betker
2635412291
doh
2022-01-01 14:29:59 -07:00
James Betker
d4a6298658
more debugging
2022-01-01 14:25:27 -07:00
James Betker
d8111e0477
misc
2022-01-01 14:05:33 -07:00
James Betker
dc535b5358
better bounds
2022-01-01 14:05:22 -07:00
James Betker
fe9ea4e01a
auto-fix text_inputs too big
2022-01-01 13:25:47 -07:00
James Betker
35abefd038
More fix
2022-01-01 10:31:03 -07:00
James Betker
d5a5111890
Fix collating being on by default in grand_conjoined
2022-01-01 10:30:15 -07:00
James Betker
4d9ba4a48a
can i has fix now
2022-01-01 00:48:27 -07:00
James Betker
56752f1dbc
Fix collator bug
2022-01-01 00:33:31 -07:00
James Betker
c28d8770c7
fix tensor lengths
2022-01-01 00:23:46 -07:00
James Betker
bbacffb790
dataset improvements and fix to unified_voice_Bilevel
2022-01-01 00:16:30 -07:00
James Betker
eda753e776
Allow conditioning shuffling to be disabled
2021-12-31 23:32:08 -07:00
James Betker
17fb934575
wer update
2021-12-31 16:21:39 -07:00
James Betker
f0c4cd6317
Taking another stab at a BPE tokenizer
2021-12-30 13:41:24 -07:00
James Betker
9aa06542cd
Further reduce the complexity of the MEL encoder in GptAsrHf
2021-12-30 09:10:40 -07:00
James Betker
f2cd6a7f08
For loading conditional clips, default to falling back to loading the clip itself
2021-12-30 09:10:14 -07:00
James Betker
5ae7e0d9b0
Fix gapping bug in voice2voice clip
2021-12-29 14:44:46 -07:00
James Betker
51ce1b5007
Add conditioning clips features to grand_conjoined
2021-12-29 14:44:32 -07:00
James Betker
b12f47b36d
Add some noise to voice_voice_clip
2021-12-29 13:56:30 -07:00
James Betker
c6ef0eef0b
asdf
2021-12-29 10:07:39 -07:00
James Betker
53784ec806
grand conjoined dataset: support collating
2021-12-29 09:44:37 -07:00
James Betker
8a02ba5935
Transfer s2s clips back to CPU memory after processing
2021-12-29 08:54:07 -07:00
James Betker
af6d5cd526
Add resume into speech-speech
2021-12-29 08:50:49 -07:00
James Betker
0e4bcc33ab
Additional debugging
2021-12-29 00:23:27 -07:00
James Betker
b24a51f0aa
Check in speech2speech CLIP inference tool
2021-12-29 00:19:44 -07:00
James Betker
c1bef01dfa
GptAsrHf2 checkin
2021-12-28 20:48:38 -07:00
James Betker
07c2b9907c
Add voice2voice clip model
2021-12-28 16:18:12 -07:00
James Betker
a9ee5b624f
Simplify and conform gpt_asr_hf2
2021-12-28 11:54:33 -07:00