Commit Graph

34 Commits

Author | SHA1 | Message | Date
mrq | 2266d34818 | oops | 2024-09-21 16:06:01 -05:00
mrq | d31f27119a | regex replace out the (lang) markers in espeak, updated tokenizer vocab as lazily as possible to not have unk tokens | 2024-09-21 12:29:28 -05:00
mrq | 491ae2a684 | some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...) | 2024-07-22 00:30:40 -05:00
mrq | 3e5ca3a201 | more demo page tweaks | 2024-07-21 19:31:13 -05:00
mrq | e19aa643a6 | cleaned up demo page creation, added option to pass in RVQ level sampling distribution for training | 2024-07-21 19:12:03 -05:00
mrq | d87b492295 | added rudimentary demo page creator (currently just embeds base64 wavs into the page, need to test not doing that) | 2024-07-19 20:49:40 -05:00
mrq | 39f961abcd | test trainer (vall_e.models.ar_nar) tests some SpeechX features | 2024-07-18 18:46:45 -05:00
mrq | f770467eb3 | stuff | 2024-07-01 18:13:29 -05:00
mrq | 396af541c5 | ugh | 2024-06-30 11:11:58 -05:00
mrq | dced595391 | more cleanup | 2024-06-30 11:00:12 -05:00
mrq | dd40463803 | limit eval size because the training batch size seems to be used for the eval dataloader, somehow (bandaid) | 2024-06-29 09:11:28 -05:00
mrq | 934672252b | feverish cleanup | 2024-06-03 21:28:49 -05:00
mrq | 7feeb944a0 | probably insane with even entertaining going this route | 2024-06-03 20:26:27 -05:00
mrq | 458b95d196 | added option to split between text loss and audio loss (to-do: document this better), because it may or may not be a problem with LLaMA-backed models, since my loss hovers around 3.9 / 56% accuracy despite sounding decent at the moment | 2024-05-19 11:23:56 -05:00
mrq | d9aabfa3ae | final tweaks, hopefully, again | 2024-05-15 23:04:19 -05:00
mrq | 8d79f78e0a | god I need to replace omegaconf | 2024-05-12 14:01:52 -05:00
mrq | 215800484d | correcting my wrong assumption that I could just use raw 24kHz audio in the 44kHz DAC without too much of an issue (there are issues) | 2024-05-04 23:49:15 -05:00
mrq | 783db3d2c5 | forgot to commit the DAC test utterance | 2024-05-04 09:46:51 -05:00
mrq | 6a11bc9cb6 | update tokenizer because, for some reason, it had the wrong order for the special tokens to where eos = unk | 2024-04-29 09:09:26 -05:00
mrq | 071fb97777 | dataset preparation script updates, caved and am using HF tokenizer now | 2024-04-21 14:49:18 -05:00
mrq | 2e9e6e68f7 | Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9. | 2024-04-17 20:59:25 -05:00
mrq | 2deb995cc9 | updated setup script | 2023-10-06 20:08:28 -05:00
mrq | 4abd6564d1 | fixed training stats not loading from exported weights, a bit of a readme cleanup, updated example training yaml | 2023-09-23 19:59:00 -05:00
mrq | 2d1a9f10c0 | nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trims off excess levels according to which model receives them, some other things I already forgot (I really hope no one else has weights being baked right now) | 2023-08-19 15:06:33 -05:00
mrq | f7f6d3bf6d | validated that SpeechX tasks cse and nse work, added a method to test each task by invoking python3 -m vall_e.data --action=tasks --tasks='sr,se,cse,nse' | 2023-08-19 09:50:07 -05:00
mrq | 8f42c578c9 | setting up for allowing training on a subset of the SpeechX tasks (do NOT try this at home yet without a proper model, as performance is predicated on having a solid base VALL-E model for the tasks) | 2023-08-19 00:16:08 -05:00
mrq | 508677fcd5 | repaired auraloss loss calc during eval/val | 2023-08-18 21:19:47 -05:00
mrq | fb4e816823 | oops | 2023-08-18 21:11:19 -05:00
mrq | 5fa86182b5 | oops | 2023-08-14 10:50:40 -05:00
mrq | d7deaf6def | distributed training works now (hopefully) | 2023-08-13 22:07:45 -05:00
mrq | c85101403f | big cleanup | 2023-08-03 20:26:36 -05:00
mrq | 7a06b27a9c | Tweaks | 2023-08-02 22:06:39 +00:00
mrq | d88e43800b | adjustments | 2023-08-02 22:01:49 +00:00
mrq | bf8cedc9dd | Rewrite init | 2023-08-02 21:53:35 +00:00