Commit Graph

44 Commits

Author SHA1 Message Date
mrq
75b04686f8 added prom-less training / inferencing, some other things 2024-07-22 19:36:07 -05:00
mrq
491ae2a684 some insanity for sanity checks (some phonemes from phonemizing japanese are not in my tokenizer...) 2024-07-22 00:30:40 -05:00
mrq
ad024f400f actually pass language into dataset process script, fix coercing japanese into hiragana because espeak does not like kanji 2024-07-21 23:21:37 -05:00
mrq
28a674e0f1 fixes... 2024-07-18 23:25:32 -05:00
mrq
bccbb77a1a added option to either naively concat codes to concat audio waveforms (prior behavior) or to decode => concat => encode instead (although this only currently happens for prom sampling if an utterance is too small) 2024-07-18 16:48:41 -05:00
mrq
7b210d9738 sanity cleanup 2024-07-04 15:58:08 -05:00
mrq
1ecf2793f4 (commented-out) support for facebookresearch/AudioDec, but support really didn't wow me (so I commented it out until I figure out why my output audio is super crusty with AudioDec) 2024-07-04 15:40:51 -05:00
mrq
b21f74a5c5 added summing of external embeddings (at this point i dont think any amount of cope bandaids will get DAC to train nicely, I think the RVQ levels the NAR tends to add too much noise if they're not accurate) 2024-06-29 23:42:30 -05:00
mrq
793ccb16fb ugh 2024-06-29 22:14:35 -05:00
mrq
2808f881c8 cleaned up subjugated audio embedding into a flag, flag can also have it include the original, underlying embedding as well (it seems to do better when set to inclusive) 2024-06-29 21:46:35 -05:00
mrq
ec5eaebcbc experimental method of using DAC's quantizer "embeddings" to see if it helps with model quality 2024-06-29 19:46:11 -05:00
mrq
234f9efc6e ugh 2024-06-09 11:39:43 -05:00
mrq
ddbacde0d1 DAC just doesn't work well enough...... 2024-05-25 11:07:52 -05:00
mrq
74e531d391 ugh 2024-05-18 12:02:56 -05:00
mrq
5eb5db7f7f just don't use DAC 24Khz, it's bad 2024-05-12 13:41:17 -05:00
mrq
230da8b559 should be the final things to scramble around for, DAC's 24KHz model is unusable for this, but both encodec's 24KHz and DAC's 44KHz work 2024-05-12 13:22:08 -05:00
mrq
2437a86efa ugh 2024-05-12 13:02:15 -05:00
mrq
4f1593c8db a bunch of shit to salvage my old encodec-quantized audio because dac-encoded audio just does not want to converge 2024-05-12 10:17:29 -05:00
mrq
14709ac67f ughh 2024-05-12 07:30:59 -05:00
mrq
c4b696ebeb oops 2024-05-09 22:33:40 -05:00
mrq
0d5d545a40 crammed in DAdaptation (doesn't seem worth it) and ScheduleFree (forgot I wanted to weeks ago, seems promising), optimization wrapper cleanup, test trainer changes, etc. 2024-05-09 20:28:20 -05:00
mrq
c6e0f905b5 final tweaks (again) before training restarts 2024-05-08 02:11:38 -05:00
mrq
215800484d correcting my wrong of assuming I could just use raw 24Khz audio in the 44Khz DAC without too much of an issue (there are issues) 2024-05-04 23:49:15 -05:00
mrq
9f738fbd5b seems I actually don't need RVQ bins 9-32 with the 24Khz DAC model........ (time to requantize my audio...) 2024-05-04 23:09:18 -05:00
mrq
a8ffa88844 it slipped my mind that technically DAC can be used at any sample rate, since it models waveforms; make it a config YAML option to allow this behavior 2024-04-19 18:36:54 -05:00
mrq
8214aa23d7 converting over to a different intermediary dataset format 2024-04-18 21:24:06 -05:00
mrq
4f5c9e518a actually use the passed-through sample rate from encode for DAC because it does its own resampling I guess 2024-04-18 13:32:41 -05:00
mrq
2e9e6e68f7 Forgot I need to use the DAC's 44K model because 24K model has 32 codebooks instead of 9. 2024-04-17 20:59:25 -05:00
mrq
5ff2b4aab5 finally swallowing the Descript-Audio-Codec pill (I guess I'm going to have to regenerate my entire dataset) 2024-04-17 20:39:35 -05:00
mrq
545162195b deprecate sole AR/NAR model by only keeping the AR+NAR (the beauty of no one using this is that I can break compat as much as I want), add tone token for when I classify my dataset with tone/emotion in the future, some other things 2024-04-15 19:54:32 -05:00
mrq
09cda7d3f9 added sampling by speaker group name (might be better to de-emphasize the LibriVox/Audiobooks that are in large numbers, and emphasize the smaller pools), log cleanup 2023-10-16 19:30:38 -05:00
mrq
2bc2d08b09 (need to verify) added modifying model size and config bool to align with VALL-E continuous' methodology 2023-09-01 17:19:34 -05:00
mrq
78378ed1ce overhauled dataloading code to be marginally faster, mostly cleaned up, and can leverage a metadata json to help things out 2023-08-26 19:53:23 -05:00
mrq
22904a8639 more oversights fixed because I've been using a cached dataloader forever now and didn't catch these problems 2023-08-24 10:25:33 -05:00
mrq
4585824cd3 tweaks, including exporting on save/quit 2023-08-23 16:43:03 -05:00
mrq
7b1b82e0e5 inferencing cleanup 2023-08-20 21:36:02 -05:00
mrq
2d1a9f10c0 nightmare of spaghetti that might break compat; mechanism to increase RVQ bins of an existing model without retraining, keeps sampled proms/resps at max RVQ level and trim off excess levels according to what model receives them, some other things I already forgot (I really hope no one else has weights being baked right now) 2023-08-19 15:06:33 -05:00
mrq
77292c42f9 tested the training preparation for tasks ns, sr, and tse (I don't expect it to go well with only 2 RVQ bins) 2023-08-18 23:55:40 -05:00
mrq
bbb0563b3d pseudocode polyfill stub some other flavor of working on adding the tasks 2023-08-18 22:22:13 -05:00
mrq
fb4e816823 oops 2023-08-18 21:11:19 -05:00
mrq
d7deaf6def distributed training works now (hopefully) 2023-08-13 22:07:45 -05:00
mrq
608c1970eb ops 2023-08-03 20:36:19 -05:00
mrq
f6597e2dfe adjustments 2023-08-02 18:36:26 -05:00
mrq
bf8cedc9dd Rewrite init 2023-08-02 21:53:35 +00:00