8ed09f9b87 Merge pull request 'Catch OOM and run whisper on cpu automatically.' (#117) from zim33/ai-voice-cloning:vram into master (mrq, 2023-03-12 05:09:53 +0000)
61500107ab Catch OOM and run whisper on cpu automatically. (tigi6346, 2023-03-12 06:48:28 +0200)
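The pattern this commit describes is presumably a try/except around model loading; a minimal sketch, assuming the openai-whisper `load_model()` entry point (the repo's actual wrapper may differ):

```python
import torch
import whisper  # openai-whisper

def load_whisper_model(name="base", device="cuda"):
    """Try to load whisper on the GPU; fall back to CPU on an out-of-memory error."""
    try:
        return whisper.load_model(name, device=device)
    except RuntimeError as e:
        # a CUDA OOM surfaces as a RuntimeError whose message mentions "out of memory"
        if "out of memory" in str(e).lower():
            torch.cuda.empty_cache()
            print("CUDA OOM while loading whisper; falling back to CPU")
            return whisper.load_model(name, device="cpu")
        raise
```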
ede9804b76 added option to trim silence using torchaudio's VAD (mrq, 2023-03-11 21:41:35 +0000)
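For reference, `torchaudio.functional.vad()` only trims silence from the front of a recording, so trimming both ends usually means running it twice with a reversal in between; a minimal sketch of that pattern (not necessarily the repo's exact code):

```python
import torch
import torchaudio

def trim_silence(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Trim leading and trailing silence with torchaudio's VAD."""
    trimmed = torchaudio.functional.vad(waveform, sample_rate)      # trims the front
    trimmed = torchaudio.functional.vad(trimmed.flip(-1), sample_rate)  # reverse, trim the (former) end
    return trimmed.flip(-1)                                          # restore original order
```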
dea2fa9caf added fields to offset start/end slices, to apply in bulk when slicing (mrq, 2023-03-11 21:34:29 +0000)
89bb3d4419 rename transcribe button since it does more than transcribe (mrq, 2023-03-11 21:18:04 +0000)
382a3e4104 rely on the whisper.json for handling a lot more things (mrq, 2023-03-11 21:17:11 +0000)
94551fb9ac split the dataset slicing routine so it can be done after the fact (mrq, 2023-03-11 17:27:01 +0000)
e3fdb79b49 rocm5.2 works for me desu, so I bumped it back up (mrq, 2023-03-11 17:02:56 +0000)
e680d84a13 removed the hotfix pip installs that whisperx required, now that whisperx is gone (mrq, 2023-03-11 16:55:19 +0000)
cf41492f76 fall back to normal behavior if there are actually no audio files loaded from the dataset when using it for computing latents (mrq, 2023-03-11 16:46:03 +0000)
2424c455cb added option to not slice audio when transcribing, added option to prepare the validation dataset by audio duration, added a warning if you're using whisperx and you're slicing audio (mrq, 2023-03-11 16:32:35 +0000)
6ef5bae46a added cpu option for whisperx only (tigi6346, 2023-03-11 08:23:35 +0200)
008a1f5f8f simplified spawning the training process by having the train.py script spawn the distributed training processes itself, so it should work on Windows too (mrq, 2023-03-11 01:37:00 +0000)
2feb6da0c0 cleanups and fixes; fix DLAS throwing errors from '''too short of sound files''' by just culling them during transcription (mrq, 2023-03-11 01:19:49 +0000)
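The culling step likely amounts to a duration check per clip before it enters the dataset; a hedged sketch, where the function name and threshold are hypothetical:

```python
import torchaudio

MIN_DURATION = 0.6  # seconds; illustrative cutoff for clips too short for DLAS to train on

def keep_segment(path: str) -> bool:
    """Return False for sound files too short to keep in the training set."""
    info = torchaudio.info(path)
    duration = info.num_frames / info.sample_rate
    return duration >= MIN_DURATION
```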
7f2da0f5fb rewrote how AIVC gets training metrics (need to clean up later) (mrq, 2023-03-10 22:35:32 +0000)
df0edacc60 fix the cleanup actually only doing 2 despite requesting more than 2; surprised no one has pointed it out (mrq, 2023-03-10 14:04:07 +0000)
8e890d3023 forgot to fix reset settings to use the new arg-agnostic way (mrq, 2023-03-10 13:49:39 +0000)
0b364b590e maybe don't --force-reinstall to try and force downgrading; it just forces everything to uninstall and then reinstall (mrq, 2023-03-10 04:22:47 +0000)
c231d842aa make the dependencies after the one in this repo force-reinstall to downgrade, I hope; I have other things to do than validate this works (mrq, 2023-03-10 03:53:21 +0000)
c92b006129 I really hate YAML (mrq, 2023-03-10 03:48:46 +0000)
d3184004fd only God knows why the YAML spec lets you specify string values without quotes (mrq, 2023-03-10 01:58:30 +0000)
eb1551ee92 fixed what I thought was an override but was actually a ternary (mrq, 2023-03-09 23:04:02 +0000)
c3b43d2429 today I learned adamw_zero actually negates ANY LR schemes (mrq, 2023-03-09 19:42:31 +0000)
a01eb10960 (try to) unload voicefixer if it raises an error while loading voicefixer (mrq, 2023-03-09 04:28:14 +0000)
dc1902b91c cleaned up the block that makes embedding latents for random/microphone happen; remove builtin voice options from the voice list to avoid duplicates (mrq, 2023-03-09 04:23:36 +0000)
797882336b maybe remedy an issue that crops up if you have a non-wav and non-json file in a results folder (assuming) (mrq, 2023-03-09 04:06:07 +0000)
b64948d966 while I'm breaking things, migrating dependencies to a modules folder for tidiness (mrq, 2023-03-09 04:03:57 +0000)
b8867a5fb0 added the mysterious tortoise_compat flag mentioned in the DLAS repo (mrq, 2023-03-09 03:41:40 +0000)
3b4f4500d1 when you have three separate machines running and you test on one, but you accidentally revert changes because you then test on another (mrq, 2023-03-09 03:26:18 +0000)
ef75dba995 I hate that commas make tuples (mrq, 2023-03-09 02:43:05 +0000)
f795dd5c20 you might be wondering why so many small commits instead of rolling HEAD back one to just combine them: I don't want to force-push and roll back the paperspace I'm testing in (mrq, 2023-03-09 02:31:32 +0000)
1b18b3e335 forgot to save the simplified training input JSON first, before touching any of the settings that dump to the YAML (mrq, 2023-03-09 02:27:20 +0000)
221ac38b32 forgot to update to the finetune subdir (mrq, 2023-03-09 02:25:32 +0000)
0e80e311b0 added VRAM validation for a given batch:gradient-accumulation size ratio (based empirically off of 6GiB, 16GiB, and 16x2GiB; would be nice to have more data on what's safe) (mrq, 2023-03-09 02:08:06 +0000)
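The shape of this check is probably a sanity test on the batch-size-to-accumulation ratio against detected VRAM; a speculative sketch, where the function name and the ratio ceilings are placeholders (the commit says the real ones were derived empirically):

```python
import torch

def validate_batch_ratio(batch_size: int, gradient_accumulation: int) -> None:
    """Reject batch:gradient-accumulation ratios likely to OOM on this GPU."""
    if batch_size % gradient_accumulation != 0:
        raise ValueError("batch size should divide evenly by the gradient accumulation size")
    vram_gib = torch.cuda.get_device_properties(0).total_memory / (1024 ** 3)
    max_ratio = 8 if vram_gib <= 8 else 16  # hypothetical safe ceilings, not the repo's tuned values
    if batch_size // gradient_accumulation > max_ratio:
        raise ValueError(
            f"a ratio of {batch_size // gradient_accumulation} will likely OOM "
            f"on ~{vram_gib:.0f}GiB of VRAM"
        )
```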
d58b67004a colab notebook uses venv and the normal scripts to keep it on parity with a local install (and it literally just works, stop creating issues for something inconsistent with known solutions) (mrq, 2023-03-08 15:51:13 +0000)
34dcb845b5 actually make using the adamw_zero optimizer for multi-GPUs work (mrq, 2023-03-08 15:31:33 +0000)
8494628f3c normalize validation batch size because I OOM'd without it getting scaled (mrq, 2023-03-08 05:27:20 +0000)
d7e75a51cf I forgot about the changelog and never kept up with it, so I'll just not use a changelog (mrq, 2023-03-08 05:14:50 +0000)
ff07f707cb disable validation if the validation dataset is not found; clamp validation batch size to the validation dataset size instead of simply reusing the batch size; switch to the adamw_zero optimizer when training with multi-GPUs (because the YAML comment said to, and I think it might be why I'm absolutely having garbage luck training this Japanese dataset) (mrq, 2023-03-08 04:47:05 +0000)
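The disable-or-clamp logic probably looks something like the sketch below; the function name, return shape, and line-per-entry dataset format are assumptions for illustration:

```python
import os

def configure_validation(val_path: str, batch_size: int) -> dict:
    """Disable validation when the dataset file is missing; otherwise clamp
    the validation batch size to the dataset size instead of reusing batch_size."""
    if not os.path.exists(val_path):
        return {"enabled": False}
    with open(val_path, encoding="utf-8") as f:
        val_size = sum(1 for line in f if line.strip())  # one dataset entry per line (assumed)
    return {"enabled": True, "batch_size": min(batch_size, val_size)}
```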
f1788a5639 lazy wrap around the voicefixer block, because sometimes it just an-heros itself despite having a specific block to load it beforehand (mrq, 2023-03-08 04:12:22 +0000)
e862169e7f set validation to the save rate, and to the validation file if it exists (need to test later) (mrq, 2023-03-07 20:38:31 +0000)
fe8bf7a9d1 added helper script to cull short-enough lines from the training set as a validation set (if it yields good results doing validation during training, I'll add it to the web UI) (mrq, 2023-03-07 20:16:49 +0000)
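A hypothetical standalone version of that helper, assuming an LJSpeech-style pipe-delimited `train.txt` and an illustrative length cutoff (file names and threshold are not taken from the repo):

```python
MAX_VALIDATION_TEXT_LENGTH = 80  # characters; illustrative cutoff

def cull_validation_set(train_path="train.txt", val_path="validation.txt"):
    """Move entries whose transcription is short enough into a validation set."""
    keep, culled = [], []
    with open(train_path, encoding="utf-8") as f:
        for line in f:
            # assumed per-line format: audio/path.wav|transcription
            text = line.split("|", 1)[-1].strip()
            (culled if len(text) < MAX_VALIDATION_TEXT_LENGTH else keep).append(line)
    with open(train_path, "w", encoding="utf-8") as f:
        f.writelines(keep)
    with open(val_path, "w", encoding="utf-8") as f:
        f.writelines(culled)
```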
7f89e8058a fixed update checker for dlas+tortoise-tts (mrq, 2023-03-07 19:33:56 +0000)
6d7e143f53 added override for large training plots (mrq, 2023-03-07 19:29:09 +0000)
3718e9d0fb set NaN alarm to show the iteration it happened in (mrq, 2023-03-07 19:22:11 +0000)
c27ee3ce95 added update checking for dlas and tortoise-tts; cache voices (for a given model and voice name) so random latents will remain the same (mrq, 2023-03-07 17:04:45 +0000)
2726d98ee1 fried my brain trying to nail down bugs involving using solely AR model=auto (mrq, 2023-03-07 05:35:21 +0000)
d7a5ad9fd9 cleaned up some model loading logic; added 'auto' mode for the AR model (deduced by current voice) (mrq, 2023-03-07 04:34:39 +0000)
3899f9b4e3 added (yet another) experimental voice latent calculation mode (when chunk size is 0 and there's a dataset generated, it'll leverage it by padding to a common size and then computing them; should help avoid splitting mid-phoneme) (mrq, 2023-03-07 03:55:35 +0000)
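The padding step presumably looks like the sketch below: pad every dataset clip to the longest one's length so the latent pass sees whole utterances rather than arbitrary chunk boundaries (function name is hypothetical; zero-padding at the end is an assumption):

```python
import torch
import torch.nn.functional as F

def pad_to_common_size(clips: list[torch.Tensor]) -> list[torch.Tensor]:
    """Pad every clip to the length of the longest one before computing latents,
    avoiding the mid-phoneme splits that fixed-size chunking would introduce."""
    longest = max(clip.shape[-1] for clip in clips)
    return [F.pad(clip, (0, longest - clip.shape[-1])) for clip in clips]
```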
5063728bb0 brain worms and headaches (mrq, 2023-03-07 03:01:02 +0000)
0f31c34120 download dvae.pth for the people who somehow managed to put the web UI into a state where it never initializes TTS at all (mrq, 2023-03-07 02:47:10 +0000)
0f0b394445 moved the (actually not working) setting to use BigVGAN to a dropdown for selecting between vocoders (for when slotting in future ones), and added the ability to load a new vocoder while TTS is loaded (mrq, 2023-03-07 02:45:22 +0000)
e731b9ba84 reworked generating metadata to embed; should now store overridden settings (mrq, 2023-03-06 23:07:16 +0000)
7798767fc6 added settings editing (will add a guide on what to do later, and an example) (mrq, 2023-03-06 21:48:34 +0000)
119ac50c58 forgot to re-append the existing transcription when skipping existing (have to go back again and do the first 10% of my giant dataset) (mrq, 2023-03-06 16:50:55 +0000)
11a1f6a00e forgot to reorder the dependency install, because whisperx needs to be installed before DLAS (mrq, 2023-03-06 16:43:17 +0000)
12c51b6057 I'm not too sure if manually invoking gc actually closes all the open files from whisperx (or ROCm), but the problem seems to have gone away alongside setting 'ulimit -Sn' to half the output of 'ulimit -Hn' (mrq, 2023-03-06 16:39:37 +0000)
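That ulimit workaround can also be done from within Python on Unix via the standard `resource` module; a minimal sketch of the same adjustment:

```python
import resource

# raise the soft open-file limit to half the hard limit, mirroring
# `ulimit -Sn $(( $(ulimit -Hn) / 2 ))` (Unix/Linux only, e.g. under ROCm)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard // 2, hard))
```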
999878d9c6 and it turned out I wasn't even using the aligned segments; kmsing now that I have to *redo* my dataset again (mrq, 2023-03-06 11:01:33 +0000)
14779a5020 added option to skip transcribing if it exists in the output text file, because apparently whisperx will throw a "max files opened" error when using ROCm, since it does not close some file descriptors when batch-transcribing or something; so poor little me, who's retranscribing his Japanese dataset for the 305823042th time, woke up to it partially done; I am so mad I have to wait another few hours for it to continue when I was hoping to wake up to it done (mrq, 2023-03-06 10:47:06 +0000)
0e3bbc55f8 added api_name for generation, added the whisperx backend, relocated the use-whispercpp option to the whisper backend list (mrq, 2023-03-06 05:21:33 +0000)
5be14abc21 UI cleanup; actually fix syncing the epoch counter (I hope); setting the auto-suggested voice chunk size to 0 will just split based on the average duration length; signal when a NaN info value is detected (there are some safeties in the training, but it will inevitably fuck the model) (mrq, 2023-03-05 23:55:27 +0000)
287738a338 (should) fix the reported epoch metric desyncing from the de facto metric; fixed finding the next milestone from the wrong sign because of 2AM brain (mrq, 2023-03-05 20:42:45 +0000)