ai-voice-cloning

Author	SHA1	Message	Date
mrq	d7e75a51cf	I forgot about the changelog and never kept up with it, so I'll just not use a changelog	2023-03-08 05:14:50 +00:00
mrq	ff07f707cb	disable validation if validation dataset not found, clamp validation batch size to validation dataset size instead of simply reusing batch size, switch to adamw_zero optimizer when training with multi-gpus (because the yaml comment said to and I think it might be why I'm absolutely having garbage luck training this japanese dataset)	2023-03-08 04:47:05 +00:00
mrq	f1788a5639	lazy wrap around the voicefixer block because sometimes it just an heros itself despite having a specific block to load it beforehand	2023-03-08 04:12:22 +00:00
mrq	83b5125854	fixed notebooks, provided paperspace notebook	2023-03-08 03:29:12 +00:00
mrq	b4098dca73	made validation working (will document later)	2023-03-08 02:58:00 +00:00
mrq	a7e0dc9127	oops	2023-03-08 00:51:51 +00:00
mrq	e862169e7f	set validation to save rate and validation file if exists (need to test later)	2023-03-07 20:38:31 +00:00
mrq	fe8bf7a9d1	added helper script to cull short enough lines from training set as a validation set (if it yields good results doing validation during training, i'll add it to the web ui)	2023-03-07 20:16:49 +00:00
mrq	7f89e8058a	fixed update checker for dlas+tortoise-tts	2023-03-07 19:33:56 +00:00
mrq	6d7e143f53	added override for large training plots	2023-03-07 19:29:09 +00:00
mrq	3718e9d0fb	set NaN alarm to show the iteration it happened in	2023-03-07 19:22:11 +00:00
mrq	c27ee3ce95	added update checking for dlas and tortoise-tts, caching voices (for a given model and voice name) so random latents will remain the same	2023-03-07 17:04:45 +00:00
mrq	166d491a98	fixes	2023-03-07 13:40:41 +00:00
mrq	df5ba634c0	brain dead	2023-03-07 05:43:26 +00:00
mrq	2726d98ee1	fried my brain trying to nail out bugs involving using solely ar model=auto	2023-03-07 05:35:21 +00:00
mrq	d7a5ad9fd9	cleaned up some model loading logic, added 'auto' mode for AR model (deduced by current voice)	2023-03-07 04:34:39 +00:00
mrq	3899f9b4e3	added (yet another) experimental voice latent calculation mode (when chunk size is 0 and there's a dataset generated, it'll leverage it by padding to a common size then computing them, should help avoid splitting mid-phoneme)	2023-03-07 03:55:35 +00:00
mrq	5063728bb0	brain worms and headaches	2023-03-07 03:01:02 +00:00
mrq	0f31c34120	download dvae.pth for the people who managed to somehow put the web UI into a state where it never initializes TTS at all somehow	2023-03-07 02:47:10 +00:00
mrq	0f0b394445	moved (actually not working) setting to use BigVGAN to a dropdown to select between vocoders (for when slotting in future ones), and ability to load a new vocoder while TTS is loaded	2023-03-07 02:45:22 +00:00
mrq	e731b9ba84	reworked generating metadata to embed, should now store overridden settings	2023-03-06 23:07:16 +00:00
mrq	7798767fc6	added settings editing (will add a guide on what to do later, and an example)	2023-03-06 21:48:34 +00:00
mrq	119ac50c58	forgot to re-append the existing transcription when skipping existing (have to go back again and do the first 10% of my giant dataset)	2023-03-06 16:50:55 +00:00
mrq	da0af4c498	one more	2023-03-06 16:47:34 +00:00
mrq	11a1f6a00e	forgot to reorder the dependency install because whisperx needs to be installed before DLAS	2023-03-06 16:43:17 +00:00
mrq	12c51b6057	I'm not too sure if manually invoking gc actually closes all the open files from whisperx (or ROCm), but it seems to have gone away alongside setting 'ulimit -Sn' to half the output of 'ulimit -Hn'	2023-03-06 16:39:37 +00:00
mrq	999878d9c6	and it turned out I wasn't even using the aligned segments, kmsing now that I have to redo my dataset again	2023-03-06 11:01:33 +00:00
mrq	14779a5020	Added option to skip transcribing if it exists in the output text file, because apparently whisperx will throw a "max files opened" error when using ROCm because it does not close some file descriptors if you're batch-transcribing or something, so poor little me, who's retranscribing his japanese dataset for the 305823042th time woke up to it partially done i am so mad I have to wait another few hours for it to continue when I was hoping to wake up to it done	2023-03-06 10:47:06 +00:00
mrq	0e3bbc55f8	added api_name for generation, added whisperx backend, relocated use whispercpp option to whisper backend list	2023-03-06 05:21:33 +00:00
mrq	788a957f79	stretch loss plot to target iteration just so its not so misleading with the scale	2023-03-06 00:44:29 +00:00
mrq	5be14abc21	UI cleanup, actually fix syncing the epoch counter (i hope), setting auto-suggest voice chunk size whatever to 0 will just split based on the average duration length, signal when a NaN info value is detected (there's some safeties in the training, but it will inevitably fuck the model)	2023-03-05 23:55:27 +00:00
mrq	287738a338	(should) fix reported epoch metric desyncing from de facto metric, fixed finding next milestone from wrong sign because of 2AM brain	2023-03-05 20:42:45 +00:00
mrq	206a14fdbe	brainworms	2023-03-05 20:30:27 +00:00
mrq	b82961ba8a	typo	2023-03-05 20:13:39 +00:00
mrq	b2e89d8da3	oops	2023-03-05 19:58:15 +00:00
mrq	8094401a6d	print in e-notation for LR	2023-03-05 19:48:24 +00:00
mrq	8b9c9e1bbf	remove redundant stats, add showing LR	2023-03-05 18:53:12 +00:00
mrq	0231550287	forgot to remove a debug print	2023-03-05 18:27:16 +00:00
mrq	d97639e138	whispercpp actually works now (language loading was weird, slicing needed to divide time by 100), transcribing audio checks for silence and discards them	2023-03-05 17:54:36 +00:00
mrq	b8a620e8d7	actually accumulate derivatives when estimating milestones and final loss by using half of the log	2023-03-05 14:39:24 +00:00
mrq	35225a35da	oops v2	2023-03-05 14:19:41 +00:00
mrq	b5e9899bbf	5 hour sleep brained	2023-03-05 13:37:05 +00:00
mrq	cd8702ab0d	oops	2023-03-05 13:24:07 +00:00
mrq	d312019d05	reordered things so it uses fresh data and not last-updated data	2023-03-05 07:37:27 +00:00
mrq	ce3866d0cd	added '''estimating''' iterations until milestones (lr=[1, 0.5, 0.1] and final lr, very, very inaccurate because it uses instantaneous delta lr, I'll need to do a riemann sum later)	2023-03-05 06:45:07 +00:00
mrq	1316331be3	forgot to try and have it try and auto-detect for openai/whisper when no language is specified	2023-03-05 05:22:35 +00:00
mrq	3e220ed306	added option to set worker size in training config generator (because the default is overkill), for whisper transcriptions, load a specialized language model if it exists (for now, only english), output transcription to web UI when done transcribing	2023-03-05 05:17:19 +00:00
mrq	37cab14272	use torchrun instead for multigpu	2023-03-04 20:53:00 +00:00
mrq	5026d93ecd	sloppy fix to actually kill children when using multi-GPU distributed training, set GPU training count based on what CUDA exposes automatically so I don't have to keep setting it to 2	2023-03-04 20:42:54 +00:00
mrq	1a9d159b2a	forgot to add 'bs / gradient accum < 2' clamp validation logic	2023-03-04 17:37:08 +00:00


268 Commits