forked from mrq/tortoise-tts

Commit Graph

  • 156bb5e7da Add files required for hifigan, including autoregressive.py modification main Jarod Mica 2023-11-26 21:51:28 -0800
  • 95f679f4ba possible fix for when candidates >= samples master mrq 2023-10-10 15:30:08 +0000
  • bf3b6c87aa added compat for coqui's XTTS mrq 2023-09-16 03:38:21 +0000
  • d7e6914fb8 Merge pull request 'main' (#47) from ken11o2/tortoise-tts:main into main mrq 2023-09-04 20:01:14 +0000
  • b7c7fd1c5f add arg use_deepspeed ken11o2 2023-09-04 19:14:53 +0000
  • 2478dc255e update TextToSpeech ken11o2 2023-09-04 19:13:45 +0000
  • 18adfaf785 add use_deepspeed to constructor and update method post_init_gpt2_config ken11o2 2023-09-04 19:12:13 +0000
  • ac97c17bf7 add use_deepspeed ken11o2 2023-09-04 19:10:27 +0000
  • b10c58436d pesky dot mrq 2023-08-20 22:41:55 -0500
  • cbd3c95c42 possible speedup with one simple trick (it worked for valle inferencing), also backported the voice list loading from aivc mrq 2023-08-20 22:32:01 -0500
  • 9afa71542b little sloppy hack to try and not load the same model when it was already loaded mrq 2023-08-11 04:02:36 +0000
  • e2cd07d560 Fix for redaction at end of text (#45) mrq 2023-06-10 21:16:21 +0000
  • 5ff00bf3bf added flags to revert to default method of latent generation (separately for the AR and Diffusion latents, as some voices don't play nicely with the chunk-for-all method) mrq 2023-05-21 01:46:55 +0000
  • c90ee7c529 removed kludgy wrappers for passing progress when I was a pythonlet and didn't know gradio can hook into tqdm outputs anyways mrq 2023-05-04 23:39:39 +0000
  • 086aad5b49 quick hotfix to remove offending codesmell (will actually clean it when I finish eating) mrq 2023-05-04 22:59:57 +0000
  • 04b7049811 freeze numpy to 1.23.5 because latest version will moan about deprecating complex mrq 2023-05-04 01:54:41 +0000
  • b6a213bbbd removed some CPU fallback wrappers because directml seems to work now without them mrq 2023-04-29 00:46:36 +0000
  • 2f7d9ab932 disable BNB for inferencing by default because I'm pretty sure it makes zero difference (can be force enabled with env vars if you're relying on this for some reason) mrq 2023-04-29 00:38:18 +0000
  • f025470d60 Merge pull request 'Update tortoise/utils/devices.py vram issue' (#44) from aJoe/tortoise-tts:main into main mrq 2023-04-12 19:58:02 +0000
  • eea4c68edc Update tortoise/utils/devices.py vram issue aJoe 2023-04-12 05:33:30 +0000
  • 815ae5d707 Merge pull request 'feat: support .flac voice files' (#43) from NtTestAlert/tortoise-tts:support_flac_voice into main mrq 2023-04-01 16:37:56 +0000
  • 2cd7b72688 feat: support .flac voice files NtTestAlert 2023-04-01 15:08:31 +0200
  • 0bcdf81d04 option to decouple sample batch size from CLVP candidate selection size (currently just unsqueezes the batches) mrq 2023-03-21 21:33:46 +0000
  • d1ad634ea9 added japanese preprocessor for tokenizer mrq 2023-03-17 20:03:02 +0000
  • af78e3978a deduce if preprocessing text by checking the JSON itself instead mrq 2023-03-16 14:41:04 +0000
  • e201746eeb added diffusion_model and tokenizer_json as arguments for settings editing mrq 2023-03-16 14:19:24 +0000
  • 1f674a468f added flag to disable preprocessing (because some IPAs will turn into ASCII, implicitly enable for using the specific ipa.json tokenizer vocab) mrq 2023-03-16 04:33:03 +0000
  • 42cb1f3674 added args for tokenizer and diffusion model (so I don't have to add it later) mrq 2023-03-15 00:30:28 +0000
  • 65a43deb9e why didn't I also have it use chunks for computing the AR conditional latents (instead of just the diffusion aspect) mrq 2023-03-14 01:13:49 +0000
  • 97cd58e7eb maybe solved that odd VRAM spike when doing the clvp pass mrq 2023-03-12 12:48:29 -0500
  • fec0685405 revert muh clean code mrq 2023-03-10 00:56:29 +0000
  • 0514f011ff how did I botch this, I don't think it affects anything since it never threw an error mrq 2023-03-09 22:36:12 +0000
  • 00be48670b i am very smart mrq 2023-03-09 02:06:44 +0000
  • bbeee40ab3 forgot to convert to gigabytes mrq 2023-03-09 00:51:13 +0000
  • 6410df569b expose VRAM easily mrq 2023-03-09 00:38:31 +0000
  • 3dd5cad324 reverting additional auto-suggested batch sizes, per mrq/ai-voice-cloning#87 proving it in fact, is not a good idea mrq 2023-03-07 19:38:02 +0000
  • cc36c0997c didn't get a chance to commit this this morning mrq 2023-03-07 15:43:09 +0000
  • fffea7fc03 unmarried the config.json to the bigvgan by downloading the right one mrq 2023-03-07 13:37:45 +0000
  • 26133c2031 do not reload AR/vocoder if already loaded mrq 2023-03-07 04:33:49 +0000
  • e2db36af60 added loading vocoders on the fly mrq 2023-03-07 02:44:09 +0000
  • 7b2aa51abc oops mrq 2023-03-06 21:32:20 +0000
  • 7f98727ad5 added option to specify autoregressive model at tts generation time (for a spicy feature later) mrq 2023-03-06 20:31:19 +0000
  • 6fcd8c604f moved bigvgan model to a huggingspace repo mrq 2023-03-05 19:47:22 +0000
  • 0f3261e071 you should have migrated by now, if anything breaks it's on (You) mrq 2023-03-05 14:03:18 +0000
  • 06bdf72b89 load the model on CPU because torch doesn't like loading models directly to GPU (it just follows the default vocoder loading behavior) mrq 2023-03-03 13:53:21 +0000
  • 2ba0e056cd attribution mrq 2023-03-03 06:45:35 +0000
  • aca32a71f7 added BigVGAN in place of default vocoder (credit to https://github.com/deviandice/tortoise-tts-BigVGAN) mrq 2023-03-03 06:30:58 +0000
  • a9de016230 added storing the loaded model's hash to the TTS object instead of relying on jerryrig injecting it (although I still have to for the weirdos who refuse to update the right way), added a parameter when loading voices to load a latent tagged with a model's hash so latents are per-model now mrq 2023-03-02 00:44:42 +0000
  • 7b839a4263 applied the bitsandbytes wrapper to tortoise inference (not sure if it matters) mrq 2023-02-28 01:42:10 +0000
  • 7cc0250a1a added more kill checks, since it only actually did it for the first iteration of a loop mrq 2023-02-24 23:10:04 +0000
  • de46cf7831 adding magically deleted files back (might have a hunch on what happened) mrq 2023-02-24 19:30:04 +0000
  • 2c7c02eb5c moved the old readme back, to align with how DLAS is setup, sorta mrq 2023-02-19 17:37:36 +0000
  • 34b232927e Oops mrq 2023-02-19 01:54:21 +0000
  • d8c6739820 added constructor argument and function to load a user-specified autoregressive model mrq 2023-02-18 14:08:45 +0000
  • 00cb19b6cf arg to skip voice latents for grabbing voice lists (for preparing datasets) mrq 2023-02-17 04:50:02 +0000
  • b255a77a05 updated notebooks to use the new "main" setup mrq 2023-02-17 03:31:19 +0000
  • 150138860c oops mrq 2023-02-17 01:46:38 +0000
  • 6ad3477bfd one more update mrq 2023-02-16 23:18:02 +0000
  • 413703b572 fixed colab to use the new repo, reorder loading tortoise before the web UI for people who don't wait mrq 2023-02-16 22:12:13 +0000
  • 30298b9ca3 fixing brain worms mrq 2023-02-16 21:36:49 +0000
  • d53edf540e pip-ifying things mrq 2023-02-16 19:48:06 +0000
  • d159346572 oops mrq 2023-02-16 13:23:07 +0000
  • eca61af016 actually for real fixed incrementing filenames because i had a regex that actually only worked if candidates or lines>1, cuda now takes priority over dml if you're a nut with both of them installed because you can just specify an override anyways mrq 2023-02-16 01:06:32 +0000
  • ec80ca632b added setting "device-override", less naively decide the number to use for results, some other thing mrq 2023-02-15 21:51:22 +0000
  • dcc5c140e6 fixes mrq 2023-02-15 15:33:08 +0000
  • 729b292515 oops x2 mrq 2023-02-15 05:57:42 +0000
  • 5bf98de301 oops mrq 2023-02-15 05:55:01 +0000
  • 3e8365fdec voicefixed files do not overwrite, as my autism wants to hear the difference between them, incrementing file format fixed for real mrq 2023-02-15 05:49:28 +0000
  • ea1bc770aa added option: force cpu for conditioning latents, for when you want low chunk counts but your GPU keeps OOMing because fuck fragmentation mrq 2023-02-15 05:01:40 +0000
  • b721e395b5 modified conversion scripts to not give a shit about bitrate and formats since torchaudio.load handles all of that anyways, and it all gets resampled anyways mrq 2023-02-15 04:44:14 +0000
  • 2e777e8a67 done away with kludgy shit code, just have the user decide how many chunks to slice concat'd samples to (since it actually does improve voice replicability) mrq 2023-02-15 04:39:31 +0000
  • 314feaeea1 added reset generation settings to default button, revamped utilities tab to double as plain jane voice importer (and runs through voicefixer despite it not really doing anything if your voice samples are already of decent quality anyways), ditched load_wav_to_torch or whatever it was called because it literally exists as torchaudio.load, sample voice is now a combined waveform of all your samples and will always return even if using a latents file mrq 2023-02-14 21:20:04 +0000
  • 0bc2c1f540 updates chunk size to the chunked tensor length, just in case mrq 2023-02-14 17:13:34 +0000
  • 48275899e8 added flag to enable/disable voicefixer using CUDA because I'll OOM on my 2060, changed from naively subdividing evenly (2,4,8,16 pieces) to just incrementing by 1 (1,2,3,4) when trying to subdivide within constraints of the max chunk size for computing voice latents mrq 2023-02-14 16:47:34 +0000
  • b648186691 history tab doesn't naively reuse the voice dir instead for results, experimental "divide total sound size until it fits under requested max chunk size" doesn't have a +1 to mess things up (need to re-evaluate how I want to calculate sizes of best fits eventually) mrq 2023-02-14 16:23:04 +0000
  • 47f4b5bf81 voicefixer uses CUDA if exposed mrq 2023-02-13 15:30:49 +0000
  • 8250a79b23 Implemented kv_cache "fix" (from 1f3c1b5f4a); guess I should find out why it's crashing DirectML backend mrq 2023-02-13 13:48:31 +0000
  • 80eeef01fb Merge pull request 'Download from Gradio' (#31) from Armored1065/tortoise-tts:main into main mrq 2023-02-13 13:30:09 +0000
  • 8c96aa02c5 Merge pull request 'Update 'README.md'' (#1) from armored1065-patch-1 into main Armored1065 2023-02-13 06:21:37 +0000
  • d458e932be Update 'README.md' Armored1065 2023-02-13 06:19:42 +0000
  • f92e432c8d added random voice option back because I forgot I accidentally removed it mrq 2023-02-13 04:57:06 +0000
  • a2bac3fb2c Fixed out of order settings causing other settings to flipflop mrq 2023-02-13 03:43:08 +0000
  • 5b5e32338c DirectML: fixed redaction/aligner by forcing it to stay on CPU mrq 2023-02-12 20:52:04 +0000
  • 824ad38cca fixed voicefixing not working as intended, load TTS before Gradio in the webui due to how long it takes to initialize tortoise (instead of just having a block to preload it) mrq 2023-02-12 20:05:59 +0000
  • 4d01bbd429 added button to recalculate voice latents, added experimental switch for computing voice latents mrq 2023-02-12 18:11:40 +0000
  • 88529fda43 fixed regression with computing conditional latents outside of the CPU mrq 2023-02-12 17:44:39 +0000
  • 65f74692a0 fixed silently crashing from enabling kv_cache-ing if using the DirectML backend, throw an error when reading a generated audio file that does not have any embedded metadata in it, cleaned up the blocks of code that would DMA/transfer tensors/models between GPU and CPU mrq 2023-02-12 14:46:21 +0000
  • 94757f5b41 install python3.9, wrapped try/catch when parsing args.listen in case you somehow manage to insert garbage into that field and fuck up your config, removed a very redundant setup.py install call since that only is required if you're just going to install it for using outside of the tortoise-tts folder mrq 2023-02-12 04:35:21 +0000
  • ddd0c4ccf8 cleanup loop, save files while generating a batch in the event it crashes midway through mrq 2023-02-12 01:15:22 +0000
  • 1b55730e67 fixed regression where the auto_conds do not move to the GPU and causes a problem during CVVP compare pass mrq 2023-02-11 20:34:12 +0000
  • 3d69274a46 Merge pull request 'Only directories in the voice list' (#20) from lightmare/tortoise-tts:only_dirs_in_voice_list into main mrq 2023-02-11 20:14:36 +0000
  • 13b60db29c Only directories in the voice list lightmare 2023-02-11 18:26:51 +0000
  • 3a8ce5a110 Moved experimental settings to main tab, hidden under a check box mrq 2023-02-11 17:21:08 +0000
  • 126f1a0afe sloppily guarantee stop/reloading TTS actually works mrq 2023-02-11 17:01:40 +0000
  • 6d06bcce05 Added candidate selection for outputs, hide output elements (except for the main one) to only show one progress bar mrq 2023-02-11 16:34:47 +0000
  • a7330164ab Added integration for "voicefixer", fixed issue where candidates>1 and lines>1 only outputs the last combined candidate, numbered step for each generation in progress, output time per generation step mrq 2023-02-11 15:02:11 +0000
  • 841754602e store generation time per generation rather than per entire request mrq 2023-02-11 13:00:39 +0000
  • 44eba62dc8 fixed using old output dir because of my autism with prefixing everything with "./" broke it, fixed incrementing filenames mrq 2023-02-11 12:39:16 +0000
  • 58e2b22b0e History tab (3/10 it works) mrq 2023-02-11 01:45:25 +0000
  • c924ebd034 Numbering predicates on input_#.json files instead of "number of wavs" mrq 2023-02-10 22:51:56 +0000
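
Commit 48275899e8 above describes switching the voice-latent chunking from doubling the piece count (2,4,8,16) to incrementing it by 1 (1,2,3,4) until each piece fits under the max chunk size. The following is a rough sketch of that idea only, not the repository's code: the function names, the use of sample counts, and the ceiling division are all assumptions made for illustration.

```python
import math


def pieces_by_increment(total_len: int, max_chunk: int) -> int:
    """Hypothetical helper: smallest piece count whose pieces fit max_chunk.

    Tries 1, 2, 3, 4, ... pieces, as the commit message describes.
    """
    if total_len <= 0 or max_chunk <= 0:
        raise ValueError("total_len and max_chunk must be positive")
    pieces = 1
    # Grow the piece count one at a time until a piece fits.
    while math.ceil(total_len / pieces) > max_chunk:
        pieces += 1
    return pieces


def pieces_by_doubling(total_len: int, max_chunk: int) -> int:
    """Hypothetical helper: the older, coarser search that doubles each step."""
    if total_len <= 0 or max_chunk <= 0:
        raise ValueError("total_len and max_chunk must be positive")
    pieces = 1
    # Doubling can overshoot and produce more, smaller pieces than needed.
    while math.ceil(total_len / pieces) > max_chunk:
        pieces *= 2
    return pieces
```

Under these assumptions, a waveform of 100 units with a max chunk of 20 splits into 5 pieces when incrementing but 8 when doubling, which illustrates why the finer search keeps chunks closer to the requested maximum.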