DL-Art-School

Author	SHA1	Message	Date
James Betker	8e26400ce2	Add inference for unified gpt	2021-12-24 13:27:06 -07:00
James Betker	ead2a74bf0	Add debug_failures flag	2021-12-23 16:12:16 -07:00
James Betker	9677f7084c	dataset mod	2021-12-23 15:21:30 -07:00
James Betker	8b19c37409	UnifiedGptVoice!	2021-12-23 15:20:26 -07:00
James Betker	5bc9772cb0	grand: support validation mode	2021-12-23 15:03:20 -07:00
James Betker	e55d949855	GrandConjoinedDataset	2021-12-23 14:32:33 -07:00
James Betker	b9de8a8eda	More fixes	2021-12-22 19:21:29 -07:00
James Betker	191e0130ee	Another fix	2021-12-22 18:30:50 -07:00
James Betker	6c6daa5795	Build a bigger, better tokenizer	2021-12-22 17:46:18 -07:00
James Betker	c737632eae	Train and use a bespoke tokenizer	2021-12-22 15:06:14 -07:00
James Betker	66bc60aeff	Re-add start_text_token	2021-12-22 14:10:35 -07:00
James Betker	a9629f7022	Try out using the GPT tokenizer rather than nv_tacotron. This results in a significant compression of the text domain; I'm curious what the effect on speech quality will be.	2021-12-22 14:03:18 -07:00
James Betker	ced81a760b	restore nv_tacotron	2021-12-22 13:48:53 -07:00
James Betker	7bf4f9f580	duplicate nvtacotron	2021-12-22 13:48:30 -07:00
James Betker	7ae7d423af	VoiceCLIP model	2021-12-22 13:44:11 -07:00
James Betker	09f7f3e615	Remove obsolete lucidrains DALLE stuff, re-create it in a dedicated folder	2021-12-22 13:44:02 -07:00
James Betker	a42b94ab72	gpt_tts_hf inference fixes	2021-12-22 13:22:15 -07:00
James Betker	48e3ee9a5b	Shuffle conditioning inputs along the positional axis to reduce fitting on prosody and other positional information. The mels should still retain some short-range positional information the model can use for tone and frequencies, for example.	2021-12-20 19:05:56 -07:00
James Betker	53858b2055	Fix gpt_tts_hf inference	2021-12-20 17:45:26 -07:00
James Betker	712d746e9b	gpt_tts: format conditioning inputs more for contextual voice clues and less for prosody; also support single conditional inputs	2021-12-19 17:42:29 -07:00
James Betker	c813befd53	Remove dedicated positioning embeddings	2021-12-19 09:01:31 -07:00
James Betker	b4ddcd7111	More inference improvements	2021-12-19 09:01:19 -07:00
James Betker	f9c45d70f0	Fix mel terminator	2021-12-18 17:18:06 -07:00
James Betker	937045cb63	Fixes	2021-12-18 16:45:38 -07:00
James Betker	9b9f7ea61b	GptTtsHf: Make the input/target placement easier to reason about	2021-12-17 10:24:14 -07:00
James Betker	2fb4213a3e	More lossy fixes	2021-12-17 10:01:42 -07:00
James Betker	dee34f096c	Add use_gpt_tts script	2021-12-16 23:28:54 -07:00
James Betker	9e8a9bf6ca	Various fixes to gpt_tts_hf	2021-12-16 23:28:44 -07:00
James Betker	62c8ed9a29	move speech utils	2021-12-16 20:47:37 -07:00
James Betker	e7957e4897	Make loss accumulator for logs accumulate better	2021-12-12 22:23:17 -07:00
James Betker	4f8c4d130c	gpt_tts_hf: pad mel tokens with an <end_of_sequence> token.	2021-12-12 20:04:50 -07:00
James Betker	76f86c0e47	gaussian_diffusion: support fp16	2021-12-12 19:52:21 -07:00
James Betker	aa7cfd1edf	Add support for mel norms across the channel dim	2021-12-12 19:52:08 -07:00
James Betker	8917c02a4d	gpt_tts_hf inference first pass	2021-12-12 19:51:44 -07:00
James Betker	63bf135b93	Support norms	2021-12-11 08:30:49 -07:00
James Betker	959979086d	fix	2021-12-11 08:18:00 -07:00
James Betker	5a664aa56e	misc	2021-12-11 08:17:26 -07:00
James Betker	d610540ce5	mel norm computation script	2021-12-11 08:16:50 -07:00
James Betker	306274245b	Also do dynamic range compression across mel	2021-12-10 20:06:24 -07:00
James Betker	faf55684b8	Use slaney norm in the mel filterbank computation	2021-12-10 20:04:52 -07:00
James Betker	b2d8fbcfc0	build a better speech synthesis toolset	2021-12-09 22:59:56 -07:00
James Betker	32cfcf3684	Turn off optimization in find_faulty_files	2021-12-09 09:02:09 -07:00
James Betker	a66a2bf91b	Update find_faulty_files	2021-12-09 09:00:00 -07:00
James Betker	9191201f05	asd	2021-12-07 09:55:39 -07:00
James Betker	ef15a39841	fix gdi bug?	2021-12-07 09:53:48 -07:00
James Betker	6ccff3f49f	Record codes more often	2021-12-07 09:22:45 -07:00
James Betker	d0b2f931bf	Add feature to diffusion vocoder where the spectrogram conditioning layers can be re-trained apart from the rest of the model	2021-12-07 09:22:30 -07:00
James Betker	662920bde3	Log codes when simply fetching codebook_indices	2021-12-06 09:21:43 -07:00
James Betker	380a5d5475	gdi..	2021-12-03 08:53:09 -07:00
James Betker	101a01f744	Fix dvae codes issue	2021-12-02 23:28:36 -07:00

1685 Commits