these should be the final things to scramble around for: DAC's 24KHz model is unusable for this, but both EnCodec's 24KHz and DAC's 44KHz models work
parent 2437a86efa
commit 230da8b559
@@ -143,7 +143,7 @@ For audio backends:
 * [`vocos`](https://huggingface.co/charactr/vocos-encodec-24khz): a higher quality EnCodec decoder.
   - encoding audio will use the `encodec` backend automagically, as there's no EnCodec encoder under `vocos`
 * [`descript-audio-codec`](https://github.com/descriptinc/descript-audio-codec): boasts better compression and quality
-  - **Note** models using `descript-audio-codec` at 24KHz + 6kbps will NOT converge. Unknown if 44KHz fares any better.
+  - **Note** models using `descript-audio-codec` at 24KHz + 8kbps will NOT converge. Audio encoded through the 44KHz model seems to work.
 
 `llama`-based models also support different attention backends:
 * `math`: torch's SDPA's `math` implementation
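For reference, here is a minimal sketch of encoding through the 44KHz `descript-audio-codec` model that the updated note points to, following the public API shown in the DAC README; the input file name and device are placeholder assumptions:

```python
import dac
from audiotools import AudioSignal

# sketch: use the 44KHz model, since per the note above the 24KHz model
# does not converge for this project
model_path = dac.utils.download(model_type="44khz")  # pretrained 44KHz weights
model = dac.DAC.load(model_path).to("cuda")

signal = AudioSignal("input.wav")  # placeholder input path
artifact = model.compress(signal)  # DACFile holding the quantized codes
artifact.save("input.dac")
```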
@@ -276,7 +276,7 @@ def encode(wav: Tensor, sr: int = cfg.sample_rate, device="cuda", levels=cfg.mod
     if not isinstance(levels, int):
         levels = 8 if model.model_type == "24khz" else None
 
-    with torch.autocast("cuda", dtype=torch.bfloat16, enabled=False): # or True for about 2x speed, not enabling by default for systems that do not have bfloat16
+    with torch.autocast("cuda", dtype=cfg.inference.dtype, enabled=cfg.inference.amp):
         artifact = model.compress(signal, win_duration=None, verbose=False, n_quantizers=levels)
 
     # trim to 8 codebooks if 24Khz
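The change above replaces the hardcoded `bfloat16` autocast with config-driven values. A minimal sketch of how such fields could be wired up; `InferenceConfig` and its defaults are illustrative assumptions, not the repo's actual config class:

```python
import torch
from dataclasses import dataclass

# hypothetical stand-in for the cfg.inference object the patched encode() reads:
# `dtype` picks the autocast precision, `amp` toggles mixed precision entirely
@dataclass
class InferenceConfig:
    amp: bool = False                    # off by default, matching the old enabled=False
    dtype: torch.dtype = torch.bfloat16  # swap for torch.float16 on GPUs without bf16

inference = InferenceConfig()

with torch.autocast("cuda", dtype=inference.dtype, enabled=inference.amp):
    pass  # e.g. artifact = model.compress(signal, n_quantizers=levels)
```

Keeping `amp` off by default preserves the old behavior on systems without `bfloat16`, while still letting users opt into the roughly 2x speedup the removed comment mentioned.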