Bitsandbytes Support #1

Closed
opened 2023-02-22 23:23:39 +00:00 by mrq · 1 comment
Owner

Preamble

As a way to greatly reduce the required VRAM for training, I'm in the process of (lazily) implementing bitsandbytes as a branch: https://git.ecker.tech/mrq/DL-Art-School/src/branch/bitsandbytes.

The conversion is simple:

  • import bitsandbytes as bnb
  • replace nn.Embedding with bnb.nn.StableEmbedding (or bnb.nn.Embedding)
  • replace torch.optim.Adam with bnb.optim.Adam8bit (and the other Adam variants)
  • replace nn.Linear with bnb.nn.Linear8bitLt

And through the magic of quantization, VRAM usage should drop. I don't care about the performance uplifts, if there are any; I just want users to be able to train on a 3080.
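The substitutions above can be sketched as follows. This is a minimal stand-in, not the actual DL-Art-School code; it falls back to stock PyTorch when bitsandbytes isn't installed, so the shapes and learning rate here are illustrative only:

```python
import torch
import torch.nn as nn

try:
    import bitsandbytes as bnb  # python -m pip install bitsandbytes==0.35.0
    Embedding = bnb.nn.StableEmbedding  # or bnb.nn.Embedding
    Linear = bnb.nn.Linear8bitLt
    Adam = bnb.optim.Adam8bit
except ImportError:
    # Fall back to the stock modules when bitsandbytes is unavailable
    Embedding, Linear, Adam = nn.Embedding, nn.Linear, torch.optim.Adam

# Same call signatures as the torch originals, so it's a drop-in swap;
# the VRAM savings come from 8-bit optimizer state and int8 matmuls.
emb = Embedding(256, 64)
lin = Linear(64, 64)
opt = Adam(list(emb.parameters()) + list(lin.parameters()), lr=1e-4)
```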

Usage

To install bitsandbytes:

python -m pip install bitsandbytes==0.35.0

Additionally, on Windows, copy the files under ./bitsandbytes_windows/ into ai-voice-cloning/venv/Lib/site-packages/bitsandbytes/. I am not responsible for anything from those DLLs, as I've sourced them from here: https://github.com/kohya-ss/sd-scripts.

Errors

Using bnb.nn.Embedding, the following error occurs:

X:\programs\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
X:\programs\ai-voice-cloning\venv\lib\site-packages\bitsandbytes\autograd\_functions.py:231: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
  0%|                                                                                            | 0/6 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "X:\programs\ai-voice-cloning\src\train.py", line 62, in <module>
    train(args.opt, args.launcher)
  File "X:\programs\ai-voice-cloning\src\train.py", line 53, in train
    trainer.do_training()
  File "X:\programs\ai-voice-cloning\./dlas\codes\train.py", line 330, in do_training
    self.do_step(train_data)
  File "X:\programs\ai-voice-cloning\./dlas\codes\train.py", line 211, in do_step
    gradient_norms_dict = self.model.optimize_parameters(self.current_step, return_grad_norms=will_log)
  File "X:\programs\ai-voice-cloning\./dlas/codes\trainer\ExtensibleTrainer.py", line 303, in optimize_parameters
    ns = step.do_forward_backward(state, m, step_num, train=train_step, no_ddp_sync=(m+1 < self.batch_factor))
  File "X:\programs\ai-voice-cloning\./dlas/codes\trainer\steps.py", line 252, in do_forward_backward
    injected = inj(local_state)
  File "X:\programs\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "X:\programs\ai-voice-cloning\./dlas/codes\trainer\injectors\base_injectors.py", line 93, in forward
    results = method(*params, **self.args)
  File "X:\programs\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "X:\programs\ai-voice-cloning\venv\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "X:\programs\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "X:\programs\ai-voice-cloning\./dlas/codes\models\audio\tts\unified_voice2.py", line 418, in forward
    text_logits, mel_logits = self.get_logits(conds, text_emb, self.text_head, mel_emb, self.mel_head, get_attns=return_attentions, return_latent=return_latent)
  File "X:\programs\ai-voice-cloning\./dlas/codes\models\audio\tts\unified_voice2.py", line 363, in get_logits
    first_logits = first_head(first_logits)
  File "X:\programs\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "X:\programs\ai-voice-cloning\venv\lib\site-packages\bitsandbytes\nn\modules.py", line 260, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "X:\programs\ai-voice-cloning\venv\lib\site-packages\bitsandbytes\autograd\_functions.py", line 403, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "X:\programs\ai-voice-cloning\venv\lib\site-packages\bitsandbytes\autograd\_functions.py", line 235, in forward
    A = A.view(-1, A.shape[-1]).contiguous()
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
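The failure is .view() being called on a non-contiguous activation; as the error message itself suggests, .reshape() (or an explicit .contiguous() before flattening) sidesteps it. A minimal reproduction, independent of bitsandbytes and of the actual unified_voice2 tensors:

```python
import torch

# A transposed tensor shares storage with its source but is no longer
# contiguous, so .view() cannot reinterpret it without a copy.
a = torch.arange(24.0).reshape(2, 3, 4).transpose(0, 1)  # shape (3, 2, 4)

view_failed = False
try:
    a.view(-1, a.shape[-1])   # the same flatten bitsandbytes performs
except RuntimeError:
    view_failed = True        # "view size is not compatible ..."

# Either of these succeeds where .view() throws:
flat = a.reshape(-1, a.shape[-1])             # copies only when it must
flat2 = a.contiguous().view(-1, a.shape[-1])  # make it contiguous first
```

So one plausible workaround is ensuring the tensor fed into the 8-bit linear is contiguous before the call, rather than patching bitsandbytes itself.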

Using bnb.nn.StableEmbedding, a different error occurs:

23-02-22 17:18:16.516 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
Traceback (most recent call last):
  File "X:\programs\ai-voice-cloning\src\train.py", line 62, in <module>
    train(args.opt, args.launcher)
  File "X:\programs\ai-voice-cloning\src\train.py", line 52, in train
    trainer.init(yaml, opt, launcher)
  File "X:\programs\ai-voice-cloning\./dlas\codes\train.py", line 145, in init
    self.model = ExtensibleTrainer(opt)
  File "X:\programs\ai-voice-cloning\./dlas/codes\trainer\ExtensibleTrainer.py", line 189, in __init__
    self.load()  # load networks from save states as needed
  File "X:\programs\ai-voice-cloning\./dlas/codes\trainer\ExtensibleTrainer.py", line 536, in load
    self.load_network(load_path, net, self.opt['path']['strict_load'], opt_get(self.opt, ['path', f'pretrain_base_path_{name}']))
  File "X:\programs\ai-voice-cloning\./dlas/codes\trainer\base_model.py", line 131, in load_network
    network.load_state_dict(load_net_clean, strict=strict)
  File "X:\programs\ai-voice-cloning\venv\lib\site-packages\torch\nn\modules\module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UnifiedVoice:
        Missing key(s) in state_dict: "text_embedding.norm.weight", "text_embedding.norm.bias", "mel_embedding.norm.weight", "mel_embedding.norm.bias", "mel_pos_embedding.emb.norm.weight", "mel_pos_embedding.emb.norm.bias", "text_pos_embedding.emb.norm.weight", "text_pos_embedding.emb.norm.bias".
X:\programs\ai-voice-cloning>
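The missing keys are the LayerNorm that StableEmbedding carries on top of a plain nn.Embedding, so the pretrained autoregressive.pth simply has no *.norm.* entries. One plausible workaround (untested against DL-Art-School's loader, which takes strict from the YAML's path.strict_load) is loading non-strictly so the freshly initialized norm parameters survive. Sketched with a hypothetical stand-in module rather than the real UnifiedVoice:

```python
import torch
import torch.nn as nn

# Stand-in for StableEmbedding: an ordinary embedding plus a LayerNorm.
# A checkpoint saved against nn.Embedding has no norm.weight / norm.bias.
class StableEmbeddingLike(nn.Module):
    def __init__(self, num, dim):
        super().__init__()
        self.emb = nn.Embedding(num, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return self.norm(self.emb(x))

old = nn.Embedding(10, 8)            # what the checkpoint contains
new = StableEmbeddingLike(10, 8)
state = {"emb." + k: v for k, v in old.state_dict().items()}

# strict=False loads the matching keys and reports the norm.* ones as
# missing instead of raising, leaving them at their fresh initialization.
missing, unexpected = new.load_state_dict(state, strict=False)
```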

What Do??

I don't expect anyone to lend a hand (much less even see this, as I refuse to leave my sphere of influence), but I'm mostly just documenting my efforts and leaving a framework in case some wizard swoops in and fixes it.

I'm sure I'll finagle my way into getting it to work, but these are simply my initial barriers.

mrq added the
todo
bug
enhancement
help wanted
labels 2023-02-22 23:23:56 +00:00
Author
Owner

Nevermind, I'm a genius. I got training to work on 6GiB of VRAM.

mrq closed this issue 2023-02-23 02:39:36 +00:00
Reference: mrq/DL-Art-School#1