Very Bad Training Time & Results. #403

Open
opened 2023-10-05 23:01:55 +07:00 by FortermalGreek · 2 comments

I tried to train a model using a 7-minute audio file: 500 epochs, with a batch size of 110 and a gradient accumulation size of 110.

The estimated wait time was 3 days, which is really weird: I have seen others with similar specs to my PC whose training only took a few hours, or even a few minutes.

I ended up waiting the 3 days, and the resulting model was extremely bad.

I played a lot with the fine-tuning settings, and it stayed really bad. I reinstalled the software multiple times and even factory-reset my PC; nothing changed.

Google Colab seems to work fine, but Colab is by nature really slow: it takes 11 hours for the same dataset (ironically, still way faster). The results seemed somewhat good on Colab.

Is there any other fork of tortoise that may work, or what else should I do?
Any help is appreciated, here are my specs:

  • Intel Core i5-9600K @ 3.70GHz
  • 16GB RAM
  • RTX 3070
  • Windows 10

This might be related to #399, where using the "latest" drivers is actually a detriment when training close to max VRAM usage. I'd suggest either:

  • downgrading your drivers
  • reducing your batch size
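As a back-of-the-envelope sketch of why reducing batch size matters here: a 7-minute dataset sliced into short clips yields on the order of a hundred samples, so a batch size of 110 combined with a gradient accumulation size of 110 asks for far more samples per optimizer step than the dataset even contains. The clip length below is an assumption for illustration, not the actual DLAS slicing logic:

```python
# Illustrative sanity check (assumed ~5-second average clips;
# actual counts depend on how the transcription slices the audio).
dataset_minutes = 7
avg_clip_seconds = 5
samples = dataset_minutes * 60 // avg_clip_seconds

batch_size = 110
grad_accum = 110
effective_batch = batch_size * grad_accum  # samples consumed per optimizer step

print(samples)          # ~84 clips in the whole dataset
print(effective_batch)  # 12100 -- orders of magnitude larger than the dataset
```

With numbers like these, dropping the batch size (and gradient accumulation) to something well below the dataset size is the usual first fix, independent of any driver issue.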

@mrq I tried downgrading to NVIDIA driver 531.79 and played a lot with the fine-tuning; I'm still really unhappy with the wait time.

I also tried using DL Art School (from GitHub) and the wait time was unusually long there too, so the issue must have to do with DLAS. Are there any alternatives to DLAS, or what else should I do? :c

Reference: mrq/ai-voice-cloning#403