I was playing Ark Survival Ascended when my system locked up. No response from the mouse or keyboard, screen frozen, sound loop about 1 second long. I let it sit for a minute, thinking maybe it’ll break out of it, and eventually had to force the power off with the power button.
I restarted my system, and now my performance in games is really bad. I’m getting about 20 fps where I used to get 80-100, and sometimes it drops into the single digits. I also get stuttering sound and some pretty bad input lag. In Ark, I can see the textures slowly pop in over time, which normally happens in a matter of a second or two.
Looking at CoreCtrl, if I set it to high performance mode, the GPU’s power usage peaks around 150 Watts instead of 300+.
I’m running Nobara on a 7900X3D and an RX 7900XT with 32GB RAM.
Not sure how to go about diagnosing my issue here. I haven’t made any software changes, so I’m a little lost as to why this would happen.
Update: After trying everything suggested here, and all the googling I could manage, I ended up doing a full reinstall, and kept having issues. Eventually, I narrowed it down to the PCIE riser cable in my case (which I suppose I should have mentioned in the first place) which is supposed to be PCIE 4.0, but it seems to be what was causing my issues. I set my PCIE to 3.0 in the BIOS and everything is fine so far. I don’t notice any performance reduction at all, so it probably wasn’t saturating PCIE 4.0, but the riser isn’t good enough for it I guess.
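For anyone who wants to check the negotiated PCIe link before swapping hardware, here is a minimal Python sketch that reads the link attributes the kernel exposes in sysfs. It assumes the GPU is `card0` (it may be `card1` if the iGPU is enabled), and `link_report` and `parse_link_speed` are just names made up for this example. As far as I know, the link can also drop to a lower speed at idle to save power, so a low reading is only meaningful under load, and a marginal riser may show up as errors in the kernel log rather than as a lower speed.

```python
from pathlib import Path

# Per-lane transfer rate (GT/s) for each PCIe generation.
GEN_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def parse_link_speed(text):
    """Turn a sysfs value such as '16.0 GT/s PCIe' into a generation number.

    Returns None if the value cannot be parsed or the rate is not in the table.
    """
    try:
        rate = float(text.split()[0])
    except (ValueError, IndexError):
        return None
    return GEN_BY_RATE.get(rate)


def link_report(device_dir):
    """Read current and maximum link speed and width for one PCI device."""
    dev = Path(device_dir)

    def read(name):
        return (dev / name).read_text().strip()

    return {
        "current_gen": parse_link_speed(read("current_link_speed")),
        "max_gen": parse_link_speed(read("max_link_speed")),
        "current_width": read("current_link_width"),
        "max_width": read("max_link_width"),
    }


if __name__ == "__main__":
    # Assumption: card0 is the discrete GPU.
    gpu_dir = Path("/sys/class/drm/card0/device")
    try:
        print(link_report(gpu_dir))
    except OSError as exc:
        print(f"could not read link info from {gpu_dir}: {exc}")
```

After forcing PCIe 3.0 in the BIOS, `max_gen` for the slot should come back as 3 instead of 4.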
Probably nothing (because your gpu has some power spikes, just not hitting max power), but I’d make sure the integrated gpu in the bios is turned off; it’s possible something happened when playing, and the bios reverted to selecting the igpu on your 7900x3d. When I first booted my 7800x3d this was occurring, and I fixed it by turning it off in the bios.
I would wager on a cooling issue
As far as I can tell, temps are not the issue. The CPU doesn’t appear to go over 70C, and the GPU rarely goes over that as well. The junction temps get pretty hot, but stay under 100.
The performance is also pretty bad immediately after booting or waking up when it hasn’t even had time to heat up. And until this issue happened, everything was running fine for months.
It would not surprise me in the least to find out a big heatmonster like those X3D chips will hit throttling temps at idle if the CPU fans stop spinning. Probably within seconds of booting. Can you check the actual clock speeds of the cores at idle/load? See if you’re getting anywhere close to your 5.whatever GHz.
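If you would rather not rely on a GUI for the clock check, a rough Python sketch like this pulls the per-core `cpu MHz` values out of `/proc/cpuinfo` (that field is present on x86 Linux; the function names are made up for the example). Run it once at idle and once with a game or benchmark going and compare.

```python
import re
from pathlib import Path


def core_clocks_mhz(cpuinfo_text):
    """Extract the per-core 'cpu MHz' values from /proc/cpuinfo text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.M)
    return [float(m) for m in matches]


def summarize(clocks_mhz):
    """Return (min, max, mean) in GHz, or None if nothing was parsed."""
    if not clocks_mhz:
        return None
    ghz = [c / 1000 for c in clocks_mhz]
    return min(ghz), max(ghz), sum(ghz) / len(ghz)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(summarize(core_clocks_mhz(path.read_text())))
```

A maximum that stays far below the advertised boost clock under load would point toward throttling.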
I’m running a 240mm AIO and the fans are working fine, and CoreCtrl is reporting 5.0 to 5.1 GHz
It’s idling at about 51C, which I think is pretty decent for this chip. When I first put the PC together, it was idling around 60, but I think the waterblock and thermal paste settled, and the temps stabilized.
This happened to me once when an AIO pump failed on me but it sounds like your temps are fine.
Fair enough. Not temps then.
Are all your fans working properly? It might not manifest as a temperature issue if it can throttle sufficiently.
I’d lean this way too.
Had similar problems when a fan stopped on my CPU.
This one, any mention of temps is conspicuously absent. OP should check them immediately.
Now that you mentioned it, I’ve had VERY similar issues on an old machine which had some cooling issues (it’s a laptop, what did you expect?). So I’d wager you’re right.
Just that game or all games performing poorly?
Just that game: verify or reinstall the game.
Other games too: if it’s a software issue, check the graphics driver. For potential hardware issues, check if a GPU power cable is loose, reseat it, and make sure you don’t have two pcie power connectors from the same cable connected to both ports.
and make sure you don’t have two pcie power connectors from the same cable connected to both ports.
Why? Non question and I’m curious. I have 1 cable from PSU to GPU
The cable is only rated for so much power, which could be too little for two 8-pin connectors. Having two separate cables ensures it can handle the load.
I personally used a single cable that carried double 8 pin. At some point, I had issues with my system crashing under GPU load. After investigating, I found that the GPU wasn’t getting the power it needed from the PSU. Looking at the cable, it had started to melt the plastic with the connector in the PSU. I replaced the cable and it was fine, but now I only use one cable per 8 pin connector.
Thanks! I didn’t know that.
Based on the ATX standard (https://xdevs.com/doc/Standards/ATX/ATX12V_Power_Supply_Design_Guide_Rev1.1.pdf), each 12V circuit from the PSU should deliver around 150W and definitely not more than 240VA (the rating at which over-current protection kicks in). One cord runs on that single circuit, so it can’t deliver more than that much power. I experienced this when I foolishly thought I could reduce cable clutter while building my friends’ PC, only to find that the 3080 ran terribly and was drawing about 150W, not unlike OP, except in my case it was clear this was the issue.
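To put rough numbers on that, here is a small Python sketch of the arithmetic. The 240 VA limit is the figure quoted above, 150 W per 8-pin connector and 75 W from the slot are the usual PCIe ratings, and the even split across cables with slot power ignored is a simplifying assumption of mine. Whether a particular PSU actually enforces a per-cable limit is also an assumption; the function names are made up.

```python
RAIL_VOLTS = 12.0
OCP_LIMIT_VA = 240.0      # per-circuit limit quoted from the design guide above
EIGHT_PIN_WATTS = 150.0   # rating of one 8-pin PCIe power connector
SLOT_WATTS = 75.0         # what a PCIe x16 slot may supply


def connector_budget_watts(eight_pins):
    """Rated power available from the slot plus N 8-pin connectors."""
    return SLOT_WATTS + eight_pins * EIGHT_PIN_WATTS


def cable_amps(gpu_watts, cables):
    """Current per cable if the draw is split evenly (slot power ignored)."""
    return gpu_watts / cables / RAIL_VOLTS


def exceeds_ocp(gpu_watts, cables):
    """True if one cable would carry more than the quoted 240 VA limit."""
    return cable_amps(gpu_watts, cables) * RAIL_VOLTS > OCP_LIMIT_VA


if __name__ == "__main__":
    for cables in (1, 2):
        print(cables, cable_amps(300, cables), exceeds_ocp(300, cables))
```

With a 300 W card, a single daisy-chained cable would have to carry 25 A, which is over the 20 A that 240 VA works out to at 12 V, while two cables carry 12.5 A each.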
Thanks! I didn’t know that.
Stuttering and texture pop-in makes me immediately wonder if your SSD shit itself.
Maybe see if there’s anything in the system logs and/or SMART data that indicates that might be a problem?
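For the SMART side, smartctl can emit JSON, which is easier to skim than the full text report. This Python sketch assumes an NVMe drive at `/dev/nvme0` and uses the field names as I remember them from smartctl's JSON output for NVMe devices, so treat both as assumptions and check them against your own output. `nvme_health` and `read_report` are names made up for the example.

```python
import json
import subprocess


def nvme_health(report):
    """Pick the main health fields out of a parsed smartctl JSON report."""
    log = report.get("nvme_smart_health_information_log", {})
    return {
        "passed": report.get("smart_status", {}).get("passed"),
        "critical_warning": log.get("critical_warning"),
        "available_spare": log.get("available_spare"),
        "percentage_used": log.get("percentage_used"),
        "media_errors": log.get("media_errors"),
    }


def read_report(device="/dev/nvme0"):
    """Run smartctl (usually needs root) and parse its JSON output."""
    # smartctl sets exit-status bits to flag problems, so a non-zero
    # return code is not treated as a failure here.
    out = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
    )
    return json.loads(out.stdout)
```

Calling `nvme_health(read_report())` as root should give a short dictionary; a non-zero `critical_warning` or `media_errors` would be the thing to worry about.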
The SSD seems healthy as far as smartctl is concerned. No errors or warnings anywhere, no spare storage used, it’s only about half full currently.
I’m not sure what logs I’m looking for otherwise.
have you tried a fsck?
SSDs die suddenly.
Don’t forget to update us please in case you figure something out :)
I didn’t get a chance to look into it more, but I will update when I do.
Benchmark both cpu and gpu. It might be one of them failing. Check temperatures with the benchmark as well. Also test memory. I’m going to assume it’s throttling or your gpu is having issues. CPU would be obvious because it wouldn’t just be gaming that’s messing up. Normally if it’s only under heavy load it’s because of high usage applying a stress somewhere that’s forcing symptoms of throttling etc.
Looking at the system journal using
journalctl
is always a good start. Go to the part of the log that covers the time of the incident and see if there’s anything worthy of your attention, likely highlighted as a warning (yellow) or an error (red).
This is exactly where I would start, and to add to this good advice, I would recommend using the -f flag and letting it sit in the background.
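Since the lockup ended in a forced power-off, the interesting entries are probably in the previous boot's journal. A small Python sketch of how to narrow it down follows; it only builds and runs a journalctl command with the standard `--boot`, `--priority` and `--dmesg` options. It assumes the journal is persistent (otherwise earlier boots are not kept), and `journal_cmd` is a name made up for the example.

```python
import subprocess


def journal_cmd(boot=-1, priority="warning", kernel_only=False):
    """Build a journalctl command for one boot, filtered by priority.

    boot=-1 is the previous boot, boot=0 the current one. The priority
    filter keeps messages at that level and anything more severe.
    """
    cmd = ["journalctl", "--no-pager", f"--boot={boot}", f"--priority={priority}"]
    if kernel_only:
        cmd.append("--dmesg")  # kernel messages only
    return cmd


def run(cmd):
    """Run the command and return its output as a list of lines."""
    out = subprocess.run(cmd, capture_output=True, text=True)
    return out.stdout.splitlines()
```

For example, `run(journal_cmd())[-50:]` gives the last 50 warnings and errors before the crash, and `journal_cmd(boot=0, kernel_only=True)` covers the kernel messages from the current boot.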
I don’t think this is your specific issue but I’m sharing just in case.
Once I had a similar problem and the root cause was basically that in the course of unplugging all USB shit just in case, I replugged my VR headset in a different port. That caused the entire system to become very unresponsive and the logs were not helping at all. Maybe you left a bad USB plugged in from something? Probably not but it’s free to check.
You should look in dmesg, it’s always a mess but maybe your issue appears there.
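To cut through the mess, a rough Python sketch like this filters the kernel log down to lines that mention the GPU driver, the PCIe ports, storage or USB. The keyword list is purely my guess at what might be relevant here, so adjust it freely. Reading dmesg may need root depending on the `kernel.dmesg_restrict` setting.

```python
import re
import subprocess

# Assumed keywords; tune these to the hardware you suspect.
PATTERN = re.compile(r"amdgpu|pcieport|\bAER\b|nvme|xhci", re.IGNORECASE)


def interesting(lines):
    """Keep only the log lines that match one of the keywords."""
    return [line for line in lines if PATTERN.search(line)]


def kernel_log():
    """Return the current kernel ring buffer as a list of lines."""
    out = subprocess.run(["dmesg"], capture_output=True, text=True)
    return out.stdout.splitlines()
```

Usage would be something like `for line in interesting(kernel_log()): print(line)`.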
Could be one of the 8pins to the GPU is not seated.