Why did Immich stop working? A mini postmortem for self-hosted services.

Immich stopped working with errors saying Redis was failing. I didn't know Redis could fail! What's the in-memory database doing?

Alec Di Vito

On July 9th, 2024, the photo management software I self-host (Immich) stopped working. I was getting an error in the API logs about something not working with Redis. The error in particular was the following:

MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error

OK, the error is with Redis, so let's check the logs it's reporting:

1:M 10 Jul 2024 22:39:04.073 * 10000 changes in 60 seconds. Saving...
1:M 10 Jul 2024 22:39:04.075 * Background saving started by pid 213
213:C 10 Jul 2024 22:39:04.138 # Write error while saving DB to the disk(fsync): Disk quota exceeded
1:M 10 Jul 2024 22:39:04.175 # Background saving error
1:M 10 Jul 2024 22:39:10.033 * 10000 changes in 60 seconds. Saving...
1:M 10 Jul 2024 22:39:10.034 * Background saving started by pid 214
214:C 10 Jul 2024 22:39:10.100 # Write error while saving DB to the disk(fsync): Disk quota exceeded

Ah, OK. It seems Redis can't save its snapshot file, and the error it reports is that the disk quota has been exceeded. I configured the PVC in Kubernetes to be 30GB, and I would be really impressed if the Redis database were that big; I have some data, but nowhere near enough for a Redis database to grow that large. Still, you can check whether a filesystem has run out of inodes by running the df -i command in the container. Let's try that.

$ df -i
Filesystem                                 Inodes  IUsed    IFree IUse% Mounted on
overlay                                  30276800 600293 29676507    2% /
tmpfs                                      999296     17   999279    1% /dev
10.0.0.200:/volume1/nfs-volume/redis-pvc        0      0        0     - /data
/dev/sda2                                30276800 600293 29676507    2% /etc/hosts
shm                                        999296      1   999295    1% /dev/shm
tmpfs                                      999296      9   999287    1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                      999296      1   999295    1% /proc/asound
tmpfs                                      999296      1   999295    1% /proc/scsi
tmpfs                                      999296      1   999295    1% /sys/firmware

As you can see from the result, I'm nowhere near maxing out inodes on this pod, and the NFS mount at /data doesn't even report them. On top of that, the data is being saved to a Synology NFS box on my network. The Synology box has a max capacity of 2 TB, and I know for a fact that only 1 TB of that is used. So why am I getting a disk quota exceeded error?
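Since df -i only counts inodes, it can't rule out a full disk on its own. I'm also not sure whether a Synology shared-folder quota shows up in df over NFS at all (it may report the whole volume instead), so a clean result isn't proof that writes will succeed. With that caveat, here is a small sketch that checks both space and inode usage for one mount point. The helper names usage_pct and warn_if_full and the threshold argument are hypothetical choices of mine, not part of any tool; /data is the mount point from the output above.

```shell
#!/bin/sh
# Sketch: check both space and inode usage for a mount point.
# usage_pct and warn_if_full are hypothetical helper names, not part of any tool.

# usage_pct KIND DIR: print the Use% (KIND=space) or IUse% (KIND=inodes) of the
# filesystem holding DIR, without the % sign. Prints "-" if the filesystem
# doesn't report it (the NFS mount above showed "-" for IUse%).
usage_pct() {
  _kind="$1"; _dir="$2"
  case "$_kind" in
    space)  _flag="-k" ;;
    inodes) _flag="-i" ;;
    *) echo "unknown kind: $_kind" >&2; return 2 ;;
  esac
  _out=$(df -P "$_flag" "$_dir") || return 1
  # -P keeps each filesystem on one line; line 2 is the filesystem holding DIR.
  printf '%s\n' "$_out" | awk 'NR == 2 { v = $5; sub(/%$/, "", v); print v }'
}

# warn_if_full DIR THRESHOLD: print both usages; return 1 if either is at or
# above THRESHOLD percent, 2 if df failed, 0 otherwise.
warn_if_full() {
  _wdir="$1"; _limit="$2"; _status=0
  for _k in space inodes; do
    _pct=$(usage_pct "$_k" "$_wdir") || return 2
    case "$_pct" in
      ''|*[!0-9]*) echo "$_k: not reported for $_wdir" ;;
      *)
        if [ "$_pct" -ge "$_limit" ]; then
          echo "$_k: ${_pct}% used on $_wdir (at or above ${_limit}%)"
          _status=1
        else
          echo "$_k: ${_pct}% used on $_wdir"
        fi
        ;;
    esac
  done
  return "$_status"
}
```

Inside the pod, warn_if_full /data 90 would print both numbers and return non-zero if either one is at 90% or more.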

Looking online, I found that the quick fix for this issue is to tell Redis to keep accepting writes even when a background save fails, as described in this gist on GitHub.

$ redis-cli
> config set stop-writes-on-bgsave-error no

Redis calls this option out in its config file as well, and there's even a Q&A about it on one of their pages, as seen here.
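Two things worth knowing about that quick fix. First, CONFIG SET only changes the running instance, so the setting is lost when Redis restarts unless it's persisted, for example with CONFIG REWRITE (which only works when Redis was started with a config file) or by passing --stop-writes-on-bgsave-error no to redis-server. Second, it doesn't stop snapshots: Redis keeps trying to save and simply keeps accepting writes when that fails (CONFIG SET save "" is what turns RDB snapshots off). To see whether saves are actually failing, here is a minimal sketch that pulls one field out of the persistence section of INFO. The helper name info_field is hypothetical; the field names are ones Redis reports in INFO persistence.

```shell
#!/bin/sh
# Sketch: pull one field out of `redis-cli INFO persistence` output.
# info_field is a hypothetical helper name. INFO replies are "name:value"
# lines terminated by CRLF, so the trailing \r is stripped before matching.
info_field() {
  awk -v key="$1" '
    { sub(/\r$/, "") }
    index($0, key ":") == 1 { print substr($0, length(key) + 2); exit }
  '
}

# Against a live instance (needs redis-cli and access to the Redis pod):
#   redis-cli INFO persistence | info_field rdb_last_bgsave_status   # "err" after a failed save
#   redis-cli CONFIG GET stop-writes-on-bgsave-error                 # "yes" is the default
#   redis-cli CONFIG GET save                                        # the snapshot schedule
```

While the share was full, rdb_last_bgsave_status would be expected to read err, which is the same condition the MISCONF error complains about.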

Debugging the issue

Restarting Redis or Immich didn't solve the issue. Stumped that turning it off and on again didn't fix anything, I decided to see if I could replicate the issue. Because it's a homelab and not a bank, I did a good old

$ kubectl exec redis -it -- bash

Then I went to the directory mounted from the file share (/data) and tried to write "hello world" to a new file.

$ cd /data
$ echo "hello world" > example.file
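The echo test is a good quick check. A slightly more thorough probe also forces an fsync: on NFS, write errors may only surface once buffered data is flushed, and that is exactly where Redis hit the quota ("Write error while saving DB to the disk(fsync)"). Here is a minimal sketch, assuming GNU or BusyBox dd and mktemp are available in the container; probe_write and the 1 MiB size are hypothetical choices of mine.

```shell
#!/bin/sh
# Sketch: check whether a directory really accepts writes.
# probe_write is a hypothetical helper name. It writes 1 MiB with conv=fsync so
# the data is flushed to the server, similar to the fsync Redis does when
# saving its RDB file, then deletes the file again.
probe_write() {
  probe_dir="$1"
  # Creating the file can already fail when the share is full.
  probe_file=$(mktemp "$probe_dir/.write-probe.XXXXXX") || return 1
  if probe_err=$(dd if=/dev/zero of="$probe_file" bs=1024 count=1024 conv=fsync 2>&1); then
    rm -f "$probe_file"
    echo "ok: $probe_dir accepted a 1 MiB write with fsync"
    return 0
  fi
  rm -f "$probe_file"
  echo "FAIL: $probe_dir: $probe_err" >&2
  return 1
}
```

Pasted into the shell inside the pod, probe_write /data should report the quota error while the share is full and print "ok" once there is room again.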

To my surprise, this produced the same error I had seen before, saying that I had exceeded my share's quota. I wasn't sure whether Synology could set file system limits, but the investigative journalist inside me decided to check it out.

I logged into Synology, went to the shared folder I use for my Kubernetes cluster, and lo and behold, this is what I saw...

[Screenshot: Synology shared folder quota settings, taken 2024-07-10]

Well, not exactly this, because this was taken while I was fixing the issue, but you gotta believe me: it showed that I had used 256GB of my 256GB quota.

Lessons learned

Synology lets you set a quota on how much storage a shared folder can use. I must have set this up about 2 years ago and never touched it again. The issue is now solved, and my Immich uploads are (partially) working again 🙃
