How I resolved MySQL restart errors after ‘Upgrade’ in Kubernetes

Upgrading a database has always been a slippery slope. Personally, once I set up a database, I never upgrade it unless it is for security reasons, which I have yet to come across. I prefer keeping things simple with solutions that work, instead of introducing the complexity of upgrades.

However, as the world works, I found myself in a situation where I did not do an upgrade, yet my database was upgraded… Here’s what happened.

I was playing around with my Kubernetes cluster, a simple k3s installation. I wanted to migrate it from one server to another, which I had acquired at a good price on Black Friday. I started working on it at around 8pm, after getting home from work and all that. I was tired. So my grand plan was to connect the new server to the old one in a master-master relationship, then bump the storage replicas to 2, so that Kubernetes itself would create a new replica and copy the data over to it. Everything worked well and the copy happened, but then, out of a scarcity of wisdom, I immediately stopped the old server and disconnected it from the cluster.

I was supposed to first cordon it, then migrate the etcd database to the new server, then shut off the old server and make sure everything’s alright before finally removing the old server from the cluster. But as I said, I was tired.
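For reference, that safer sequence would have looked roughly like the sketch below. The node name is a placeholder, and the etcd and storage handover depends on how your k3s cluster is set up, so treat this as an outline rather than a recipe.

# Stop new workloads from being scheduled onto the old server
kubectl cordon old-server

# Move running pods off it (flags depend on what the node runs)
kubectl drain old-server --ignore-daemonsets --delete-emptydir-data

# ...verify etcd and the storage replicas are healthy on the new server...

# Only then remove the old server from the cluster
kubectl delete node old-server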

The result was that my cluster went down, and the storage went down and somehow got corrupted. When I managed to recover the cluster and get it running, my database storage refused to mount. Instead of solving that problem, I made a bigger mistake: I deleted the database storage so I could try to restore it from backups. Unfortunately, my backups failed to restore. I tried 12 times over 72 hours but could not restore. The disk was in a faulted status!

Luckily, I had a 3-month-old backup I had used to migrate from another cluster some time ago, and this one restored. But a new challenge emerged. The backup that restored successfully was made using MySQL 8.0.28, while my current StatefulSet database installation ran MySQL 8.0.30. So MySQL kept refusing to start due to ‘issues with this upgrade’.

Well, here’s how I resolved it:

1. First, I enabled debug mode on my Bitnami MySQL Helm chart by setting this environment variable:

env:
  - name: BITNAMI_DEBUG
    value: 'true'
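If editing the chart values feels heavy-handed, a quicker (if less tidy) alternative is to patch the environment with kubectl. The StatefulSet and pod names here are placeholders, and a later helm upgrade may revert the change.

# Inject the Bitnami debug flag into the running StatefulSet
kubectl set env statefulset/mysql BITNAMI_DEBUG=true

# Then follow the pod logs once it restarts
kubectl logs -f mysql-0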

2. This allowed me to see the error below:

2022-09-06T13:13:39.928986Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.30) starting as process 1
2022-09-06T13:13:40.701091Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2022-09-06T13:13:40.954725Z 1 [ERROR] [MY-012526] [InnoDB] Upgrade is not supported after a crash or shutdown with innodb_fast_shutdown = 2. This redo log was created with MySQL 8.0.28 and it appears logically non empty. Please follow the instructions at http://dev.mysql.com/doc/refman/8.0/en/upgrading.html
2022-09-06T13:13:40.954752Z 1 [ERROR] [MY-012930] [InnoDB] Plugin initialization aborted with error Generic error.
2022-09-06T13:13:41.328075Z 1 [ERROR] [MY-010334] [Server] Failed to initialize DD Storage Engine
2022-09-06T13:13:41.328265Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2022-09-06T13:13:41.328299Z 0 [ERROR] [MY-010119] [Server] Aborting

Credits to this Server Fault article.

3. To fix this, I accessed the MySQL container via the command line and renamed the ib_logfile0 and ib_logfile1 files. MySQL is supposed to recreate them on restart. (Some people recommend deleting the files altogether.)
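Concretely, it looked something like the sketch below. The pod name and the data directory path are from my Bitnami-based setup and may differ in yours, and this assumes the pod stays up long enough to exec into it.

# Open a shell inside the MySQL container
kubectl exec -it mysql-0 -- bash

# Rename the old redo log files so MySQL can recreate them on restart
cd /bitnami/mysql/data
mv ib_logfile0 ib_logfile0.bak
mv ib_logfile1 ib_logfile1.bak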

4. After restarting MySQL, I encountered another error about missing ib_logfile* files or a missing #innodb_redo folder. Sorry, I didn’t capture the whole message. To resolve it, I created the folder as follows:

I have no name!@db:/$ cd /bitnami/mysql/data/
I have no name!@db:/$ mkdir '#innodb_redo'

5. I then restarted the MySQL StatefulSet again, and this time it worked: my services were back online.
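For the restart itself, either of the commands below (with a hypothetical StatefulSet name) does the job; the second simply deletes the pod and lets the StatefulSet controller recreate it.

# Trigger a rolling restart of the StatefulSet
kubectl rollout restart statefulset/mysql

# Or delete the pod and let the controller bring it back
kubectl delete pod mysql-0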

I literally felt like a stone had been lifted from my chest! My data was back! Luckily, over the last 3 months I had been so busy that I had not made many writes or changes to my services.

Conclusion

Of course, I learnt my lessons on due diligence. Before you embark on a similar migration operation, it’s critical to take offline backups of your database and verify that they can actually be restored. Also, do these things when you are fresh, not tired, to avoid unnecessary errors and the resulting pain and suffering.
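On the backup point, a rough sketch of a logical backup plus a test restore is below. The pod name, credentials and the throwaway test instance are placeholders, and the password variables are expanded by your local shell.

# Dump everything from the running pod to a local file
kubectl exec mysql-0 -- mysqldump --all-databases --single-transaction \
  -uroot -p"$MYSQL_ROOT_PASSWORD" > backup.sql

# Prove the dump restores into a throwaway MySQL instance before relying on it
mysql -h 127.0.0.1 -uroot -p"$TEST_ROOT_PASSWORD" < backup.sql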

I realise I may need to set up the database StatefulSet afresh, since I kept getting errors like the one below. This is a red flag, and instead of fighting with errors, I prefer not to have them at all.

2022-12-02T06:11:41.971377Z 0 [ERROR] [MY-011971] [InnoDB] Tablespace 'innodb_undo_002' Page [page id: space=4294967278, page number=501] log sequence number 12822223146 is in the future! Current system log sequence number 12789001578.
2022-12-02T06:11:41.971411Z 0 [ERROR] [MY-011972] [InnoDB] Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB redo log files. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.

Anyway, I hope this narration will help someone out there someday, should they encounter a similar scenario. Kubernetes does have a steep learning curve!
