Backups save the day.

No matter how often it gets said, for many people it remains abstract theory and therefore not a very real threat. But data loss can hit anyone, and with SSDs in particular, recovering the data is no longer as “simple” as it was with the old HDDs and their magnetic platters.

Ron Gilbert, one of the minds behind computer games such as Maniac Mansion, is currently working on a new project and recently suffered a hardware failure. Only because he had backed up his most important data did everything turn out relatively well. Even so, the time until you reach the point of “I can get back to work” is pure nail-biting suspense. For small companies in particular, data loss can mean the end. But read for yourself:

ThimbleCrash
by Ron Gilbert
Apr 11, 2016

A few days ago, I stumbled into my home office, a bowl of oatmeal in hand, getting ready for a quick check of the Twitters before my morning run, but oddly, my computer was off. I leave my Mac running all the time and it was strange that it didn’t just wake up from sleep. I powered it on and everything seemed normal.

The machine will sometimes restart in the middle of the night, and when it reboots, there is a nice message box telling me that it crashed and kindly shows me the logs. This morning, no such message enthusiastically greeted me.

Odd.

The next day I was editing a very large Photoshop file (touching on 4GB) when it kept popping up errors when I tried to save, saying I didn’t have the correct permissions to save.

Odd.

The next morning, I headed into the home office again, pre-run oatmeal in hand, and sat down to read the emails. Most of the new email that arrived during the night had no sender or subject.

Odd.

A few seconds later, a message box popped up, asking me to enter my iCloud password. I hit cancel, switched to my browser and pulled up Twitter, and then Chrome asked me for my Twitter password and had me logged out. I went to another site, and I had also been logged out and it was asking for my password again. Then my email program asked me for my password. I entered it and hit OK, then a new message box came up saying the login group of keychain was missing and did I want to reset it.

Odd.

Something was going wrong and I decided to just reboot and see if things were magically fixed, because, you know, that might happen. Right?

As the machine was shutting down, it dawned on me that rebooting my machine when it was telling me the login keychain was missing might not have been the smartest idea, and I was right.

Halfway through the boot process, the machine just shut down. Three more attempts with the same results. I booted in verbose mode and watched the boot process; everything was normal until it got to the disk check, then it displayed a slew of errors and shut down.

Crap.

I booted in recovery mode, ran Disk Utility, and checked the disk. Sure enough, there were a crap-ton™ of missing block error messages. No problem, I’ll just hit “Repair Disk” and be up and running again.

Nope.

Repair Disk informed me that it was unable to repair the disk. I was somewhat disappointed the Mac didn’t emit a mechanical mocking laugh at this point.

I didn’t have a Thunderbolt Cable, so I couldn’t connect my iMac to my laptop and see if the drive was still readable.

I’m pretty religious about backing stuff up. Time Machine runs every hour and skips only my large video and audio files. It doesn’t back up my projects and source code, but they are all in Git. I had made a few changes the previous day and I had not pushed, but it was just a few lines of code, easily retyped. The big thing I didn’t have a backup of was my Windows VM.

Without it I can’t do Windows builds for testing. Nothing of importance is on the VM except an install of Visual Studio. The VM could be rebuilt in an afternoon, so losing it wasn’t catastrophic, just a pain. The one thing I was going to lose by doing a reformat was the podcast we did on Friday. I hadn’t edited it yet, so it hadn’t been archived for future generations to enjoy.

After a little more thinking about what might be on the machine and not backed up, I decided to reformat and reinstall from my Time Machine backup.

There was a very real possibility that this was a hardware problem and reformatting wasn’t going to save the day. In that case, I was going to have to send the machine out to get the drive replaced and that would take several days, if not a week, if I wanted Apple to do it under AppleCare. With PAX looming in a few weeks, that was not an event I welcomed.

I can do just about everything on my laptop, except make Windows builds, so it wasn’t a catastrophe, just a big pain as half of our testing staff is Windows only.

I went into Disk Utility again and selected the volume, paused for a few seconds to contemplate the destructive nature of my next move, then hit Erase. A few moments later, I was informed that my drive could not be reformatted.

My iMac has what’s called a fusion drive. There is a 128GB SSD drive and a 3TB spinning drive that are fused into one big drive. The OS is smart enough to move files you don’t access very often to the slow spinning disk, keeping the files you need on the spiffy fast SSD drive. It’s a great idea. The new Macs have it, and it’s been embedded in a lot of standalone drives, and there is a version for Windows machines.

The problem with fusion drives is you now have two points of failure. If either drive goes bad, you lose the data on both drives, which is what happened to me.

Apparently, while the iMac is happy to have a fusion drive, Disk Utility has not caught up yet, and there is no way to reformat it.

Gads.

At this point, I’m kind of stuck. All I want to do is reformat the drive and start over. Visions of days waiting to speak to Apple and weeks of waiting to get my machine back are dancing through my head, all while PAX stalks closer and closer.

I call up the local Apple store and see when I can get an appointment to visit the Genius Bar. I don’t have a lot of faith in the Genius Bar to help with this issue. Normally it is filled with people trying to figure out how to get email on their iPhones. I imagine I’ll bring the computer in and the “genius” behind the “bar” will shrug and tell me they need to send it in and there will be a two-week wait, but if I’m having trouble getting email on my iPhone, they’d be happy to help.

I place the call and much to my surprise, they have a free slot at 4:45 that afternoon. Great. I pack the computer up and haul it in.

As expected, there are about 30 people being helped at the Genius Bar and other than a few laptops, they are all iPhones. I plop my giant iMac on the counter and wait, feeling quite out of place.

At 4:50 a nice person comes over and asks what the problem is. I tell him the machine won’t boot due to a disk error. He then proceeds to talk to me like I’m a 4-year-old, explaining that a hard disk has this spinny thing in it and sometimes those can go bad.

Seriously.

I then tell him I’m a Mac developer (I probably rolled my eyes), at which point he actually seems relieved and switches to full-on nerd mode. He plugs my machine into the store network and boots from there, then proceeds to run some fancy diagnostic stuff I don’t have access to. The good news is he doesn’t find anything physically wrong with the drive.

He connects the iMac to my laptop and we mount it as an external drive. Everything seems to still be there, so I spend the next half hour copying the Windows VM and the podcast to my laptop and we reformat the machine using a bunch of shell commands, while he’s happy to explain what is happening.

I ask why Disk Utility can’t just reformat the drive. He says the Apple Utilities haven’t caught up to the fusion drives and (politely) expresses some amount of frustration at this fact. I get the impression he’s done this a lot.

I pack up the newly reformatted machine and head home. Time Machine restored perfectly. I then pulled all the repos from Git, and other than needing to reenter all my passwords, the machine is back like nothing happened.

I know you hear this a lot, but back up your shit. This story would not have had a happy ending if I didn’t back up everything obsessively. I run Time Machine for local backups and use Arq to keep offsite archives on Amazon’s S3 storage (Time Machine can’t help you if your house burns down).

I did manage to restore Friday’s podcast recording, so I’ll try and have that edited and up tomorrow.

I lost a day, but I got a nice clean desk out of it, so I’ll call that a win.

– Ron

Text used with kind permission. Original content here: https://blog.thimbleweedpark.com/crash
