Official Forum Downtime Explanation




Tris
08-13-2007, 03:12 PM
Hello Guys,

Firstly, let me introduce myself: I am the admin who got the site back up and running. I was not going to post here, but I saw a few suggestions and questions around the place that I thought I should address.



1. Why did the site go down?


Josh spotted some problems with the site: it kept crashing, and on reboot fsck came up with some weird errors. The datacenter confirmed the hardware was failing. Josh decided it would be better to back up now, before it was too late.



2. How was it fixed?


Another hard drive was installed and the operating system was installed on that. Then the old, failing hard drive was mounted as a slave, and I used this to grab as much of the old data as I could and put it onto the new hard drive. Luckily, not much was corrupted, and it was not the hardest job in the world.
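
The salvage step described above (copy whatever is still readable from the old drive, and note what is not) can be sketched roughly as follows. This is only an illustration and not the procedure that was actually used on the server: the function name `salvage_tree` and both directory arguments are hypothetical, and it assumes the old drive is already mounted somewhere readable.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every readable file from src_root into dst_root.

    Returns (copied, failed): copied is a list of paths relative to
    src_root, failed is a list of (relative path, error message) pairs.
    """
    copied, failed = [], []
    # os.walk silently skips directories it cannot list, so a badly
    # damaged directory will simply be missing from the results.
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            try:
                # copy2 also preserves timestamps and permission bits.
                shutil.copy2(os.path.join(dirpath, name),
                             os.path.join(target_dir, name))
                copied.append(rel_path)
            except OSError as err:
                # A failing disk usually shows up as an I/O error on read;
                # record it and carry on with the next file.
                failed.append((rel_path, str(err)))
    return copied, failed
```

Carrying on past individual errors is the point of the sketch: one unreadable file should not stop the rest of the data from being recovered, and the `failed` list tells you what has to come from a backup instead.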



3. Why not use RAID?


Some people have suggested using a RAID setup. Although useful, RAID would be unlikely to have helped in this situation, because if there is ext3/file system corruption (as I think was going to happen in this case) then it is no use at all, as the corruption will just get copied onto the RAID.
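
To make the distinction concrete, here is a toy model of a two-disk mirror. It is a deliberately simplified sketch, not how any real RAID controller or ext3 works; the `Mirror` class and its methods are made up for illustration. It shows that a mirror survives one disk dying, but a bad write coming from the file system layer lands on both disks.

```python
class Mirror:
    """Toy two-disk mirror: every write goes to every working disk."""

    def __init__(self):
        self.disks = [{}, {}]      # block number -> data, one dict per disk
        self.alive = [True, True]

    def write(self, block, data):
        # The mirror cannot tell good data from garbage; it copies both.
        for index, disk in enumerate(self.disks):
            if self.alive[index]:
                disk[block] = data

    def fail_disk(self, index):
        self.alive[index] = False

    def read(self, block):
        # Serve the block from the first disk that is still working.
        for index, disk in enumerate(self.disks):
            if self.alive[index]:
                return disk[block]
        raise IOError("no working disk left")


# Case 1: a disk dies. The data is still readable from the other disk.
hardware_case = Mirror()
hardware_case.write(0, "forum database")
hardware_case.fail_disk(0)
survived = hardware_case.read(0)

# Case 2: the file system writes garbage. Both copies now hold garbage.
corruption_case = Mirror()
corruption_case.write(0, "forum database")
corruption_case.write(0, "garbage")
copies = [disk[0] for disk in corruption_case.disks]
```

Which of the two cases applied to this server is exactly what the thread goes on to debate.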




I can't remember if there were any other important technical questions that need answering, but please feel free to throw them at me and I will try to monitor this thread as best I can.

JoshLowry
08-13-2007, 03:18 PM
Tris gets the Hero of the day award for getting the server up and running.

Very helpful in a time of need! Thanks again Tris...

Tris
08-13-2007, 03:19 PM
No worries, I should be thanking you for the site :) I like it. Seems a nice "little" community. And I have learnt a lot, even though I am not American.

DeadheadForPaul
08-13-2007, 03:21 PM
thanks guys :)

mconder
08-13-2007, 03:23 PM
RAID would be unlikely to have helped in this situation, because if there is ext3/file system corruption

True enough.

BLS
08-13-2007, 03:32 PM
I disagree on point 3. A RAID 5 array would have flagged a failing drive through its read/write errors (which are likely the cause of the data corruption) and removed it from the array. But ultimately, the system partition should be on a RAID 1 (mirror), on disks that are separate from a RAID 5 array of at least 3 disks.
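
For anyone wondering how a RAID 5 array keeps going once a drive has been dropped, the idea is XOR parity. The snippet below is a minimal sketch of that idea for a single stripe on a 3-disk array (two data blocks plus one parity block). The helper name `xor_blocks` and the sample data are invented for illustration, and a real array also rotates the parity block across the disks, which is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 3-disk array: two data blocks and their parity.
data_0 = b"foru"
data_1 = b"mdat"
parity = xor_blocks([data_0, data_1])

# The disk holding data_0 is dropped from the array. Its block can be
# rebuilt from the two surviving blocks, because a ^ b ^ b == a.
rebuilt = xor_blocks([data_1, parity])
```

Note that the rebuild only works because the array knows which disk is missing. Parity on its own does not say which block is wrong if a disk quietly returns bad data.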

But either way, it's not worth fighting over. Thank you for getting our site back up!

Nash
08-13-2007, 03:35 PM
Hello Guys,

Some people have suggested using a RAID setup. Although useful, RAID would be unlikely to have helped in this situation, because if there is ext3/file system corruption (as I think was going to happen in this case) then it is no use at all, as the corruption will just get copied onto the RAID.



I would think it got corrupted due to the hardware failure and not the other way around?

If you run RAID 0 then it's not gonna help you, because the corruption is fragmented across both disks. However, if you run RAID 1, corruption is less likely since the disks are mirrored instead of striped.

Anyways I'm not the sys admin looking at the data here so I don't really know, just saying :)

Thanks for getting us back online Tris! I missed my fix this weekend.

ButchHowdy
08-13-2007, 03:43 PM
I was hoping for a juicy conspiracy theory.

Oh well, let me go fix my foil fez!

CurtisLow
08-13-2007, 03:48 PM
Thanks for bringing the forum back to life! Darn that was a rough 2 days lol

http://img259.imageshack.us/img259/5973/1f03sn0.jpg

cujothekitten
08-13-2007, 03:48 PM
Well good thing the site is back up. I was out of town and wasn't able to get any info about the straw poll until I got home. I came here to check the news and BAM, the site's down :(

Glad it's back though, work needs to be done :D

Perry
08-13-2007, 04:05 PM
Well, depending on what caused the corruption, a RAID 1 might or might not have helped. I would think, however, that a RAID 1 would be the bare minimum you would want to run. This board is getting some decent traffic already. Can you imagine what it might look like in six months?
Ask for donations if you must. Hard drives are dirt cheap nowadays.
There is a good thing going on here, and twice it has been lost during a crucial moment. I would be willing to contribute to see to it that things flow smoothly here.