This site was down for 11 hours

Fangu

Great Old One
See here

Something probably crashed for x reason. Restarting the Apache/MySQL/PHP services fixed it. Sometimes a restart is all it takes.

The reason it took so long to notice is that the site went down at around 2 AM Amsterdam time. When that happens on a Saturday night, there's a chance the Europeans with the passwords might not notice or be notified right away. The best way to report that the site is down is in the chat. If Yop is asleep, Aaron and Ryu might not be, and they can let him (or anyone with Yop's phone number, like myself) know that the site is down through Skype or SMS.

I'm also able to do simple stuff like restarting, but this time I messed up inputting the root password :wacky:
 

Geostigma

Pro Adventurer
AKA
gabe
It's all a cover up.

Yop drunkenly went to the server farm where TLS is hosted, pulled out the server blade and had a sword fight with the network admin on site.
 

Russell

.. ? ..
AKA
King of the Potato People
history-channel-alien-guy-meme.jpg
 

Ⓐaron

Factiō Rēpūblicāna dēlenda est.
AKA
The Man, V
I don't actually have the numbers of anyone with root server access, so while I knew the site was down, I couldn't let anyone know :monster:
 

Fangu

Great Old One
^ Plus, everybody would be asleep :monster:

I check Skype regularly though; I checked the Skype chat before going jogging at 10:30 this morning but there was no notification of OMG TLS DOWN FOR ALMOST 10 HOURS so I assumed it was just a short hiccup :monster:
 

Ⓐaron

Factiō Rēpūblicāna dēlenda est.
AKA
The Man, V
Well I haven't been able to get into the Skype chat for months either :monster:
 

Octo

KULT OF KERMITU
AKA
Octo, Octorawk, Clarky Cat, Kissmammal2000
I hope you used the time productively Umatbru....
 

Tennyo

Higher Further Faster
Is this like when Facebook goes down and the whole world freaks out and no one knows what to do and all the local news stations jump on it because they have nothing better to report?
 

Cthulhu

Administrator
AKA
Yop
And because Facebook has a 24/7 ops team that explodes on high alert whenever something goes down, :monster:. They also have two billion users or so vs the couple dozen we have.

Anyway, in all likelihood one of the three services used to run and serve the site (Apache, MySQL, PHP) went on the fritz and needed a kick up the boot. I think I had it set to automatically restart daily, but I'll have to double check. I haven't looked at any logs, but previously there was nothing useful in there.

I'll have to:
* Give more people access (and instructions) to restart services. I'm a bit hesitant with that because it's easy to destroy the server with a single command, which they would also be authorized to do. But hey, risk ftw.
* Give moar people my phone #, in case of stuff like this. SSHing into the server at 2 AM to restart stuff shouldn't be too bad a problem. I'd probably sleep right through text messages though.
* Set up some kind of monitoring system so I get an automagic text message whenever something goes down. IDK how reliable those are though.
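
A minimal sketch of what that last item could look like, in Python. The URL, the check interval and the "three failures in a row" rule are assumptions for illustration, not anything Yop described; how the text message actually gets sent is left out, since no SMS provider is named in the thread.

```python
import urllib.error
import urllib.request


def site_is_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # A 4xx means the web server is still answering; a 5xx means it is not healthy.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and so on.
        return False


def should_alert(recent_results, failures_needed=3):
    """Alert only after several consecutive failed checks (assumed rule, to cut false alarms)."""
    if len(recent_results) < failures_needed:
        return False
    return not any(recent_results[-failures_needed:])
```

Run from cron every minute or so, `site_is_up` would feed a short history list, and `should_alert` would decide whether that history justifies waking someone up. Requiring a few failures in a row trades a slower alert for fewer false alarms.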
 

Cthulhu

Administrator
AKA
Yop
Yeah I got a text just now. I had a look, got this error from the Apache webserver log:

[Sun Nov 23 21:35:32.084784 2014] [mpm_event:error] [pid 32185:tid 139634504406912] AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

Which indicates there were too many open requests and no workers left to handle them. I restarted the webserver, freeing up busy workers, but that didn't fix it - and I saw in my magick status page that all workers were busy, but unable to process requests. This leads to the next layer in the stack, PHP, which is actually a pool of workers that process requests forwarded to it by Apache. I restarted that, and sure enough, fixed. I dug into the settings and it turns out that the max number of workers there was set to 5, which is rather low and usually meant for very low-powered servers. I upped that, but I still think there might be someone or something that somehow assigns a task to said workers that executes indefinitely - like a DDoS or something? Might just be accidental though. I've got an eye on Apache's status page atm, nothing out of the ordinary there.
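
For anyone who wants to spot that error automatically, here is a rough Python sketch that picks apart an Apache 2.4 error-log line like the one above and flags the AH00484 code. The function names are made up for this example, and the pattern only covers the field layout seen in that one line; lines carrying extra fields such as a client address would need a broader pattern.

```python
import re

# Matches the layout of the line quoted above:
# [time] [module:level] [pid N:tid N] AHnnnnn: message
ERROR_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] "
    r"\[(?P<module>[^:\]]+):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+)(?::tid (?P<tid>\d+))?\] "
    r"(?:(?P<code>AH\d+): )?(?P<message>.*)$"
)


def parse_error_line(line):
    """Return the fields of one error-log line as a dict, or None if it does not match."""
    match = ERROR_LINE.match(line.strip())
    return match.groupdict() if match else None


def hit_worker_limit(line):
    """True if the line is the 'server reached MaxRequestWorkers' error (AH00484)."""
    parsed = parse_error_line(line)
    return bool(parsed) and parsed["code"] == "AH00484"


sample = (
    "[Sun Nov 23 21:35:32.084784 2014] [mpm_event:error] "
    "[pid 32185:tid 139634504406912] AH00484: server reached MaxRequestWorkers "
    "setting, consider raising the MaxRequestWorkers setting"
)
print(hit_worker_limit(sample))
```

Note that, as the post explains, seeing this error does not by itself mean Apache is the culprit: the workers here were all stuck waiting on the PHP pool behind it.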

(As an aside, I saw that we were getting hotlinked from here, lol) Edit: I can totally see you guys checking that page out, lol
 

Octo

KULT OF KERMITU
AKA
Octo, Octorawk, Clarky Cat, Kissmammal2000
(As an aside, I saw that we were getting hotlinked from here, lol) Edit: I can totally see you guys checking that page out, lol

They said that Heather Mason is a sexier chica than Tifa? Phail.
 

Cthulhu

Administrator
AKA
Yop
Alright, so I monitored the logs this morning and saw a rather sudden spike in CPU usage. Double-checking the access log, I saw a fuckton of requests scrolling by for the top 10 stats on the home page, all from one IP address. That address corresponded with Trainer Red's account; I IP banned the account for now (since it was sucking up all processing power - and in all likelihood that was enough to bring the site down yesterday), and sent an email to Trainer Red.
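
The "all from one IP address" check can be done by eye, but a small Python sketch like this one does the counting. It assumes Apache's common or combined log format, where the client address is the first field. The sample lines, the addresses (documentation ranges) and the `/stats` path are invented for illustration and are not taken from the real log.

```python
from collections import Counter


def top_clients(log_lines, n=3):
    """Count requests per client address and return the n busiest as (address, count)."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            # First whitespace-separated field is the client address in these formats.
            counts[fields[0]] += 1
    return counts.most_common(n)


# Hypothetical access-log lines, made up for this example.
sample_lines = [
    '203.0.113.7 - - [24/Nov/2014:09:00:01 +0100] "GET /stats HTTP/1.1" 200 512',
    '203.0.113.7 - - [24/Nov/2014:09:00:01 +0100] "GET /stats HTTP/1.1" 200 512',
    '198.51.100.23 - - [24/Nov/2014:09:00:02 +0100] "GET / HTTP/1.1" 200 4096',
    '203.0.113.7 - - [24/Nov/2014:09:00:02 +0100] "GET /stats HTTP/1.1" 200 512',
    '203.0.113.7 - - [24/Nov/2014:09:00:02 +0100] "GET /stats HTTP/1.1" 200 512',
]

for address, count in top_clients(sample_lines):
    print(address, count)
```

On a real server the lines would come from reading the access log file; one address towering over the rest is the pattern described in the post.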

I think she has the TLS front page open on her browser all the time, possibly in the background, and a browser addon or something that somehow spams the reload button infinitely, or something that sets the auto-refresh to 0, or something to that effect. I've edited her account to disable the stats, too, but unless she refreshes the home page that probably won't have any effect.

Does anyone have any means of contacting her beyond the e-mail address in her profile? I'd rather not keep her banned like this, :monster:. It's the type of ban that makes TLS look like it doesn't actually exist on the interwebz. Very effective, but also not very helpful.
 

Octo

KULT OF KERMITU
AKA
Octo, Octorawk, Clarky Cat, Kissmammal2000
Jesus. I have TLS open in the background quite a lot too. I don't think I've got any contact details for her, but presumably she'll see the email? Will she have to start a new account or something?

It's the type of ban that makes TLS look like it doesn't actually exist on the interwebz.

Is this what was used for username? And if not...why not? :monster:
 

Hisako

消えないひさ
AKA
Satsu, BRIAN BLESSED, MIGHTY AND WISE Junpei Iori: Ace Detective, Maccaffrickstonson von Lichtenstafford Frabenschnaben, Polite Krogan, Robert Baratheon
Wow that is hilarious and weird and WHAT THE FUCK RED

<3
 

Tennyo

Higher Further Faster
Closing tabs saves lives. :monster:

This is hilarious, though. But I hope she sees the email and isn't all like, "IP banned? Screw you guys!" and then never comes back. :(
 

Kai Schulen

... ... ...▼
AKA
Trainer Red
Oh shit, my bad. Sometimes I fall asleep at my desk and (I guess I leave all the tabs open...) well, aside from that I have no real excuse.

My bad.
 

Cabaret

Donator
I'll have to:
* Give more people access (and instructions) to restart services. I'm a bit hesitant with that because it's easy to destroy the server with a single command, which they would also be authorized to do. But hey, risk ftw.
* Give moar people my phone #, in case of stuff like this. SSHing into the server at 2 AM to restart stuff shouldn't be too bad a problem. I'd probably sleep right through text messages though.
* Set up some kind of monitoring system so I get an automagic text message whenever something goes down. IDK how reliable those are though.

You would only need to give one person on a different timezone the access and a cyanide pill so if they did destroy it they could do the honourable thing.

I think having you on call 24/7 for this kind of thing is unreasonable, but then I guess it only happens once in a blue moon. You should ask for a pay rise though if you're gonna go down that road.

TBH I saw the site was down, thought I should tell Yopy, then nah he'll be asleep, it can wait til the morning. Then Fangu came on and I deferred to her greater wisdom in these matters.

I think what stopped me was explicit instruction from Yopy not to wake him up in the middle of the night with crazy talk - to be fair I think that this condition of our steamy passions is really just for me, as I probably would and on a regular basis cos he sleeps too much and I got bollocks to ramble about! :monster:
 

Cthulhu

Administrator
AKA
Yop
Pff, you know the honorable thing is to stick a knife in one's gut and drag horizontally. Vertically, and/or multiple times if they were really badass, :monster:.

I've actually been on call for work, in case something was up, about two and a half years ago (and IIRC I got a little bonus for that, too). Of course, it was completely ineffective because I slept through the text messages I got from the monitoring software, and even when I didn't, I ignored them because usually it was a false alarm anyway, :monster:.

ahem, anyway, Red, thanks <3. We haven't had any issues since you woke up and probably refreshed the page / disabled the top 10 stats / turned off those addons.

I had a look at the JavaScript code used to reload the stats; what I saw didn't look broken at all. It instructed the browser to reload the stats exactly once every 10 minutes, hardcoded values and shit. All I can think of is that one of your plugins or addons or whatever was programmed poorly and apparently caused that timeout to be reduced to 0, or the requests to just repeat indefinitely at the maximum speed (several times a second). IDK though. Anyway, seems to be fix0red now, :monster:
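
To put rough numbers on the difference, here is a back-of-the-envelope calculation in Python. The 10-minute interval is from the post; "several times a second" is pinned at five per second purely as an assumption for the arithmetic.

```python
def requests_per_hour(interval_seconds):
    """Requests one open tab sends per hour when it reloads every interval_seconds."""
    return 3600 / interval_seconds


# Intended behaviour: one reload of the stats every 10 minutes.
normal = requests_per_hour(10 * 60)

# Runaway behaviour: assumed five requests per second from a single tab.
RUNAWAY_PER_SECOND = 5
runaway = RUNAWAY_PER_SECOND * 3600

print("normal per hour:", normal)
print("runaway per hour:", runaway)
print("ratio:", runaway / normal)
```

Under that assumption a single misbehaving tab generates as much stats traffic as a few thousand well-behaved ones, which would explain why a PHP pool capped at 5 workers ran out.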
 

Octo

KULT OF KERMITU
AKA
Octo, Octorawk, Clarky Cat, Kissmammal2000
I had a look at the JavaScript code used to reload the stats; what I saw didn't look broken at all. It instructed the browser to reload the stats exactly once every 10 minutes, hardcoded values and shit. All I can think of is that one of your plugins or addons or whatever was programmed poorly and apparently caused that timeout to be reduced to 0, or the requests to just repeat indefinitely at the maximum speed (several times a second). IDK though. Anyway, seems to be fix0red now, I reversed the polarity of the neutron flow.

Fixed :monster:
 