Monday, 20 April 2009 20:49
We have a high-traffic site that gets a lot of hits per day. It's a very dynamic site, which puts a lot of overhead on the system.
We have several servers running the site (web servers, storage servers, DB servers) and everything is set up correctly, but at times the site crashes for no apparent reason.
We've set up a lot of tracking and analysis but still cannot understand why the site mysteriously locks up. Basically, it seems that a web server runs out of RAM and starts swapping. Once it starts swapping, the server crashes, and the hardware load balancer redirects its load to the remaining web servers. This puts more load on the others, and then they crash too, creating a snowball effect.
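To make the snowball effect concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the load balancer splits the load evenly among the servers still alive, and a server crashes as soon as its share exceeds a fixed capacity. The function name and all numbers are hypothetical, not measurements from our site.

```python
def simulate_cascade(total_load, capacities, failed_first=0):
    """Return (failed, alive): the order in which servers fail and the
    servers that survive, after one server crashes.

    Assumptions (hypothetical model, not our real setup):
    - load is split evenly among the servers still alive
    - a server crashes when its share exceeds its capacity
    """
    alive = set(range(len(capacities)))
    failed = []

    # The initial crash, e.g. one web server running out of RAM.
    alive.discard(failed_first)
    failed.append(failed_first)

    while alive:
        share = total_load / len(alive)
        overloaded = sorted(i for i in alive if share > capacities[i])
        if not overloaded:
            break  # remaining servers can absorb the load
        for i in overloaded:
            alive.discard(i)
            failed.append(i)

    return failed, sorted(alive)


if __name__ == "__main__":
    # Four servers with little headroom: one crash takes down all of them.
    print(simulate_cascade(320, [100, 100, 100, 100]))
    # Same load, more headroom per server: the crash stays isolated.
    print(simulate_cascade(320, [150, 150, 150, 150]))
```

With four servers at 80% of capacity, losing one pushes the other three over their limit, which matches the behaviour described above. With more headroom per server, the same crash does not spread.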
We have many servers with lots of RAM. Before we go out and stuff them with even more RAM, I want someone who is very skilled in system administration to look at our system, understand exactly what's going on, determine why the site is crashing (the true, exact reason), and tell us what the solution is. We also need to optimize where possible (near-future plans include installing liteapache for serving images, and things like this).
We are running on Linux servers (some RHEL5 and others Fedora)
All are AMD64
Apache 2.2.X
MySQL
PHP5
This is mainly a bid for consultation services. We just need to know what is going on, and I cannot afford to have the main people in the company waste any more time trying to figure out the problems themselves.