[Resolved] Can't access LiteSpeed Control or Server under 1000 req/Sec

#1
Hello.

I have a server with some high-traffic sites, but the server and its sites are slow and give a "connection timed out" error after 5-10 seconds of not loading when the req/sec is high.

The sites are unreachable or very slow, and I can't access the LiteSpeed control area / real-time stats either, because it is slow or also gives a connection timed out error.

The server is an E5-1650 v3 with 16 GB RAM and SSD drives, on a 1 Gbps link.


Here is LiteSpeed together with top; it was difficult to get this screenshot.
Code:
http://i.imgur.com/iGGf7u7.png
Is the req/sec so high that LiteSpeed can't handle it? The WaitQ is always 0.


Any advice would be appreciated. Thanks.
 

Pong (Administrator)
#2
Can you capture the real-time stats in one screenshot and the top command output in another?

In the current screenshot, some of the information is covered up.

Are there any error messages in the log file?
 
#3
The covered info is not important; it is always 0.
Code:
http://i.imgur.com/wR69nkV.jpg
At the moment there is no issue at 200-300 req/sec.

I will check the LiteSpeed logs, but I have never been good at that.
 

Pong (Administrator)
#4
When the problem happens next time, please capture two full screenshots again, and check the error log for that period of time.
 

mistwang (LiteSpeed Staff)
#5
Most likely your server was hitting the ip_conntrack limit. Run "dmesg" and see if you get anything related to that limit. Google how to increase it.
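For example, the check might look something like this on a typical setup (the exact sysctl names vary by kernel version; older kernels use ip_conntrack_* instead of nf_conntrack_*):
Code:
# look for conntrack-related messages in the kernel log
dmesg | grep -i conntrack

# compare current usage against the limit
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max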
 
#6
Most likely your server was hitting the ip_conntrack limit. Run "dmesg" and see if you get anything related to that limit. Google how to increase it.
Hi, thanks for the answer.

I ran "dmesg > boot_messages" so I could check it, and the messages that caught my attention were:

Code:
===
PROTO=ICMP TYPE=8 CODE=0 ID=46217 SEQ=33434
[3266489.393188] __ratelimit: 274 callbacks suppressed
[3266489.393190] TCP: time wait bucket table overflow (CT0)
[3266489.393256] TCP: time wait bucket table overflow (CT0)
[3266489.393289] TCP: time wait bucket table overflow (CT0)
[3266489.393299] TCP: time wait bucket table overflow (CT0)
[3266489.393346] TCP: time wait bucket table overflow (CT0)
[3266489.393353] TCP: time wait bucket table overflow (CT0)
[3266489.393359] TCP: time wait bucket table overflow (CT0)
[3266489.393365] TCP: time wait bucket table overflow (CT0)
[3266489.393371] TCP: time wait bucket table overflow (CT0)
[3266489.393377] TCP: time wait bucket table overflow (CT0)

===
*A lot of these*
[3266520.541135] __ratelimit: 957 callbacks suppressed
[3266520.541138] TCP: time wait bucket table overflow (CT0)
[3266520.541185] TCP: time wait bucket table overflow (CT0)
[3266520.541193] TCP: time wait bucket table overflow (CT0)
[3266520.551806] TCP: time wait bucket table overflow (CT0)
[3266520.557488] TCP: time wait bucket table overflow (CT0)
[3266520.570828] TCP: time wait bucket table overflow (CT0)
[3266520.581495] TCP: time wait bucket table overflow (CT0)
[3266520.608441] TCP: time wait bucket table overflow (CT0)
[3266520.617335] TCP: time wait bucket table overflow (CT0)

===

[3267634.585800] __ratelimit: 2236 callbacks suppressed
[3267634.585802] nf_conntrack: table full, dropping packet.
[3267634.586571] nf_conntrack: table full, dropping packet.
[3267634.589506] nf_conntrack: table full, dropping packet.
[3267634.591634] nf_conntrack: table full, dropping packet.
[3267634.594419] nf_conntrack: table full, dropping packet.
[3267634.601298] nf_conntrack: table full, dropping packet.
[3267634.604193] nf_conntrack: table full, dropping packet.
[3267634.626663] nf_conntrack: table full, dropping packet.
[3267634.627424] nf_conntrack: table full, dropping packet.
[3267634.628611] nf_conntrack: table full, dropping packet.

===
[3267649.617171] __ratelimit: 1790 callbacks suppressed
[3267649.617173] nf_conntrack: table full, dropping packet.
[3267649.620159] nf_conntrack: table full, dropping packet.
[3267649.624240] nf_conntrack: table full, dropping packet.
[3267649.633670] nf_conntrack: table full, dropping packet.
[3267649.633832] nf_conntrack: table full, dropping packet.
[3267649.634564] nf_conntrack: table full, dropping packet.
[3267649.634584] nf_conntrack: table full, dropping packet.
[3267649.636567] nf_conntrack: table full, dropping packet.
[3267649.638342] nf_conntrack: table full, dropping packet.
[3267649.639731] nf_conntrack: table full, dropping packet.

===
[3267674.675032] __ratelimit: 3279 callbacks suppressed
[3267674.675034] TCP: time wait bucket table overflow (CT0)
[3267674.675845] nf_conntrack: table full, dropping packet.
[3267674.677296] nf_conntrack: table full, dropping packet.
[3267674.678506] nf_conntrack: table full, dropping packet.
[3267674.679211] nf_conntrack: table full, dropping packet.
[3267674.682280] nf_conntrack: table full, dropping packet.
[3267674.683075] nf_conntrack: table full, dropping packet.
[3267674.683265] nf_conntrack: table full, dropping packet.
[3267674.684271] nf_conntrack: table full, dropping packet.

===

[3267769.940163] __ratelimit: 5044 callbacks suppressed
[3267769.940165] nf_conntrack: table full, dropping packet.
[3267769.940235] nf_conntrack: table full, dropping packet.
[3267769.940241] nf_conntrack: table full, dropping packet.
[3267769.942057] nf_conntrack: table full, dropping packet.
[3267769.942685] nf_conntrack: table full, dropping packet.
[3267769.943113] nf_conntrack: table full, dropping packet.
[3267769.944260] nf_conntrack: table full, dropping packet.
[3267769.946035] nf_conntrack: table full, dropping packet.
[3267769.946058] nf_conntrack: table full, dropping packet.
[3267769.946081] nf_conntrack: table full, dropping packet.

===
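For reference, the two messages above point at two separate limits; assuming a reasonably recent kernel, current usage can be compared against those limits roughly like this (paths may differ on older kernels):
Code:
# conntrack table: current entries vs. maximum
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# limit behind the "time wait bucket table overflow" messages
cat /proc/sys/net/ipv4/tcp_max_tw_buckets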
 
#8
Monitoring the LiteSpeed real-time stats:

The same account that reached 700-1000 req/sec (it is now behind Cloudflare to see if that helps) had 16 in the WaitQ, but only for about a second, and with a low req/sec.

Code:
http://i.imgur.com/1aDA0kV.png
There is also one account with a high number of requests in processing, but its sites are fast enough.
 
#10
I saw that the max value was something like 65,500 and the count (at this moment, with 150-200 req/sec) was about 48,000.

I followed your wiki; now the max is 655360.
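For reference, an increase like that is typically applied and persisted roughly as follows (a sketch only, using the value from this thread; the wiki mentioned above is the authoritative reference):
Code:
# apply immediately
sysctl -w net.netfilter.nf_conntrack_max=655360

# persist across reboots
echo "net.netfilter.nf_conntrack_max = 655360" >> /etc/sysctl.conf
sysctl -p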

I really hope this helps. I am using LiteSpeed on 3 servers, and so far it has been a nice experience...
 
#11
As an update.

Today we had around 35-40 thousand "users online" according to Google Analytics and among.us.

The server had a high average load, but the sites were fast, so that is perfectly fine.

The only thing that I still don't get is that, for example, at the moment of those big spikes the server load is 6-8 with about 700-1200 req/sec.

Now, with less than 200 req/sec and CPU usage of 20-40%, the server load is 5-6. It's as if the server still has a high load even though there is no apparent reason for it.

I would love to reduce the server load, especially when there is no high traffic.
 
#13
This number looks high, surely high traffic.

Check with "top -c" which processes have the high load, lsphp5 or the litespeed process?
I have a question: is "lsphp5" the same as "lsphp"?

I only see "lsphp" without the 5...

Well, at this precise moment the load is 5, with 300 req/sec, 73 in processing (according to the LiteSpeed real-time stats), and WaitQ at 0.

Here is top -c:
Code:
http://i.imgur.com/52riBrK.png
Btw, the high traffic probably lasts less than one hour. At the moment there may be 5 or 6 thousand online, according to among.us, with 270-300 req/sec across all the sites.
 
#14
A load of 5-6 is not high if you have 8/16/32 cores.

By the "top -c" output, lsphp consumed most of the CPU (so are you using CloudLinux's PHP Selector? If not, it is usually the lsphp5 process). There is no good way to lower the CPU usage unless you enable XCache or Zend OPcache in ProcessGroup mode.
For more info please refer to:
http://www.litespeedtech.com/support/wiki/doku.php?id=litespeed_wiki:php:which_php_setup_am_i_using

You can enable ProcessGroup mode + OPcache for the busy account only, while the other accounts keep running in normal Worker mode.
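If Zend OPcache were the opcode cache chosen, enabling it for an account would roughly be a php.ini change along these lines (illustrative values only; PHP 5.5+ bundles OPcache, older PHP needs the separate extension):
Code:
; load and enable the opcode cache
zend_extension=opcache.so
opcache.enable=1
; shared memory for cached scripts, in MB
opcache.memory_consumption=128
opcache.max_accelerated_files=10000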
 
#15
A load of 5-6 is not high if you have 8/16/32 cores.

By the "top -c" output, lsphp consumed most of the CPU (so are you using CloudLinux's PHP Selector? If not, it is usually the lsphp5 process). There is no good way to lower the CPU usage unless you enable XCache or Zend OPcache in ProcessGroup mode.
For more info please refer to:
http://www.litespeedtech.com/support/wiki/doku.php?id=litespeed_wiki:php:which_php_setup_am_i_using

You can enable ProcessGroup mode + OPcache for the busy account only, while the other accounts keep running in normal Worker mode.
Thanks for the reference; that's a big KB I haven't read. It seems I will have a long night.

As I use the PHP Selector, I am on suEXEC Worker mode.

From what I have read, ProcessGroup mode would give better performance, but at the cost of memory. So either I uninstall the PHP Selector or I change to suEXEC ProcessGroup.

Don't you offer any service to help change permanently from one suEXEC mode to another? That would be great, since I am pretty sure an opcode cache would help :)
 
#16
As the wiki says, among the 3 suEXEC modes, Worker and ProcessGroup mode are able to work with CloudLinux's PHP Selector, but the 3rd mode, Daemon mode, can't work with the PHP Selector.

The default is Worker mode for all users. The end user can't select ProcessGroup mode, since it is set in httpd.conf. The only setting that differs (in httpd.conf) is, for example:
<IfModule LiteSpeed>
LSPHP_ProcessGroup on
LSPHP_Workers 15
</IfModule>
The other settings are the same; end users should not be able to tell whether they are in Worker mode or ProcessGroup mode.

So you can have only the high-traffic account run in ProcessGroup mode, and make sure the opcache module is included in that account's PHP Selector.
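As a rough sketch (the exact placement is setup-dependent), the idea is to scope those directives to the busy account's vhost include only, rather than setting them globally, e.g.:
Code:
# in the high-traffic account's vhost include only
<IfModule LiteSpeed>
    LSPHP_ProcessGroup on
    LSPHP_Workers 15
</IfModule>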
 
#17
Just to be sure.

With ProcessGroup mode, the only "negative" is that it will consume more memory... but if I have enough memory, there won't be any issue.

As I am using PHP 5.4, I will be using XCache (already installed). I am still trying to work out a good configuration for it; the documentation is poor, especially for shared hosting.

Any advice on the XCache config would be much appreciated. I have already configured 4 accounts (through their vhosts) to use ProcessGroup mode; if everything goes well, I will make the change at the server level, so everyone uses ProcessGroup mode and therefore the XCache opcode cache.


Thanks.
 
#18
With ProcessGroup mode, the only "negative" is that it will consume more memory... but if I have enough memory, there won't be any issue.
Yes.

Regarding the XCache configuration:
http://xcache.lighttpd.net/wiki/XcacheIni
I think the most important setting is xcache.size.
Each account is assigned that much memory, so if you set it to 256M, for example, and 100 websites are accessed at the same time, 100 x 256M ≈ 25G of memory is allocated for XCache alone. (Of course, if a website has been idle for a while, the lsphp processes for that account will be killed and the memory released.)
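As a minimal sketch (values are illustrative; the per-account memory math above is what should drive xcache.size):
Code:
; load XCache and size the opcode cache per account
extension = xcache.so
xcache.size = 64M
; usually set to roughly the number of CPU cores
xcache.count = 6
; disable the variable data cache if only opcode caching is needed
xcache.var_size = 0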
 
#19
Thanks NiteWave,

I have zero experience with opcode caches, so it will be a fun game getting the correct parameters.

I think I have everything set up correctly to use ProcessGroup mode with XCache enabled for every user on the server.

The server hosts quite a lot for its RAM (I think 241 domains on 16 GB), but about 95% of them get from 0 to 100 views per day, so the majority are not resource hungry.

At least for now, it seems to have been working properly. I just need to re-check that everything is correctly configured in LiteSpeed and tune the XCache parameters.

Here is a screenshot
Code:
http://i.gyazo.com/57a627bf819b35e9202e3d0eb0b49de5.png
I made all the changes right between the 14th and the 15th, so it's noticeable that the CPU load has been lower.

And today we had 29 thousand visitors online, with everything working as it's supposed to; once again LiteSpeed showing why it's better :) (I have a very well-tuned Apache, and when I switch to Apache the server load stays maybe 10-20% higher, but the RAM usage goes through the roof, with and without an opcode cache.)

I will probably post again during the week, just to make sure my LiteSpeed config is correct (I have to replicate the changes on the other servers).

Thanks for the support.
 
#20
Per the screenshot, from the 15th through the 18th the load dropped to roughly 50%.
This is a good example of ProcessGroup + XCache vs. Worker without an opcode cache: the load drops by about 50%, and compared with Apache it consumes much less memory.

Thanks for sharing.
 