Trouble shooting LSAPI, limiting children

#1
Hello,

I'm having problem with a rails application with LSWS. The applications runs fine with webrick but running with lsws I get a bunch of 503 and 500 errors and I can see that raisrunner childer dies.

To be able to debug I try to limit the number of railsrunners but even after setting max connections to 1 and LSAP_CHLIDREN=1 I still get between 1 and 4 railsrunners. I use the virtual host templates. Does that have anything to do with it?

Maybe I change the wrong values? The values I change now is under Server -> Ruby On Rails

Cheers,
Henke
 
#2
OK, I think I managed to limit the number of railsrunners by also setting max connection = 1 under the VirtualTemplates -> Context -> Rails tab. But the railsrunner still dies and complains about Broken PIPE. I also see this error.
2008-09-26 15:51:36.958 [WARN] [10.168.1.130:56850-0#safecube] LSAPI Packet header is invalid,('','','P','r','o','c','e','s')

I tried to strace the child but what happens is that I get a 500 error then when I try to reload och click another page a new railsrunner starts and the current railsrunner which I strace gets a broken pipe. Any good clues on how to make a even better trouble shooting?

Cheers,
Henk
 

mistwang

LiteSpeed Staff
#3
Looks like some part of your rails application write to STDOUT (file descriptor 1) directly via low level I/O functions, it breaks LSAPI protocol.

Since all ruby processes are fork off the parent one, you just do

strace -f -p <pid_of_parent>

So, you can check the first request.
 
#4
Hmm interesting with the STDOUT problem. I do have some backend scripts that issue a IO.popen but that haven't messed up anything before. Btw here is the strace.

henke@cubiq-test:/application# ps ax | grep Rails
8888 ? SNs 0:03 ruby /usr/local/lsws/fcgi-bin/RailsRunner.rb
8895 ? SN 0:00 ruby /usr/local/lsws/fcgi-bin/RailsRunner.rb
8960 pts/5 S+ 0:00 grep Rails
henke@cubiq-test:/application# strace -f -p 8888
Process 8888 attached - interrupt to quit
select(1, [0], NULL, NULL, {0, 420000}) = 0 (Timeout)
getppid() = 6093
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 6093
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 6093
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 6093
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 6093
select(1, [0], NULL, NULL, {1, 0}) = 1 (in [0], left {0, 190000})
nanosleep({0, 10000}, NULL) = 0
select(1, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1222463333, 938653}, NULL) = 0
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 6093
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 6093
select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 6093
select(1, [0], NULL, NULL, {1, 0}) = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
select(1, [0], NULL, NULL, {0, 880000}) = 0 (Timeout)
rt_sigaction(SIGUSR1, {0x46f9b0, [], SA_RESTORER, 0x7fad91c0c100}, NULL, 8) = 0
getpgrp() = 8888
kill(4294958408, SIGUSR1) = 0
--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
rt_sigreturn(0xa) = 0
gettimeofday({1222463357, 948067}, NULL) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {0x7fad890d3ed0, ~[TRAP FPE KILL USR1 PIPE TERM STKFLT CHLD CONT STOP TSTP TTIN XCPU XFSZ PROF WINCH PWR SYS RT_2 RT_5 RT_7 RT_16 RT_17 RT_18 RT_19 RT_20 RT_21 RT_22 RT_23 RT_24 RT_25 RT_26 RT_27 RT_28 RT_29 RT_30 RT_31], SA_RESTORER, 0x7fad91c0c100}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7fff9a813fe0) = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGUSR1}], WNOHANG|WSTOPPED, NULL) = 8895
wait4(-1, 0x7fff9a8139dc, WNOHANG|WSTOPPED, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff) = -1 EINTR (Interrupted system call)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0x7fad890d3c10, ~[TRAP FPE KILL USR1 PIPE TERM STKFLT CHLD CONT STOP TSTP TTIN XCPU XFSZ PROF WINCH PWR SYS RT_2 RT_5 RT_7 RT_16 RT_17 RT_18 RT_19 RT_20 RT_21 RT_22 RT_23 RT_24 RT_25 RT_26 RT_27 RT_28 RT_29 RT_30 RT_31], SA_RESTORER, 0x7fad91c0c100}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
sendto(4, "X\0\0\0\4", 5, 0, NULL, 0) = -1 EPIPE (Broken pipe)
rt_sigpending([PIPE]) = 0
rt_sigtimedwait([PIPE], 0, 0, 8) = 13
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
close(4) = 0
rt_sigaction(SIGUSR1, {SIG_DFL}, {0x46f9b0, [], SA_RESTORER, 0x7fad91c0c100}, 8) = 0
kill(8888, SIGUSR1) = 0
--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
Process 8888 detached

I also see the following in stderr.log
WARNING: there is already a transaction in progress

(originally Broken pipe - /application/log/development.log)
WARNING: there is already a transaction in progress

(originally Broken pipe - /application/log/development.log)

(originally Broken pipe - /application/log/development.log)


Can you see anything from this?

Cheers,
Henke
 
Last edited:

mistwang

LiteSpeed Staff
#5
The output is not useful as it does not show the first request being processed by it.

You need to set "Max connections" to 2, so a parent RailsRunner.rb process will be started, then you strace that process.
 
#6
Hi again, I did a new strace with the method you suggesten and got more data.

Before sending it in I decided to make a fresh rails app to limit the amount of code. I then noticed that my app worked as expected. After looking at what I made different I noticed that I could solve the problem by replacing all my require_dependency with regular require. Have anyone else seen this? This app is kinda special as it uses C functions callable from ruby through Swig.

If you still want the strace or anything else let me know.

Cheers
 
#7
Looks like I spoke to soon. My application crashes with LSWS even if I replaced the require_dependency statements. Here is a snippet from a strace on a RailsRunners that bailes. I have no clue what it can be. Have never had this problem before and works in webrick.

UPDATED: Added complete strace of child here -> http://www.flowsolutions.se/strace.log

Cheers,
Henke

Code:
     0.000116 close(3)                  = 0
     0.000112 open("/application/app/models/page_backup.rb", O_RDONLY) = 3
     0.000125 fstat(3, {st_mode=S_IFREG|0664, st_size=166, ...}) = 0
     0.000131 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0af55ba000
     0.000124 read(3, "class PageBackup < Page\n\tdef ini"..., 4096) = 166
     0.000219 read(3, "", 4096)         = 0
     0.000125 read(3, "", 4096)         = 0
     0.000230 read(3, "", 4096)         = 0
     0.000108 close(3)                  = 0
     0.000105 munmap(0x7f0af55ba000, 4096) = 0
     0.002405 rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
     0.000158 sendto(5, "Q\0\0\0bSELECT id, data FROM tbl_se"..., 99, 0, NULL, 0) = 99
     0.000895 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000148 poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
     0.000116 recvfrom(5, "T\0\0\0002\0\2id\0\0\0A&\0\1\0\0\0\27\0\4\377\377\377\377"..., 16384, 0, NULL, NULL) = 14234
     0.000687 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000330 stat("/application/app/controllers/fileview.rb", 0x7ffffd6d6b80) = -1 ENOENT (No such file or directory)
     0.000158 stat("/application/app/fileview.rb", 0x7ffffd6d6b80) = -1 ENOENT (No such file or directory)
     0.000124 stat("/application/app/models/fileview.rb", {st_mode=S_IFREG|0664, st_size=553, ...}) = 0
     0.002505 open("/application/app/models/fileview.rb", O_RDONLY) = 3
     0.000263 close(3)                  = 0
     0.000114 open("/application/app/models/fileview.rb", O_RDONLY) = 3
     0.000115 close(3)                  = 0
     0.000108 open("/application/app/models/fileview.rb", O_RDONLY) = 3
     0.000125 fstat(3, {st_mode=S_IFREG|0664, st_size=553, ...}) = 0
     0.000127 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0af55ba000
     0.000119 read(3, "class Fileview\n\tattr_reader()\n\ta"..., 4096) = 553
     0.000302 read(3, "", 4096)         = 0
     0.000110 close(3)                  = 0
     0.000104 munmap(0x7f0af55ba000, 4096) = 0
     0.002496 rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
     0.000716 sendto(5, "Q\0\0\0bSELECT id, data FROM tbl_se"..., 99, 0, NULL, 0) = 99
     0.000747 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000137 poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
     0.000116 recvfrom(5, "T\0\0\0002\0\2id\0\0\0A&\0\1\0\0\0\27\0\4\377\377\377\377"..., 16384, 0, NULL, NULL) = 14234
     0.000751 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000340 stat("/application/app/controllers/treelist_computer.rb", 0x7ffffd6d6b80) = -1 ENOENT (No such file or directory)
     0.000156 stat("/application/app/treelist_computer.rb", 0x7ffffd6d6b80) = -1 ENOENT (No such file or directory)
     0.000125 stat("/application/app/models/treelist_computer.rb", {st_mode=S_IFREG|0664, st_size=504, ...}) = 0
     0.002546 open("/application/app/models/treelist_computer.rb", O_RDONLY) = 3
     0.000195 close(3)                  = 0
     0.000114 open("/application/app/models/treelist_computer.rb", O_RDONLY) = 3
     0.000137 close(3)                  = 0
     0.000109 open("/application/app/models/treelist_computer.rb", O_RDONLY) = 3
     0.000125 fstat(3, {st_mode=S_IFREG|0664, st_size=504, ...}) = 0
     0.000126 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0af55ba000
     0.000119 read(3, "class Treelist_Computer < Treeli"..., 4096) = 504
     0.000272 read(3, "", 4096)         = 0
     0.000112 read(3, "", 4096)         = 0
     0.000108 read(3, "", 4096)         = 0
     0.000107 close(3)                  = 0
     0.000104 munmap(0x7f0af55ba000, 4096) = 0
     0.000374 stat("/application/app/controllers/treelist.rb", 0x7ffffd6d1710) = -1 ENOENT (No such file or directory)
     0.000142 stat("/application/app/treelist.rb", 0x7ffffd6d1710) = -1 ENOENT (No such file or directory)
     0.000167 stat("/application/app/models/treelist.rb", {st_mode=S_IFREG|0664, st_size=181, ...}) = 0
     0.002266 open("/application/app/models/treelist.rb", O_RDONLY) = 3
     0.000154 close(3)                  = 0
     0.000111 open("/application/app/models/treelist.rb", O_RDONLY) = 3
     0.000114 close(3)                  = 0
     0.000109 open("/application/app/models/treelist.rb", O_RDONLY) = 3
     0.000123 fstat(3, {st_mode=S_IFREG|0664, st_size=181, ...}) = 0
     0.000126 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0af55ba000
     0.000116 read(3, "class Treelist\n\tattr_reader(:id,"..., 4096) = 181
     0.000264 read(3, "", 4096)         = 0
     0.000109 read(3, "", 4096)         = 0
     0.000109 read(3, "", 4096)         = 0
     0.000107 close(3)                  = 0
     0.000104 munmap(0x7f0af55ba000, 4096) = 0
     0.004553 rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
     0.000172 sendto(5, "Q\0\0\0bSELECT id, data FROM tbl_se"..., 99, 0, NULL, 0) = 99
     0.000697 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000116 poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
     0.000234 recvfrom(5, "T\0\0\0002\0\2id\0\0\0A&\0\1\0\0\0\27\0\4\377\377\377\377"..., 16384, 0, NULL, NULL) = 14234
     0.001446 gettimeofday({1222467995, 424263}, NULL) = 0
     0.000155 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1892, ...}) = 0
     0.000158 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1892, ...}) = 0
     0.000180 write(1, "\n\nProcessing TestController#test"..., 86) = 86
     0.012896 write(1, "  Session ID: af0c6adf37cff1370f"..., 47) = -1 EPIPE (Broken pipe)
     0.000133 --- SIGPIPE (Broken pipe) @ 0 (0) ---
     0.000136 rt_sigreturn(0xd)         = -1 EPIPE (Broken pipe)
     0.000195 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.002134 gettimeofday({1222467995, 447377}, NULL) = 0
     0.007440 gettimeofday({1222467995, 447675}, NULL) = 0
     0.000144 rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
     0.000126 sendto(5, "Q\0\0\1i            SELECT a.attnam"..., 362, 0, NULL, 0) = 362
     0.001324 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000230 poll([{fd=5, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
     0.000122 recvfrom(5, "T\0\0\0s\0\4attname\0\0\0\4\341\0\2\0\0\0\23\0@\377\377"..., 16384, 0, NULL, NULL) = 1124
     0.005926 gettimeofday({1222467995, 475936}, NULL) = 0
     0.020599 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1892, ...}) = 0
     0.000781 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1892, ...}) = 0
     0.000241 write(1, "\n\nProcessing TestController#test"..., 86) = -1 EPIPE (Broken pipe)
     0.000126 --- SIGPIPE (Broken pipe) @ 0 (0) ---
     0.000228 rt_sigreturn(0xd)         = -1 EPIPE (Broken pipe)
     0.000201 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000286 gettimeofday({1222467995, 478007}, NULL) = 0
     0.000124 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1892, ...}) = 0
     0.000138 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1892, ...}) = 0
     0.000204 write(1, "/!\\ FAILSAFE /!\\  Sat Sep 27 00:"..., 1988) = -1 EPIPE (Broken pipe)
     0.000136 --- SIGPIPE (Broken pipe) @ 0 (0) ---
     0.000101 rt_sigreturn(0xd)         = -1 EPIPE (Broken pipe)
     0.000148 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000124 writev(1, [{"LS\6\0V\0\0\0", 8}, {"Error during failsafe response: "..., 78}], 2) = -1 EPIPE (Broken pipe)
     0.000124 --- SIGPIPE (Broken pipe) @ 0 (0) ---
     0.000084 rt_sigreturn(0xd)         = -1 EPIPE (Broken pipe)
     0.000107 close(1)                  = 0
     0.000140 write(2, "\n", 1)         = 1
     0.000208 write(2, "(originally Broken pipe - /appli"..., 59) = 59
     0.000144 write(2, "\n", 1)         = 1
     0.002051 gettimeofday({1222467995, 481855}, NULL) = 0
     0.000136 select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
     0.996152 getppid()                 = 13163
     0.000121 select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
     0.999824 getppid()                 = 13163
     0.000077 select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
     0.999814 getppid()                 = 13163
     0.000139 select(1, [0], NULL, NULL, {1, 0}) = 0 (Timeout)
     1.000263 getppid()                 = 13163
     0.000121 select(1, [0], NULL, NULL, {1, 0}) = ? ERESTARTNOHAND (To be restarted)
     0.720046 --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
     0.000209 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000223 rt_sigaction(SIGINT, {SIG_DFL}, {0x46f9b0, [], SA_RESTORER, 0x7f0af4ad8100}, 8) = 0
     0.011335 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.037686 rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
     0.040412 sendto(4, "X\0\0\0\4", 5, 0, NULL, 0) = 5
     0.006397 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000219 close(4)                  = 0
     0.001292 rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
     0.000269 sendto(5, "X\0\0\0\4", 5, 0, NULL, 0) = 5
     0.006348 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
     0.000148 close(5)                  = 0
     0.002396 rt_sigaction(SIGUSR1, {SIG_DFL}, {0x46f9b0, [], SA_RESTORER, 0x7f0af4ad8100}, 8) = 0
     0.000238 kill(13170, SIGUSR1)      = 0
     0.000133 --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
Process 13170 detached
 
Last edited:
#9
Hi and thanks for the reply,
I found the code, and it is the rails builtin function that writes to its logs.
I think it is some lower level library that crashes and breaks the open pipe to development.log which in turn breaks LSAPI protocol when the web server tries to write to the log the next time.

This is probably why I get these in stderr.log
(originally Broken pipe - /application/log/development.log)
Thanks for the info why LSWS crashes. Now I only need to find where the PIPE really breaks.

Cheers,
Henke
 
Top