TimRiker | hmm. cpu spike is visible now.... happens every day, eats up all the ram on the box. | 03:31 |
---|---|---|
TimRiker | added timeout script with memory limit. still not sure what causes the cpu/mem to spike. | 04:07 |
DocScrutinizer05 | hi tim! | 04:50 |
DocScrutinizer05 | I just thought I'm dreaming, my keep-alive script made infobot reply "OK" to "noreplyplease" :-) | 04:51 |
DocScrutinizer05 | TimRiker: I have an estimated 30 logs of infobot ssh restart | 04:52 |
DocScrutinizer05 | some weird stuff, though not much useful. Most informative: it seems the infobot process doesn't get restarted, it keeps same process number | 04:53 |
DocScrutinizer05 | ~factinfo noreplyplease | 04:53 |
infobot | noreplyplease -- created by TimRiker <~TimRiker@bzflag/projectlead/TimRiker> 11m 54s ago; it has been requested 7 times, last by DocScrutinizer05, 27s ago. | 04:53 |
DocScrutinizer05 | hmmm, what's the purpose? | 04:54 |
DocScrutinizer05 | I gonna stop the /ping-ibbot script for now, until it's clear what's the deal with this ok reply | 04:55 |
TimRiker | DocScrutinizer05, your query makes the bot query four other infobot instances every time. avoiding that overhead. | 04:55 |
DocScrutinizer05 | oh, ok | 04:55 |
DocScrutinizer05 | hmmm, so I have to live with 3 always highhlighted query tabs | 04:56 |
TimRiker | check out the mrtg logs here: https://rikers.org/mrtg/ every day the box eats all 8G of ram on the server. | 04:57 |
DocScrutinizer05 | yeag, the ibot process eats ram, that's also obvious from my logs here | 04:57 |
TimRiker | I installed "timeout" and limited the ram. Not found the cause, but it should stop it from slowing down the whole machine every day. | 04:58 |
DocScrutinizer05 | I noticed the downtime moved ahead from locat time 2:19 to meanwhile 2:28 | 05:02 |
DocScrutinizer05 | well, once yesterday | 05:03 |
DocScrutinizer05 | I still glare at cron.daily and logrotate (and systemd who ate both) for that daily ibot madness | 05:04 |
DocScrutinizer05 | the convatenated logs I pm'ed you the pastebin URL are maybe most interestimn at very beginning and last 200somelines | 05:06 |
DocScrutinizer05 | seems the restart doesn'T do anything at all anyway | 05:07 |
DocScrutinizer05 | btw my ssh session to infobot hangs when I run ssh under timeout | 05:07 |
DocScrutinizer05 | weird | 05:08 |
TimRiker | ibot does not have a cron job. it uses the built in log rotation. which might be buggy. | 05:08 |
DocScrutinizer05 | quite possible | 05:09 |
DocScrutinizer05 | during the last weeks I've seen the bot come back within minutes after 2:19 "regular" offline | 05:09 |
DocScrutinizer05 | two or three times. usually it takes 90 minutes though | 05:10 |
DocScrutinizer05 | the restart job anyway doesn't restart infobot. there's some bug in it | 05:12 |
DocScrutinizer05 | maybe it only restarts some wrapper, or whatever | 05:12 |
TimRiker | I don't remember what calls the log to html code... looking. | 05:13 |
TimRiker | ah. processLog() calls scripts/processlog | 05:16 |
DocScrutinizer05 | I mean syslogs. There once been some /etc/cron.daily/logrotate which became obsolete with advent of systemd | 05:19 |
DocScrutinizer05 | no idea which "smart" automatism systemd introduced instead | 05:20 |
DocScrutinizer05 | quite easily could conflict with internal logrotates. Or got messed up during migration sysvinit->systemd | 05:21 |
DocScrutinizer05 | also see /etc/logrotate.d/* | 05:22 |
DocScrutinizer05 | or /etc/logrotate.d/*apt* | 05:22 |
DocScrutinizer05 | ibot? | 05:22 |
TimRiker | ibot. yeah, no entries in logrotate.d either. there's just the loopibot script that calls/restart infobot, and infobot calls processlogs when the date changes and it closes and opens a log file. | 05:24 |
TimRiker | no cron, logrotate, etc involved other that to start the loop in reboot. | 05:24 |
TimRiker | s/that/than/ | 05:24 |
infobot | TimRiker meant: no cron, logrotate, etc involved other than to start the loop in reboot. | 05:24 |
DocScrutinizer05 | you *could* use a `at` (or cron) batchjob timed to attach ibot process to strace at 00:17 UTC and detach at 00:30 maybe ;-) | 05:26 |
DocScrutinizer05 | somehow that sneaky critter that causes this must get shot | 05:27 |
DocScrutinizer05 | IOW there must be a timed trigger causing this | 05:27 |
TimRiker | yep. today's change was to use timeout. We'll see if that helps. I should get on when the logs rotate and watch what happens. | 05:29 |
DocScrutinizer05 | is there a NOOP command to ibot? | 05:30 |
DocScrutinizer05 | something I could (try to) send and not get any reply? | 05:31 |
DocScrutinizer05 | HMMM a (IRC) /ping maybe? | 05:34 |
TimRiker | what's the point in sending if you don't want a reply? How do you know if it's not listening? | 05:35 |
TimRiker | ~ping | 05:35 |
infobot | ~pong | 05:35 |
DocScrutinizer05 | IRC server tells me | 05:35 |
TimRiker | if you're depending on the irc server, can't you just add it to your notify list? | 05:36 |
DocScrutinizer05 | https://i.imgur.com/kBoZuo1.png | 05:36 |
DocScrutinizer05 | notify is not at all any reliable | 05:37 |
DocScrutinizer05 | [6 Dec 2018 05:35:22] [CTCP] Sending CTCP-PING request to infobot. | 05:37 |
DocScrutinizer05 | [6 Dec 2018 05:35:22] [CTCP] Received CTCP-1544070922 reply from infobot: . | 05:37 |
DocScrutinizer05 | might work, it will not open a query at least | 05:37 |
DocScrutinizer05 | but won't notice stale bot? | 05:38 |
TimRiker | looks like it does not reply to ctcp clientinfo | 05:41 |
DocScrutinizer05 | let me know when I should test the restart ssh | 05:41 |
TimRiker | whenever you like. is it not restarting? | 05:42 |
DocScrutinizer05 | nag, as I said above and shows in the logs I pmed you, the ibot process keeps same PID over several restart runs | 05:42 |
DocScrutinizer05 | nah* | 05:43 |
DocScrutinizer05 | ok I shoot | 05:43 |
TimRiker | it does a killall infobot which should work. | 05:43 |
TimRiker | remember that when you kill the bot, it tries to do a graceful shutdown and restart, so it's not instant unless something is broken. | 05:45 |
TimRiker | running killall infobot on the command line works almost immediately. | 05:46 |
DocScrutinizer05 | maybe not when bot hangs ;-) might need SIGKILL not SIGTERM | 05:47 |
TimRiker | ssh and running the script works fine. hmm. perhaps I should wait a second and then do a -9 ? | 05:48 |
DocScrutinizer05 | actually I'm pretty sure a plain killall (without brute force signal) won't help when process in swap hell or the like | 05:48 |
DocScrutinizer05 | make that 10 s | 05:48 |
TimRiker | the loop sleeps 10 seconds, so it has to be less than that or we'll kill it twice. | 05:49 |
DocScrutinizer05 | aaah wait | 05:49 |
TimRiker | setting it to 3 seconds. it should be able to shut down that quickly. | 05:49 |
DocScrutinizer05 | it restarts immediately when killed? | 05:49 |
TimRiker | it waits 10 seconds and then restarts. | 05:49 |
DocScrutinizer05 | sorry, I fail to envision it. What I know is: sometimes it takes quite a while to flush all buffers etc, before a process ends gracefully on plain kill | 05:51 |
DocScrutinizer05 | kill infobot; sleep 10; kill -s0 infobot&&kill -s9 infobot | 05:56 |
DocScrutinizer05 | ooh, doesn't work with killall | 05:57 |
DocScrutinizer05 | killall knows --older-than though ;-) | 05:58 |
DocScrutinizer05 | killall infobot; sleep 20; killall -s9 --older-than 30s infobot | 05:59 |
DocScrutinizer05 | gotta say thank you to Werner Almesberger for it, once more when I meet him next time | 06:01 |
DocScrutinizer05 | TimRiker: I'm puzzled by the ~/timeout/timeout -m 500000 | 06:06 |
DocScrutinizer05 | what does it do? | 06:07 |
DocScrutinizer05 | oooh -m emeory? | 06:07 |
TimRiker | https://github.com/pshved/timeout | 06:08 |
* DocScrutinizer05 clearly needs a smarter timeout | 06:08 | |
DocScrutinizer05 | we'll see | 06:14 |
DocScrutinizer05 | 500MB | 06:14 |
DocScrutinizer05 | haha yeah! >>Sometimes waitpid call from inside SIGALRM handler returns -1 for no apparent reason. This return value is ignored, and the appropriate warning is printed, but the cause of such a behavior is still unknown.<< | 06:20 |
DocScrutinizer05 | ~!status | 06:43 |
DocScrutinizer05 | ~+status | 06:43 |
infobot | Since Thu Dec 6 04:47:41 2018, there have been 0 modifications, 0 questions, 0 dunnos, 0 morons and 0 commands. I have been awake for 56m 7s this session, and currently reference 119780 factoids. I'm using about 80496 kB of memory. With 0 active forks. Process time user/system 2.81/0.18 child 0/0 | 06:43 |
DocScrutinizer05 | ~+uptime | 06:43 |
infobot | - Uptime for infobot - | 06:43 |
infobot | Now: 56m 18s running infobot 1.5.4 (SVN) -- linux | 06:43 |
infobot | 1: 59d 8h 41m 19s running infobot 1.5.4 (SVN) -- linux, ended Sun Nov 14 18:39:57 2010 | 06:44 |
infobot | 2: 57d 3h 9m 23s running infobot 1.5.4 (SVN) -- linux, ended Fri Jun 26 20:39:27 2009 | 06:44 |
infobot | 3: 53d 8h 24m 4s running infobot 1.5.4 (SVN) -- linux, ended Tue Nov 7 05:34:20 2017 | 06:44 |
Generated by irclog2html.py 2.17.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!