Background: how the worker timeout came about
while 1:
    .....
    time.sleep(1)
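After gunicorn started, the program ran for only about 30s and then hung, without raising any exception or error. The official gunicorn documentation explains why: by default, a worker that stays silent for more than 30s is killed and then restarted.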
http://docs.gunicorn.org/en/stable/settings.html
timeout
-t INT, --timeout INT
30
Workers silent for more than this many seconds are killed and restarted.
Generally set to thirty seconds. Only set this noticeably higher if you’re sure of the repercussions for sync workers. For the non sync workers it just means that the worker process is still communicating and is not tied to the length of time required to handle a single request.
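That was the root cause. However, even after adding --timeout 120 to the gunicorn start command, the worker still timed out once it passed 30s. Searching Stack Overflow turned up no good solution.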
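The best fix I have found so far is to change the application code: the function used to be called directly on the main thread, so wrap it in a thread using threading.
For example: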
import threading
# General pattern: func is the function that used to run on the main thread
t = threading.Thread(name='', target=func, kwargs={})
t.daemon = True
t.start()
# A concrete example:
t = threading.Thread(name='result_package', target=result_package, args=(pack_name, task, issue))
t.daemon = True
t.start()
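This way the function is wrapped in a thread started from the main thread, so the main thread is no longer blocked by it.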
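The wrapping described above can be sketched as a small helper. This is a minimal sketch, not the original code: the names start_background, long_running_job, and results are hypothetical and exist only for illustration.

```python
import threading
import time

# Hypothetical shared list, used only so the example has something to observe.
results = []


def long_running_job(name, steps, delay=0.01):
    """Stand-in for a slow function that used to run on the main thread."""
    for i in range(steps):
        time.sleep(delay)
        results.append((name, i))


def start_background(target, *args, **kwargs):
    """Run target(*args, **kwargs) in a daemon thread and return the thread."""
    t = threading.Thread(name=target.__name__, target=target,
                         args=args, kwargs=kwargs)
    t.daemon = True  # the thread will not keep the process alive on exit
    t.start()
    return t
```

Calling start_background(long_running_job, "demo", 3) returns immediately while the job keeps running. Note that a daemon thread is stopped abruptly when the process exits, so this is best suited to work that can safely be cut off.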
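While you are at it, use
Event().wait(15) in place of time.sleep(16)
The benefit of writing it this way is that the wait blocks without spinning the CPU. If you also keep a reference to the Event, another thread can end the wait early by calling set().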
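As a rough illustration of waiting on an Event instead of sleeping, the sketch below runs a periodic loop that can be stopped from another thread. The names stop_event, poll_loop, and on_tick are hypothetical; the only library behavior relied on is that threading.Event.wait(timeout) returns True if the event was set and False if the timeout expired.

```python
import threading
import time

# Hypothetical module-level event used to request a stop.
stop_event = threading.Event()


def poll_loop(interval, on_tick):
    """Call on_tick() every `interval` seconds until stop_event is set."""
    # wait() returns False on timeout, so the loop keeps ticking;
    # it returns True as soon as stop_event.set() is called.
    while not stop_event.wait(interval):
        on_tick()
```

With time.sleep the loop would always sleep the full interval; here, calling stop_event.set() wakes the waiting thread right away.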
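Analyzing the cause took quite a while at first, but in the end a few lines of code fixed the worker timeout. Before that I had tried map.thread, which did not work.
I am considering replacing the original logic with a task queue (celery + redis), but that is a fair amount of work and feels too heavy.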