nodemcu-firmware/components/task/Kconfig

Fix net module data loss & RTOS task unsafety (#2829)

To avoid races between the lwIP callbacks (lwIP RTOS task) and the Lua
handlers (LVM RTOS task), the data flow and ownership have been simplified
and cleaned up. lwIP callbacks now have no visibility of the userdata
struct. They are limited to creating small event objects and
task_post()ing them over to the LVM "thread", passing ownership in doing
so. The shared identifier then becomes the struct netconn*. On the LVM
side, we keep a linked list of active userdata objects, which allows us to
retrieve the correct userdata when we get an event carrying a netconn
pointer. Because this list is only ever used within the LVM task, no
locking is necessary.

The old approach of stashing a userdata pointer into the 'socket' field on
the netconn has been removed entirely, as it was neither thread/RTOS-task
safe nor compatible with the IDF's internal use of the socket field (even
when using only the netconn layer). As an added benefit, this removed the
need for all the SYS_ARCH_PROTECT() locking.

The need to track receive events before the corresponding userdata object
has been established has been removed by no longer reordering the "accept"
and "recv" events (previously accepts were posted with medium priority
while receives were posted with high priority, leading to the observed
reordering and the associated headaches).

The workaround for IDF issue 784 has been removed, as it is no longer
needed and is in fact directly harmful: it results in a double-free. Yay
for getting rid of old workarounds!

DNS resolution code paths were merged for the two instances of "socket"
initiated resolves (the connect/dns functions). Also fixed an instance of
using a stack variable to receive the resolved IP address, with said
variable going out of scope before the DNS resolution necessarily
completed (hello, memory corruption!). Where possible, moved to the Lua
allocator rather than plain malloc.

Finally, the NodeMCU task posting mechanism got a polish and an
adjustment. Given all the Bad(tm) that tends to happen if something fails
to post a task, I went through a couple of iterations on how to avoid
that. Alas, the preferred solution of blocking non-LVM RTOS tasks until a
slot is free turned out not to be viable, as it easily resulted in
deadlocks with the lwIP stack. After much deliberation I settled on
increasing the number of available queue slots for the task_post()
mechanism, and in the interest of user control it is now also user
configurable via Kconfig.

2019-07-14 23:20:20 +02:00
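
The ownership hand-off the commit message describes can be sketched roughly
as below. This is an illustrative outline only, not the actual net module
source: the event struct, list walk, and handler body are hypothetical, and
the task_post_high()/task_get_id() signatures are assumed from the NodeMCU
task component that this Kconfig belongs to.

    #include <stdlib.h>
    #include "lwip/api.h"          /* struct netconn, netconn_evt, netbuf   */
    #include "task/task.h"         /* NodeMCU task_post()/task_get_id()     */

    typedef struct net_event {
      struct netconn *conn;        /* the shared identifier between tasks   */
      struct netbuf  *data;        /* payload for a "recv" event, if any    */
    } net_event_t;                 /* hypothetical event object             */

    typedef struct net_userdata {
      struct netconn *conn;
      struct net_userdata *next;
      /* ...Lua callback references etc. would live here... */
    } net_userdata_t;

    static net_userdata_t *userdata_list;  /* only touched from the LVM task */
    static task_handle_t   net_event_task; /* registered once at module init */

    /* LVM task only: map a netconn back to its userdata via the list. */
    static net_userdata_t *find_userdata(struct netconn *conn)
    {
      for (net_userdata_t *ud = userdata_list; ud; ud = ud->next)
        if (ud->conn == conn)
          return ud;
      return NULL;
    }

    /* Runs in the LVM RTOS task: the only place the list/userdata are used. */
    static void net_event_handler(task_param_t param, task_prio_t prio)
    {
      (void)prio;
      net_event_t *ev = (net_event_t *)param;
      net_userdata_t *ud = find_userdata(ev->conn);
      if (ud) {
        /* ...pull data out of ev and fire the matching Lua callback... */
      }
      if (ev->data)
        netbuf_delete(ev->data);
      free(ev);
    }

    /* Runs in the lwIP RTOS task: no access to the userdata or the list. */
    static void lwip_netconn_cb(struct netconn *conn, enum netconn_evt evt,
                                u16_t len)
    {
      (void)len;
      if (evt != NETCONN_EVT_RCVPLUS)
        return;
      net_event_t *ev = malloc(sizeof *ev);
      if (!ev)
        return;
      ev->conn = conn;
      ev->data = NULL;
      /* Ownership of ev passes to the LVM task together with the post. */
      if (!task_post_high(net_event_task, (task_param_t)ev))
        free(ev);                /* queue full: drop the event, don't leak  */
    }

    static void net_init(void)
    {
      /* Register the LVM-side handler before any lwIP callback can fire. */
      net_event_task = task_get_id(net_event_handler);
    }

The key property is that userdata_list is only ever read or written from the
LVM task, which is what makes the lock-free lookup safe; the lwIP side only
allocates, fills, and posts events.
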
menu "NodeMCU task slot configuration"
config NODEMCU_TASK_SLOT_MEMORY
int "Task slot buffer size"
default 2000
range 80 16000
help
NodeMCU uses a fixed size RTOS queue for messaging between internal
LVM tasks as well as from other RTOS tasks. If this queue is too
small, events and data will go missing. On the other hand, if the
queue is too big, some memory will go unused.

        The default value is chosen to be on the safe side for most use
        cases. Lowering this value will yield more available RAM for use
        in Lua, but at an increased risk of data loss. Conversely,
        increasing this value can help resolve the aforementioned data
        loss issues, if encountered.

        The memory assigned here gets partitioned across the different
        task priorities; some rounding down may take place as a result.

endmenu
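
The note about partitioning and rounding can be made concrete with a small
back-of-the-envelope sketch. This is illustrative arithmetic under stated
assumptions, not the actual task component code: it assumes the byte budget
is split evenly over the three task_post() priorities (low/medium/high) and
uses a made-up per-slot size; the real slot size and split are defined by the
task component's implementation.

    #include <stdio.h>

    /* Hypothetical figures -- the real slot size and priority count come
     * from the task component, not from this sketch. */
    #define TASK_PRIORITY_COUNT 3          /* low / medium / high            */
    #define SLOT_SIZE           8          /* assumed bytes per queued event */

    int main(void)
    {
      int budget         = 2000;                          /* Kconfig default */
      int per_prio_bytes = budget / TASK_PRIORITY_COUNT;  /* 666, rounds down */
      int slots_per_prio = per_prio_bytes / SLOT_SIZE;    /* 83, rounds down  */
      int bytes_used     = slots_per_prio * SLOT_SIZE * TASK_PRIORITY_COUNT;

      printf("%d slots per priority, %d of %d bytes used\n",
             slots_per_prio, bytes_used, budget);   /* 83 slots, 1992/2000   */
      return 0;
    }

To change the budget, the option can be set through the build system's
menuconfig target under "NodeMCU task slot configuration"; it is recorded as
CONFIG_NODEMCU_TASK_SLOT_MEMORY in the generated sdkconfig.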