Commit Graph

Author SHA1 Message Date
Johny Mattsson f44e6e9639 Fix net module data loss & RTOS task unsafety (#2829)
To avoid races between the lwIP callbacks (lwIP RTOS task) and the Lua
handlers (LVM RTOS task), the data flow and ownership have been simplified
and cleaned up.

lwIP callbacks now have no visibility of the userdata struct. They are
limited to creating small event objects and task_post()ing them over
to the LVM "thread", passing ownership in doing so. The shared identifier
then becomes the struct netconn*.
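
As a rough illustration of that hand-off (not the actual net module
code), the sketch below allocates a small event, fills in the netconn
pointer and passes it on. All names here (net_event_t, post_to_lvm(),
lwip_side_notify(), lvm_side_take()) are hypothetical, and a plain
single-threaded ring buffer stands in for the real task_post() queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct netconn;  /* opaque here; only ever used as an identifier */

/* Hypothetical event kinds and event object, for illustration only. */
typedef enum { EVT_ACCEPT, EVT_RECV, EVT_SENT, EVT_ERR } evt_type_t;

typedef struct {
  evt_type_t type;
  struct netconn *netconn;  /* shared identifier, never dereferenced here */
} net_event_t;

/* Stand-in for the task_post() queue. This ring buffer is NOT thread
 * safe; the real mechanism has to be, this only shows ownership flow. */
#define QUEUE_SLOTS 8
static net_event_t *queue[QUEUE_SLOTS];
static unsigned q_head, q_count;

static bool post_to_lvm(net_event_t *ev) {
  if (q_count == QUEUE_SLOTS) return false;  /* queue full */
  queue[(q_head + q_count) % QUEUE_SLOTS] = ev;
  q_count++;
  return true;
}

/* "lwIP side": sees only the netconn pointer, never the userdata. */
bool lwip_side_notify(struct netconn *nc, evt_type_t type) {
  assert(nc != NULL);
  net_event_t *ev = malloc(sizeof *ev);
  if (!ev) return false;
  ev->type = type;
  ev->netconn = nc;
  if (!post_to_lvm(ev)) {
    free(ev);  /* posting failed, so ownership stays here */
    return false;
  }
  return true;  /* ownership has passed to the LVM side */
}

/* "LVM side": takes the oldest event; the caller now owns it and must
 * free() it. Returns NULL when nothing is pending. */
net_event_t *lvm_side_take(void) {
  if (q_count == 0) return NULL;
  net_event_t *ev = queue[q_head];
  q_head = (q_head + 1) % QUEUE_SLOTS;
  q_count--;
  return ev;
}
```

The point of the shape is that the posting side never touches anything
that lives in Lua-managed memory; it only owns the event until the post
succeeds.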

On the LVM side, we keep a linked list of active userdata objects. This
allows us to retrieve the correct userdata when we get an event with
a netconn pointer. Because this list is only ever used within the LVM
task, no locking is necessary.
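
A minimal sketch of such a lookup list follows. The struct layout and
the function names (net_ud_t, ud_register(), ud_unregister(),
ud_find()) are assumptions made for the example, not the module's real
definitions.

```c
#include <assert.h>
#include <stddef.h>

struct netconn;  /* opaque; compared by pointer value only */

/* Hypothetical userdata layout: just the fields the lookup needs. */
typedef struct net_ud {
  struct netconn *netconn;
  struct net_ud *next;
} net_ud_t;

/* Head of the active list. Only touched from one task in this sketch,
 * which is why there is no locking anywhere below. */
static net_ud_t *active_list = NULL;

/* Add a userdata object to the front of the active list. */
void ud_register(net_ud_t *ud) {
  assert(ud != NULL);
  ud->next = active_list;
  active_list = ud;
}

/* Unlink a userdata object; does nothing if it is not in the list. */
void ud_unregister(net_ud_t *ud) {
  for (net_ud_t **pp = &active_list; *pp; pp = &(*pp)->next) {
    if (*pp == ud) {
      *pp = ud->next;
      ud->next = NULL;
      return;
    }
  }
}

/* Map the netconn pointer carried by an event back to its userdata.
 * Returns NULL if the connection is no longer (or not yet) tracked. */
net_ud_t *ud_find(const struct netconn *nc) {
  for (net_ud_t *ud = active_list; ud; ud = ud->next) {
    if (ud->netconn == nc) return ud;
  }
  return NULL;
}
```

A NULL result from the lookup gives the event handler a natural way to
drop events for connections that have already been closed.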

The old approach of stashing a userdata pointer into the 'socket' field
on the netconn has been removed entirely, as this was not
thread/RTOS-task safe, and it also interfered with the IDF's internal use
of the socket field (even when using only the netconn layer). As an
added benefit, this removed the need for all the SYS_ARCH_PROTECT()
locking stuff.

The need to track receive events before the corresponding userdata object
has been established has been removed by virtue of not reordering the
"accept" and the "recv" events any more (previously accepts were posted
with medium priority, while the receives were high priority, leading
to the observed reordering and associated headaches).
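
The reordering is easy to reproduce with a toy model. The sketch below
is not the NodeMCU task queue; it is a hypothetical set of per-priority
FIFOs (prio_post(), prio_next()) that are drained highest priority
first, which is enough to show the effect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical priorities and event kinds, for illustration only. */
typedef enum { PRIO_LOW, PRIO_MEDIUM, PRIO_HIGH, PRIO_COUNT } prio_t;
typedef enum { EV_NONE, EV_ACCEPT, EV_RECV } ev_t;

#define SLOTS 4
typedef struct {
  ev_t items[SLOTS];
  unsigned head, count;
} fifo_t;

static fifo_t queues[PRIO_COUNT];  /* one FIFO per priority level */

/* Append an event to the FIFO for the given priority. */
bool prio_post(prio_t prio, ev_t ev) {
  assert(prio < PRIO_COUNT);
  fifo_t *q = &queues[prio];
  if (q->count == SLOTS) return false;
  q->items[(q->head + q->count) % SLOTS] = ev;
  q->count++;
  return true;
}

/* Deliver the oldest event from the highest non-empty priority. */
ev_t prio_next(void) {
  for (int p = PRIO_COUNT - 1; p >= 0; --p) {
    fifo_t *q = &queues[p];
    if (q->count > 0) {
      ev_t ev = q->items[q->head];
      q->head = (q->head + 1) % SLOTS;
      q->count--;
      return ev;
    }
  }
  return EV_NONE;
}
```

Posting EV_ACCEPT at PRIO_MEDIUM and then EV_RECV at PRIO_HIGH makes
prio_next() hand back the receive first. Posting both at the same
priority, whichever one that is, preserves the order in which they were
posted.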

The workaround for IDF issue 784 has been removed as it is now not needed
and is in fact directly harmful as it results in a double-free. Yay for
getting rid of old workarounds!

DNS resolution code paths were merged for the two instances of "socket"
initiated resolves (connect/dns functions).

Also fixed an instance of using a stack variable for receiving the resolved
IP address, with said variable going out of scope before the DNS resolution
necessarily completed (hello, memory corruption!).
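
For the curious, the general shape of that kind of fix is sketched
below, under the assumption that the resolver writes its result through
a pointer at some later time. The resolver here (fake_resolve_start(),
fake_resolve_finish()) is a made-up single-slot stand-in, not the lwIP
API, and dns_req_t / start_lookup() are likewise hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*resolved_cb)(void *arg);

/* Made-up asynchronous resolver with room for one pending request. */
static uint32_t *pending_out;
static resolved_cb pending_cb;
static void *pending_arg;

/* Remember where to write the result and whom to notify; returns
 * immediately without resolving anything. */
void fake_resolve_start(uint32_t *out, resolved_cb cb, void *arg) {
  pending_out = out;
  pending_cb = cb;
  pending_arg = arg;
}

/* Later: write the address through the stored pointer, then notify. */
void fake_resolve_finish(uint32_t ip) {
  assert(pending_out != NULL && pending_cb != NULL);
  *pending_out = ip;
  resolved_cb cb = pending_cb;
  void *arg = pending_arg;
  pending_out = NULL;
  pending_cb = NULL;
  pending_arg = NULL;
  cb(arg);
}

/* Heap-allocated request context that outlives start_lookup(). */
typedef struct {
  uint32_t ip;  /* the resolver writes here */
} dns_req_t;

uint32_t resolved_ip = 0;  /* where this sketch publishes the result */

static void on_resolved(void *arg) {
  dns_req_t *req = arg;
  resolved_ip = req->ip;
  free(req);  /* the callback owns the request and releases it */
}

/* The buggy pattern would pass the address of a local uint32_t here,
 * which is gone once this function returns. Passing storage inside a
 * heap object keeps the target valid until the callback has run. */
int start_lookup(void) {
  dns_req_t *req = malloc(sizeof *req);
  if (!req) return -1;
  req->ip = 0;
  fake_resolve_start(&req->ip, on_resolved, req);
  return 0;
}
```

The trade-off is that the request must be freed on every completion
path, including errors and cancelled lookups, which this sketch does
not model.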

Where possible, moved to use the Lua allocator rather than plain malloc.

Finally, the NodeMCU task posting mechanism got a polish and an adjustment.
Given all the Bad(tm) that tends to happen if something fails task posting,
I went through a couple of iterations on how to avoid that. Alas, the
preferred solution of blocking non-LVM RTOS tasks until a slot is free
turned out to not be viable, as this easily resulted in deadlocks with the
lwIP stack. After much deliberation I settled on increasing the number of
available queue slots for the task_post() mechanism and, in the interest
of user control, also made that number configurable via Kconfig.
2019-07-14 23:20:20 +02:00