All set_data_callback, set_part_complete_callback, onNotFound, and
onConnect methods were taking std::function/callback by value despite
every caller passing a lambda temporary. Change to rvalue reference
to eliminate unnecessary move+destroy overhead.

All subscribe_home_assistant_state, get_home_assistant_state, and
add_state_subscription_ methods were taking std::function by value
despite every caller passing a temporary or std::move()'d value.
Change to rvalue reference to eliminate unnecessary move+destroy
overhead at each forwarding layer.

Change set_timer_common_ to take std::function<void()>&& instead of
by value. This avoids move-constructing another std::function on the
stack at each call site; the caller just passes a pointer to the rvalue.

On BK7231N (Thumb-1), each forwarder (set_timeout, set_interval) was
118 bytes due to the inlined std::function move constructor + register
spilling. With rvalue reference, they shrink to 32 bytes each.

All 12 callers already pass rvalues (std::move or lambda temporaries),
so this is a purely mechanical change with no semantic difference.

BK7231N: forwarders 118 -> 32 bytes each, ~258 bytes saved total
ESP32: forwarders 46 -> 30 bytes each, ~20 bytes saved total
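The forwarding cost described above can be made visible with a small host-side sketch. Everything here is hypothetical and only illustrates the calling conventions: CountingFn wraps a std::function so that move constructions can be counted, and the forward_*/common_* functions stand in for a forwarder such as set_timeout calling set_timer_common_. It is not the scheduler's real code.

```cpp
#include <functional>
#include <utility>

int g_move_ctors = 0;  // instrumentation for this sketch only

// Hypothetical wrapper: counts move constructions of the wrapped callback.
struct CountingFn {
  std::function<void()> fn;
  explicit CountingFn(std::function<void()> f) : fn(std::move(f)) {}
  CountingFn(CountingFn &&other) noexcept : fn(std::move(other.fn)) { ++g_move_ctors; }
  CountingFn(const CountingFn &) = delete;
};

// By value: each layer move-constructs a new parameter object (and later
// destroys it) before handing it on.
void common_by_value(CountingFn f) { f.fn(); }
void forward_by_value(CountingFn f) { common_by_value(std::move(f)); }

// By rvalue reference: each layer only passes along a reference to the
// caller's object; no new object is constructed.
void common_by_rref(CountingFn &&f) { f.fn(); }
void forward_by_rref(CountingFn &&f) { common_by_rref(std::move(f)); }

int moves_by_value() {
  g_move_ctors = 0;
  CountingFn cb([] {});
  forward_by_value(std::move(cb));  // one move per by-value layer
  return g_move_ctors;
}

int moves_by_rref() {
  g_move_ctors = 0;
  CountingFn cb([] {});
  forward_by_rref(std::move(cb));   // no move construction on the way down
  return g_move_ctors;
}
```

In this sketch the by-value chain performs one move construction per layer, while the rvalue-reference chain performs none until some final consumer actually takes ownership. The byte counts quoted above come from the real forwarders, not from this example.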
On BK7231N (Thumb-1), GCC inlines ~unique_ptr<SchedulerItem>
(~30 bytes: null check + ~std::function + operator delete) at every
destruction site, while ESP32/ESP8266/RTL8720CF outline it into a single
shared helper. This causes significant flash bloat in scheduler functions.

Use a custom deleter (SchedulerItemDeleter) with its operator() defined
in the .cpp file, ensuring the compiler emits exactly one copy of the
destruction code. All destruction sites now generate a simple function
call instead of inlining the full body.

BK7231N savings (bytes):
- call(): 816 -> 670 (-146)
- process_to_add(): 390 -> 308 (-82)
- __adjust_heap: 430 -> 312 (-118)
- pop_raw_locked_(): 192 -> 140 (-52)
- cleanup_(): 130 -> 112 (-18)
- SchedulerItemDeleter: +32 (new, single copy)
- Net: ~384 bytes saved

ESP32/ESP8266/RTL8720CF are unaffected (already outline the destructor).
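A minimal sketch of the out-of-line deleter pattern follows. Item and ItemDeleter are hypothetical stand-ins for SchedulerItem and SchedulerItemDeleter, and g_deleter_calls exists only so the sketch can be checked. In a real build the operator() definition would sit in a separate .cpp file; in a single file like this one the compiler can still see the body and may inline it.

```cpp
#include <functional>
#include <memory>

// --- would live in the header ---
struct Item {  // hypothetical stand-in for SchedulerItem
  std::function<void()> callback;
};

struct ItemDeleter {  // hypothetical stand-in for SchedulerItemDeleter
  // Declared only: callers see a function call, not the destruction body.
  void operator()(Item *item) const noexcept;
};

using ItemPtr = std::unique_ptr<Item, ItemDeleter>;

// --- would live in the .cpp file ---
int g_deleter_calls = 0;  // instrumentation for this sketch only

void ItemDeleter::operator()(Item *item) const noexcept {
  ++g_deleter_calls;
  delete item;  // ~std::function and operator delete are emitted once, here
}

int destroy_three_items() {
  g_deleter_calls = 0;
  {
    ItemPtr a(new Item{[] {}});
    ItemPtr b(new Item{[] {}});
    ItemPtr c(new Item{[] {}});
    b.reset();  // explicit destruction site
  }             // a and c are destroyed at scope exit; b is already null
  return g_deleter_calls;
}
```

Each destruction site (reset, scope exit, container erase) then compiles to a null check plus a call to the single ItemDeleter::operator() body, which is the effect the size table above reflects.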
The BK7231N Thumb-1 compiler inlines this function 3 times into
cancel_item_locked_, bloating it from ~140 B to 666 B. ESP32 and
RTL8720CF compilers already outline it, so this attribute only
affects BK7231N. Saves ~486 B of flash.
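The text does not name the attribute. Assuming it is GCC's noinline attribute, the pattern might look like the sketch below. Slot, mark_matching, and cancel_everywhere are hypothetical names that only loosely mirror a helper being called three times from cancel_item_locked_.

```cpp
#include <cstring>
#include <vector>

struct Slot {  // hypothetical item record
  const char *name;
  bool removed;
};

// Hypothetical helper. With GCC/Clang, noinline keeps a single out-of-line
// copy instead of one inlined copy per call site.
__attribute__((noinline)) static int mark_matching(std::vector<Slot> &slots, const char *name) {
  int marked = 0;
  for (auto &slot : slots) {
    if (!slot.removed && std::strcmp(slot.name, name) == 0) {
      slot.removed = true;
      ++marked;
    }
  }
  return marked;
}

// Hypothetical caller with three call sites to the same helper.
int cancel_everywhere(std::vector<Slot> &pending, std::vector<Slot> &active,
                      std::vector<Slot> &deferred, const char *name) {
  return mark_matching(pending, name) + mark_matching(active, name) +
         mark_matching(deferred, name);
}
```

On toolchains that already outline the helper, the attribute changes nothing, which matches the observation that only BK7231N is affected.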
USE_ESP32 and USE_LIBRETINY are compiler flags (-D) always available
to .c files. USE_LWIP_FAST_SELECT is in the generated defines.h which
may not be force-included for .c files on all build systems. Use
platform flags directly in the .c file; .cpp/.h files continue using
USE_LWIP_FAST_SELECT from the generated defines.
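As a sketch of the guard described above: a .c file can key off the -D platform flags directly rather than the generated define. FAST_SELECT_C_ENABLED and fast_select_compiled_in are hypothetical names used only for illustration.

```cpp
// Hypothetical contents of a .c file that cannot rely on defines.h being
// force-included. USE_ESP32 / USE_LIBRETINY arrive as -D compiler flags,
// so they are visible here regardless of the build system.
#if defined(USE_ESP32) || defined(USE_LIBRETINY)
#define FAST_SELECT_C_ENABLED 1  // hypothetical local macro
#else
#define FAST_SELECT_C_ENABLED 0
#endif

// Returns 1 when the fast-select code path would be compiled in.
int fast_select_compiled_in(void) { return FAST_SELECT_C_ENABLED; }
```

Built on a host without either flag, the function returns 0. Built with -DUSE_ESP32 or -DUSE_LIBRETINY, it returns 1.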
Extend the ESP32 lwip_select() replacement (direct rcvevent reads +
FreeRTOS task notifications) to all LibreTiny platforms (bk72xx,
rtl87xx, ln882h).

All LibreTiny platforms have LwIP >= 2.1.3 with
lwip_socket_dbg_get_socket() and FreeRTOS task notifications. The
thread safety argument is actually stronger on LibreTiny since all
platforms are single-core, eliminating cross-core
memory ordering concerns entirely.

Introduce a USE_LWIP_FAST_SELECT feature define (set from Python
codegen for ESP32 and LibreTiny) replacing per-platform USE_ESP32
guards. The only platform-specific difference is FreeRTOS header
paths (freertos/FreeRTOS.h on ESP-IDF vs FreeRTOS.h on LibreTiny).

Expected impact on LibreTiny (same as ESP32):
- ~17x faster socket polling (direct rcvevent vs lwip_select)
- ~3.5 KB flash savings (dead code elimination of lwip_select)
- ~56 bytes static RAM savings (fd_set members excluded)
- ~200-300 bytes heap savings (UDP wake socket eliminated)
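The idea behind the direct rcvevent read can be illustrated with a host-only mock. This is not LwIP code: MockSock is a hypothetical stand-in for the per-socket state that lwip_socket_dbg_get_socket() would return on the target, and the FreeRTOS task-notification wake-up path is not modeled at all.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical mock of the one per-socket field the text mentions.
struct MockSock {
  std::atomic<std::int16_t> rcvevent{0};  // > 0: receive events are pending
};

// Readiness check: a single load per socket, no fd_set construction.
bool mock_socket_ready(const MockSock &sock) {
  return sock.rcvevent.load(std::memory_order_relaxed) > 0;
}

// Poll a set of sockets by reading their counters directly.
std::size_t count_ready(const MockSock *socks, std::size_t n) {
  std::size_t ready = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (mock_socket_ready(socks[i])) {
      ++ready;
    }
  }
  return ready;
}
```

A loop like this replaces building fd_sets and calling lwip_select() for each poll, which is where the quoted speedup and the dead-code savings would come from. The numbers above are the commit's own measurements and are not reproduced by this sketch.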