Neither yield() nor delay() can be called from ISR context or during
flash operations: yield() calls vPortYield/::yield, which are not
ISR-safe, and delay() calls vTaskDelay/::delay, which block the calling
task. IRAM_ATTR is only needed for code that may run while the flash
cache is disabled, so it buys nothing for these two functions. It was
applied as a blanket attribute when the HAL abstraction was created in
2021 (#2303) but was never actually needed here.
The HOT attribute is retained for compiler optimization hints.
std::list uses per-node heap allocation (one allocation per element)
and has poor cache locality. std::vector is contiguous and, when sized
with reserve(), needs only a single allocation.
Changes:
- Replace std::list<Header> with std::vector<Header> in the perform()
  virtual method signature across all platforms (IDF, Arduino, Host)
- Update all start(), get(), and post() overloads accordingly
- Add ESPDEPRECATED overloads for std::list<Header> callers
- Update online_image to use std::vector<Header>
- Clean up request_headers construction in play() using structured
  bindings and reserve()
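Here is a minimal sketch of the reserve-then-fill pattern. It assumes a two-field `Header` struct and uses a hypothetical `build_request_headers()` helper in place of the loop in play(). The real signatures differ.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified Header record (assumption: the real struct may carry more fields).
struct Header {
  std::string name;
  std::string value;
};

// Hypothetical helper mirroring the request_headers construction in play():
// configured headers are copied into one contiguous, pre-reserved vector.
inline std::vector<Header> build_request_headers(
    const std::vector<std::pair<std::string, std::string>> &configured) {
  std::vector<Header> headers;
  headers.reserve(configured.size());  // single allocation, no regrowth
  for (const auto &[name, value] : configured) {  // C++17 structured bindings
    headers.push_back(Header{name, value});
  }
  return headers;
}
```

Because the element count is known before the loop, the vector never reallocates. A `std::list` would instead allocate one node per header.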
The sizes are known at Python codegen time, so use FixedVector with
init() for a single allocation and no reallocation machinery. This
eliminates _M_realloc_append template instantiations for these types.
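ESPHome's FixedVector is not shown in this message, so the following is a hypothetical minimal version that only illustrates the idea. `init(n)` performs the one allocation up front, and there is no growth path at all, so no reallocation code is instantiated. In this sketch, pushes beyond capacity are silently dropped; the real class may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical minimal FixedVector (not ESPHome's actual implementation).
template<typename T> class FixedVector {
 public:
  // One allocation sized by the caller (e.g. a count known at codegen time).
  void init(size_t capacity) {
    data_ = std::make_unique<T[]>(capacity);  // requires T to be default-constructible
    capacity_ = capacity;
    size_ = 0;
  }
  // No reallocation path: overflow is dropped in this sketch.
  void push_back(const T &value) {
    if (size_ < capacity_)
      data_[size_++] = value;
  }
  size_t size() const { return size_; }
  const T &operator[](size_t i) const { return data_[i]; }
  const T *begin() const { return data_.get(); }
  const T *end() const { return data_.get() + size_; }

 private:
  std::unique_ptr<T[]> data_;
  size_t size_{0};
  size_t capacity_{0};
};
```

Generated setup code would call `init()` once with the exact size and then push each configured element.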
request_headers_ and json_ are populated at config time via
add_request_header()/add_json() and only iterated linearly at
runtime. std::map's red-black tree is unnecessary overhead for
these small collections. Drop the <map> include.
Move header lowercasing from the per-request start() path to config time:
- Python codegen now lowercases collect_headers values before passing to C++
- add_collect_header() stores values as-is (already lowered by Python)
- start() with std::vector is now a direct passthrough to perform()
- Deprecated std::set overload still lowercases for external callers
Rename collect_headers_ to lower_case_collect_headers_ and update all
parameter names throughout the chain to make the lowercase invariant
explicit in the API contract.
This eliminates the temporary vector allocation and the str_lower_case()
calls that previously ran on every HTTP request. It also reduces stack
usage in the perform() call chain, where stack space is critical during
HTTPS TLS handshakes.
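The resulting contract can be sketched as follows. `HttpRequestSketch`, `to_lower_copy()` and the `last_performed()` accessor are hypothetical, and `[[deprecated]]` stands in for ESPDEPRECATED. Only the shape of the overloads follows the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Helper used only by the deprecated overload.
inline std::string to_lower_copy(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Hypothetical, simplified request class; only header handling is modeled.
class HttpRequestSketch {
 public:
  // Config time: the codegen has already lowercased the value, so store it as-is.
  void add_collect_header(std::string lower_case_name) {
    lower_case_collect_headers_.push_back(std::move(lower_case_name));
  }
  const std::vector<std::string> &lower_case_collect_headers() const { return lower_case_collect_headers_; }

  // Per request: direct passthrough, no temporary vector, no lowercasing.
  void start(const std::vector<std::string> &lower_case_collect_headers) {
    perform(lower_case_collect_headers);
  }

  // Deprecated overload for external callers that may pass mixed case.
  [[deprecated("pass an already-lowercased std::vector<std::string>")]]
  void start(const std::set<std::string> &collect_headers) {
    std::vector<std::string> lowered;
    lowered.reserve(collect_headers.size());
    for (const auto &h : collect_headers) lowered.push_back(to_lower_copy(h));
    perform(lowered);
  }

  const std::vector<std::string> &last_performed() const { return last_performed_; }

 private:
  // Stand-in for the platform perform(); it just records what it was given.
  void perform(const std::vector<std::string> &lower_case_collect_headers) {
    last_performed_ = lower_case_collect_headers;
  }

  std::vector<std::string> lower_case_collect_headers_;
  std::vector<std::string> last_performed_;
};
```

A braced list such as `req.start({"etag"})` would be ambiguous between the two overloads. Callers should pass a named `std::vector<std::string>` instead.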
component.h is included before defines.h in application.h,
so the #ifdef guards on declarations were evaluated before the
define was visible. Keep declarations always visible (an unused
declaration costs nothing) and guard only the definitions.
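The include-order pitfall can be reduced to one translation unit, with comments marking the original file boundaries. `USE_FEATURE_HOOK` and `feature_hook()` are hypothetical names. Guarding the declaration would make it vanish, because the define only appears later. Guarding only the definition works regardless of order.

```cpp
#include <cassert>

// ---- component.h (processed before defines.h) ----
// Left unguarded: USE_FEATURE_HOOK is not visible yet at this point,
// so an #ifdef here would silently drop the declaration.
void feature_hook();

// ---- defines.h (generated) ----
#define USE_FEATURE_HOOK  // hypothetical define emitted by codegen

// ---- component.cpp (sees defines.h) ----
inline int g_feature_hook_calls = 0;  // counter so the sketch is observable

#ifdef USE_FEATURE_HOOK
void feature_hook() { ++g_feature_hook_calls; }  // only the definition is guarded
#endif
```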
The setup_priority override mechanism (struct, vector, linear scan,
allocation, and cleanup) is only needed when a user explicitly sets
setup_priority: in their YAML config. In practice this is almost
never used; components define their priorities via C++ virtual
methods instead.
Gate the entire mechanism behind USE_SETUP_PRIORITY_OVERRIDE, which
is only defined when the codegen encounters a setup_priority: config
entry. This eliminates dead code (struct, std::vector with reserve,
new/delete, linear scan in get_actual_setup_priority) from nearly
all builds.
Also removes the unnecessary reserve(10) call, since the override
count is always very small.
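Here is a sketch of the gated mechanism. It assumes a simplified `Component` and a plain `std::vector` for the override table; the real code's storage and cleanup (new/delete) differ. With `USE_SETUP_PRIORITY_OVERRIDE` undefined, the struct, table, setter and scan all disappear, and `get_actual_setup_priority()` reduces to the virtual call.

```cpp
#include <cassert>
#include <vector>

#define USE_SETUP_PRIORITY_OVERRIDE  // assumption: emitted only if YAML sets setup_priority:

// Simplified stand-in for esphome::Component.
class Component {
 public:
  virtual ~Component() = default;
  virtual float get_setup_priority() const { return 0.0f; }
  float get_actual_setup_priority() const;
#ifdef USE_SETUP_PRIORITY_OVERRIDE
  void set_setup_priority(float priority);
#endif
};

#ifdef USE_SETUP_PRIORITY_OVERRIDE
// Only exists in builds that actually use the override.
struct SetupPriorityOverride {
  const Component *component;
  float priority;
};
static std::vector<SetupPriorityOverride> setup_priority_overrides;  // no reserve(): a handful at most

void Component::set_setup_priority(float priority) {
  setup_priority_overrides.push_back({this, priority});
}
#endif

float Component::get_actual_setup_priority() const {
#ifdef USE_SETUP_PRIORITY_OVERRIDE
  for (const auto &entry : setup_priority_overrides) {  // linear scan over a tiny list
    if (entry.component == this)
      return entry.priority;
  }
#endif
  return this->get_setup_priority();  // common case: the C++ virtual default
}
```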
Components are stored in a StaticVector sized by ESPHOME_COMPONENT_COUNT,
whose value fits in a uint16_t. Using size_t for the dump_config index
wastes 2 bytes of storage on 32-bit targets and adds padding. Move it to
the uint16_t group for better struct packing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
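The padding effect can be checked with `static_assert`. This sketch uses `uint32_t` to model `size_t` on a 32-bit target and assumes the usual 4-byte alignment for `uint32_t`. The field names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// uint32_t stands in for size_t on 32-bit targets; field names are hypothetical.
struct LayoutBefore {
  uint16_t index_a;
  uint16_t index_b;
  uint16_t index_c;
  uint32_t dump_config_at;  // 4-byte alignment forces 2 bytes of padding before it
};

struct LayoutAfter {
  uint16_t index_a;
  uint16_t index_b;
  uint16_t index_c;
  uint16_t dump_config_at;  // moved into the uint16_t group
};

static_assert(sizeof(LayoutBefore) == 12, "6 bytes + 2 padding + 4-byte index");
static_assert(sizeof(LayoutAfter) == 8, "four uint16_t fields, no padding");
```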
Replace the global `Application App` with placement-new construction
in aligned .bss storage. This eliminates the global constructor and
destructor chain that were:
- Calling xSemaphoreCreateMutex() at static init time (via the
  Scheduler's Mutex member) before app_main() runs
- Redundantly zero-initializing all members that .bss already zeroes
- Registering __cxa_atexit for the ~Application() destructor chain
  (~Application, ~vector, ~Mutex), which never runs on embedded targets
The storage is a char[] with a GCC asm label matching the mangled name
of esphome::App. Other translation units see a typed extern Application
(identical codegen, no indirection), while the defining TU sees a
trivially-destructible char array, so the compiler never emits
__cxa_atexit or the destructor chain.
Construction happens in pre_setup() via placement new, which is always
the first method called on App in the generated setup() function.
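The core technique can be sketched portably. The sketch keeps raw aligned storage, reaches the object through `std::launder`, and constructs it with placement new on first use. The GCC asm-label trick is not reproduced; a hypothetical `app()` accessor takes its place. `Application` here is a simplified stand-in, and `construct_app()` is hypothetical.

```cpp
#include <cassert>
#include <new>
#include <string>

// Simplified stand-in for esphome::Application; the members are hypothetical.
class Application {
 public:
  void pre_setup(const std::string &name) { name_ = name; }
  const std::string &get_name() const { return name_; }

 private:
  std::string name_;  // non-trivial destructor, like the real class
};

// Raw storage: a trivially-destructible char array, so the compiler emits no
// global constructor and no __cxa_atexit-registered destructor for it.
alignas(Application) char app_storage[sizeof(Application)];

// Typed view of the storage. The commit uses a GCC asm label so other TUs
// see `extern Application App` directly; this accessor is a portable substitute.
inline Application &app() { return *std::launder(reinterpret_cast<Application *>(app_storage)); }

// Hypothetical entry point: must run once, before any other use of app().
// The destructor is intentionally never called, as on firmware that never exits.
inline void construct_app(const std::string &name) {
  new (app_storage) Application();  // placement new into the static buffer
  app().pre_setup(name);
}
```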
Add a 15-second timeout for completing the API handshake (Noise
transport + HelloRequest). Previously, a client could connect and
stall mid-handshake, holding a connection slot for up to 150 seconds
(the keepalive disconnect timeout). With max_connections defaulting
to 8 on ESP32, this allowed all slots to be blocked with minimal
effort.
Normal clients complete the full handshake in milliseconds, so 15
seconds is generous. The check short-circuits for authenticated
connections (single bitfield compare) so there is no overhead for
established sessions.
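Here is a minimal sketch of the check. The names `Connection`, `created_at_ms` and `handshake_timed_out()` are hypothetical. Only the 15-second limit and the authenticated-first short-circuit come from the description above.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t HANDSHAKE_TIMEOUT_MS = 15000;  // 15 s

// Hypothetical connection record; the real class and field names differ.
struct Connection {
  uint32_t created_at_ms;     // millis() when the connection was accepted
  uint8_t authenticated : 1;  // set once Noise transport + HelloRequest complete
};

// Returns true when a not-yet-authenticated client has held its slot too long.
inline bool handshake_timed_out(const Connection &conn, uint32_t now_ms) {
  if (conn.authenticated)
    return false;  // established sessions: one bitfield test, nothing else
  // Unsigned subtraction stays correct across a millis() wraparound.
  return now_ms - conn.created_at_ms > HANDSHAKE_TIMEOUT_MS;
}
```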
Single-byte varints (0-127) are the most common case in protobuf
messages (booleans, small enums, field tags). Skip the loop entirely
for these values by checking the first byte before entering the
multi-byte parsing loop.
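Here is a sketch of the fast path in a 32-bit varint decoder. It assumes a hypothetical `parse_varint32()` that returns the value and the number of bytes consumed, with 0 signalling an error. The real ProtoVarInt interface differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct VarintResult {
  uint32_t value;
  size_t consumed;  // 0 means truncated or malformed input
};

// Hypothetical decoder for up to 5 bytes into a uint32_t.
inline VarintResult parse_varint32(const uint8_t *buf, size_t len) {
  if (len == 0)
    return {0, 0};
  // Fast path: MSB clear means a single-byte value (0-127); no loop at all.
  if ((buf[0] & 0x80) == 0)
    return {buf[0], 1};
  uint32_t result = buf[0] & 0x7F;
  for (size_t i = 1; i < len && i < 5; i++) {
    result |= uint32_t(buf[i] & 0x7F) << (7 * i);
    if ((buf[i] & 0x80) == 0)
      return {result, i + 1};
  }
  return {0, 0};
}
```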
Device IDs are FNV hashes (uint32) that frequently exceed 2^28 and
therefore require 5 varint bytes. This test verifies that the firmware
correctly decodes these values in incoming SwitchCommandRequest messages
and encodes them in state responses.
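Each varint byte carries 7 payload bits, so four bytes cover only 28 bits. Any value of 2^28 or more needs a fifth byte. A small hypothetical encoder makes the boundary concrete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical encoder: writes up to 5 bytes to `out`, returns the byte count.
inline size_t encode_varint32(uint32_t value, uint8_t *out) {
  size_t n = 0;
  while (value >= 0x80) {
    out[n++] = uint8_t(value | 0x80);  // low 7 bits plus continuation flag
    value >>= 7;
  }
  out[n++] = uint8_t(value);  // final byte, continuation flag clear
  return n;
}
```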
Convert COLOR_OFF and COLOR_ON from extern const to inline constexpr.
The Color class already has constexpr constructors so these can be
compile-time constants, allowing the compiler to optimize default
parameter values and eliminate the runtime storage.
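Here is a reduced illustration. It assumes a four-channel `Color` with a constexpr constructor, with every channel at 0 for COLOR_OFF and 255 for COLOR_ON. The real class has more constructors and members, and `white_level()` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for esphome::Color (assumption: four 8-bit channels).
struct Color {
  uint8_t r, g, b, w;
  constexpr Color(uint8_t red, uint8_t green, uint8_t blue, uint8_t white)
      : r(red), g(green), b(blue), w(white) {}
};

// C++17 inline variables: one definition shared by every TU, usable in
// constant expressions, so no extern const object is loaded at runtime.
inline constexpr Color COLOR_OFF(0, 0, 0, 0);
inline constexpr Color COLOR_ON(255, 255, 255, 255);

// Hypothetical function whose default argument folds to an immediate.
constexpr uint8_t white_level(Color c = COLOR_ON) { return c.w; }

static_assert(white_level() == 255, "evaluated at compile time");
static_assert(white_level(COLOR_OFF) == 0, "evaluated at compile time");
```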
Split the rarely taken warning path into a separate noinline cold
function so that the hot path (called for every component on every loop
iteration) is minimal. Also make WARN_IF_BLOCKING_OVER_MS constexpr so the
compiler uses an immediate compare instead of a memory load, and
merge the two ESP_LOGW calls into one.
finish() shrinks from 108 to 30 bytes. Total flash savings: 116 bytes.
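Here is a sketch of the hot/cold split using the GCC/Clang-specific `[[gnu::noinline, gnu::cold]]` and `__builtin_expect`. The 50 ms threshold, the log text, the free-function form of `finish()` and the `g_blocking_warnings` counter are assumptions made for illustration.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

constexpr uint32_t WARN_IF_BLOCKING_OVER_MS = 50;  // assumed value; constexpr allows an immediate compare

inline unsigned g_blocking_warnings = 0;  // counter so the sketch is testable

// Rarely taken: kept out of line so it does not bloat the hot caller.
[[gnu::noinline, gnu::cold]] void warn_blocking(const char *name, uint32_t blocking_ms) {
  ++g_blocking_warnings;
  // One combined log line instead of two separate calls.
  std::printf("%s took a long time for an operation (%" PRIu32 " ms)\n", name, blocking_ms);
}

// Hot path: one subtraction and one compare against an immediate.
inline void finish(const char *name, uint32_t started_ms, uint32_t now_ms) {
  const uint32_t blocking_ms = now_ms - started_ms;
  if (__builtin_expect(blocking_ms > WARN_IF_BLOCKING_OVER_MS, 0))
    warn_blocking(name, blocking_ms);
}
```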
Convert setup_priority floats, component state uint8_t constants, and
status LED constants from extern const (defined in component.cpp) to
inline constexpr in the header. This lets the compiler use immediate
values instead of memory loads across all translation units.
Also removes the dead HARDWARE_LATE declaration (declared extern but
never defined).
Saves ~364 bytes of flash on ESP32-S3.
On 32-bit platforms (ESP32 Xtensa), 64-bit shifts in varint parsing
compile to __ashldi3 library calls. Since the vast majority of protobuf
varint fields (message types, sizes, enum values, sensor readings) fit
in 4 bytes, the 64-bit arithmetic is unnecessary overhead on the common
path.
Split parse() into two phases:
- Bytes 0-3: uint32_t loop with native 32-bit shifts (0, 7, 14, 21)
- Bytes 4-9: noinline parse_wide_() with uint64_t, only for BLE
  addresses and other 64-bit fields
The code generator auto-detects which proto messages use int64/uint64/
sint64 fields and emits USE_API_VARINT64 conditionally. On non-BLE
configs, parse_wide_() and the 64-bit accessors (as_uint64, as_int64,
as_sint64) are compiled out entirely.
Saves ~40 bytes of flash on non-BLE configs. A benchmark shows 25-50%
faster parsing for 1-4 byte varints (the common case).
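Here is a sketch of the two-phase split. It assumes a hypothetical `parse_varint()` that returns a value/consumed pair. The real parse() and parse_wide_() signatures are not modeled, and neither is the `USE_API_VARINT64` conditional compilation. The first loop only ever shifts a `uint32_t` by 0, 7, 14 or 21, so no 64-bit shift helper is needed on that path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Varint {
  uint64_t value;
  size_t consumed;  // 0 means truncated or malformed input
};

// Cold, out-of-line continuation for bytes 4-9 using 64-bit arithmetic.
[[gnu::noinline]] static Varint parse_wide(const uint8_t *buf, size_t len, uint32_t low) {
  uint64_t result = low;
  for (size_t i = 4; i < len && i < 10; i++) {
    result |= uint64_t(buf[i] & 0x7F) << (7 * i);
    if ((buf[i] & 0x80) == 0)
      return {result, i + 1};
  }
  return {0, 0};
}

// Hot path: bytes 0-3 with native 32-bit shifts only.
inline Varint parse_varint(const uint8_t *buf, size_t len) {
  uint32_t result = 0;
  for (size_t i = 0; i < len && i < 4; i++) {
    result |= uint32_t(buf[i] & 0x7F) << (7 * i);  // shifts 0, 7, 14, 21
    if ((buf[i] & 0x80) == 0)
      return {result, i + 1};
  }
  if (len <= 4)
    return {0, 0};  // ran out of input with the continuation bit still set
  return parse_wide(buf, len, result);
}
```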