Components are indexed by ESPHOME_COMPONENT_COUNT which is a uint16_t-sized
StaticVector. Using size_t for the dump_config index wastes 2 bytes of storage
and adds padding. Move it to the uint16_t group for better struct packing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the global `Application App` with placement-new construction
in aligned .bss storage. This eliminates the global constructor and
destructor chain that were:
- Calling xSemaphoreCreateMutex() at static init time (via Scheduler's
Mutex member) before app_main() runs
- Redundantly zero-initializing all members that .bss already zeroes
- Registering __cxa_atexit for ~Application() destructor chain
(~Application, ~vector, ~Mutex) that never runs on embedded
The storage is a char[] with a GCC asm label matching the mangled name
of esphome::App. Other translation units see a typed extern Application
(identical codegen, no indirection), while the defining TU sees a
trivially-destructible char array — so the compiler never emits
__cxa_atexit or the destructor chain.
Construction happens in pre_setup() via placement new, which is always
the first method called on App in the generated setup() function.
Add a 15-second timeout for completing the API handshake (Noise
transport + HelloRequest). Previously, a client could connect and
stall mid-handshake, holding a connection slot for up to 150 seconds
(the keepalive disconnect timeout). With max_connections defaulting
to 8 on ESP32, this allowed all slots to be blocked with minimal
effort.
Normal clients complete the full handshake in milliseconds, so 15
seconds is generous. The check short-circuits for authenticated
connections (single bitfield compare) so there is no overhead for
established sessions.
Add a 15-second timeout for completing the API handshake (Noise
transport + HelloRequest). Previously, a client could connect and
stall mid-handshake, holding a connection slot for up to 150 seconds
(the keepalive disconnect timeout). With max_connections defaulting
to 8 on ESP32, this allowed all slots to be blocked with minimal
effort.
Normal clients complete the full handshake in milliseconds, so 15
seconds is generous. The check short-circuits for authenticated
connections (single bitfield compare) so there is no overhead for
established sessions.
Single-byte varints (0-127) are the most common case in protobuf
messages (booleans, small enums, field tags). Skip the loop entirely
for these values by checking the first byte before entering the
multi-byte parsing loop.
Device IDs are FNV hashes (uint32) that frequently exceed 2^28,
requiring 5 varint bytes. This test verifies the firmware correctly
decodes these values in incoming SwitchCommandRequest messages and
encodes them in state responses.
Convert COLOR_OFF and COLOR_ON from extern const to inline constexpr.
The Color class already has constexpr constructors so these can be
compile-time constants, allowing the compiler to optimize default
parameter values and eliminate the runtime storage.
Split the rarely-taken warning path into a separate noinline cold
function so the hot path (called every component every loop iteration)
is minimal. Also make WARN_IF_BLOCKING_OVER_MS constexpr so the
compiler uses an immediate compare instead of a memory load, and
merge the two ESP_LOGW calls into one.
finish() shrinks from 108 to 30 bytes. Total flash savings: -116 bytes.
Convert setup_priority floats, component state uint8_t constants, and
status LED constants from extern const (defined in component.cpp) to
inline constexpr in the header. This lets the compiler use immediate
values instead of memory loads across all translation units.
Also removes the dead HARDWARE_LATE declaration (declared extern but
never defined).
Saves ~364 bytes flash on ESP32-S3.
On 32-bit platforms (ESP32 Xtensa), 64-bit shifts in varint parsing
compile to __ashldi3 library calls. Since the vast majority of protobuf
varint fields (message types, sizes, enum values, sensor readings) fit
in 4 bytes, the 64-bit arithmetic is unnecessary overhead on the common
path.
Split parse() into two phases:
- Bytes 0-3: uint32_t loop with native 32-bit shifts (0, 7, 14, 21)
- Bytes 4-9: noinline parse_wide_() with uint64_t, only for BLE
addresses and other 64-bit fields
The code generator auto-detects which proto messages use int64/uint64/
sint64 fields and emits USE_API_VARINT64 conditionally. On non-BLE
configs, parse_wide_() and the 64-bit accessors (as_uint64, as_int64,
as_sint64) are compiled out entirely.
Saves ~40 bytes flash on non-BLE configs. Benchmark shows 25-50%
faster parsing for 1-4 byte varints (the common case).
Reuse a single stack buffer for both name_id and legacy id
construction instead of two separate buffers. ArduinoJson
copies from char* before the buffer is overwritten.
Use .c_str() instead of passing StringRef directly to
ArduinoJson assignments, routing through the already-
instantiated set<const char*> template and eliminating
the 24-byte set<StringRef> wrapper from the binary.
std::list uses per-node heap allocation (one allocation per element)
and has poor cache locality. std::vector is contiguous and uses a
single allocation.
Changes:
- Replace std::list<Header> with std::vector<Header> in perform()
virtual method signature across all platforms (IDF, Arduino, Host)
- Update all start(), get(), post() overloads accordingly
- Add ESPDEPRECATED overloads for std::list<Header> callers
- Update online_image to use std::vector<Header>
- Clean up request_headers construction in play() using structured
bindings and reserve()