ESP32-S3 AI Voice Display Box Setup Guide

This guide covers bring-up of the ESP32-S3 AI voice display box. The device combines voice input, speaker output, an on-screen status display, WiFi, and backend services, so the safest approach is to validate each path separately before combining them.

Bring-up order

1. Power the board and confirm serial boot.
2. Test display output with a fixed status screen.
3. Test microphone levels in a quiet room and while speaking.
4. Test speaker output with a short local sample.
5. Connect WiFi and run one backend voice request.
6. Add display states: idle, listening, thinking, speaking, and error.
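
The display states in the last step can be modeled as a small state machine, which keeps the screen and the audio pipeline from disagreeing about what the device is doing. The following is a minimal sketch, not taken from any vendor firmware: the names UiState, UiEvent, nextState, and stateLabel, and the choice of events, are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical state and event names, chosen for illustration only.
enum class UiState : uint8_t { Idle, Listening, Thinking, Speaking, Error };

enum class UiEvent : uint8_t {
    WakeDetected,  // wake word or button press
    SpeechEnded,   // end of the user's utterance
    ReplyReady,    // backend returned audio to play
    PlaybackDone,  // speaker finished playing the reply
    Fault,         // WiFi, backend, or audio failure
    Reset          // user action or timeout clears the error
};

// Returns the next state. Events that do not apply in the current
// state are ignored, so a stray event cannot skip a stage.
UiState nextState(UiState s, UiEvent e) {
    if (e == UiEvent::Fault) return UiState::Error;
    switch (s) {
        case UiState::Idle:      return e == UiEvent::WakeDetected ? UiState::Listening : s;
        case UiState::Listening: return e == UiEvent::SpeechEnded  ? UiState::Thinking  : s;
        case UiState::Thinking:  return e == UiEvent::ReplyReady   ? UiState::Speaking  : s;
        case UiState::Speaking:  return e == UiEvent::PlaybackDone ? UiState::Idle      : s;
        case UiState::Error:     return e == UiEvent::Reset        ? UiState::Idle      : s;
    }
    return s;
}

// Short label for the status screen: state, not instructions.
const char* stateLabel(UiState s) {
    switch (s) {
        case UiState::Idle:      return "Idle";
        case UiState::Listening: return "Listening";
        case UiState::Thinking:  return "Thinking";
        case UiState::Speaking:  return "Speaking";
        case UiState::Error:     return "Error";
    }
    return "?";
}
```

Redrawing the screen only when nextState returns a value different from the current state is one way to keep display work low while audio is streaming.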

Audio and display behavior

Keep the microphone muted during TTS playback and for a short time afterward, so the device does not pick up its own speaker output and treat it as user speech. Use the display to show state, not long instructions. If a user cannot tell whether the device is listening or speaking, the demo will feel unreliable even if the backend works.
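
The post-playback mute can be sketched as a small gate that the audio task consults before forwarding microphone samples. This is an illustration under stated assumptions: the class name PlaybackMuteGate and its methods are hypothetical, timestamps are assumed to come from a millisecond counter such as Arduino's millis(), and the hold-off length is a value to tune on real hardware, not a recommendation.

```cpp
#include <cstdint>

// Hypothetical helper: blocks microphone capture while the speaker is
// playing and for holdoffMs milliseconds after playback ends.
class PlaybackMuteGate {
public:
    explicit PlaybackMuteGate(uint32_t holdoffMs) : holdoffMs_(holdoffMs) {}

    void onPlaybackStart() { playing_ = true; }

    void onPlaybackEnd(uint32_t nowMs) {
        playing_ = false;
        everPlayed_ = true;
        endedAtMs_ = nowMs;
    }

    // True when microphone samples may be forwarded to the backend.
    bool micAllowed(uint32_t nowMs) const {
        if (playing_) return false;
        if (!everPlayed_) return true;
        // Unsigned subtraction stays correct across a single wraparound
        // of the 32-bit millisecond counter.
        return static_cast<uint32_t>(nowMs - endedAtMs_) >= holdoffMs_;
    }

private:
    uint32_t holdoffMs_;
    uint32_t endedAtMs_ = 0;
    bool playing_ = false;
    bool everPlayed_ = false;
};
```

In firmware, the capture loop would call micAllowed(millis()) and drop buffers while it returns false. Dropping buffers rather than stopping the I2S driver is a design choice assumed here to avoid restart latency.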

Troubleshooting

No microphone input: check I2S pin map, gain, and sample format.
Speaker is noisy: reduce volume, check grounding, and keep audio wiring away from high-current paths.
Display freezes: reduce refresh work during audio streaming.
Voice response is slow: test backend latency from a browser before changing firmware.
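
For the microphone checks above, it helps to print a numeric level instead of judging by ear. The sketch below computes RMS level in dBFS, peak, and two flags from a buffer of samples. It assumes the samples have already been converted to signed 16-bit PCM; hardware that delivers another width needs a conversion step first. The names LevelReport and measureLevel and the -120 dB floor are illustrative choices.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical report structure for one buffer of microphone samples.
struct LevelReport {
    double rmsDbfs;  // RMS level in dB relative to full scale, floored at -120
    int32_t peak;    // largest absolute sample value seen
    bool allZero;    // every sample was exactly 0
    bool clipped;    // at least one sample reached full scale
};

// Measures a buffer of signed 16-bit PCM samples.
LevelReport measureLevel(const int16_t* samples, size_t count) {
    LevelReport r{-120.0, 0, true, false};
    if (count == 0) return r;

    double sumSq = 0.0;
    for (size_t i = 0; i < count; ++i) {
        int32_t v = samples[i];
        if (v != 0) r.allZero = false;
        int32_t mag = v < 0 ? -v : v;
        if (mag > r.peak) r.peak = mag;
        sumSq += static_cast<double>(v) * static_cast<double>(v);
    }
    r.clipped = r.peak >= 32767;

    double rms = std::sqrt(sumSq / static_cast<double>(count));
    if (rms > 0.0) {
        r.rmsDbfs = 20.0 * std::log10(rms / 32768.0);
        if (r.rmsDbfs < -120.0) r.rmsDbfs = -120.0;
    }
    return r;
}
```

Logging this once per buffer over serial makes the quiet-room and speaking cases easy to compare. A buffer of exact zeros usually points at wiring or pin mapping rather than gain, while a level that does not change between silence and speech is more likely a sample format problem.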

Related products and references

ESP32-S3 AI Voice Display Box / XiaoZhi-compatible voice assistant guide

External references: the Espressif ESP32-S3 getting-started guide and the Arduino documentation.
