ESP32-S3 AI Voice Display Box Setup Guide
This guide covers the ESP32-S3 AI voice display box. The box combines voice input, speaker output, a status display, WiFi, and backend services, so the safest setup is to validate each path separately before connecting them.
Bring-up order
- Power the board and confirm serial boot.
- Test display output with a fixed status screen.
- Test microphone levels in a quiet room and while speaking.
- Test speaker output with a short local sample.
- Connect WiFi and run one backend voice request.
- Add display states: idle, listening, thinking, speaking, and error.
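For the microphone step in the list above, it helps to turn "levels" into numbers that can be printed over serial. The following is a minimal sketch, not the box's actual firmware: it assumes the microphone data has already been read into a buffer of signed 16-bit PCM samples, and the names AudioLevel, measureLevel, and rmsToDbfs are hypothetical. The -96 dB floor is also an assumed value.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Peak and RMS level of one block of signed 16-bit PCM samples.
// Hypothetical helper type, not part of any ESP32 library.
struct AudioLevel {
    int peak;    // largest absolute sample value in the block (0..32768)
    double rms;  // root mean square of the block
};

AudioLevel measureLevel(const int16_t* samples, std::size_t count) {
    AudioLevel level{0, 0.0};
    if (samples == nullptr || count == 0) {
        return level;  // empty block: report silence
    }
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const int v = samples[i];           // widen so -32768 can be negated
        const int mag = (v < 0) ? -v : v;
        if (mag > level.peak) {
            level.peak = mag;
        }
        sumSquares += static_cast<double>(v) * static_cast<double>(v);
    }
    level.rms = std::sqrt(sumSquares / static_cast<double>(count));
    return level;
}

// Convert an RMS value to dB relative to 16-bit full scale (32768).
// Very quiet input is clamped to an assumed floor instead of -infinity.
double rmsToDbfs(double rms) {
    const double kFloorDb = -96.0;  // assumption: display floor for silence
    if (rms <= 0.0) {
        return kFloorDb;
    }
    const double db = 20.0 * std::log10(rms / 32768.0);
    return (db < kFloorDb) ? kFloorDb : db;
}
```

Call measureLevel on each captured block and print the result once in a quiet room and once while speaking. A peak that stays at 0 suggests no data is arriving, and a peak pinned near 32768 suggests clipping or a sample format mismatch.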
Audio and display behavior
Keep recording muted briefly after TTS playback so the device does not hear itself. Use the display to show state, not long instructions. If a user cannot tell whether the device is listening or speaking, the demo will feel unreliable even if the backend works.
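One way to combine the display states with the post-playback mute is a small state holder that also answers "may the microphone record right now?". This is a sketch under assumptions: BoxState, stateLabel, and VoiceBoxState are hypothetical names, the mute duration is whatever value you pass in (the guide only says "briefly"), and timestamps are assumed to come from a millisecond counter such as Arduino's millis().

```cpp
#include <cstdint>

// The five display states named in the bring-up list.
enum class BoxState { Idle, Listening, Thinking, Speaking, Error };

// Short label suitable for a status screen.
const char* stateLabel(BoxState s) {
    switch (s) {
        case BoxState::Idle:      return "idle";
        case BoxState::Listening: return "listening";
        case BoxState::Thinking:  return "thinking";
        case BoxState::Speaking:  return "speaking";
        case BoxState::Error:     return "error";
    }
    return "unknown";
}

// Tracks the current state and keeps the microphone muted during
// playback and for a short hold-off afterwards.
class VoiceBoxState {
public:
    explicit VoiceBoxState(uint32_t muteAfterTtsMs)
        : muteAfterTtsMs_(muteAfterTtsMs) {}

    // Change state; nowMs is the current millisecond timestamp.
    void set(BoxState next, uint32_t nowMs) {
        if (state_ == BoxState::Speaking && next != BoxState::Speaking) {
            playbackEndMs_ = nowMs;  // playback just ended: start hold-off
            holdoffArmed_ = true;
        }
        state_ = next;
    }

    BoxState get() const { return state_; }

    // False while speaking and until the hold-off has elapsed.
    bool micAllowed(uint32_t nowMs) const {
        if (state_ == BoxState::Speaking) {
            return false;
        }
        // Unsigned subtraction stays correct across counter wraparound.
        const uint32_t elapsed = static_cast<uint32_t>(nowMs - playbackEndMs_);
        if (holdoffArmed_ && elapsed < muteAfterTtsMs_) {
            return false;
        }
        return true;
    }

private:
    BoxState state_ = BoxState::Idle;
    uint32_t muteAfterTtsMs_;
    uint32_t playbackEndMs_ = 0;
    bool holdoffArmed_ = false;
};
```

For example, with VoiceBoxState box(300), calling box.set(BoxState::Speaking, t) when playback starts and box.set(BoxState::Idle, t) when it ends makes micAllowed return false for the next 300 ms. The 300 ms figure is only an illustration; tune it by listening for the device triggering on its own voice.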
Troubleshooting
- No microphone input: check I2S pin map, gain, and sample format.
- Speaker is noisy: reduce volume, check grounding, and keep audio wiring away from high-current paths.
- Display freezes: reduce refresh work during audio streaming.
- Voice response is slow: test backend latency from a browser before changing firmware.
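For the display freeze case above, one way to reduce refresh work is to redraw only when the state changes, and never more often than a minimum interval. The sketch below illustrates that idea and is not taken from the box's firmware: RedrawThrottle is a hypothetical class, the state is passed as a plain integer id, and timestamps are assumed to be milliseconds from a counter such as Arduino's millis().

```cpp
#include <cstdint>

// Decides whether the status screen should be redrawn on this loop pass.
class RedrawThrottle {
public:
    explicit RedrawThrottle(uint32_t minIntervalMs)
        : minIntervalMs_(minIntervalMs) {}

    // Returns true when the caller should redraw now.
    // stateId is any integer that identifies the screen to show.
    bool shouldRedraw(int stateId, uint32_t nowMs) {
        const bool changed = !drawnOnce_ || stateId != lastStateId_;
        if (!changed) {
            return false;  // same screen as last time: skip the work
        }
        const uint32_t elapsed = static_cast<uint32_t>(nowMs - lastDrawMs_);
        if (drawnOnce_ && elapsed < minIntervalMs_) {
            // Too soon after the last draw. The pending change is not
            // recorded, so a later call will still pick it up.
            return false;
        }
        lastStateId_ = stateId;
        lastDrawMs_ = nowMs;
        drawnOnce_ = true;
        return true;
    }

private:
    uint32_t minIntervalMs_;
    uint32_t lastDrawMs_ = 0;
    int lastStateId_ = 0;
    bool drawnOnce_ = false;
};
```

Call shouldRedraw once per main loop pass and draw only when it returns true. If audio still stutters, raising the minimum interval trades display responsiveness for more time in the audio path.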
Related products and references
ESP32-S3 AI Voice Display Box / XiaoZhi-compatible voice assistant guide
External references: the Espressif ESP32-S3 getting started guide and the Arduino documentation.