ESP32-S3 AI Voice Display Box Setup Guide

This guide covers the ESP32-S3 AI voice display box. The box combines voice input, speaker output, an on-screen status display, WiFi, and backend services, so the safest approach is to validate each path on its own before combining them.

Bring-up order

  1. Power the board and confirm serial boot.
  2. Test display output with a fixed status screen.
  3. Test microphone levels in a quiet room and while speaking.
  4. Test speaker output with a short local sample.
  5. Connect WiFi and run one backend voice request.
  6. Add display states: idle, listening, thinking, speaking, and error.
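For step 3, it helps to put a number on "quiet room" versus "while speaking". The sketch below is one possible way to do that, assuming the microphone driver hands you 16-bit signed PCM samples. It is plain C++ with no hardware calls, so it can be checked on a desktop before it goes on the board. The function names and the 15 dB margin are illustrative assumptions, not values from the board's firmware.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// RMS level of 16-bit PCM samples in dB relative to full scale (dBFS).
// Returns -120.0 for an empty or effectively silent buffer instead of -infinity.
double rmsDbfs(const std::int16_t* samples, std::size_t count) {
    if (count == 0) return -120.0;
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        double s = samples[i] / 32768.0;  // normalize to roughly [-1, 1)
        sumSquares += s * s;
    }
    double rms = std::sqrt(sumSquares / static_cast<double>(count));
    if (rms <= 1e-6) return -120.0;       // floor: 20*log10(1e-6) = -120
    return 20.0 * std::log10(rms);
}

// Hypothetical pass/fail check: speech should sit clearly above the room noise.
// The default 15 dB margin is an assumed starting point; tune it for your mic.
bool speechClearlyAboveNoise(double quietDb, double speechDb,
                             double marginDb = 15.0) {
    return (speechDb - quietDb) >= marginDb;
}
```

Record one buffer in a quiet room and one while speaking at normal distance, print both levels over serial, and compare them. If the two numbers are close together, check microphone wiring and gain before moving on to the speaker test.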

Audio and display behavior

Keep recording muted briefly after TTS playback so the device does not hear itself. Use the display to show state, not long instructions. If a user cannot tell whether the device is listening or speaking, the demo will feel unreliable even if the backend works.
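As a rough illustration of both points, here is a minimal sketch of the five display states and a microphone gate that stays closed during playback and for a short hold-off afterwards. All names (BoxState, stateLabel, MicGate) and the hold-off length are assumptions made for this example. Timestamps are passed in as milliseconds so the logic can be tested off the board; on Arduino-style firmware the value would typically come from millis().

```cpp
#include <cstdint>

// The five states the display should be able to show.
enum class BoxState { Idle, Listening, Thinking, Speaking, Error };

// One short word per state, so the user can read it at a glance.
const char* stateLabel(BoxState s) {
    switch (s) {
        case BoxState::Idle:      return "Idle";
        case BoxState::Listening: return "Listening";
        case BoxState::Thinking:  return "Thinking";
        case BoxState::Speaking:  return "Speaking";
        case BoxState::Error:     return "Error";
    }
    return "Error";
}

// Blocks recording while TTS plays and for holdOffMs after it ends.
class MicGate {
public:
    explicit MicGate(std::uint32_t holdOffMs) : holdOffMs_(holdOffMs) {}

    void onPlaybackStart() { playing_ = true; }

    void onPlaybackEnd(std::uint32_t nowMs) {
        playing_ = false;
        endedAtMs_ = nowMs;
        inHoldOff_ = true;
    }

    // True when it is safe to record. Unsigned subtraction keeps the
    // comparison valid if the millisecond counter wraps around.
    bool micAllowed(std::uint32_t nowMs) {
        if (playing_) return false;
        if (!inHoldOff_) return true;
        if (static_cast<std::uint32_t>(nowMs - endedAtMs_) >= holdOffMs_) {
            inHoldOff_ = false;  // hold-off finished, stop tracking it
            return true;
        }
        return false;
    }

private:
    std::uint32_t holdOffMs_;
    std::uint32_t endedAtMs_ = 0;
    bool playing_ = false;
    bool inHoldOff_ = false;
};
```

A hold-off of a few hundred milliseconds is a reasonable first guess, but the right value depends on the speaker, the enclosure, and the room, so treat it as something to tune. Only switch the display to "Listening" once micAllowed() returns true, so what the screen says matches what the device is actually doing.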

Troubleshooting

ESP32-S3 AI Voice Display Box / XiaoZhi-compatible voice assistant guide

External references: Espressif ESP32-S3 getting started and Arduino documentation.
