According to the micro_speech example (https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/examples/micro_speech), the audio preprocessor model feeds the classifier model 49 spectrographic features, each consisting of 40 channels of data. The preprocessing output can therefore be treated as a single-channel image that is 40 pixels wide and 49 rows high. In the training notebook (https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb), the PREPROCESS flag can be set to select a different preprocessor (MFCC or Average).
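
To make the shape arithmetic concrete, here is a minimal Python sketch. It only uses the 49 and 40 figures quoted above; the helper name `to_image`, the placeholder data, and the assumption that the features arrive as one flat, row-major vector are mine for illustration and are not taken from the micro_speech code.

```python
# Shape values come from the text above; everything else is illustrative.
NUM_FEATURES = 49   # rows: one spectrographic feature per time slice
NUM_CHANNELS = 40   # columns: channels of data per feature


def to_image(flat):
    """Split a flat feature vector into NUM_FEATURES rows of NUM_CHANNELS values.

    Hypothetical helper: assumes the flat vector is stored row by row.
    """
    expected = NUM_FEATURES * NUM_CHANNELS
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    return [
        flat[row * NUM_CHANNELS:(row + 1) * NUM_CHANNELS]
        for row in range(NUM_FEATURES)
    ]


# Placeholder data standing in for real preprocessor output.
flat = list(range(NUM_FEATURES * NUM_CHANNELS))
image = to_image(flat)
print(len(image), len(image[0]))  # rows, then pixels per row
```

Under these assumptions the classifier input holds 49 x 40 = 1960 values in total, and each row of `image` corresponds to one feature.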