Research has shown that participants associate high-pitched tones with small objects and low-pitched tones with large objects. Yet it remains unclear when these associations emerge in neural signals, and whether they result from predictive coding mechanisms influenced by multisensory priors. Here we investigated these questions using a modified version of the implicit association task, 128-channel human EEG, and two approaches to single-trial analysis (linear discriminant analysis and mutual information). During two interleaved discrimination tasks (auditory high/low tone and visual small/large circle), one stimulus was presented per trial and the auditory stimulus-response assignment was manipulated. On congruent trials, preferred pairings (high tone, small circle) were assigned to the same response key; on incongruent trials, non-preferred pairings (low tone, small circle) were. Participants (male and female) responded faster on auditory congruent than incongruent trials. The EEG results showed that acoustic pitch and visual size were represented early in the trial (~100 ms and ~220 ms, respectively), over temporal and frontal regions. Neural signals were also modulated by congruency early in the trial, for both the auditory (<100 ms) and visual (~200 ms) modalities. For auditory trials, EEG components were predictive of reaction times; for visual trials they were not. These EEG results were consistent across analysis methods, demonstrating that they are robust to the statistical methodology used. Overall, our data support an early, sensory origin of cross-modal associations, potentially driven by predictive coding mechanisms.