Neuroscience research suggests that humans possess multimodal neurons that selectively activate in response to different pictures or words representing a specific individual, landmark, or object (e.g., a single multimodal neuron will fire in response to a photograph, a sketch, or even the written name of Halle Berry). More recently, OpenAI researchers observed that artificial neural networks also contain multimodal neurons that respond to the same subject across photographs, drawings, and typography. Specifically, they found these multimodal neurons in CLIP, a general-purpose vision system that outperforms ResNet-50 on more challenging datasets (those containing sketches, cartoons, and statues of objects). The authors warn that the behavior of multimodal neurons may make models like CLIP more vulnerable to adversarial attacks (e.g., "typographic attacks," in which text pasted onto an object changes how the model classifies it), bias, and overgeneralization.
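To make the typographic-attack concern concrete, here is a minimal sketch of zero-shot classification with CLIP, assuming the open-source `clip` package from https://github.com/openai/CLIP; the image path `apple.jpg` and the label set are illustrative placeholders, not from the paper. Because CLIP's multimodal neurons respond to rendered text as well as to objects themselves, pasting a written label onto the pictured object can shift these probabilities toward the wrong class.

```python
# Minimal zero-shot classification sketch with CLIP.
# Assumes: pip install git+https://github.com/openai/CLIP.git
# "apple.jpg" is a hypothetical local image file used for illustration.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Prepare the image and a small set of candidate labels as text prompts.
image = preprocess(Image.open("apple.jpg")).unsqueeze(0).to(device)
labels = ["an apple", "an iPod", "a library"]
text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)

with torch.no_grad():
    # CLIP scores the image against each text prompt; softmax over the
    # per-image logits gives a probability for each candidate label.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

for label, p in zip(labels, probs[0]):
    print(f"{label}: {p:.3f}")
```

Rerunning the same script on a photo of the apple with a paper label reading "iPod" attached is essentially the experiment the authors use to demonstrate the attack: the rendered text alone can flip the predicted class.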