
Introduction to Multimodal Generative AI
Innovation Classroom (Curated by EdCity)
Overview
This workshop introduces participants to the emerging field of “Multimodal Generative AI” through explanation, real-world examples, and a hands-on experiment. Participants will gain insight into how AI systems interpret meaning from diverse inputs, moving beyond a single modality, and will see how AI can both interpret and generate content across different types of data. In the hands-on experiment, participants will use a model for image-to-text generation, exploring how AI can interpret images and automatically produce descriptive captions, bridging visual understanding and language generation in a creative, interactive way.
Participants will understand and experiment with multimodal generative AI using the BLIP (Bootstrapping Language-Image Pre-training) model.
• Understand the concept and applications of multimodal generative AI
• Learn how the BLIP model processes an image input and generates a text description (caption) for it
• Gain hands-on experience generating image captions with BLIP
• Engage in an interactive activity to explore how AI “sees” and describes images
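For facilitators preparing the hands-on activity, a minimal sketch of BLIP image captioning is shown below, using the Hugging Face Transformers library. The model checkpoint name and the sample image URL are assumptions for illustration; any local image can be substituted.

```python
# Minimal sketch: image captioning with BLIP via Hugging Face Transformers.
# Assumes: pip install transformers pillow requests torch
from PIL import Image
import requests
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the pretrained BLIP captioning model and its processor
# (checkpoint name is an assumption; other BLIP checkpoints also work).
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Fetch a sample image (URL is an example; a local file works too).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image, then generate and decode a caption.
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)
```

In the workshop, participants can swap in their own photos and compare the captions the model produces, which makes the "how AI sees" discussion concrete.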