TL;DR: I’ve spent the last decade coaching leaders in Silicon Valley on the invisible mechanics of presence: eye contact, pacing, gesture timing, and the clarity of the takeaway. Last week I ran a small experiment. I uploaded a raw, unpolished 90-second story into Gemini and paired it with a prompt. I expected generic encouragement. I got timestamped feedback that felt… uncomfortably useful.

I’m sitting at my desk staring at the green recording light. Go, Michael! Speak!

Off the top of my head, I share a story about an 8th-grade English class presentation I gave about redwood trees. One take. No fancy setup.

You can watch it here.

Then I drag the file into Gemini, paste in a prompt, and hit enter, fully expecting the usual LLM fluff: “You were engaging! Great job!”

Instead, Gemini starts giving me feedback the way any good communications coach would: visual (gestures, facial expressions, lighting), vocal (pacing, pausing, volume), and verbal (clarity, wording, the strength of the takeaway).

And it’s specific.

It catches this at 0:45: “Excellent use of illustrative gestures. You physically ‘hoisted’ the imaginary version of yourself.” It interpreted my body language?!

At 0:56 it calls out the line: “And there was no applause.” It understood my meta-linguistic move?!

At 1:15 it flags my takeaway and basically says: you could make this conclusion land harder. Then, it offers an upgrade: “Persuasion isn’t about the parade you expect; it’s about the connection you earn.” Damn, I wish I had thought of that in the moment!

I’ve spent ten years being the person who gives this kind of nuance about executive presence. So yeah. I was floored.

How is this even possible?

Some people think presence or charisma is undefinable. I don’t. It’s just a bunch of microskills that happen simultaneously and fast because they’ve been embodied through reps.

As a coach, I’ve trained myself to notice those microskills: when the speaker’s

  • eyes drift

  • pacing collapses

  • gestures lag behind the idea

  • conclusion doesn’t quite click.

I can prioritize that feedback and deliver it in a way that builds confidence. Gemini is doing something similar, just at scale. It’s not “watching” your video like a movie. It’s sampling visual moments and audio cues and pattern-matching them against what typically reads as confident, connected, and clear. Not mystical. Not a soul-read. Just a lot of signal detection across multiple channels at once.

Why this matters

This isn’t just about AI checking your ideas and grammar. It’s about getting fast feedback on whether your body, voice, and words are telling the same story. Whether you are embodied. And that changes expectations.

“I didn’t realize I rambled” or “I didn’t know I sounded defensive” might have worked in the past. But as tools like this become normal, it starts to look less like an innocent blind spot and more like something you could’ve checked.

Managers aren’t going to hand-hold basic delivery forever. The new expectation becomes: did you run the tape before the board meeting?

  • Stop treating your communication like an instinct you either have or don’t. Treat it like an experiment.

  • Next time you have a five-minute update or a high-stakes pitch, record a quick rehearsal on your phone. Don’t even watch it. Just upload it.

  • It’s low-stakes. It’s private. And it’s more honest than what your best friend will tell you.

Try it yourself

  1. Record a 90-second impromptu story or meeting update.

  2. Upload the raw video to a multimodal AI (like Gemini).

  3. Use this evidence-based prompt so that you can double-check its analysis.

Video Self-Assessment Prompt (Copy and Paste)

Role: You are an expert strategic communication coach.

Task: Analyze my uploaded video and give me an evidence-based benchmark assessment that is specific and actionable.

  1. Hot Take (1–2 sentences) What’s the overall impact? Does it land and feel memorable? What’s the speaker’s vibe?

  2. Benchmark Table Create ONE table with these columns:

  • Category (Visual / Vocal / Verbal)

  • What Went Well

  • What Can Improve

  • Evidence (timestamps + short verbatim quotes when helpful)

Use these category definitions:

  • Visual: body language, gestures, eye contact, presence

  • Vocal: rate, tone, volume, pausing

  • Verbal: language choice, clarity, conciseness

For every “What Can Improve” point, add one sentence explaining the rhetorical impact (what it makes the audience think/feel/do).

  3. Single Growth Priority Name the one change that would most improve the video before final submission. Include: (a) what to change, (b) why it matters, (c) how to practice it in 10 minutes.
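If you want to double-check the Evidence column against your recording, a tiny script can help. This is only a sketch, and it assumes you have pasted the feedback as plain text with timestamps written as minutes:seconds (like 0:45). The function name and the sample text are my own placeholders, not anything Gemini produces. It turns each timestamp into seconds so you can jump straight to that moment in your video player.

```python
import re

# Matches timestamps written as minutes:seconds, such as 0:45 or 1:15.
TIMESTAMP = re.compile(r"\b(\d{1,2}):([0-5]\d)\b")


def timestamps_to_seconds(feedback: str) -> list[int]:
    """Return every m:ss timestamp in the feedback as seconds, in order of appearance."""
    return [int(m) * 60 + int(s) for m, s in TIMESTAMP.findall(feedback)]


# Hypothetical pasted feedback, reusing the timestamps mentioned in this post.
sample = "0:45 illustrative gestures. 0:56 the no-applause line. 1:15 takeaway could land harder."
print(timestamps_to_seconds(sample))  # [45, 56, 75]
```

Scrub to each of those points and decide for yourself whether the feedback matches what you see and hear.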

Next steps

Presence used to feel like this elusive thing. Something you picked up if you had access to the right rooms, the right mentors, the right feedback. That gate is opening. Which is good news… and also kind of intense.

If you want help stepping into the age of AI and connecting on a human level through your leadership communication, contact me at [email protected] and visit leadinstride.com to learn more.
