YouTube can see and hear a lot more than we can imagine. YouTube has eyes and ears, so let’s find out how they help channel creators.

When a new video is published, YouTube might seem to know nothing about it. Well, that’s only partly true: YouTube does know something about the new video.

YouTube uses its own eyes and ears to gather information about the new video.

YouTube cross-verifies the metadata supplied by channel creators against the information gathered by its own eyes and ears, making it extremely hard to fool YouTube.


Let’s begin with YouTube’s eyes. Introducing:

Vision AI

With Vision AI, YouTube can detect faces.

Vision AI knows if a face is happy or sad. It can detect and identify objects. It can also read the text in thumbnails, in more than 50 different languages.

YouTube can run a Safe Search and tell if your thumbnail is adult, spoof, medical, violent or racy. To see YouTube’s superpower in action, let’s go to Vision AI’s API and upload a few popular thumbnail images from YouTube.
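
If you would rather call the API than use the demo page, here is a minimal sketch of the request body. The feature type names and the endpoint are the ones used by the Cloud Vision REST API. The helper name `build_annotate_request` and the placeholder image bytes are my own assumptions for illustration, and nothing is actually sent.

```python
import base64
import json

# Feature types as named in the Cloud Vision REST API; each one roughly
# corresponds to a tab of results discussed in this post.
FEATURE_TYPES = [
    "FACE_DETECTION",
    "OBJECT_LOCALIZATION",
    "LABEL_DETECTION",
    "LOGO_DETECTION",
    "WEB_DETECTION",
    "TEXT_DETECTION",
    "IMAGE_PROPERTIES",
    "SAFE_SEARCH_DETECTION",
]

# Endpoint of the images:annotate method (a real call needs an API key or token).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes, max_results=10):
    """Build the JSON body for one thumbnail. Hypothetical helper, sends nothing."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    features = [{"type": t, "maxResults": max_results} for t in FEATURE_TYPES]
    return {"requests": [{"image": {"content": encoded}, "features": features}]}


# Placeholder bytes stand in for a real thumbnail file.
body = build_annotate_request(b"fake-thumbnail-bytes")
print(json.dumps(body, indent=2)[:200])
```

Posting this body to `ANNOTATE_URL` with valid credentials should return one response per image, with one section per requested feature.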

Let’s visit YouTube and grab a thumbnail image.

Let’s type Get Rich Quick, right-click the video and copy the link address. Paste it into Thumbnail Grabber and download the thumbnail.
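
A thumbnail grabber does little more than pull the video ID out of the link you copied. Here is a rough sketch of that step. Note the assumption: the `img.youtube.com/vi/<id>/<quality>.jpg` pattern is a widely used convention, not something this post or an official API reference confirms, and the function names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(url):
    """Extract the video ID from a watch link or a youtu.be short link."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    # Watch links carry the ID in the "v" query parameter.
    return parse_qs(parsed.query).get("v", [None])[0]


def thumbnail_url(video_id, quality="hqdefault"):
    """Build a thumbnail URL. ASSUMPTION: commonly used URL pattern."""
    return f"https://img.youtube.com/vi/{video_id}/{quality}.jpg"


# VIDEO_ID_HERE is a placeholder, not a real video.
copied_link = "https://www.youtube.com/watch?v=VIDEO_ID_HERE"
print(thumbnail_url(video_id_from_url(copied_link)))
```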

We upload the first thumbnail and here are the results.

The results from Vision AI

One face is detected, and there are no emotions attached to this face [no joy, sorrow or anger].
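
In the raw API response, each face carries a likelihood for every emotion rather than a plain yes or no. Here is a small sketch of how a result like "no emotions" could be read from it. The field names follow the Vision API face annotation. The sample faces and the choice to count only LIKELY and VERY_LIKELY are my own assumptions.

```python
# Emotion fields as they appear on a Vision API face annotation.
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}

# Assumption: only these two levels count as a visible emotion.
EXPRESSIVE_LEVELS = {"LIKELY", "VERY_LIKELY"}


def detected_emotions(face):
    """Return the emotions that the face annotation rates as likely."""
    return [
        name
        for name, field in EMOTION_FIELDS.items()
        if face.get(field, "UNKNOWN") in EXPRESSIVE_LEVELS
    ]


# Hypothetical sample data, shaped like the API output.
neutral_face = {field: "VERY_UNLIKELY" for field in EMOTION_FIELDS.values()}
smiling_face = dict(neutral_face, joyLikelihood="VERY_LIKELY")

print(detected_emotions(neutral_face))
print(detected_emotions(smiling_face))
```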

Now we move to the objects tab.

Vision AI has detected the Outerwear, Wheel, Person, Car, and Clothing correctly.

The Labels part is where it gets interesting.

I can see a supercar that’s yellow in color, but I didn’t have a clue that the car is a Lamborghini Aventador.

I’ll click on Lamborghini Aventador and it links to a Google search. That’s the exact car. Vision AI picked it up.

Next is the Logos tab.

Vision AI has identified the logo correctly as PBS. There is a score of 90% next to the logo, so with 90 percent confidence YouTube, or Vision AI, knows that this is the PBS logo.

Now let’s move to the Web tab.

This is where we find Web Entities, Pages with Matched Images, Fully Matched Images and Partially Matched Images. The web entities point to Lamborghini Aventador, Money, Wealth and YouTube. Everything is right about the thumbnail image. If we scroll down, it shows Get Rich Slowly. This video is not about getting rich quick. It is about why getting rich quick will not work and how to get rich slowly and steadily.

This is not obvious from the thumbnail, but YouTube has detected it right. If we scroll down and click on the first link, we find that the person in the thumbnail is Philip Olson, who runs the channel Two Cents. From scanning the thumbnail for just a few seconds, Vision AI has collected a lot of information.

Do you still doubt YouTube’s SuperPowers?

Moving to the Text tab:

YouTube can read the text in your thumbnail. As we see here, it is not 100% accurate. For Get Rich Quick:

  • Get is not read,
  • Rich is misread as Richiere,
  • Quick followed by a question mark is read correctly, and
  • the number 15 is read even though it is not present in the thumbnail.
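
You can make the same comparison in a few lines once you have the detected text. Here is a minimal sketch that compares what the thumbnail says with what was read, word by word. The function name is hypothetical, and the detected string is put together from the results listed above.

```python
def compare_text(expected, detected):
    """Compare the intended thumbnail text with the OCR result, word by word."""
    expected_words = expected.lower().split()
    detected_words = detected.lower().split()
    return {
        # Words on the thumbnail that were read exactly.
        "matched": [w for w in expected_words if w in detected_words],
        # Words on the thumbnail that were not read, or were read wrong.
        "missed": [w for w in expected_words if w not in detected_words],
        # Words that were read but are not on the thumbnail.
        "extra": [w for w in detected_words if w not in expected_words],
    }


report = compare_text("Get Rich Quick?", "Richiere Quick? 15")
print(report)
```

Anything in `missed` or `extra` is a hint that the font or layout is hard to read.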

Moving to the Properties tab:

The Properties tab is where you get the color codes of the thumbnail image, which you can copy and use.
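
The API returns the dominant colors as red, green and blue values, so turning them into copy-and-paste hex codes takes one small step. This is a sketch only. The entry layout mirrors the Vision API image properties output, but the two sample colors are made up for illustration.

```python
def to_hex(color):
    """Convert a Vision-style color dict to a hex code such as #fac80a."""
    # A channel may be missing from the JSON when its value is zero.
    channels = (color.get("red", 0), color.get("green", 0), color.get("blue", 0))
    return "#" + "".join(f"{int(round(c)):02x}" for c in channels)


# Hypothetical sample entries, not taken from a real thumbnail.
sample_colors = [
    {"color": {"red": 250, "green": 200, "blue": 10}, "pixelFraction": 0.30},
    {"color": {"red": 20, "green": 20, "blue": 25}, "pixelFraction": 0.45},
]

# List the colors covering the most pixels first.
ranked = sorted(sample_colors, key=lambda entry: entry["pixelFraction"], reverse=True)
palette = [to_hex(entry["color"]) for entry in ranked]
print(palette)
```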

Let’s move to the Safe Search tab.

Do you spot the five indicator bars, of which one is green?

One green bar means the thumbnail is very unlikely to be Adult, Spoof, Medical, Violent or Racy. One or two green bars are alright. Three green bars mean we are on the borderline and have to tread cautiously, maybe by revising the thumbnail a bit. Four or five green bars mean we are inviting trouble.

In this case, the Get Rich Quick thumbnail featuring Philip Olson is Very Unlikely to be Adult, Spoof, Medical, Violent or Racy. So we are good to go!
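
The bar rule above is easy to write down as code. In this sketch I assume the five bars correspond to the five likelihood levels the Vision API reports, from VERY_UNLIKELY to VERY_LIKELY. The verdict names are my own.

```python
# Assumption: one bar per likelihood level, in this order.
LIKELIHOOD_BARS = {
    "VERY_UNLIKELY": 1,
    "UNLIKELY": 2,
    "POSSIBLE": 3,
    "LIKELY": 4,
    "VERY_LIKELY": 5,
}


def verdict(likelihood):
    """Turn one Safe Search likelihood into the rule of thumb from this post."""
    bars = LIKELIHOOD_BARS.get(likelihood, 0)
    if bars == 0:
        return "unknown"
    if bars <= 2:
        return "ok"
    if bars == 3:
        return "borderline"
    return "risky"


def review(safe_search):
    """Apply the verdict to every Safe Search category."""
    return {category: verdict(level) for category, level in safe_search.items()}


# The Get Rich Quick thumbnail: one bar in every category.
get_rich_quick = {
    "adult": "VERY_UNLIKELY",
    "spoof": "VERY_UNLIKELY",
    "medical": "VERY_UNLIKELY",
    "violence": "VERY_UNLIKELY",
    "racy": "VERY_UNLIKELY",
}
print(review(get_rich_quick))
```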

We will revisit Safe Search with another example.

Let’s upload a new thumbnail.

This time one face is detected, again with no emotions. The objects detected are a Mic and a Person.

In Labels, we find Spokesperson, Public Speaking, Orator and Music Artist. All are plausible because the person is holding a mic.

In Logos, it reads Solar Turbines instead of Startup Stories, so the logo is not read correctly.

However, Vision AI reports it with only 57% confidence, which means it is almost as likely to be wrong as right.

For the PBS logo, the confidence score was 90%.
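
A simple way to act on these scores is to separate confident detections from doubtful ones. Here is a sketch using the two logo results above. The 0.75 cut-off is an arbitrary assumption of mine, not a value Vision AI or YouTube publishes.

```python
def split_by_confidence(annotations, threshold=0.75):
    """Separate detections into trusted and doubtful by their score."""
    trusted = [a["description"] for a in annotations if a["score"] >= threshold]
    doubtful = [a["description"] for a in annotations if a["score"] < threshold]
    return trusted, doubtful


# Logo results from the two thumbnails discussed above.
logos = [
    {"description": "PBS", "score": 0.90},
    {"description": "Solar Turbines", "score": 0.57},
]

trusted, doubtful = split_by_confidence(logos)
print("trusted:", trusted)
print("doubtful:", doubtful)
```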

Let’s check the Web tab, and here is what we are presented with.

Vision AI has detected that this is celebrity entrepreneur Jack Ma, with a score greater than 12. At first I took that for an error, since the logo and object scores range from 0 to 1, with 1 being absolutely certain. Web entity scores are different: they are relevancy scores that are not normalized, so values above 1 are normal.

So in this case, YouTube can be quite sure that the video is about Jack Ma’s motivational speech. That is the information read from the thumbnail.

Moving to the Text tab:

The text that is read is not accurate. The character ‘t’ is misread in Startup and missing in Stories. BELIEVE DREAMS is read as BELIENE DREAMS, and IN YOUR has not been detected at all.

The takeaway here is that we should choose fonts that are clear and easy to read. Even for human eyes, the text in this thumbnail is not easy to read. Naturally, YouTube misread the text and was not able to identify the language.

If we hover over BELIENE, we see that YouTube hasn’t identified it as English.

Now it gets more interesting in Safe Search.

YouTube thinks this thumbnail is highly likely to be Spoof.

In Vision AI’s terms, Spoof means the image appears to have been modified from an original to make it look funny or offensive.

Why would a thumbnail with Jack Ma on it be considered Spoof?

So let’s upload a YouTube thumbnail from another motivational speaker, Tony Robbins. YouTube has identified faces, smiling faces and persons. The text in the thumbnail, How to 10x your business and Tony Robbins, is read 100% accurately.

Again in Safe Search, this thumbnail is considered Spoof. Maybe YouTube classifies get-rich-quick schemes and motivational messages as Spoof.

So we will upload more thumbnail images that are motivational.

Getting through Hard Times is also considered Spoof.

The takeaway here is that if your thumbnail is considered Spoof in Safe Search, word the text differently or redesign the thumbnail until it’s good to go!

We will wind up with a thumbnail from Mr. Beast. This is from his 24 hours underwater challenge video.

One face is detected with no emotion, and the objects detected are Person and Swimwear. The labels say Water, Underwater, Leisure and Swimming Pool, and the Web tab rightly identifies Mr. Beast, YouTube and Underwater Challenge.

In the Text tab, Vision AI read Gold’s Gym and 25, which I had failed to notice. This is epic stuff from Vision AI.

Vision AI has got it 100% right and my eyes clearly missed it.

In Safe Search, the thumbnail falls under Likely to be Racy. It could be due to the exposed skin.

There is one place where Vision AI went wrong.

Vision AI missed the weights completely and identified them as Swimwear.

That’s the only thing Vision AI got wrong. But wait, it identified it with only 53% confidence.

This is where Vision AI’s confidence scores prove their worth: a low score flags a doubtful result.

 I had loads of fun when I first stumbled upon Vision AI. 

I want you to upload a few thumbnails and learn more about YouTube’s eyes.

YouTube’s vision is not 100% accurate.

Now that we know the superpowers of YouTube’s eyes, let’s learn to optimize our videos and rank better.

Before uploading your custom thumbnail to your YouTube channel, hold on. Upload it to Vision AI’s API first and learn a thing or two about how YouTube processes your thumbnail.

Make sure the keywords used in the title, tags and description are consistent with how YouTube interprets your thumbnail.
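
Here is one rough way to check that consistency yourself: count how many of the words Vision AI reports also appear in your metadata. How YouTube actually scores consistency is not public, so treat this as an illustrative sketch. The title, tags and description below are made up, and the vision terms come from the Get Rich Quick example.

```python
import re


def words(text):
    """Lowercase a string and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def thumbnail_consistency(title, tags, description, vision_terms):
    """Return the shared words and the share of vision words found in metadata."""
    metadata_words = words(" ".join([title, description] + tags))
    vision_words = set()
    for term in vision_terms:
        vision_words |= words(term)
    shared = metadata_words & vision_words
    coverage = len(shared) / len(vision_words) if vision_words else 0.0
    return sorted(shared), coverage


# Hypothetical metadata for the Get Rich Quick video.
shared, coverage = thumbnail_consistency(
    title="Get Rich Quick? Why it does not work",
    tags=["money", "wealth"],
    description="How to get rich slowly.",
    vision_terms=["Lamborghini Aventador", "Money", "Wealth", "Get Rich Slowly"],
)
print(shared, round(coverage, 2))
```

A low coverage value suggests the thumbnail and the metadata are telling different stories.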

Also, if your thumbnail is likely to be Adult or Violent, it will be shown to fewer people and you will earn less money from those videos.

Now that we know what YouTube can see, let’s find out what YouTube can hear.

YouTube uses speech recognition technology to automatically caption the audio it hears in the video.

YouTube can auto-caption 50+ languages.

Auto-captioning means YouTube listens to your videos.

If the keywords used in the title, tags and description are consistent with YouTube’s auto captions, then YouTube will trust your video.
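
To check this on your own video, copy the auto-generated captions and count how often each keyword was actually picked up. This is a minimal sketch with a made-up transcript in which "index funds" was misheard. The function names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase the text and strip punctuation, keeping single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_mentions(keywords, transcript):
    """Count whole-phrase mentions of each keyword in the caption text."""
    spoken = normalize(transcript)
    return {
        kw: len(re.findall(rf"\b{re.escape(normalize(kw))}\b", spoken))
        for kw in keywords
    }


# Hypothetical auto captions with one misheard phrase.
captions = (
    "Everyone wants to get rich quick. But to get rich slowly "
    "you need patience and index fronts."
)
mentions = keyword_mentions(["get rich", "passive income", "index funds"], captions)
missing = [kw for kw, count in mentions.items() if count == 0]
print(mentions)
print("not picked up:", missing)
```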

However, mispronunciations, accents and background noise might lead to bad and embarrassing captions.

Even YouTube admits that it is not 100% accurate, but it’s one of the best speech recognition technologies in the world to date.

Here is how we can use the information gathered so far to optimize our videos on YouTube.

Take advantage of audio SEO, which plays a key role in checking keyword consistency.

So pronounce your keywords loud and clear, and make sure Auto CC picks them up well.

As a YouTuber, it’s your responsibility to write closed captions [Custom CC] and not to rely on Auto CC from YouTube.

Write closed captions to reach wider audiences who don’t speak your language, which translates to more views and more money.
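
If you write your captions by hand, SubRip (.srt) is one of the caption file formats YouTube accepts for upload. Here is a small sketch that turns a list of timed lines into SRT text. The cue timings and wording are made up for illustration.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical cues for the opening of a video.
cues = [
    (0.0, 2.5, "YouTube has eyes and ears."),
    (2.5, 6.0, "Let's find out how they help channel creators."),
]
print(to_srt(cues))
```

Save the output as a `.srt` file and upload it from the video’s subtitles settings.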


1. Upload your thumbnail to Vision AI

Make sure the keywords in your title, tags and description are consistent with YouTube’s vision.

2. Add closed captions

Add Custom CC and don’t depend on Auto CC from YouTube. Always pronounce your keywords loud and clear to ensure the keyword consistency that the YouTube algorithm loves.

