ChatGPT can see, hear, speak, generate images, access internet

Michael Petraeus

September 29, 2023

Disclaimer: Any opinions expressed below belong solely to the author.

As smart as ChatGPT has already proven itself to be, it has thus far been hindered by very limited input and output channels. You could only type to it, and it would type back. Text in, text out.

Another limitation was that it operated only on what it had been taught up to September 2021. Being two years out of date prevented it from being a useful alternative to search engines for finding current information.

But OpenAI has been busy at work, preparing a slew of updates, through which it is taking one massive swipe at all of its competitors at once.

We have already covered its updated image generator, DALL-E 3, which is being merged into ChatGPT while offering capabilities demonstrably superior to those of rivals like Midjourney and the open-source Stable Diffusion.

Starting in October, you will be able to simply ask it to render any image you want, near instantly.

As it turns out, that was only an appetiser.

A picture is worth a thousand words

As humans, we know firsthand how much better it is to see something instead of only imagining it based on a verbal description. Conversely, trying to describe something accurately is far more tedious than simply showing it.

It was only a matter of time, then, before artificial intelligence (AI) had to acquire the same senses we have, so that communication between us could be as seamless and productive as possible.

That moment has finally come. Earlier this week, OpenAI announced that by mid-October its flagship product will also be able to see, hear and speak, changing the way we interact with it:

“Voice and image give you more ways to use ChatGPT in your life. Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it.

When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step-by-step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.”

If we could quantify it, I’m sure it would qualify as exponential growth in usability. Forget having to type: show it, say it, take a picture, ask a question and listen to the answer.

Given how ubiquitous cameras and microphones have become, with pretty much all of us carrying one (or more) in our pockets all the time, ChatGPT is morphing into an ever-present assistant that you can ask for help on the go.

ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023

It could also make augmented reality (AR) gear, such as the recently announced second generation of Meta’s Ray-Ban smart glasses equipped with a camera, speakers and microphones, genuinely useful by allowing you to consult ChatGPT without even reaching into your pocket.

Meta’s Ray-Ban smart glasses. Will Mark allow it, though? / Image Credit: Meta

This may very well be the missing element that all sorts of smart glasses have lacked in the decade since the launch of the now-retired Google Glass.

Instead of trying to create a persistent display in front of our eyes, why not simply use a built-in camera to let an AI assistant see what we see, and hear both what we hear and what we say to it?

No need for sophisticated optical trickery. Just a James Bond-esque gadget that sees and processes more information on the fly than we ever could, delivering the right information straight into our ears.

The end of Google

As if that wasn’t enough for one week, OpenAI has topped it off with the cherry everybody had been waiting for: ChatGPT will finally be able to browse the internet in real time. With the knowledge cut-off gone, you can ask it about anything and it will return answers along with links to sources.

ChatGPT can now browse the internet to provide you with current and authoritative information, complete with direct links to sources. It is no longer limited to data before September 2021. pic.twitter.com/pyj8a9HWkB
— OpenAI (@OpenAI) September 27, 2023

Following OpenAI’s partnership with Microsoft (and the company’s massive investment in OpenAI), the feature is built on top of Bing, and you will need to enable Bing browsing in ChatGPT to get live results.

It is also a major challenge to Google, whose Bard has yet to take off in a similar fashion.

Interacting with existing search engines is painfully clunky compared to talking to an intelligent bot. Many results are still elevated through questionable SEO practices, and it often takes a while to dig out the information you were really looking for (while avoiding scams, spam, and malware).

In contrast, a chatbot provides you with a direct answer in seconds, cutting through all the irrelevant noise. It’s bound to change the way we look for information on the internet, even if it doesn’t entirely wipe out traditional search engines.

Search engines may remain as a safety valve, in case you want to manually verify whatever the seemingly intelligent assistant has told you, but that is likely to become a niche use case rather than the mainstream one it is today.

It doesn’t take long to realise that this poses an existential threat to Google, one of the largest tech companies on the planet. It has enjoyed a de facto monopoly on web search all over the world, and advertising targeting specific search keywords accounts for close to 60 per cent of its annual revenue.

Google revenue breakdown / Image Credit: Oberlo

Even if Google somehow comes out on top in the battle of AI chatbots (which doesn’t seem likely at the moment), the format in which AI answers are delivered leaves little room for ads.

Currently, a search results page is a list of hundreds of links, spread across multiple pages (on desktop) or a long scrollable wall (on mobile), interspersed with advertising from dozens of companies.

But as AI bots simply provide an answer and/or point to a particular source, you can’t offer nearly as many ad slots as before without completely overwhelming the user.

While it should still be possible to make money from queries in which users specifically ask for providers of particular services (who could pay to be promoted), monetising millions of other search terms may become extremely difficult.

This affects not only Google’s search revenue but its display advertising as well. Banners are placed next to content, and much less of that content will be consumed if people get what they want within an app such as ChatGPT.

If people don’t browse websites, they won’t read articles or click on ads, hurting both publishers and Google.

Together, this undermines nearly 70 per cent of Google’s current revenue, and the company does not yet seem to have a response, as it is still playing catch-up to ChatGPT’s usability.

OpenAI may not be moving very quickly. After all, it’s been almost a year since the launch of ChatGPT (powered by GPT-3.5), and GPT-4, launched in March, brought only moderate improvements at a relatively high price and with usage limits. But when OpenAI does move, it sure puts the fear of God into its rivals.

Generative AI art competitors, other intelligent chatbots, and even some of the world’s largest companies, like Google or Meta (which announced a partnership with Bing and thus, by extension, ChatGPT), either have to find a response or a way to join it before it puts them out of business.

Featured Image Credit: CoinGeek
