Sora, ChatGPT and OpenAI

Since November 2022, the world has seen some crazy shifts in the tech space. It was just 15 months ago that OpenAI unveiled ChatGPT, and it has come a long way since then. A couple of weeks ago, OpenAI shared a glimpse of its newest AI model, Sora.

Sora is a powerful text-to-video tool that generates realistic and imaginative scenes from text instructions. Here is an example of the videos it has generated.

This video has been compressed for upload, so its quality is reduced. If it does not play, access this link to view Sora's outputs.

So, how does Sora work? It uses research from previous OpenAI models like DALL·E 3 and ChatGPT to achieve this.

Now, some of you may not be familiar with DALL·E 3. Essentially, DALL·E 3 is OpenAI's text-to-image model, and it performs very well. I have used it several times, and it does a good job of understanding prompts and generating relevant images. However, it still has a long way to go. The biggest flaw that I see in DALL·E 3 is that it can only generate new images and cannot modify existing images that you give it.

What does that mean? Ideally, you should be able to input a raw photo and ask the AI to remove the red eye, make some basic modifications, and turn it into a perfect image. But OpenAI's technology cannot do this yet. I believe it would be game-changing if OpenAI models could reach this level. It would be a true step towards achieving Artificial General Intelligence (AGI).

The ultimate goal of all AI models is to achieve AGI. AGI is when a machine achieves enough intelligence to be comparable to a human.

Circling back to Sora, out of everything OpenAI has created, I believe this is the biggest leap yet, because it is not just text comprehension or image generation that needs to be worked on. Creating a video involves several things, including understanding motion, lighting and shadows, depth, and much more.

I feel this is a huge achievement considering what text-to-video generation looked like in 2023.

Text to video generation in April 2023

OpenAI has been a pioneer in AI research. The company was recently valued at $80 billion! However, it also faces some legal trouble, as Elon Musk (co-chair of OpenAI until 2018) has claimed that OpenAI is steering away from its foundation as a non-profit organization.

Regardless of the outcome, OpenAI is here to stay and will likely be a leader in AI research. Another company that I think will give OpenAI a hard time is Google.

Google, contrary to what many assume, is not just a search engine but rather a data collection company, and its ecosystem of services is very widely used. Google also leads in AI research and, in my view, does a better job of deploying these models to the wider public.

Google's new LLM, Gemini 1.5 Pro, part of the Gemini family behind the chatbot formerly known as Bard, has reportedly beaten OpenAI's ChatGPT in several tests and proven to be more accurate and reliable. Google has also leveraged its infrastructure and rolled out image input and output for its prompts. Over the last week, Google has also announced that Gemini can access things like your calendar and make basic modifications to it. This tight integration between the chatbot and commonly used tech services will add to the convenience and may lead to more people using Google's LLM rather than OpenAI's in the long run.

Google may also leverage its position in the smartphone OS market to introduce Gemini as a built-in default feature for basic tasks. But I am sure that if Google adds this "feature" too explicitly, the US Government and the EU will summon it to antitrust hearings.

Well, the AI race has just started and is going to dominate the tech space for several years to come. AI products will come and go, but the real race is not between the companies building AI-powered products. It is between the companies building the AI that these products run on, such as OpenAI.