From Language to Images: AI Makes Itself Indispensable

ChatGPT has proven itself in recent weeks to be an extremely capable and practical language tool, already used by millions of people in a wide range of disciplines. But the GPT-3 language model, developed by the American research company OpenAI, can do much more: it is also the basis of the text-to-image generator DALL-E 2.


Luca Bino
+41 58 263 22 29
luca.bino@umb.ch

Need a painting of a 1968 Ford Mustang in an expressionist style? Or simply a photograph of a large server room in striking colors? No problem: DALL-E 2 generates the desired images in seconds[i]. The original DALL-E is a variant of the GPT-3 language model developed by OpenAI, which is making a name for itself worldwide as an AI benchmark. It is based on the so-called transformer architecture, a type of neural network that is primarily used in natural language processing - for example, in translation or text generation. The knowledge of the GPT-3 model comes from a huge volume of Internet text on which it was trained. A GPT-style model responds to an input with an output that, depending on how it was trained, can be a continuation of the prompt, an image, or a combination of both. The model can also be specialized for certain tasks - for example, translation, question answering, or image generation.

Large data volumes and complex dependencies

DALL-E 2 works very well for image generation, even though its roots lie in language modeling: the first DALL-E was a language model trained to generate images from a text prompt. The transformer architecture is well suited to this task because it can handle the large amounts of data and the complex dependencies that image generation involves. There are also pure image models that are designed specifically for image generation. These models likewise use a transformer architecture to generate images, but unlike DALL-E, they do not take a description of the desired image as input.
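To make the idea of "complex dependencies" a little more concrete, the following is a minimal sketch of scaled dot-product attention, the core operation of the transformer architecture. It is an illustration only: the three toy vectors are made up, and real models add learned projections, multiple attention heads, and many layers on top of this.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How strongly does this position relate to every other position?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # The output is a weighted mix of all value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy "tokens" (or image patches); not real model data.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` states how much one position draws on every other position, which is how a transformer relates distant words in a text or distant regions in an image.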

Azure for OpenAI - ChatGPT for Bing

Incidentally, the name DALL-E is a blend of the surrealist painter Salvador Dalí and Pixar's robot WALL-E. Both ChatGPT and DALL-E are available via the OpenAI API, which allows developers to integrate the models into their applications. OpenAI's main partner is Microsoft; the software giant has invested billions of dollars in OpenAI since 2019 and, according to press reports, is in the process of making another ten billion available. Azure is the exclusive cloud provider for OpenAI and will be optimized for customers deploying global AI applications[iv]. Furthermore, Microsoft will leverage ChatGPT's capabilities by integrating the AI chatbot into its Bing search engine - which could well be a game changer in the Google-dominated search engine market[v].
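For developers wondering what integration via the OpenAI API might look like, here is a minimal sketch that only builds the HTTP request for an image and does not send it. The endpoint path and the `prompt`, `n`, and `size` fields follow OpenAI's public image generation API as we understand it; please check the current API reference before relying on them. The helper name `build_image_request` and the placeholder key are our own assumptions.

```python
import json
import urllib.request

# Assumed endpoint of OpenAI's image generation API; verify against the docs.
API_URL = "https://api.openai.com/v1/images/generations"

def build_image_request(prompt, api_key, n=1, size="1024x1024"):
    """Build (but do not send) a POST request asking for generated images."""
    payload = {"prompt": prompt, "n": n, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "YOUR_API_KEY" is a placeholder, not a working credential.
request = build_image_request(
    "Lake Zurich and Zurich skyline in the style of Van Gogh",
    api_key="YOUR_API_KEY",
)
```

Sending the request, for example with `urllib.request.urlopen(request)`, requires a valid API key and incurs usage costs, which is why the sketch stops before that step.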

[i] DALL·E 2, openai.com

[ii] 10 Best AI Art Generators

[iii] Artists file class-action lawsuit against AI image generator companies | Ars Technica

[iv] Microsoft announces new investment in ChatGPT-maker OpenAI

[v] ChatGPT: One million people have joined the waitlist for Microsoft's AI-powered Bing

This image was created from the prompt "Lake Zurich and Zurich skyline in the style of Van Gogh".

Semantic segmentation for image and text

An example of a pure image model is a GPT for images: the model is first trained on a huge dataset of images so that it learns how images are represented. Subsequently, the model can be fine-tuned with a smaller dataset for a specific task, for example object recognition or semantic segmentation. In semantic segmentation, an image is divided into segments that each carry a semantic label - for example, objects, backgrounds, landscapes, trees, people, or animals. GPT applies a similar idea in natural language processing: texts are divided into semantic parts in order to capture their meaning.
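What a semantic segmentation result looks like can be shown with a toy example: every pixel of a tiny 3 x 4 "image" carries a class label. The label names and the mask values below are invented for illustration; a real model would predict such a mask from the pixel data.

```python
from collections import Counter

# Hypothetical class labels for this illustration.
LABELS = {0: "background", 1: "tree", 2: "person"}

# A segmentation mask: one class id per pixel of a 3 x 4 image.
mask = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 2, 2, 0],
]

def pixels_per_class(mask, labels):
    """Count how many pixels belong to each semantic class."""
    counts = Counter(class_id for row in mask for class_id in row)
    return {name: counts.get(class_id, 0) for class_id, name in labels.items()}

summary = pixels_per_class(mask, LABELS)
```

The mask has the same shape as the image, so each segment can be located exactly, which is what distinguishes segmentation from merely recognizing that a tree or a person is present.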

Texts, images, music, programming languages

Thanks to their versatility, GPT-based models are already used globally, including to explore new use cases - for example, composing music and developing new programming languages. In addition to DALL-E, there are already numerous image generation apps[ii] that use deep learning models. This also raises questions about intellectual property. For example, a group of artists recently filed a class action lawsuit against several companies that offer image generators, claiming that using billions of Internet images to train AI tools violates the rights of millions of artists[iii]. OpenAI has commercially licensed much of its training data from companies like Shutterstock and is not among the companies sued.

This image was created from the prompt "Yellow data centre on Mars".

ChatGPT and GPT-3: Things AI can do

Doctors, lawyers, and consultants are among the highest paid professionals today. Generative artificial intelligence is already proving that it could - and in some cases soon will - take over many of the tasks performed by these specialists.

In the US, a team of researchers has demonstrated that OpenAI's ChatGPT can pass the United States' rigorous medical licensing exam[i]. This three-part exam is required of all medical school graduates in the US in order to obtain a license to practice medicine. ChatGPT demonstrated a high degree of consistency and insight in its explanations, the researchers announced. They concluded that large language models like ChatGPT have the potential to help with medical education and decision making. According to the researchers, some clinics have already begun experimenting with ChatGPT.

GPT-3 can do more than medicine; it can do management too[ii]. A professor at the prestigious Wharton School of Business conducted a study in which GPT-3 (the language model on which ChatGPT is based) took the final exam of an MBA core course. The professor found that GPT-3 performed best on basic business management and process analysis questions. For these questions, he said, the AI model provided both correct answers and excellent explanations.

Finally, OpenAI's artificial intelligence model also narrowly passed an important legal exam[iii]: on the multiple-choice component of the bar exam (MBE), GPT-3.5 answered just over 50 percent of the questions correctly and achieved passing scores in the Evidence and Torts sections. The bar exam is the test that law school graduates must pass to officially practice law. It consists of three parts, with the MBE being the first.

[i] AI Bot ChatGPT Passes US Medical Licensing Exams

[ii] Would Chat GPT Get a Wharton MBA?

[iii] GPT Takes the Bar Exam
