An AI neural network able to produce images from text prompts

OpenAI, in partnership with Microsoft, recently developed a new neural network called DALL-E. The network is a version of the company’s GPT-3 language model that can create images from text captions, offering a glimpse of how an AI can connect language to visual representations.


DALL-E was trained on a large set of images paired with text captions, and it draws on this training to create an image that corresponds to a new prompt. The AI first tries to interpret the prompt, then produces the image, building it up step by step. If the network is shown part of a pre-existing image along with the text, it incorporates the visual elements already present in that image.

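To make that step-by-step process concrete, here is a minimal, hypothetical sketch in Python of autoregressive text-to-image generation of the kind the article describes. Every name below (the tokenizer, the model stub, the codebook size) is an illustrative stand-in, not OpenAI’s actual code: a real system would use a trained transformer and a learned image codebook.

```python
# Illustrative sketch of step-by-step (autoregressive) text-to-image
# generation, loosely in the spirit of DALL-E. Every function here is
# a hypothetical stand-in, not OpenAI's implementation.
import random

VOCAB_SIZE = 8192        # size of a hypothetical image-token codebook
IMAGE_TOKENS = 32 * 32   # the image is represented as a grid of discrete tokens

def encode_text(prompt: str) -> list[int]:
    """Stand-in text tokenizer: maps characters to integer token IDs."""
    return [ord(c) % 256 for c in prompt]

def next_token_probs(context: list[int]) -> list[float]:
    """Stand-in for the trained transformer. A real model would condition
    on the whole context; this stub returns a uniform distribution so
    the sketch runs end to end."""
    return [1.0 / VOCAB_SIZE] * VOCAB_SIZE

def generate_image_tokens(prompt: str,
                          partial_image: list[int] | None = None) -> list[int]:
    """Build the image one token at a time, conditioned on the prompt.
    If part of an existing image is supplied, its tokens are kept and
    only the remainder is generated, mirroring image completion."""
    context = encode_text(prompt)
    image = list(partial_image) if partial_image else []
    while len(image) < IMAGE_TOKENS:
        probs = next_token_probs(context + image)
        image.append(random.choices(range(VOCAB_SIZE), weights=probs, k=1)[0])
    return image

tokens = generate_image_tokens("an armchair in the shape of an avocado")
print(f"generated {len(tokens)} image tokens")  # a decoder would turn these into pixels
```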

The neural network struggles, however, with prompts that are poorly worded and with positioning objects relative to one another.


For each prompt, DALL-E generates 512 candidate images, which are then filtered with a separate computer model called CLIP to select the 32 best results.

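As a rough illustration of that filtering step, the sketch below generates many candidates and keeps the ones a scoring model rates highest. The `clip_score` function is a placeholder: the real CLIP embeds the image and the caption and measures how well they match, whereas this stub just returns a random score.

```python
# Hedged sketch of the generate-then-rerank pipeline: produce 512
# candidates, score each against the caption, keep the top 32.
# Both functions below are placeholders, not the real models.
import random

def generate_candidate(prompt: str) -> list[int]:
    """Stand-in image generator (see the token-by-token sketch above)."""
    return [random.randrange(8192) for _ in range(32 * 32)]

def clip_score(image: list[int], prompt: str) -> float:
    """Placeholder for CLIP: a real model would score how well the
    image matches the caption; here the score is random."""
    return random.random()

def best_images(prompt: str, num_candidates: int = 512, keep: int = 32) -> list[list[int]]:
    candidates = [generate_candidate(prompt) for _ in range(num_candidates)]
    ranked = sorted(candidates, key=lambda img: clip_score(img, prompt), reverse=True)
    return ranked[:keep]

top = best_images("a cat made of sushi")
print(len(top))  # 32
```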

Experts at Cornell University in New York said they were very impressed by the work, although more remains to be done to examine the ethical implications of a model like this.


There are also concerns about DALL-E’s handling of natural language, which is highly cultural and context-dependent. For example, there could be gaps between British and American English that the AI would not understand.
