OpenAI's DALL•E program, which, like the rival systems Stable Diffusion and Midjourney, can produce realistic or fantastical images from descriptive text, has garnered a great deal of interest for the AI research company.
With the launch of Point•E, an open-source project that creates 3D graphics from text prompts, OpenAI has now expanded its text-to-image technology from two dimensions into three.
Although it shares the bullet-point symbol of OpenAI's DALL•E branding, Point•E relies on a distinct machine learning model called GLIDE. And right now, it's not quite as powerful. Given a text prompt like "a traffic cone," Point•E produces a low-resolution point cloud (a collection of points in space) that approximates a traffic cone. The result falls well short of the polished 3D renderings used in films or video games, but that isn't the goal. Point clouds are an intermediate step: once imported into a 3D program such as Blender, they can be converted into textured meshes that look much more like recognisable 3D imagery.
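For a concrete sense of what that intermediate step involves, the sketch below converts a saved point cloud into a triangle mesh in Python using the open-source Open3D library rather than Blender. It is only an illustration: the input file "cone.ply" is a hypothetical point cloud exported from a generator like Point•E, and Poisson surface reconstruction is just one of several ways such a conversion can be done.

    # Illustrative only: mesh a low-resolution point cloud such as Point•E's output.
    import open3d as o3d

    # Load a previously generated point cloud (hypothetical file name).
    pcd = o3d.io.read_point_cloud("cone.ply")

    # Surface reconstruction needs per-point normals; estimate them from neighbouring points.
    pcd.estimate_normals()

    # Fit a watertight surface through the points with Poisson reconstruction.
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)

    # Save the mesh, which can then be cleaned up, textured, or rendered in a tool like Blender.
    o3d.io.write_triangle_mesh("cone_mesh.ply", mesh)

Because the point clouds are low-resolution, the resulting mesh is still rough and typically needs further refinement and texturing in a 3D program before it resembles a finished asset.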
The "E" in Point•E stands for efficiency: its claim is that it "generates point clouds efficiently." Whereas cutting-edge techniques can require many GPU-hours to complete a single rendering, Point•E can construct a 3D model in only one to two minutes of GPU time. According to one assessment, it is roughly 600 times faster than Google's DreamFusion text-to-3D model.
But Point•E isn't a market-ready product. It is fundamental research that could someday lead to fast, on-demand 3D model creation. With further work, it might make building virtual worlds simpler and more accessible for people without specialised 3D graphics skills. It could also streamline the production of 3D-printed objects, since the point clouds Point•E generates can feed into manufacturing workflows.
There are other potential issues to resolve as well. For instance, Point•E is expected to inherit biases from its training dataset, just as DALL•E does.
And the training collection, which consists of several million 3D models and related information of unknown provenance, comes with no assurance that the source models were used with permission or in accordance with any applicable licensing restrictions. Legally, that could turn into a major headache.
The AI community's casual attitude toward training machine learning models on the work of others without explicit permission has already fuelled an intellectual property infringement claim against GitHub Copilot, a service built on OpenAI's Codex model that suggests programming code to developers. As text-to-image models become more widely used, they may face similar legal tests.