Things move fast in this space, and last week's tech quickly gets superseded by the next rung up the ladder, sometimes jumping three rungs at a time. Keeping your finger on the AI pulse and staying informed about what is happening is paramount. There are so many great people and small companies doing even greater things in this space and documenting their process.
I found myself saving all of these tweets, posts and articles into my personal Notion and then thought, "Why not throw them all into a blog so everyone can benefit?" In these weekly posts, I will condense all that I have stumbled upon and share some of these pearls of wisdom.
Replicate: Train and deploy your own DreamBooth model
The amazing people over at Replicate have put together a repo so you can train your own custom DreamBooth text-to-image model using a GitHub Actions workflow. Easy to follow and with a great step-by-step guide, this is a great entry point into the generative AI space.
Generative AI has been abuzz with DreamBooth. It's a way to train Stable Diffusion on a particular object or style, creating your own version of the model that generates those objects or styles. You can train a model with as few as three images, and the training process takes less than half an hour.
Notably, DreamBooth works with people, so you can make a version of Stable Diffusion that can generate images of yourself.
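Once the workflow has pushed your custom model to Replicate, calling it from Python is a one-liner. Here is a minimal sketch using the replicate client; the model identifier and the "sks person" token are placeholders for whatever name and instance token you used during training.

```python
# Minimal sketch: generating images from a DreamBooth model hosted on Replicate.
# Assumes `pip install replicate` and a REPLICATE_API_TOKEN environment variable.
# "yourname/your-dreambooth-model:version" is a placeholder for your trained model.
import replicate

output = replicate.run(
    "yourname/your-dreambooth-model:version",
    input={"prompt": "a photo of sks person as an astronaut, studio lighting"},
)
print(output)  # typically a list of URLs pointing at the generated images
```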
OpenAI launches ChatGPT
This is a big deal, and if you have not already heard about it, ChatGPT is a game changer. OpenAI this week announced ChatGPT, a dialogue-based AI chat interface for its GPT-3 family of large language models.
It's currently free to use with an OpenAI account during a testing phase. Unlike the GPT-3 models found in OpenAI's Playground and API, ChatGPT provides a user-friendly conversational interface and is designed to strongly limit potentially harmful output.
GPT, or Generative Pre-trained Transformer, is a type of language model that uses natural language processing (NLP) to generate responses to user inputs. It has been trained on a large corpus of text data, allowing it to generate human-like responses to a wide range of inputs.
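ChatGPT itself is only available through the web interface during this testing phase, but the underlying GPT-3 family mentioned above is accessible through the OpenAI API. A minimal sketch, assuming the openai Python package and the text-davinci-003 completion model:

```python
# Minimal sketch: querying a GPT-3 family model via the OpenAI Python library.
# ChatGPT has no public API at the time of writing, so this uses text-davinci-003.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain what a large language model is in two sentences.",
    max_tokens=100,
    temperature=0.7,
)
print(response.choices[0].text.strip())
```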
Stable Diffusion V2 Released
As if SD V1 was not already mind-blowing, the team have released Stable Diffusion V2 with a host of new and upgraded features.
Numerous new models and advances have been built on top of the initial Stable Diffusion V1, which was spearheaded by CompVis and changed the way open-source AI models work. Gaining 33K stars in under two months, it had one of the fastest climbs to 10K GitHub stars of any software project.
The initial Stable Diffusion V1 release was driven by the dynamic duo of Patrick Esser (Runway ML) and Robin Rombach (Stability AI) from the CompVis Group at LMU Munich, under the direction of Prof. Dr. Björn Ommer. With the help of LAION and Eleuther AI, they expanded on the Latent Diffusion Models they had previously developed in the lab.
Depth-to-Image Diffusion Model
The new depth-guided Stable Diffusion model, depth2img, extends the image-to-image feature from V1 and opens up entirely new opportunities for creative applications.
The method even works on samples from the base model itself; one example shared in the release was created by an anonymous Discord member. Given an input image, the Gradio or Streamlit script depth2img.py first derives a monocular depth estimate using the MiDaS model; the diffusion model is then conditioned on that (relative) depth output.
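If you would rather not run the reference scripts, the same model has been ported to Hugging Face diffusers, which handles the MiDaS depth step for you. A minimal sketch, assuming the diffusers port of the SD 2.0 depth checkpoint and a CUDA GPU; the input file name and prompts are placeholders:

```python
# Minimal sketch: depth-guided image-to-image with the diffusers port of SD 2.0 depth.
# The pipeline derives the MiDaS depth map internally and conditions generation on it.
# Assumes `pip install diffusers transformers accelerate` and a CUDA GPU.
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from PIL import Image

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("input.png").convert("RGB")  # placeholder source photo
result = pipe(
    prompt="a fantasy landscape, detailed digital painting",
    image=init_image,
    strength=0.7,  # how far the output is allowed to drift from the input
    negative_prompt="blurry, low quality",
).images[0]
result.save("depth2img_output.png")
```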
New Text-to-Image Diffusion Models
The robust text-to-image models included in the Stable Diffusion 2.0 release significantly outperform the earlier V1 releases and were trained using a brand-new text encoder (OpenCLIP), created by LAION with assistance from Stability AI.
The text-to-image models in this release generate images at default resolutions of 512x512 and 768x768 pixels.
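A minimal sketch of plain text-to-image with the new 768x768 checkpoint, assuming the diffusers port and a CUDA GPU; the prompt is just an example:

```python
# Minimal sketch: text-to-image with the SD 2.0 768x768 checkpoint via diffusers.
# Use "stabilityai/stable-diffusion-2-base" instead for the 512x512 model.
# Assumes `pip install diffusers transformers accelerate` and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a professional photograph of an astronaut riding a horse",
    height=768,
    width=768,
).images[0]
image.save("sd2_txt2img.png")
```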
Image Upscaling with Stable Diffusion
Additionally, Stable Diffusion 2.0 includes an Upscaler Diffusion model that increases the resolution of images by a factor of four. Combined with the text-to-image models, Stable Diffusion 2.0 can now produce images with resolutions of 2048x2048 or greater.
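A minimal sketch of the upscaler via the diffusers port, assuming a CUDA GPU with plenty of memory; feeding it a 512x512 generation yields a 2048x2048 result:

```python
# Minimal sketch: 4x upscaling with the SD 2.0 upscaler via diffusers.
# A 512x512 input yields a 2048x2048 output and needs a lot of VRAM; smaller inputs are cheaper.
# Assumes `pip install diffusers transformers accelerate` and a CUDA GPU.
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")

low_res = Image.open("sd2_txt2img.png").convert("RGB")  # e.g. the image generated above
upscaled = pipe(
    prompt="a professional photograph of an astronaut riding a horse",
    image=low_res,
).images[0]
upscaled.save("sd2_upscaled.png")
```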
Image Inpainting with Stable Diffusion
SD V2 also includes a new text-guided inpainting model, fine-tuned on the new Stable Diffusion 2.0 base text-to-image model, which makes it incredibly simple to rapidly and intelligently replace different portions of an image.
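A minimal sketch via the diffusers port, assuming you supply an image plus a black-and-white mask where white marks the region to replace; file names and prompt are placeholders:

```python
# Minimal sketch: text-guided inpainting with the SD 2.0 inpainting model via diffusers.
# The mask is black-and-white: white pixels are regenerated, black pixels are kept.
# Assumes `pip install diffusers transformers accelerate` and a CUDA GPU.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("room.png").convert("RGB").resize((512, 512))      # placeholder input
mask = Image.open("room_mask.png").convert("RGB").resize((512, 512))  # placeholder mask

result = pipe(
    prompt="a comfortable green armchair, photorealistic",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")
```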
Dolly Zoom: LumaLabAI (Karen X Cheng)
In this great tutorial thread, Karen breaks down the steps to create an AI-powered dolly-zoom camera movement using Luma AI and NeRF technology.
A neural radiance field (NeRF) is a fully-connected neural network that can produce novel renderings of intricate 3D scenes from a sparse collection of 2D photos. It is trained with a rendering loss to reproduce the input views of a scene, and it then interpolates between those views to render the finished scene from new angles. NeRF is also quite efficient at generating images of synthetic scenes.
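To make that a little more concrete, here is a toy sketch of the core NeRF mapping (not Luma AI's actual implementation): a small fully-connected network that takes a 3D position and a viewing direction and returns a colour and a density, which a volume renderer then integrates along each camera ray.

```python
# Toy sketch of the core NeRF idea (not Luma AI's implementation): a fully-connected
# network mapping a 3D position + viewing direction to an RGB colour and a density.
# A volume renderer integrates these values along camera rays, and the whole model
# is trained with a rendering loss against the captured 2D photos.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)   # how "solid" space is at this point
        self.color_head = nn.Sequential(           # colour also depends on view direction
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        features = self.backbone(xyz)
        density = torch.relu(self.density_head(features))
        rgb = self.color_head(torch.cat([features, view_dir], dim=-1))
        return rgb, density

# One sample point along a camera ray: position (x, y, z) and a unit view direction.
model = TinyNeRF()
rgb, density = model(torch.randn(1, 3), torch.randn(1, 3))
```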
If this is your first time hearing about LumaLabs and what you can now do with just your smartphone and some awesome software, here's a little more flavour.
Luma's mission is to enable everyone to capture and experience the world in lifelike 3D. To bring about the next step function change in how we share memories, explore products, and spaces on the internet. To propel photos and videos into our mixed-reality 3D future.