Exploring Locally-Hosted AI Image Generation (with Stable Diffusion)

So here’s the thing: I was a little late to the AI imaging trend, and it seems every week there’s something new to learn about the latest technologies and models, each of which excites the internet and leaves me further and further behind. What is a ‘prompt engineer’, and what does Salvador Dalí have to do with any of this? It’s been a task, but these days I consider myself well-versed in AI imaging technologies, and I have spent an embarrassing amount of time playing around with these tools to make interesting, amusing, and outright fantastical creations. But all of this has been through some online portal or application: some website which charges me for the privilege of speeding up the generation of my 100th attempt at an obscure composition (see: a cartoon image of a green frog wearing a cowboy hat, riding a motorcycle, being chased by giant flies). I can’t help but wonder how people get such intricate and expertly crafted results, results which I simply can’t match with the websites I’ve relied on so heavily.

Enter locally-hosted AI imaging tools and Stable Diffusion

Locally-hosted imaging tools built around Stable Diffusion run on your own personal computer and offer a suite of add-ons and extras to help the struggling obsessive refine their outputs and gain greater control over their AI art. Reflecting on my journey, I aim to summarise and share my exploration of this topic, and hopefully dispel the notions others may have, as I once did, about the limits and restrictions of AI imaging in today’s landscape.

It almost seems absurd: your personal computer has the potential to generate images of similar (or better!) fidelity to the likes of DALL-E 2/3 and Midjourney, all without needing some expensive supercomputer (access to which is often restricted behind a paywall). I had previously thought that much of this had to happen online, but in reality the core requirement is simply a graphics card with a suitable amount of memory. These days, however, such graphics cards can be pretty costly, and their memory capacities have seemingly shrunk due to a variety of factors (namely, chip shortages). Still, with one in hand and some publicly available tools, your new favourite time sink will deepen to an all-new low!

Getting started

The most popular approach for generating AI images locally is Stable Diffusion. Put simply, Stable Diffusion takes random noise and hones it towards a coherent result through an iterative process, one that takes into account various inputs and controls, such as a positive and a negative prompt. For example, you might consider trying something like the below in your positive prompt:

(photographic:1.3), (RAW photo:1.3), (ultra wide lens:1.3), (far shot:1.3), (photo:1.3), (a lion sitting on a rock, zoo), realistic, soft lighting, film grain, Fujifilm XT3, dslr, masterpiece, best quality, realistic, high-quality, ultra-high details, 8k uhd

With this prompt and the right model, here's what was produced.

If you’re stuck for ideas, take a look at some online image repositories and think about how you would tag them, focusing on the descriptors used to build a scene and on the image’s intrinsic and atmospheric qualities. Emphasis and word order are also important, and punctuation plays a role too! The various brackets and numbers in the positive prompt above are an example of this: they instruct the model to apply a particular weight to the terms within.
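To make the weighting syntax a little more concrete, here is a minimal Python sketch of how a prompt like the one above could be split into terms and weights. It is an illustration rather than the parser any particular tool actually uses: the name parse_weighted_prompt is hypothetical, nested brackets are ignored, and it assumes the common convention that bare parentheses mean a weight of 1.1 while (term:1.3) sets the weight explicitly.

```python
import re

# One alternative matches a bracketed group, optionally ending in ":<number>";
# the other matches a run of plain text between commas.
TOKEN = re.compile(r"\(([^()]+?)(?::(\d+(?:\.\d+)?))?\)|([^(),]+)")

BARE_BRACKET_WEIGHT = 1.1  # assumed convention for "(term)" with no number
DEFAULT_WEIGHT = 1.0       # terms outside any brackets


def parse_weighted_prompt(prompt):
    """Split a prompt into (term, weight) pairs. Nested brackets are not handled."""
    pairs = []
    for match in TOKEN.finditer(prompt):
        grouped, weight, plain = match.groups()
        term = (grouped if grouped is not None else plain).strip()
        if not term:
            continue  # whitespace between two commas or brackets
        if grouped is None:
            value = DEFAULT_WEIGHT
        elif weight:
            value = float(weight)
        else:
            value = BARE_BRACKET_WEIGHT
        pairs.append((term, value))
    return pairs


if __name__ == "__main__":
    example = "(photo:1.3), (a lion sitting on a rock, zoo), realistic, soft lighting"
    for term, weight in parse_weighted_prompt(example):
        print(f"{weight:.1f}  {term}")
```

Reading a prompt back as a list of weighted terms like this is a handy way to spot where the emphasis is actually going before you spend time generating.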

But you can also enhance your AI imaging beyond simple text inputs, such as by forcing a particular style, pose, likeness, or composition (and even creating videos, if you’re up for such a challenge!). There are a few tools which allow you to control and optimise the Stable Diffusion process; Automatic1111 offers a largely user-friendly experience, and it is the one I’ve relied on heavily.

Essential to this process is a pre-trained model, or checkpoint, from which images are generated. When we think of the AI imaging tools available online, this concept doesn’t really come up; Stable Diffusion, however, allows full control over the model used to generate images. There are a few base models openly available for Stable Diffusion, with the most recent and impressive being SDXL. You can also fine-tune a base model to teach it additional objects or styles. This provides an exceptional level of freedom, as you can pick a model fine-tuned for a specific purpose to guide your image towards a desired output. I can’t stress enough how many models exist in the wild; you can find a community-made model for almost anything online, from illustration styles, cinematography, and macro-photography to nature/animals, architecture, and everything in between. There’s little limit to people’s imagination when building models for a given use case, and you can even merge multiple models together to create new ones. The combinations and possibilities are truly endless (unlike my hard drive space).

Creative control

One of the most fascinating aspects of Stable Diffusion is the level of creative control it offers. Most of us by now are familiar with a simple prompt (a textual description of the image we want to create), and many online methods don’t offer any variation beyond this input. It’s remarkable how a well-crafted prompt can coax the AI model into conjuring images that align with our vision, and this becomes more pertinent depending on the model and add-ons being used (certain keywords, for example, may trigger a particular object or style). However, Stable Diffusion also offers various controls to manipulate your image during the generation process, such as CFG Scale, Sampling Method, Clip Skip, and Sampling Steps, each with its own nuances. Their usefulness becomes apparent when you start adjusting them, as they allow control over the level of randomness and creativity in the generated images. Brace yourself for an iterative process, however, as there’s rarely a one-size-fits-all setting; with familiarity comes a greater degree of control than any of the alternatives I’ve tried online.
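Of these controls, CFG Scale is perhaps the easiest to picture in code. As I understand it, at each step the model predicts the noise twice, once guided by your positive prompt and once by the negative (or empty) prompt, and the CFG Scale decides how far to push the result towards the former. The sketch below is a simplified illustration under that assumption: it uses plain Python lists in place of real image tensors, and apply_cfg is a made-up name rather than a function from any tool.

```python
def apply_cfg(noise_uncond, noise_cond, cfg_scale):
    """Blend two noise predictions in the style of classifier-free guidance.

    noise_uncond: prediction for the negative/empty prompt (list of floats)
    noise_cond:   prediction for the positive prompt (list of floats)
    cfg_scale:    1.0 returns the positive-prompt prediction unchanged;
                  larger values exaggerate the difference between the two.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]  # toy values standing in for a real noise tensor
    cond = [1.0, 1.0]
    for scale in (1.0, 7.0, 12.0):
        print(scale, apply_cfg(uncond, cond, scale))
```

Notice that only the values where the two predictions disagree are affected, which is a fair intuition for why a very high CFG Scale tends to produce harsher, more exaggerated images.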

During your exploration, you might also come across another common tool for Stable Diffusion, known as a LoRA. Here’s an example prompt for a simple coffee machine, which uses a specialised LoRA to adjust its style:

LEGO Creator, coffee machine. shallow depth of field, vignette, highly detailed, high budget, cinemascope, moody, epic, gorgeous, film grain, grainy, high quality photography, 3 point lighting, flash with softbox, 4k, hdr, smooth, sharp focus, high resolution, award winning photo, 80mm, f2.8 <lora:lego_v2.0_XL_32:0.8>

In the above example prompt, the text in angled brackets references a LoRA, and the ‘LEGO Creator’ text at the beginning of the prompt instructs the LoRA to activate during the image’s refinement. LoRAs offer a very specific method of fine-tuning a model towards a particular object or style. To bring everyone’s beloved building blocks, LEGO, into my AI-generated images, I used a LoRA trained on their likeness and applied it to my prompt. LoRAs can be incredibly powerful (if a little tedious to prepare and train yourself); without one, a general-purpose model would struggle to incorporate LEGO-styled blocks, as it is unlikely to have been trained on such specific imagery. Many LoRAs already exist thanks to the expanding AI community, but with some elbow grease you can most certainly create your own for use with Stable Diffusion.
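If you ever want to keep track of which LoRAs a prompt relies on, the angled-bracket tags are simple to pick out. Here is a small Python sketch that does so, assuming tags always follow the name-then-weight form seen in the prompt above; the function name extract_loras is hypothetical, and a real tool may accept variations this pattern ignores.

```python
import re

# Matches tags shaped like <lora:NAME:WEIGHT>, e.g. <lora:lego_v2.0_XL_32:0.8>
LORA_TAG = re.compile(r"<lora:([^:<>]+):([0-9]*\.?[0-9]+)>")


def extract_loras(prompt):
    """Return (prompt without LoRA tags, list of (name, weight) pairs)."""
    loras = [(name, float(weight)) for name, weight in LORA_TAG.findall(prompt)]
    cleaned = LORA_TAG.sub("", prompt).strip()
    return cleaned, loras


if __name__ == "__main__":
    text = "LEGO Creator, coffee machine, sharp focus <lora:lego_v2.0_XL_32:0.8>"
    cleaned, loras = extract_loras(text)
    print(cleaned)
    print(loras)
```

The number at the end of the tag is the strength at which the LoRA is applied, so logging it alongside each image makes it easier to reproduce a result you liked.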

Challenges and triumphs

I’ll admit, generating high-fidelity images using Stable Diffusion isn’t painless. It’s still hard to get legible written words, and trying to force particular compositions can be outright infuriating (don’t get me started on hands…). Where I found relief is in its ability to take an existing input image and manipulate it further, which can correct many mistakes or oddities in your original image (or make adjustments to a non-AI image). You can find many tools online which perform these manipulations to some degree, such as outpainting an image to reveal more of the scenery, or adding/removing an object or background, and these may even be bundled within software you already use. With Stable Diffusion, however, these features are readily available, and they can be enhanced further by a growing list of add-ons shared online.

Reflecting on the journey and the future

The convergence of our personal computers and AI models presents a wealth of possibilities for businesses, artists, and creatives. The challenges and fine-tuning required may seem daunting initially, but they are all part of the exhilarating process of harnessing AI for image creation, and all without relying on an online/paid alternative. For those in the business world, it’s worth keeping an eye on this ever-evolving field, and if AI imagery is called for, consider whether Stable Diffusion can meet your needs as a cheap, local alternative without compromising on quality.

As AI image generation continues to advance, staying informed and exploring its applications can lead to a creative revolution that enhances your business’s visual presence in ways you may have never imagined. Want to visualise your company’s brand on a particular product line? Need inspiration for a new piece of designer clothing? Or maybe you just need a unique, eye-catching banner to welcome visitors to your website? And of course, if you’re looking to bring your software concept to life, the Conduct team is here and ready to help!

Key Takeaways:

Transition to Locally-Hosted Tools: Making the shift from online AI imaging platforms to locally-hosted methods like Stable Diffusion can significantly enhance control and creativity in generating AI art. It enables a more personalised experience, leveraging the power of your personal computer to create intricate and high-fidelity images, whilst also bypassing queues and paywalls.
Understanding Stable Diffusion: Stable Diffusion refines random noise into coherent visuals through an iterative process, and grasping its various controls goes a long way towards refining and manipulating your imagery. These inputs and supporting tools extend beyond simple text prompts, enabling a user to impose styles, poses, and compositions on the generated imagery.
Importance of Pre-trained Models: Utilising pre-trained models or checkpoints is essential in locally-hosted AI imaging. These models can be fine-tuned or even merged to cater to specific creative needs, vastly expanding the scope and quality of the generated art. This offers an unprecedented level of creative control; familiarising yourself with a model’s composition and its keywords may be daunting, but it is rewarding.
Future Possibilities: The fusion of personal computing power and evolving AI models opens a plethora of opportunities for artists, creatives, and businesses. Staying informed and exploring the applications of locally-hosted AI imaging tools could herald a creative revolution, enhancing visual presence and artistic expression in a rapidly advancing digital landscape.