AI Painting Pitfalls: A Two-Year Collection of Things That Go Wrong and How I Fix Them
I've been messing with AI painting for over two years now—from the early NovelAI days through Stable Diffusion, Flux, and various Chinese-made models. I've fallen into more holes than I've made good images. This article isn't a "comprehensive guide"—it's just me organizing the problems I keep running into and what actually fixes them.
Since it's personal experience, there's bound to be bias. Take what's useful.
Let's start with the one everyone hits first.
Hands: Yeah, They're Just Hard
You've seen the memes: six fingers, fused fingers, palms melted together, fingers like octopus tentacles. This is the number one newbie complaint, and I think it's the one you need to make peace with first: AI draws hands badly, and it's not going to get dramatically better anytime soon.
It's not that your prompt is bad. It's not that your model is worse than someone else's. The fundamental problem is that hands have too many small details—bones, joints, nails, skin creases—and the poses vary enormously. A hand from behind, gripping something, holding a finger up, playing piano—each angle is almost a completely different thing to the model. Training data for hands is also unevenly labeled, so the model never learns a consistent structure.
What helps? Prompt improvements take you from "every image has a hand problem" to "occasional problems." My negative prompt always includes:
bad hands, extra fingers, missing fingers, fused fingers,
mutated hands, deformed hands, poorly drawn hands
And on the positive side I add things like "perfect hands" and "detailed hands". Together these take me from constant problems to occasional ones, but "occasional" means you're still picking and choosing.
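To keep these term lists consistent between runs, I find it easier to build the prompt strings from lists instead of retyping them. A minimal sketch in plain Python: the helper name join_terms and the example subject "a girl waving" are made up for illustration, and the code only produces strings, it doesn't talk to any particular UI.

```python
def join_terms(*groups):
    """Flatten term groups into one comma-separated prompt string, dropping duplicates."""
    seen = set()
    ordered = []
    for group in groups:
        for term in group:
            cleaned = term.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                ordered.append(cleaned)
    return ", ".join(ordered)


# The hand-related terms listed above.
HAND_NEGATIVES = [
    "bad hands", "extra fingers", "missing fingers", "fused fingers",
    "mutated hands", "deformed hands", "poorly drawn hands",
]
HAND_POSITIVES = ["perfect hands", "detailed hands"]

negative_prompt = join_terms(HAND_NEGATIVES)
# "a girl waving" is a placeholder subject; the repeated term is dropped.
positive_prompt = join_terms(["a girl waving"], HAND_POSITIVES, ["detailed hands"])

print(positive_prompt)
print(negative_prompt)
```

The duplicate check is case-insensitive, so pasting the same term twice from different notes doesn't bloat the prompt.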
The method I actually rely on is Inpaint. The workflow: generate the full image first, and if the hands look terrible, don't worry about it. Then select just the area around the hands, use a slightly higher denoising strength (0.5-0.65 for me), and redraw just the hands. If once isn't enough, do it again or try a different Seed. It sounds tedious, but once you're used to it, it takes maybe thirty seconds and saves a ton of time.
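If you script any of this, the bookkeeping for the workflow above looks roughly like the sketch below. It only computes the region and the retry settings; the actual inpainting call depends on your tool and is left out. The function names, the 32-pixel padding, and the example coordinates are my own assumptions, not values from any specific UI.

```python
import random


def padded_box(box, image_size, pad=32):
    """Grow a (left, top, right, bottom) box by `pad` pixels, clamped to the image."""
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left - pad),
        max(0, top - pad),
        min(width, right + pad),
        min(height, bottom + pad),
    )


def inpaint_attempts(n, low=0.5, high=0.65, rng=None):
    """Yield (seed, denoising_strength) pairs, stepping strength from low to high."""
    rng = rng or random.Random()
    for i in range(n):
        fraction = i / (n - 1) if n > 1 else 0.0
        yield rng.randrange(2**32), round(low + fraction * (high - low), 3)


# Hypothetical hand location in a 512x768 image.
region = padded_box((300, 520, 380, 600), image_size=(512, 768))
plan = list(inpaint_attempts(3, rng=random.Random(0)))

print(region)
for seed, strength in plan:
    print(seed, strength)
```

Each attempt gets a fresh seed and a slightly higher strength, which mirrors the "try again, try a different Seed" habit described above.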
One more thing people miss: compose so hands aren't the focus. Hands in pockets, behind the back, holding something naturally hanging down—these are way safer than both hands spread open facing the camera. This isn't avoiding the problem, it's working within the AI's actual capabilities.
If you really need hands to be right—like a ring close-up or piano-playing scene—OpenPose with ControlNet is the most reliable approach I know. Lock the hand pose with a skeleton first, then let AI fill in the details within that structure. At least you won't get six fingers.
Faces: Not Just Ugly, Uncanny
Hands are common, but a broken face is the one that genuinely makes people uncomfortable.
Common issues: asymmetrical eyes, misplaced features, frozen smiles that look like masks, and the worst kind—each feature looks fine individually but together it gives you the creeps.
I've noticed multi-face scenes are a disaster zone. Two or three people in one image, and there's always one face that's off. My guess is that the model's attention gets spread across everyone, so no single face gets a complete aesthetic treatment. If you care about faces a lot, keep the number of people low. Single close-ups are way safer than group shots.
I can't live without face restoration tools. CodeFormer is my go-to, at 0.5-0.7 weight for natural results. Above 0.7 it starts producing a cookie-cutter Instagram-face look and loses the original style and detail. If two faces are broken, fix them one at a time rather than selecting everything at once; the quality is better when each fix focuses on one face.
Another thing that seems to help: front-facing angles are safer than profiles and extreme angles. Profile shots don't always break, but the probability is definitely higher. If your composition needs a tricky angle, mentally prepare to try a few times.
Resolution-wise, a lot of people overlook this: if your base image is too small, facial details literally don't have room to render. Ask for a face close-up at 512×512 and of course it'll be blurry. I usually render at normal size, then use Hires Fix to scale 1.5x, and facial detail improves noticeably.
Extra Heads, Extra Arms
The first time I got this I thought my prompt was wrong. Checked it carefully—no, the prompt was fine. Turns out this is just a "classic feature" of AI image generation.
Extra limbs, multiple heads—the community has names for these. This happens more at higher resolutions, especially with SD 1.5 models. My experience: once you exceed 768 pixels on one side with SD 1.5, this starts happening. SDXL is better, but "better" doesn't mean "gone."
So my habit is: always confirm composition at small size first, then upscale through proper channels. Hires Fix, Tiled Upscaler, or generate a small image and use img2img to scale up—these extra steps look tedious but effectively sidestep this pitfall.
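The "small first, then upscale" habit comes down to a bit of arithmetic. Here's a rough sketch that picks a base size and the upscale factor for a given target. The 768 cap is just my SD 1.5 rule of thumb from above, rounding down to a multiple of 8 is an assumption about what your UI accepts, and plan_base_size is a made-up helper name.

```python
def plan_base_size(target_w, target_h, max_side=768, multiple=8):
    """Return (base_w, base_h, upscale_factor) for a small-first workflow."""
    longest = max(target_w, target_h)
    if longest <= max_side:
        base_w, base_h = target_w, target_h
    else:
        # Integer math avoids float rounding surprises.
        base_w = target_w * max_side // longest
        base_h = target_h * max_side // longest
    # Round down to the nearest allowed multiple, but never below one unit.
    base_w = max(multiple, base_w // multiple * multiple)
    base_h = max(multiple, base_h // multiple * multiple)
    return base_w, base_h, target_w / base_w


print(plan_base_size(1024, 1536))  # prints (512, 768, 2.0)
print(plan_base_size(640, 640))    # prints (640, 640, 1.0)
```

So a 1024×1536 target means composing at 512×768 and upscaling 2x through Hires Fix or img2img, instead of rendering the large image directly.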
Negative prompt additions that help somewhat:
extra limbs, extra arms, extra legs, mutated, disfigured,
bad anatomy, malformed limbs
Honestly these help, but controlling resolution is far more effective. If I could give one piece of advice: don't render large images directly.
For multi-person structural problems, OpenPose is genuinely good. Feed in a skeleton diagram, let AI fill in the flesh within those constraints. The only cost is the extra time preparing the skeleton image.
Repetition and Copy-Paste Backgrounds
Sometimes when drawing backgrounds or dense patterns—a flock of birds, a row of buildings, a field of flowers—you get this eerie sameness. The birds look identical, buildings are perfectly symmetrical, flowers look copy-pasted.
This is also worse at higher resolution. Render small, then upscale, and it largely improves. It might also be sampler-related—I feel Euler a does worse on large repetitive elements, DPM++ 2M Karras handles it better. Not sure if that's real or just my imagination.
Another scenario: you write "a group of different people" and AI still gives you clones. Adding words like:
diverse, varied, different faces, different outfits, unique
helps a bit. But honestly, AI's ability to produce "diversity" has always been limited, especially for multiple characters in the same scene. If diversity is critical to your image, be prepared to do some post-fix work.
Colors That Scream
Some models are especially bad at this. Certain anime-leaning or illustration models default to high saturation and contrast. The resulting images are "loud."
The most direct fix: lower CFG. The CFG scale (classifier-free guidance) controls how much the AI "listens" to your prompt. At 7 it tries hard to match everything you said. At 5.5 it loosens up. Interestingly, lowering CFG often makes colors more natural and less extreme. When I dropped from 7 to 5.5, the colors noticeably "pulled back."
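For the curious, the arithmetic behind the CFG scale is simple. At each step the model makes one prediction without the prompt and one with it, and the scale decides how far to push along the difference. A toy sketch: the three-element lists stand in for the model's real predictions, which are large tensors, and the numbers are invented.

```python
def apply_cfg(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.10, -0.20, 0.05]  # toy "no prompt" prediction
cond = [0.30, -0.10, 0.00]    # toy "with prompt" prediction

for scale in (1.0, 5.5, 7.0):
    guided = apply_cfg(uncond, cond, scale)
    print(scale, [round(v, 3) for v in guided])
```

At scale 1.0 you just get the with-prompt prediction back. Higher scales exaggerate whatever the prompt changes, which is my best guess for why a high CFG tends to push colors toward the extremes.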
You can also add color direction to your prompt:
muted colors, natural color palette, soft lighting
Or exclude what you don't want in negatives:
oversaturated, neon, garish, too vibrant
VAE matters too. The model's built-in VAE versus a separately loaded VAE can produce noticeably different color styles. If your colors always feel off, try swapping the VAE—sometimes the problem suddenly has a direction.
And of course, the most professional approach is still post-processing. I don't mean you need to open Photoshop for a major edit, but if an image is great in every way except the color is slightly off, a quick hue/saturation adjustment is way more efficient than re-generating.
Prompts That Don't Work
This might be the most frustrating situation: you clearly wrote "blue hair" and AI gives you red. You stressed "no hat" three times and AI draws a hat anyway.
Before blaming the model, look at how you wrote the prompt. Position matters first. Stable Diffusion-type models weight earlier parts of the prompt more heavily. If you write a long paragraph and put the most important content at the end, its presence is naturally weak.
Second is weighting. SD web UIs commonly support (blue hair:1.3) to boost weight, or ((blue hair)) for quick emphasis. But higher isn't always better: too much weight creates weird artifacts. 1.3 to 1.5 is usually the sweet spot.
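The two syntaxes are easy to mix up, so here's a small sketch that builds both. One assumption to flag: in the AUTOMATIC1111-style web UI, each pair of parentheses multiplies the weight by 1.1, so ((blue hair)) works out to about 1.21. Other front ends may use a different factor. The helper names are made up.

```python
def weighted(term, weight):
    """Format a term with an explicit weight, e.g. (blue hair:1.3)."""
    return f"({term}:{weight:g})"


def nested(term, depth):
    """Wrap a term in `depth` pairs of parentheses, e.g. ((blue hair))."""
    return "(" * depth + term + ")" * depth


def nested_weight(depth, per_level=1.1):
    """Effective weight of nested parentheses, assuming 1.1x per level."""
    return per_level ** depth


print(weighted("blue hair", 1.3))  # prints (blue hair:1.3)
print(nested("blue hair", 2))      # prints ((blue hair))
print(round(nested_weight(2), 2))  # prints 1.21
```

Under that assumption you'd need three or four levels of parentheses to reach the 1.3 to 1.5 range, which is why I prefer the explicit number.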
Third—and I think this is crucial but often overlooked: prompts fight each other. Writing "early morning" and "starry sky" together, "minimalist" and "Baroque decoration" together, "surreal" and "photorealistic" together. The AI tries to satisfy both and produces something that's neither. It's not that AI didn't understand—it heard two contradictory instructions and couldn't fulfill both.
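If you reuse long prompts, a crude check for pairs that fight each other can catch this before you burn a batch. A minimal sketch: the pair list is just the examples above, matching is plain substring search, and find_conflicts is a hypothetical helper you'd extend with your own pairs.

```python
# Pairs of terms that pull the image in opposite directions.
CONFLICTS = [
    ("early morning", "starry sky"),
    ("minimalist", "baroque decoration"),
    ("surreal", "photorealistic"),
]


def find_conflicts(prompt, conflicts=CONFLICTS):
    """Return the conflicting pairs whose terms both appear in the prompt."""
    text = prompt.lower()
    return [pair for pair in conflicts if all(term in text for term in pair)]


print(find_conflicts("a surreal, photorealistic city at night"))
print(find_conflicts("blue hair, soft light"))  # prints []
```

It won't catch subtle clashes, but it's a cheap sanity check for the obvious ones.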
When a concept just won't appear no matter what, check whether your model covers that domain. Some models are strong at clothing design but weak at landscapes. Some excel at anime but can't do European classical painting. If the concept isn't there, switch models or add a specialized LoRA—sometimes that's more productive than banging your head against the prompt.
Style Drift: You Asked for Miyazaki, Got Nothing
Style control is a subtle topic.
I've seen people write prompts like: "Miyazaki style, Van Gogh brushstrokes, Shinkai lighting, Ghibli palette." With that many style words, the AI has no idea what you want. Keep it to two style words at most: one main style plus one modifier.
Position matters too. For emphasis, put the style at the front: "Makoto Shinkai style, a girl standing under cherry blossoms, soft light" probably works better than putting the style at the end.
Artist names are sometimes more precise than style descriptions. "by Greg Rutkowski" hits harder than "digital art style" because the model saw this artist's real work during training and has specific visual patterns to reference. "Digital art style" is too broad: AI doesn't know which direction of digital art to give you.
But for the most reliable style control, it's still LoRA. A well-trained style LoRA beats any text description. I once used one that mimicked a specific artist's style, and the results were on a completely different level from stacking prompt words. The only hurdle is finding a good LoRA. Ones on Civitai that look great in the preview images don't always work well in practice, so you have to test.
Why Everything Is Blurry
Image quality issues come in a few flavors.
Overall blur, not enough detail: usually too few steps or no Hires Fix. Below 20 steps images get visibly rough. I usually run around 30; beyond that the gains diminish a lot. I generally turn on Hires Fix with 1.5x scale and denoising at 0.3-0.4 as a stable starting point. Above 0.5 the image starts drifting away from the original.
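Those rules of thumb are easy to turn into a quick pre-flight check. A sketch only: the thresholds are my personal numbers from the paragraph above, not limits from any tool, and check_quality_settings is a made-up helper.

```python
def check_quality_settings(steps, hires_scale=None, denoising=None):
    """Return a list of warnings based on the rule-of-thumb thresholds above."""
    warnings = []
    if steps < 20:
        warnings.append("steps below 20: expect visibly rough output")
    elif steps > 30:
        warnings.append("steps above 30: gains diminish")
    if hires_scale is None:
        warnings.append("Hires Fix is off: fine detail may be missing")
    elif denoising is not None and denoising > 0.5:
        warnings.append("Hires Fix denoising above 0.5: image may drift")
    return warnings


print(check_quality_settings(steps=30, hires_scale=1.5, denoising=0.35))  # prints []
for warning in check_quality_settings(steps=15):
    print(warning)
```

An empty list means the settings match my usual starting point. It says nothing about whether the model itself is up to the job.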
Another type is an overall "plastic" feel, like a low-poly game render. This has a lot to do with the model's inherent style tendencies; some models just have this plastic quality. Adding quality words like "masterpiece, best quality, ultra detailed, sharp focus" sometimes helps, but don't expect these to compensate for a model's fundamental capability ceiling.
High-res fix + sufficient steps + correct sampler is the iron triangle for good image quality. For sampler I use DPM++ 2M Karras most—stable, decent output. DPM++ SDE Karras supposedly has better detail but is slower. I use it occasionally.
A Few Words on Mindset
After two years my biggest realization is: a lot of AI painting "problems" aren't technical—they're expectation problems.
We come in expecting "AI should paint as well as a human," then find out hands still don't work, complex structures still break, prompts sometimes just don't listen—and conclude AI is terrible. But if you understand it as a tool that's very strong in some areas and has clear limitations in others, a lot of things become less frustrating.
My habit now: every time an image comes out, I quickly scan—hands ok? face ok? extra limbs? colors feel right? Catch problems immediately while the Seed is still "hot" and fix them. Much more efficient than finishing everything and going back to fix one by one.
Also: don't wait for perfect technique before creating. Bad hands can be fixed, broken faces can be rescued, wrong colors can be post-adjusted. An image with great composition and ideas but a small flaw is far more valuable than a technically perfect but soulless image.
Tools change, models evolve—maybe next year half these problems won't be problems anymore. But at least for today, these pitfall summaries still hold up. Hope this helps.