
We Added Real Photos to AI Content Generation — Here's What Changed

A follow-up case study testing how reference photos affect AI-generated Instagram captions. Same coffee shop, same input — 0, 1, 2, and 3 photos. The results were surprising.

2026-04-09 · 8 min read
AI content · case study · small business · Instagram marketing · photo quality

Last time, we tested how input quality and business profile completeness affect AI-generated content. This time, we kept everything the same — and changed only one variable: the number of reference photos the owner uploads.

Same business. Same detailed input. Same AI model. Different number of photos.

Here's what happened.


The Setup

All tests used the same scenario: Bloom & Brew, a specialty coffee shop in East Austin, announcing their new Terroir Series — a single-origin espresso flight launching Saturday, April 12.

The detailed input (333 characters):

"Bloom & Brew — New Terroir Series launch Sat Apr 12 at both Austin locations. 3 single-origin espressos: Ethiopian Yirgacheffe (jasmine, bergamot), Colombian Huila (dark chocolate, caramel), Guatemalan Antigua (cedar, brown sugar). Flight includes palate-cleansing sparkling water. $14/flight. East 6th St 8am–6pm, Rainey St 9am–5pm."

We ran five test cases using Kreplo's Quick Create:

Test case | TC-A | TC-B1 | TC-B2 | TC-B3 | TC-C1
Photos uploaded | 0 | 1 | 2 | 3 | 1
Input quality | Detailed | Detailed | Detailed | Detailed | Vague
Score | 34/35 | 34/35 | 33/35 | 35/35 | 23/35

The scoring rubric was the same 7-criterion, 35-point framework from our previous test, with each criterion scored out of 5: brand voice, product accuracy, local relevance, CTA clarity, hashtag quality, emotional appeal, and platform fit.
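To make the arithmetic concrete: each of the seven criteria is scored out of 5, so a perfect caption totals 7 × 5 = 35. The Python sketch below is a hypothetical helper, not part of Kreplo. It assumes integer scores from 0 to 5 (the post never states the lower bound), and its per-criterion values are copied from the breakdown tables for TC-B1, TC-B2, TC-B3, and TC-C1. Summing them reproduces the reported totals.

```python
# Hypothetical sketch of the 7-criterion, 35-point rubric (not Kreplo code).
# Assumption: each criterion is scored as an integer from 0 to 5.
CRITERIA = (
    "brand_voice",
    "product_accuracy",
    "local_relevance",
    "cta_clarity",
    "hashtag_quality",
    "emotional_appeal",
    "platform_fit",
)
MAX_PER_CRITERION = 5
MAX_TOTAL = MAX_PER_CRITERION * len(CRITERIA)  # 7 * 5 = 35


def total_score(scores):
    """Sum per-criterion scores after checking every criterion is present and in range."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in CRITERIA:
        value = scores[name]
        if not 0 <= value <= MAX_PER_CRITERION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_CRITERION}, got {value}")
    return sum(scores[name] for name in CRITERIA)


# Per-criterion scores as reported in the post's breakdown tables, in CRITERIA order.
RESULTS = {
    "TC-B1": (5, 5, 5, 5, 4, 5, 5),
    "TC-B2": (5, 5, 5, 5, 3, 5, 5),
    "TC-B3": (5, 5, 5, 5, 5, 5, 5),
    "TC-C1": (4, 1, 3, 3, 4, 4, 4),
}

if __name__ == "__main__":
    for case, values in RESULTS.items():
        scores = dict(zip(CRITERIA, values))
        print(f"{case}: {total_score(scores)}/{MAX_TOTAL}")
```

The validation step is a design choice: a missing or out-of-range criterion raises an error instead of silently lowering the total.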


TC-A: No Photos, Detailed Input (Baseline)

No reference photo uploaded — AI generates visuals from text input only.

Score: 34/35

The AI produced a strong caption with the correct product details, location, and brand voice. The only issue: a hashtag with a minor spelling error. Everything else was accurate and Instagram-ready.

This confirmed what we saw in the previous test — a detailed input with a complete business profile is the strongest lever you have. Photos would need to improve on this to justify the extra effort.


TC-B1: One Photo, Detailed Input

Photo used: A single espresso shot on a wooden board — the kind of product photo a coffee shop owner would realistically have on their phone.

Espresso on wooden board — the reference photo used for TC-B1

What the AI generated:

Three origins. Three stories. One flight you won't forget. ☕✨

Something special is landing this Saturday, April 12 — and we can't wait to share it with you. Introducing the Bloom & Brew Terroir Series: a curated single-origin espresso flight that takes your palate on a journey across three of the world's most celebrated coffee regions.

🌸 Ethiopian Yirgacheffe — delicate notes of jasmine and bergamot, like spring in a cup
🍫 Colombian Huila — rich dark chocolate and smooth caramel warmth
🌲 Guatemalan Antigua — earthy cedar with a hint of brown sugar sweetness

Each $14 flight comes with palate-cleansing sparkling water between pours...

Score: 34/35

Criterion | Score | Notes
Brand voice | 5/5 | Warm, story-driven, @bloomandbrewatx correctly tagged
Product accuracy | 5/5 | All three origins, correct flavor notes, $14, both locations + hours
Local relevance | 5/5 | East 6th St, Rainey St, Austin
CTA clarity | 5/5 | Save, tag a friend, comment CTA
Hashtag quality | 4/5 | #TerriorSeries (misspelled "Terroir")
Emotional appeal | 5/5 | "Three origins. Three stories. One flight you won't forget."
Platform fit | 5/5 | Perfect length, emoji cadence, storytelling arc

AI-generated Instagram image for TC-B1 — one reference photo used

Adding one photo didn't improve the score over the no-photo baseline — but it didn't hurt it either. The caption quality was essentially equivalent.


TC-B2: Two Photos, Detailed Input

Photos used: Espresso on wooden board + barista pouring a latte.

Barista pouring latte — second reference photo added for TC-B2

What the AI generated:

Three countries. Three stories. One unforgettable flight. ☕🌍

This Saturday, April 12, we're launching our brand-new Terroir Series — and your taste buds are invited.

🌸 Ethiopian Yirgacheffe — floral jasmine & bright bergamot
🍫 Colombian Huila — rich dark chocolate & smooth caramel
🌲 Guatemalan Antigua — warm cedar & sweet brown sugar

Each $14 flight comes with palate-cleansing sparkling water between pours...

Score: 33/35

Criterion | Score | Notes
Brand voice | 5/5 | "Passport stamp in a cup": creative metaphor, strong voice
Product accuracy | 5/5 | All details correct
Local relevance | 5/5 | Both locations and hours
CTA clarity | 5/5 | Save, tag, comment
Hashtag quality | 3/5 | #TerribleSeries appeared alongside #TerroirSeries (a notable hallucination)
Emotional appeal | 5/5 | Strong hook and sensory language
Platform fit | 5/5 | Good structure

AI-generated Instagram image for TC-B2 — two reference photos used

Two photos produced the lowest score of the detailed-input group. The copy itself was excellent — but the AI simultaneously generated both #TerribleSeries and #TerroirSeries in the hashtag block. The creative writing improved; the hashtags got worse.

This is a meaningful finding: more inputs don't always produce more accurate outputs in every dimension.
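Both hashtag slips in these runs (#TerriorSeries in TC-B1, #TerribleSeries in TC-B2) are near-misses of the brand's real tag, which makes them easy to catch before posting. Below is a minimal Python sketch of such a check; it is not a Kreplo feature. It assumes you keep your own list of branded hashtags, and the `suspicious_hashtags` helper, the example tag list, and the 0.75 similarity threshold are all illustrative assumptions. It uses the standard library's `difflib.SequenceMatcher` to flag any tag that is similar to, but not exactly, a branded one.

```python
import re
from difflib import SequenceMatcher

# Hypothetical: the brand's own hashtags, spelled exactly as they should appear.
BRAND_HASHTAGS = ("TerroirSeries", "BloomAndBrew")

# Illustrative cut-off: a tag at least this similar to a brand tag, but not identical, is flagged.
SIMILARITY_THRESHOLD = 0.75

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def suspicious_hashtags(caption, brand_tags=BRAND_HASHTAGS, threshold=SIMILARITY_THRESHOLD):
    """Return (found_tag, expected_tag) pairs for near-miss spellings of brand hashtags."""
    expected_by_key = {tag.lower(): tag for tag in brand_tags}
    flagged = []
    for tag in HASHTAG_PATTERN.findall(caption):
        key = tag.lower()
        if key in expected_by_key:
            continue  # exact match (ignoring case): spelled correctly
        for expected_key, expected in expected_by_key.items():
            # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
            if SequenceMatcher(None, key, expected_key).ratio() >= threshold:
                flagged.append((tag, expected))
                break  # report each suspicious tag once
    return flagged


if __name__ == "__main__":
    caption = "#SpecialtyCoffee #TerroirSeries #TerribleSeries #BloomAndBrew"
    print(suspicious_hashtags(caption))  # [('TerribleSeries', 'TerroirSeries')]
```

Exact matches are checked first so that a correctly spelled tag is never flagged just because it resembles a different brand tag; the threshold is a guess and would need tuning against your own hashtags.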


TC-B3: Three Photos, Detailed Input

Photos used: Espresso on wooden board + barista with latte + roasted coffee beans. Together, these three photos represent the full product story: the origin (beans), the craft (barista), and the final product (espresso).

Roasted coffee beans — third photo added for TC-B3, completing the product story

What the AI generated:

Three continents. One flight. Infinite curiosity. ☕🌍

This Saturday, April 12, we're launching something we've been crafting for months — the Bloom & Brew Terroir Series. Two locations. One unforgettable tasting experience.

🌸 Ethiopian Yirgacheffe — delicate jasmine and bright bergamot
🍫 Colombian Huila — rich dark chocolate with smooth caramel
🌲 Guatemalan Antigua — warm cedar and sweet brown sugar

Each espresso is single-origin, pulled to honor the land and farmers behind every bean. The flight comes with palate-cleansing sparkling water so you can truly taste the difference between each origin — no flavor blending, just pure terroir.

💰 $14/flight — a small price for a world of flavor.

📍 East 6th St — Saturday 8am–6pm
📍 Rainey St — Saturday 9am–5pm

#SpecialtyCoffee #CoffeeLovers #AustinFoodie #CoffeeFlight #SingleOriginCoffee #AustinCoffee #TerroirSeries #EspressoFlight #BloomAndBrew #CoffeeOrigins #ATXEats #SpringCoffee

Score: 35/35

Criterion | Score | Notes
Brand voice | 5/5 | "Honor the land and farmers", "pure terroir": elevated copy
Product accuracy | 5/5 | All details correct, price and locations explicit
Local relevance | 5/5 | Both locations, hours, #AustinFoodie, #ATXEats
CTA clarity | 5/5 | "Drop a ☕ below if you're coming", "tag someone"
Hashtag quality | 5/5 | #TerroirSeries correctly spelled, all 12 relevant
Emotional appeal | 5/5 | "Three continents. One flight. Infinite curiosity." (best hook of all five tests)
Platform fit | 5/5 | Ideal length, clear visual hierarchy

AI-generated Instagram image for TC-B3 — three reference photos used

The three-photo test produced the highest-quality output of the entire session. The creative language was richer ("a small price for a world of flavor"), the hashtags were clean, and the hook — "Three continents. One flight. Infinite curiosity." — was the strongest of any run.


TC-C1: One Photo + Vague Input

Finally, we repeated the vague input test from our previous study — but this time with one reference photo attached.

Input (17 characters): "new coffee launch"

Photo: The same espresso on wooden board.

Same espresso photo — now paired with only a vague 3-word input

What the AI generated:

Your mornings just got a whole lot more beautiful. 🌸☕

Introducing Rise & Bloom — the newest addition to the Bloom & Brew family...

Crafted for those who believe a great morning is more than just caffeine, Rise & Bloom is a wellness-forward brew with delicate floral notes, a smooth earthy finish...

Score: 23/35

Criterion | Score | Notes
Brand voice | 4/5 | "Bloom & Brew" correctly named (photo helped); "Rise & Bloom" sub-brand invented
Product accuracy | 1/5 | Fabricated product ("wellness-forward brew, floral notes, earthy finish"); nothing like this was described
Local relevance | 3/5 | Austin, East 6th, Rainey Street, but no hours or date
CTA clarity | 3/5 | "Visit the link in bio": generic, no specific action
Hashtag quality | 4/5 | All correctly spelled, reasonable choices
Emotional appeal | 4/5 | Strong hook, spring theme, but describes a fictional product
Platform fit | 4/5 | Good length and format

AI-generated Instagram image for TC-C1 — vague input with one photo

The photo may have helped in one specific way: the AI correctly named the business Bloom & Brew rather than defaulting to a generic name, although we can't tell whether the name came from the image context or from the business profile. Without the photo, in our previous test, the AI framed the product slightly more accurately, but it still invented a product.

The core problem with vague input did not change: the AI invented a product called "Rise & Bloom" with fabricated tasting notes. One photo is not enough to compensate for the absence of actual product information.


What the Results Tell Us

The photo sweet spot appears to be 3

Photos | Score | Quality driver
0 | 34/35 | Strong input alone
1 | 34/35 | Equivalent to no photo
2 | 33/35 | Hashtag hallucination emerged
3 | 35/35 | Best hook, cleanest hashtags

Counterintuitively, two photos performed worse than one or three. One plausible explanation: with two photos, the AI has more visual "surface area" to work from but not enough context to fully anchor the content, which leads to some ambiguity in brand-related details like hashtags. With three photos that cohesively represent the product (the espresso, the craft process, the raw ingredients), the AI appears to develop a more complete picture of the product story.

This isn't a hard rule — it may vary across product types and photo choices. But based on this test, three cohesive, product-focused photos produced the best results.

Photos help with brand recall in vague-input scenarios

The one concrete benefit of the photo in the TC-C1 test was that the AI correctly named "Bloom & Brew" in the caption, suggesting the photo contributed some brand context. This is a narrow but real benefit if you're the kind of owner who writes short inputs.

But it's not enough. The AI still hallucinated an entire product. Photos cannot substitute for product information in the input.

The vague input problem is photo-resistant

In our previous test, vague input with a rich profile scored 24/35. In this test, vague input with one photo scored 23/35. That one-point gap is within the margin of error: photos make no meaningful difference when the input lacks product facts.

The AI cannot read your menu from a photo of an espresso cup. It needs words.


The Photo Checklist

When uploading photos to Quick Create, these factors make a difference:

What helps:

  • Photos that show the specific product you're announcing (not generic coffee shop shots)
  • 3 cohesive photos from the same product launch (origin beans, preparation, final drink)
  • Real photos over stock imagery — customers recognize the difference

What doesn't matter as much as you'd think:

  • Adding photos to compensate for a vague input — the text still has to describe the product
  • Photo resolution — any recent smartphone photo is sufficient
  • Photo count beyond 3: likely diminishing returns once the product story is visually complete (we didn't test more than 3)

The Combined Picture

Across both studies, the pattern is consistent:

Input quality → product accuracy
Business profile → brand voice
Photos → marginal creative lift (at 3+ cohesive shots)

You get a 34/35 caption from detailed input alone. You get a 35/35 caption from detailed input plus three cohesive photos. You get a 23–26/35 caption if either the input is vague or the profile is minimal — and photos don't meaningfully change that range.

The practical recommendation: start with your input and profile. Add photos when you have them — especially three shots that tell a coherent product story. But never let photo gathering delay your post.


Kreplo is an AI content platform built for small business owners. It generates Instagram captions, carousels, blog posts, and short-form videos — using your brand profile, your products, and your photos.
