Automating Blog Cover Image Generation: Leverage Playwright and GPT-4 with Midjourney for Programmatic SEO
Discover how we automated blog post cover image generation for GiftIdeasAI.xyz using GPT-4 and Playwright. Learn from our process and see the top picks in our upcoming image gallery.
Felix VemmerJuly 30, 2023
My girlfriend and I took on this cool side project, GiftIdeasAI.xyz. We put our heads together and came up with a custom-built gift recommendation engine, all juiced up with OpenAI's GPT. It's a pretty neat setup: users throw in some filters like age, occasion, budget, and bam! They get 10 unique gift ideas. We've even got most of these linked up to Amazon with affiliate codes, so we can make a bit of cash on the side.
Wanna see how it works? Here's a demo of our Gift Recommendation Engine:
We were all starry-eyed at launch, thinking users would just start flowing in. But, as you fellow indiehackers know, it doesn't really roll that way. So, we started thinking, why not try to reel in the crowd with some long-tail keywords related to gift ideas? This way, we get to drive some traffic to our blog and then direct folks to our free gift idea engine. Not a bad plan, right?
Challenge 1: Google Will Not Index Our Pages
Spent a good chunk of time crafting a neat data schema, and even had GPT-3 churn out some snazzy titles, subtitles, and descriptions for the gift ideas. Thought we'd nailed it, but Google had other plans: just wouldn't index those pages. Let's dig into this a bit more:
To tell you the truth, I was pretty happy with the content, the headings, and the intros we put together. But, even then, our blog posts looked a bit thin - like a party without the balloons. What were we missing? A good, attention-grabbing blog post cover image.
Challenge 2: Scoring Stock Images for Blog Post Covers
So, my first shot at this was to use the Unsplash API, thinking I could easily snag some good-looking cover images for our blog posts. Turns out, finding the perfect picture for each specific keyword was like finding a needle in a haystack. For instance, try finding the perfect image for Gift ideas 9yr old girl. Not as straightforward as I thought.
Then I thought, why not try Stable diffusion? But yet again, the results left a lot to be desired. I just wasn't hitting the mark. I'd heard a lot of buzz around Midjourney, so I figured, why not give it a shot?
Well, it turns out that even with Midjourney, my initial prompts were a bit off-target and didn't quite give me the results I was hoping for:
Housewarming gift ideas, new homeowners, vibrant, inviting, decorated living room, warm and cozy, coffee table, potted plant, scented candles, personalized welcome mat, elegant coasters, stylish decorative vase, soft natural lighting, modern kitchen, comfortable seating area, thoughtful gestures, meaningful gifts
After doing some research though, I finally was able to use GPT 3.5 to start generating some good prompts based on a given blog post title.
Blog Post Title: 10 Unique Gift Ideas for a 9-Year-Old Girl
A photograph of a beaming 9 - year - old girl unwrapping a gift, surrounded by her friends at a
colorful and lively birthday party, captured during midday with natural lighting streaming in
through a large window. The picture is shot with a Canon EOS Rebel T7i camera, using a 35mm f/ 1.
8 lens, highlighting the excitement and joy of the moment.
Here's the final result of the picture. Not too shabby, right?
By the way, if you're interested, I'll soon be offering the stock images prompt generator as a free tool
on BacklinkGPT.com. Additionally, I also recommend Yuyu's opengraph.xyz
tool for checking how your blog post will look on social media.
Challenge 3: Automating the Midjourney Blog Post Cover Image Generation
Here's a curveball: Midjourney doesn't yet have an API for easy image generation. Let's look into this:
On top of that, instead of offering a straightforward interface, you end up sending a bunch of messages
to a Discord bot. It's not exactly the most automation-friendly setup:
You start with the /imagine [prompt] command
You need then to upscale one of the provided versions
Finally, you need to save the image from an url
Being a programmer who values efficiency (okay, maybe a little lazy), I hit up Google to find any existing solutions. I found a few overpriced unofficial API services and a Medium article that leaned heavily on PyAutoGUI. But since I wanted to keep working while the script was running, I started looking into web browser automation.
When I couldn't find exactly what I needed, I did what any self-respecting coder does: I decided to write the script myself, with a little assist from GPT-4.
Automating Midjourney Image Generation
So here's the game plan for the automation at a high level, along with a peek at the corresponding code block:
Log in to discord using my email, password and two-factor auth_code
Get the slug and blog_post_cover_prompt for all my blog posts from the database where there is no blog_post_cover_image_url set yet.
Post the prompt with the slash command
Wait for the 4 image variations to be generated and upscale the first version
Wait for the upscale version to be available and retrieve the image_url
Download the image to my local file system
Upload the image to an S3 Bucket
Update the row blog_post_cover_image_url with the newly set S3 Url
A little known secret to automation is to actually write as little code as possible by leveraging GPT-4 and the playwright codegen utility.
I started of generating 70% of the code by running the following simple command:
This process launches a window where you can start interacting with the Discord server - clicking, typing, you name it. The great thing is, as you're doing this, the code is being automatically generated and recorded for you. All you have to do is copy it and paste it into your script. Here's how it looks:
So, with the code generated by Playwright and a little help from GPT to refactor it, creating the final login_to_discord function becomes a breeze. Let's take a look:
And voilà! Here's what our final function looks like:
Next, I followed the same process for the remaining parts. There was a bit of manual work involved - mainly finding the last Discord messages and figuring out the waiting logic to decide when to proceed.
And we repeat the same process again for retrieving the image link , giving minimal feedback:
GPT Engineering Wait Functions for Image Link Retrieval
With a little bit of debugging, the final script was ready to roll. I've shared it in this GitHub Gist for you to take, tweak, and tailor to your needs.
To wrap things up, we embarked on a journey to automate blog post cover image generation. We navigated through challenges with Google's indexing, experimented with different solutions for sourcing images, and eventually automated the process using a blend of Playwright and GPT-4. The result? An efficient way to generate engaging, relevant cover images for our blog posts.