Every week Sembit holds an internal all-hands company meeting. We discuss the latest technology, share business updates, or walk through current projects. A few weeks ago, I was on deck to demo a new product from Anthropic called Claude Code. The demo wasn't scripted. I just asked the Sembit team for some ideas of what to do, and we went for it. After about a minute of brainstorming, we came up with this as our initial prompt:
SIMPLE PROMPT
Create a basic Settlers of Catan clone. Make it a single html file with embedded javascript.
Using the Sonnet 3.7 model, it came up with this game. It was pretty amazing how far it had gotten, but we thought we could do better. There were various bugs: some buttons didn't work at all, the robber mechanic spit out an annoying error message, we couldn't figure out how to place roads, and so on. We were only maybe 4 minutes into the call, so I asked the team for better prompts and we redid our experiment, telling Claude Code more exactly what we wanted:
COMPLEX PROMPT
Create a simplified version of Settlers of Catan for the web. The game will be played locally, with four players passing turns at the same web browser back and forth. It should be a single html file, with embedded Javascript. Make it self-contained, without any kind of backend. Use SVG since that will allow for better rendering of the hex grid. For the photos of resources, use permissively-licensed images you find online from a site like pixabay.com. Try to find images that correctly represent the resource types of each hex. Lay the entire Catan board out, tile by tile, with random resource types. Make sure you include some basic dice-rolling mechanics, complete with animation. Don't worry about any advanced mechanics like the robber, etc - just focus on getting a basic implementation working.
Here are the results. Definitely better, but not MUCH better than the basic prompt. I also ran it several times in a row, and more than half the time the more complex prompt produced WORSE results than the simple prompt. In general, we found that the upper bound of what it COULD produce went up when we provided more guidance, but on average the simpler prompt returned more usable results.
We had an enjoyable time trying out various features on the call (reach out if you want to learn more about Claude Code and agentic coding), but after the call I decided to experiment further, extending the test to the other frontier models available in April of 2025.
4 Models, 2 Prompts: Which One Do You Like Best?
Click each Catan game below to try it out yourself:
When trying out various prompts, remember that prompt engineering has a distribution of outcomes. For example, our more complex prompt could produce a more correct result, but we had to run it a few times to get that result. It was just as likely to produce WORSE results than the simpler prompt. When building prototypes, or amusing your coworkers on a call, be ready to run the same prompt multiple times and pick the best one for your purpose.
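If you want to automate that run-it-several-times workflow, a best-of-N loop is a minimal sketch of the idea. Here `run_prompt` is a hypothetical stand-in for a real model call (e.g. via the Anthropic API), faked with a seeded RNG, and `score` is a placeholder for whatever judgment you'd apply — in our case, a human eyeballing each Catan board:

```python
import random

def run_prompt(prompt: str, seed: int) -> str:
    """Stand-in for an actual (nondeterministic) model call.
    A seeded RNG fakes the run-to-run variation we saw on the call."""
    rng = random.Random((prompt, seed))
    quality = rng.random()
    return f"result with quality {quality:.2f}"

def score(result: str) -> float:
    """Placeholder scorer. In practice, a human picks the best run;
    here we just parse the fake quality number back out."""
    return float(result.rsplit(" ", 1)[-1])

def best_of_n(prompt: str, n: int = 5) -> str:
    """Run the same prompt n times and keep the highest-scoring result."""
    runs = [run_prompt(prompt, seed=i) for i in range(n)]
    return max(runs, key=score)

print(best_of_n("Create a basic Settlers of Catan clone.", n=5))
```

Nothing here is specific to Claude Code; the point is just that sampling the same prompt several times and keeping the winner reliably beats a single draw.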