
Who Answers Best? Pitting AI against AI in a Prompt Challenge

Updated: Aug 25, 2023

Finding the perfect conversational AI for your needs can be like navigating a maze. Chatbots powered by large language models like GPT-3 claim to deliver human-level capabilities. But which chatbot actually delivers the most valuable responses for your specific use case?


Instead of getting lost in the hype, you can run simple tests that separate the best bots from the rest. Imagine if Olympic athletes were judged solely on their marketing campaigns instead of their actual performance. The marketing around chatbots hides stark differences in their capabilities.


The key is creating an apples-to-apples comparison by giving all chatbots the same prompt and evaluating their responses against objective metrics. My favorite method is to have each chatbot generate a consensus response by combining the responses from all the chatbots. Then, have it rate how close each original response is to that consensus on a 1-10 scale. This scorecard approach reveals the bot that comes closest to capturing the nuances of the full picture across all responses.
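As a rough sketch of that second step, the follow-up prompt can be assembled from the collected answers. The function name `build_consensus_prompt` and the exact wording of the instructions are illustrative assumptions, not the prompt used for this post; the two sample answers are copied from the results further down.

```python
def build_consensus_prompt(question: str, responses: dict[str, str]) -> str:
    """Assemble one follow-up prompt asking a chatbot to merge and score answers.

    Hypothetical helper: the instruction wording is an assumption.
    """
    lines = [
        f"Several chatbots were asked: {question!r}",
        "Their answers were:",
    ]
    for name, answer in responses.items():
        lines.append(f"- {name}: {answer}")
    lines.append(
        "Combine these into one consensus answer, then rate how close each "
        "original answer is to the consensus on a 1-10 scale."
    )
    return "\n".join(lines)


question = "when is the best time to post on linkedin facebook and twitter"
responses = {
    "Bard": (
        "LinkedIn: Tuesdays, Wednesdays, and Thursdays between 8 am-5 pm. "
        "Facebook: 10 am-2 pm on Tuesdays and Thursdays. "
        "Twitter: 8 am-12 pm on Mondays and Thursdays."
    ),
    "Bing": (
        "LinkedIn: Weekdays between 9 AM - 5 PM "
        "Facebook: Weekdays between 10 AM - 2 PM "
        "Twitter: Weekdays between 8 AM - 12 PM"
    ),
}

follow_up = build_consensus_prompt(question, responses)
print(follow_up)
# Worth checking before pasting: some chatbots cap the input length.
print(len(follow_up), "characters")
```

Pasting the same follow-up into every chatbot keeps the comparison apples-to-apples, and the character count shows in advance whether a bot with a short input limit can take it.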


For example, I gave the top five chatbots the same prompt:


when is the best time to post on linkedin facebook and twitter


ChatGPT:

  • LinkedIn: Mondays and Fridays between 10am to 1pm

  • Facebook: Weekday afternoons between 2pm to 4pm, and weekends between 10am to 2pm

  • Twitter: Early mornings between 7am to 9am, lunchtime between 12pm to 2pm, and weekdays between 5pm to 7pm

Here is the ranking of chatbots based on their proximity to the consensus response: ChatGPT - 9/10, Claude - 7/10, Bard - 6/10, Bing - 5/10, Pi - 4/10

Claude:

  • LinkedIn: Mornings and early afternoons on weekdays, typically between 9am to 5pm.

  • Facebook: Late mornings and early afternoons on weekdays, generally between 10am to 3pm.

  • Twitter: Early mornings between 7am to 10am and weekday afternoons.

Scoring: ChatGPT: 8/10, Claude: 10/10, Bard: 9/10, Bing: 7/10, Pi: 8/10

Bard:

  • LinkedIn: Tuesdays, Wednesdays, and Thursdays between 8 am–5 pm.

  • Facebook: 10 am–2 pm on Tuesdays and Thursdays.

  • Twitter: 8 am–12 pm on Mondays and Thursdays.

Closeness to consensus:

ChatGPT: 9/10

Claude: 8/10

Bard: 10/10

Bing: 8/10

Pi: 7/10

Bing:

  • LinkedIn: Weekdays between 9 AM - 5 PM

  • Facebook: Weekdays between 10 AM - 2 PM

  • Twitter: Weekdays between 8 AM - 12 PM

The chatbots' scores are: Claude: 10, Bard: 9, ChatGPT: 8, Bing: 7, Pi: 6

Pi: Pi could not accept more than 1,000 characters and thus couldn't create a consensus response. This is its original response:


Well, there are a few different factors to consider when deciding on the best time to post on social media. It depends on the platform, the audience you're targeting, and the type of content you're sharing. For example, LinkedIn is a platform that's used primarily for professional networking, so the best time to post there is during work hours, when people are most likely to be checking their feeds. Facebook, on the other hand, has a more general audience, so the best time to post may vary depending on the specific audience you're targeting.


ChatGPT and Claude seem to be the winners for this type of prompt, and Pi, a more conversational chatbot, comes in last, as it didn't even offer specific times.
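One way to tally the four scorecards above is a simple unweighted average per chatbot. This is a minimal sketch: the averaging rule is an assumption (the post does not say how the scorecards were combined), and each scorecard is attributed to the chatbot whose answer it follows in the listing.

```python
from statistics import mean

# Scores copied from the four scorecards in this post. Each outer key is the
# chatbot assumed to have produced that scorecard (the one whose answer it
# follows above). Pi produced no scorecard.
scorecards = {
    "ChatGPT": {"ChatGPT": 9, "Claude": 7, "Bard": 6, "Bing": 5, "Pi": 4},
    "Claude": {"ChatGPT": 8, "Claude": 10, "Bard": 9, "Bing": 7, "Pi": 8},
    "Bard": {"ChatGPT": 9, "Claude": 8, "Bard": 10, "Bing": 8, "Pi": 7},
    "Bing": {"ChatGPT": 8, "Claude": 10, "Bard": 9, "Bing": 7, "Pi": 6},
}


def average_scores(cards: dict[str, dict[str, int]]) -> dict[str, float]:
    """Return the unweighted mean score each chatbot received across all scorecards."""
    bots = next(iter(cards.values())).keys()
    return {bot: mean(card[bot] for card in cards.values()) for bot in bots}


averages = average_scores(scorecards)
# Print the chatbots from highest to lowest average score.
for bot, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{bot}: {avg:.2f}")
```

On these numbers Claude averages 8.75, ChatGPT and Bard both average 8.5, Bing 6.75, and Pi 6.25. A different rule, such as dropping each chatbot's score for itself, could shift the order near the top.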


So next time you're evaluating conversational AI for your business needs, don't just judge a chatbot by its cover story. Put them to the test and see which one actually delivers the goods! With simple comparisons and clear scoring metrics tailored to your purpose, you can uncover the most valuable AI to achieve your goals.
