The "Idea Tournament", or "Pairwise Evaluation", is an efficient, gamified method for identifying the best ideas in a large pool: it runs many tournaments between pairs of ideas and asks the reviewer each time, "Which idea is better?"



The algorithm ensures fair play by making sure that all ideas get the same exposure (i.e. participate in the same number of tournaments, plus or minus 1), regardless of how many ideas there are or how many reviewers participate in the evaluation.
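
One way such balanced exposure could be achieved is sketched below. This is an illustration only, not the product's actual algorithm: the function name next_pair, the exposure dictionary (idea id to number of tournaments played so far) and the seen_pairs set (pairs the current reviewer has already rated) are all assumptions made for the example.

```python
import itertools

def next_pair(exposure, seen_pairs=frozenset()):
    """Hypothetical selector: return the next pair of ideas to show a reviewer.

    exposure   -- dict mapping idea id -> tournaments played so far (all reviewers)
    seen_pairs -- set of frozenset({a, b}) pairs this reviewer has already rated
    """
    # Least-exposed ideas come first; ties are broken by idea id.
    ranked = sorted(exposure, key=lambda idea: (exposure[idea], idea))
    # combinations() walks pairs in ranked order, so the first unseen pair
    # always involves the least-exposed ideas still available.
    for a, b in itertools.combinations(ranked, 2):
        if frozenset((a, b)) not in seen_pairs:
            return a, b
    return None  # this reviewer has already rated every possible pair
```

When each tournament goes to the least-exposed ideas like this, the exposure counts stay close to each other, which is the "plus or minus 1" behaviour described above. A reviewer who has already rated many pairs can force a less balanced pick, so a real implementation may need extra rules.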


At any point, the campaign manager can see the overall results: which ideas are the best so far and what their balance is, i.e. their number of wins and losses.

The "overall score" is calculated with the Wilson algorithm (see more details below) and matters mainly when sorting the ideas.



Number of tournaments 

The number of tournaments is bounded by the number of potential pairs each reviewer can rate, which is always (#ideas) × (#ideas – 1) / 2.

So with 50 ideas, there are 50 × 49 / 2 = 1225 potential pairs (tournaments) per reviewer, but any reviewer can stop at any time.
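
The pair count can be checked with a few lines of Python. The helper name potential_pairs is made up for this illustration.

```python
def potential_pairs(num_ideas: int) -> int:
    """Number of distinct pairs that can be formed from num_ideas ideas."""
    # Each idea can meet every other idea; dividing by 2 avoids counting
    # "A vs B" and "B vs A" as two different pairs.
    return num_ideas * (num_ideas - 1) // 2

print(potential_pairs(50))  # 1225
print(potential_pairs(10))  # 45
```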


This method has traditionally worked well when there are many ideas, and it works even better when there are many ideas AND many reviewers.

Think about it, using the 50 ideas from the example above:

  • With 12 reviewers – if they each cover 25 pairs (which is not a small number), together they complete 25 × 12 = 300 tournaments
    • Every idea would still be compared about 12 times (300 tournaments × 2 ideas each / 50 ideas), which is nice
    • But compared to the effort of simply reviewing 25 ideas, it is not clear that this is much easier

  • But if you had 50 reviewers
    • Even if each of them covered only 10 pairs, which is really easy, you would already have 500 tournaments, which is really good coverage (every idea gets compared about 20 times)
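
The two scenarios above can be reproduced with a short calculation. The sketch assumes the 50 ideas from the earlier example and perfectly balanced exposure. The function name average_exposure is hypothetical.

```python
def average_exposure(num_ideas, num_reviewers, pairs_per_reviewer):
    """Return (total tournaments, average tournaments per idea)."""
    tournaments = num_reviewers * pairs_per_reviewer
    # Every tournament involves two ideas, hence the factor of 2.
    per_idea = 2 * tournaments / num_ideas
    return tournaments, per_idea

print(average_exposure(50, 12, 25))  # (300, 12.0)
print(average_exposure(50, 50, 10))  # (500, 20.0)
```

More reviewers doing fewer pairs each gives better coverage per idea for less individual effort.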

Additional information re the Wilson algorithm

When users vote on or rank ideas, we often want a system that not only takes into account the proportion of positive votes but also considers the uncertainty associated with a small number of votes. That's where the Wilson score interval comes in handy. 
The Wilson score interval provides a nuanced and statistically sound method for ranking ideas based on user votes. By accounting for both the proportion of positive votes and the uncertainty associated with the total number of votes, it ensures that ideas are ranked in a manner that reflects both their popularity and the reliability of that popularity measure.


Imagine two ideas:

  1. Idea A: Has received 200 votes with 180 upvotes.
  2. Idea B: Has received 10 votes with 9 upvotes.

Simply using the proportion of upvotes, the two ideas would tie, since both have a 90% positive rate. But intuitively, Idea A should probably rank higher since more people have voted on it, offering a larger and more reliable sample size.


The Wilson score interval provides a solution by giving us a range (confidence interval) in which the true proportion of positive votes for an idea likely falls. For ranking purposes, we often use the lower bound of this interval, as it offers a conservative estimate that accounts for the uncertainty associated with fewer votes.


The lower bound of the Wilson score interval is computed from two numbers:

  1. pos: Number of positive votes (e.g., upvotes).
  2. neg: Number of negative votes (e.g., downvotes).

We want to determine a score that accounts for both the proportion of positive votes and the reliability of that proportion, based on the number of votes.

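The Solution:

With n = pos + neg (the total number of votes) and p = pos / n (the observed proportion of positive votes), the lower bound is:

\[
\text{score} = \frac{p + \dfrac{z^2}{2n} - z\sqrt{\dfrac{p(1-p)}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}
\]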

z is the z-value from the standard normal distribution, which corresponds to the desired confidence level. For a 95% confidence level, z is typically 1.96.
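
A minimal Python sketch of the standard Wilson lower bound, applied to Idea A and Idea B from the example above. The function name wilson_lower_bound and the choice to return 0 for an idea with no votes are assumptions made for this illustration. In a tournament, wins and losses would presumably play the roles of pos and neg.

```python
import math

def wilson_lower_bound(pos: int, neg: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for pos positive and neg negative votes."""
    n = pos + neg
    if n == 0:
        return 0.0  # assumption: an idea with no votes gets the lowest score
    p = pos / n  # observed proportion of positive votes
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

ideas = {"Idea A": (180, 20), "Idea B": (9, 1)}

# Sort by score, best first.
ranking = sorted(ideas, key=lambda name: wilson_lower_bound(*ideas[name]), reverse=True)
for name in ranking:
    print(name, round(wilson_lower_bound(*ideas[name]), 2))
```

Both ideas have a 90% positive rate, but Idea A scores about 0.85 and Idea B about 0.60, so Idea A ranks first because its 200 votes make the estimate far more reliable.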