What to do when there are multiple valid answers to an ARC-AGI test?
Consider Puzzle ID: 0d87d2a6
https://arcprize.org/play?task=0d87d2a6
Here are the three examples:
Before looking at the test, the rules appear to be simple:
Draw horizontal and vertical blue lines to connect blue pixels on the outside edges
If any red objects are intersected by a blue line, recolor them to be blue
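As a rough illustration, the two rules above can be sketched in Python. This is only one reading of the examples, not an official solution. It assumes the usual ARC color indices (0 = black, 1 = blue, 2 = red), that a line is drawn only between matching blue pixels on opposite edges, and that a "red object" is a 4-connected group of red cells. The function names are made up for this sketch.

```python
BLACK, BLUE, RED = 0, 1, 2  # assumed ARC color indices

def line_cells(grid):
    """Cells covered by lines joining blue pixels on opposite edges."""
    h, w = len(grid), len(grid[0])
    cells = set()
    for c in range(w):  # top/bottom pair in the same column
        if grid[0][c] == BLUE and grid[h - 1][c] == BLUE:
            cells.update((r, c) for r in range(h))
    for r in range(h):  # left/right pair in the same row
        if grid[r][0] == BLUE and grid[r][w - 1] == BLUE:
            cells.update((r, c) for c in range(w))
    return cells

def red_objects(grid):
    """4-connected groups of red cells (connectivity is an assumption)."""
    h, w = len(grid), len(grid[0])
    seen, objects = set(), []
    for r in range(h):
        for c in range(w):
            if grid[r][c] != RED or (r, c) in seen:
                continue
            stack, obj = [(r, c)], set()
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                obj.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and grid[ny][nx] == RED and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            objects.append(obj)
    return objects

def solve(grid):
    lines = line_cells(grid)
    out = [row[:] for row in grid]
    for obj in red_objects(grid):
        if obj & lines:  # the line crosses the object
            for r, c in obj:
                out[r][c] = BLUE
    for r, c in lines:
        out[r][c] = BLUE
    return out

example = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 2, 2, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 1, 0, 0],
]
result = solve(example)
```

In this toy grid (not taken from the puzzle), the vertical line in column 2 crosses the two-cell red object, which turns blue, while the lone red cell at the right edge is left alone.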
But then we look at the test:
There are two situations in the test image that were not present in the example images and there are multiple ways they could be dealt with.
Blue Lines:
In the example images, there is only ever one blue pixel on each edge. In the test image, there are two blue pixels on each of two edges. We need to guess how general the rule is. There are two valid options:
Blue Line Specific: Use a horizontal blue line to connect any blue pixels on the right/left edges in the same row. Use a vertical blue line to connect any blue pixels on the top/bottom edges in the same column.
Blue Line General: Use a horizontal or vertical blue line to connect any blue pixels that are on an outside edge in the same row or column. Note that even more general rules exist, but they won’t make a difference with this test image, e.g. "Use a horizontal or vertical blue line to connect any pair of blue pixels in the same row/column."
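To make the difference between the two blue line rules concrete, here is a small Python sketch. It assumes blue is color index 1 and reads "General" as: any two blue edge pixels that share a row or column get connected, including two pixels on the same edge. The helper names are hypothetical.

```python
from itertools import combinations

BLUE = 1  # assumed ARC color index

def edge_blues(grid):
    """Blue pixels lying on any outside edge of the grid."""
    h, w = len(grid), len(grid[0])
    return [(r, c) for r in range(h) for c in range(w)
            if grid[r][c] == BLUE and (r in (0, h - 1) or c in (0, w - 1))]

def segment(a, b):
    """Cells on the straight line from a to b, or empty if not aligned."""
    (r1, c1), (r2, c2) = a, b
    if r1 == r2:
        return {(r1, c) for c in range(min(c1, c2), max(c1, c2) + 1)}
    if c1 == c2:
        return {(r, c1) for r in range(min(r1, r2), max(r1, r2) + 1)}
    return set()

def lines_specific(grid):
    """Connect only pairs sitting on opposite edges (left/right, top/bottom)."""
    h, w = len(grid), len(grid[0])
    cells = set()
    for a, b in combinations(edge_blues(grid), 2):
        (r1, c1), (r2, c2) = a, b
        left_right = r1 == r2 and {c1, c2} == {0, w - 1}
        top_bottom = c1 == c2 and {r1, r2} == {0, h - 1}
        if left_right or top_bottom:
            cells |= segment(a, b)
    return cells

def lines_general(grid):
    """Connect every pair of edge pixels sharing a row or column."""
    cells = set()
    for a, b in combinations(edge_blues(grid), 2):
        cells |= segment(a, b)
    return cells

toy = [
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
extra = lines_general(toy) - lines_specific(toy)
```

On this toy grid (not the real test image), both rules draw the vertical line in column 1, but only the general rule also joins the two blue pixels along the top edge.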
Red Object Recoloring:
In the example images, the blue lines either cross a red object (there is overlap) or remain clear of it by at least one row/column of black pixels. In the test image, there are one or two situations where the blue line is directly adjacent to a red object (touching but with no overlap). We need to guess how general the rule is. There are two valid options:
Red Recolor Specific: Recolor a red object anytime that a blue line overlaps with the red object.
Red Recolor General: Recolor a red object anytime that a blue line makes overlapping or adjacent contact.
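The two recoloring rules differ only in the contact test. Below is a minimal Python sketch of that test, assuming an object and a line are both given as sets of (row, column) cells and that "adjacent" means sharing a side (whether diagonal contact should count is a further ambiguity not settled here). The function name `contacts` is invented for illustration.

```python
def contacts(obj, line, allow_adjacent):
    """True if the line overlaps the object, or (optionally) touches its side."""
    if obj & line:  # Red Recolor Specific: overlap only
        return True
    if not allow_adjacent:
        return False
    # Red Recolor General: side-by-side contact also counts
    return any((r + dr, c + dc) in line
               for r, c in obj
               for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))

line = {(r, 2) for r in range(5)}      # vertical line in column 2
overlapping = {(1, 2), (1, 3)}         # crossed by the line
adjacent = {(3, 3), (3, 4)}            # touches the line, no overlap
clear = {(0, 4)}                       # separated by a black column
```

Only the `adjacent` object is treated differently by the two rules, which is exactly the case the example images never show.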
In the test, you are allowed two guesses, and if either guess is correct you are given full points. Given that, one might assume that a safe strategy is to make one guess with the most general rules and one with the most specific rules.
Blue Line General and Red Recolor General: Wrong
Blue Line Specific and Red Recolor Specific: Wrong
Unfortunately, neither of those options is considered correct, so now we move on to the mixed combinations.
Blue Line General and Red Recolor Specific: Wrong
Blue Line Specific and Red Recolor General: Correct
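The counting problem behind the two-guess strategy can be sketched in a few lines of Python. The labels are just strings standing in for the rule variants described above; nothing here computes actual grids.

```python
from itertools import product

blue_line_rules = ("specific", "general")
red_recolor_rules = ("specific", "general")

# Two independent binary choices give four candidate answers.
candidates = list(product(blue_line_rules, red_recolor_rules))

# "Most general" plus "most specific" uses up both allowed guesses.
guesses = [("general", "general"), ("specific", "specific")]

# The mixed combinations are left uncovered by that strategy.
uncovered = [c for c in candidates if c not in guesses]
```

With two independent ambiguities there are four candidates but only two guesses, so any pair of guesses leaves half of the plausible answers untested.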
I think this is quite interesting. Would a group of 100 intelligent people consistently choose the fourth option?
What is the state of the art for properly guessing when there are multiple solutions that could all be justified with reasonable arguments?