Regionals Luck Analysis

  • Regionals Luck Analysis

    Confession #14408 on MCTBIAU did a really interesting analysis of the impact of luck on a team's success by regional (which I've attached below for reference), and I immediately noticed that some of the regionals with the most teams were the ones where luck seemed to play the largest role. So I ran a regression between the number of teams at a regional and the % impact of luck on a team's chances of making it out (luck impact percentage). The relationship was statistically significant, with each additional team at a regional predicted to increase the luck impact percentage by 1.3 points. A regional with 20 teams would be predicted to have an average luck impact percentage of 43%, whereas a regional with 30 would be predicted to have an average of 56%.
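    The linear model implied by those numbers can be sketched in a couple of lines. The 1.3-points-per-team slope comes from the post; the ~17-point intercept is back-solved from the quoted predictions (my assumption, not a figure from the analysis):

```python
SLOPE = 1.3       # luck-impact percentage points per additional team (from the post)
INTERCEPT = 17.0  # back-solved from "43% at 20 teams": 43 - 1.3 * 20 (my assumption)

def predicted_luck_impact(n_teams: int) -> float:
    """Predicted luck impact percentage for a regional with n_teams teams."""
    return INTERCEPT + SLOPE * n_teams

print(predicted_luck_impact(20))  # 43.0
print(predicted_luck_impact(30))  # 56.0
```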

    Though the correlation is significant, it isn't that strong, suggesting that the percentage impact of luck isn't a direct function of the number of teams at a regional, but is still meaningfully influenced by it. From a subjective standpoint, this makes a lot of sense- some of the weirdest outcomes from regionals (nats-level teams not making it out, etc.) have come out of regionals where there were a lot of teams competing (Orlando, New Haven, etc.).

    I understand that AMTA has resource constraints when it comes to selecting locations for regionals. But if there is solid evidence that having a large number of teams at a regional creates odd outcomes and lets luck play a much more significant role than skill, then, as a competitor, I feel it may behoove AMTA to cap the number of teams a regional can host so that regionals remain competitive and less influenced by luck.

  • #2
    I'm not a math person so hopefully someone can help me out here. What does the "luck percentage" actually mean in quantifiable terms? Is it the percent of a team's results that are attributable to luck? Or is it something else?

    • #3
      This is bonkers.

      • #4
        So if a team goes 4-4 because of 56% luck at a 30-team tournament, does that mean 2 of their ballots were unfairly won, or two of their ballots were unfairly lost?

        • #5
          According to the original post, the math for luck impact percentage was based on the formula contained in this video: https://www.youtube.com/watch?v=HNlg...MwSlPslfLrIhZc

          • #6
            So this data is really interesting. One thing that needs to be made clear is that, if the analysis follows the same basic formula as the video it originally linked, the data doesn't say that bad teams are beating good teams because of luck. It measures the gap between the expected outcome of a regional based on what we knew going in and what the results actually ended up being. Just because a surprise team made ORCS doesn't necessarily mean they were "lucky"; it just means their advance was unexpected. If a regional has a high luck factor, its results turned out to be more unexpected. It doesn't necessarily mean the judging was poor (though you could argue it's an indication of poor judging).

            I would like more information about how the creator constructed the "expected" results for a regional. Is it based off of TPR? If so, does it just not attempt to take into account the relative strength of C/D/E teams? Does it look at invitational results? If so, how does it take into account the strength of the invitational?

            I also think the correlation shown above between the luck factor and the size of the regional is pretty clear evidence that alarm over the growing size of regionals is justified. Specifically, the three most "lucky" regionals are 3 of the 4 biggest regionals. If not for Minneapolis, which looks a lot like an outlier, the trend line would be even steeper. Finding hosts is of course an understandable problem for AMTA, but it's a problem that will need to be tackled, as regionals will only keep growing in the absence of more hosts.
            Last edited by STC; March 7th, 2019, 02:53 PM.

            • #7
              Can someone please write an overly-simplistic expert-witness analogy so my feeble brain can understand it? What the hell is being measured here? What does it mean?

              • #8
                An interesting take overall, but I see several issues with the model. An OLS would not be an appropriate model for this analysis. There are so many variables that affect outcomes at regionals- judging preferences, team strength, coaching, etc. Unless all of these variables are somehow factored in (whether through dummy variables or by specifying a different model), this type of analysis doesn't tell us much. Additionally, the R-squared is only 0.27 and the adjusted R-squared is 0.24, which indicates a weak relationship between the two variables regressed. Not to mention the very small sample size and extremely small degrees of freedom. To draw any sort of conclusion from this analysis would be, in my expert opinion, improper.
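                For anyone unfamiliar with the adjusted figure: adjusted R² just penalizes R² for the number of predictors relative to the sample size. A quick sketch (the sample size of ~25 regionals is my back-of-envelope guess to make the two quoted numbers line up, not a figure from the thread):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With R² = 0.27 and one predictor (team count), an adjusted R² of ~0.24
# is consistent with roughly 25 regionals in the sample (my guess, not a
# number from the thread).
print(round(adjusted_r2(0.27, 25, 1), 2))  # 0.24
```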

                • #9
                  It seems like the only thing being measured is how close the divide was between teams that qualified and teams that didn't. Specifically, the more teams a regional had close to the middle of the pack (i.e., between 3 and 5 wins), the more likely it was that luck played a greater role in that regional's results. In this analysis, "luck" is just a blanket term for all the variables a team cannot control, and "skill" seems to indicate all the factors a team can control. Essentially, the conclusion from this analysis is: "the more teams clustered in the middle of the pack, the more likely it is that uncontrollable factors played a role in who snagged one of the last bids and who didn't." That is wholly consistent with the notion that larger regionals may be a problem, because the increased number of total teams decreases the percentage of the field any given team actually plays. Thus, you almost guarantee more teams in the middle of the pack, most of whom may not have hit each other at all. One solution to this problem (mathematically) is to decrease the field. The other is to add more rounds (which the original video suggested helps decrease randomness). I don't think adding rounds to a tournament is practical, which means lowering the number of teams seems like the most plausible solution.

                  The thing I would caution against is suggesting that there was a measurable "expected" outcome before the tournament for individual team performances. The model simply compares what the distribution of wins per team should look like in any game/sport/activity that measures skill accurately (in our case, teams spread more evenly between 0 and 8 wins) with the actual distribution of wins (which, for Orlando, looked like a steep bell curve with lots of teams in the middle). It is important to note that this bell curve could still suggest that there is more parity in skill at a tournament like Orlando than at others. But, according to the original model, it does show that a bigger tournament may be *more likely* to be subject to "luck", or uncontrollable factors, than a smaller one.

                  Someone correct me if I am wrong. I'm not a STEM major lol. But I love analyzing sports statistics, and had seen that original YouTube video several times before.
                  Last edited by bdopl; March 9th, 2019, 01:07 AM.

                  • #10
                    Edit:

                    1. "Expert" TL;DR- if we canceled trials and just decided rounds based on coin flips, how close would those results look to what we actually got? Higher luck % = looks more like a coin flip tournament.

                    2. I didn't take size into account but it may be the case that, given AMTA's pairing mechanics, I'm simply penalizing tournaments for being large. That would be worth adjusting around, so if someone can confirm that AMTA's primary/secondary bracket voodoo would push more teams toward the middle of the pack at larger regionals (relative to random matchmaking), then I can just redo this adjusting for AMTA's specific matchmaking system and then we won't have to worry about that piece anymore- not just size, but matchmaking effects in general.

                    ---

                    OP of 14408 here. bdopl pretty much covered it. To prove I'm OP or at least make the point moot: Luck % is MIN(1, (((0.25 * 8) / VARP({ballot counts})) - 1/8) * 8/(8-1)). VARP is population variance, as opposed to sample variance (VAR); I subtract and rescale based on the number of rounds because rounds set a ceiling on variance (if every team went either 0-8 or 8-0, variance would be 16). The 3 columns on the right are just the % of teams with won ballots within a given range- so about a quarter of Orlando teams went 4-4 and 2/3 won between 3 and 5 ballots. Oh, and a couple of Regionals had slightly messy data due to Quincy and one other university missing rounds; this inflates variance and deflates Luck % a little bit, but I did not correct for this in the screenshot I sent to MTC.

                    This is all based on a simple counterfactual: if every round were decided by a coin flip, what would the results look like? With coin flips you'd see everyone trend toward 50-50 eventually (Law of Large Numbers), but it's still possible for someone to go 8-0 or 0-8, so you'll see variance (deviation from the average result) that shrinks as you add more rounds. We can use the binomial distribution to get the variance we'd expect from a coin-flip tournament, based on the number of trials/ballots. It won't be exact- an actual lottery won't always produce the same outcomes (I ran one, and it actually came out more "skill-driven" than the 2017 MLS season by the same analysis).
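                    If I'm reading the formula in the post correctly, it can be sketched in a few lines of Python (variable names are mine, and I'm assuming the cap is meant to keep the value at or below 100%):

```python
# Sketch of the quoted Luck % formula: the share of the variance in
# won-ballot counts that a pure coin-flip tournament would already produce.

def luck_pct(ballots, n_rounds=8):
    mean = sum(ballots) / len(ballots)
    # Population variance (VARP), not the sample variance (VAR)
    varp = sum((b - mean) ** 2 for b in ballots) / len(ballots)
    if varp == 0:
        return 1.0  # a fully tied field is indistinguishable from coin flips (my assumption)
    coin_var = 0.25 * n_rounds  # binomial variance of n_rounds fair coin flips
    raw = (coin_var / varp - 1 / n_rounds) * n_rounds / (n_rounds - 1)
    return min(1.0, raw)

# Everyone at 0-8 or 8-0 hits the variance ceiling of 16 -> Luck % of 0;
# variance of exactly 2 (the coin-flip value) -> Luck % of 1.
print(luck_pct([0, 8, 0, 8]))  # 0.0
print(luck_pct([2, 4, 4, 6]))  # 1.0
```

The subtract-and-rescale step is what makes the coin-flip variance of 2 land exactly on 1 and the ceiling variance of 16 land exactly on 0.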

                    All "Luck %" really is, is comparing what we actually saw to that coin flip tournament. How much of the variance in our actual outcomes can be explained through sheer luck alone? Honestly, given that the actual numbers shouldn't be read into too closely ("Orlando was 74% luck" is probably the wrong read here), I should have just done this by showing y'all the expected variance for an 8-ballot coin flip tournament and then all the Regionals variance data.

                    This is a really crude tool. It's based on the idea that, in a skill-driven situation- where "skill" is a stable and predictive attribute of each competitor (think like an accurate Elo rating, or IQ)- you're more likely to get useful signals for predicting future results from past outcomes. In other words, winners will keep winning and losers will keep losing- you'll be able to give a 4-2 team an edge over a 2-4 team (that you wouldn't be able to, if the round were just a lottery). And the more this happens, the more spread out the outcomes will get at the end.

                    So "74% luck" means that 74% of the variance in the actual distribution of won ballots could be explained by luck alone. This could be because of the role of chance in Mock Trial, a lack of skill differentiation between teams at a Regionals tournament, pairing/matchmaking (power-matching hurts, power-protection helps increase variance), or anything else that would affect variance in # of ballots won- including just randomness. Keep in mind that "% Luck" is global- it refers to overall "luck" (i.e., non-stable factors- including stuff like a team being stronger on one side of the case; basically anything that can't be harnessed predictively) over the course of a tournament specifically in terms of the number of ballots you win, not over an individual round.

                    I'm glad y'all are finding this interesting though; I posted on MTC without much context just to see how people would react to someone telling them their ORCS bids were 30-70% "luck" (without a clear definition of luck). Due to things like matchmaking, it is hard to really compare this across events (e.g., against hockey) and draw meaningful conclusions- although the NBA, for comparison, tends to consistently land in the mid single digits iirc. There are different ways of performing the same "skill vs. luck" analysis- at least within the framework that ties "skill" to being able to predict future matches from prior rounds- that avoid some of the confusion of this approach. I'm sure we've got people in here who can provide those approaches (my major, while "STEM", doesn't really touch too closely on statistical analysis; skill ratings and quality pairings in competitive contexts are just personal hobbies/interests of mine).
                    Last edited by Zephaniah; March 8th, 2019, 07:58 PM.
                    - Zephaniah

                    • #11
                      Originally posted by Zephaniah View Post
                      if someone can confirm that AMTA's primary/secondary bracket voodoo would push more teams toward the middle of the pack at larger regionals (relative to random matchmaking), then I can just redo this adjusting for AMTA's specific matchmaking system and then we won't have to worry about that piece anymore- not just size, but matchmaking effects in general.
                      I'm in a very similar boat to you -- STEM major that doesn't really deal with statistics. So, take everything I say with a heavy dose of salt.

                      But -- based on what you posted about the equation you're using -- I think a lot of what this is measuring isn't based on anything inherent to mock trial, but just a function of power matching. This kind of variance analysis, as I understand it, would work if all the rounds were randomly matched. But here, that's not the case.

                      To use a simplified example, imagine a tournament with 16 teams, no side-constraints, and only one judge per round. Immediately, no matter how the activity works, 8 teams will be 1-0 and 8 teams will be 0-1. Then, after all teams are paired against like teams, 4 will be 2-0, 8 will be 1-1, and 4 will be 0-2. After R3, 2 will be 3-0, 6 will be 2-1, 6 will be 1-2, and 2 will be 0-3. Then, at the end, 1 will be 4-0, 4 will be 3-1, 6 will be 2-2, 4 will be 1-3, and 1 will be 0-4. All of that has nothing to do with who you're likely to hit, how large the region is, etc. That sort of analysis is very different from saying that basketball is less random than football because it has more possessions.
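                      You can check that bracket arithmetic with a quick simulation (my own sketch, not from the thread): with one ballot per round and even brackets, the final record distribution comes out the same whether the stronger team always wins or every round is a coin flip, because each bracket always splits in half by construction.

```python
import random

def power_matched_record_counts(decide_winner, n_teams=16, n_rounds=4, seed=0):
    """Single-ballot power matching: pair teams within their win bracket
    each round; return how many teams finish with 0..n_rounds wins.
    Sketch assumes n_teams is a power of two so brackets stay even."""
    rng = random.Random(seed)
    wins = {t: 0 for t in range(n_teams)}
    for _ in range(n_rounds):
        brackets = {}
        for t, w in wins.items():
            brackets.setdefault(w, []).append(t)
        for bracket in brackets.values():
            rng.shuffle(bracket)  # random pairings within the bracket
            for a, b in zip(bracket[::2], bracket[1::2]):
                wins[decide_winner(a, b, rng)] += 1
    counts = [0] * (n_rounds + 1)
    for w in wins.values():
        counts[w] += 1
    return counts

# Lower team id = stronger team, and the stronger team always wins...
skill = power_matched_record_counts(lambda a, b, rng: min(a, b))
# ...versus a pure coin flip every round:
coin = power_matched_record_counts(lambda a, b, rng: rng.choice((a, b)))
print(skill)  # [1, 4, 6, 4, 1]
print(coin)   # [1, 4, 6, 4, 1]
```

Both runs produce the binomial-shaped [1, 4, 6, 4, 1], which is the point: the pairing system pins down the record distribution regardless of how individual rounds are decided.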

                      Now -- according to my very tired brain -- I think this changes when you take into account judges splitting ballots. And, I think it actually moves in the right direction. If more judges split ballots, teams will regress (slightly) to the mean, reducing variance, reducing the role of skill. Judges splitting ballots would happen more in a random activity.

                      That being said, I think this is a really interesting subject, and I'd welcome anyone explaining why all of this analysis is wrong. But I think looking round-by-round, instead of at the whole tournament, may be the best way to find the randomness given how structured pairings for AMTA tournaments are.
