
Slotting 2013



1. Surely you didn't mean to say that observer bias is irrelevant in the end, regardless of how consistent the "methodology" is? Or did you mean that observer bias becomes harder to detect due to smoothing?

2. Thank you for clarifying this nonsense. Your quote from Deming refers to a study confirming the existence of observer bias. That's it. Of course it would compare blinded vs. non-blinded assessors and find that the blinded assessors were unbiased and the non-blinded assessors were biased. That's it? Nobody on this thread doubted the existence of observer bias. Your claim (the one you were trying to back up, I thought?) was specifically that you were able, simply by looking at 2013 DCI scores, to detect observer bias, and in fact deliberate tampering (presumably by the judges).

I suppose it should be trivial for you to crunch some numbers and demonstrate this bias. The irony is that it may be possible to detect bias with actual analysis. Who knows? Please enlighten us. Numbers, please.

Good grief! Again regurgitating last year's nonsense. Observer bias is not irrelevant; but in a regulated process, its effects are. Slotting is a form of observer bias. In fact, it is perhaps a form of bias used to counteract bias, but observer bias nevertheless.

YOU already provided the numbers and even alluded to them in the chart you published a few pages back. Look at the spikes you mentioned. Those spikes occurred across several corps at the same event on several occasions. A single assessment of those events would erroneously suggest either an incompetent judge or that the fix was in. However, those spikes settled back into stability at the following event, significantly lower but still with a respectable amount of growth. In layman's terms, the bias corrected itself, and a straight line superimposed from immediately before to immediately after gives an assessment of the slope of change. A single spike is evidence of error, failure, or some other special-cause variable. Your example showed 4 very distinct occurrences of 4-5 corps eliciting concurrent, equidistant spikes. That is very clear evidence of an attempt to preserve the overall spacing and range. THAT is slotting. It is NOT blindly pegging a position for a corps.

It is perhaps why the other sports I mentioned use more of a mean-based system, often tossing the lowest and highest scores. They've found a method that prevents judging oneself into a corner. "Tampering," by the way, is jargon for any intervention made when process measurements appear to move out of control.
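(To make the spike-reading above concrete, here's a minimal sketch with invented numbers, not the actual 2013 scores; the midpoint rule and the 1.0-point cutoff are my own simplifications, not anyone's official method.)

```python
# Flag single-show "spikes" and estimate the underlying growth rate by
# superimposing a line from the show immediately before the spike to
# the show immediately after. Scores are invented for illustration.

scores = [72.0, 74.1, 76.0, 79.8, 77.9, 79.6]  # one corps, consecutive shows

for i in range(1, len(scores) - 1):
    expected = (scores[i - 1] + scores[i + 1]) / 2   # midpoint of neighbors
    residual = scores[i] - expected
    if residual > 1.0:  # arbitrary cutoff for a special-cause spike
        slope = (scores[i + 1] - scores[i - 1]) / 2  # growth per show, spike ignored
        print(f"show {i + 1}: spike of {residual:+.1f} over trend; "
              f"underlying growth ~{slope:.2f} pts/show")
```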

And since you have obviously missed a few math classes, you've missed a few English ones too. As I have REPEATEDLY stated, over time, biases observed in individual trials, under consistent mechanisms, DO stabilize about a mean.

And yes, statistical models do apply to all data, subjective or otherwise, as they relate to themselves over time, and they show that at the end of the season, the judges get it pretty right. If anything, slotting during individual trials would preserve this end-of-season result by acting as the consistent mechanism that allays the Avon Effect: the influencing of an assessment by a perceived notion of where it stands relative to a previous assessment, or score inflation/deflation through comparison to another corps. Am I saying the fix is in? Absolutely not. Am I saying it is an unspoken policy? Absolutely not. But it would be silly to assume that "fixes" aren't employed to keep the overall process intact. I've never seen anyone so stubbornly argue with someone who is supporting your end result. I suggest you lie low.

Edited by 13strokeroll

I know a little about statistics, but not to the degree being discussed here. It seems there is a simpler explanation, since humans are involved, a time range is involved, and a cap on the high score is involved. Whoever wins is going to be somewhere between 97 (split captions) and 99 (if a corps wins every caption) in the scoring range, with each caption being 9.8-10.0. Back up to the start of the season, pace the scoring so you don't enter box 5 before Atlanta, and so forth. It sure seems like that, combined with season-long improvement plus judge variability, can account for the trends we see.
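(A rough sketch of that ceiling arithmetic, assuming, purely for illustration, a flat sheet of 10 captions worth 10 points each; the real DCI sheet weights captions differently, so treat the numbers as hypothetical.)

```python
# Winner's ceiling under a simplified 10-caption, 10-points-each sheet:
# sweeping every caption near the top of the box lands around 99, while
# splitting captions with rivals lands closer to 97. Numbers invented.

sweep = 10 * 9.9            # wins every caption at 9.9
split = 6 * 9.9 + 4 * 9.4   # concedes a few captions to rivals
print(round(sweep, 1), round(split, 1))  # 99.0 97.0
```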

Without all the technical terms, this is the explanation that has always made sense to me.

Feel free to point out the flaws in my argument. I won't know if you are right or wrong.

P.S. I don't believe in slotting. It looks like that may have happened years ago, but with captions consistently all over the place vs. overall ranking, I find it hard to argue that it is prevalent.

You just slotted. You pegged a scoring range and used the magic word: "pace." You set up a preconceived notion of scores and ranges and "paced" your assessment according to its position within that range and over time.

Edited by 13strokeroll

This pretty much sums up my take on the situation exactly. Statistical models don't really enter into it, as none of the numbers are objective measures of anything beyond the relative rankings of a particular set of corps during a show.

There are almost certainly some interesting sociological/psychological elements that could be studied in terms of how the judges synthesize the performance, their training, other judges' scores, typical scores for this point in the season, the number of corps in the show, personal connections with the corps, execution styles favored by the corps vs. the judges, and so on.

Measuring subjective numbers is actually a more repeatable way to monitor process validity than measuring "concrete" data when hunting for uncommon variables, believe it or not, as long as the correct range is recognized within individual trials. A greater understanding of the range of possible scores during a single trial (a show), as opposed to the much narrower range of outcomes for more concrete data such as the diameter of a drilled hole, allows the process to regress to its own mean. Keep in mind, it is not a single value that is important; it is that value's relationship to the other values around it that is critical. That's why many knowledgeable people don't give 2 hoots about scores, but they get bent out of shape over disproportionate ranges and spreads. They might not be versed in statistics, but they are intuitively practicing it.
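(A minimal sketch of that "relationships, not raw values" point, on invented numbers: the panel at the second show runs almost two points hot, yet the spreads between adjacent corps barely move. Corps names and scores are hypothetical.)

```python
# Raw scores shift wholesale when a panel runs hot or cold, while the
# spreads between adjacent corps stay comparatively stable.

show_a = {"Corps X": 88.5, "Corps Y": 87.3, "Corps Z": 85.9}
show_b = {"Corps X": 90.4, "Corps Y": 89.1, "Corps Z": 87.8}  # a "hot" panel

def spreads(scores):
    """Gaps between adjacent corps, highest to lowest."""
    vals = sorted(scores.values(), reverse=True)
    return [round(upper - lower, 2) for upper, lower in zip(vals, vals[1:])]

print("show A spreads:", spreads(show_a))  # [1.2, 1.4]
print("show B spreads:", spreads(show_b))  # [1.3, 1.3]
```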

Edited by 13strokeroll

Or that scores consistently clump around 4 distinct median ranges season after season after season after season - a statistical impossibility unless an additional variable is introduced. Either 25 staffs are in collusion to manipulate the process, or a mechanism is obfuscating randomness, either intended or unintended.

Or, we have several clumped medians because the corps tend to attract the same talent level of performers. Currently I think Cadets, Blue Devils, and Crown attract the highest level of achievers; Phantom, Bluecoats, SCV, and Cavaliers attract the next highest level; BAC, Scouts, and BK attract the same level of performers; etc. Sometimes those corps will have the special combination of higher-achieving staff plus a design that helps the corps excel to non-normal heights (or conversely, a corps will have less-achieving members and a poor design that kills its momentum). Success breeds success, and it's not some unfathomable concept that perceived "slotting" is really just consistency for the most part.


Or, far more likely, you don't understand how the judging system works. It's not based on randomly generated numbers. The "additional factor" you're looking for is called criteria-based evaluation.

Maths are hard, yo.

YES!! You are correct, sir! ($.02 to Phil Hartman's estate)


Or, we have several clumped medians because the corps tend to attract the same talent level of performers. Currently I think Cadets, Blue Devils, and Crown attract the highest level of achievers; Phantom, Bluecoats, SCV, and Cavaliers attract the next highest level; BAC, Scouts, and BK attract the same level of performers; etc. Sometimes those corps will have the special combination of higher-achieving staff plus a design that helps the corps excel to non-normal heights (or conversely, a corps will have less-achieving members and a poor design that kills its momentum). Success breeds success, and it's not some unfathomable concept that perceived "slotting" is really just consistency for the most part.

In a manner of speaking, yes; but I think from a reversed perspective. The similarities in base ability are probably the vehicle that allows the process to right itself while still maintaining process integrity. If, say, a corps ranked 23-25 scores a 75 in early July, those clumps are going to necessitate some adjustments to preserve the integrity of the spacing with those yet to perform. For a corps ranked 4-10, not as much, but still to a smaller extent. Is the fix in? Not per se. The overall continuity still remains. My original comment about the medians was partially tongue in cheek. My only concern was the potential to paint oneself into a corner, much as in the example I gave. Preserving the spacing is the correct method, but it doesn't take the spotlight off of the original oops.

Edited by 13strokeroll

Good grief! Again regurgitating last year's nonsense.

Some things never get old, eh?

Observer bias is not irrelevant; but in a regulated process, its effects are.

It would be one thing if you knew what you were talking about and were just talking over everyone's heads with terms like "regulated process" and "process validity," with the hubris of the awful professor who can't teach. But you aren't. You're just a guy trying to make everyone think that. And nobody is buying it. Please state your expertise for the record (e.g., coursework in statistics), as I have done.

Slotting is a form of observer bias. In fact, it is perhaps a form of bias used to counteract bias, but observer bias nevertheless.

Is this the specific form of observer bias you claim leaps out from the raw 2013 DCI scores like a stripper from a cake?

YOU already provided the numbers and even alluded to them in the chart you published a few pages back. Look at the spikes you mentioned. Those spikes occurred across several corps at the same event on several occasions. A single assessment of those events would erroneously suggest either an incompetent judge or that the fix was in.

This is absolutely false. It indicates nothing more than that some judges score higher than others. That is normal variation, and it certainly does not indicate incompetence or cheating.

However, those spikes settled back into stability at the following event, significantly lower but still with a respectable amount of growth. In layman's terms, the bias corrected itself, and a straight line superimposed from immediately before to immediately after gives an assessment of the slope of change. A single spike is evidence of error, failure, or some other special-cause variable.

You're not even mentioning whether the same judges were involved or different judges. It's even possible that a judge might judge everyone high at one show, then correct himself/herself at the next show (presumably after seeing that their scores were consistently high). Still no harm, no foul. No incompetence (at least none to speak of) and certainly no cheating.

Your example showed 4 very distinct occurrences of 4-5 corps eliciting concurrent, equidistant spikes. That is very clear evidence of an attempt to preserve the overall spacing and range. THAT is slotting. It is NOT blindly pegging a position for a corps.

What? It certainly is not evidence of any such thing. It is only evidence of some judges judging higher than others. This is 1. not slotting, and 2. perfectly acceptable (within reasonable limits that people at DCI presumably discuss at length).

It is perhaps why the other sports I mentioned use more of a mean-based system, often tossing the lowest and highest scores. They've found a method that prevents judging oneself into a corner.

I'm no expert on those sports, but from the press available about those judging systems, statisticians appear to be sharply critical of that method, because it appears to correct for bias but does not actually work. In particular, it has been found that the most accurate judges (however they measure that) tend to judge a bit higher than others, resulting in their scores being thrown out.

I believe those sports should consider the DCI model of breaking down the judging into several different categories and having judges specialize.
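(For reference, the "toss the lowest and highest" mechanism being debated here is just a trimmed mean; a minimal sketch on an invented five-judge panel.)

```python
# Trimmed mean as used in some judged sports: drop one minimum and one
# maximum, then average the rest. Panel scores are invented. Note the
# criticism above: if the most accurate judge tends to score a bit
# higher than the others, that score is exactly the one discarded.

def trimmed_mean(panel):
    trimmed = sorted(panel)[1:-1]  # drop lowest and highest
    return sum(trimmed) / len(trimmed)

panel = [9.2, 9.4, 9.5, 9.6, 9.9]  # hypothetical five-judge panel
print(round(trimmed_mean(panel), 2))  # 9.5; the 9.9 never counts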

"Tampering" by the way, is jargon to indicate any intervention when process measurements appears to move out of control.

As soon as you've demonstrated your expertise in statistics I'll think about taking that ridiculous statement seriously.

And since you have obviously missed a few math classes, you've missed a few English ones too. As I have REPEATEDLY stated, over time, biases observed in individual trials, under consistent mechanisms, DO stabilize about a mean.

So? Of course they would. We have only been talking about the individual scores from shows, not means. Irrelevant obfuscation. (Did anyone even compute a mean of any of these scores? Or talk about one in this thread?)

And yes, statistical models do apply to all data, subjective or otherwise, as they relate to themselves over time, and they show that at the end of the season, the judges get it pretty right. If anything, slotting during individual trials would preserve this end-of-season result by acting as the consistent mechanism that allays the Avon Effect: the influencing of an assessment by a perceived notion of where it stands relative to a previous assessment, or score inflation/deflation through comparison to another corps. Am I saying the fix is in? Absolutely not. Am I saying it is an unspoken policy? Absolutely not. But it would be silly to assume that "fixes" aren't employed to keep the overall process intact. I've never seen anyone so stubbornly argue with someone who is supporting your end result. I suggest you lie low.

That's hilarious, especially at the end of such a beautiful work of obfuscation. Now you claim that you are supporting my end result? Because the biases cancel each other out over a season? (See how easy it is to say things in plain English when you're not trying to BS people?)

So, what are you saying is my "end result"? That the scores at finals are reasonably accurate? Did I ever bring that up? I didn't even claim the scores were unbiased; in fact I said they probably are biased.


Your example showed 4 very distinct occurrences of 4-5 corps eliciting concurrent, equidistant spikes. That is very clear evidence of an attempt to preserve the overall spacing and range. THAT is slotting.

Wait, do you mean the example I posted? The random numbers!?


... speaking of which, I've redone my randomly generated score path, but this time including a "bump" for each show of -1 to +1 to represent judging that runs consistently high or low across corps. This is of course highly simplistic, but it still results in a path more similar to the BK chart than the previous one. Note that not all corps go up or down together; it depends on the other variations in their scores. But this is essentially the kind of data that 13strokeRoll claims represents obvious observer bias!

[Chart: simulated score paths with a shared per-show bump]
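(For anyone who wants to replicate it, here's a minimal sketch of the kind of simulation described above; the starting scores, growth rate, noise widths, and seed are all invented, and the original chart used its own random draws.)

```python
# Each corps gets steady season-long growth plus its own per-show noise;
# every show also adds a shared "bump" in [-1, +1] for a panel that runs
# high or low that night.
import random

random.seed(2013)  # any seed works; fixed here so runs are repeatable
starts = [70.0, 68.5, 67.0, 65.5]  # four hypothetical corps

for show in range(10):
    bump = random.uniform(-1, 1)                # panel-level high/low, shared
    row = [round(start + 1.5 * show             # steady improvement
                 + random.uniform(-0.7, 0.7)    # per-corps variation
                 + bump, 2)
           for start in starts]
    print(f"show {show + 1}: {row}")
```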
