
dalyea

Members
  • Posts

    17
  • Joined

  • Last visited

Profile Information

  • Your Drum Corps Experience
    3 Years
  • Your Favorite Corps
    The Cadets
  • Your Favorite All Time Corps Performance (Any)
    '84 Garfield Cadets
  • Your Favorite Drum Corps Season
    1985

dalyea's Achievements

DCP Rookie

DCP Rookie (1/3)

15

Reputation

  1. For Sale: 18 practice female dance partners. Wooden, adorned in pink. A bit creepy, but less so when 18 male dance partners perform with them.
  2. I ran the models tonight. Here are the results (all scores on a 0-1 scale, i.e. finals score / 100):

     rank  corps                 NN  LM   GBM  H2O  optim | score   optim   NN      LM      GBM     H2O
        1  Blue Devils            2   2   2.0    2      2 | 0.9765  0.9676  0.9687  0.9687  0.9696  0.9670
        2  Carolina Crown         1   1   1.0    1      1 | 0.9708  0.9693  0.9716  0.9713  0.9732  0.9682
        3  Bluecoats              3   3   3.0    3      3 | 0.9692  0.9604  0.9606  0.9603  0.9588  0.9605
        4  The Cadets             4   4   4.0    4      4 | 0.9590  0.9523  0.9585  0.9582  0.9547  0.9501
        5  Santa Clara Vanguard   5   5   5.0    5      5 | 0.9385  0.9357  0.9358  0.9353  0.9342  0.9358
        6  Blue Knights           6   6   6.0    6      6 | 0.9185  0.9053  0.9091  0.9088  0.9027  0.9044
        7  Phantom Regiment       7   7   7.0    7      7 | 0.9032  0.8905  0.8954  0.8951  0.8923  0.8888
        8  Madison Scouts         8   8   9.0    9      9 | 0.8875  0.8768  0.8819  0.8818  0.8738  0.8755
        9  The Cavaliers          9   9   8.0    8      8 | 0.8832  0.8778  0.8801  0.8789  0.8765  0.8774
       10  Boston Crusaders      10  10  11.5   10     10 | 0.8680  0.8581  0.8576  0.8563  0.8550  0.8587
       11  Blue Stars            11  11  10.0   12     11 | 0.8515  0.8440  0.8463  0.8474  0.8551  0.8420
       12  Crossmen              12  12  11.5   11     12 | 0.8503  0.8426  0.8322  0.8324  0.8550  0.8444

     The columns to the left of the divider (NN, LM, GBM, H2O, optim) are each model's predicted rank; rank and score are the actual finals rank and score; the columns to the right of the divider are each model's predicted score.

     These results rely on all scores through semis, with the 1/2-point penalty for the Blue Devils added back in to their Thursday show score. (That is the right thing to do, since trying to reckon for penalties introduces unnecessary noise into the modeling process.) The columns named optim refer to the R function optim's rank-loss weighting of the 4 models to predict the actual final rankings, whereas the 4 models themselves are built to predict the finals score for each drum corps. (A rough sketch of the optim weighting idea appears at the end of this list.)

     Based on all the data through semifinals, and relying on models built on the 600 data points of the top 12 corps from 2005-2014, Carolina Crown had the statistical advantage going into finals, with a predicted winning margin of 0.15 points (rounding down the optim difference). Note the huge score jump for the Bluecoats. For the drum corps in positions #6 to #12, note that all had scores roughly 1 point over expected - and remember, these models factor in any finals jumps in scores from past years' data. From semis to finals, it appears to me to be an unprecedented increase in scores for all those drum corps.

     If we credit Clara for 1/2 point of GE, that would give the Blue Devils a 0.35 winning margin based on the predicted values above. Even now, after seeing semis and finals for myself, I can't quite grasp the 0.575 margin of victory - in absolute terms, having seen the shows, and in relative terms, thinking about the progression of scores from 3 weeks out, 9 days out, then Thursday/Friday/Saturday.
  3. For Sale: eye charts, numbers only, no letters. Each about 4' x 8' - ask your patients to identify 10 numbers accurately at 100 paces in order to gauge long-distance vision.
  4. Highlighting the 2006 Cavaliers, pointed out above as having the lowest finals score since 2000: the black line shows the Cavaliers' scoring trend line. They had quite a few high scores during the season, relative to other winners' scores, yet note that telltale three-score cluster for Thursday/Friday/Saturday at the end of the season.
  5. That's a great explanation of why we're seeing lower scores. I've only seen one show this season, at Stanford University, and I haven't followed scores and caption scores much since then. What you said sounds spot on - I recall years in which the horns, drums, GE, and guard wins were split among different drum corps, and naturally the winning scores in those years are 97.X or thereabouts.
  6. For 2015, scores throughout the season have been at a 12-year historic low. Generally, scores for the top 3 corps - Blue Devils, The Cadets, and Carolina Crown - have been right at 2 points below the champions' score trajectories from 2003-2014.

     The rainbow of smaller dots shows the season scores for the champions from 2003-2014. I highlighted two score sets with bigger dots: Blue Devils 2007 (green), who started the season scoring 75+ and won with a finals score of 98.00, and Blue Devils 2014 (red), the all-time highest score at finals, 99.65. Note the wildly different paths to those wins, and the 7-point jump for the 2014 Blue Devils at 40 days out from finals. The 3 lines below the main array of dots show the Blue Devils, The Cadets, and Carolina Crown scores for 2015, up until 9 days before finals. If those 3 drum corps names for 2015 weren't known in the plot, you'd think none of them could be contenders to win it all this year. Just in the last week, the 2015 scores have started to intersect with the past champions' scores.

     In past years, the linear regression for the winner's finals score at N days before finals often projects to a score at or over 100. It has been very common to see the Thursday/Friday/Saturday show scores for the eventual winner fall about 1 point below the regression line. This year, it appears that scoring will track, it may be argued, as it should - linearly increasing right through Saturday night. Corps continue to make consistent fixes and improvements in their show performances right up until the very end. Some years, corps pull out surprises for Saturday night only, like SCV did in the 80's. So scores can buck the linear trend for various reasons, but all things being equal, the kids keep working as hard as they have all season, the shows get that much better even in the final week, and the scores should reflect this right along the trend line.

     Here is the aggregation of the champions' scores, showing the mean/max/min in black and gray. Here we see the mean and max 2015 scores for the top 3 contenders in blue and red, respectively. For the best 2015 drum corps, the scores have been below the min boundary for past winners for almost the entire season. Note the sets of 3, 3, and 4 data points at day -28 and forward, which start far below the regression line but, by day -16, catch up to it. Relative to previous years, whether from judges' scoring or massive show improvements or both, the top 3 contenders this year made huge strides.

     The red line shows the linear regression for the max scores - the smaller red circles - which points to a final score at day = 0, finals Saturday night, of 97.60. (A tiny R sketch of this kind of extrapolation appears at the end of this list.) Will the winner's score exceed 97.60? I think it's entirely likely. Yet, if scores were to follow the trends from the last 12 years, if no one drum corps has a miraculous improvement in the final 7 days of the season, and if the eventual winner doesn't pull out a Saturday night surprise show element, it looks like the winning score will be right in the 97.60 to 98.00 range, rounding up a bit to factor in the excitement even for the judges and the importance of finals night.
  7. 100% agree. That is for sure the most problematic part underlying the modeling.
  8. I had to look up that word - that's a new one for me. All models have a chance of being wrong, this one for the 2015 rankings predictions included. In the end, the ultimate experience is seeing the corps give their all at finals and finding out which corps is crowned the champion.
  9. You think like a machine. :-) Your predictions are highly aligned with some data science predictions, with The Cadets just edging out BD and CC: http://davidalyea.com/dci/2015/
  10. I'll go with:

      The Cadets, Blue Devils, Carolina Crown, Santa Clara Vanguard, Bluecoats, Cavaliers, Phantom Regiment, Blue Knights, Madison Scouts, Crossmen, Blue Stars, Boston Crusaders

      for my personal picks, but by the numbers:

      The Cadets, Blue Devils, Carolina Crown, Bluecoats, Santa Clara Vanguard, Phantom Regiment, Blue Knights, The Cavaliers, Madison Scouts, Blue Stars, Crossmen, Colts

      And here's why: http://davidalyea.com/dci/2015/
  11. One thing I noticed while doing the analysis, but didn't have time to look into until now, was that all my 2015 scoring predictions based on 2005-2014 training data were unusually low. For other years where I back-tested, my final models predicted winners' scores between 97 and 99, which is in line with usual scoring at Saturday finals. The highest predicted score I was getting this year, from any of the 4 sub-models, was in the 94 range. Why is that the case?

      The plot below shows that scores this year are about 4 points behind scores at the same point in the season - from 30 to 21 days out from finals - compared with ALL other years from 2003 to 2014. Each set of colored dots is the 5 scores for the champion drum corps from 2003 to 2014 - I've noted that the peak score was an S4 score at T-22 days from finals for the Blue Devils in 2010. The 3 lines below the dots are the 5 scores for the top 3 drum corps for 2015, as of T-21 days from the 8/8/2015 finals.

      For whatever reason, this is a low-scoring year. Were there rules or judging changes? I haven't followed drum corps news all summer, so perhaps I'm not aware of some change that would result in scores being, relative to the past 12 years, so low. What I would find troubling at this point would be to see finals scores rocket to 98+ at semis and finals next week. Certainly, every corps is working hard to perfect their show, and performance levels and scores will rise. However, there is no reason to think that corps will work harder or improve more in the final 21 days before finals this year than in any of the previous 12 years. From what we see in the scores so far this season, if the winning drum corps' finals score is much over 96.0, it appears that would be out of line (regression humor?!) with the clearly delineated scoring trajectories from the past 12 years. If we assume that the judging criteria and the drum corps' final polishing of their shows remain the same as in all previous years, the scoring trends this year clearly point to a champion score of about 96.0 (see the regression sketch after this list).
  12. If I reversed some numbers, let me know and I'll make the correction. I'll double-check my posted 2015 ranks, but I don't recall them being reversed.
  13. I'd have to think about how a random walk approach might lead to predicted scores. I get the idea - I was once working on price predictions and, at some point, I realized that using random walks as a form of simulation would get me the results I needed. Now, though, I'd likely use bootstrapping instead - I didn't know about that technique back then! (A quick sketch of the bootstrap idea appears at the end of this list.) Thanks for reading my predictions and for the idea.
  14. There is an H2O library function called deeplearning now, which is basically neural networks at hyperspeed! H2O has implemented deeplearning to use dropout, which is akin to regularization in some ways, but is more aptly described as multi-modeling to avoid overfitting. I'm keeping that in my hip pocket for future modeling. :-)

      GBM in a nutshell is tree-based learning, but with a twist. For classification, at each step of building the eventual ensemble, the decision rule is a stump. That is, for one feature only, the model finds a split value for it. The first stump is chosen to classify the data correctly as best as possible. Then, for the next stump, the split is determined by, as before, trying to classify as many items as possible correctly, but the loss function this time weights the items misclassified by the first stump more heavily. The second stump boosts the treatment of misclassified items. This system of boosting to account for misclassifications continues with subsequent stump decision boundaries.

      One advantage of GBM is that it tends not to overfit the data. If you look at RMSE for tighter and tighter training-data fits, the test-data RMSE tends to decrease in lockstep and eventually flatten out - without rising, as would be seen with overfitting by other modeling techniques. For those reasons, I used two GBM models in my approach for predicting finals scores this year.

      The GBM library in R also has a built-in variable-importance function to tell us which of the features was most important when defining all those stumps. As I'd expect, S5, the most recent score, was most important. Surprisingly, S1 or S2 was often 2nd most important. The reason I think this is the case is that the S1 and S5 values implicitly define the scoring slope that is directed toward the target final score. I'm not sure that S4 and S5 would always provide as much pair-indicated directional guidance toward the finals scores. Presumably, variable selection or variable importance for linear models may also favor S5 and S1/S2 as most relevant. (A rough sketch of this kind of GBM fit appears at the end of this list.)
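For the optim rank-loss weighting mentioned in post 2, here is a minimal R sketch of the idea, assuming made-up predicted scores for the four sub-models. The real features, data, and fitted weights are not shown anywhere in these posts, so everything below is a placeholder illustration, not the actual code behind the table.

    # Blend 4 sub-model score predictions with weights chosen by optim()
    # so that the blended ranking best matches the known final ranking.
    set.seed(1)
    actual_rank <- 1:12                           # true finals order, 1 = champion
    preds <- cbind(NN  = runif(12, 0.84, 0.98),   # placeholder predicted scores
                   LM  = runif(12, 0.84, 0.98),   # (0-1 scale, as in the table)
                   GBM = runif(12, 0.84, 0.98),
                   H2O = runif(12, 0.84, 0.98))

    rank_loss <- function(w) {
      w <- abs(w) / sum(abs(w))                   # positive weights summing to 1
      blended <- as.vector(preds %*% w)           # weighted average of the models
      sum((rank(-blended) - actual_rank)^2)       # squared rank disagreement
    }

    # The rank loss is a step function, so a derivative-free method such as
    # simulated annealing is a reasonable choice for this sketch.
    fit <- optim(par = rep(0.25, 4), fn = rank_loss, method = "SANN")
    round(abs(fit$par) / sum(abs(fit$par)), 3)    # blended weights for NN/LM/GBM/H2O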
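For the trend-line extrapolation in post 6 (and the ~96.0 projection in post 11), the calculation is just a linear regression of scores against days-before-finals, evaluated at day 0. The numbers below are invented stand-ins for the max scores plotted in red, not the real season data.

    # Extrapolate a season scoring trend to finals night (day = 0).
    days_out  <- c(-35, -28, -21, -16, -12, -9)          # days before finals
    max_score <- c(88.2, 90.1, 92.0, 93.6, 95.0, 95.9)   # hypothetical top scores

    trend <- lm(max_score ~ days_out)
    predict(trend, newdata = data.frame(days_out = 0))   # projected finals-night score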
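For the bootstrapping comment in post 13, one simple way that could look in R is to resample the observed show-to-show score gains and add a couple of resampled gains onto the latest score, giving a simulated distribution of plausible finals scores. The season scores here are made up for the sake of the sketch.

    # Bootstrap show-to-show score gains to simulate a finals score distribution.
    set.seed(1)
    season_scores <- c(88.5, 90.0, 91.2, 92.5, 93.4, 94.6, 95.3)  # hypothetical
    gains <- diff(season_scores)                                   # observed gains

    sims <- replicate(10000, {
      # add two resampled gains (e.g. quarters -> semis -> finals) to the last score
      tail(season_scores, 1) + sum(sample(gains, 2, replace = TRUE))
    })
    quantile(sims, c(0.05, 0.5, 0.95))   # plausible range for the finals score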
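For the GBM description in post 14, here is a rough sketch of a stump-based GBM on five show scores S1-S5 using the gbm package, with the relative-influence summary standing in for the variable-importance step. The data frame is fabricated, and this is not the actual model used for the predictions.

    # Stump-based GBM on the five show scores S1..S5 (fabricated data),
    # then relative influence to see which score mattered most.
    library(gbm)

    set.seed(1)
    n  <- 600                                      # ~ top-12 corps, 2005-2014
    S1 <- runif(n, 80, 92)
    S5 <- S1 + runif(n, 2, 6)                      # later scores climb
    train <- data.frame(S1 = S1,
                        S2 = S1 + runif(n, 0.5, 2),
                        S3 = S1 + runif(n, 1.0, 3),
                        S4 = S1 + runif(n, 1.5, 4),
                        S5 = S5,
                        final = S5 + runif(n, 0.5, 2))

    fit <- gbm(final ~ S1 + S2 + S3 + S4 + S5, data = train,
               distribution = "gaussian",
               n.trees = 500, interaction.depth = 1,  # depth 1 = stumps
               shrinkage = 0.05)

    summary(fit, plotit = FALSE)                   # relative influence per feature

    # The H2O analogue with dropout (argument names from the h2o package):
    # library(h2o); h2o.init()
    # h2o.deeplearning(x = paste0("S", 1:5), y = "final",
    #                  training_frame = as.h2o(train),
    #                  activation = "RectifierWithDropout",
    #                  hidden = c(32, 32), hidden_dropout_ratios = c(0.2, 0.2),
    #                  epochs = 50)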