Sunday, 9 February 2014

Kripparrian vs. the Arena Value Application

There is an arena assist application called Arena Value. Using image capture and recognition, the application reads each three-card selection from the Hearthstone window and gives you a numerical rating for each card. Higher-rated cards are considered superior picks for an arena deck. The application's programmer has a thread on the Hearthpwn forums, but how these ratings are computed is still a bit of a mystery. From the little that the creator has said, it seems to be an amalgamation of arena card lists from various arena experts (such as Hafu, Trump, Kripparrian, and others). Whether the program takes into account things like mana curve or existing card synergies is not known. Since the application isn't open-source, the rating algorithm will stay a secret for now.
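Since the real algorithm isn't public, here is only a guess at what "amalgamating" several expert lists could look like: a plain average of each card's score across the lists that rate it. The function name `amalgamate_ratings`, the placeholder card names, and every score below are made up for illustration; none of it is Arena Value's data or method.

```python
from statistics import mean


def amalgamate_ratings(expert_lists):
    """Average each card's score across the expert lists that rate it.

    Hypothetical helper: this is NOT Arena Value's actual algorithm,
    which has not been published.
    """
    scores = {}
    for ratings in expert_lists:
        for card, score in ratings.items():
            scores.setdefault(card, []).append(score)
    return {card: mean(values) for card, values in scores.items()}


# Made-up scores for placeholder cards, not real ratings from any expert.
expert_a = {"Card A": 50, "Card B": 70}
expert_b = {"Card A": 60, "Card B": 80, "Card C": 30}

combined = amalgamate_ratings([expert_a, expert_b])

# Recommend the highest-rated card among the three offered.
offered = ["Card A", "Card B", "Card C"]
best = max(offered, key=combined.get)
```

An average like this rates each card in isolation. It knows nothing about the mana curve or the synergies already in the deck, which is exactly the open question about the application.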

With the preamble out of the way, I was curious: how would the Arena Value application's picks differ from those of an expert arena player? I chose Kripparrian for this experiment.

(Be sure to click the images for larger, more readable versions of the deck lists and choices. The bold minions are Kripp's choices. The numbers are Arena Value's card ratings.)

Druid (9-3)
Kripparrian chose differently from Arena Value's recommendations eight times. The most interesting choices were on picks 17, 22, 23, and 24.

At around pick 16, Kripp did mention that he was short of early draw, and that he needed more 2-mana minions to even out his mana curve. Pick 17, Novice Engineer over Stormwind Champion and Silvermoon Guardian, would seem to play into that concern, as well as giving him added card draw. Pick 22 also seems to reflect his concern over the lack of early game draw; the Innervate gives him potential access to stronger cards in the early game.

Now, the Moonfire at pick 23 is the curious choice. He chose that card over Raid Leader and Silverback Patriarch. His early game was still weak at this point, and in my opinion Moonfire isn't a very good choice without spell damage cards; he only had a single Azure Drake. I probably would have chosen Raid Leader, simply because it's a 3-mana drop with a minion buff. Watching Kripp's matches, I don't recall Moonfire ever giving him much value. On the other hand, I'm not a self-sustaining/infinite arena player (7+ wins consistently); I'm only a break-even player (4+ wins consistently), so I can't legitimately question Kripp's choices.

The choice of Wild Growth on pick 24 was likely because he felt he was weak in card draw. Yes, he had two Ancient of Lore minions at that point (plus a Novice Engineer), but Ancient of Lore gives either card draw or health, so depending on the state of the game when he plays it, he might have to forgo the card draw.

(One note on this particular arena run: Kripp's last match at 9-2 bugged out, and he was unable to make any plays. He finished 9-3, but he had a good shot at doing better.)

Hunter (5-3)
Kripp stated that it is difficult as Hunter to protect minions on the board, so to offset that he chose Sunfury Protector. With most other classes, Demolisher would have been his card of choice.

On his second pick he wonders if he should build a serious deck by choosing Acidic Swamp Ooze, or take a chance and try his luck with beast synergy. He decides on beasts and chooses Starving Buzzard. One problem he admits to with arena beast decks is a lack of solid mid-game plays.

Having committed himself to beast synergy, Kripparrian chose against Arena Value ten times. There is not much to say about those picks, since he chose in favour of beast synergy nearly every time.

5-3 is certainly not the type of result expected from Kripparrian, but a weak deck is a weak deck, even in the hands of an expert player. He likely did better than most players would have with that deck, but it was certainly not the 7-3 (or better) that Kripp usually pulls from an arena run. Would he have fared better had he chosen more in line with Arena Value? Perhaps. Probably. It is obviously hard to know for sure, but the general consensus is that Hunter is a weak arena class to begin with.

Rogue (12-1)
Unlike with the Hunter run, Kripp took this draft seriously. He wasn't going to rely on chance to give him particular synergies. He did end up taking two Sprint spells, which he stated was an uncharacteristic choice, but he felt his mid-game was so weak that he would need the card draw in the late game.

Kripp made ten picks that weren't the "best" picks as per Arena Value's rating system. Granted, about five of those picks were toss-ups, with the best and second-best picks separated by only a couple of rating points.
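For anyone who wants to repeat this tally on their own drafts, here is a rough sketch of the counting I did by hand. The function name `compare_picks`, the two-point toss-up margin, the placeholder card names, and all of the ratings below are my own assumptions for illustration; they are not values taken from Arena Value or from Kripp's drafts.

```python
def compare_picks(draft, tossup_margin=2):
    """Count picks where the player deviated from the top-rated card.

    draft: list of (ratings, chosen) pairs, where ratings maps each
    offered card to its rating and chosen is the card the player took.
    Returns (disagreements, tossups). A toss-up is a disagreement where
    the top rating beats the chosen card's rating by at most
    tossup_margin points (an assumed threshold).
    """
    disagreements = 0
    tossups = 0
    for ratings, chosen in draft:
        gap = max(ratings.values()) - ratings[chosen]
        if gap > 0:
            disagreements += 1
            if gap <= tossup_margin:
                tossups += 1
    return disagreements, tossups


# Made-up ratings for placeholder cards, not data from the actual runs.
draft = [
    ({"Card A": 60, "Card B": 58, "Card C": 40}, "Card B"),  # toss-up
    ({"Card D": 75, "Card E": 50, "Card F": 45}, "Card E"),  # clear deviation
    ({"Card G": 55, "Card H": 30, "Card I": 20}, "Card G"),  # agrees with app
]

disagreements, tossups = compare_picks(draft)
```

A pick that ties for the top rating counts as agreeing with the application, since the gap is zero.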

The only interesting pick was Alarm-o-Bot, but considering his other two choices were murlocs, it made sense. Alarm-o-Bot was inconsequential in every match until Kripp's potential 12-0 match, in which the bot cost him the game.

At 8-0, Kripp did express surprise that this deck was doing so well. He expected 7 or 8 wins, tops. Patient Assassin was the unexpected star of the deck.

This was an interesting exercise. While Arena Value can help with card choice, it is still nothing but a tool, not an answer. The player still has to know their cards and what choices will add value to what they've already picked. And then they have to play those choices effectively.

Is Arena Value overpowered? Not in my opinion. While it can't be said for sure that Arena Value's ratings would result in worse decks, I'd like to think that skill and experience count above and beyond a simple algorithm. I'd like to think that where Kripp differed from the application, he was building the stronger deck. That said, Arena Value is certainly a valuable tool for inexperienced arena players and those looking to improve their arena deck building skills.

I may well do this again for arena streamers such as Hafu and Trump.


  1. This isn't an experiment.

    Your methodology is inherently flawed. If you want to get a real experiment going, you need these people on board. First off, you just randomly throw up decks that the arena player went with, state the number of differences, and then just talk about how well the deck performed. Since the deck builder can't play its own games, you have to have an actual player play games with, and then without, the deck builder. Then you have to make sure you get enough games to actually get a decent idea of how "good" the player is (roughly 20 games will get you started), then compare that to how they do with the deck builder choosing the deck for them.

    Then, ideally, you get a large sampling of such players, to determine the effectiveness of the program on people of different skill levels.

    Of course, this is probably out of your means, but even 3 runs with vs 3 without would be a respectable "experiment."

    To call this an experiment is simply a farce.

    1. Thanks, professor.

      Once you pick a deck, you're stuck with it. You can't go back later and try a different set of picks. So a true comparison of decks cannot be made. Apparently you don't know how arena works, and so are unaware of this.

      And having the subjects unaware made their picks pure. If they'd been using the app, their picks might have been affected by Arena Value's rating system.

      I don't have the time to do a really large sample, but you feel free if you want.

    2. The obvious test to me would be to have players with well known win rates by class, etc. in arena use the app solely for picking cards for like a month, then compare results. It'd be especially cool if there were a wide range of skill levels, but I suspect there aren't a whole lot of low end players with lots and lots of arena stats to compare against.

      Of course, this presupposes that the picker is pretty intelligent, actually constructing a deck rather than giving the highest amalgamated rating from some list. If it's just amalgamating ratings, it's just going to have people do stupid shit like take 5 flamestrikes in the draft.

      The other obvious test would be to calculate various scores for a whole shitload of drafts (median, mean, etc) and compare them to # of wins.

      Of course, if you're doing that, you could simply have it calculate card values based on how they affect win%, given a large enough data set.

    3. I would not say that this is a farce, but I believe that it is not impossible to conduct a more accurate experiment. You could request the co-operation of a few higher skill level players to do this experiment. The premise would be to draft an arena deck, and then record the decklist that was chosen by the player as well as the decklist drafted by the program.
      The rest would be difficult to conduct unless it is a dedicated experiment, as it would require the players to construct a "real life" proxy deck, since it would occasionally be impossible to recreate such a deck in constructed. The player could then play the two decks against a set (perhaps 10 or 15) of other arena decks drafted by other high skill level players in a similar scenario. In short, two decks would be constructed, one by the player and one by the program, each of which would be tested against the same set of decks to see if there is a significant difference.
      However, I do acknowledge that this would be a difficult endeavor without a lot of time and co-operation from the other players, for an experiment that is, in the end, rather inconsequential. Even so, because the game depends on luck of the draw as well as the skill of the player, the results are likely to be scattered and not necessarily accurate even with the set of 15 trials, and would require perhaps a few hundred. To conclude, I agree with Eric in that it is not a true experiment, but perhaps a rough analysis and evaluation of the program with some given evidence, comparing the card choices made by Kripp and the program with respect to the holistic view of the deck.

  2. I don't see the point of this... It doesn't have any info about whether or when Kripp used the Arena Value recommended picks and how well he did with that deck. How would we know he would do worse with it? I don't think it's justifiable to say that an experienced arena player would make a better deck than the one Arena Value recommends every time. I honestly think this blog post is just a summary of 3 of Kripp's runs haha.

  3. Can you not play the person's picked deck vs the app's in constructed? (This won't work when more than 2 of one card are picked.) Also, I guess it doesn't say how well the deck will do against the average deck.

    But I have to agree with the other commenter that you can't just say 'the arena app's deck was worse' based on very little evidence to compare them.

    1. An arena deck would be crushed consistently by constructed decks. An arena deck is just a random bunch of cards, whereas a constructed deck is built with synergy and win conditions in mind.

    2. What I meant was use constructed to make 2 decks:
      1. The choices Kripp made
      2. The choices the app made

      Now have deck 1 vs deck 2

  4. I really liked this, Poetic! I watched some of this stream, so it made it more interesting to read for sure. Thanks for posting.

  5. Kripp's style differs from some other players' styles. If Trump watched Kripp draft, he might differ on 5-8 cards as well. That doesn't mean that Trump is bad tho :X

  6. I'd be interested to see how this tool's picks compare to the picks of other, more serious streamers than Kripparrian, who, while a very good player, obviously plays more for entertainment than winning.

    Maybe use Trump or Hafu, who don't refuse to use certain classes, and will (probably) provide a closer metric to the value tool.

    1. Sure. I'll see about doing that in the next few days.

  7. I'm obviously late on this, but I wanted to write that I appreciated the writing tone of the document. Besides being an interesting and well-conducted experiment in itself, your reserved approach toward speculation in the results I felt lent the article objective authenticity, something that's rare to find in online write-ups these days. Cheers!