Playability Heuristics

Next up in the reading stack is Playability Heuristics for Mobile Games.

Drawing on the literature on usability heuristics, the authors (Korhonen and Koivisto) develop a set of playability heuristics for mobile games. In the paper, they present their motivations for developing these heuristics, the heuristics themselves, and the ‘results’ of their ‘validation’ of these heuristics.

Their heuristics are grouped into three categories: gameplay, mobility, and game usability. Their initial list of heuristics was made up of the following:

  • Don’t waste the player’s time
  • Prepare for interruptions
  • Take other persons into account
  • Follow standard conventions
  • Provide gameplay help
  • Differentiation between device UI and the game UI should be evident
  • Use terms that are familiar to the player
  • Status of the characters and the game should be clearly visible
  • The player should have clear goals
  • Support a wide range of players and playing styles
  • Don’t encourage repetitive and boring tasks

In order to validate these heuristics, six evaluators applied them to a selected application and noted all playability problems they encountered, both those covered by the list of heuristics and those that were not. The evaluators found 61 playability problems, but 16 of these were not adequately described by any of their heuristics (a quick tally of that coverage appears just after the list below). The authors therefore expanded their initial set of heuristics into three sublists (one for each ‘category’):

  • Game Usability
    • Audio-visual representation supports the game
    • Screen layout is efficient and visually pleasing
    • Device UI and game UI are used for their own purposes
    • Indicators are visible
    • The player understands the terminology
    • Navigation is consistent, logical, and minimalist
    • Control keys are consistent and follow standard conventions
    • Game controls are convenient and flexible
    • The game gives feedback on the player’s actions
    • The player cannot make irreversible errors
    • The player does not have to memorize things unnecessarily
    • The game contains help
  • Mobility
    • The game and play sessions can be started quickly
    • The game accommodates with the surroundings
    • Interruptions are handled reasonably
  • Gameplay
    • The game provides clear goals or supports player-created goals
    • The player sees the progress in the game and can compare the results
    • The players are rewarded and rewards are meaningful
    • The player is in control
    • Challenge, strategy, and pace are in balance
    • The first-time experience is encouraging
    • The game story supports the gameplay and is meaningful
    • There are no repetitive or boring tasks
    • The players can express themselves
    • The game supports different playing styles
    • The game does not stagnate
    • The game is consistent
    • The game uses orthogonal unit differentiation
    • The player does not lose any hard-won possessions
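
Circling back to the validation numbers above: here is the coverage arithmetic from the first round as a minimal sketch. The two counts come from the paper; everything else is illustrative.

```python
# Coverage tally from the first validation round: each playability
# problem the evaluators reported is either matched to one of the
# heuristics or flagged as uncovered.
problems_found = 61   # total playability problems reported
uncovered = 16        # problems no heuristic adequately described

covered = problems_found - uncovered
print(f"Covered: {covered}/{problems_found} ({covered / problems_found:.0%})")
# -> Covered: 45/61 (74%)
```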

This expanded set of heuristics was validated using the same process, only now with five different games. Based on this process, the authors draw the following conclusions:

  • Usability problems were both the easiest to identify with their heuristics and the easiest violations to make.
  • More mobility problems were found than expected.
  • Gameplay is the most difficult aspect of playability to evaluate.

Yowza, talk about a scattered paper. I mean, this bad boy is all over the place. It seems as though the authors’ thoughts simply haven’t gelled at all. Nevertheless, they do present what seem to be reasonable heuristics for the evaluation of playability. I have two primary problems with this paper.

First, the world of smartphones and mobile games has changed dramatically in the last decade. I would imagine a more recent look at playability is both available and more useful.

Second, while their heuristics seem reasonable, and they claim to have validated them, I can’t find any evidence of this. Do Korhonen and Koivisto not understand that merely using a set of heuristics doesn’t imply that those heuristics are valid? This leads to the bigger question of what it means for a set of heuristics to be valid. Do valid heuristics completely describe all possible problems? Is the ‘most’ valid set of heuristics the one that completely describes all possible problems with the fewest heuristics? I’m not sure. I am sure, however, that writing a list of heuristics and then applying them absolutely does not make them valid. The analysis necessary to do so just isn’t present in this paper. Even if the authors claim to have begun to validate their framework of heuristics, they certainly haven’t presented any such results here. While the work shows (showed) promise, I find this both misleading and frustrating.

Usability Heuristics

My reading this week included Nielsen’s Enhancing the Explanatory Power of Usability Heuristics. As usual, I’ll get my trivial beef out of the way up front.

First, the paper is downright painful to read. The English-as-a-second-language rule buys back a few points for Nielsen here, but seriously:

Note that it would be insufficient to hand different groups of usability specialists different lists of heuristics and let them have a go at a sample interface: it would be impossible for the evaluators to wipe their minds of the additional usability knowledge they hopefully had, so each evaluator would in reality apply certain heuristics from the sets he or she was supposed not to use.

Sure, I’m nitpicking, but that sentence makes my inner ear bleed.

Before going any further, some orientation with respect to the aim of the paper is in order. Surrounding the multiple self-citations Nielsen makes right out of the gate (before the third word of the paper), he defines heuristic evaluation as

a ‘discount usability engineering’ method for evaluating user interfaces to find their usability problems. Basically, a set of evaluators inspects the interface with respect to a small set of fairly broad usability principles, which are referred to as ‘heuristics.’

(I’ll forgo my opinion that usability should be concerned with issues beyond just those in the interface itself…) A number of batteries of these usability heuristics have been developed by different authors, and in this paper Nielsen’s aim is to synthesize ‘a new set of usability heuristics that is as good as possible at explaining the usability problems that occur in real systems.’ In short, Nielsen compiles a master list of 101 heuristics from seven lists found in the literature. Armed with this master list, he examines 249 usability problems across different stages of development and types of interfaces. Each of the heuristics was given a grade for how well it explained each of the 249 problems. A principal components analysis (PCA) of these grades revealed that no single heuristic accounts for a large portion of the variability in the problems he examined.
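
Nielsen doesn’t publish the underlying grade matrix, but the analysis he describes amounts to something like the sketch below. The grades here are random stand-ins (the 0–5 scale is my assumption for illustration), so the printed numbers are meaningless; the shape of the computation is the point.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for Nielsen's data: 249 usability problems, each graded
# (0-5 here, purely illustrative) on how well each of the 101
# heuristics explains it.
grades = rng.integers(0, 6, size=(249, 101)).astype(float)

pca = PCA()
pca.fit(grades)

# Fraction of the variability in the problems captured by each
# principal component; Nielsen's "factors" are built from these.
print(pca.explained_variance_ratio_[:7])
```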

After his PCA, Nielsen groups individual heuristics into larger factors, essentially heuristic categories. In his opinion, seven of these categories warrant presentation; they are given here in decreasing order of PCA loading as calculated by Nielsen:

  • Visibility of system status
  • Match between system and real world
  • User control and freedom
  • Consistency and standards
  • Error prevention
  • Recognition rather than recall
  • Flexibility and efficiency of use

His presentation of these factors and their component heuristics is troubling and confusing. First, the highest PCA loading of any of these factors is 6.1%. Not only is this an exceedingly small amount of explanatory power, it represents the aggregated contribution of 12 individual heuristics! Furthermore, the individual heuristic loadings themselves seem to be at odds. As an example, the heuristic speak the user’s language taken from one source in the literature and the identically phrased speak the user’s language taken from another source give respective loadings of 0.78 and 0.67. Why do two identically phrased heuristics have different loadings? Furthermore, why are two identically phrased heuristics even present in the master list at all? This should, at the very least, be addressed by the author. Without some sort of explanation, I am wary of taking Nielsen’s PCA results seriously. Nielsen sweeps this under the rug, stating that ‘it was not possible to account for a reasonably large part of the variability in the usability problems with a small, manageable set of usability factors.’ (That, or some data preprocessing or an upgraded PCA gizmo was in order…)

Nielsen states that 53 factors are needed to account for 90% of the variance in the usability problems in the dataset. I’m lost. The factors for which Nielsen did show the component heuristics had an average of 10 heuristics each. With only 101 total heuristics, how does one arrive at 53 factors (in addition to the others that account for the remaining 10% of variability)? Is Nielsen shuffling heuristics around into different factors to try to force something to work? To make matters worse, Nielsen states that ‘we have seen that perfection is impossible with a reasonably small set of heuristics’. No, you’re missing the point, Nielsen. Perfection is impossible even with a very large set of heuristics. At this point, I’m beginning to lose faith that this paper is going anywhere meaningful…
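
For concreteness, the ‘53 factors for 90% of the variance’ claim is the kind of number that falls out of a cumulative-variance computation like the one below (again on random stand-in data, so the printed count is not Nielsen’s):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
grades = rng.integers(0, 6, size=(249, 101)).astype(float)  # stand-in grades

pca = PCA().fit(grades)

# Count how many principal components are needed before the
# cumulative explained variance first reaches 90%.
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components_90 = int(np.argmax(cumulative >= 0.90)) + 1
print(f"Components needed for 90% of variance: {n_components_90}")
```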

So, since perfection is impossible, Nielsen pivots to a new lens for the data. Now, it’s a head-to-head match of the individual lists of heuristics gathered by Nielsen. Here, he ‘consider[s] a usability problem to be “explained” by a set of heuristics if it has achieved an explanation score of at least 3 (“explains a major part of the problem, but there are some aspects of the problem that are not explained”) from at least one of the heuristics in the set.’ Strange; I guess we are now ignoring Nielsen’s previous statement that ‘the relative merits of the various lists can only be determined by a shoot-out type comparative test, which is beyond the scope of the present study’… Nevertheless, based on this approach, Nielsen gives the ten heuristics that explain all usability problems in the dataset and the ten that explain the serious usability problems in the dataset. With this new analysis in hand, and after jumping through several hoops (I’m not entirely clear on how Nielsen’s data were rearranged to make this new analysis work), Nielsen concludes that ‘it would seem that [these lists of ten heuristics indicate] the potential for the seven usability factors to form the backbone of an improved set of heuristics.’ Nielsen then states that two important factors are missing: error handling and aesthetic integrity… so we’ll add those to the list, too. In other words: even though my data don’t bear this out, I’m adding them because they’re important to me, dammit.
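
To make the ‘explained by a set’ criterion concrete: given an explanation-score matrix, finding a small set of heuristics that explains every problem is a set-cover exercise. Nielsen doesn’t spell out his selection procedure, so the greedy pass below is only my guess at one plausible approach, run on random stand-in scores.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in explanation scores: 249 problems x 101 heuristics.
scores = rng.integers(0, 6, size=(249, 101))

# A problem counts as "explained" by a set of heuristics if at least
# one heuristic in the set scores >= 3 on it.
explains = scores >= 3  # boolean problem-by-heuristic matrix

# Greedy set cover: repeatedly pick the heuristic that explains the
# most still-unexplained problems.
unexplained = np.ones(explains.shape[0], dtype=bool)
chosen = []
while unexplained.any():
    gains = (explains & unexplained[:, None]).sum(axis=0)
    best = int(gains.argmax())
    if gains[best] == 0:
        break  # remaining problems are explained by no heuristic
    chosen.append(best)
    unexplained &= ~explains[:, best]

print(f"Heuristics selected to explain all problems: {len(chosen)}")
```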

I’m utterly confused. How is it that one can take real data, slice and dice the analysis several ways, never really get the data to shape up and prove the point, and then act as though they do? Add to this the necessary hubris to come out and say, ‘Hey, even without the data to prove it, I’m stating that these are equally important factors’, and I’m left wholly unimpressed with this paper.