Determining how difficult a word is to type on a QWERTY keyboard



I'm looking for a reasonably simple algorithm to determine how difficult it is to type a word on the QWERTY layout.

The words would not necessarily be dictionary words, so a list of commonly mistyped words or the like is not an option. I'm sure there must be an existing, well-tested algorithm, but I can't find anything.

Can anyone offer any help or advice? I'm coding the algorithm in Python, but any other language or pseudo-code is welcome.

Herman Schaaf

Posted 2010-12-16T09:49:20.460

Reputation: 23 583

Maybe if you read about the logic behind the Dvorak keyboard, it could help you. – ruslik – 2010-12-16T10:02:22.230


A crude solution would be to get data on typing errors and work out the error rate for each key. It's problematic because typos are often contextual (transpositions, confusion between similar words, common endings, etc.). To take some context into account you could use 2-grams instead (error rates for each key following each other key).

– Michael Dunn – 2010-12-16T10:28:40.097



There is this comparison between QWERTY, Colemak and Dvorak layouts, which calculates the distance between the keys typed, the percentage of keys on the same hand, etc. with source code in Java. These metrics in combination should give a very good estimate of the 'typeability' of a word.
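For illustration, here is a minimal Python sketch of two of the metrics mentioned above: the distance travelled between consecutive keys and the percentage of consecutive key pairs typed by the same hand. It is not the code from the linked comparison. The row stagger values in `ROW_OFFSETS`, the left/right split in `LEFT_HAND`, and the function name `typeability_metrics` are assumptions made for this example.

```python
import math

# Letter rows of a QWERTY keyboard.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Assumed horizontal stagger of each row, in key widths.
ROW_OFFSETS = [0.0, 0.25, 0.75]
# Assumed touch-typing split: these letters belong to the left hand.
LEFT_HAND = set("qwertasdfgzxcvb")

# (x, y) position of every letter key.
KEY_POS = {
    ch: (col + ROW_OFFSETS[r], float(r))
    for r, row in enumerate(ROWS)
    for col, ch in enumerate(row)
}


def typeability_metrics(word):
    """Return (total key-to-key distance, % of consecutive pairs on the same hand)."""
    letters = [c for c in word.lower() if c in KEY_POS]
    pairs = list(zip(letters, letters[1:]))
    distance = sum(math.dist(KEY_POS[a], KEY_POS[b]) for a, b in pairs)
    same_hand = sum((a in LEFT_HAND) == (b in LEFT_HAND) for a, b in pairs)
    same_hand_pct = 100.0 * same_hand / len(pairs) if pairs else 0.0
    return distance, same_hand_pct


if __name__ == "__main__":
    for w in ("as", "aj", "minimum"):
        print(w, typeability_metrics(w))
```

How to weight the two numbers against each other is left open here; that would have to be tuned against real typing data.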

Herman Schaaf

Posted 2010-12-16T09:49:20.460

Reputation: 23 583

Zombified this thread! That's a dead link. Happen to know if the source code still lives anywhere? – Alexander Lucas – 2016-10-11T23:28:12.347


Take out your Scrabble set, note down the scores for each letter, total the scores for a word, hey presto you have your algorithm. Not sure it entirely satisfies your requirements, but it might point you in a useful direction. You might, for instance, want to assign scores not only to individual letters but also to di- and tri-grams.

I'm not aware of any existing source of the information you need. Perhaps you could come up with your own letter scores by examining the keyboard and assigning higher scores to the more difficult letters: so 1 for 'a', 8 for 'q', 2 for 'm', and so on.

EDIT: I seem to have confused people more than I usually do when I reply on SO. Here are the bare bones of my proposal:

a) List all trigrams and digrams which occur in English (or your language). To each of them assign a difficulty-of-typing score. Do the same for individual letters (after all a 4 letter word might be composed of a trigram and a letter rather than two digrams).

b) Score the difficulty of typing a word as the sum of the difficulty of typing its components.

As for the difficulty scores, I haven't a clue, but you could start from 1 for a letter on the home keys on a keyboard, 2 for a letter which uses the index fingers but is not a home key, 3 for a letter which uses the 2nd or 3rd fingers on your hand, and so on. Then for digrams, score low for easy letters on left and right (or right and left) in sequence, high for difficult letters on one hand in sequence (e.g. qz, though that's perhaps not valid for English). And on you go.
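A rough Python sketch of the proposal above, restricted to single letters and digrams. The letter groups follow the 1/2/3 scheme described; the score of 4 for the remaining (little-finger) letters, the same-hand digram penalty, and all names (`letter_score`, `digram_score`, `word_difficulty`) are assumptions picked for illustration, not tested values.

```python
import string

# Letters under the fingers at rest (the home keys).
HOME_KEYS = set("asdfjkl")
# Index-finger letters that are not home keys.
INDEX_KEYS = set("rtgvbyuhnm")
# Middle- and ring-finger letters that are not home keys.
MIDDLE_RING_KEYS = set("ewxcio")
# Assumed touch-typing split: these letters belong to the left hand.
LEFT_HAND = set("qwertasdfgzxcvb")


def letter_score(ch):
    """Difficulty of a single lowercase letter (1 = easiest)."""
    if ch in HOME_KEYS:
        return 1
    if ch in INDEX_KEYS:
        return 2
    if ch in MIDDLE_RING_KEYS:
        return 3
    return 4  # assumption: q, z, p (little finger, off the home keys)


def digram_score(a, b):
    """Extra cost of typing b right after a.

    Assumption: alternating hands is free; two letters on the same
    hand cost as much as the harder of the two letters.
    """
    if (a in LEFT_HAND) != (b in LEFT_HAND):
        return 0
    return max(letter_score(a), letter_score(b))


def word_difficulty(word):
    """Sum of letter scores plus digram penalties; non-letters are ignored."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    singles = sum(letter_score(c) for c in letters)
    pairs = sum(digram_score(a, b) for a, b in zip(letters, letters[1:]))
    return singles + pairs


if __name__ == "__main__":
    for w in ("aj", "as", "qz"):
        print(w, word_difficulty(w))  # prints 2, 3 and 12 respectively
```

Trigram scores could be added the same way, as one more term in the sum.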

High Performance Mark

Posted 2010-12-16T09:49:20.460

Reputation: 68 422

why/how is q 8 times harder to type than a?! – fearofawhackplanet – 2010-12-16T10:17:18.123

So make the cost of typing a q 1.87 times the cost of typing an a. I'm offering ill-thought-through, spontaneous advice, not trying to spoon-feed the OP a solution. – High Performance Mark – 2010-12-16T10:25:45.813

ok point taken, i actually like the basic idea (though i'd argue you need to look at groupings of letters, not single letters) but the way it reads to me you are suggesting a commonality between the frequency with which a letter occurs in natural language and how hard it is to type. – fearofawhackplanet – 2010-12-16T10:45:21.163

-1 Scrabble scores represent the occurrence rate of alphabet letters in the game's locale. They have no relation to typing difficulty on a QWERTY keyboard. – Petrus Theron – 2010-12-16T16:27:40.603


I don't have any algorithms to propose, but a few hints:

  • I use both hands to type, so the keyboard is roughly split into two halves. I frequently have coordination issues between the two hands: each hand types its letters in the "right" order, but the interleaving is wrong. This is especially true if one hand has more letters to type than the other. A typical example is "the": the left hand types t and e, and the right hand types h.

  • "slips" are frequent: one often misses the key and hits another key instead. "Additions" and "deletions" are frequent too, i.e. typing an extra key or not pressing a key hard enough. This means that (obviously) the more letters there are, the harder it is to get the word right.

  • mixed case makes it harder: it requires synchronization between pushing CAPS and hitting the keys, so it's likely that the neighbouring letters won't have the right upper/lower case.
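These hints do not amount to an algorithm, but they can be turned into simple per-word features. The Python sketch below is only an illustration: the left/right split of the keyboard and the choice of features (word length, letters per hand, longest run on one hand, number of upper/lower case changes) are assumptions, and `hint_features` is a made-up name.

```python
from itertools import groupby

# Assumed touch-typing split of the letter keys between the hands.
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("yuiophjklnm")
LETTERS = LEFT_HAND | RIGHT_HAND


def hint_features(word):
    """Collect simple features related to the hints above; non-letters are ignored."""
    letters = [c for c in word if c.lower() in LETTERS]
    hands = ["L" if c.lower() in LEFT_HAND else "R" for c in letters]
    # Longest stretch typed by one hand without the other hand taking over.
    longest_run = max((len(list(g)) for _, g in groupby(hands)), default=0)
    # Number of places where the case switches between adjacent letters.
    case_changes = sum(
        a.isupper() != b.isupper() for a, b in zip(letters, letters[1:])
    )
    return {
        "length": len(letters),
        "left": hands.count("L"),
        "right": hands.count("R"),
        "longest_same_hand_run": longest_run,
        "case_changes": case_changes,
    }


if __name__ == "__main__":
    print(hint_features("the"))
    print(hint_features("McDonald"))
```

For "the" this reports two letters on the left hand and one on the right, matching the example in the first hint. How these features map to an actual error probability would have to come from measured data.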

Hope this helps...

Matthieu M.

Posted 2010-12-16T09:49:20.460

Reputation: 199 004


I think the Manhattan distance could be the closest thing to what you are looking for. It measures the distance from the source to the target along grid lines, i.e. as the sum of the horizontal and vertical offsets rather than as a straight line.

As for a Python implementation for your specific need (difficulty on QWERTY), you will have to write one yourself; otherwise, a few Manhattan distance implementations can be found if you google for "n puzzle solver in python".

Senthil Kumaran

Posted 2010-12-16T09:49:20.460

Reputation: 33 610

Manhattan distance between what 2 points ? – High Performance Mark – 2010-12-16T09:57:55.507

Manhattan distances between keys could be useful only for 1-finger typing. – ruslik – 2010-12-16T10:00:45.383

sum(distance for character in the word to its destination in QWERTY)? – Senthil Kumaran – 2010-12-16T10:15:54.130

You assign 10 starting points for the initial positions of the fingers and then calculate the length of the path you would need to walk when typing the word using the Manhattan distance. Just a guess. – Björn Pollex – 2010-12-16T10:20:16.533
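The guess in the last comment could be sketched in Python roughly as follows. This is only one possible reading of it: the sketch tracks the eight fingers that type letters (the thumbs on the space bar are not modelled), ignores the row stagger, assigns keys to fingers by column as in standard touch typing, and leaves each finger where it last typed instead of returning it home. All of those choices, and the name `finger_travel`, are assumptions.

```python
# Letter rows of a QWERTY keyboard; a key's position is (row, column).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
HOME_ROW = 1
# Finger (0 = left little finger ... 7 = right little finger) for each column.
FINGER_OF_COLUMN = [0, 1, 2, 3, 3, 4, 4, 5, 6, 7]
# Home-row column where each finger rests: a s d f / j k l ;
HOME_COLUMN = [0, 1, 2, 3, 6, 7, 8, 9]

KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def finger_travel(word):
    """Total Manhattan distance walked by all fingers while typing the word."""
    # Every finger starts on its home key.
    positions = [(HOME_ROW, col) for col in HOME_COLUMN]
    total = 0
    for ch in word.lower():
        if ch not in KEY_POS:
            continue  # ignore anything that is not a letter key
        row, col = KEY_POS[ch]
        finger = FINGER_OF_COLUMN[col]
        cur_row, cur_col = positions[finger]
        total += abs(row - cur_row) + abs(col - cur_col)
        positions[finger] = (row, col)  # the finger stays on the key it pressed
    return total


if __name__ == "__main__":
    for w in ("asdf", "the", "qz"):
        print(w, finger_travel(w))  # prints 0, 4 and 3 respectively
```

Because each finger is tracked separately, this avoids the objection that plain key-to-key distances only describe one-finger typing.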