Intelligence Quotient scores and Intelligence Quotient testing do not work the way they are depicted in most fiction. Since a couple of tropes deal with the two most common (and unfortunately flawed) portrayals, here's the lowdown on how IQ actually works.
Accuracy and Scope
There is still much debate about what an IQ test actually measures: whether it is actually a good measure of intelligence, a measure of only a part or one type of intelligence, or really a measure of test-taking skills.

Television and media often perpetuate these tropes about IQ:
- IQ is a direct measure of intelligence, synonymous with intelligence. An intelligent person is one with a high IQ, and "high IQ" is shorthand for high intelligence, just as high decibels are indicative of a loud sound (even if a person doesn't understand the scale for either).
- Even if IQ tests don't measure intelligence 100% accurately, they're at least measuring something basic and unchanging about a person. A person's brain has 150 IQ points just as a computer's processor has 20 cores and an average clock rate of 3.2 GHz.
Research suggests not: intelligence seems to change over a person's lifetime, and certain types of training help people boost their IQ. Additionally, one can score lower on an IQ test as a result of brain damage, medication side effects, or cognitive decline associated with aging, though this arguably reflects a shortcoming of the tests used to measure intelligence rather than of intelligence itself. Lastly, IQ scores have seen something called the Flynn effect: a steady rise over the past half-century, slow but across the board (although it is possibly now dying out). To keep 100 as the mean, the tests themselves have to be made harder.
Additionally, social, psychological, and cultural factors can often skew a score. For example, some studies have suggested that African-Americans score better on IQ tests when they think their score is only going to be compared to those of other African-Americans. Some of the first adult intelligence tests, administered by the US for WWI draftees, showed that African-Americans scored much better on a difficult question (calculating how many blocks were in a pyramid based on one view of the pyramid) than on "easy" questions asking for opposites: day and night, black and white, for the simple reason that their inferior schooling had never taught them what the word "opposite" meant.
Fluid intelligence vs crystallized intelligence
While the most common use of IQ in fiction is to portray high intelligence, the most important use of IQ testing in the real world is to identify low intelligence for things like identifying learning disabilities in schooling, selection for the armed forces (most armed forces have a limit of around IQ 85), qualification for disability payments, and even the ability to be held accountable for crimes (in the US, a prisoner cannot be administered the death penalty if his IQ is below 70).
Regarding cultural skew, many IQ tests include a core measure of a subject's general knowledge of things such as famous historical people/events (e.g., Who was Pocahontas?) or social norms/mores (e.g., What are you supposed to do if you find a wallet on the street?). Many professionals critique this as being culturally biased in favor of White/Caucasian culture to the detriment not only of minorities, but also of immigrants. While handing the wallet to the police or tracking down its owner might seem like a moral good, someone who fears racial persecution or false accusations of theft might feel more comfortable leaving it on the ground.
However, this practice is generally reserved for brief mental status exams that may be given prior to an IQ test in contexts like autism and developmental evaluation. (While autism is not necessarily a disorder of low intelligence, it is standard to perform an IQ test alongside an autism evaluation to look for any comorbid patterns of learning disability.)
The modern IQ test, in theory, is primarily designed to measure the "G factor," a construct that many psychologists believe estimates a person's "fluid intelligence."
- Fluid intelligence is one's ability to learn, as well as to perform a variety of simple verbal and visual tasks believed to more closely reflect one's cognitive processing skills than one's existing body of knowledge. It is believed to peak in the 20s, when most early brain development is complete and atrophy isn't a concern.
- This is distinct from crystallized intelligence, which includes general knowledge, experience, and indeed, what one has learned in school. It includes one's vocabulary, knowledge of history, and memorization of mathematical formulas. It is believed to peak later in life, as one has had more years and opportunities to acquire this knowledge.
The distinction between fluid and crystallized intelligence was identified by psychologists Cattell and Horn, who were both highly influential on the direction of the modern IQ test. Cattell and Horn realized that some forms of intelligence tended to diminish with age, while others tended to stay strong. They also described a task that could be solved using either crystallized or fluid intelligence: "There are 100 patients in a hospital. Some (an even number) are one-legged but wearing shoes. One-half of the remainder are barefooted. How many shoes are being worn?"
An educated person may solve this with algebra, a mental shortcut considered part of crystallized intelligence. However, a person who never took algebra would have to reason through the problem: the one-legged patients wear one shoe each, while the two-legged patients average one shoe each (half of them wear two shoes, half wear none). The average across all 100 patients is therefore one shoe per person, so exactly 100 shoes are being worn. This ad hoc solution was thought of as fluid.
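The puzzle's invariant can also be checked by brute force. A minimal Python sketch (the function name is my own, purely for illustration):

```python
# Brute-force check of the hospital shoe puzzle: for any even number of
# one-legged, shoe-wearing patients, the total shoe count is always 100.
def shoes_worn(one_legged, total=100):
    two_legged = total - one_legged
    shod_two_legged = two_legged // 2  # half of the remainder are barefoot
    return one_legged * 1 + shod_two_legged * 2

# The answer never depends on how many patients are one-legged.
assert all(shoes_worn(n) == 100 for n in range(0, 101, 2))
print(shoes_worn(30))  # 100
```

The algebraic and "fluid" solutions are two routes to the same fact: each one-legged patient contributes one shoe, and the two-legged patients average one shoe apiece.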
Obviously, many questions are easier if one knows a shortcut, such as multiplying the rows and columns of a grid to find out the number of squares inside, as opposed to counting them yourself. While both are considered relevant to many IQ tests, the modern IQ test is focused on fluid intelligence.
As such, IQ tests are intended to measure a number of areas of fluid intelligence, and tend to avoid specific ones like the ability to do mental arithmetic very fast. Mental arithmetic is a skill which (like juggling or card-counting) can be taught, and has more to do with memorizing tables of simple equations and working with mnemonics than innate intelligence. Pattern recognition is emphasized over math skills in most tests. There's also a hotly-debated theory that there are many kinds of specialized intelligences, some of which an IQ test may not even bother assessing. IQ tests may also require skills that people with brain injuries may have lost but which don't actually affect fluid intelligence, such as the ability to do math in one's head at all, or to understand perspective. Finally, a full-scale IQ number does not tell people which areas a person excelled in: a person with an average-range IQ may have performed exceptionally in the verbal reasoning and comprehension portion, but bombed the working memory or processing speed portions.
Even tests theoretically designed to avoid general knowledge subjects (which are generally considered separate from the "G factor") might still inadvertently include them, especially when analogies are involved. A person might be asked "Paganini is to violin as Renoir is to [blank]", with the correct response being "paintbrush." But one could just as easily say "canvas" or "palette", and no one is ever asked "Numan is to synthesizer as Greenblatt is to..." – it seems the capacity to memorize pop culture facts isn't as important as memorizing "high art" facts.
Additionally, language-related sections of IQ tests might be biased towards standard or formal dialects, ignoring that non-standard or casual manners of speaking are used even by many highly-trained professionals (what person avoids saying "gonna" in the workplace?), and that highly intelligent individuals tend to have a more abstract and playful relationship with language. Analogies, for instance, frequently list choices that are equally or even more valid from perspectives the test creator never considered.
The line between fluid and crystallized intelligence is debated, as is the belief that the "G factor" exists at all.
This takes us to the next point: with any tool—but especially with psychological tests—you need to be aware of what the thing is trying to measure, and whether it actually accomplishes what it claims to.
The original scope of the IQ test
The original IQ test, the Stanford-Binet, was designed simply to identify what grade of school you should be in versus which grade you actually were in, with an eye towards identifying kids with developmental disabilities. In other words, your Stanford-Binet IQ number gets smaller every year as you progress through numerically-larger grades, and stops being meaningful the moment you get your diploma. Obviously, a test that does not use school grades as a yardstick (e.g., the Wechsler Adult Intelligence Scale) avoids this problem.

IQs were originally given as ratios: (mental age / chronological age) * 100 = IQ. Thus a 6-year-old who scored as well as the average 9-year-old would have an IQ of 150 (9 / 6 * 100). This system matches contemporary IQ scores quite well, at least until you get more than about three standard deviations from the mean, after which point ratio IQ scores stop being distributed on a normal curve. Thus, if IQ is meant in the original sense, then it's at least mathematically possible that a character could have an IQ of 300: they would simply have had to be functioning at the level of a twelve-year-old when they were four. Of course, this didn't really work for adults. At any rate, IQs haven't been ratio-based for decades.
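The ratio formula is simple enough to sketch in a few lines of Python (the function name is illustrative, not from any test manual):

```python
def ratio_iq(mental_age, chronological_age):
    """Historical ratio IQ: (mental age / chronological age) * 100."""
    return mental_age / chronological_age * 100

print(ratio_iq(9, 6))   # 150.0: a 6-year-old scoring like an average 9-year-old
print(ratio_iq(12, 4))  # 300.0: the only route to a "300 IQ" under the old scheme
```

Note how the formula breaks down for adults: a 40-year-old would need a "mental age" of 60 just to score 150, which is why modern tests abandoned ratios in favor of deviation scoring.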
Range
Improbably High I.Q. and Improbably Low I.Q. are staples of the writer's trade; in reality, IQs range somewhere between 50 and 200. While in theory the average (statistical mean) IQ is 100, in practice the average of people on the street tends to be slightly higher, because people with IQ scores under 70 are typically under various levels of care or supervision; the difference is small, though, since the vast majority of people fall into the "average" range. Additionally, due to cognitive sorting, people tend to associate with others of similar intelligence, so it would not be uncommon for a college grad with an IQ of 120 to have friends who average about the same.

IQ test scores are designed to follow a normal (bell curve) distribution, meaning that, if an IQ test is normed at 100 and has a standard deviation of 15 points, about 68% of the population has an IQ between 85 and 115 (one standard deviation from the norm), and fully 95% of people are between 70 and 130. Mensa, the best-known international society for people with high IQs, requires a score of at least 132 on the Stanford-Binet or Wechsler tests, corresponding to the 98th percentile. IQs over 145 (or under 55) occur in about one person in a thousand; IQs over 160 (or under 40), four standard deviations from the norm, in about one in thirty thousand. The more ridiculously high levels that come up in fiction become difficult even to calculate in theory, but an IQ over 190 (six standard deviations from the norm) would be about a one-in-a-billion event, while an IQ over 229 (8.6 standard deviations from the norm) would be rarer than one in a hundred quadrillion: an interstellar level of IQ. For more numerical stats, see the analysis pages of Improbably High I.Q. or Improbably Low I.Q..
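Those rarity figures fall straight out of the normal distribution. A small sketch using Python's standard library, assuming the common mean-100, SD-15 scale (the function name is my own):

```python
from statistics import NormalDist

# The modern "deviation IQ" scale: mean 100, SD 15.
# (An assumption; some tests use SD 16 or 24 instead.)
iq = NormalDist(mu=100, sigma=15)

def one_in(score):
    """Approximate one-in-N rarity of scoring at or above `score`."""
    upper_tail = 1 - iq.cdf(score)
    return round(1 / upper_tail)

print(one_in(130))  # roughly 1 in 44 (two SDs above the mean)
print(one_in(145))  # roughly 1 in 740 (three SDs)
print(one_in(160))  # roughly 1 in 31,600 (four SDs)
```

The tail shrinks so fast that every additional standard deviation makes a score dozens to hundreds of times rarer, which is why fictional three-digit-plus IQs are effectively uncomputable in practice.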
That far from the middle, test makers have trouble finding enough people to produce a nice reliable sample. Even then, and even assuming such a person had nothing better to do than help psychiatrists norm their IQ tests, the same person would likely achieve slightly different scores each time due to differences in the tests, the specific questions, and the conditions under which the test was taken.
Measured IQ, like most types of human variation, does not exactly follow a normal distribution in practice. The most obvious deviation is that very low scores are much more common than very high scores, because measured IQ can be greatly lowered but not raised by various disorders and traumas affecting the brain.
It's also not actually certain whether the distribution of human intelligence actually is in the form of a bell curve. If it's not, then formulating IQ tests specifically to create such a curve means that inaccuracy is being built into them.
Precision
A test that can successfully classify 95% of the population will be adequate for nearly all conceivable uses... but eventually it will run into a ceiling. If two people get a perfect score, the test provides no way to measure which of them is more intelligent. A more difficult test must be used, and this is not commonly done. Mensa and other high-IQ societies are some of the few groups interested in quantifying IQ scores at the extreme high end of the range. Since a special test must be used to give an accurate and precise score for these individuals, and since most people never need one, it's common for people with high intelligence not to know their exact IQ score beyond a fuzzy range.

Similarly, if two people both get every single question wrong (or get the number of correct answers you'd typically get from randomly guessing), the test does not tell you which one is less intelligent. Most standard IQ tests only go down to 40 or 50 IQ at the lowest. Unlike at the high end, however, there are plenty of people interested in specialized tests able to measure the lower ranges, since this information is important for special education programs and other kinds of services. As a result, several tests are designed to be highly specific for IQs under 50. There are also adaptive behavior tests used to determine what practical self-care skills the person has, which can estimate IQ in someone who is difficult to test.
The Score
People in TV shows never, ever qualify their IQ scores in any way. It doesn't matter that different IQ tests were all intended to measure the same things; the numbers still aren't interchangeable. Technically speaking, stating your IQ without mentioning which test you were administered is akin to saying you got a "college admissions test score of 36": a 36 could mean a perfect score, a score in the top 75%, or a score below the lowest possible score, depending on whether you're talking about the ACT, the International Baccalaureate, or the SAT, respectively. Just giving a score assumes that all IQ tests have the same standard deviation, or measure the same kinds of brainpower. They don't. Tests have standard deviations ranging from 10 to 20 or higher; for example, a score of 132 on the Stanford-Binet is equivalent to a 148 on the Cattell.
Furthermore, the tests and associated scales are updated over time, much like (though not as radically as) the SAT.
This may be partially justified by the use of the term IQ itself, which can automatically imply the bell curve/mean = 100/standard deviation = 15 pattern described above. What is certainly true is that raw test scores from different tests cannot be compared in any meaningful way; only two transformed values based on said specific bell curve pattern could even come close to making sense in comparison to one another. Also, different tests have been normed to different populations, meaning that differences in their norming samples might make even such values basically incompatible.
In addition, most IQ tests have at least two components: if nothing else, something separating an equivalent of verbal IQ (language and decoding) from performance IQ (perceptual abilities and/or executive function). It is not unheard of for someone taking an IQ test to come out at opposite extremes of the two scales, leaving the composite (total) score worthless for informational purposes: in such a case the composite score shows an average IQ, when the useful information is a very high score on one part and a very low one on the other.
Just To Make Things Worse...
Although the term "IQ" implies that a test has been taken, it is possible for psychologists and psychiatrists to estimate a person's IQ without ever administering a formal IQ test. The famous Rorschach test ("inkblot test") is less concerned with what the person thinks the shape looks like than with how they arrive at that interpretation, which provides insight into a person's thought processes. The Thematic Apperception Test (TAT) asks a patient to create a narrative around a series of ambiguous pictures. These tests, along with many others, often provide sufficient insight into a person's thought processes and level of reasoning to allow an expert to estimate the subject's IQ.

Of course, the Rorschach is one of the most contested tests, with multiple competing analysis systems and training regimens. Two researchers who have been trained to analyze responses in the same way will likely draw the same conclusions, but their results may not match those of another analytic model. That said, this makes the Rorschach about as reliable as most conventional IQ tests, and since it's often administered as part of a package, it allows a sufficiently trained researcher to make a credible guess at someone's IQ. This method is rarely shown on TV, as no one needs therapy and it's hardly as impressive as having a character simply rattle off a concrete number.
However, all processes still face the same hurdles: controlling for stereotype threat; validity (does the test measure what it says it measures); reliability (does the test consistently yield the same results); and a sufficiently large sample size to produce meaningful results. Whether human intelligence can be accurately plotted as an IQ number is unclear. However intelligence is defined, even tests that measure that definition of intelligence will have to overcome all the hurdles previously discussed. There is also a sizable language gap in how TV writers and psychologists (among others) define IQ, never mind the fact that researchers can compare how a person does on multiple tests and derive multiple IQ numbers.
In short, the result that matters is determined by what the test is being used for. A student who significantly deviates from the mean on an IQ test is being underserved by their school in some respect, with the focus less on the number itself than on the degree of separation from the mean. A criminal whose IQ is low enough to cast doubt on their ability to form criminal intent will need to be tested extensively, as inability to perform higher-order logical reasoning is critical when determining eligibility for the death penalty in the USA. And then there are Mensa and other private organizations that will offer to tell you your number for a fee, if you truly want to know a "number."
