On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. This is as true for behavioural and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking.
Like test-retest reliability, internal consistency can only be assessed by collecting and analyzing data. One approach is to look at a split-half correlation. This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items.
Then a score is computed for each set of items, and the relationship between the two sets of scores is examined (see Figure 5). Note that a set of items can be split in more than one way; for example, there are 252 ways to split a set of 10 items into two sets of five. Many behavioural measures involve significant judgment on the part of an observer or a rater.
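The split-half procedure just described can be sketched in a few lines of Python. This is only an illustration: the response matrix, the 1–5 item scoring, and the even/odd split are all hypothetical, and the Spearman-Brown step (a standard correction for the fact that each half is only half as long as the full scale, not something the text above requires) is included for completeness.

```python
import numpy as np

# Hypothetical responses: 8 people x 10 items, each item scored 1-5.
rng = np.random.default_rng(0)
true_score = rng.normal(3, 1, size=(8, 1))                    # each person's underlying level
items = np.clip(np.round(true_score + rng.normal(0, 0.5, size=(8, 10))), 1, 5)

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, 7, 9
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8, 10

r = np.corrcoef(odd_half, even_half)[0, 1]    # split-half correlation
spearman_brown = (2 * r) / (1 + r)            # corrects for halved test length
print(f"split-half r = {r:.2f}, Spearman-Brown = {spearman_brown:.2f}")
```

With internally consistent items, the two half-scores track each other and the correlation is high; with inconsistent items it would hover near zero.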
Inter-rater reliability is the extent to which different observers are consistent in their judgments. Validity is the extent to which the scores from a measure represent the variable they are intended to.
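For categorical judgments, one common way to quantify inter-rater reliability is Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. The text above does not prescribe a particular statistic, so this is just an illustrative sketch with invented ratings from two hypothetical observers:

```python
import numpy as np

# Hypothetical yes/no judgments by two observers of the same 10 behaviours.
rater_a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
rater_b = np.array([1, 1, 0, 0, 0, 1, 1, 0, 1, 1])

observed = np.mean(rater_a == rater_b)    # raw percent agreement
# Chance agreement: probability both say 1, plus probability both say 0.
expected = (rater_a.mean() * rater_b.mean()
            + (1 - rater_a.mean()) * (1 - rater_b.mean()))
kappa = (observed - expected) / (1 - expected)   # Cohen's kappa
print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than raw agreement because two raters who simply guessed would still agree some of the time.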
But how do researchers make this judgment? We have already considered one factor that they take into account—reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to. There has to be more to it, however, because a measure can be extremely reliable but have no validity whatsoever.
Imagine, for example, measuring people's self-esteem by the length of their index fingers. Although this measure would have extremely good test-retest reliability, it would have absolutely no validity. Validity itself comes in several forms; here we consider three basic kinds: face validity, content validity, and criterion validity. Face validity is the extent to which a measurement method appears, on its face, to measure the construct of interest. Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities. So a questionnaire that included these kinds of items would have good face validity. The finger-length method of measuring self-esteem, on the other hand, seems to have nothing to do with self-esteem and therefore has poor face validity.
Although face validity can be assessed quantitatively—for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to—it is usually assessed informally. Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to.
It is also the case that many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of over 500 different statements applies to them, and many of those statements have no obvious relationship to the construct they measure.
Content validity is the extent to which a measure "covers" the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then their measure of test anxiety should include items about both nervous feelings and negative thoughts.

Understanding Reliability and Validity

Reliability

Reliability is the extent to which an "experiment, test, or any measuring procedure yields the same results on repeated trials."
So, even though Ms. Jones's blood pressure yielded three different readings when taken by your nurse, the medical student, and you, they are close: none is sky-high while the others are low. There is reliability among the three readings. Validity is the extent to which an instrument measures what it claims to measure. The use of a blood pressure cuff is considered valid because it measures blood pressure, not something else.
Using an ophthalmoscope to measure blood pressure would not be a valid method. (Barbara Ferrell, PhD) Example: A test designed to assess student learning in psychology could be given to a group of students twice, with the second administration perhaps coming a week after the first. The obtained correlation coefficient would indicate the stability of the scores. Example: If you wanted to evaluate the reliability of a critical thinking assessment, you might create a large set of items that all pertain to critical thinking and then randomly split the questions up into two sets, which would represent the parallel forms.
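The test-retest example above (the same psychology test given twice, a week apart) comes down to computing a correlation coefficient between the two sets of scores. A minimal sketch, with invented scores for six hypothetical students:

```python
import numpy as np

# Hypothetical scores for 6 students on the same psychology test,
# taken twice one week apart.
week1 = np.array([78, 85, 62, 90, 71, 83], dtype=float)
week2 = np.array([80, 83, 65, 88, 70, 85], dtype=float)

r = np.corrcoef(week1, week2)[0, 1]   # test-retest reliability coefficient
print(f"test-retest r = {r:.2f}")
```

Because each student scores about the same on both occasions, the correlation is high, indicating stable scores.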
Example: Inter-rater reliability might be employed when different judges are evaluating the degree to which art portfolios meet certain standards. Inter-rater reliability is especially useful when judgments can be considered relatively subjective. Thus, the use of this type of reliability would probably be more likely when evaluating artwork as opposed to math problems. Validity refers to how well a test measures what it is purported to measure.
Why is it necessary? While reliability is necessary, it alone is not sufficient: a test can be reliable without being valid. For example, if your scale is off by 5 lbs, it reads your weight every day with an excess of 5 lbs. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 5 lbs to your true weight.
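The biased-scale example can be made concrete in a few lines: the readings never vary (perfect consistency), yet none of them equals the true weight. The numbers here are invented for illustration.

```python
# A scale that always reads 5 lbs heavy: perfectly consistent, never correct.
true_weight = 150.0
readings = [true_weight + 5.0 for _ in range(7)]    # one reading per day for a week

reliable = all(r == readings[0] for r in readings)  # same value every day?
valid = readings[0] == true_weight                  # does it match the truth?
print(reliable, valid)   # prints: True False
```

This is exactly the sense in which reliability does not guarantee validity: zero variability across measurements, yet a constant 5-lb error in every one.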
Describe the kinds of evidence that would be relevant to assessing the reliability and validity of a particular measure.
Key Takeaways Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them. There are two distinct criteria by which researchers evaluate their measures: reliability and validity.
Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (inter-rater reliability). Validity is the extent to which the scores actually represent the variable they are intended to. Validity is a judgment based on various types of evidence. The reliability and validity of a measure are not established by any single study but by the pattern of results across multiple studies. The assessment of reliability and validity is an ongoing process.
Then assess its internal consistency by making a scatterplot to show the split-half correlation (even- vs. odd-numbered items). Compute the correlation coefficient too if you know how. Discussion: Think back to the last college exam you took and think of the exam as a psychological measure. What construct do you think it was intended to measure? Comment on its face and content validity. What data could you collect to assess its reliability and criterion validity?