Cuyamaca College Library

Inter-rater Reliability Definition

     So how do we determine whether two observers are being consistent in their observations? We have chosen to implement a test of inter-rater reliability. Inter-rater reliability is the extent to which two or more individuals agree in their ratings.

     If you have watched the Olympics or any sport that uses judges, you know the scores depend on the judges, as observers, maintaining a high degree of consistency among themselves. If even one judge is erratic in scoring, the entire system is jeopardized and a participant can be denied a rightful prize. Another example is testing whether a husband's responses to marital-satisfaction questions are related to (correlated with) his wife's responses. A third example is testing whether a parent's assessment of a child's achievement is related to (correlated with) the child's own assessment.

     Outside the worlds of sport, marriage, and children, inter-rater reliability has far more direct implications for the Library. In our assessment, the CC Library will test whether librarians' ratings of students' information competency (IC) skills are related to the students' own assessments of their achievement. For example, the CC Library has designed a self-efficacy survey, the Reference Card Survey (RCS), to assess SLOs 1, 2, and 3. The RCS reports students' sense of their own competence: student self-efficacy involves students rating their perception of their own achievement on library learning outcomes. The Reference Card Survey also includes a librarian assessment of student performance. (Ren, Wen-hua. "Library Instruction and College Students' Self-efficacy in Electronic Information Searching." Journal of Academic Librarianship 26 (Sept. 2000): 323-328.)

Data Collection Procedures at Cuyamaca College Library

     The data will be used to evaluate the quality of library reference services. The Reference Card Survey consists of two forms: Form A, which focuses on the student's experience during a reference desk interview, and Form B, which focuses on the librarian's experience during the same interview. The librarian and the student each complete their form immediately after the reference desk interview.

     For the students' and librarians' ratings to be considered "credible," and thus usable for drawing conclusions, we must first test the degree to which students and librarians agree about the student's experience with the reference librarian. When we test the reliability of ratings, we often compute an inter-rater reliability coefficient. It is generally accepted that an inter-rater reliability coefficient of .75 or higher suggests that the ratings are reliable; the closer to 1.0 (a perfect match), the more reliable the ratings. For example, suppose we have the following data:

 

Question: "I am now better able to construct a successful search statement in order to find information."
(Scale: 5 = Strongly Agree ... 1 = Strongly Disagree)

Interaction Event      Student Rating      Librarian Rating
 1                          5                    5
 2                          4                    4
 3                          4                    4
 4                          3                    5
 5                          5                    3
 6                          3                    4
 7                          5                    5
 8                          5                    5
 9                          3                    2
10                          2                    2
Total                      39                   39
Mean                      3.9                  3.9

Inter-Rater Reliability: .61    Significance Level: .031
Interpretation: Marginal agreement about the statement (significant result)

In this first case, the reliability of the ratings is considered low (.61) even though the student and librarian mean ratings (3.9 on a 5-point scale) are the same.

Question: "I am now better able to construct a successful search statement in order to find information."
(Scale: 5 = Strongly Agree ... 1 = Strongly Disagree)

Interaction Event      Student Rating      Librarian Rating
 1                          5                    5
 2                          4                    4
 3                          4                    4
 4                          5                    5
 5                          3                    3
 6                          3                    4
 7                          5                    5
 8                          5                    5
 9                          3                    2
10                          2                    2
Total                      39                   39
Mean                      3.9                  3.9

Inter-Rater Reliability: .92    Significance Level: < .001
Interpretation: Excellent agreement about the statement (highly significant result)

 In this second case, the reliability of the ratings is considered high (.92) even though the student and librarian mean rating (3.9) is the same as in the first case. The difference is in the level of agreement. You can see that in the second set of data the student and librarian ratings match in almost every row, and that is what we are looking for in the data. It does not matter whether the student strongly agrees or strongly disagrees, only that the librarian and the student agree on the result of the interaction and thus provide similar ratings.
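The coefficients above can be checked with a short script. The page does not name the statistic it used, so the choice of Pearson's correlation coefficient here is an assumption: it is a common measure of inter-rater reliability for numeric ratings and reproduces the second table's .92, though for the first table it yields roughly .58 rather than the reported .61 (a slightly different statistic, such as an intraclass correlation, may have been used there).

```python
# Sketch: computing an inter-rater reliability coefficient for the two
# rating sets above. Pearson's r is an assumption -- the page does not
# say which coefficient was actually used.
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Ratings from the tables (Interaction Events 1-10).
student_1   = [5, 4, 4, 3, 5, 3, 5, 5, 3, 2]   # first table
librarian_1 = [5, 4, 4, 5, 3, 4, 5, 5, 2, 2]

student_2   = [5, 4, 4, 5, 3, 3, 5, 5, 3, 2]   # second table
librarian_2 = [5, 4, 4, 5, 3, 4, 5, 5, 2, 2]

r1 = pearson_r(student_1, librarian_1)  # low agreement
r2 = pearson_r(student_2, librarian_2)  # high agreement
print(round(r1, 2), round(r2, 2))  # → 0.58 0.92
```

Both rating sets have identical totals (39) and means (3.9), so the mean alone cannot distinguish them; only the per-event agreement, which the coefficient captures, separates the low-reliability case from the high-reliability one.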

 

