Improving Inter-scorer Reliability

One of the major goals in publishing the American Academy of Sleep Medicine sleep stage and arousal scoring rules in 2007 was to improve inter-scorer reliability and accuracy so that sleep studies would be scored “the same” within and between sleep laboratories across city, state, country or sea.1,2

Recent studies show that the rules have led to significant improvements.3-5 One study found stage-specific agreement of 97.5 percent for REM sleep, 95.6 percent for wakefulness, 93.8 percent for NREM 3, 86.5 percent for NREM 2, and 90.1 percent for NREM 1.5

Why is it important that we score sleep studies with reasonable agreement? When two or more individuals score a stage of sleep or an event in a PSG differently, it can introduce enough variability into the results to lead to a false positive or false negative for a particular diagnosis.2 U.S. sleep centers must demonstrate regular, ongoing testing of inter-scorer reliability to obtain and maintain their AASM sleep center accreditation.6

Inter-scorer reliability of sleep studies depends upon several factors:

  • the scorer’s skill, experience, and training
  • the study’s technical quality
  • the scoring rules’ clarity and simplicity
  • the diligence with which scoring rules are applied
  • the degree of physiological ambiguity of the sleep/wake patterns.7

Poor inter-scorer agreement can be due to imprecise scoring definitions, improper signal measurement, poor signal quality, and a scorer’s limited ability or experience.8

Improvement epoch by epoch

The sleep center at the University of New Mexico was one of the first in the country to develop monthly inter-scorer reliability testing. A desire to teach ourselves the then newly published AASM scoring rules and to demonstrate compliance with them for a pending reaccreditation prompted our early adoption.

Each month we select a PSG test for the sleep center staff to score. We often choose one that contains challenging stage scoring or provides an opportunity to review and learn more about a topic, for example sleep-related epilepsy.

The PSG is transferred into a comparison scoring folder with the previous scores and patient identification removed. A senior sleep specialist chosen as the “gold standard” scorer for the month selects and re-scores 200 consecutive epochs within the chosen study. We advise the staff of the file location, the particular epochs to score, the patient’s age, and the date when we will review the results. We require each and every sleep technologist, specialist, fellow, or trainee to score the study of the month.

We compare the results of individual scoring with the gold standard scorer using a comparison scoring program that came with our digital PSG system. It generates agreement matrices showing in tabular form how an individual’s scoring compared epoch by epoch with the gold standard scorer for sleep stages, respiratory events, arousals, oxygen desaturations, and leg movements. Discrepancy tables allow us to see where disagreement with the gold standard scorer occurred. (See Figure 1.)
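For centers whose PSG software lacks such a program, the sleep stage part of the comparison can be approximated in a few lines of Python. What follows is a minimal sketch, not the program supplied with our system: the stage labels, the function names, and the ten sample epochs are all assumptions made up for illustration, and it covers sleep stages only, not respiratory events, arousals, desaturations, or leg movements.

```python
from collections import Counter

# Assumed stage labels; adjust to whatever your export uses.
STAGES = ["W", "N1", "N2", "N3", "R"]

def agreement_matrix(gold, scorer):
    """Rows = gold standard stage, columns = individual scorer's stage."""
    if len(gold) != len(scorer):
        raise ValueError("both scorers must score the same epochs")
    counts = Counter(zip(gold, scorer))
    return [[counts[(g, s)] for s in STAGES] for g in STAGES]

def percent_agreement(gold, scorer):
    """Overall percentage of epochs scored identically."""
    matches = sum(g == s for g, s in zip(gold, scorer))
    return 100.0 * matches / len(gold)

def stage_agreement(gold, scorer):
    """Per stage: percent of the gold scorer's epochs that the scorer matched."""
    result = {}
    for stage in STAGES:
        total = sum(g == stage for g in gold)
        if total:  # skip stages the gold scorer never used
            hits = sum(g == stage and s == stage for g, s in zip(gold, scorer))
            result[stage] = 100.0 * hits / total
    return result

def discrepancies(gold, scorer, first_epoch=1):
    """List (epoch number, gold stage, scorer stage) wherever the two differ."""
    return [(first_epoch + i, g, s)
            for i, (g, s) in enumerate(zip(gold, scorer)) if g != s]

# Hypothetical data: ten epochs rather than the 200 we score each month.
gold   = ["W", "W",  "N1", "N2", "N2", "N2", "N3", "N3", "R", "R"]
scorer = ["W", "N1", "N1", "N2", "N2", "N3", "N3", "N3", "R", "N1"]

for stage, row in zip(STAGES, agreement_matrix(gold, scorer)):
    print(f"{stage:>3}", row)
print("Overall agreement:", percent_agreement(gold, scorer))
print("By stage:", stage_agreement(gold, scorer))
print("Review these epochs:", discrepancies(gold, scorer))
```

The discrepancy list is the part worth bringing to a review meeting: it names the exact epochs where a scorer and the gold standard parted ways, so the group can pull up each one and discuss it.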

Staff development an added benefit

We schedule a dinner meeting between the day and evening shifts to review the comparison on a video screen. As we munch on pizza or enchiladas, we analyze each epoch. When we disagree about the particular scoring of an event, we debate it and reach agreement – or at least compromise.

Month by month we grow closer in our scoring, and far fewer studies are sent back for re-scoring. We use the comparison scoring session to discuss how the study could have been done better, for example by identifying an artifact or event that went unnoticed by some or all.

We also review new guidelines or practice parameters. The session often ends with a PowerPoint didactic presentation related to the PSG given by a sleep technologist or sleep specialist.


For sleep centers that are working toward AASM accreditation, lack the resources to perform monthly inter-scorer reliability testing, or want to pit their lab against the “champs,” the AASM now offers monthly record reviews online.

The AASM Inter-scorer Reliability Testing Program works on a “pay per play” basis using purchased credits or with the purchase of an annual subscription for the center.

Technologists and specialists from your center can log on monthly and score a PSG that is stored online with the AASM.

The records selected have been submitted from several accredited sleep centers, scored by a senior board-certified sleep specialist, and reviewed by at least two others before posting.

The AASM site explains the logic behind a particular scoring, and it allows administrators to compare their center and staff to national scoring averages.

– Megan Rauch, RPSGT, RRT, and Madeleine M. Grigg-Damberger, MD

Implementing comparison scoring on a monthly basis is good for training, quality, and morale, and it can even be fun.


1. Iber C, et al. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. 2007, American Academy of Sleep Medicine: Westchester, Illinois.

2. Grigg-Damberger MM. The AASM scoring manual: a critical appraisal. Curr Opin Pulm Med. 2009;15(6):540-49.

3. Parrino L, et al. Commentary from the Italian Association of Sleep Medicine on the AASM manual for the scoring of sleep and associated events: for debate and discussion. Sleep Med. 2009;10(7):799-808.

4. Moser D, et al. Sleep classification according to AASM and Rechtschaffen & Kales: effects on sleep scoring parameters. Sleep. 2009;32(2):139-49.

5. Danker-Hopfe H, et al. Interrater reliability for sleep scoring according to the Rechtschaffen & Kales and the new AASM standard. J Sleep Res. 2009;18(1):74-84.

6. American Academy of Sleep Medicine. Standards for Accreditation of Sleep Disorders Centers. 2008. Accessed via

7. Hirshkowitz M, Sharafkhaneh A. On measurement consistency and other hobgoblins. Sleep. 2004;27(5):847-8.

8. Stepnowsky CJ, Berry C, Dimsdale JE. The effect of measurement unreliability on sleep and respiratory variables. Sleep. 2004;27(5):990-5.

Megan Rauch, RRT, RPSGT, is technical supervisor of the University of New Mexico Hospital Sleep Disorders Center, Albuquerque. Madeleine Grigg-Damberger, MD, is medical director of pediatric sleep medicine services at the same facility and professor of neurology at University of New Mexico School of Medicine.