One of the enduring issues evaluated during the SPICE trials is the reliability of assessments. One form of reliability is interrater agreement: the extent to which different assessors produce similar ratings when assessing the same organization and presented with the same evidence. In this paper we report on a study conducted to begin answering this question. Data were collected from an assessment of 21 process instances covering 15 processes; in each of these assessments, two independent assessors performed the ratings. We found that six of the fifteen processes did not meet our minimal benchmark for interrater agreement. Three of these failures were due to systematic biases by either an internal or an external assessor. Furthermore, for eight processes we identified specific rating scale adjustments that could improve their reliability. The findings reported in this paper provide guidance for assessors using the SPICE framework.
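Interrater agreement between two assessors rating the same process instances is commonly quantified with a chance-corrected statistic such as Cohen's kappa. The sketch below illustrates the idea; the kappa statistic, the rating labels, and the example ratings are illustrative assumptions, not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: proportion of items rated identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters were statistically independent.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical capability ratings on a four-point scale (F/L/P/N) for
# ten process instances, rated by an internal and an external assessor.
internal = ["F", "L", "L", "P", "N", "F", "L", "P", "P", "N"]
external = ["F", "L", "P", "P", "N", "F", "L", "L", "P", "N"]
print(round(cohens_kappa(internal, external), 2))  # → 0.73
```

A benchmark for acceptable agreement can then be expressed as a threshold on kappa; processes whose assessor pairs fall below that threshold would be flagged, as the six processes in the study were.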