Test-retest reliability assesses the stability of scores across successive administrations of the instrument. If the symptomatology under study has not changed in the interval between the two evaluations, the two assessments should overlap substantially, and the correlation between them should therefore be high. Test-retest stability is particularly important for instruments that assess personality traits or other characteristics expected to remain stable: low correlation coefficients may indicate that the scale is unstable, that it contains items sensitive to situational factors, or that the items are difficult to understand. In acute illness, test-retest reliability is not an appropriate criterion. The stability of a scale matters when it is administered repeatedly at different intervals, as in clinical psychopharmacological research, because only with good test-retest reliability can changes in scores be attributed to the effect of treatment rather than to instability of the instrument. The test-retest correlation varies with the interval between evaluations: for short intervals (1 to 2 weeks) the coefficient should exceed 0.80, while values of 0.69 or higher are also acceptable over monthly intervals. Given the variability of psychiatric symptomatology over time, test-retest reliability is frequently assessed by recording interviews (with real or simulated patients) on videotape and presenting them to the rater at different times. A clinician's ability to evaluate patients may change (indeed, improve) as the rater gains experience, which reduces the correlation between evaluations over time.
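As a minimal sketch of the coefficient described above (in Python, with invented illustrative scores, not data from the text), the test-retest coefficient is simply the Pearson correlation between the scores obtained at the two administrations:

```python
from math import sqrt

def pearson_r(scores_t1, scores_t2):
    """Pearson correlation between two administrations of the same scale."""
    n = len(scores_t1)
    mx = sum(scores_t1) / n
    my = sum(scores_t2) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(scores_t1, scores_t2))
    sx = sqrt(sum((a - mx) ** 2 for a in scores_t1))
    sy = sqrt(sum((b - my) ** 2 for b in scores_t2))
    return cov / (sx * sy)

# Hypothetical total scores for 8 patients at baseline and two weeks later.
t1 = [22, 30, 18, 25, 27, 20, 33, 24]
t2 = [21, 31, 17, 26, 28, 19, 32, 25]
r = pearson_r(t1, t2)
# For a two-week interval, r should exceed the 0.80 threshold cited above.
```

With these fabricated scores the correlation is well above 0.80, so the scale would be judged stable over the two-week interval.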
To assess this type of reliability (intra-rater reliability), since it is impossible to examine the same patient with the same symptomatology at two points in time, videotaped cases are used and the degree of correlation between the results obtained in the different sessions is calculated. The reliable measurement of a characteristic of interest by subjective rating is generally improved by using several trained raters. Such measurement tasks often involve a subjective judgment of quality: for example, the assessment of a doctor's "bedside manner," a jury's assessment of the credibility of witnesses, or a speaker's ability to present.
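The videotape procedure above can be sketched as follows (the ratings are invented for illustration): one clinician rates the same set of recorded interviews in two separate sessions, and we check both the exact-agreement rate and whether the second session's ratings drift systematically, as might happen when the rater gains experience:

```python
# Hypothetical severity ratings (0-6) given by one clinician to the same
# ten videotaped interviews in two sessions held some weeks apart.
session_1 = [4, 2, 5, 3, 6, 1, 4, 2, 5, 3]
session_2 = [4, 3, 5, 3, 6, 2, 4, 2, 5, 4]

n = len(session_1)

# Proportion of videotaped cases rated identically in both sessions.
agreement = sum(a == b for a, b in zip(session_1, session_2)) / n

# Mean signed difference: a value far from zero suggests systematic drift
# (e.g. the rater scoring more severely in the later session).
drift = sum(b - a for a, b in zip(session_1, session_2)) / n
```

Here 7 of the 10 cases receive identical ratings, and the mean difference of +0.3 points hints at a mild severity drift between sessions.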
In statistics, inter-rater reliability (also referred to by similar names, such as inter-rater agreement, inter-rater concordance, inter-observer reliability, etc.) is the degree of agreement among raters. It is a measure of how much homogeneity, or consensus, there is in the ratings given by different judges. The joint probability of agreement is the simplest and least robust measure. If the number of categories used is small (e.g. 2 or 3), the probability of two raters agreeing by pure chance increases considerably. This is because both raters must confine themselves to the limited number of options available, which inflates the overall rate of agreement without necessarily reflecting their propensity for "intrinsic" agreement (agreement is considered "intrinsic" if it is not due to chance). Bland and Altman expanded on this idea by graphing the difference between each pair of ratings on the vertical axis against the mean of the two ratings on the horizontal axis, together with the mean difference and the limits of agreement. The resulting Bland-Altman plot shows not only the overall degree of agreement, but also whether the agreement is related to the underlying value of the item. For example, two raters might agree closely when estimating the size of small objects, but disagree about larger ones.
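A short sketch of these measures (with invented ratings): the joint probability of agreement is just the fraction of identical ratings; Cohen's kappa, a standard chance-corrected statistic, addresses the small-category problem described above; and the Bland-Altman limits of agreement for continuous ratings are the mean difference plus or minus 1.96 standard deviations:

```python
from math import sqrt
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(ratings_a)
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Probability that both raters pick the same category by chance,
    # given each rater's marginal category frequencies.
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical binary ratings (e.g. symptom present/absent) from two raters.
rater_a = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
rater_b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Joint probability of agreement: simple, but inflated by chance agreement.
joint = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohens_kappa(rater_a, rater_b)

# Bland-Altman limits of agreement for hypothetical continuous ratings.
x = [10.0, 12.5, 9.0, 15.0, 11.0]
y = [10.5, 12.0, 9.5, 16.0, 11.5]
diffs = [b - a for a, b in zip(x, y)]
mean_diff = sum(diffs) / len(diffs)
sd = sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1))
limits = (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)
```

Note how the raw agreement rate (0.8) shrinks to a kappa of about 0.52 once chance agreement between two raters using only two categories is taken into account.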