Eur Arch Psychiatry Clin Neurosci (2001) 251: Suppl. 2, II/13–II/20 © Steinkopff Verlag 2001
Abstract Observer-rating scales are used for the evaluation of drug trials in depression. One of the most widely used depression rating scales is still the Hamilton Depression Scale (HAMD), which was developed at the time when the first antidepressants were becoming available. Due to its construction it seems to have a specific focus on drug effects of classical antidepressants. As a result of different methodological analyses such as principal component analysis, the Rasch model and facet analysis, a differentiation between core symptoms reflecting the severity of depression and additional symptoms describing other aspects of the symptomatology of depression seems meaningful. The use of the HAMD 6-item score, described primarily by Bech, as the main efficacy criterion in antidepressant drug trials gives a fair estimation of drug-induced changes of severity of depression and avoids bias such as the well-known bias of the HAMD total score in favour of tricyclic antidepressants. This is demonstrated by the evaluation of two sertraline-amitriptyline comparative studies.
Key words Efficacy criteria · antidepressant drug trials · depression rating scales · severity of depression
For Professor Per Bech on the occasion of his 60th birthday. With many thanks for our close academic and personal collaboration.

Introduction
The quantitative evaluation of antidepressant treatment relies on the application of reliable standardised rating scales (Lang et al. 1991). The traditional depression rating scales, such as the Hamilton Depression Scale, were developed in the 1960s, when the first antidepressant drugs were becoming available. They may have been specially designed to measure the efficacy of the classical tricyclic antidepressants (Domken et al. 1994; Hughes et al. 1982; Möller 2000).
There are many ways of classifying scales, but the most important distinguishes between observer-rating scales and self-rating scales (Möller 2000). Both types of scales provide valuable information for the clinician and, in practice, clinical studies sometimes employ both observer- and self-rating scales. Each type of scale is limited in its assessment, providing a complementary view rather than duplication of information. Experts in this field generally agree that observer-rating should be chosen as the principal outcome criterion in antidepressant drug trials. However, self-rating scales can provide additional important information for evaluating therapeutic regimens (Möller and von Zerssen 1995; Möller 1991).
Observer-rating scales have a number of advantages and disadvantages compared with self-rating scales, and these should be taken into account both in the selection of scales and when reviewing the results of the scales that have been applied. Skilled observers, by means of their clinical training and experience, are able to assess the severity of a patient’s symptoms, whereas the patient’s point of comparison can only be in relation to his own experience (Hamilton 1976). An observer’s ability to assess all grades of severity of illness reliably is affected by his clinical experience and familiarity with the scale he is applying, whereas a severely ill patient may be too ill to complete a self-assessment (Hamilton 1976) or his self-perception may not reflect reality. Likewise, patients with poor literacy skills, deficient vocabulary or diminished concentration span may have problems completing a self-assessment questionnaire (Hamilton 1976). Observers may be subject to a certain bias which can influence the true assessment of a patient’s condition. The rating of specific items by an observer can be confounded by general or initial impressions, and the psychiatrist may rate items to fit with that initial impression. The preconception that one therapy is or is not beneficial can also bias the rating of symptoms during the course of treatment, and highlights the need for double-blind clinical trials. Additionally, it may be difficult to differentiate certain personality traits from actual depressive symptoms (Möller 1991; Möller and von Zerssen 1995).

H.-J. Möller
Methodological aspects in the assessment of severity of depression by the Hamilton Depression Scale

Prof. Hans-Jürgen Möller
Department of Psychiatry
Ludwig-Maximilians-University
Nussbaumstr. 7, 80336 Munich, Germany
Tel.: 089/5160-5501
Fax: 089/5160-5522
E-Mail: hans-juergen.moeller@psy.med.uni-muenchen.de

There are several depression observer-rating scales used in psychopharmacological studies. These scales were not designed to be used for diagnostic purposes and are only intended for use once a diagnosis has been made. This type of scale focuses on the psychopathology of depression and is used to measure the severity of symptoms. When used during treatment a scale gives the clinician an assessment of the patient’s improved or worsened depressive symptoms. The change in the total score is seen to reflect a change in general severity of depression.
The most frequently used scale is still the Hamilton Depression Scale (HAMD) (Hamilton 1960, 1967), although it has been criticised on several grounds. Besides the HAMD, the Montgomery-Åsberg Depression Rating Scale (MADRS) is increasingly used in antidepressant drug trials (Möller 2000).
The HAMD was developed to be used on patients who had already been diagnosed as suffering from affective disorders of the depressive type. The scale is used to quantify the results of a clinician’s interview with a patient and render the results in a convenient format for statistical analysis. The value of the scale, as with most observer-rating scales, depends on the skill of the interviewer in obtaining the necessary information from his patient during the interview, without using direct or probing questions. The scale has been modified since it was first developed and the original scale is now rarely used. An American version, published in 1976 (Guy 1976), is now the most frequently used. Hamilton’s original version contained 17 items, each of which is concerned with a semi-global symptom. These items were chosen because they are the most common symptoms of depressive illness. Some items are defined in terms of a series of categories of increasing intensity (e.g., item 2, guilt), while others are defined by a number of terms with equal values (e.g., item 13, somatic symptoms). However, some symptoms often found with atypical depression are not rated at all (hypersomnia, weight or appetite increase), and some so-called “endogenous” symptoms are not covered, e.g., quality of mood. In this scale, behavioural and somatic features account for at least 50% of the possible total score. An additional four items were originally excluded from the total score (though they were included on the interviewer’s form): diurnal variation, because it is neither a measure of depression nor of intensity, but defines the type of depression; and derealisation (or depersonalisation), paranoid symptoms and obsessional symptoms, because they occur so infrequently that they do not contribute further information about the majority of patients (Hamilton 1960, 1967; Snaith 1993). The 21-item HAMD (Hamilton 1967) has a total score range from 0 to 56 points. Hamilton did not specify a cut-off score distinguishing normality from morbidity; however, the range 0–10 points is generally used to indicate minor or no depression, 10–20 (or, in some trials, 25) major depression and 25 (or 28) or above more severe depression.
This paper will focus on some methodological aspects of the measurement of the severity of depression by using the Hamilton Depression Scale (HAMD). This topic is of special importance in the context of psychopharmacological studies because the efficacy criteria of drug trials in depression are often related to the total score of the HAMD, which is seen as a measure of the severity of depression.
The structure of severity of depression in the HAMD
Analyses of HAMD data (Bech 1990; Conti and Cassano 1990; Hamilton 1960, 1967; Maier 1990; Maier and Philipp 1985) suggest that a distinction has to be made between those components of depression which are “core symptoms” of severity of major depression and those which reflect different “aspects” of symptomatology.
Using a probabilistic test model developed by Rasch (1960), Bech et al. (1981) found that the structure or severity of depressive states in the HAMD could be assessed sufficiently by six items (depressed mood, guilt, work and interest, retardation, anxiety psychic, and somatic general). On the basis of these six HAMD items Bech and Rafaelsen (1980) developed their Melancholia Scale (MES). The validity of the subscale of the HAMD consisting of the six core symptoms of depression was partly replicated by Maier and Philipp (1985). The main difference between the two studies is that the item “somatic complaints” was substituted by the item “agitation” in the subscale derived by Maier and Philipp (1985).
Among the symptom-specific components focussing on different aspects of depressive symptomatology, four dimensions for the 17-item version of HAMD are discussed frequently: somatisation, cognition, retardation and a sleep factor. However, there are considerable discrepancies between the many studies that have been carried out on the HAMD using factor analysis. When Hamilton (1960) published his depression scale, he used factor analysis to delineate the dimensions of severity of depressive states. No general factor was found. In his second study with the HAMD, a general factor of severity was confirmed (Hamilton 1967). As pointed out by Hamilton himself (Hamilton 1986) the reason for this discrepancy between the two studies was that the heterogeneity of the patients regarding severity of depression was larger in the second than in the first study. In general, there has been great variability in the numbers of factors reported in the literature. A maximum of 7-factor solutions may provide insufficient structural variance of the scale (Berrios and Bulbena-Villarasa 1990).
In analyses of the HAMD by Steinmeyer and Möller (1992), data from a multi-centre, multi-national, prospective, double-blind randomised clinical trial were used, in which the efficacy and safety of paroxetine were compared with those of amitriptyline (Möller et al. 1993). HAMD data (17-item version) of a sample of 223 inpatients with major depression obtained at baseline without antidepressive treatment and at endpoint after 6 weeks of treatment with paroxetine or amitriptyline (N = 174) were inter-correlated and factor-analysed as well as studied with a non-metric multidimensional scaling procedure.
The factorisation of the two inter-correlation matrices of the HAMD items at baseline and at endpoint after 6 weeks presents two different solutions. As can be seen in Table 1 the pattern of the “eigenvalues” lends support to different numbers of interpretable factors. At baseline six principal components can be discerned, while the endpoint solution is characterised by two principal components with a strong first component. It can easily be seen that there are two quite different PCA results reflecting a different structure of the baseline and endpoint examinations. There is indeed no similarity between the two solutions. As expected the strong first principal component reflects the somewhat larger heterogeneity of the patients regarding severity of depression at the endpoint.
Factor analysis (FA) and principal component analysis (PCA) have a longstanding tradition in this field of research. Both techniques have been used to reduce the complexity inherent in multi-variate observations. It has been argued, however, in the psychometric literature that exploratory factor analysis has severe drawbacks (the factorial indeterminacy problem in particular, cf. PCA (Steiger and Schönemann 1978)). Furthermore, the model assumptions of FA are rather strong: linear relationship between observed variables and factors; multi-variate normality of the observed variables and metric requirements; the numerical values of the correlation coefficients must be reproduced by F·F′ as accurately as possible (F being the matrix of common factor loadings). Therefore, the overall level of the correlation coefficients is crucial for the number of common factors (or principal components) to be retained even if the pattern of coefficients is identical otherwise. This last aspect has important consequences for the interpretation of the strong first HAMD factor found with many factor analyses (Hamilton 1967). It can be argued that this factor is also a reflection of the large inter-individual variability in depression within the samples of depressive patients.
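The dependence of the first component on the overall level of the correlations can be made concrete with an idealised case that is not taken from the study data. Assume, purely for illustration, that all p items correlate equally at a common value r; the symbols p, r and λ are introduced here and do not appear in the original analysis.

```latex
\[
R = (1-r)\,I_p + r\,\mathbf{1}\mathbf{1}^{\top}
\quad\Longrightarrow\quad
\lambda_1 = 1 + (p-1)\,r, \qquad
\lambda_2 = \dots = \lambda_p = 1 - r .
\]
\[
p = 17:\qquad
r = 0.1 \;\Rightarrow\; \frac{\lambda_1}{p} = \frac{2.6}{17} \approx 0.15,
\qquad
r = 0.5 \;\Rightarrow\; \frac{\lambda_1}{p} = \frac{9}{17} \approx 0.53 .
\]
```

In this sketch the pattern of coefficients is the same in both cases (all equal), yet the share of variance carried by the first component rises from about 15% to about 53% simply because the overall correlation level is higher.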
Non-metric multi-dimensional scaling (MDS) may be better suited to reveal structures or patterns within correlation (or more generally: similarity) matrices (Guttman 1966, 1967; Lingoes and Guttman 1979). MDS, and more specifically Monotone Distance Analysis (MDA), refers to a family of models within which the ordinal relationships among similarity coefficients are mapped into a (Euclidean) space of as few dimensions as possible in such a way that the ordinal relations among distances between points (variables) in the space are in close agreement with the ordering of the similarity coefficients. A framework that takes into account only the ordinal (not the metric) aspects of the similarities (correlations) among the items of the HAMD scale is better suited for the analysis of any depression test.
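The ordinal criterion described above can be sketched in a few lines of Python. This is a minimal illustration and not the MDA or SSA algorithm itself: it only counts how often a given configuration of points orders two item pairs against their similarities. The function name, the coordinates and the similarity values are hypothetical.

```python
from itertools import combinations
from math import dist


def order_violations(points, similarity):
    """Share of comparable pairs-of-pairs ordered against their similarities.

    points: dict mapping item name -> coordinate tuple
    similarity: dict mapping (item_a, item_b), names in sorted order -> coefficient
    """
    pairs = list(combinations(sorted(points), 2))
    violations = 0
    comparable = 0
    for (a, b), (c, d) in combinations(pairs, 2):
        s_diff = similarity[(a, b)] - similarity[(c, d)]
        if s_diff == 0:
            continue  # tied similarities impose no order on the distances
        comparable += 1
        d_diff = dist(points[a], points[b]) - dist(points[c], points[d])
        # A violation: the more similar pair is also the one farther apart.
        if s_diff * d_diff > 0:
            violations += 1
    return violations / comparable if comparable else 0.0


# Hypothetical similarities between three items (not study data).
similarity = {("A", "B"): 0.8, ("A", "C"): 0.2, ("B", "C"): 0.4}

# Configuration whose distance order mirrors the similarity order.
good = order_violations({"A": (0, 0), "B": (1, 0), "C": (4, 0)}, similarity)

# Configuration that places the least similar items closest together.
bad = order_violations({"A": (0, 0), "B": (1, 0), "C": (0.25, 0)}, similarity)
```

Only the rank order of the coefficients enters the check, so any monotone transformation of the similarities leaves the result unchanged, which is the property the text attributes to non-metric procedures.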
Facet theory (Shye 1978; Borg 1979; Levy 1981; Canter 1985) provides a conceptual framework within which the domain of items for depression scales can be structured. Facet theory also allows predictions about the regional structure of the spatial representation (Levy 1985) obtained from some MDA procedures such as Smallest Space Analysis (SSA; Guttman 1967, 1968; Lingoes 1979). A facet is a “set playing the role of a component set of a Cartesian set” (Shye 1978, p. 412). Facets serve to conceptually categorise a domain of tasks or items. A facet theory-based definition of items included in the HAMD scale could be proposed as follows:
An item belongs to the universe of depression rating items if, and only if, its domain asks about the severity of the depressive state (“centrality”) and also calls for the rating of one “aspect” of depressive symptomatology, and its range is ordered from very severe to very mild with respect to depressive symptomatology.
Facets can play different roles in partitioning the SSA representation. Most important is whether the elements of a facet are ordered. The representations of several combinations of facets have been given special names: radex, cylindrex or duplex (cf. Levy 1985).

Tab. 1 Varimax rotated factor loadings for baseline and endpoint data (Steinmeyer and Möller 1992)

Explained variance of the principal components
Baseline:  I 18.6%   II 18.3%   III 17.4%   IV 15.8%   V 15.2%   VI 14.7%
Endpoint:  I 69.7%   II 30.3%

Structuple   Item   Loadings* (in column order: baseline I–VI, endpoint I–II)
a1b1         13     0.57   0.74
a1b2         10     0.65   0.67
a1b2          2     0.71   0.72
a1b3          1     0.69   0.70   0.50
a1b3          7     0.70   0.64
a1b4          6     0.78   0.58
a2b1         11     0.83   0.72
a2b1         12     0.72   0.66
a2b2          3     0.79   0.41   0.54
a2b2          9     0.58   0.53
a2b3          8     0.51
a2b4          5     0.76   0.51
a3b1         14     0.51
a3b1         17     0.63
a3b2         15     0.72
a3b2         16     0.75
a3b4          4     0.65   0.40   0.40

* Only loadings of 0.40 or greater are given

More complex configurations with more facets have already been studied. One can always specify a minimum dimensionality of the SSA space required for the spatial representation (Levy 1981; Canter 1985). The division of the space into regions is accomplished by depicting boundary curves according to the structuples characterising the variables under study.
For depression ratings with only the two facets “centrality” and “aspect” of depression the spatial configuration of the multi-dimensional scaling solution should be an ordinal radex as shown in Fig. 1 (Borg 1979; Lingoes 1979).
A radex results when there is a combination of an ordered facet and a facet which is either unordered or circularly ordered. The ordered facet is represented by concentric circles, the other one by segments not necessarily starting from the centre of the spatial representation.
For the HAMD various elements (structs) of the two facets can be hypothesised. In this way each HAMD item can be characterised by a structuple a_i b_j, i.e. a combination of one element each from the two content facets as shown in Table 2.
The data of the paroxetine-amitriptyline study were subjected to a nonmetric (ordinal) multidimensional scaling procedure (Smallest Space Analysis; Guttman 1967, 1968, 1979). The scaling procedure can be based on either correlation coefficients or monotonicity (µ2) coefficients (Raveh 1978). The latter are used because the product moment correlation coefficient is only applicable for the detection of linear relationships between variables. The matrix of µ2 coefficients is given a spatial representation such that the distances between points representing the variables reflect their similarity in as few dimensions as possible. A facet theoretic interpretation of the regions of the spatial representation is given.
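To show why a monotonicity coefficient detects relationships that need not be linear, the following sketch computes Guttman's coefficient of weak monotonicity in its commonly cited form, µ2 = Σ(x_h − x_k)(y_h − y_k) / Σ|x_h − x_k||y_h − y_k|, summed over all pairs of observations. The function name and the example data are hypothetical, and the sketch is not claimed to reproduce the computation used in the study.

```python
from itertools import combinations


def monotonicity_mu2(x, y):
    """Guttman's weak monotonicity coefficient between two rating vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    numerator = 0.0
    denominator = 0.0
    for h, k in combinations(range(len(x)), 2):
        product = (x[h] - x[k]) * (y[h] - y[k])
        numerator += product
        denominator += abs(product)
    if denominator == 0:
        raise ValueError("coefficient undefined: no pair differs on both variables")
    return numerator / denominator


# A strictly increasing but non-linear relationship reaches the maximum of 1.
mu2_monotone = monotonicity_mu2([0, 1, 2, 3], [0, 1, 4, 9])

# One reversed pair lowers the coefficient.
mu2_mixed = monotonicity_mu2([0, 1, 2], [0, 2, 1])
```

The coefficient equals 1 whenever one variable never decreases as the other increases, whatever the shape of the curve, whereas a product moment correlation reaches 1 only for an exactly linear relationship.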
Fig. 1 Schematic representation of an idealised ordinal radex for the facets “centrality” and “aspect” (Steinmeyer and Möller 1992).

Tab. 2 Facet theory characterisation of HAMD items (Steinmeyer and Möller 1992)

Item                     Structuple
13 Somatic, general      a1b1
10 Anxiety, psychic      a1b2
 2 Guilt                 a1b2
 1 Depressed mood        a1b3
 7 Work and interest     a1b3
 6 Insomnia, delayed     a1b4
11 Anxiety, somatic      a2b1
12 Gastrointestinal      a2b1
 3 Suicide               a2b2
 9 Agitation             a2b2
 8 Retardation           a2b3
 5 Insomnia, middle      a2b4
14 Genital               a3b1
17 Loss of weight        a3b1
15 Hypochondriasis       a3b2
16 Loss of insight       a3b2
 4 Insomnia, initial     a3b4*

“Centrality” facet (modular) A: a1 core symptoms of depression; a2 second order symptoms of depression; a3 other accessory symptoms. “Aspect” facet B: b1 somatisation; b2 cognition; b3 retardation; b4 sleep.
* There is no item related to the a3b3 structuple

To re-analyse the two similarity matrices the SSA-1 program was used, one of a family of computer programs in the Guttman-Lingoes Nonmetric Program Series of MDS procedures (Lingoes 1979). It can be succinctly described as follows: This analysis provides a geometric representation of the different variables as points in a Euclidean space. The distances between pairs of points in the space correspond to the correlations of the variables. Hence two points are closer if the correlation between the corresponding variables is higher. There is a stress measure termed the coefficient of alienation, which is based on the rank-order correspondence between the variables’ intercorrelations and their corresponding spatial distances. The smaller the coefficient, the better the fit. Smallest space analysis works in a sequential manner to provide the minimum number of dimensions needed to obtain a geometric representation with a good fit (i.e. a coefficient of alienation smaller than 0.15). Coxon (1982) discusses the problem of the acceptability of MDS solutions. He argues that the justification of guidelines for acceptable stress measures is somewhat obscure and suggests (p. 65) ‘even as rules of thumb they should be treated with considerable caution’. One strategy has been to adopt the ‘elbow test’, that is, to generate spatial solutions in a high number of dimensions and plot the stress value against dimensionality. If a bend or elbow occurs which indicates that the spatial representation is not significantly improved by adding another dimension, then the lower dimensional solution is acceptable.
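The sequential rule described above (take the lowest dimensionality whose coefficient of alienation falls below 0.15) can be sketched as follows. The function name is hypothetical. Of the values used, only the two-dimensional coefficients 0.132 (baseline) and 0.107 (endpoint) are those reported for this re-analysis; the one- and three-dimensional values are invented for illustration.

```python
def smallest_acceptable_dimension(alienation_by_dim, threshold=0.15):
    """Return the lowest dimensionality whose coefficient of alienation
    is below the threshold, or None if no solution qualifies."""
    for dim in sorted(alienation_by_dim):
        if alienation_by_dim[dim] < threshold:
            return dim
    return None


# Two-dimensional values as reported; the others are hypothetical placeholders.
baseline = {1: 0.31, 2: 0.132, 3: 0.09}
endpoint = {1: 0.28, 2: 0.107, 3: 0.08}

baseline_dim = smallest_acceptable_dimension(baseline)
endpoint_dim = smallest_acceptable_dimension(endpoint)
```

As Coxon's caveat implies, the threshold is a rule of thumb, so it is exposed as a parameter rather than fixed in the function.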
Following the above-mentioned criteria the MDS reanalysis shows that for both matrices only two dimensions are necessary to adequately represent the similarity patterns (Steinmeyer and Möller 1992). For both matrices the measurement of fit, the coefficient of alienation k, is sufficiently close to zero: k = 0.132 and k = 0.107.

Fig. 2 Smallest space analysis for the HAMD items (baseline) (Steinmeyer and Möller 1992)
Fig. 3 Smallest space analysis for the HAMD items (endpoint) (Steinmeyer and Möller 1992)

As Figs. 2 and 3 demonstrate, the two-dimensional representations of both matrices of monotonicity coefficients (baseline and endpoint data) show an ordinal radex. Axes dividing the space according to elements of facets A and B are drawn. It can easily be seen that the hypothesised item configuration was correct in all but a few cases. A nearly identical “centrality” facet for both matrices is clearly discernible. The circles, which reflect the partitioning according to the “centrality” facet, demonstrate that in both cases the core symptoms (items 1, 2, 6, 7, 10, 13) are at the centre of the configuration (a1), whereas the second order and accessory symptoms occupy the intermediate and outer regions (a2 and a3). Similarly, the wedge-like shapes conform to regions in which the elements of the “aspect” facet can be differentiated. Both spaces can be partitioned into four segments emanating from the centre. Inspection of the SSA plots shows that a region b1 (somatisation: items 11–14) can be differentiated from a region b3 (retardation: items 1, 7, 8) and from a region b4 (sleep disturbance: items 4, 5, 6). Only the items of aspect b2 (cognition) show a different localisation in the baseline and endpoint spatial representations. While the items guilt (2) and suicide (3) are represented in the b3 segment, and are confounded with the retardation aspect at baseline, at the endpoint of the study the items (1–3, 9, 10, 16) reflecting the cognition aspect are close together in a separate segment.
In conclusion the following can be stated: For both matrices a radex-like structure can easily be identified by drawing the boundary curves as depicted. The “aspect” facet plays a polar role. The “centrality” facet is modular and is related to the severity, i.e. how much central symptomatology of melancholia is required for carrying out the respective items. There is an invariant representation of the items in the sense that the items representing the core symptoms are ordered in the central circle of both spatial representations.
Grouping together items which occupy separate sectors and the intermediate or, especially, the peripheral circle seems to be arbitrary. Only the items depressed mood (1), guilt (2), insomnia, delayed (6), anxiety, psychic (10), and somatic, general (13) could be combined as an indicator for depressive severity invariant over different samples. From a very restrictive point of view, and in contrast to our assumption, of the more peripheral items only anxiety, somatic (11), gastrointestinal (12) and genital (14) should also be combined as a stable indicator for somatisation aspects of depressive symptomatology. As to the measurement of severity of depression, the results of the facet analysis are to a large degree in concordance with the former results by Bech (1990) on the core symptoms of depression.

Focussing on HAMD core items of severity of depression avoids biases
To focus on the total score of such core symptoms of depressive severity as the primary efficacy criterion in antidepressant drug trials seems to be much more appropriate than to use the total score of the HAMD with 17 items (HAMD-17) or 21 items (HAMD-21). This avoids, among other things, some typical problems of the HAMD.
Since the Hamilton rating scale includes three sleep-related items and, in addition, one item which investigates weight gain, the HAMD total score carries an inherent bias in favour of tricyclic antidepressants. This is apparently a special problem of the Hamilton rating scale, while the MADRS is not sensitive to this special bias. There is also another problem with the HAMD, especially related to SSRIs. It has been suggested that certain HAMD items may be affected by the presence of side effects typical for the SSRIs, such as gastrointestinal symptoms, sleep disturbances, nervousness and agitation. For example, in a 6-week study in outpatients with major depression, fluvoxamine was found to elevate the HAMD items 4 (early insomnia) and 12 (somatic gastrointestinal symptoms) relative to placebo, particularly in the first weeks of the study. However, elevated scores persisted throughout the study and item 4 was significantly higher than placebo at 6 weeks (Walczak et al. 1996). Thus, SSRI side effects may act to mimic depressive symptomatology and artificially increase HAMD total scores.
We tried to test empirically the inherent bias of the Hamilton-D total score in favour of tricyclics with a sedative profile and to the disadvantage of SSRIs (Möller et al. 1998). For this methodological evaluation we used the data from a double-blind multicentre comparative study of sertraline and amitriptyline in hospitalised patients with major depression. The 6-week study included about 80 patients in each group in the ITT population and about 60 patients in each group for the efficacy analysis (Möller et al. 1998).
The results of the efficacy analysis performed in the ATP population are presented below. The mean HAMD-21 score was reduced consistently in both groups: in the sertraline group, it dropped by 47.3%, i.e. from 28.3 to 14.7; in the amitriptyline group, a reduction of 56.1% was achieved, i.e. from 29.1 to 12.6. Amitriptyline was estimated to achieve a 2.0 points (ANCOVA LS means) greater reduction than sertraline in the HAMD-21 score after adjustment of the covariables (95% confidence interval = -1.0 to 5.0). This difference was not statistically significant (p = 0.186). According to the predefined equivalence range of 5.0 points, the confidence interval indicates equivalence between the two treatment groups. It is known from previous studies that the HAMD-21 score has an inherent bias against SSRIs, particularly in short-term treatment, since it includes three sleep-related items and one item on weight gain. Given that tricyclic antidepressants are highly sedative and are known to be associated with weight gain, they fare better with these items. The Bech HAMD Cluster, which focusses on the core symptoms of depression (items 1, 2, 7, 8, 10 and 13), does not contain the items mentioned above.
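The difference between a total score and the Bech cluster score can be illustrated with a short Python sketch. The item numbers of the cluster and of the sleep items follow the text; the function name and the ratings of the single patient are hypothetical and are not study data.

```python
# Items of the Bech HAMD 6-item cluster as listed in the text.
BECH_ITEMS = (1, 2, 7, 8, 10, 13)
# The three insomnia items, using the item numbering of Table 2.
SLEEP_ITEMS = (4, 5, 6)


def subscale_score(ratings, items):
    """Sum the ratings of the given items; `ratings` maps item number -> rating."""
    missing = [i for i in items if i not in ratings]
    if missing:
        raise KeyError(f"missing ratings for items {missing}")
    return sum(ratings[i] for i in items)


# Hypothetical HAMD-17 ratings for one patient (illustration only).
ratings = {1: 3, 2: 2, 3: 1, 4: 2, 5: 2, 6: 1, 7: 3, 8: 2, 9: 1,
           10: 2, 11: 1, 12: 1, 13: 2, 14: 1, 15: 0, 16: 0, 17: 1}

hamd17_total = subscale_score(ratings, range(1, 18))
bech6 = subscale_score(ratings, BECH_ITEMS)
sleep = subscale_score(ratings, SLEEP_ITEMS)
```

In this invented example the sleep items contribute 5 of the 25 total points but nothing to the 6-item score, so a drug that mainly improves sleep would move the total score without moving the cluster score.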
When the two treatment groups were analysed on the basis of the Bech HAMD 6-item score, the difference between the groups in favour of amitriptyline decreased to 0.6 points (95% CI = -0.8 to 1.9; p = 0.419). On the basis of the Bech 6-item score, the responder rate in the sertraline group increased to 57% relative to the HAMD total score result of 52% while in the amitriptyline group the responder rate decreased from 68% (HAMD) to 64% (Bech 6-item score). The HAMD sleep factor reflected the difference between the two drugs in regard to their effects on sleep: amitriptyline produced significantly better results than sertraline (p = 0.008) for the sleep disturbance factor. No significant differences were seen with regard to the other HAMD factors of retardation, cognitive disturbance and anxiety. An important question is whether the new antidepressants are of similar efficacy to TCA in severe depression.
The ITT analysis, which is not presented here, confirmed the outcome of the ATP analysis.
These findings, which give a clear signal in the direction discussed above, were tested again in a similar double-blind, multicentre comparative study of sertraline versus amitriptyline which, contrary to the above study, was performed in outpatients and not inpatients. The study design was completely comparable and the statistical methods used were identical. This study included about 120 patients in each arm for the ITT population and about 100 in each arm for the efficacy analysis (Möller et al. 2000).
The results of the efficacy analysis performed in the ATP population are presented below. The mean HAMD-21 score was reduced consistently in both groups: in the sertraline group, it dropped by 56.5%, i.e. from 27.1 to 11.7; in the amitriptyline group, a reduction of 60.5% was achieved, i.e. from 27.5 to 10.6. The difference is not statistically significant (p = 0.221).
The HAMD responder rates (≥ 50% reduction in HAMD scores) were analysed both for the HAMD total score and for the HAMD Bech 6-item score (Bech et al. 1975). When the two treatment groups were analysed on the basis of the HAMD Bech 6-item score, the responder rate in the sertraline group increased to 57% relative to the total HAMD responder result of 51%, while in the amitriptyline group, the responder rate decreased from 68% (total HAMD) to 63% (Bech Cluster). An analysis of the 4 HAMD factors retardation, anxiety, cognitive and sleep disturbance, revealed an expected trend of superiority for amitriptyline in the sleep disturbance factor. This difference, however, was not statistically significant (p = 0.1). No differences between the treatment groups were seen with regard to the HAMD item 1 and the other efficacy-specific scales DSI and SDS (Table 3).
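The responder criterion used here (a reduction of at least 50% from baseline) can be written out as a minimal Python sketch. The function names and the four patient score pairs are hypothetical; the same functions would apply to the total score or to the 6-item score.

```python
def percent_reduction(baseline, endpoint):
    """Percentage reduction of a score from baseline to endpoint."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - endpoint) / baseline


def is_responder(baseline, endpoint, threshold=50.0):
    """True if the score fell by at least `threshold` percent."""
    return percent_reduction(baseline, endpoint) >= threshold


def responder_rate(score_pairs):
    """Percentage of (baseline, endpoint) pairs that meet the criterion."""
    score_pairs = list(score_pairs)
    if not score_pairs:
        raise ValueError("no patients given")
    responders = sum(is_responder(b, e) for b, e in score_pairs)
    return 100.0 * responders / len(score_pairs)


# Hypothetical (baseline, endpoint) total scores for four patients.
patients = [(28, 12), (26, 14), (30, 15), (24, 20)]
rate = responder_rate(patients)
```

Because the criterion is applied per patient, a responder rate can differ between two scores even when the mean reductions on those scores are similar.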
The ITT analysis confirmed the outcome of the ATP analysis: since the HAMD-21 total score was reduced from 26.6 to 12.0 in the sertraline group and from 27.3 to 11.8 in the amitriptyline group (90% CI -0.9 to 2.5; 95% CI -1.2 to 2.9; p = 0.414), equivalence can also be concluded in the ITT analysis.
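The equivalence reasoning used in both studies, namely that the confidence interval of the treatment difference lies within the predefined equivalence range, can be sketched as below. The function name is hypothetical. Two points are assumptions rather than statements from the source: that the range is symmetric around zero with inclusive bounds, and that the 5.0-point range stated for the inpatient study also applied to the outpatient study.

```python
def within_equivalence_range(ci_low, ci_high, margin):
    """True if the whole confidence interval lies inside [-margin, +margin].

    Bounds are treated as inclusive (an assumption of this sketch).
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    return -margin <= ci_low and ci_high <= margin


MARGIN = 5.0  # equivalence range in HAMD-21 points, as stated for the inpatient study

# Inpatient study, ATP analysis: 95% CI -1.0 to 5.0
inpatient_equivalent = within_equivalence_range(-1.0, 5.0, MARGIN)

# Outpatient study, ITT analysis: 95% CI -1.2 to 2.9 (margin assumed to be the same)
outpatient_equivalent = within_equivalence_range(-1.2, 2.9, MARGIN)
```

The inpatient interval touches the upper limit exactly, so the conclusion for that study depends on treating the bound as inclusive.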
The methodological results of this study point in the same direction as the results of the former study.
Apparently the tendency towards a better global antidepressive efficacy of amitriptyline versus sertraline is only a pseudo effect and can be explained by a psychometric bias due to the inherent problems of the Hamilton Depression Scale. This superiority is only seen when the total score of the Hamilton rating scale is used for the evaluation. When the study drugs were evaluated by use of the Bech HAMD 6-item score and item 1 of the HAMD (depressive mood), sertraline and amitriptyline were found to be very similar in efficacy. These methodological findings are of great importance for the interpretation of results of antidepressant studies. They lead to the recommendation that under certain conditions, for example when one drug is sedative and the other not, the HAMD total score should not be used as the main outcome criterion but rather the Bech 6-item score combining the core items for the severity of depression.
Tab. 3 Improvement in HAMD scores ± SD among evaluable patients (ATP population) (adapted from Möller et al. 2000)

Variable                   STL (n = 100)   AMI (n = 105)   p-value STL vs. AMI   95% confidence interval (1)
HAMD-17 total score
  Mean baseline value      24.7 ± 3.9      25.0 ± 3.6
  Mean reduction week 6    -13.8 ± 7.2     -15.3 ± 7.1     0.154                 -0.4 to 2.8
HAMD-21 total score
  Mean baseline value      27.1 ± 4.2      27.5 ± 4.5
  Mean reduction week 6    -15.4 ± 7.9     -16.9 ± 8.1     0.221                 -0.6 to 2.8
HAMD 6-item score
  Mean baseline value      11.2 ± 2.5      11.4 ± 2.6
  Mean reduction week 6    -6.4 ± 3.9      -6.9 ± 4.0      0.289                 -0.4 to 1.3
HAMD Item 1 score
  Mean baseline value      2.7 ± 0.8       2.7 ± 0.8
  Mean reduction week 6    -1.6 ± 1.1      -1.7 ± 1.0      0.166                 -0.1 to 0.4

(1) Difference sertraline minus amitriptyline
STL sertraline; AMI amitriptyline; HAMD Hamilton Depression Scale

Conclusions
This paper focuses on some methodological issues in the assessment of severity of depression in the context of drug trials in depression. With respect to the widely used Hamilton Depression Scale it seems meaningful, based on the results of several methodological analyses, to differentiate between core items which reflect severity of depression and additional items which describe other characteristics of the depressive symptomatology. The total score of the HAMD seems to have an inherent bias in favour of more sedative antidepressants, which is of great interest in relation to the question, for example, of whether SSRIs are less effective than tricyclics. The six core items of the HAMD, described primarily by Bech, reflect the severity of depression in the best way and avoid such a bias.

References
Bech P (1990) Psychometric developments of the Hamilton scales: the spectrum of depression, dysthymia, and anxiety. In: Bech P, Coppen A (eds) The Hamilton Scales. Springer, Berlin, pp 72–79
Bech P, Rafaelsen OJ (1980) The use of rating scales exemplified by a comparison of the Hamilton and Bech-Rafaelsen Melancholia Scale. Acta Psychiatr Scand 62 (Suppl 285): 128–131
Bech P, Gram LF, Dein E, Jacobsen O, Vitger J, Bolwig TG (1975) Quantitative rating of depressive states. Acta Psychiatr Scand 51: 161–170
Bech P, Allerup P, Gram LF, Reisby N, Rosenberg R, Jacobsen O, Nagy A (1981) The Hamilton depression scale. Evaluation of objectivity using logistic models. Acta Psychiatr Scand 63: 290–299
Berrios GE, Bulbena-Villarasa A (1990) The Hamilton depression scale and the numerical description of the symptoms of depression. In: Bech P, Coppen A (eds) The Hamilton Scales. Springer, Berlin, pp 80–92
Borg I (1979) Some basic concepts of facet theory. In: Lingoes JC, Roskam EE, Borg I (eds) Geometric Representation of Relational Data. Mathesis Press, Ann Arbor, MI
Canter D (1985) Facet Theory. Approaches to Social Research. Springer, New York
Conti I, Cassano GB (1990) The impact of the Hamilton rating scale for depression on the development of a center for clinical psychopharmacology research. In: Bech P, Coppen A (eds) The Hamilton Scales. Springer, Berlin, pp 20–27
Coxon A (1982) The User’s Guide to Multidimensional Scaling. Heinemann, London
Domken M, Scott J, Kelly P (1994) What factors predict discrepancies between self and observer ratings of depression? J Affect Disord 31: 253–259
Guttman L (1966) Order analysis of correlation matrices. In: Cattell RB (ed) Handbook of Multivariate Experimental Psychology. Rand McNally, Chicago, IL, pp 439–458
Guttman L (1967) The development of nonmetric space analysis: a letter to Professor John Ross. Multivariate Behavior Res 2: 71–82
Guttman L (1968) A general nonmetric technique for finding the smallest coordinate space for a configuration of points. Psychometrika 33: 469–506
Guttman L (1979) Smallest Space Analysis by the absolute value principle. In: Lingoes JC, Roskam EE, Borg I (eds) Geometric Representation of Relational Data. Mathesis Press, Ann Arbor, MI
Guy W (1976) ECDEU Assessment Manual for Psychopharmacology Revised. National Institute of Mental Health, Maryland
Hamilton M (1960) A rating scale for depression. J Neurol Neurosurg Psychiatry 23: 56–62
Hamilton M (1967) Development of a rating scale for primary depressive illness. Br J Soc Clin Psychol 6: 278–296
Hamilton M (1976) Comparative value of rating scales. Br J Clin Pharmacol 1 (Suppl 1): 58–60
Hamilton M (1986) The Hamilton rating scale for depression. In: Sartorius N, Ban T (eds) Assessment of Depression. Springer Verlag, Berlin Heidelberg New York, pp 143–152
Hughes JR, O’Hara MW, Rehm LP (1982) Measurement of depression in clinical trials: an overview. J Clin Psychiatry 43: 85–88
Lang F, Pellet J, Postic Y, Beau JM, Lancrenon S, Blanchon Y et al. (1991) Widlocher’s Depressive Retardation Scale and Montgomery-Asberg’s Depression Rating Scale: an inter-rater study. Eur Psychiatry 6: 47–52
Levy S (1981) Lawful roles of facets in social theories. In: Borg I (ed) Multidimensional Data Representation – When and why? Mathesis Press, Ann Arbor, MI, pp 65–107
Levy S (1985) Lawful roles of facets in social theory. In: Canter D (ed) Facet Theory. Approaches to Social Research. Springer, New York, pp 59–96
Lingoes JC (1979) Identifying regions in the space for interpretation. In: Lingoes JC, Roskam EE, Borg I (eds) Geometric Representation of Relational Data. Mathesis Press, Ann Arbor, MI, pp 16–24
Lingoes JC, Guttman C (1979) Nonmetric factor analysis: a rank reducing alternative to linear factor analysis. In: Lingoes JC, Roskam EE, Borg I (eds) Geometric Representation of Relational Data. Mathesis Press, Ann Arbor, MI, pp 98–104
Maier W (1990) The Hamilton depression scale and its alternatives: a comparison of their reliability and validity. In: Bech P, Coppen A (eds) The Hamilton Scales. Springer, Berlin, pp 64–71
Maier W, Philipp M (1985) Comparative analysis of observer depression scales. Acta Psychiatr Scand 72: 230–245
Möller HJ (1991) Outcome criteria in antidepressant drug trials: self-rating versus observer-rating scales. Pharmacopsychiat 24: 71–75
Möller HJ (2000) Rating depressed patients: observer- vs self-assessment. Eur Psychiatry 15: 160–172
Möller HJ, von Zerssen D (1995) Self-rating procedures in the evaluation of antidepressants. Review of the literature and results of our studies. Psychopathology 28: 291–306
Möller HJ, Berzewski H, Eckmann F, Gonzalves N, Kissling W, Knorr W, Ressler P, Rudolf GA, Steinmeyer EM, Magyar I et al. (1993) Double-blind multicenter study of paroxetine and amitriptyline in depressed inpatients. Pharmacopsychiat 26: 75–78
Möller HJ, Gallinat J, Hegerl U, Arató M, Janka Z, Pflug B, Bauer H (1998) Double-blind, multicenter comparative study of sertraline and amitriptyline in hospitalized patients with major depression. Pharmacopsychiat 31: 170–177
Möller HJ, Glaser K, Leverkus F, Göbel C (2000) Double-blind, multicenter comparative study of sertraline versus amitriptyline in outpatients with major depression. Pharmacopsychiat 33: 206–212
Rasch G (1960) Probabilistic Models for Some Intelligence and Attainment Tests. Danish Institute for Educational Research, Copenhagen
Raveh A (1978) Finding periodical patterns in time series with monotone trend: a new technique. In: Shye S (ed) Theory Construction and Data Analysis in the Behavioral Sciences. Jossey-Bass, San Francisco, CA, pp 371–390
Shye S (1978) Theory Construction and Data Analysis in the Behavioral Sciences. Jossey-Bass, San Francisco, CA
Snaith P (1993) What do depression rating scales measure? Br J Psychiatry 163: 293–298
Steiger JH, Schönemann PH (1978) A history of factor indeterminacy. In: Shye S (ed) Theory Construction and Data Analysis in the Behavioral Sciences. Jossey-Bass, San Francisco, CA, pp 136–178
Steinmeyer EM, Möller HJ (1992) Facet theoretic analysis of the Hamilton-D scale. J Affect Disord 25: 53–62
Walczak DD, Apter JT, Halikas JA, Borison RL, Carman JS, Post GL, Patrick R, Cohn JB, Cunningham LA, Rittberg B, Preskorn SH, Kang JS, Wilcox ChS (1996) The oral dose-effect relationship for fluvoxamine: a fixed-dose comparison against placebo in depressed outpatients. Ann Clin Psychiatry 8: 139–151
