This chapter briefly reviews extant asymmetric MDS models and methods for two kinds of data: a one-mode, two-way asymmetric square relational data matrix whose elements are similarity or dissimilarity measures between objects, and a special two-mode, three-way asymmetric relational data matrix composed of such one-mode, two-way asymmetric square matrices. Several open problems are also discussed. This chapter reproduces the first three sections of Chino (2012).
This page was opened on June 6, 2020.
This page was revised on July 14, 2020.
In general, these asymmetric relationships observed at a specific point in time or during a certain period of time are summarized in a square matrix whose elements are similarity or dissimilarity measures between objects. Such a matrix is characterized as a one-mode, two-way square asymmetric matrix. In contrast, if such a matrix is obtained at several points in time or during several periods of time, the data matrix is represented by a special two-mode, three-way matrix which is composed of a set of one-mode, two-way square asymmetric matrices. Sometimes such a matrix is observed per individual, which yields the same kind of two-mode, three-way matrix.
Asymmetric MDS (hereafter abbreviated as AMDS) extends symmetric MDS (hereafter
abbreviated simply as MDS) to handle such asymmetric relationships among objects. As for
symmetric MDS, Cox and Cox (2001) give two definitions, i.e., a narrow definition and a
wider definition. According to their narrow definition, MDS is a search for a low
dimensional space, usually Euclidean, in which points in the space represent objects, one
for each object, and such that the distances between the points in the space match as well
as possible the original dissimilarities. To clarify the definition of MDS and
the difference between MDS and AMDS, we shall restate their narrow definition of MDS
with somewhat stricter constraints. The following five conditions are assumed in
our narrow definition of MDS. That is,
Although Cox and Cox refer to neither the data type nor the scale level for MDS, we have added them as conditions 1 and 2. In particular, the reader should pay attention to the term “count data” in condition 2. Although counting is sometimes viewed as an ordinal scale or higher, we shall clearly distinguish between counting and an ordinal or higher scale. In this paper, following Suppes and Zinnes (1963, p.9), we shall use counting as an example of an absolute scale, in which there is no arbitrary choice of unit or zero available. In contrast, in the case of, say, a ratio scale, the choice of a unit is an arbitrary decision made by an individual or group of individuals.
Furthermore, if we consider an inferential procedure, the most rational analysis of count data is to treat them not as measured at, say, an ordinal scale level, but as measured at a ratio scale level, and then to take their natural logarithm. The reasons are that (1) the multinomial distribution and the Poisson distribution, the standard models for count data, belong to the usual exponential family and can be written in canonical form, (2) the canonical parameters of these two distributions are not the population parameters themselves but the natural logarithm of the population parameter or of a function of it, and (3) the statistics corresponding to these canonical parameters carry sufficient information about the data (e.g., Andersen, 1980).
According to Cox and Cox's wider definition, MDS can subsume several techniques of multivariate data analysis. It covers any technique which produces a graphical representation of objects from multivariate data. It is apparent that such techniques do not necessarily satisfy all of the above conditions. Therefore, we shall adopt the narrow definition of MDS in this paper.
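The remark on canonical parameters for count data can be illustrated with the Poisson case. The following is a minimal worked form, in which the symbols λ (the population mean), θ (the canonical parameter), and y (an observed count) are notation introduced here for illustration:

```latex
\begin{align*}
P(Y = y;\lambda) &= \frac{\lambda^{y} e^{-\lambda}}{y!}
  = \exp\{\, y \log\lambda - \lambda - \log y! \,\}, \qquad y = 0, 1, 2, \ldots,\\
\theta &= \log\lambda, \qquad
P(Y = y;\theta) = \exp\{\, y\,\theta - e^{\theta} - \log y! \,\}.
\end{align*}
```

The canonical parameter is thus log λ rather than λ itself, and the count y is the corresponding sufficient statistic, which is the sense in which modelling the logarithm is natural for count data.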
Two basic theorems underpinned MDS in the 1930s. These are the Eckart and Young (1936) theorem and the Young and Householder (1938) theorem. The former is concerned with the lower rank fit to the data matrix S, and the latter with the necessary and sufficient condition that the coordinates of objects in a multidimensional space are real points in the Euclidean space. Drawing upon these theorems, Richardson (1938) proposed a method of MDS and Torgerson (1954, 1958) developed it further (e.g., Tucker & Messick, 1963). Their method is now called classical MDS.
There exists a large body of literature which extends classical MDS. We can divide it first into two, i.e., descriptive MDS and inferential MDS. We may further divide the latter into three. These are the so-called special probabilistic MDS, maximum likelihood MDS, and Bayesian MDS.
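The role of scalar products in classical MDS can be sketched as follows. This is a minimal illustration, assuming the centroid-origin double-centering form usually associated with classical MDS; the function names and the three-point configuration are hypothetical, and the eigendecomposition step that would recover coordinates is omitted.

```python
def squared_distances(X):
    """Squared Euclidean distances between the rows of X (a list of coordinate lists)."""
    n = len(X)
    return [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]


def double_center(D2):
    """Return B = -1/2 * J * D2 * J, where J = I - (1/n) 11' is the centering matrix."""
    n = len(D2)
    row_mean = [sum(D2[i]) / n for i in range(n)]
    col_mean = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (D2[i][j] - row_mean[i] - col_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def centered_gram(X):
    """Scalar products of the rows of X after moving the centroid to the origin."""
    n, q = len(X), len(X[0])
    centroid = [sum(X[i][t] for i in range(n)) / n for t in range(q)]
    Xc = [[X[i][t] - centroid[t] for t in range(q)] for i in range(n)]
    return [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)]
            for i in range(n)]


# Hypothetical configuration of three objects in the plane (illustrative only).
X = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
B = double_center(squared_distances(X))  # scalar products recovered from distances
G = centered_gram(X)                     # scalar products computed directly
```

For exact Euclidean distances the two matrices agree up to rounding, so the scalar products, and hence a real configuration, can be recovered from the distances alone.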
The descriptive MDS is a class of MDS which is not accompanied by any statistical inference on population parameters. We shall only refer to representative ones below. The classical MDS was first extended to the case where the dissimilarities are measured at an ordinal scale level by Kruskal (1964a, b), whose method is called nonmetric MDS. Guttman and his colleagues (e.g., Guttman, 1968; Lingoes, 1973) followed Kruskal, and proposed another method of nonmetric MDS called the smallest space analysis (abbreviated as SSA). Carroll and Chang (1970) also extended classical MDS so as to handle individual differences in dissimilarity judgments, and therefore their method is called individual differences MDS. Takane, Young, and De Leeuw (1977) proposed another method of individual differences MDS. In contrast with Carroll and Chang's method, their method is applicable to dissimilarity data measured on a wider class of scale levels.
As the special probabilistic MDS, we include a class of MDS methods which assumes a normal distribution on the coordinates of objects to be estimated, and as a result assumes a noncentral χ² distribution with some degrees of freedom and a specified noncentrality parameter on squared distances. Such an MDS technique goes back to Hefner (1958), and several papers exist in the literature (e.g., Ramsay, 1969; Suppes & Zinnes, 1963; Zinnes & Mackay, 1983).
The maximum likelihood MDS is another class of inferential MDS techniques, which assumes a normal distribution and/or log-normal distribution on observed dissimilarities (e.g., Ramsay, 1977, 1978, 1982; Takane, 1978a,b, 1981; Takane & Carroll, 1981). In this class of MDS methods, it is assumed that dissimilarities are obtained by the pair comparison method or by a certain rating method.
The Bayesian MDS is another inferential MDS, which is based on Bayesian inference, and has recently been proposed by several researchers (e.g., Fong et al., 2010; Je et al., 2008; Lee, 2008; Oh & Raftery, 2001, 2007; Okada & Mayekawa, 2011; Okada & Shigemasu, 2010; Park et al., 2008).
As is apparent from the brief review of MDS made above, symmetric MDS has a long history and has been almost fully developed in that both descriptive and inferential methods are now available. Furthermore, major books which thoroughly review MDS have been published (e.g., Borg & Groenen, 2005; Cox & Cox, 2001; Saito, 1980; Takane, 1980). In contrast, such books on AMDS are rare (e.g., Chino, 1997), although the books on MDS mentioned above partly introduce AMDS (e.g., Borg & Groenen, 2005; Cox & Cox, 2001). More recently, Saito and Yadohisa (2005) reviewed AMDS fairly extensively. One possible reason for the relative paucity of literature on AMDS may be that it is still an active area of research. As a result, the notion of AMDS is still vague, and it seems that a precise definition of AMDS has not been established. Therefore, it seems appropriate and necessary to give some definition of AMDS before we make a critical review of AMDS in this paper. In fact, depending on this definition, the history of AMDS will have to focus on different aspects. It might be possible to make a narrow definition of AMDS as well as a wider definition in a manner similar to Cox & Cox (2001) for MDS. In this paper, we shall further divide the narrow definition of AMDS into three, i.e., a narrow definition, a narrower definition, and finally the narrowest definition.
The following six conditions are assumed for the narrowest definition of AMDS:
Condition 2 excludes a body of asymmetric methods for count data (see Chino, 1997). As the metric spaces in condition 3, we include the Minkowski r-metric space (with the Euclidean space as a special case) for the real metric space, the Hilbert space for the complex metric space (see Chino & Shiraiwa, 1993; Saito & Yadohisa, 2005), and the asymmetric Minkowski space for a real asymmetric metric space (see Sato, 1988). This condition excludes a familiar method for skew-symmetric data, i.e., the Gower diagram (sometimes called the canonical analysis of skew-symmetry, abbreviated as CASK by Chino, 1997), because the area property in CASK has a symplectic rather than a Euclidean structure (Chino & Shiraiwa, 1993).
As will be discussed in detail later, CASK decomposes the skew-symmetric part of a square asymmetric matrix into the weighted sum of a special quantity (see Eq. (20) in Section 2). It is well known that this quantity is an oriented area of the parallelogram spanned by the two location vectors corresponding to two objects.
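The two ingredients mentioned above, the skew-symmetric part of a square matrix and the oriented area of a parallelogram, can be sketched as follows. This is a minimal illustration only; the function names and the data values are hypothetical and are not taken from any of the cited methods.

```python
def split_symmetric_skew(S):
    """Split a square matrix S (list of lists) into symmetric and skew-symmetric parts."""
    n = len(S)
    if any(len(row) != n for row in S):
        raise ValueError("S must be square")
    sym = [[(S[i][j] + S[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    skew = [[(S[i][j] - S[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    return sym, skew


def oriented_area(x, y):
    """Oriented area of the parallelogram spanned by the 2-D vectors x and y."""
    return x[0] * y[1] - x[1] * y[0]


# Hypothetical 3-object asymmetric similarity matrix (illustrative values only).
S = [[0.0, 5.0, 2.0],
     [3.0, 0.0, 4.0],
     [6.0, 1.0, 0.0]]
sym, skew = split_symmetric_skew(S)
```

The oriented area changes sign when the two vectors are swapped, which is the same antisymmetry that the skew-symmetric part shows when its row and column indices are swapped.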
It is apparent from the above definition of AMDS that we have inherited the concept of metric space from the narrow definition of MDS discussed previously. As is well known, a metric space is a nonempty set M equipped with a nonnegative real-valued function d : M × M → R, called the distance function, that satisfies the following axioms: (i) d(X, Y) ≥ 0, with d(X, Y) = 0 if and only if X = Y; (ii) d(X, Y) = d(Y, X); and (iii) d(X, Z) ≤ d(X, Y) + d(Y, Z),
for all X, Y, Z ∈ M. Minkowski’s r-metric as well as Hilbert space satisfies all of these axioms, while a more general Minkowski space, which we call here the asymmetric Minkowski space, does not satisfy the second axiom. As defined elsewhere (e.g., Matsumoto, 1986; Sato, 1988), a Minkowski space M is a finite dimensional real vector space such that the length of a vector x ∈ M is given by the value L(x) of a function L on M, where L is assumed to satisfy the following conditions,
The function L is called the Minkowski metric function. If L(x) = L(-x) for any x, then the metric is considered to be symmetric.
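A concrete example may help to show how a length function can fail to be symmetric. The sketch below uses a Randers-type function, L(x) = ||x|| + w·x with ||w|| < 1, chosen here purely for illustration and not taken from Matsumoto (1986) or Sato (1988). It is positive for nonzero x, positively homogeneous, and subadditive, which we assume are the kind of conditions intended for L, but L(x) differs from L(-x) whenever w·x is nonzero. The names `randers_length`, `asymmetric_distance`, and the drift vector `w` are hypothetical.

```python
import math


def randers_length(x, w):
    """Illustrative asymmetric length: Euclidean norm of x plus a linear drift term w.x."""
    norm_w = math.sqrt(sum(c * c for c in w))
    if norm_w >= 1.0:
        # The drift must be shorter than 1 so that the length stays positive.
        raise ValueError("the drift vector w must have Euclidean norm < 1")
    return math.sqrt(sum(c * c for c in x)) + sum(a * b for a, b in zip(w, x))


def asymmetric_distance(p, q, w):
    """Length of the displacement from p to q; generally differs from q to p."""
    return randers_length([b - a for a, b in zip(p, q)], w)


w = [0.5, 0.0]  # hypothetical drift vector
forward = asymmetric_distance([0.0, 0.0], [3.0, 4.0], w)   # 5 + 1.5
backward = asymmetric_distance([3.0, 4.0], [0.0, 0.0], w)  # 5 - 1.5
```

Setting `w` to the zero vector recovers the ordinary Euclidean length, for which the forward and backward values coincide.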
The narrower definition of AMDS, on the other hand, adds the count data in condition 2. This gives a criterion for checking and classifying the seemingly different approaches to an asymmetric dissimilarity data matrix using the notion of quasisymmetry in the log-linear model, as discussed in a later section.
The narrow definition of AMDS adds the indefinite metric space in condition 3. The narrow definition also adds a one-mode, three-way square matrix in condition 1. We shall discuss open problems on these topics in the discussion section.
According to the wider definition of AMDS, it can subsume several techniques of multivariate data analysis, and covers any techniques which produce a graphical representation of objects from multivariate data, as in MDS. It is apparent that such techniques do not necessarily satisfy all of the above conditions for AMDS. Therefore, we shall adopt the narrow definition of AMDS in this paper.
The organization of this paper is as follows. In the next section we shall briefly review a body of AMDS methods in the narrowest sense. In the third section, we shall briefly review a body of AMDS methods in the narrower sense, which assume count data and embed objects in some metric space, and discuss the implications of checking and classifying various features of these two bodies of AMDS. In the discussion section, we shall list open problems on AMDS to be addressed in the near future.
(1) |
A number of models have been proposed since Young proposed his ASYMSCAL. As in MDS, we can divide them into two, i.e., the descriptive AMDS and the inferential AMDS. According to the narrowest definition, almost all the extant AMDS models remain descriptive. Representative methods in this category are Borg and Groenen (2005), Chino (1978, 1990), Chino and Shiraiwa (1993), Constantine and Gower (1978), Escoufier and Grorud (1980), Gower (1977), Harshman (1978), Harshman et al. (1982), Kiers and Takane (1994), Krumhansl (1978), Loisel and Takane (2011), Okada and Imaizumi (1984, 1987, 1997), Rocci and Bove (2002), Saito (1991), Saito and Takeda (1990), Sato (1988), ten Berge (1997), Tobler (1976-1977), Trendafilov (2002), Weeks and Bentler (1982), Yadohisa and Niki (1999), Young (1975), and Zielman and Heiser (1996).
Chino and Okada (1996) and Chino (1997) further divide the above methods (except for the models proposed after 1996) into three groups, i.e., the augmented distance model, the non-distance model, and the extended distance model. The augmented distance model is a family of AMDS in which some parameters are added to the metric distance between objects in order to handle asymmetry, and includes Borg and Groenen (2005), some of Gower's (1977) models, Krumhansl (1978), Okada and Imaizumi (1984, 1987, 1997), Saito (1991), Saito and Takeda (1990), Tobler (1976-1977), Weeks and Bentler (1982), Yadohisa and Niki (1999), Young (1975), and Zielman and Heiser (1993). Young's ASYMSCAL can be classified into this family of models.
Tobler (1976-1977) proposed a unique geographical AMDS model. His wind model assumes a kind of wind for the observed asymmetry, and estimates it using a vector field model. In order to estimate the vector field from the original similarity measures, the following model is assumed
(2) |
Yadohisa and Niki (1999) proposed a vector field model similar to Tobler's model. They assume that the locations of objects have already been determined from the symmetric part of the data via some suitable MDS method. Given the configuration of objects, they estimate vectors at those locations as well as the estimated scalar potentials from the skew-symmetric part of the data.
Two of the models which Gower (1977) proposed, i.e., the jet-stream model and the cyclone model, are very similar to Tobler’s wind model. The jet-stream model was conceived by imagining a plane flying at a constant velocity V between two towns Pi and Pj which are a distance dij apart. If there is a jet-stream with velocity v, making an angle θij with the line PiPj, then the flight times tij and tji are
(3) |
As a result, if the ratio v/V is sufficiently small to ignore v²/V², the symmetric part and the skew-symmetric part of tij can be written as
(4) |
respectively. In any case, this model is formally the same as that of Tobler, if we reparametrize cij = v cos θij and cji = -v cos θij. Although Gower was interested in analyzing symmetric parts and skew-symmetric parts separately, the jet-stream model may be thought of as an AMDS model if we analyze them simultaneously. Borg and Groenen (2005) proposed a model similar to the jet-stream model in that both symmetric parts and skew-symmetric parts are analyzed simultaneously. It is called the hill-climbing model, and is written as
(5) |
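Returning to the jet-stream model, the first-order approximation that separates its symmetric and skew-symmetric parts can be made explicit. The derivation below is a sketch which assumes that the flight times take the ground-speed form shown in its first line; this form, and hence the sign convention of the skew-symmetric part, is an assumption adopted here for illustration.

```latex
\begin{align*}
t_{ij} &= \frac{d_{ij}}{V + v\cos\theta_{ij}}, \qquad
t_{ji} = \frac{d_{ij}}{V - v\cos\theta_{ij}},\\[4pt]
\frac{t_{ij} + t_{ji}}{2}
  &= \frac{d_{ij}\,V}{V^{2} - v^{2}\cos^{2}\theta_{ij}}
  \approx \frac{d_{ij}}{V},\\[4pt]
\frac{t_{ij} - t_{ji}}{2}
  &= -\frac{d_{ij}\,v\cos\theta_{ij}}{V^{2} - v^{2}\cos^{2}\theta_{ij}}
  \approx -\frac{d_{ij}\,v\cos\theta_{ij}}{V^{2}}.
\end{align*}
```

Both approximations follow from dropping the term of order v²/V² in the denominator, so that the symmetric part depends only on the distance while the skew-symmetric part carries the wind component v cos θij along the route.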
The cyclone model also proposed by Gower (1977) is similar to the jet-stream model. This model is written as
(6) |
As Gower points out, this model is similar to the jet-stream model except that the jet-stream is replaced by a cyclonic wind rotating about its center C at a constant angular velocity. Moreover, it is the same as Tobler's wind model, if we reparametrize cij = ω hij and cji = -ω hij. As with the jet-stream model, the cyclone model may also be thought of as an AMDS model if we analyze the symmetric and skew-symmetric parts simultaneously. Later, we shall discuss an AMDS method proposed by Sato (1988, 1989) which generalizes the jet-stream model from a mathematically more sophisticated viewpoint.
Krumhansl's distance-density model augments the distance dij as follows:
(7) |
where d*ij is an augmented distance, and δ(xi) and δ(xj) are measures of spatial density in the neighborhoods of objects i and j, respectively, while α and β are the corresponding weights applied to the densities.
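The way in which the density terms produce asymmetry can be sketched numerically. The example below assumes that the augmented distance has the additive form d*ij = dij + α δ(xi) + β δ(xj), which is the form commonly given for the distance-density model; the function names and all numerical values are hypothetical.

```python
import math


def euclidean(x, y):
    """Ordinary Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_density(xi, xj, density_i, density_j, alpha, beta):
    """Augmented distance from i to j under the assumed additive form."""
    return euclidean(xi, xj) + alpha * density_i + beta * density_j


# Hypothetical coordinates and spatial densities for two objects.
xi, xj = [0.0, 0.0], [3.0, 4.0]
density_i, density_j = 2.0, 0.5

d_ij = distance_density(xi, xj, density_i, density_j, alpha=1.0, beta=0.5)
d_ji = distance_density(xj, xi, density_j, density_i, alpha=1.0, beta=0.5)
```

With unequal weights the two directions differ whenever the two densities differ, whereas setting α = β makes the augmented distance symmetric again.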
By contrast, Weeks and Bentler (1982) proposed an augmented distance model (the W-B model) which simplifies the distance-density model. According to the W-B model,
(8) |
where a is an additive constant, and b = -1 if the data are composed of similarity measures instead of dissimilarity measures. The Euclidean distance, dij, may be replaced by its square, d²ij, in Eq. (8).
Okada and Imaizumi (1984) proposed a more general model than the distance-density model as well as the W-B model. This model is written as
(9) |
where α and β are constant weight parameters, and c(i,j,t) and c(j,i,t) are the terms which represent the skew-symmetric component and are assumed to be positive. They consider several sub-models, one of which is the O-I model (Okada & Imaizumi, 1987)
(10) |
Okada and Imaizumi (1997) extended it to the two-mode, three-way case.
Saito and Takeda (1990) proposed models similar in form to the O-I model. Model 2, which is the most general one, is written as,
(11) |
where dij is Minkowski's r-metric, and r is an additive constant. This model is apparently a special case of Eq. (9) but it may be more appropriately regarded as an extension of Eqs. (7), (8), and (10). For example, although δ(xi) and δ(xj) in Eq. (7) as well as ri and rj in Eq. (10) are all positive by definition, there exist no such restrictions on a, b, θi, and θj in Eq. (11). It is evident that the second and third right-hand terms in (11) are extensions of those in (8), although the first right-hand term of (11) is a special case of that in (8). If the θi's are positive, they are interpreted as stimulus-specific effects, analogous to spatial densities in the distance-density model.
Later, Saito (1991) proposed the following model,
(12) |
which is partly a generalization of Eq. (8), because parameters in the second and third right-hand terms are distinct in Eq. (12) but they belong to the same set of parameters in Eq. (8). However, as for the first right-hand term, this model assumes a special case of that of the W-B model.
Most of the above models can be considered special cases of the Holman model (Holman, 1979), stated as
(13) |
where sij is a similarity between objects i and j which Holman calls the proximity data, F is some strictly increasing function, and mij is a symmetric function. If mij is parametrized further by the coordinates of objects in a metric space, this model can be said to be an AMDS model in the narrowest sense. In any case, the only exception to the Holman model is a generalized version of the O-I model described by Eq. (10). Nosofsky (1991) calls the Holman model the additive similarity and bias model.
Zielman and Heiser (1993) proposed an algorithm to fit the slide-vector model, which had been suggested by Kruskal in 1973. This model is written as:
(14) |
where xit and xjt, respectively, are the coordinates of objects i and j on dimension t, and (z1, z2, ..., zr) constitutes the slide-vector z. This model has the distinguishing feature that the diagonal elements of the model are non-zero, unlike a regular distance model. They showed how the coordinates and the slide-vector can be obtained by using an unfolding algorithm by Heiser (1987). This means that the slide-vector model is a special version of the unfolding model originated by Coombs (1964). They also proposed a three-way generalization of the slide-vector model,
(15) |
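The two-way slide-vector model described above can be sketched as follows. The example assumes that the model distance has the form dij = sqrt(Σt (xit - xjt + zt)²), the form usually given for the slide-vector model; the function name and the numerical values are hypothetical.

```python
import math


def slide_vector_distance(xi, xj, z):
    """Model distance from object i to object j with slide-vector z (assumed form)."""
    return math.sqrt(sum((a - b + c) ** 2 for a, b, c in zip(xi, xj, z)))


# Hypothetical coordinates of two objects and a hypothetical slide-vector.
xi, xj = [0.0, 0.0], [3.0, 4.0]
z = [1.0, 0.0]

d_ij = slide_vector_distance(xi, xj, z)  # sqrt((-2)^2 + (-4)^2)
d_ji = slide_vector_distance(xj, xi, z)  # sqrt(4^2 + 4^2)
d_ii = slide_vector_distance(xi, xi, z)  # equals the length of z
```

Under this form the self-distance equals the length of z, which is why the diagonal elements are non-zero, and dij is the ordinary distance between the shifted point xi + z and the point xj, which is the sense in which the model can be fitted as an unfolding problem.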
However, we regard the unfolding models, especially the multidimensional unfolding models (e.g., Bennett & Hays, 1960; Hays & Bennett, 1961; Schönemann, 1970), as belonging neither to the family of augmented distance models nor to the family of AMDS models in the narrowest sense, because in general the multidimensional unfolding models do not necessarily satisfy conditions 1, 4, and 6 in our definition of AMDS in the narrowest sense. Incidentally, Zielman and Heiser (1993) also proposed the multiple slide-vector model as well as the row-weighted slide-vector model as candidates for other possible generalizations.
The non-distance model fits some quantity other than the metric distance, e.g., the inner product, to the similarity measures (e.g., Chino, 1977, 1978, 1990; Constantine & Gower, 1978; DeSarbo et al., 1992; Escoufier & Grorud, 1980; Gower, 1977; Harshman, 1978; Harshman et al., 1982; Kiers & Takane, 1994; Loisel & Takane, 2011; Trendafilov, 2002).
Chino (1977, 1978) proposed an ASYMSCAL model different from Young (1975). In contrast with Young's augmented distance model, he proposed a special inner product model for AMDS, which is written as follows,
(16) |
where sij is a similarity between objects i and j, and c is an additive constant, while xil is the coordinate of object i on dimension l. It is apparent that the quantity in the first parentheses on the right-hand side of Eq. (16) and that in the second parentheses are, respectively, the inner product and the cross-product (outer product) of position vectors corresponding to two objects i and j on a two-dimensional plane. The cross-product is equivalent to the area of the parallelogram formed by the two position vectors.
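The roles of the inner product and the cross-product can be sketched numerically. The example below assumes that Eq. (16) has the form sij = a(xi1 xj1 + xi2 xj2) + b(xi1 xj2 - xi2 xj1) + c; the weight a on the inner product, the function name, and the numerical values are assumptions introduced here for illustration.

```python
def asymscal_similarity(xi, xj, a, b, c):
    """Model similarity from i to j on a plane: a * inner product + b * cross-product + c."""
    inner = xi[0] * xj[0] + xi[1] * xj[1]   # symmetric in i and j
    cross = xi[0] * xj[1] - xi[1] * xj[0]   # changes sign when i and j are swapped
    return a * inner + b * cross + c


# Hypothetical position vectors of two objects and hypothetical constants.
xi, xj = [1.0, 1.0], [2.0, 0.0]
a, b, c = 1.0, 1.0, 0.0

s_ij = asymscal_similarity(xi, xj, a, b, c)
s_ji = asymscal_similarity(xj, xi, a, b, c)

symmetric_part = (s_ij + s_ji) / 2.0  # a * inner product + c
skew_part = (s_ij - s_ji) / 2.0       # b * cross-product
```

Under this form the symmetric part of the model depends only on the inner product and the additive constant, while the skew-symmetric part is b times the oriented area, so reversing the sign of b reverses the direction of every asymmetry.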
Figure 1 shows the two-dimensional configuration of 10 nations including two regions for the trade data in Table 2. The positive direction in this figure is crucial in interpreting the direction of skewness in the amounts of trade between two nations. The sign of b in Eq. (16) determines this direction. It is apparent from the positive direction as well as the magnitudes of the parallelogram spanned by two position vectors corresponding to two nations that Japan is in a state called “unilateral love”. In other words, Japan's trade surplus of exports over imports was prominent among the 10 nations.
Although Chino's ASYMSCAL was confined within the three-dimensional space, he later extended it to a generalized inner product model called GIPSCAL (Chino, 1990). GIPSCAL is written in matrix form as,
(17) |
where S = {sij}, a and b are constants, and c is an additive constant, while X is an N×q coordinate matrix, and Lq is a special skew-symmetric matrix (also, Chino, 1980; Gower, 1984).
Later, Kiers and Takane (1994) pointed out that the alternating least squares (ALS) algorithm for fitting the GIPSCAL off-diagonal elements did not constitute a true ALS, and hence need not decrease the objective function value monotonically. Furthermore, they simplified the Lq in Eq. (17) in order to facilitate the interpretation of the skew-symmetric part of the data as follows,
(18) |
where Δ is a fixed matrix with singular values of the matrix Lq in skew-symmetric 2×2 blocks along the diagonal. To be precise, Lq = UΔU' and ). Rocci and Bove (2002) proposed a special case of (18). Trendafilov (2002) proposed another unique method for solving GIPSCAL efficiently. In his approach, the GIPSCAL problem is reformulated into an initial value problem for matrix ordinary differential equations on manifolds defined by the constraints of the original least-squares problems. This algorithm has been found to produce solutions which give better fits to the data than the algorithms of Chino (1978) as well as Kiers and Takane (1994). Trendafilov also proposed a three-way GIPSCAL and its algorithm.
Recently, Loisel and Takane (2011) proposed a fast convergent algorithm for GIPSCAL with acceleration by the minimal polynomial extrapolation. They adapted their basic algorithm to various extensions of GIPSCAL, including off-diagonal DEDICOM/GIPSCAL, and three-way GIPSCAL.
Gower (1977) proposed several methods for the analysis of asymmetry, one of which includes in part the same quantity as Chino’s ASYMSCAL in the two-dimensional case. The Gower diagram or CASK, referred to in the introduction section, decomposes the skew-symmetric part of the square asymmetric data matrix S via the singular value decomposition,
(19) |
where Ssk denotes the skew-symmetric part of S, while X is an N×N orthogonal matrix, Λ is a special diagonal matrix of singular values such that Λ = diag(λ1, λ1, λ2, λ2, ..., (0)), and K is a skew identity matrix consisting of
as diagonal blocks. In scalar form, Eq. (19) is written as
(20) |
where p is the largest integer not exceeding N/2. Eq. (20) is nothing but the second term on the right-hand side, without the constant b, of Chino’s ASYMSCAL given in (16). Chino (1977) and Gower (1977) introduced this quantity independently.
However, Gower took a philosophically different approach from Chino's. Holman points out that there exist two approaches to asymmetric proximity data. One considers the symmetric part and skew-symmetric part of the data to be inseparable parts of the same fundamental process, and the other considers the two parts to reflect different processes that can be distinguished by appropriate analysis. Chino (1977) took the former approach, while Gower (1977) took the latter, although at least some other methods proposed by Gower, i.e., the jet-stream model and the cyclone model, include the distance between two geographical points, as discussed earlier. In any case, we shall not regard CASK, at least, as an AMDS model because it possesses a symplectic structure which is different from the Euclidean metric structure (e.g., Arnold, 1978; Chino & Shiraiwa, 1993).
Harshman and his colleagues (Harshman, 1978; Harshman et al., 1982) proposed a simple non-distance model called DEDICOM, which stands for the DEcomposition into DIrectional COMponents,