Pseudo-likelihood Estimation of Log-multiplicative Association Models: The pleLMA Package

9/30/2021

Introduction

Log-Multiplicative Association (LMA) models are special cases of log-linear models with two-way interactions and are extensions of the RC(M) association model for two variables to multivariate categorical data. The variables may be dichotomous or multi-category (polytomous). In LMA models, a multiplicative structure is imposed on the matrices of interaction parameters, thereby reducing the number of parameters and easing the interpretation and description of the relationships between variables. For example, 20 five-category variables yield a cross-classification with \(5^{20} \approx 9.5\times 10^{13}\) cells, 190 two-way interactions, and 760 (unique) parameter estimates for the interactions. An LMA model fit to such data requires many fewer parameters to represent the interactions. Maximum likelihood estimation (MLE) for small cross-classifications can be handled by the ‘gnm’ package (Turner and Firth (2020)), among others. The ‘logmult’ package (Bouchet-Valat et al. (2020)), which is a wrapper for ‘gnm’, can be used to fit an RC(M) association model (Goodman (1981)) to two-way tables. Unfortunately, MLE becomes infeasible for moderate to large numbers of variables and/or large numbers of categories. This package uses pseudo-likelihood estimation to remove the limitations that the size of the data set places on MLE. For LMA models, pseudo-likelihood estimation has been shown to be a viable alternative to MLE and yields parameter estimates nearly identical to the MLE ones (Paek (2016), Paek and Anderson (2017)). Furthermore, pseudo-likelihood estimators are consistent and multivariate normally distributed (Arnold and Strauss (1991), Geys, Molenberghs, and Ryan (1999)).

LMA models have been derived from a number of different starting points, including graphical models (Anderson and Vermunt (2000)), (multidimensional) item response theory models (Anderson and Yu (2007), Anderson, Li, and Vermunt (2007), Anderson, Verkuilen, and Peyton (2010), Chen et al. (2018), Hessen (2012), Holland (1990), Marsman et al. (2018)), underlying multivariate normality (Goodman (1981), Becker (1989), Rom and Sarkar (1990), Wang (1987), Wang (1997)), the Ising model of ferromagnetism (Kruis and Maris (2016)), distance-based models (Rooij (2007), Rooij (2009), Rooij and Heiser (2005)), and others. The LMA models fit by the pleLMA package include the log-linear model of independence (as a baseline model), models in the Rasch family of item response theory (IRT) models, 1- and 2-parameter logistic IRT models, flexible generalized partial credit models (GPCM), and the Nominal response model. The importance of recognizing the different starting points that lead to LMA models for observed data is that the same model can be interpreted as arising from different underlying processes. In this package we take an item response theory approach; however, it is important to note that this is not the only use for these models. For more details, see Anderson, Kateri, and Moustaki (2021) as well as the references therein.

This document starts with a brief description of the models and assumptions often made about the underlying process. In the subsequent sections, the algorithm is described followed by a detailed illustration of how to use the package. Two data sets are included with the package and are used in the examples presented here. The “dass” data set consists of responses by a random sample of 1000 individuals to 42 four-category items designed to measure three different constructs. The “vocab” data set consists of responses to 10 dichotomous vocabulary items made by 1309 individuals. In the final section, other uses of the functions in the package are sketched and plans for future additions to the package are described. An appendix is also included that lists and describes all output from fitting models.

Log-multiplicative Association Models

Both \(i\) and \(k\) denote variables, and \(j_i\) and \(\ell_k\) denote categories of variables \(i\) and \(k\), respectively; to keep the notation simpler, the subscripts on categories are suppressed. The index \(m=1,\ldots, M\) is used to denote latent dimensions or traits. When describing the algorithm, \(n\) is used to index subjects (individuals, cases, etc.), where \(n=1,\ldots, N\); the index \(n\) is suppressed in the presentation of the model. Let \(\mathbf{Y}\) be an \((I\times 1)\) vector of random categorical variables and \(\mathbf{y}= (y_1, \ldots, y_I)'\) be its realization where \(y_i=j\). The most general LMA model for the probability that \(\mathbf{Y}=\mathbf{y}\) is \[\log (P(\mathbf{Y}=\mathbf{y})) = \lambda + \sum_{i=1}^I \lambda_{ij} + \sum_i \sum_{k>i} \sum_m \sum_{m'\ge m} \sigma_{mm'}\nu_{ijm}\nu_{k\ell m'}, \] where \(\lambda\) ensures that the probabilities sum to 1, \(\lambda_{ij}\) is the marginal effect parameter for category \(j\) of variable \(i\), \(\sigma_{mm'}\) is the association parameter for dimensions \(m\) and \(m'\), and \(\nu_{ijm}\) and \(\nu_{k\ell m'}\) are category scale values for items \(i\) and \(k\) on dimensions \(m\) and \(m'\), respectively. The association parameters measure the strength of the relationships between items, and the category scale values represent the structure of the relationships.

LMA models can be derived as latent variable models from statistical graphical models in which observed discrete variables (i.e., \(\mathbf{y}\)) are related to unobserved (potentially correlated) continuous ones (i.e., \(\mathbf{\theta}= \{\theta_m\}\)). The assumptions required to yield the general LMA model given above are that

  1. \(\mathbf{Y}\) follows a multinomial distribution,
  2. The categorical variables are independent conditional on the latent variables,
  3. The latent variables follow a homogeneous conditional multivariate normal distribution; that is, \(\mathbf{\Theta}|\mathbf{y} \sim MVN(\mathbf{\mu_y}, \mathbf{\Sigma}).\)

There are no latent variables in the LMA models themselves, but the parameters of the distribution of the latent variables are equal to, or are functions of, the parameters of the LMA model. The elements of the conditional covariance matrix \(\mathbf{\Sigma}\) are the \(\sigma_{mm'}\) parameters in the LMA model. The conditional means equal \[E(\theta_m|\mathbf{y}) = \sigma_{mm} \left(\sum_i \nu_{ijm}\right) + \sum_{m'\ne m}\sigma_{mm'} \left(\sum_i \nu_{ijm'}\right).\]

The LMA model above is very general and represents the case where each categorical variable is directly related to each of the latent variables and all latent variables are correlated. This model can be fit with sufficient identification constraints, but the current version of the pleLMA package only fits models where each categorical variable is directly related to one and only one latent variable. This is not a limitation of pseudo-likelihood estimation, but of the current package. The location identification constraints used in the package are \(\sum_j\lambda_{ij}= 0\) and \(\sum_j \nu_{ijm}= 0\). Scaling identification constraints are also required, but these differ depending on the specific LMA model that is fit to the data; the scaling identification constraints are given for each case below.

Relationship with Item Response Theory

Different IRT models can be fit by placing restrictions on the \(\nu_{ijm}\) parameters. For models in the Rasch (‘rasch’) family, the restriction is \[\nu_{ijm} = x_j,\] where the \(x_j\)s are typically equally spaced integers (e.g., 0, 1, 2, 3) and are the same for all items. The generalized partial credit model (‘gpcm’) places fewer restrictions on the \(\nu_{ijm}\) by allowing different weights for items and dimensions; namely, \[\nu_{ijm} = a_{im}x_j.\] The ‘nominal’ model places no restrictions on the category scale values \(\nu_{ijm}\).

As a default, the package sets the \(\nu_{ijm}\) to equally spaced numbers where \(\sum_{j}\nu_{ijm}=0\) and \(\sum_j\nu_{ijm}^2=1\). These are the starting values when fitting the Nominal model and are the fixed \(x_j\)’s for the Rasch model and the GPCM. For both the Rasch model and the GPCM, the \(x_j\)’s are typically set to equally spaced numbers; however, in the LMA framework, the \(x_j\)’s need not be equally spaced nor the same over items. In other words, the ‘pleLMA’ package allows for flexible category scaling, and the user can set the \(x_j\) to whatever values they want.

The Pseudo-likelihood Algorithm

Important for the pseudo-likelihood algorithm are the conditional distributions of the probability of a response on one item given the values on all the others. The algorithm maximizes the product of the (log) likelihoods for all of the conditionals, which can be done by using maximum likelihood estimation of the conditional distributions. The conditional models that ‘pleLMA’ uses are \[ P(Y_{in}=j|\mathbf{y}_{-i,n}) = \frac{\exp (\lambda_{ij} + \nu_{ijm} \sum_{k\ne i}\sum_{m'} \sigma_{mm'}\nu_{k\ell(n) m'})} { \sum_h \exp(\lambda_{ih} + \nu_{ihm} \sum_{k\ne i}\sum_{m'} \sigma_{mm'}\nu_{k\ell(n) m'})} \hspace{1in} \\ = \frac{\exp (\lambda_{ij} + \nu_{ijm}\tilde{\theta}_{-i,mn})} { \sum_h \exp(\lambda_{ih} + \nu_{ihm} \tilde{\theta}_{-i,mn})}\qquad\qquad (1)\\ = \frac{\exp (\lambda_{ij} + \sum_{m'}\sigma_{mm'}\ddot{\theta}_{-i,jm'n})} { \sum_h \exp(\lambda_{ih} + \sum_{m'}\sigma_{mm'}\ddot{\theta}_{-i,hm'n})}, \qquad (2)\] where \(n\) indicates a specific individual (subject, case, respondent, etc.), \(\mathbf{y}_{-i,n}\) contains the responses by person \(n\) to all items except item \(i\), the subscript \(\ell(n)\) indicates that person \(n\) selected category \(\ell\) of item \(k\), and the predictor variables \(\tilde{\theta}_{-i,mn}\) and \(\ddot{\theta}_{-i,jm'n}\) are functions of the parameters representing the interactions.

The predictor \(\tilde{\theta}_{-i,mn}\) in (1) is a weighted sum of person \(n\)’s category scale values over the items \(k\ne i\), \[\tilde{\theta}_{-i,mn} = \sum_{k\ne i} \sum_{m'} \sigma_{mm'}\nu_{k\ell(n)m'}.\] Fitting the conditional multinomial logistic regression model (1) with \(\tilde{\theta}_{-i,mn}\) as the predictor yields estimates of the \(\lambda_{ij}\) and \(\nu_{ijm}\) parameters. We refer to these as “item regressions” because a model is fit to each item.

In (2), the predictors are defined as \[\ddot{\theta}_{-i,jm'n} = \nu_{ijm}\sum_{k\ne i} \nu_{k\ell(n)m'}\] for each \(m'=1,\ldots, M\). The predictor \(\ddot{\theta}_{-i,jm'n}\) depends not only on the individual, but also on the category \(j\) of item \(i\) in model (2); in particular, the sum is multiplied by \(\nu_{ijm}\). Using \(\ddot{\theta}_{-i,jm'n}\) yields estimates of the \(\lambda_{ij}\)s and all of the \(\sigma_{mm'}\)s. The \(\sigma_{mm'}\) parameters are restricted to be equal over the equations for different items, and this restriction is built into the algorithm by stacking all of the data (i.e., “stacked regressions”).

The algorithm is modular and has two major components.

  1. Uses \(\tilde{\theta}_{-i,mn}\) in (1) to obtain estimates of the \(\lambda_{ij}\) and \(\nu_{ijm}\) (nominal model) or \(a_{im}\) (GPCM) parameters, and

  2. Uses \(\ddot{\theta}_{-i,jm'n}\) in (2) to obtain estimates of \(\lambda_{ij}\) and the \(\sigma_{mm'}\) parameters.

The two components are combined to estimate LMA models corresponding to multidimensional GPCM or nominal models.

All algorithms work with a Master data set that is a vertical concatenation of the data for each category of each item and for each individual (i.e., a stacked data set). The number of rows equals the number of cases \(\times\) the number of items \(\times\) the number of categories per item (e.g., for 250 cases, 9 items, and 4 categories per item, the Master data set has \(250 \times 9 \times 4 = 9{,}000\) rows). Model (1) is fit to a subset of the Master data set for a specific item (“item data” for the item regressions), and model (2) is fit to the entire Master data set (“stacked data” for the stacked regressions). The Master data set is properly formatted for input to ‘mlogit’, which also means that the subsets are properly formatted.

Algorithm I: Estimation of \(\lambda_{ij}\) and \(\nu_{ijm}\) (or \(a_{im}\))

  1. Update the category scale values for each item \(i= 1,\ldots, I\):
    1. Create the item data for item \(i\) and compute the weighted rest-scores \(\tilde{\theta}_{-i,mn}\) using the current values of the parameters.
    2. Update the \(\nu_{ijm}\)s by fitting (1) to the item data for item \(i\) with \(\tilde{\theta}_{-i,mn}\) as the predictor variable.
    3. Save the log likelihood and the updated \(\nu_{ijm}\)s in the log file and the Master data set.
    4. Repeat steps 1 through 3 until all items’ category scale values have been updated.
  2. Check convergence:
    1. If the algorithm has not converged, go back to step 1.
    2. If the algorithm has converged, compute and save the results.

Algorithm I requires values for the \(\sigma_{mm'}\)s. For uni-dimensional models, we can set \(\sigma_{11}=1\); for multidimensional models, we need to input a matrix of \(\sigma_{mm'}\)s. The matrix could be based on prior knowledge or obtained by running Algorithm II. By default, the ‘pleLMA’ package sets the starting values for the matrix of \(\sigma_{mm'}\)s equal to an identity matrix.

To obtain the \(a_{im}\) parameters of the GPCM requires a slight change in the computation of the predictor variables; namely, \[\nu_{ijm}\tilde{\theta}_{-i,mn} = \nu_{ijm}\sum_{k\ne i} \sum_{m'} \sigma_{mm'}\nu_{k\ell(n)m'} \\ \qquad\qquad= a_{im} x_j\sum_{k\ne i} \sum_{m'} \sigma_{mm'}\nu_{k\ell(n)m'} \\ \qquad\qquad = a_{im} (x_j\sum_{k\ne i} \sum_{m'} \sigma_{mm'}\nu_{k\ell(n)m'}) \\ \qquad\qquad = a_{im}\grave{\theta}_{-i,jmn}.\]

The predictor variable is \(\grave{\theta}_{-i,jmn}\), and its coefficient is \(a_{im}\).

Algorithm II: Estimation of the \(\lambda_{ij}\)s and \(\sigma_{mm'}\)s

  1. Compute and add the \(\ddot{\theta}_{-i,jm'n}\) predictors to the stacked data.
  2. Fit a single discrete choice model to the stacked data.
  3. Save results.

By fitting (2) to the stacked data, the equality restrictions on the \(\sigma_{mm'}\) parameters over the items are imposed.

Algorithm III: Estimation of \(\lambda_{ij}\), \(\nu_{ijm}\) (or \(a_{im}\)), and \(\sigma_{mm'}\)

  1. Run Algorithm I to get estimates of the \(\nu_{ijm}\)s (or \(a_{im}\)s) using starting values for the \(\nu_{k\ell m}\)s and \(\sigma_{mm'}\)s.
  2. Run Algorithm II to get estimates of the \(\sigma_{mm'}\)s using the current values of the \(\nu_{ijm}\)s.
  3. Impose the scaling constraint.
  4. Run Algorithm I to get estimates of the \(\nu_{ijm}\)s (or \(a_{im}\)s) using the current values of the \(\nu_{k\ell m}\)s and \(\sigma_{mm'}\)s.
  5. Check convergence by comparing the maximum absolute difference between the items’ log likelihoods from the last two iterations to the tolerance.
    1. If the algorithm has not converged, repeat steps 2 through 4.
    2. If the algorithm has converged, compute statistics and save the output.

When combining Algorithms I and II, a new step is added to Algorithm III: imposing a scaling identification constraint. This constraint is required for the joint distribution (i.e., the LMA model); without it, Algorithm III will not converge, because the scale values could become very large and the association parameters very small while their products remain the same. To impose the scaling identification constraint, the conditional covariance matrix is transformed into a conditional correlation matrix; that is, with \(c_m = 1/\sigma_{mm}\), \(\sigma_{mm}^{*}=\sigma_{mm} \times c_m = 1\) and \(\sigma_{mm'}^{*}=\sigma_{mm'} \times \sqrt{c_m c_{m'}}\). The scale values also need to be adjusted: \(\nu_{ijm}^{*} = \nu_{ijm}/\sqrt{c_m}\). This method of imposing the scaling identification constraint differs from Paek and Anderson (2017), who used an option in SAS PROC MDC that allows parameters to be fixed to particular values.

The order of steps 2 and 3 in Algorithm III is not of great importance and is more a matter of convenience. In Algorithm III, we start by updating the estimates of the \(\nu_{ijm}\) parameters, which can put the algorithm in a good starting place (i.e., guard against singular estimates of \(\mathbf{\Sigma}\) in the first iteration). At convergence, the value of the maximum log pseudo-likelihood from step 2 of Algorithm III (i.e., ‘mlpl.phi’) equals the maximum from the stacked regression, and it also equals the sum over items of the maximum log likelihoods from step 4 (i.e., ‘mlpl.item’). These are the maxima of the log of the pseudo-likelihood function and should be equal. Also, at convergence, \(\sigma_{mm} = 1\) within rounding error.

Different models use different algorithms. The models and the required algorithms are shown in the following table.

Dimensions  Model         Algorithm
0           Independence  II
1           Rasch family  II
1           GPCM          I
1           Nominal       I
> 1         Rasch family  II
> 1         GPCM          III
> 1         Nominal       III

For dichotomous items, the 1pl model is just a Rasch model, and the 2pl model is the same as a uni-dimensional GPCM or Nominal model. Also, for \(x_j=1,2, \ldots, J\) and \(M=1\), the uniform association model for two-way tables is a special case of the Rasch model. The multidimensional Rasch models can be thought of as generalizations of the uniform association model to higher-way tables (Anderson, Kateri, and Moustaki (2021)).

Algorithm II was proposed by Anderson, Li, and Vermunt (2007) for models in the Rasch family, and Algorithms I and III for the Nominal model were proposed and studied by Paek (2016) (Paek and Anderson (2017)). Algorithms I and III for the GPCM and the adaptation of Algorithm II for the independence model are (as far as I know) novel here. Using relatively small data sets (simulated and data from various studies), the parameter estimates from MLE and PLE for LMA models are nearly identical, with correlations \(r\ge .98\).

The Package

The ‘pleLMA’ package uses base R for data manipulation, ‘stats’ for specifying formulas, and ‘graphics’ for plotting results. The current package uses the ‘mlogit’ package (Croissant (2020)) to fit the conditional multinomial models (i.e., discrete choice models) to the data. Given the use of base R, ‘stats’, and ‘graphics’, we expect the package to be forward compatible with future releases of R.

The function ‘ple.lma’ is the main wrapper function that takes as input the data and model specifications. This function calls three functions that perform the following tasks:

  1. Check for errors in the data and model specification using the function ‘error.check’
  2. Set up the problem using the function ‘set.up’
  3. Fit the model to data by calling either ‘fit.independence’, ‘fit.rasch’, ‘fit.gpcm’ or ‘fit.nominal’

The functions in steps 1, 2 and 3 can be run outside of the ‘ple.lma’ function. Functions that are called within the model fitting functions are also available, but these typically would not be run independently.

Many objects and a great deal of information are produced by ‘ple.lma’; a table with a full list of these objects along with their descriptions is provided in the appendix. Auxiliary or utility functions are provided to aid in examining and saving the results that are most likely of interest. Most of the functions in the table below are run after a model has been fit to the data. There are two exceptions, which are used internally to determine convergence of the ple algorithm but can also be used to examine convergence on an item-by-parameter basis.

Auxiliary Function  Description
lma.summary         Produces a summary of results
convergence.stats   Statistics used to assess convergence of the Nominal model
convergenceGPCM     Statistics used to assess convergence of the GPCM
iterationPlot       Plots estimated parameters by iteration for the GPCM and Nominal models
scalingPlot         Graphs estimated scale values by integers for the Nominal model
reScaleItem         Changes the scaling identification constraint by putting it on the category scale values (Nominal model)
theta.estimates     Computes individuals’ estimated values on the latent traits

The use of all of these functions is illustrated below.

Set Up

Install and Load Package
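
The package is installed from CRAN and loaded in the usual way:

install.packages("pleLMA")
library(pleLMA)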

The Data

Two data sets are included with the package: “vocab”, which has responses by 1,309 individuals to 10 dichotomous items, and “dass”, which has responses by 1,000 individuals to 42 four-category items. The vocab data frame contains responses to vocabulary items from the 2018 General Social Survey, retrieved July 2019 from https://gss.norc.org. For dichotomous data, the GPCM and Nominal models correspond to a two parameter logistic model, and these data are used to illustrate the equivalency.

The larger “dass” data set, which is the one primarily used in this document (retrieved July 2020 from https://openpsychometrics.org), consists of responses collected online during the period 2017–2019 to 42 four-category items from 38,776 respondents. Only a random sample of 1,000 is included with the package. The items were presented online to respondents in a random order. The items included in dass come from scales designed to measure depression (d1–d14), anxiety (a1–a13), and stress (s1–s15). To attach the data,
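
data(dass)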

and for more information including the items themselves and the response scale, enter
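
?dass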

The data should be a data frame where the rows are individuals or cases and the columns are the variables or items to be modeled. The rows can be thought of as response patterns or cells of a cross-classification of the categorical variables. Within the data frame, the categories for each variable should run from 1 to the number of categories. In this version of the package, the number of categories must be the same for all variables, and there should be at least one response for each category of each variable (required by “mlogit”). There are no other restrictions on the number of categories.

Basic syntax of ‘ple.lma’

The basic syntax of the wrapper function ‘ple.lma’ is presented below with information about the input to the function. In the following sections, we go through a number of examples showing the usage of ‘ple.lma’ and the auxiliary functions. The function syntax is

ple.lma(inData, model.type, inItemTraitAdj = NULL, inTraitAdj = NULL, tol = NULL, starting.sv = NULL, starting.phi = NULL)

All models require data (inData) as described above and a model type. The possible values for model.type are “independence”, “rasch”, “gpcm”, and “nominal”.

By default, the package sets the \(x_j\) to equally spaced numbers centered at 0 with sum of squares equal to 1. These values also act as the starting values for the Nominal model. In the LMA framework, the \(x_j\)’s for the Rasch and GPCM models need not be equally spaced nor the same over items; in other words, the ‘pleLMA’ package allows for flexible category scaling (i.e., \(x_{ij}\)), and the user can set these numbers to whatever values are desired.

For the Rasch, GPCM and Nominal models, both “inItemTraitAdj” and “inTraitAdj” must be given to complete the minimal model specification. The object “inTraitAdj” is an \((M\times M)\) trait by trait adjacency matrix where a 1 indicates that traits are correlated and a 0 that they are uncorrelated. The object “inItemTraitAdj” is an \((I\times M)\) item by trait adjacency matrix where a 1 indicates that an item is directly related to a latent variable and 0 otherwise. Only one 1 should be in each row of “inItemTraitAdj”.

The remaining objects are optional. The value given for ‘tol’ determines whether the pseudo-likelihood algorithm has converged. The algorithm is said to have converged when \(\mbox{criterion} < \mbox{tol}\), where the criterion equals the maximum over items of the absolute values of the differences between the items’ log likelihoods from the last two iterations. The default is \(\mbox{tol} =1e-06\), which is fairly stringent. Since the independence and Rasch models are fit only once, they do not require a tolerance value for the pseudo-likelihood algorithm.

The user can specify the starting values of the scale values for the Nominal model or the fixed \(x_j\)s for the GPCM and Rasch models. This is done using “starting.sv”, which is an item by \(\nu_{ijm}\)s (or \(x_j\)s) matrix. The default values are equally spaced, sum to zero over categories, and have a sum of squares equal to 1.

The last option is to input a matrix of starting values for the \(\sigma_{mm'}\)s; within the package, the \(\sigma_{mm'}\)s are referred to as “phi” parameters, which is the terminology used in the RC(M) association and LMA literature. The default is an identity matrix.

Example: \(I=9\) items, \(J=4\) categories, \(N=250\) cases

The dass data used in this example consist of a subset of N=250 cases and 3 items from each of the three scales designed to measure depression (d1–d3), anxiety (a1–a3), and stress (s1–s3). The input data frame, inData, is created as follows (which 250 cases are used is arbitrary; the first 250 rows are used here):
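
#--- 3 items from each of the three scales; the first 250 cases are
#--- used here for illustration
items <- c("d1", "d2", "d3", "a1", "a2", "a3", "s1", "s2", "s3")
inData <- dass[1:250, items]
head(inData)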

##   d1 d2 d3 a1 a2 a3 s1 s2 s3
## 1  3  3  3  1  3  1  1  1  2
## 2  2  2  2  1  1  1  2  1  2
## 3  4  2  4  1  1  1  4  4  4
## 4  1  4  1  2  1  1  3  2  2
## 5  2  4  3  1  1  1  3  3  3
## 6  3  3  2  4  3  1  4  4  4

Uni-dimensional Models: \(M=1\)

Uni-dimensional models are those where there is only one latent trait (i.e., \(M=1\)). In graphical modeling terms, each observed categorical variable is directly related to the continuous (latent) variable, and the categorical variables are independent conditional on the continuous variable.

Input

The minimal commands for each type of model are given below; the object names assigned to the fits (e.g., ‘ind’, ‘r1’, ‘g1’, ‘n1’) are arbitrary. Since there are no interactions in the independence log-linear model, only 2 objects are required; namely,
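
ind <- ple.lma(inData, model.type = "independence")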

## [1] "No errors detected in the input"
## Basic set up is complete

For all models, two messages are printed to the console: “No errors detected in the input” and “Basic set up is complete”. The first step in ‘ple.lma’ is to check the input for 11 possible errors. If an error is detected, the function will stop and issue an error message stating the problem. If no errors are detected, the function continues on to set up the data and objects needed by all models. Subsequently, the pseudo-likelihood algorithm begins.

For models other than independence, the trait \(\times\) trait adjacency matrix for \(M=1\) is simply a \((1\times 1)\) matrix containing a one; that is,
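
inTraitAdj <- matrix(1, nrow = 1, ncol = 1)
inTraitAdj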

##      [,1]
## [1,]    1

The item \(\times\) trait adjacency matrix is an \((I\times M)\) matrix, which for simple uni-dimensional models is a vector of ones,
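
inItemTraitAdj <- matrix(1, nrow = 9, ncol = 1)
inItemTraitAdj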

##       [,1]
##  [1,]    1
##  [2,]    1
##  [3,]    1
##  [4,]    1
##  [5,]    1
##  [6,]    1
##  [7,]    1
##  [8,]    1
##  [9,]    1

For models in the Rasch family, we simply add the adjacency matrices and change the model.type,
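
r1 <- ple.lma(inData, model.type = "rasch", inItemTraitAdj, inTraitAdj)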

## [1] "No errors detected in the input"
## Basic set up is complete

The independence log-linear model and models in the Rasch family only involve iterations within the ‘mlogit’ package. The tolerance and convergence information reported for these models comes from the ‘mlogit’ fit of the stacked regression.

The GPCM and Nominal models involve iteratively fitting discrete choice models (i.e., conditional multinomial logistic regression models). In addition to messages about errors and set up, information is printed to the console about the progress of the algorithm. For the GPCM, the minimal input is
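
g1 <- ple.lma(inData, model.type = "gpcm", inItemTraitAdj, inTraitAdj)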

## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "3.6989447657316 > 1e-06"
## [1] "1.08080901620906 > 1e-06"
## [1] "0.487360982577798 > 1e-06"
## [1] "0.0911141013642691 > 1e-06"
## [1] "0.0120232347879323 > 1e-06"
## [1] "0.00599192655766956 > 1e-06"
## [1] "0.00162590365238202 > 1e-06"
## [1] "0.000135111864096871 > 1e-06"
## [1] "6.53300401722845e-05 > 1e-06"
## [1] "2.59365041301862e-05 > 1e-06"
## [1] "3.66155552455893e-06 > 1e-06"
## [1] "The Alogithm has converged: 7.20407683729718e-07 < 1e-06"

For each iteration of the pseudo-likelihood algorithm, both the convergence criterion (i.e., the maximum absolute difference between items’ log likelihoods from the item regressions from the current and previous iterations) and the tolerance are printed to the console. In this case, the criterion decreases until it is less than the tolerance (default 1e-06).

The minimal input for the nominal model is
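
n1 <- ple.lma(inData, model.type = "nominal", inItemTraitAdj, inTraitAdj)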

## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "236.649905980524 > 1e-06"
## [1] "2.71037044210016 > 1e-06"
## [1] "0.840033082481511 > 1e-06"
## [1] "0.287427083080047 > 1e-06"
## [1] "0.0515301509960295 > 1e-06"
## [1] "0.00793554244540928 > 1e-06"
## [1] "0.0019453318297451 > 1e-06"
## [1] "0.000510242479663248 > 1e-06"
## [1] "5.78020893158282e-05 > 1e-06"
## [1] "3.59942290515392e-05 > 1e-06"
## [1] "1.67380641187265e-05 > 1e-06"
## [1] "2.89701125666397e-06 > 1e-06"
## [1] "Alogithm has converged: 3.44115989037164e-07 < 1e-06"

The option “starting.sv” is an \((I \times J)\) matrix of starting scale values (i.e., the \(\nu_{ijm}\)s) for Nominal models and of the fixed category scores \(x_j\) (or \(x_{ij}\)) for the Rasch and GPCM models. If the user wants to use values other than the defaults, they can be input using “starting.sv”. For example, instead of equally spaced, centered, and scaled \(x_j\), unequally spaced scores can be used (the particular scores below are only illustrative):
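
#--- hypothetical unequally spaced scores, the same for all 9 items
xj <- matrix(c(0, 1, 2, 5), nrow = 9, ncol = 4, byrow = TRUE)
g1.alt <- ple.lma(inData, model.type = "gpcm", inItemTraitAdj, inTraitAdj,
                  starting.sv = xj)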

## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "19.5691018607401 > 1e-06"
## [1] "16.3283330924081 > 1e-06"
## [1] "2.50604276062921 > 1e-06"
## [1] "0.465396650323498 > 1e-06"
## [1] "0.18624435662133 > 1e-06"
## [1] "0.0270409368876585 > 1e-06"
## [1] "0.0105620122227492 > 1e-06"
## [1] "0.00420046499175442 > 1e-06"
## [1] "0.000631235661728624 > 1e-06"
## [1] "0.00012620415134279 > 1e-06"
## [1] "5.83316945608203e-05 > 1e-06"
## [1] "1.27844934922905e-05 > 1e-06"
## [1] "1.80869852783871e-06 > 1e-06"
## [1] "The Alogithm has converged: 7.69622545249149e-07 < 1e-06"

Full discussion of the available output is given later, but for now, to determine which model is better, we take a quick look at the values of the maximum of the log pseudo-likelihood function, which equal -2427.440058 for the equally spaced scores and -2537.6872952 for the unequally spaced scores. Both models have the same number of estimated parameters, but the maximum of the log of the pseudo-likelihood (MLPL) is smaller for the model with the alternative scores; therefore, the original model with equally spaced scores fits the data better. If the scores had been (0, 1, 2, 3), the model fit would be identical to that with the default scores set by the ‘ple.lma’ function; the only difference would be in the \(\hat{a}_{im}\)s. Alternative scores \(x_j\) can also be input for models in the Rasch family.

Using ‘starting.sv’ with the Nominal model sets the starting values for the \(\nu_{ijm}\) parameters. For all models, the starting values for \(\mathbf{\Sigma}\) can also be input. For example, the Nominal model can be re-fit in fewer iterations if we start it at the parameter estimates from a previous run: the scale values are in ‘estimates’ (the “nu” columns) and \(\sigma_{11}\) is in ‘Phi.mat’. A minimal sketch:
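
n1$estimates
sv <- n1$estimates[, 6:9]   # the nu1-nu4 columns hold the scale values
n1.b <- ple.lma(inData, model.type = "nominal", inItemTraitAdj, inTraitAdj,
                starting.sv = sv, starting.phi = n1$Phi.mat)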

##      loglike    lambda1   lambda2     lambda3     lambda4        nu1
## d1 -271.8619 -0.3072018 0.5729386  0.20220946 -0.46794626 -0.6244720
## d2 -278.1154 -0.6993157 0.3928989  0.32443811 -0.01802130 -0.6136364
## d3 -269.7005 -0.4754174 0.2657661  0.09164880  0.11800251 -0.6412970
## a1 -299.0364  0.3177529 0.3023269 -0.25703601 -0.36304379 -0.2799762
## a2 -238.3700  1.1304281 0.9194269 -0.04690862 -2.00294635 -0.6211805
## a3 -265.5721  0.6755939 0.4940930 -0.32983554 -0.83985144 -0.3996866
## s1 -262.8586 -1.8326788 0.8193626  0.53566601  0.47765013 -0.8423173
## s2 -267.1777 -1.1235991 0.4818343  0.44781800  0.19394681 -0.7782505
## s3 -247.3727 -0.9227059 0.6880819  0.14336499  0.09125901 -0.9037234
##            nu2        nu3       nu4
## d1 -0.07818989  0.1709838 0.5316781
## d2 -0.12937578  0.2060691 0.5369431
## d3 -0.06718694  0.1845619 0.5239221
## a1 -0.06605658 -0.0189421 0.3649749
## a2 -0.21095746 -0.0179575 0.8500955
## a3 -0.05546516  0.1073712 0.3477806
## s1 -0.09575696  0.3142374 0.6238368
## s2 -0.13235815  0.3038428 0.6067659
## s3 -0.28199663  0.4107563 0.7749637
## [1] "No errors detected in the input"
## Basic set up is complete
## [1] "238.369997006672 > 1e-06"
## [1] "Alogithm has converged: 2.50871607931913e-08 < 1e-06"

The algorithm converged after 1 full iteration.

Before fitting the multidimensional models we look at the output from fitting the models.

Output

The ‘pleLMA’ package produces a large number of objects, some of which are NULL. Whether an object is NULL or not depends on the particular LMA model fit to the data (i.e., whether Algorithm I, II, or III was used). The appendix contains a table of all 28 objects in the output along with a brief description of each.

Typically not all of the objects in the output need to be examined. The function ‘lma.summary( )’ organizes a summary of the output that is likely to be of most importance and most likely to be saved. It yields a list with five elements:

  1. “report” – a report of information about the data, convergence, and fit statistics,
  2. “TraitByTrait” – the trait \(\times\) trait adjacency matrix,
  3. “ItemByTrait” – the item \(\times\) trait adjacency matrix,
  4. “estimates” – the estimated \(\lambda_{ij}\) and \(\nu_{ijm}\) or \(a_{im}\) parameters (and the \(x_j\)’s for the gpcm and rasch models), and
  5. “phi” – the estimated \(\sigma_{mm'}\) parameters.

For example, you can get a summary by entering
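
lma.summary(n1)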

You can also request specific sections as follows. For the basic summary report,
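
lma.summary(n1)$report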

##       [,1]                                                       
##  [1,]                                                            
##  [2,] =========================================================  
##  [3,] Pseudo-likelihood Estimation of nominal model              
##  [4,] =========================================================  
##  [5,] Report Date:  2021-10-01 12:03:28                          
##  [6,]                                                            
##  [7,] Data Information:                                          
##  [8,]    Number of cases/individuals  250                        
##  [9,]    Number of items  9                                      
## [10,]    Number of categories per item 4                         
## [11,]    Number of dimensions:  1                                
## [12,]                                                            
## [13,] Model Specification:                                       
## [14,]   Number of unique parameters 54                           
## [15,]   Number of unique marginal effects:  27                   
## [16,]   Number of unique category parameters (nu's or a's):  27  
## [17,]   Number of unique association parameters (phis): 0        
## [18,]                                                            
## [19,] Convergence Information:                                   
## [20,]   Number of iterations:  13                                
## [21,]   Tolerence set tol 1e-06                                  
## [22,]   Criterion  3.44115989037164e-07                          
## [23,]                                                            
## [24,] Model Fit Statistics:                                      
## [25,]   Maximum log pseudo-likelihood function: -2400.06537829016
## [26,]   AIC:   2346.06537829016                                  
## [27,]   BIC:   4501.97186701575                                  
## [28,]

AIC and BIC are included in the summary and may differ from those in the output from ‘mlogit’. The ‘pleLMA’ package computes these as \[\mbox{AIC} = -2*\mbox{mlpl} + p\] \[\mbox{BIC} = -2*\mbox{mlpl} + p*\log(N),\] where \(\mbox{mlpl}\) is the maximum of the log of the pseudo-likelihood function, \(p\) is the number of parameters, and \(N\) is the sample size (i.e., the number of cases or individuals). Models with smaller values are better. Note that AIC tends to select more complex models and BIC tends to select simpler models; deciding on a model should not rest solely on global fit statistics.

A complete record of the model specification also requires the adjacency matrices:
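
lma.summary(n1)$TraitByTrait
lma.summary(n1)$ItemByTrait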

##      [,1]
## [1,]    1
##       [,1]
##  [1,]    1
##  [2,]    1
##  [3,]    1
##  [4,]    1
##  [5,]    1
##  [6,]    1
##  [7,]    1
##  [8,]    1
##  [9,]    1

The last two objects contain the parameter estimates,
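
lma.summary(n1)$estimates
lma.summary(n1)$phi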

##      loglike    lambda1   lambda2     lambda3     lambda4        nu1
## d1 -271.8619 -0.3072018 0.5729386  0.20220946 -0.46794626 -0.6244720
## d2 -278.1154 -0.6993157 0.3928989  0.32443811 -0.01802130 -0.6136364
## d3 -269.7005 -0.4754174 0.2657661  0.09164880  0.11800251 -0.6412970
## a1 -299.0364  0.3177529 0.3023269 -0.25703601 -0.36304379 -0.2799762
## a2 -238.3700  1.1304281 0.9194269 -0.04690862 -2.00294635 -0.6211805
## a3 -265.5721  0.6755939 0.4940930 -0.32983554 -0.83985144 -0.3996866
## s1 -262.8586 -1.8326788 0.8193626  0.53566601  0.47765013 -0.8423173
## s2 -267.1777 -1.1235991 0.4818343  0.44781800  0.19394681 -0.7782505
## s3 -247.3727 -0.9227059 0.6880819  0.14336499  0.09125901 -0.9037234
##            nu2        nu3       nu4
## d1 -0.07818989  0.1709838 0.5316781
## d2 -0.12937578  0.2060691 0.5369431
## d3 -0.06718694  0.1845619 0.5239221
## a1 -0.06605658 -0.0189421 0.3649749
## a2 -0.21095746 -0.0179575 0.8500955
## a3 -0.05546516  0.1073712 0.3477806
## s1 -0.09575696  0.3142374 0.6238368
## s2 -0.13235815  0.3038428 0.6067659
## s3 -0.28199663  0.4107563 0.7749637
##      [,1]
## [1,]    1

The rows of ‘estimates’ correspond to the items and the columns to the parameter estimates. The “lambda”s are the marginal effects for categories 1 through 4, and the “nu”s are the category scale values. Note that only three \(\lambda_{ij}\)s and three \(\nu_{ijm}\)s were estimated per item; the fourth value is found from the identification constraint on the locations of the parameters, that is, \(\hat{\lambda}_{i1} = - \sum_{j=2}^4\hat{\lambda}_{ij}\) and \(\hat{\nu}_{i1} = - \sum_{j=2}^4 \hat{\nu}_{ij}\). For the GPCM and Nominal models, the first column of ‘estimates’ contains the maximum log-likelihood values from using MLE to fit each item’s regression. The sum of these log-likelihoods equals ‘mlpl.item’.

For the GPCM (and models in the Rasch family), ‘estimates’ includes the \(x_j\)’s used to fit the model to data. For example,
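
g1$estimates   # the GPCM fit from above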

##      loglike    lambda1   lambda2     lambda3      lambda4         a         x1
## d1 -273.5212 -0.1366067 0.4895941  0.19915641 -0.552143823 0.8754073 -0.6708204
## d2 -279.1964 -0.5731578 0.2981472  0.33436124 -0.059350596 0.8728514 -0.6708204
## d3 -273.1440 -0.2915822 0.1145274  0.09507062  0.081984131 0.8497784 -0.6708204
## a1 -302.1068  0.2861255 0.2509015 -0.27432783 -0.262699190 0.4242113 -0.6708204
## a2 -243.3139  0.9285286 0.7411034 -0.37057055 -1.299061523 0.8250458 -0.6708204
## a3 -268.1178  0.7339107 0.4886952 -0.33246800 -0.890137943 0.5667876 -0.6708204
## s1 -266.8711 -1.1926595 0.5551031  0.40194018  0.235616207 0.9920283 -0.6708204
## s2 -269.6688 -0.8145340 0.3257053  0.43824213  0.050586553 1.0340729 -0.6708204
## s3 -251.5000 -0.8155957 0.6201390  0.19691960 -0.001462858 1.2499226 -0.6708204
##            x2        x3        x4
## d1 -0.2236068 0.2236068 0.6708204
## d2 -0.2236068 0.2236068 0.6708204
## d3 -0.2236068 0.2236068 0.6708204
## a1 -0.2236068 0.2236068 0.6708204
## a2 -0.2236068 0.2236068 0.6708204
## a3 -0.2236068 0.2236068 0.6708204
## s1 -0.2236068 0.2236068 0.6708204
## s2 -0.2236068 0.2236068 0.6708204
## s3 -0.2236068 0.2236068 0.6708204

For more detailed information about the item regressions after convergence, the output from ‘mlogit’ is saved. All of the information that can be extracted from ‘mlogit’ objects is available (e.g., residuals, various fit statistics, etc.). Below are the names and classes of these objects:

Dimensions  Model         item.log  phi.log  item.mlogit  phi.mlogit
0           independence  NULL      NULL     NULL         mlogit
1           rasch         NULL      NULL     NULL         mlogit
1           gpcm          list      NULL     list         NULL
1           nominal       list      NULL     list         NULL
>1          rasch         NULL      NULL     NULL         mlogit
>1          gpcm          list      matrix   list         mlogit
>1          nominal       list      matrix   list         mlogit

Auxiliary Functions

To further examine the convergence of the algorithm, we can look at the log files and the convergence statistics for all of the parameters. For the iterations (log file), we can print the values of the parameters for each iteration. The object “n1$item.log” holds the iteration history for each item; assuming one list element per item, the history for item 1 is
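
n1$item.log[[1]]   # assuming one list element per item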

Alternatively, the iteration histories can be plotted using the function
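
iterationPlot(n1)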

If you run the above command, you will see that the algorithm gets very close to the final values in about 5 iterations but continues until it meets the more stringent criterion given by “tol”. With these data and this model, using random uniform starting scale values converged in about the same number of iterations, and the correct order of the scale values was found within 5 iterations.

For another view of how well the algorithm converged, we can look at the differences between the values from the last two iterations, which are given for each item’s maximum log-likelihood and for all of the item parameters. The function ‘convergence.stats’ is internal to the ‘fit.nominal’ function and requires input that is not available in the model fit output; to run ‘convergence.stats’, you first have to run ‘set.up’ to re-create the objects that it needs.

The output of ‘convergence.stats’ includes lines such as

$diff.last.LogLike

[1] -1.135732e-07 -1.712294e-09 -3.595750e-07 1.972623e-07 3.441160e-07 8.999615e-08 7.581832e-08 4.029459e-09 1.782126e-07

$diff.last.lam2

[1] -2.899204e-09 -2.412534e-09 -2.610206e-09 -8.204556e-10 2.656844e-09 7.331145e-10 -3.150264e-10 -5.599238e-10 -4.942857e-11

$diff.last.nu4

[1] -9.993916e-09 -9.844411e-09 -9.974334e-09 -2.000938e-09 6.178604e-09 2.790911e-09 5.502777e-09 4.125844e-09 3.617542e-09

There is a “\(\$\)diff.last” component for the items’ maximum log likelihoods (i.e., diff.last.LogLike), the lambda parameters (e.g., diff.last.lam2), and the scale values (e.g., diff.last.nu4). The length of each of these corresponds to the number of items. Even though tol \(=1e-06\), the largest differences for the item parameters are considerably smaller. For the GPCM, the analogous function is ‘convergenceGPCM’.

For the Nominal model, the function ‘scalingPlot’ graphs the scale values by integers and overlays a linear regression line. These plots can be used to determine the proper order of the categories and whether a linear restriction could be imposed on them, as is done in a GPCM. The plots also convey how strongly the items are related to the latent trait: steeper slopes indicate stronger relationships. To produce these plots (the sketch below assumes the fitted model object is the only required argument),
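
scalingPlot(n1)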

To fit the models, we imposed scaling identification constraints, which for the GPCM and the Nominal model were \(\sigma_{mm}= 1\). However, we can change this so that the scaling constraint is put on one item (for each latent variable) and the phis (i.e., the sigmas) are estimated. This teases apart the strength and the structure of the relationships between the items and the latent trait. The function ‘reScaleItem’ rescales the values from the Nominal model. One item (per latent variable) needs to be selected on which the scaling constraint is placed, and this is indicated using the vector ‘anchor’. A minimal sketch of the rescaling using item d1:
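
reScaleItem(n1, anchor = c(1))   # anchor = 1 places the constraint on item d1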

## $sNu
##          [,1]        [,2]        [,3]      [,4]
## d1 -0.7421600 -0.09292556  0.20320737 0.6318782
## d2 -0.7292823 -0.15375795  0.24490490 0.6381354
## d3 -0.7621558 -0.07984900  0.21934439 0.6226604
## a1 -0.3327406 -0.07850561 -0.02251194 0.4337581
## a2 -0.7382482 -0.25071452 -0.02134177 1.0103045
## a3 -0.4750116 -0.06591813  0.12760642 0.4133233
## s1 -1.0010605 -0.11380333  0.37345859 0.7414052
## s2 -0.9249197 -0.15730238  0.36110503 0.7211170
## s3 -1.0740391 -0.33514174  0.48816745 0.9210134
## 
## $sPhi.mat
##          [,1]
## [1,] 0.707996

If the log-multiplicative models are being used for measurement, estimates of the values on the latent variables can be computed using the estimated item category scale values and the conditional variances/covariances (see the formula for \(E(\theta_m|\mathbf{y})\)). The function ‘theta.estimates’ computes these values; a minimal sketch (assuming the arguments are the data and the fitted model object):
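
thetas <- theta.estimates(inData, n1)   # assuming (data, fitted model) arguments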

The rows correspond to individuals and the columns to the values for each latent variable (only 1 column for uni-dimensional models).

Multi-dimensional models, \(M>1\)

Since the items of dass are designed to assess three different constructs, we fit a 3-dimensional model and allow the latent variables to be conditionally correlated (i.e., within response pattern). We only need to change ‘inTraitAdj’ and ‘inItemTraitAdj’ to fit these models. For the trait by trait adjacency matrix,
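
inTraitAdj <- matrix(1, nrow = 3, ncol = 3)
inTraitAdj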

##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
## [3,]    1    1    1

The ones in the off-diagonal positions indicate that the latent variables are to be conditionally correlated. If, for example, we do not want \(\theta_1\) and \(\theta_2\) to be correlated, a 0 should be put in the (1,2) and (2,1) cells of the matrix.

For the Item by Trait adjacency matrix,
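
inItemTraitAdj <- matrix(0, nrow = 9, ncol = 3)
inItemTraitAdj[1:3, 1] <- 1   # depression items d1-d3
inItemTraitAdj[4:6, 2] <- 1   # anxiety items a1-a3
inItemTraitAdj[7:9, 3] <- 1   # stress items s1-s3
inItemTraitAdj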

##       [,1] [,2] [,3]
##  [1,]    1    0    0
##  [2,]    1    0    0
##  [3,]    1    0    0
##  [4,]    0    1    0
##  [5,]    0    1    0
##  [6,]    0    1    0
##  [7,]    0    0    1
##  [8,]    0    0    1
##  [9,]    0    0    1

The commands to fit the models are the same as before (the object names ‘r3’, ‘g3’, and ‘n3’ are again arbitrary):
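
r3 <- ple.lma(inData, model.type = "rasch", inItemTraitAdj, inTraitAdj)
g3 <- ple.lma(inData, model.type = "gpcm", inItemTraitAdj, inTraitAdj)
n3 <- ple.lma(inData, model.type = "nominal", inItemTraitAdj, inTraitAdj)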

## Basic set up is complete
## Basic set up is complete
## Basic set up is complete

The same evaluation and post-fitting functions can be used. The one change is that iteration plots of the phis (and lambdas) can be graphed along with those of the item parameters using

iterationPlot(n3)

Furthermore, the summary report for the three-dimensional model can be obtained as before,
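
lma.summary(n3)$report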

##       [,1]                                                      
##  [1,]                                                           
##  [2,] ========================================================= 
##  [3,] Pseudo-likelihood Estimation of nominal model             
##  [4,] ========================================================= 
##  [5,] Report Date:  2021-10-01 12:04:21                         
##  [6,]                                                           
##  [7,] Data Information:                                         
##  [8,]    Number of cases/individuals  250                       
##  [9,]    Number of items  9                                     
## [10,]    Number of categories per item 4                        
## [11,]    Number of dimensions:  3                               
## [12,]                                                           
## [13,] Model Specification:                                      
## [14,]   Number of unique parameters 57                          
## [15,]   Number of unique marginal effects:  27                  
## [16,]   Number of unique category parameters (nu's or a's):  27 
## [17,]   Number of unique association parameters (phis): 3       
## [18,]                                                           
## [19,] Convergence Information:                                  
## [20,]   Number of iterations:  21                               
## [21,]   Tolerence set tol 1e-06                                 
## [22,]   Criterion  5.85589077672921e-07                         
## [23,]                                                           
## [24,] Model Fit Statistics:                                     
## [25,]   Maximum log pseudo-likelihood function: -2327.6387229916
## [26,]   AIC:   2270.6387229916                                  
## [27,]   BIC:   4340.55417366505                                 
## [28,]

Note that for the multi-dimensional model, the object ‘phi.mlogit’ is no longer NULL for the GPCM and Nominal models. Stacked regressions are required to get estimates of the \(\sigma_{mm'}\) parameters. Unlike the independence and Rasch models, the GPCM and Nominal models iteratively estimate the \(\sigma_{mm'}\) parameters.

The matrix of association parameters (the conditional correlation matrix) is in the object ‘Phi.mat’:
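
n3$Phi.mat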

##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.1754634 0.2977881
## [2,] 0.1754634 1.0000000 0.1934626
## [3,] 0.2977881 0.1934626 1.0000000

Example: 42 items, N=1000, M=3

The same basic set-up is needed regardless of the number of items. For models fit to the full dass data set, we need the following (assuming the columns of dass are ordered d1–d14, a1–a13, s1–s15):
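
inData <- dass
inTraitAdj <- matrix(1, nrow = 3, ncol = 3)
inItemTraitAdj <- matrix(0, nrow = 42, ncol = 3)
inItemTraitAdj[1:14, 1]  <- 1   # depression items d1-d14
inItemTraitAdj[15:27, 2] <- 1   # anxiety items a1-a13
inItemTraitAdj[28:42, 3] <- 1   # stress items s1-s15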

The ‘ple.lma’ and all other functions work in the same manner as shown in our small example.

Computational Time

Larger numbers of items, more categories, more cases, and more complex models all increase the computational time. The table below gives elapsed times in seconds for models fit to the dass data on a PC (Intel i7 CPU, 2.90 GHz, 16.0 GB RAM) using the ple.lma defaults. The default basic linear algebra library in R is not very fast; however, there are alternatives, including OpenBLAS, ATLAS, and the Intel MKL library. The elapsed times reported below were obtained using the Intel MKL BLAS, which is implemented in Microsoft R Open (available free from Microsoft).

Cases (N)  Items (I)  Dimensions (M)  Independence  Rasch  GPCM    Nominal
250        9          1               0.20          0.22   4.20    4.31
250        9          3               0.20          0.27   13.04   13.33
250        42         1               1.01          1.15   16.75   17.44
250        42         3               2.24          2.51   59.11   59.10
1000       9          1               1.08          1.23   17.34   17.94
1000       9          3               1.04          1.31   47.34   47.68
1000       42         1               15.96         16.68  242.21  243.55
1000       42         3               12.75         16.83  435.51  465.47

For most models and conditions, the elapsed times ranged from a fraction of a second to about 8 minutes. The times for the GPCM and Nominal models can be further sped up by taking measures that reduce the number of iterations. One method is to use the estimated ‘Phi.mat’ from the Rasch model as starting values in the GPCM. Subsequently, the parameter estimates from the GPCM (i.e., \(\hat{a}_{im}x_j\) and \(\hat{\phi}_{mm'}\)) can be used as starting values in the Nominal model. A second method to reduce the computational time is to increase the tolerance. Examining the convergence statistics and iteration plots for each parameter shows that the parameter estimates essentially do not change after 5 to 6 iterations; therefore, a larger (less stringent) tolerance is acceptable. If more iterations are needed or desired, the current parameter estimates can be used as starting values. Using smart starting values and setting \(\mbox{tol}=1e-03\), the elapsed time for the \(N=1000\), \(I=42\), and \(M=3\) GPCM went from \(435.51\) to \(289.44\) seconds (i.e., \(7.3\) to \(4.8\) minutes), and the time for the Nominal model dropped from \(465.47\) to \(302.29\) seconds (i.e., \(7.8\) to \(5.0\) minutes).

Example: Dichotomous Items

All of the models (independence, rasch, gpcm, and nominal) can be fit to data where the categorical variables have only two categories. For uni-dimensional models and dichotomous data, the ‘rasch’ model type fits the one parameter logistic model (“1pl”). The one parameter for each item is \(\lambda_{i1}\).

For uni-dimensional GPCM and Nominal models, the LMA corresponds to a two-parameter logistic model (“2pl”). The two parameters for each item are \(\lambda_{i1}\) and \(\nu_{i1}\) (or \(a_{i}\)). To illustrate the equivalency, we use the vocabulary data and fit the 2pl model as a GPCM and as a Nominal model.

data(vocab)
inItemTraitAdj <- matrix(1, nrow=ncol(vocab), ncol=1)
inTraitAdj <- matrix(1, nrow=1, ncol=1)
#--- 2 pl as a gpcm model
  g.2pl <- ple.lma(inData=vocab, model.type="gpcm", inItemTraitAdj, inTraitAdj, tol=1e-04)
## Basic set up is complete
#--- 2 pl as a nominal model
  n.2pl <- ple.lma(inData=vocab, model.type="nominal", inItemTraitAdj, inTraitAdj, tol=1e-04)
## Basic set up is complete

The two models have the same fit (i.e., the mlpl for both models equals -5917.6196), and the estimated parameters are also the same:

g.2pl$estimates
##         loglike    lambda1    lambda2         a         x1        x2
## wordA -622.3013 -0.4787343  0.4787343 0.3127588 -0.7071068 0.7071068
## wordB -360.5954 -0.8639658  0.8639658 0.8740204 -0.7071068 0.7071068
## wordC -636.7369  1.0243899 -1.0243899 0.3076139 -0.7071068 0.7071068
## wordD -176.6813 -1.6742178  1.6742178 0.8598007 -0.7071068 0.7071068
## wordE -538.4564 -0.3814110  0.3814110 0.6742132 -0.7071068 0.7071068
## wordF -596.0831 -0.3351983  0.3351983 0.5315521 -0.7071068 0.7071068
## wordG -791.4167  0.6189210 -0.6189210 0.2288818 -0.7071068 0.7071068
## wordH -808.5806  0.6219073 -0.6219073 0.3400109 -0.7071068 0.7071068
## wordI -707.4387 -0.3105611  0.3105611 0.3006977 -0.7071068 0.7071068
## wordJ -679.3292  1.4262678 -1.4262678 0.7768449 -0.7071068 0.7071068
n.2pl$estimates
##         loglike    lambda1    lambda2        nu1       nu2
## wordA -622.3013 -0.4787343  0.4787343 -0.2211538 0.2211538
## wordB -360.5954 -0.8639658  0.8639658 -0.6180257 0.6180257
## wordC -636.7369  1.0243899 -1.0243899 -0.2175159 0.2175159
## wordD -176.6813 -1.6742178  1.6742178 -0.6079709 0.6079709
## wordE -538.4564 -0.3814110  0.3814110 -0.4767407 0.4767407
## wordF -596.0831 -0.3351983  0.3351983 -0.3758641 0.3758641
## wordG -791.4167  0.6189210 -0.6189210 -0.1618439 0.1618439
## wordH -808.5806  0.6219073 -0.6219073 -0.2404240 0.2404240
## wordI -707.4387 -0.3105611  0.3105611 -0.2126254 0.2126254
## wordJ -679.3292  1.4262678 -1.4262678 -0.5493123 0.5493123

The \(\hat{\lambda}_{ij}\)’s are identical. To show the equivalence of the representations of the interaction, the GPCM category scores \(x_j\) need to be multiplied by the weights \(\hat{a}_{i}\); that is,

g.2pl$estimates[, 4] * g.2pl$estimates[, 5:6]
##               x1        x2
## wordA -0.2211538 0.2211538
## wordB -0.6180257 0.6180257
## wordC -0.2175159 0.2175159
## wordD -0.6079709 0.6079709
## wordE -0.4767407 0.4767407
## wordF -0.3758641 0.3758641
## wordG -0.1618439 0.1618439
## wordH -0.2404240 0.2404240
## wordI -0.2126254 0.2126254
## wordJ -0.5493123 0.5493123

which are equal to the \(\hat{\nu}_{ij}\)’s from the Nominal model.

Other Uses and Future Releases

All of the functions used by ‘pleLMA’ are available as source code. The algorithm is modular and can be “cannibalized” for specific uses or alternative models. For example, in a replication study, the problem can be set up once using the ‘set.up’ function, and then ‘fit.rasch’, ‘fit.gpcm’, or ‘fit.nominal’ can be called repeatedly. In a replication, only the response vector in the Master data frame needs to be changed (i.e., the Master data frame does not have to be re-created), so a loop would go around the function that fits the model. This can be sped up even further by pulling the code out of the functions and keeping only what is absolutely necessary (and using the parameter estimates from the previous model fit). The same strategy can be used to perform jackknife or bootstrap procedures to obtain standard errors for the parameters. Alternatively, functions can be pulled out and modified to allow some items to be fit by, say, a GPCM and others by the Nominal model.

In future versions, options for fitting different models to different items will be added, along with more complex latent structures, multiple methods for estimating standard errors, handling of different numbers of categories per item, and the ability to include collateral information. Even though all of these variations are planned, the current version of the pleLMA package already opens up more widespread use of association models for categorical data.

Appendix

All of the objects potentially created by fitting a model to data are listed and described in the following table. Whether an object is NULL depends on the version of the pseudo-likelihood algorithm that was used, which in turn is determined by the model that was specified.

Object         Description
model.type     model type fit to the data
TraitByTrait   trait \(\times\) trait adjacency matrix
ItemByTrait    item \(\times\) trait adjacency matrix
item.by.trait  vector indicating the trait to which each item is directly related
ItemNames      names of the items used in the data set
PhiNames       names of the association parameters (i.e., \(\sigma_{mm'}\)s) in the stacked data and formula
formula.item   formula for the item regressions
formula.phi    formula for the stacked regressions
npersons       number of individuals or cases
nitems         number of items
ncat           number of categories per item
nless          ncat - 1 = number of non-redundant \(\hat{\lambda}_{ij}\)s and \(\hat{\nu}_{ijm}\)s (or \(\hat{a}_{im}\)s) per item
Maxnphi        maximum number of \(\sigma_{mm'}\)s (i.e., “phi”s) estimated
ntraits        number of unobserved traits
starting.sv    starting category scale values for the Nominal model or fixed scores for the rasch and gpcm models
tol            convergence criterion (default \(=1e-06\))
criterion      maximum absolute difference between items’ log likelihoods on the last 2 iterations
item.log       log file of the \(\hat{\nu}_{ijm}\)s (or \(\hat{a}_{im}\)s) and \(\hat{\lambda}_{ij}\)s
phi.log        log file of the \(\hat{\phi}_{mm'}\)s and \(\hat{\lambda}_{ij}\)s
estimates      item by estimated item parameters and log likelihoods
Phi.mat        estimated \(\mathbf{\Sigma}\) (matrix of association parameters)
item.mlogit    list of ‘mlogit’ output from the item regressions after convergence
phi.mlogit     ‘mlogit’ output for the \(\sigma_{mm'}\)s from the stacked regression after convergence
mlpl.item      value of the maximum of the log pseudo-likelihood function from the item regressions
mlpl.phi       value of the maximum of the log pseudo-likelihood function from the stacked regression
AIC            Akaike information criterion (smaller is better)
BIC            Bayesian information criterion (smaller is better)

References

Anderson, Carolyn J., Maria Kateri, and Irini Moustaki. 2021. “Log-Linear and Log-Multiplicative Association Models for Categorical Data.” Manuscript.

Anderson, Carolyn J., Zhushan Li, and Jeroen J. K. Vermunt. 2007. “Estimation of Models in a Rasch Family for Polytomous Items and Multiple Latent Variables.” Journal of Statistical Software. https://doi.org/10.18637/jss.v020.i06.

Anderson, Carolyn J., Jay V. Verkuilen, and Buddy Peyton. 2010. “Modeling Polytomous Item Responses Using Simultaneously Estimated Multinomial Logistic Regression Models.” Journal of Educational and Behavioral Statistics 35: 422–52. https://doi.org/10.3102/1076998609353117.

Anderson, Carolyn J., and Jeroen J. K. Vermunt. 2000. “Log-Multiplicative Association Models as Latent Variable Models for Nominal and/or Ordinal Data.” Sociological Methodology 30: 81–121. https://doi.org/10.1111/0081-1750.00076.

Anderson, Carolyn J., and Hsiu-Ting Yu. 2007. “Log-Multiplicative Association Models as Item Response Models.” Psychometrika 72: 5–23. https://doi.org/10.1007/s11336-005-1419-2.

Arnold, Barry C., and David Strauss. 1991. “Pseudolikelihood Estimation: Some Examples.” Sankhyā: The Indian Journal of Statistics, Series B 53: 233–43.

Becker, Mark. 1989. “On the Bivariate Normal Distribution and Association Models for Ordinal Categorical Data.” Statistics & Probability Letters 8: 435–40. https://doi.org/10.1016/0167-7152(89)90023-0.

Bouchet-Valat, Milan, Heather Turner, Michael Friendly, Jim Lemon, and Gabor Csardi. 2020. Package “Logmult”. https://github.com/nalimilan/logmult.

Chen, Yunxiao, Xiaoou Li, Jingchen Liu, and Zhiliang Ying. 2018. “Robust Measurement via a Fused Latent Variable and Graphical Item Response Theory Model.” Psychometrika 85: 538–62. https://doi.org/10.1007/s11336-018-9610-4.

Croissant, Yves. 2020. “Estimation of Random Utility Models in R: The mlogit Package.” Journal of Statistical Software 95 (11): 1–41. https://doi.org/10.18637/jss.v095.i11.

Geys, Helena, Geert Molenberghs, and Louise M. Ryan. 1999. “Pseudolikelihood Modeling of Multivariate Outcomes in Developmental Toxicology.” Journal of the American Statistical Association 94: 734–45. https://doi.org/10.2307/2669986.

Goodman, Leo A. 1981. “Association Models and the Bivariate Normal for Contingency Tables With Ordered Categories.” Biometrika 68: 347–55. https://doi.org/10.1093/biomet/68.2.347.

Hessen, David J. 2012. “Fitting and Testing Conditional Multinomial Partial Credit Models.” Psychometrika 77: 693–709. https://doi.org/10.1007/s11336-012-9277-1.

Holland, Paul W. 1990. “The Dutch Identity: A New Tool for the Study of Item Response Models.” Psychometrika 55: 5–18. https://doi.org/10.1007/BF02294739.

Kruis, Joost, and Gunter Maris. 2016. “Three Representations of the Ising Model.” Scientific Reports 6: 1–10. https://doi.org/10.1038/srep34175.

Marsman, M., D. Borsboom, J. Kruis, S. Epskamp, R. van Bork, L. J. Waldorp, H. L.J. van der Maas, and G. Maris. 2018. “An Introduction to Network Psychometrics: Relating Ising Network Models to Item Response Theory Models.” Multivariate Behavioral Research 53: 15–35. https://doi.org/10.1080/00273171.2017.1379379.

Paek, Youngshil. 2016. “Pseudo-Likelihood Estimation of Multidimensional Item Response Theory Model.” PhD thesis, University of Illinois, Urbana-Champaign.

Paek, Youngshil, and Carolyn J. Anderson. 2017. “Pseudo-Likelihood Estimation of Multidimensional Response Models: Polytomous and Dichotomous Items.” In Quantitative Psychology — the 81st Annual Meeting of the Psychometric Society, edited by Andries van der Ark, Marie Wiberg, Steven A. Culpepper, Jeffrey A. Douglas, and Wen-Chung Wang, 21–30. NYC: Springer. https://doi.org/10.1007/978-3-319-56294-0_3.

Rom, Dror, and Sanat K. Sarkar. 1990. “Approximating Probability Integrals of Multivariate Normal Using Association Models.” Journal of Statistical Computation and Simulation 35 (1-2): 109–19. https://doi.org/10.1080/00949659008811237.

Rooij, Mark de. 2007. “The Analysis of Change, Newton’s Law of Gravity and Association Models.” Journal of the Royal Statistical Society: Statistics in Society, Series A 171: 137–57. https://doi.org/10.1111/j.1467-985X.2007.00498.x.

———. 2009. “Ideal Point Discriminant Analysis Revisited with a Special Emphasis on Visualization.” Psychometrika 74: 317–30. https://doi.org/10.1007/S11336-008-9105-9.

Rooij, Mark de, and Willem Heiser. 2005. “Graphical Representations and Odds Ratios in a Distance-Association Model for the Analysis of Cross-Classified Data.” Psychometrika 70: 99–122. https://doi.org/10.1007/s11336-000-0848-1.

Turner, Heather, and David Firth. 2020. Generalized Nonlinear Models in R: An Overview of the Gnm Package. https://cran.r-project.org/package=gnm.

Wang, Yuchung J. 1987. “The Probability Integrals of Bivariate Normal Distributions: A Contingency Table Approach.” Biometrika 74: 185–90. https://doi.org/10.1093/biomet/74.1.185.

———. 1997. “Multivariate Normal Integrals and Contingency Tables with Ordered Categories.” Psychometrika 62: 267–84. https://doi.org/10.1007/BF02295280.