Journal of Applied Measurement

P.O. Box 1283

Maple Grove, MN 55311


Article abstracts for Volumes 1 to 7 are available in PDF format. Just click on the links below.

Abstracts for Volume 1, 2000

Abstracts for Volume 2, 2001

Abstracts for Volume 3, 2002

Abstracts for Volume 4, 2003

Abstracts for Volume 5, 2004

Abstracts for Volume 6, 2005

Abstracts for Volume 7, 2006

Article abstracts for Volumes 8 to 20 are available in HTML format. Just click on the links below.

Abstracts for Volume 8, 2007

Abstracts for Volume 9, 2008

Abstracts for Volume 10, 2009

Abstracts for Volume 11, 2010

Abstracts for Volume 12, 2011

Abstracts for Volume 13, 2012

Abstracts for Volume 14, 2013

Abstracts for Volume 15, 2014

Abstracts for Volume 16, 2015

Abstracts for Volume 17, 2016

Abstracts for Volume 18, 2017

Abstracts for Volume 19, 2018

Abstracts for Volume 20, 2019


Current Volume Article Abstracts


Vol. 21, No. 1, Spring 2020

Rasch's Logistic Model Applied to Growth

Mark H. Stone


Rasch's logistic model for growth is explained by reviewing his analysis of piglet growth. An early formulation was given in India when Rasch visited with Rao to describe the statistic metameter as the distinguishing characteristic for determining the rate of growth. Next, several examples are given demonstrating growth with plots of growth over time using characteristic time with truncated data. The results of these growth plots and analyses are summarized given their implications and restraints for using this approach in determining rate of growth.


Psychometric Properties of the General Movement Optimality Score using Rasch Measurement

Vanessa Maziero Barbosa, Everet V. Smith, Arend Bos, Giovanni Cioni,

Andrea Guzzetta, Peter B. Marschik, Jasmin Pansy, Berndt Urlesberger,

Hong Yang, and Christa Einspieler


AIM: To explore the psychometric properties of the general movements optimality score (GMOS) by examining its dimensionality, rating scale functioning, and item hierarchies using Rasch measurement. METHODS: Secondary data analysis of the GMOS data for video recordings of 383 infants with uni-, multidimensional, and mixed Rasch partial credit models. Videos were scored based on the global General Movement Assessment categories, and on the amplitude, speed, spatial range, proximal and distal rotations, onset and offset, tremulous and cramped components of the upper and lower extremities (21 items), resulting in the GMOS. RESULTS: The GMOS data were best fit by a unidimensional mixed Rasch model with three different classes of infants, with all but two items contributing to the infants' separation. Rating scales functioned well for 19 items. Item difficulty hierarchies varied depending on infants' class. No floor effect and no substantive gaps between item difficulty estimates were found. CONCLUSION: The GMOS has strong psychometric properties to distinguish infants with different functional motor performance and provides a quantitative measure of quality of movement. INTERPRETATION: The GMOS can be confidently used to assist with early diagnosis, grade motor performance, and provide a solid base to study individual general movement developmental trajectories.


Rasch Analysis of the Burn-Specific Pain Anxiety Scale: Evidence for the Abbreviated Version

A. E. E. de Jong, W. Tuinebreijer, H. W. C. Hofland, and N. E. E. Van Loey


The Burn-Specific Pain Anxiety Scale (BSPAS) estimates pain-related anxiety and determines the effect of treatment in patients with burns, especially regarding wound care. This study aimed to analyze the 9-item and the abbreviated 5-item BSPAS by the Rasch model. This prospective study included 161 patients admitted to Dutch burn centres. The BSPAS was administered during hospital stay resulting in 314 self-reports and was analysed using the Rasch unidimensional measurement model 2030 (RUMM 2030). Unidimensionality of the 9-item and 5-item BSPAS was confirmed. Initially, both versions did not fit the model due to response dependency. After creating subtests, fit to the model improved. After deleting "feeling insecure about my healing" and creating two subtests with three items, fit of the 9-item BSPAS was obtained, while the 5-item BSPAS fitted after creating a subtest with two items. The Rasch model demonstrated that both versions were unidimensional and were able to fit the model after adjusting for response dependency. Moreover, the 5-item BSPAS could be further improved by deleting "worrying about the possible pain." A 4-item abbreviated BSPAS (BSPAS-4I) captures pain-related anxiety and is proposed to be used in future studies and daily practice.


Trade-Offs in the Implementation of Observational Ratings Systems

Stephen M. Ponisciak, Rob Meyer, Anna Brown, and Tracy Schatzberg


A consensus has developed that high-quality teacher evaluation systems require multiple measures. We examine multiple measures from a large urban school district, which has included observational ratings and value-added ratings in its system since 2010. Evaluation systems that do not account for observer severity, classroom context, and other factors may yield different results from systems that do account for these factors. Choosing a simpler system involves a trade-off regarding a system's robustness or defensibility. Using a many-faceted Rasch model, we explore rating components like observer, time of year, and subdomain. We find high reliability of the resulting teacher ratings, some impact of adjusting for observer differences and differences between subdomains, and positive correlation with value-added measures. A comprehensive analysis like MFRM should be part of a district's evaluation system, even if only as a robustness check, and districts should examine how observational scores and classroom context are related.


Alignment of a Language Instrument Scores to CEFR Levels: Methodological and Empirical Considerations

Georgios D. Sideridis, Abdulrahman Al-Samrani, and Bjorn Norrbom


The purpose of the present report was to assess congruence between a language-based national examination (termed the English placement test, EPT) and the Common European Framework of Reference for Languages (CEFR) levels. To this end, a series of methodological steps was put forth to accumulate evidence suggesting that language performance based on the EPT instrument can be split into meaningful subgroups based on theoretical (expert judgement on difficulty level and CEFR correspondence) and empirical considerations (i.e., how well these levels and subgroups emerged). Participants were 2642 high school graduates who took the EPT instrument as part of their entry criteria to the university; for the purposes of the present study, only the structure subscale is presented. Items were classified as reflecting specific CEFR levels, and a person-based analysis attempted to classify individuals sharing the same behavioral patterns. Results using a latent class analysis (LCA) indicated that Pre-A1, A1, A2, B1, and B2 levels were present with regard to the structure domain of language. Results showed a strong alignment between the EPT structure domain and CEFR guidelines using various methodological approaches.


Validation of Egalitarian Education Questionnaire using Rasch Measurement Model

Nik Muhammad Hanis Nek Rakami, Nik Ahmad Hisham Ismail,

Noor Lide Abu Kassim, and Faizah Idrus


This paper describes the process of assessing the unidimensionality and validity of egalitarian education (EE) items based on the Rasch measurement model. Egalitarian education was measured by five self-developed EE items in Likert-scale format. Data were collected from 400 Malay teachers teaching in government schools across Peninsular Malaysia, and the construct validity of the overall set of EE items was assessed using Winsteps. Various Rasch measurement tools were used to examine the unidimensionality and validity of the EE items and whether they meet the requirements of the Rasch measurement model. The findings show that the validity and unidimensionality of the EE items can be established and that the items satisfy the characteristics of the Rasch measurement model.


Bootstrap Estimate of Bias for Intraclass Correlation

Xiaofeng Steven Liu and Kelvin Terrell Pompey


The estimates of intraclass correlations are known to be biased, but there are few analytical ways to assess the amount of bias. The analytical approach requires the normality assumption to estimate bias. Bootstrap requires no such assumption and can, therefore, be used to estimate bias, regardless of the model assumption. We utilize cluster bootstrapping to calculate the bias in estimating the intraclass correlation. A well-known dataset is provided to illustrate the bias estimation in a typical study design of intraclass correlation, and its implications for other study designs are also discussed.
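
As a rough illustration of the idea in this abstract (not the authors' code or data), the cluster bootstrap estimate of bias can be sketched as follows. The sketch assumes a balanced design and the one-way ANOVA estimator of the intraclass correlation; the function names `icc_oneway` and `bootstrap_bias` and the toy scores are hypothetical. Whole clusters are resampled with replacement, the estimator is recomputed on each resample, and the bias is estimated as the mean bootstrap estimate minus the original estimate.

```python
import random
from statistics import mean


def icc_oneway(clusters):
    """One-way ANOVA estimate of the intraclass correlation.

    Assumes a balanced design: every cluster has the same size k >= 2.
    """
    n = len(clusters)          # number of clusters
    k = len(clusters[0])       # observations per cluster
    grand = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster and within-cluster mean squares.
    msb = k * sum((m - grand) ** 2 for m in cluster_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for c, m in zip(clusters, cluster_means) for x in c
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def bootstrap_bias(clusters, n_boot=2000, seed=0):
    """Cluster bootstrap: resample whole clusters with replacement.

    Returns (original estimate, estimated bias), where the bias is the
    mean of the bootstrap estimates minus the original estimate.
    """
    rng = random.Random(seed)
    estimate = icc_oneway(clusters)
    n = len(clusters)
    boot = []
    for _ in range(n_boot):
        resample = [clusters[rng.randrange(n)] for _ in range(n)]
        boot.append(icc_oneway(resample))
    return estimate, mean(boot) - estimate


# Hypothetical toy data: 6 clusters with 4 scores each (illustration only).
toy = [
    [4.1, 3.8, 4.5, 4.0],
    [5.2, 5.9, 5.5, 6.1],
    [3.0, 2.7, 3.6, 3.1],
    [4.8, 5.1, 4.4, 4.9],
    [6.3, 5.8, 6.0, 6.6],
    [3.9, 4.2, 3.5, 4.4],
]
estimate, bias = bootstrap_bias(toy)
print(f"ICC estimate: {estimate:.3f}, bootstrap bias: {bias:.3f}")
```

A bias-corrected estimate, under these assumptions, would be `estimate - bias`. Resampling clusters rather than individual observations preserves the within-cluster dependence that the intraclass correlation is meant to capture.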


Measuring Genuine Progress: An Example from the UN Millennium Development Goals (Corrected version)

William P. Fisher, Jr.


Proposals for incorporating information on the quality of human, social, and environmental conditions in more authentic and comprehensive versions of the Gross National Product (GNP) or Gross Domestic Product (GDP) date back to the foundations of econometrics. Typically treated as external to markets, these domains have lately been objects of renewed interest. Calls for accountability and transparency have expanded to include the now topical but previously neglected economic implications of human, social, and natural capital. Clear advantages for the measurement and management of these forms of capital can be drawn from econometric criteria for identifiable models of structurally invariant relationships. The United Nations' Millennium Development Goals (MDG) provide an example application of a probabilistic model for measurement used to evaluate data quality, reduce data volume with no loss of information, estimate linear units of comparison with known uncertainties, express measures from different sets of indicators in a common metric, and frame a meaningful interpretive context. Data on 22 MDG indicators from 64 countries are scored and analyzed. Model fit was reasonable, the item hierarchy tells a meaningful story of structural invariance in economic development, and Cronbach's alpha was 0.93. The measures estimated in this study correlated over 0.90 with independently produced measures of per-capita GDP and life satisfaction. These results provide a positive demonstration of relevant methods applicable in the context of today's Sustainable Development Goals 2030 Agenda.