Journal of Applied Measurement

P.O. Box 1283

Maple Grove, MN 55311


Article abstracts for Volumes 1 to 7 are available in pdf format. Just click on the link below.

Abstracts for Volume 1, 2000

Abstracts for Volume 2, 2001

Abstracts for Volume 3, 2002

Abstracts for Volume 4, 2003

Abstracts for Volume 5, 2004

Abstracts for Volume 6, 2005

Abstracts for Volume 7, 2006

Article abstracts for Volumes 8 to 16 are available in html format. Just click on the link below.

Abstracts for Volume 8, 2007

Abstracts for Volume 9, 2008

Abstracts for Volume 10, 2009

Abstracts for Volume 11, 2010

Abstracts for Volume 12, 2011

Abstracts for Volume 13, 2012

Abstracts for Volume 14, 2013

Abstracts for Volume 15, 2014

Abstracts for Volume 16, 2015


Current Volume Article Abstracts


Vol. 17, No. 1 Spring 2016

Assessing the Validity of a Continuum-of-care Survey: A Rasch Measurement Approach

Michael Peabody, Kelly D. Bradley, and Melba Custer


Satisfied patients are more likely to be compliant, have better outcomes, and are more likely to return to the same provider or institution for future care. The Satisfaction with a Continuum of Care survey (SCC) was designed to improve patient care using measures of patient satisfaction and to facilitate a cultural shift from a “silos-of-care” to a “continuum-of-care” mentality by fostering inter-departmental communication as patients moved between environments of care at a Midwestern rehabilitation hospital. This study provides a Rasch measurement framework for investigating issues related to survey reliability and validity. The results indicate that although certain aspects of the survey seem to function in a psychometrically sound manner, the questions are too easy to endorse and provide little information to help improve patient care. Suggestions for future revisions to this survey instrument are provided.


What You Don’t Know Can Hurt You: Missing Data and Partial Credit Model Estimates

Sarah L. Thomas, Karen M. Schmidt, Monica K. Erbacher, and Cindy S. Bergeman


The authors investigated the effect of missing completely at random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Beyond the missing data already present, further missing responses were induced by randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, MCAR data do damage the quality and precision of PCM estimates.
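The degradation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `induce_mcar`, the use of `None` for a missing response, and the choice to apply the rate to the responses still observed in each item column are all assumptions made for illustration.

```python
import random

def induce_mcar(responses, rate, seed=0):
    """Return a copy of a persons-by-items matrix in which a fixed proportion
    of the observed responses in each item column is set to None.

    Hypothetical helper: the rate is applied to responses that are still
    observed, which is an assumption, not a detail given in the abstract.
    """
    rng = random.Random(seed)
    masked = [row[:] for row in responses]  # leave the input untouched
    n_items = len(responses[0])
    for j in range(n_items):
        # Persons whose response to item j is currently observed.
        observed = [i for i, row in enumerate(masked) if row[j] is not None]
        n_drop = round(rate * len(observed))
        # Uniform random choice, independent of the response values (MCAR).
        for i in rng.sample(observed, n_drop):
            masked[i][j] = None
    return masked
```

Calling the function once per rate (0.2, 0.5, 0.7) on the same matrix would give three degraded data sets whose model estimates can then be compared with those from the original data.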


Rasch Measurement of Collaborative Problem Solving in an Online Environment

Susan-Marie E. Harding and Patrick E. Griffin


This paper describes an approach to the assessment of human-to-human collaborative problem solving using a set of online interactive tasks completed by student dyads. Within the dyad, roles were nominated as either A or B and students selected their own roles. The question as to whether role selection affected individual student performance measures is addressed. Process stream data were captured from 3402 students in six countries who explored the problem space by clicking, dragging the mouse, moving the cursor and collaborating with their partner through a chat box window. Process stream data were explored to identify behavioural indicators that represented elements of a conceptual framework. These indicative behaviours were coded into a series of dichotomous items. These items represented actions and chats performed by students. The frequency of occurrence was used as a proxy measure of item difficulty. Then, given a measure of item difficulty, student ability could be estimated using the difficulty estimates of the range of items demonstrated by the student. The Rasch simple logistic model was used to review the indicators to identify those that were consistent with the assumptions of the model and were invariant across national samples, language, curriculum and age of the student. The data were analysed using one- and two-dimensional one-parameter models. Rasch separation reliability, fit to the model, distribution of students and items on the underpinning construct, estimates for each country and the effect of role differences are reported. This study provides evidence that collaborative problem solving can be assessed in an online environment involving human-to-human interaction using behavioural indicators shown to have a consistent relationship between the estimate of student ability and the probability of demonstrating the behaviour.
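For readers unfamiliar with it, the Rasch simple logistic model named in this abstract relates student ability to the probability of demonstrating a dichotomous indicator. The symbols below follow common convention and are not taken from the abstract: \theta_n is the ability of student n, \delta_i is the difficulty of indicator i, and X_{ni} = 1 means the behaviour was observed.

```latex
P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
```

Under this form, a student whose ability equals the indicator's difficulty has a probability of 0.5 of demonstrating the behaviour, and the probability rises as ability exceeds difficulty.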


The Impact of Item Parameter Drift in Computer Adaptive Testing (CAT)

Nicole Risk


This study looked at numerous aspects of item parameter drift (IPD) and its impact on measurement in computer adaptive testing (CAT). A series of CAT simulations were conducted, varying the amount and magnitude of IPD, as well as the size of the item pool. The effects of IPD on measurement precision, classification, and test efficiency were evaluated using a number of criteria. These included bias, root mean square error (RMSE), absolute average difference (AAD), total percentages of misclassification, the number of false positives and false negatives, the total test lengths, and item exposure rates. The results revealed negligible differences when comparing the IPD conditions to the baseline condition for all measures of precision, classification accuracy, and test efficiency. The most relevant finding indicates that magnitude of drift has a larger impact on measurement precision than the number of items with drift.
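Three of the precision criteria listed in this abstract are commonly computed from the differences between estimated and true ability values in a simulation. The sketch below uses those common definitions; the abstract does not give the study's exact formulas, and the function name `recovery_stats` is hypothetical.

```python
import math

def recovery_stats(true_thetas, est_thetas):
    """Return (bias, RMSE, AAD) for paired true and estimated abilities.

    Common textbook definitions, assumed here rather than taken from the
    study: bias is the mean signed difference, RMSE the root of the mean
    squared difference, and AAD the mean absolute difference.
    """
    n = len(true_thetas)
    diffs = [est - true for true, est in zip(true_thetas, est_thetas)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    aad = sum(abs(d) for d in diffs) / n
    return bias, rmse, aad
```

Computing these statistics separately for a baseline condition and for each drift condition would allow the kind of comparison the abstract reports.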


Exploring the Utility of Logistic Mixed Modeling Approaches to Simultaneously Investigate Item and Testlet DIF on Testlet-based Data

Hirotaka Fukuhara and Insu Paek


This study explored the utility of logistic mixed models for the analysis of differential item functioning when item response data were testlet-based. Decomposition of differential item functioning (DIF) into item level and testlet level for the testlet-based data was introduced to separate possible sources of DIF: (1) an item, (2) a testlet, and (3) both the item and the testlet. A simulation study was conducted to investigate the performance of several logistic mixed models as well as the Mantel-Haenszel method under conditions in which the item-related DIF and testlet-related DIF were present simultaneously. The results revealed that a new DIF model based on a logistic mixed model with random item effects and item covariates could capture the item-related DIF and testlet-related DIF well under certain conditions.


What Are You Measuring? Dimensionality and Reliability Analysis of Ability and Speed in Medical School Didactic Examinations

James J. Thompson


Summative didactic evaluation often involves multiple-choice questions which are then aggregated into exam scores, course scores, and cumulative grade point averages. To be valid, each of these levels should have some relationship to the topic tested (dimensionality) and be sufficiently reproducible between persons (reliability) to justify student ranking. Evaluation of dimensionality is difficult and is complicated by the classic observation that didactic performance involves a generalized component (g) in addition to subtest specific factors. In this work, 183 students were analyzed over two academic years in 13 courses with 44 exams and 3352 questions for both accuracy and speed. Reliability at all levels was good (>0.95). Assessed by bifactor analysis, g effects dominated most levels, resulting in essential unidimensionality. Effect sizes on predicted accuracy and speed due to nesting in exams and courses were small. There was little relationship between person ability and person speed. Thus, the hierarchical grading system appears warranted because of its g-dependence.


Applying the Rasch Model to Measure Mobility of Women: A Comparative Analysis of Mobility of Informal Workers in Fisheries in Kerala, India

Nikhila Menon


Mobility or ‘freedom and ability to move’ is gendered in many cultural contexts. In this paper I analyse mobility associated with work from the capability approach perspective of Sen. This is an empirical paper which uses the Rasch Rating Scale Model (RSM) to construct the measure of mobility of women for the first time in the development studies discourse. I construct a measure of mobility (latent trait) of women workers engaged in two types of informal work, namely, peeling work and fish vending, in fisheries in the cultural context of India. The scale measure makes it possible, first, to test the unidimensionality of my construct of mobility of women and, second, to analyse the domains of mobility of women workers. The comparative analysis of the scale of permissibility of mobility constructed using the RSM for the informal women workers shows that women face constraints on mobility in social and personal spaces in the socially advanced state of Kerala in India. Work mobility does not expand the real freedoms; hence work mobility can be termed a ‘bounded capability’, that is, a capability ‘limited or bounded’ by social, cultural, or gender norms, or by a combination of these. Therefore, at the macro level, growth in informal employment in sectors like fisheries which improve mobility of women through work mobility does not necessarily expand the capability sets by contributing to greater freedoms and transformational mobility. This paper makes a significant methodological contribution in that it uses an innovative method for the measurement of mobility of women in the development studies discipline.