Journal of Applied Measurement

P.O. Box 1283

Maple Grove, MN 55311


Article abstracts for Volumes 1 to 7 are available in PDF format. Just click on the links below.

Abstracts for Volume 1, 2000

Abstracts for Volume 2, 2001

Abstracts for Volume 3, 2002

Abstracts for Volume 4, 2003

Abstracts for Volume 5, 2004

Abstracts for Volume 6, 2005

Abstracts for Volume 7, 2006

Article abstracts for Volumes 8 to 18 are available in HTML format. Just click on the links below.

Abstracts for Volume 8, 2007

Abstracts for Volume 9, 2008

Abstracts for Volume 10, 2009

Abstracts for Volume 11, 2010

Abstracts for Volume 12, 2011

Abstracts for Volume 13, 2012

Abstracts for Volume 14, 2013

Abstracts for Volume 15, 2014

Abstracts for Volume 16, 2015

Abstracts for Volume 17, 2016

Abstracts for Volume 18, 2017


Current Volume Article Abstracts


Vol. 19, No. 1, Spring 2018

The Impact of Missing Values and Single Imputation upon Rasch Analysis Outcomes: A Simulation Study

Carolina Saskia Fellinghauer, Birgit Prodinger, and Alan Tennant


Imputation has become common practice with the availability of easy-to-use algorithms and software. This study aims to determine whether different imputation strategies are robust to the extent and type of missingness, local item dependencies (LID), differential item functioning (DIF), and misfit when doing a Rasch analysis. Four samples were simulated, representing a sample with good metric properties, a sample with LID, a sample with DIF, and a sample with both LID and DIF. Missing values were generated in increasing proportions and were either missing at random or missing completely at random. Four imputation techniques were applied before Rasch analysis, and the deviation of the results and the quality of fit were compared. Imputation strategies showed good performance with less than 15% missingness. The analysis with missing values performed best in recovering statistical estimates. The best strategy, when doing a Rasch analysis, is the analysis with missing values. If for some reason imputation is necessary, we recommend using the expectation-maximization algorithm.
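
As a hedged illustration of the strategy the abstract recommends, the sketch below fits a dichotomous Rasch model by simple alternating Newton updates and lets missing responses drop out of the likelihood rather than imputing them. The data, function name, and iteration settings are illustrative assumptions, not the authors' code.

import numpy as np

def fit_rasch_with_missing(X, n_iter=50):
    # X: persons x items matrix of 0/1 responses, with np.nan marking missing cells.
    obs = ~np.isnan(X)                          # observed-response mask
    theta = np.zeros(X.shape[0])                # person abilities
    beta = np.zeros(X.shape[1])                 # item difficulties
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
        resid = np.where(obs, np.nan_to_num(X) - p, 0.0)   # missing cells contribute nothing
        info = np.where(obs, p * (1.0 - p), 0.0)
        theta += resid.sum(axis=1) / np.maximum(info.sum(axis=1), 1e-9)
        beta -= resid.sum(axis=0) / np.maximum(info.sum(axis=0), 1e-9)
        beta -= beta.mean()                     # identify the scale (mean item difficulty = 0)
    return theta, beta

# Illustrative use: simulate responses, then make about 15% of them missing completely at random.
rng = np.random.default_rng(0)
true_theta, true_beta = rng.normal(size=500), rng.normal(size=20)
P = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_beta[None, :])))
X = (rng.random(P.shape) < P).astype(float)
X[rng.random(X.shape) < 0.15] = np.nan
theta_hat, beta_hat = fit_rasch_with_missing(X)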


Methods for the Comparison of Differential Item Functioning across Assessments

W. Holmes Finch, Maria Hernández Finch, Brian F. French, David E. McIntosh, and Lauren Moss


An important aspect of the educational and psychological evaluation of individuals is the selection of scales with appropriate evidence of reliability and validity for the intended inferences and uses of the scores in the population of interest. One key aspect of validity is the degree to which a scale fairly assesses the construct(s) of interest for members of different subgroups within the population. Typically, this issue is addressed statistically through assessment of differential item functioning (DIF) of individual items, or differential test functioning (DTF) of sets of items within the same measure. When selecting an assessment for a given application (e.g., measuring intelligence), or a form of an assessment for a test administration, researchers need to consider the extent to which the scales work for all members of the population. Little research has examined methods for comparing the amount or magnitude of DIF/DTF present in two or more assessments when deciding which assessment to use. The current study used seven different statistics for this purpose in the context of intelligence testing. Results demonstrate that by using a variety of effect sizes, researchers can gain insight not only into which scales may contain the least DTF, but also into how the scales differ in the way the DTF manifests itself.
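
One concrete way to compare assessments on this dimension, sketched below under assumptions not taken from the article (it is not necessarily one of the seven statistics studied), is to compute a Mantel-Haenszel effect size for every item and summarize the absolute effect sizes per assessment.

import numpy as np

def mh_delta(item, total, group):
    # Mantel-Haenszel DIF effect size (ETS delta) for one dichotomous item.
    # item: 0/1 responses; total: stratifying total score; group: 0 = reference, 1 = focal.
    num, den = 0.0, 0.0
    for k in np.unique(total):
        stratum = total == k
        ref, foc = stratum & (group == 0), stratum & (group == 1)
        right_ref, wrong_ref = item[ref].sum(), (1 - item[ref]).sum()
        right_foc, wrong_foc = item[foc].sum(), (1 - item[foc]).sum()
        n = stratum.sum()
        num += right_ref * wrong_foc / n
        den += wrong_ref * right_foc / n
    alpha = (num + 0.5) / (den + 0.5)           # smoothed common odds ratio
    return -2.35 * np.log(alpha)                # ETS delta metric

def dtf_summary(responses, group):
    # Mean absolute MH delta over the items of one assessment.
    total = responses.sum(axis=1)
    deltas = [mh_delta(responses[:, j], total, group) for j in range(responses.shape[1])]
    return float(np.mean(np.abs(deltas)))

# Usage idea: compute dtf_summary for each candidate assessment on the same examinees,
# prefer the one with the smaller summary, and inspect item-level deltas for patterns.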


Equating Errors and Scale Drift in Linked-Chain IRT Equating with Mixed-Format Tests

Bo Hu


In linked-chain equating, equating errors may accumulate and cause scale drift. This simulation study extends the investigation of scale drift in linked-chain equating to mixed-format tests. Specifically, the impact of the equating method and the characteristics of the anchor test and the equating chain on equating errors and scale drift in IRT true score equating is examined. To evaluate the equating results, a new method is used to derive true linking coefficients. The results indicate that the characteristic curve methods produce more accurate and reliable equating results than the moment methods. Although using more anchor items or an anchor test configuration with more IRT parameters can lower the variability of equating results, neither helps control equating bias. Additionally, scale drift increases when an equating chain runs longer or poorly calibrated test forms are added to the chain. The role of calibration precision in evaluating equating results is highlighted.
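
For readers unfamiliar with how linking errors can accumulate along a chain, the following sketch, with made-up anchor difficulties and no claim to reproduce the study's method, shows the mean/sigma moment method for computing linking coefficients from common anchor items and how those coefficients compose across links.

import numpy as np

def mean_sigma_link(b_base, b_new):
    # Linking coefficients A, B so that theta_base = A * theta_new + B,
    # from anchor item difficulties estimated on the base and new forms.
    A = np.std(b_base, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_base) - A * np.mean(b_new)
    return A, B

def compose(link1, link2):
    # Chain two links: first link2 (form 3 -> form 2), then link1 (form 2 -> form 1).
    A1, B1 = link1
    A2, B2 = link2
    return A1 * A2, A1 * B2 + B1

# Illustrative chain: form 3 reaches the base scale only through form 2, so any
# error in either link propagates into the composed transformation (scale drift).
b_form1_anchor = np.array([-1.2, -0.4, 0.1, 0.7, 1.3])
b_form2_anchor = b_form1_anchor * 1.05 + 0.10        # pretend recalibration of the same anchors
b_form2_anchor_new = np.array([-0.9, -0.2, 0.3, 0.9, 1.5])
b_form3_anchor = b_form2_anchor_new * 0.95 - 0.05
link_2_to_1 = mean_sigma_link(b_form1_anchor, b_form2_anchor)
link_3_to_2 = mean_sigma_link(b_form2_anchor_new, b_form3_anchor)
link_3_to_1 = compose(link_2_to_1, link_3_to_2)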


Validation of Response Similarity Analysis for the Detection of Academic Cheating: An Experimental Study

Georgios D. Sideridis and Cengiz Zopluoglu


The purpose of the present study was to evaluate various analytical means of detecting academic cheating in an experimental setting. The omega index was compared against a gold criterion of academic cheating, defined as a discrepant score between two administrations, in an experimental study with real test takers. Participants were 164 elementary school students who were administered a mathematics exam followed by an equivalent mock exam under conditions of strict and relaxed invigilation, respectively. Discrepant scores were defined as those exceeding 7 responses in either direction (correct or incorrect), based on what was expected due to chance. Results indicated that the omega index was successful in capturing more than 39% of the cases that exceeded the conventional ±7 discrepancy criterion. It is concluded that response similarity analysis may be an important tool for detecting academic cheating.
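
Response similarity analysis proper relies on modeled match probabilities (the omega index uses a nominal response model), but the simplified sketch below conveys the underlying idea: count identical answers for a pair of examinees and ask how surprising that count would be under independent responding. The constant per-item agreement probability is an illustrative assumption, not a value from the study.

from math import comb

def similarity_pvalue(answers_a, answers_b, p_match=0.3):
    # Probability of at least the attained number of identical answers if the two
    # examinees responded independently, assuming a constant per-item chance of
    # agreement p_match (a deliberate simplification of the omega index).
    n = len(answers_a)
    m = sum(a == b for a, b in zip(answers_a, answers_b))
    return sum(comb(n, k) * p_match**k * (1 - p_match)**(n - k) for k in range(m, n + 1))

# Example: on a 40-item test, 25 identical choices is very unlikely under independence.
print(similarity_pvalue(["A"] * 25 + ["B"] * 15, ["A"] * 25 + ["C"] * 15))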


Rasch Analysis of the Teachers’ Knowledge and Use of Data and Assessment (tKUDA) Measure

Courtney Donovan


Teachers are expected to use data and assessments to drive their instruction. This is accomplished at the classroom level via the assessment process. The Teachers' Knowledge and Use of Data and Assessment (tKUDA) measure was created to capture teachers' knowledge and use of this assessment process. This paper explores the measure's utility using Rasch analysis. Evidence of reliability and validity was seen for both the knowledge and use factors. The scale was used as expected, and item analyses demonstrate good spread, with a few items identified for future revision. Item difficulty and results are connected back to the literature. Findings support the use of this measure to identify teachers' knowledge and use of data and assessment in classroom practice.


Psychometric Properties and Differential Item Functioning of a Web-Based Assessment of Children’s Social Perspective-Taking

Beyza Aksu Dunya, Clark McKown, and Everett V. Smith


Social perspective-taking (SPT), which involves the ability to infer others' intentions, is a consequential social cognitive process. The purpose of this study is to evaluate the psychometric properties of a web-based social perspective-taking (SELweb SPT) assessment designed for children in kindergarten through third grade. Data were collected from two separate samples of children. The first sample included 3,224 children and the second sample included 4,419 children. Data were calibrated using the Rasch dichotomous model (Rasch, 1960). Differential item and test functioning were also evaluated across gender and ethnicity groups. Across both samples, we found evidence of consistent item fit, a unidimensional item structure, and adequate item targeting overall, although poorer targeting at high and low ability levels suggests that more items are needed to distinguish low- and high-ability respondents. Analyses of DIF found some significant item-level DIF across gender, but no DIF across ethnicity. Analyses of person measure calibrations with and without DIF items evidenced negligible differential test functioning (DTF) across gender and ethnicity groups in both samples.
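
A separate-calibration DIF screen of the kind described here can be sketched as follows; the log-odds difficulty approximation stands in for a full Rasch calibration, and the 0.5-logit flagging threshold is an assumption, not a value taken from the study.

import numpy as np

def approx_difficulties(responses):
    # Crude item difficulty estimates: log-odds of an incorrect response,
    # centered so the mean difficulty is zero (stand-in for a Rasch calibration).
    p = responses.mean(axis=0).clip(0.01, 0.99)
    b = np.log((1 - p) / p)
    return b - b.mean()

def dif_contrasts(responses, group, threshold=0.5):
    # Difficulty contrasts between two groups; flag items with |contrast| > threshold logits.
    b_ref = approx_difficulties(responses[group == 0])
    b_foc = approx_difficulties(responses[group == 1])
    contrast = b_foc - b_ref
    return contrast, np.flatnonzero(np.abs(contrast) > threshold)

# Usage idea: rerun the person calibration with flagged items removed and compare
# person measures with the original calibration to gauge differential test functioning.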


Assessment of Test Items with Rasch Measurement Model

Patrick U. Osadebe


The study was carried out to assess the difficulty index of each item of an Economics Achievement Test with the Rasch model. The infit and outfit statistics as well as the reliability of the test were determined. Three research questions were posed to guide the study. A sample of 200 students was randomly selected using simple random sampling by balloting and proportionate stratified random sampling. The instrument of the study was an Economics Achievement Test with 100 items. The test has face and content validity and a reliability coefficient of 0.86, established using the Kuder-Richardson 20 (KR-20) method. The Rasch software Winsteps (version 3.75) was used to analyse the data collected. The results identified the difficulty index of each item, and the infit and outfit statistics (MNSQ and ZSTD) were determined. The reliability of the Economics Achievement Test was estimated. It was recommended, among other things, that the Rasch model should always be used in assessing the item difficulty of a test to ensure the stability of item parameters.
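
For reference, the sketch below shows how two quantities named in this abstract are typically computed: the Kuder-Richardson 20 reliability of a dichotomous test and Rasch infit/outfit mean squares given model probabilities from an already fitted model. It is an illustrative sketch, not the study's Winsteps analysis, and the response matrix and probabilities are assumed inputs.

import numpy as np

def kr20(X):
    # KR-20 reliability for a persons x items matrix of 0/1 responses.
    k = X.shape[1]
    p = X.mean(axis=0)
    total_var = X.sum(axis=1).var(ddof=1)       # sample variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def item_fit(X, P):
    # Outfit and infit mean squares per item, given Rasch model probabilities P.
    W = P * (1 - P)                              # model variance of each response
    Z2 = (X - P) ** 2 / W                        # squared standardized residuals
    outfit = Z2.mean(axis=0)                     # unweighted mean square
    infit = ((X - P) ** 2).sum(axis=0) / W.sum(axis=0)   # information-weighted mean square
    return outfit, infit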