Journal of Applied Measurement

P.O. Box 1283

Maple Grove, MN 55311

 


Current Volume Article Abstracts

 

Vol. 14, No. 1 Spring 2013

A Bootstrap Approach to Evaluating Person and Item Fit to the Rasch Model

Edward W. Wolfe

Abstract

Historically, rule-of-thumb critical values have been employed for interpreting fit statistics that depict anomalous person and item response patterns in applications of the Rasch model. Unfortunately, prior research has shown that these values are not appropriate in many contexts. This article introduces a bootstrap procedure for identifying reasonable critical values for Rasch fit statistics and compares the results of that procedure to applications of rule-of-thumb critical values for three example datasets. The results indicate that rule-of-thumb values may over- or under-identify the number of misfitting items or persons.

****

Using The Rasch Measurement Model to Design a Report Writing Assessment Instrument

Wayne R. Carlson

Abstract

This paper describes how the Rasch measurement model was used to develop an assessment instrument designed to measure student ability to write law enforcement incident and investigative reports. The ability to write reports is a requirement of all law enforcement recruits in the state of Michigan and is a part of the state’s mandatory basic training curriculum, which is promulgated by the Michigan Commission on Law Enforcement Standards (MCOLES). Recently, MCOLES conducted research to modernize its training and testing in the area of report writing. A structured validation process was used, which included: a) an examination of the job tasks of a patrol officer, b) input from content experts, c) a review of the professional research, and d) the creation of an instrument to measure student competency. The Rasch model addressed several measurement principles that were central to construct validity, which were particularly useful for assessing student performances. Based on the results of the report writing validation project, the state established a legitimate connectivity between the report writing standard and the essential job functions of a patrol officer in Michigan. The project also produced an authentic instrument for measuring minimum levels of report writing competency, which generated results that are valid for inferences of student ability. Ultimately, the state of Michigan must ensure the safety of its citizens by licensing only those patrol officers who possess a minimum level of core competency. Maintaining the validity and reliability of both the training and testing processes can ensure that the system for producing such candidates functions as intended.

****

Using Multidimensional Rasch to Enhance Measurement Precision: Initial Results from Simulation and Empirical Studies

Magdalena Mo Ching Mok and Kun Xu

Abstract

This study explored how multidimensional, as compared with unidimensional, Rasch measurement affects measurement precision when constructing measures from multidimensional Likert-type scales. Many educational and psychological tests are multidimensional, but common practice is to ignore correlations among the latent traits in these scales in the measurement process. This practice may have serious implications for validity and reliability. The study used both empirical data from 208,083 students and simulated data generated under 24 systematic combinations of three conditions (sample size, degree of dimensionality, and scale length), each replicated 1000 times, to compare the unidimensional and multidimensional approaches and to identify the effects of sample size, dimensionality, and scale length on measurement precision. Results showed that the multidimensional Rasch approach yielded more precise estimates than did the unidimensional approach if the two dimensions were strongly correlated. The effect was more pronounced for long scales.

****

Using the Dichotomous Rasch Model to Analyze Polytomous Items

Qingping He and Chris Wheadon

Abstract

One of the most important applications of the Rasch measurement models in educational assessment is the equating of tests. An important feature of attainment tests is the use of both dichotomous and polytomous items. The partial credit model (PCM) developed by Masters (1982) represents an extension of the dichotomous Rasch model for analysing polytomous item data. The dichotomous Rasch model has been used primarily to analyse dichotomous item data. Whilst the partial credit model can provide detailed information on the performance of individual score categories of polytomous items, it is mathematically more complex to use than the dichotomous Rasch model and can, under certain circumstances, present difficulties in interpreting item measures and in practical applications. This study explores the potential of using the dichotomous Rasch model to analyse polytomous items and equate tests. Results obtained from a simulation study and from analysing the data of a science achievement test indicate that the partial credit model and the dichotomous Rasch model produce similar item and person measures and equivalent cut scores on different test forms.
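
As an illustrative aside, not taken from the article: the relationship between the two models named in this abstract can be sketched in a few lines of Python. The function names `rasch_probability` and `pcm_probabilities` are hypothetical, and the parameterisation (person measure `theta`, item difficulty `b`, step difficulties `thresholds`) is the standard textbook form, assumed here rather than drawn from the study.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pcm_probabilities(theta, thresholds):
    """Probabilities of scores 0..m under the partial credit model.

    thresholds holds the m step difficulties of one polytomous item.
    """
    # Running sums of (theta - step difficulty); score 0 has an empty sum of 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    weights = [math.exp(value) for value in exponents]
    total = sum(weights)
    return [weight / total for weight in weights]

# With a single step, the partial credit model reduces to the dichotomous model.
one_step = pcm_probabilities(0.5, [0.2])
print(one_step[1], rasch_probability(0.5, 0.2))
```

With one step difficulty the two functions return the same probability of success, which is the sense in which the partial credit model extends the dichotomous Rasch model.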

****

With Hiccups and Bumps: The Development of a Rasch-based Instrument to Measure Elementary Students’ Understanding of the Nature of Science

Shelagh M. Peoples, Laura M. O’Dwyer, Katherine A. Shields, and Yang Wang

Abstract

This research describes the development process, psychometric analyses and partial validation study of a theoretically-grounded Rasch-based instrument, the Nature of Science Instrument-Elementary (NOSI-E). The NOSI-E was designed to measure elementary students’ understanding of the Nature of Science (NOS). Evidence is provided for three of the six validity aspects (content, substantive and generalizability) needed to support the construct validity of the NOSI-E. A future article will examine the structural and external validity aspects. Rasch modeling proved especially productive in scale improvement efforts. The instrument, designed for large-scale assessment use, is conceptualized using five construct domains. Data from 741 elementary students were used to pilot the Rasch scale, with continuous improvements made over three successive administrations. The psychometric properties of the NOSI-E instrument are consistent with the basic assumptions of Rasch measurement, namely that the items are well-fitting and invariant. Items from each of the five domains (Empirical, Theory-Laden, Certainty, Inventive, and Socially and Culturally Embedded) are spread along the scale’s continuum and appear to overlap well. Most importantly, the scale seems appropriately calibrated and responsive for elementary school-aged children, the target age group. As a result, the NOSI-E should prove beneficial for science education research. As the United States’ science education reform efforts move toward students’ learning science through engaging in authentic scientific practices (NRC, 2011), it will be important to assess whether this new approach to teaching science is effective. The NOSI-E can be used as one measure of whether this reform effort has an impact.

****

Application of Single-level and Multi-level Rasch Models using the lme4 Package

Iasonas Lamprianou

Abstract

The aim of the article is to illustrate how researchers may use the lme4 package to run multilevel Rasch models. The lme4 package is popular open-source software, frequently used by researchers around the world to fit generalized mixed-effects models with crossed or partially crossed random effects. The article starts with a short discussion of the reasons why a researcher might sometimes be motivated to use a multi-level Rasch model and presents a practical example using empirical data. The main features of the lme4 package are presented, and finally, the paper presents information about other open-source software that could alternatively be used to fit multi-level Rasch models.

****

Rasch Modeling to Assess Albanian and South African Learners’ Preferences for Real-life Situations to be Used in Mathematics: A Pilot Study

Suela Kacerja, Cyril Julie, and Said Hadjerrouit

Abstract

This paper reports on an investigation of the real-life situations that students in grades 8 and 9 in South Africa and Albania prefer to use in Mathematics. The functioning of the instrument used to assess learners’ order of preference for contextual situations in both countries is assessed using Rasch modeling techniques. For both cohorts, the data fit the Rasch model. The differential item functioning (DIF) analysis identified 3 items operating differentially for the two cohorts. Explanations for these differences are provided in terms of differences in the experiences that learners in the two countries have with some of the contextual situations. Implications for interpretation of international comparative tests are offered, as are the possibilities for the cross-country development of curriculum materials related to contexts that learners prefer to use in Mathematics.

****

 

 

Vol. 14, No. 2 Summer 2013

Adaptive Testing for Psychological Assessment: How Many Items Are Enough To Run an Adaptive Testing Algorithm?

Michaela M. Wagner-Menghin and Geoff N. Masters

Abstract

Although the principles of adaptive testing were established in the psychometric literature many years ago (e.g., Weiss, 1977), and the practice of adaptive testing is established in educational assessment, it is not yet widespread in psychological assessment. One obstacle to adaptive psychological testing is a lack of clarity about the necessary number of items to run an adaptive algorithm. The study explores the relationship between item bank size, test length and measurement precision. Simulated adaptive test runs (allowing a maximum of 30 items per person) out of an item bank with 10 items per ability level (covering .5 logits, 150 items total) yield a standard error of measurement (SEM) of .47 (.39) after an average of 20 (29) items for 85-93% (64-82%) of the simulated rectangular sample. Expanding the bank to 20 items per level (300 items total) did not improve the algorithm’s performance significantly. With a small item bank (5 items per ability level, 75 items total) it is possible to reach the same SEM as with a conventional test, but with fewer items or a better SEM with the same number of items.
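
As an illustrative aside, not part of the study: the link between test length and SEM under the dichotomous Rasch model can be sketched as below. The function name `rasch_sem` is hypothetical, and the calculation uses the standard result that each item contributes information p(1 - p) at the person's ability, so the SEM is the reciprocal square root of the summed information. It is not the simulation design used by the authors.

```python
import math

def rasch_sem(theta, difficulties):
    """Standard error of measurement at ability theta for a set of Rasch items."""
    information = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        # Item information is largest (0.25) when difficulty equals ability.
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)

# Perfectly targeted items versus items 2 logits too hard, 20 items each.
print(rasch_sem(0.0, [0.0] * 20))
print(rasch_sem(0.0, [2.0] * 20))
```

Items whose difficulty matches the person's ability give the smallest SEM for a given test length, which is why an adaptive algorithm needs enough items near each ability level in the bank.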

****

DIF Cancellation in the Rasch Model

Adam E. Wyse

Abstract

Differential item functioning (DIF) cancellation occurs when the cumulative effect of an item or set of items exhibiting DIF against one subgroup cancels with other items that exhibit DIF against the comparison group and hence results in non-existent DIF at the test level. This paper investigates DIF cancellation in the context of Rasch measurement. It is shown that this phenomenon is not a property of the Rasch model, but rather, a function of the manner in which item parameters are estimated and the way that DIF impacts these estimates. The conditions under which DIF cancellation would exist when using the Rasch model are suggested and a proof is provided to support this suggestion. Empirical examples are provided to refute prior suggestions that DIF cancellation always exists if the Rasch model is used.

****

Multidimensional Diagnostic Perspective on Academic Achievement Goal Orientation Structure, Using the Rasch Measurement Models

Daeryong Seo, Husein Taherbhai, and Insu Paek

Abstract

This study is designed to investigate a multidimensional structure of academic achievement goal orientations from a diagnostic perspective, using the Rasch measurement models. A data set of Korean students who responded to the Patterns of Adaptive Learning Survey (PALS) was analyzed. Both consecutive unidimensional and multidimensional Rasch measurement models were applied for comparative purposes. Each goal orientation dimension (i.e., the attitude) was standardized and then classified into three categorical levels, i.e., low, middle and high. These categorizations of goal dimensions were used to examine the role of students’ performance-approach goals on mathematics achievement in relation to the other achievement goals. Results indicate that the multidimensional partial credit model was the best model with respect to the fit of the data to the models. Findings of the current study also demonstrate that practitioners who need specific feedback for instruction and/or intervention can benefit from the multidimensional approach.

****

An Extension of a Bayesian Approach to Detect Differential Item Functioning

Sandip Sinharay

Abstract

The application of the existing test statistics to determine differential item functioning (DIF) requires large samples, but test administrators often face the challenge of detecting DIF with small samples. One advantage of a Bayesian approach over a frequentist approach is that the former can incorporate, in the form of a prior distribution, existing information on the inference problem at hand. Sinharay, Dorans, Grant, and Blew (2009) suggested the use of information from past data sets as a prior distribution in a Bayesian DIF analysis. This paper suggests an extension of the method of Sinharay et al. (2009). The suggested extension is compared to the existing DIF detection methods in a realistic simulation study.

****

The Development of the de Morton Mobility Index (DEMMI) in an Older Acute Medical Population: Item Reduction using the Rasch Model (Part 1)

Natalie A. de Morton, Megan Davidson, and Jennifer L. Keating

Abstract

The DEMMI (de Morton Mobility Index) is a new and advanced instrument for measuring the mobility of all older adults across clinical settings. It overcomes practical and clinimetric limitations of existing mobility instruments. This study reports the process of item reduction using the Rasch model in the development of the DEMMI. Prior to this study, qualitative methods were employed to generate a pool of 51 items for potential inclusion in the DEMMI. The aim of this study was to reduce the item set to a unidimensional subset of items that ranged across the mobility spectrum from bed bound to high levels of independent mobility. Fifty-one physical performance mobility items were tested in a sample of older acute medical patients. A total of 215 mobility assessments were performed. Seventeen mobility items that spanned the mobility spectrum were selected for inclusion in the new instrument. The 17 item scale fitted the Rasch model. Items operated consistently across the mobility spectrum regardless of patient age, gender, cognition, primary language or time of administration during hospitalisation. Using the Rasch model, an interval level scoring system was developed with a score range of 0 to 100.

****

A Comparison of Confirmatory Factor Analysis and Multidimensional Rasch Models to Investigate the Dimensionality of Test-Taking Motivation

Christine E. DeMars

Abstract

Using a scale of test-taking motivation designed to have multiple factors, results are compared from a confirmatory factor analysis (CFA) using LISREL and a multidimensional Rasch partial credit model using ConQuest. Both types of analyses work with latent factors and allow the comparison of nested models. CFA models most typically model a linear relationship between observed and latent variables, while Rasch models specify a non-linear relationship between observed and latent variables. The CFA software provides many more measures of overall fit than ConQuest, which is focused more on the fit of individual items. Despite the conceptual differences in these techniques, the results were similar. The data fit a three-dimensional model better than the one-dimensional or two-dimensional models also hypothesized, although some misfit remained.

****

Measuring Alternative Learning Outcomes: Dispositions to Study in Higher Education

Maria Pampaka, Julian Williams, Graeme Hutcheson, Laura Black, Pauline Davis, Paul Hernandez-Martinez, and Geoff Wake

Abstract

In this paper we describe the validation of two scales constructed to measure pre-university students’ changing disposition (i) to enter Higher Education (HE) and (ii) to further study mathematically-demanding subjects. Items were selected drawing on interview data, and on a model of disposition as socially- as well as self-attributed. Rasch analyses showed that the two scales each produce robust one-dimensional measures on what we call a ‘strength of commitment to enter HE’ and ‘disposition to study mathematically-demanding subjects further’ respectively. However, the former scale was initially found to suffer psychometrically from a ceiling effect, which we ‘corrected’ by adding some harder items at a later data point; we then revised the scale according to our interpretation of subsequent results. We finally discuss the potential significance of the constructed measures of learning outcomes, as variables in monitoring or even explaining students’ progress into different subjects in HE.

****

 
