Journal of Applied Measurement

P.O. Box 1283

Maple Grove, MN 55311



Article abstracts for Volumes 1 to 7 are available in PDF format. Just click on the links below.

Abstracts for Volume 1, 2000

Abstracts for Volume 2, 2001

Abstracts for Volume 3, 2002

Abstracts for Volume 4, 2003

Abstracts for Volume 5, 2004

Abstracts for Volume 6, 2005

Abstracts for Volume 7, 2006

Article abstracts for Volumes 8 to 16 are available in HTML format. Just click on the links below.

Abstracts for Volume 8, 2007

Abstracts for Volume 9, 2008

Abstracts for Volume 10, 2009

Abstracts for Volume 11, 2010

Abstracts for Volume 12, 2011

Abstracts for Volume 13, 2012

Abstracts for Volume 14, 2013

Abstracts for Volume 15, 2014

Abstracts for Volume 16, 2015


Current Volume Article Abstracts


Vol. 17, No. 1, Spring 2016

Assessing the Validity of a Continuum-of-care Survey: A Rasch Measurement Approach

Michael Peabody, Kelly D. Bradley, and Melba Custer

Abstract

Satisfied patients are more likely to be compliant, have better outcomes, and are more likely to return to the same provider or institution for future care. The Satisfaction with a Continuum of Care survey (SCC) was designed to improve patient care using measures of patient satisfaction and to facilitate a cultural shift from a “silos-of-care” to a “continuum-of-care” mentality by fostering inter-departmental communication as patients moved between environments of care at a Midwestern rehabilitation hospital. This study provides a Rasch measurement framework for investigating issues related to survey reliability and validity. The results indicate that although certain aspects of the survey seem to function in a psychometrically sound manner, the questions are too easy to endorse and provide little information to help improve patient care. Suggestions for future revisions to this survey instrument are provided.

____________________

What You Don’t Know Can Hurt You: Missing Data and Partial Credit Model Estimates

Sarah L. Thomas, Karen M. Schmidt, Monica K. Erbacher, and Cindy S. Bergeman

Abstract

The authors investigated the effect of missing completely at random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Additional missing responses were introduced to the data, randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values, in addition to the existing missing data. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, MCAR data does damage the quality and precision of PCM estimates.

____________________

Rasch Measurement of Collaborative Problem Solving in an Online Environment

Susan-Marie E. Harding and Patrick E. Griffin

Abstract

This paper describes an approach to the assessment of human-to-human collaborative problem solving using a set of online interactive tasks completed by student dyads. Within each dyad, roles were designated A and B, and students selected their own roles. The question of whether role selection affected individual student performance measures is addressed. Process stream data were captured from 3402 students in six countries who explored the problem space by clicking, dragging the mouse, moving the cursor, and collaborating with their partner through a chat box window. The process stream data were explored to identify behavioural indicators that represented elements of a conceptual framework. These indicative behaviours were coded into a series of dichotomous items representing actions and chats performed by students. The frequency of occurrence was used as a proxy measure of item difficulty. Given a measure of item difficulty, student ability could then be estimated from the difficulty estimates of the range of items demonstrated by the student. The Rasch simple logistic model was used to review the indicators and identify those that were consistent with the assumptions of the model and invariant across national samples, language, curriculum, and age of the student. The data were analysed using one- and two-dimensional, one-parameter models. Rasch separation reliability, fit to the model, the distribution of students and items on the underpinning construct, estimates for each country, and the effect of role differences are reported. This study provides evidence that collaborative problem solving can be assessed in an online environment involving human-to-human interaction, using behavioural indicators shown to have a consistent relationship between the estimate of student ability and the probability of demonstrating the behaviour.

____________________

The Impact of Item Parameter Drift in Computer Adaptive Testing (CAT)

Nicole Risk

Abstract

This study examined numerous aspects of item parameter drift (IPD) and its impact on measurement in computer adaptive testing (CAT). A series of CAT simulations was conducted, varying the amount and magnitude of IPD as well as the size of the item pool. The effects of IPD on measurement precision, classification, and test efficiency were evaluated using a number of criteria, including bias, root mean square error (RMSE), absolute average difference (AAD), total percentages of misclassification, the number of false positives and false negatives, total test lengths, and item exposure rates. The results revealed negligible differences when comparing the IPD conditions to the baseline condition for all measures of precision, classification accuracy, and test efficiency. The most relevant finding indicates that the magnitude of drift has a larger impact on measurement precision than the number of items with drift.
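The precision criteria named in the abstract have standard definitions; the following is an illustrative sketch of those definitions (not the study's own code, which is not given here):

```python
import numpy as np

def bias(est, true):
    """Mean signed difference between estimated and true parameter values."""
    return float(np.mean(np.asarray(est) - np.asarray(true)))

def rmse(est, true):
    """Root mean square error of the estimates."""
    return float(np.sqrt(np.mean((np.asarray(est) - np.asarray(true)) ** 2)))

def aad(est, true):
    """Absolute average difference between estimates and true values."""
    return float(np.mean(np.abs(np.asarray(est) - np.asarray(true))))
```

For example, with estimates [1.0, 2.0] against true values [0.0, 0.0], bias and AAD are both 1.5 and RMSE is the square root of 2.5.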

____________________

Exploring the Utility of Logistic Mixed Modeling Approaches to Simultaneously Investigate Item and Testlet DIF on Testlet-based Data

Hirotaka Fukuhara and Insu Paek

Abstract

This study explored the utility of logistic mixed models for the analysis of differential item functioning (DIF) when item response data are testlet-based. Decomposition of DIF into item level and testlet level was introduced to separate possible sources of DIF: (1) an item, (2) a testlet, and (3) both the item and the testlet. A simulation study was conducted to investigate the performance of several logistic mixed models, as well as the Mantel-Haenszel method, under conditions in which item-related DIF and testlet-related DIF were present simultaneously. The results revealed that a new DIF model based on a logistic mixed model with random item effects and item covariates could capture both item-related and testlet-related DIF well under certain conditions.

____________________

What Are You Measuring? Dimensionality and Reliability Analysis of Ability and Speed in Medical School Didactic Examinations

James J. Thompson

Abstract

Summative didactic evaluation often involves multiple choice questions that are aggregated into exam scores, course scores, and cumulative grade point averages. To be valid, each of these levels should have some relationship to the topic tested (dimensionality) and be sufficiently reproducible between persons (reliability) to justify student ranking. Evaluation of dimensionality is difficult and is complicated by the classic observation that didactic performance involves a generalized component (g) in addition to subtest-specific factors. In this work, 183 students were analyzed over two academic years in 13 courses with 44 exams and 3352 questions for both accuracy and speed. Reliability at all levels was good (>0.95). Assessed by bifactor analysis, g effects dominated most levels, resulting in essential unidimensionality. Effect sizes on predicted accuracy and speed due to nesting in exams and courses were small. There was little relationship between person ability and person speed. Thus, the hierarchical grading system appears warranted because of its g-dependence.

____________________

Applying the Rasch Model to Measure Mobility of Women: A Comparative Analysis of Mobility of Informal Workers in Fisheries in Kerala, India

Nikhila Menon

Abstract

Mobility, or ‘freedom and ability to move’, is gendered in many cultural contexts. In this paper I analyse mobility associated with work from the perspective of Sen’s capability approach. This empirical paper uses the Rasch Rating Scale Model (RSM) to construct a measure of mobility of women for the first time in the development studies discourse. I construct a measure of mobility (latent trait) of women workers engaged in two types of informal work, namely peeling work and fish vending, in fisheries in the cultural context of India. The scale measure enables, first, testing the unidimensionality of my construct of women’s mobility and, second, analysing the domains of mobility of women workers. The comparative analysis of the scale of permissibility of mobility constructed using the RSM for the informal women workers shows that women face constraints on mobility in social and personal spaces in the socially advanced state of Kerala in India. Work mobility does not expand real freedoms; hence, work mobility can be termed a ‘bounded capability’, a capability ‘limited or bounded’ by social, cultural, and gender norms or a combination of these. Therefore, at the macro level, growth in informal employment in sectors like fisheries, which improve mobility of women through work mobility, does not necessarily expand capability sets by contributing to greater freedoms and transformational mobility. The paper makes a significant methodological contribution in that it uses an innovative method for the measurement of women’s mobility in the development studies discipline.

____________________

Vol. 17, No. 2, Summer 2016

Creating a Physical Activity Self-Report Form for Youth using Rasch Methodology

Christine DiStefano, Russell Pate, Kerry McIver, Marsha Dowda, Michael Beets, and Dale Murrie

Abstract

Measurement of youth’s physical activity levels is recommended to ensure that children are meeting recommended activity guidelines. This article describes the creation of an instrument to measure youth’s levels of physical activity, following a strong test validation perspective (Benson, 1998). The development process used a mixed-method (qualitative followed by quantitative) framework. First, focus groups were conducted, and their results informed item creation. Next, three alternative forms were created with different response formats to measure children’s frequency of participation in various physical activities and intensity of participation. Lastly, a sample of over 500 middle school children was obtained, and the three different response scales were investigated. The optimal scale measured physical activity using a three-point Likert frequency scale; intensity of activity participation did not strongly contribute to the measurement of children’s activity levels. The final form is thought to be acceptable for use with children in surveillance and large-group studies, as well as in smaller-sample applications.

____________________

Examining the Psychometric Quality of Multiple-Choice Assessment Items using Mokken Scale Analysis

Stefanie A. Wind

Abstract

The concept of invariant measurement is typically associated with Rasch measurement theory (Engelhard, 2013). Concerned with the appropriateness of the parametric transformation upon which the Rasch model is based, Mokken (1971) proposed a nonparametric procedure for evaluating the quality of social science measurement that is theoretically and empirically related to the Rasch model. Mokken’s nonparametric procedure can be used to evaluate the quality of dichotomous and polytomous items in terms of the requirements for invariant measurement. Despite these potential benefits, the use of Mokken scaling to examine the properties of multiple-choice (MC) items in education has not yet been fully explored. A nonparametric approach to evaluating MC items is promising in that it facilitates the evaluation of assessments in terms of invariant measurement without imposing potentially inappropriate transformations. Using Rasch-based indices of measurement quality as a frame of reference, data from an eighth-grade physical science assessment are used to illustrate and explore Mokken-based techniques for evaluating the quality of MC items. Implications for research and practice are discussed.

____________________

A Practitioner’s Instrument for Measuring Secondary Mathematics Teachers’ Beliefs Surrounding Learner-Centered Classroom Practice

Alyson E. Lischka and Mary Garner

Abstract

In this paper we present the development and validation of the Mathematics Teaching Pedagogical and Discourse Beliefs Instrument (MTPDBI), a 20-item partial-credit survey designed and analyzed using Rasch measurement theory. Items on the MTPDBI address beliefs about the nature of mathematics, teaching and learning mathematics, and classroom discourse practices. A Rasch partial credit model (Masters, 1982) was estimated from the pilot study data. Results show that item separation reliability is .96 and person separation reliability is .71. Other analyses indicate the instrument is a viable measure of secondary teachers’ beliefs about reform-oriented mathematics teaching and learning. This instrument is proposed as a useful measure of teacher beliefs for those working with pre-service and in-service teacher development.

____________________

Using the Rasch Model and k-Nearest Neighbors Algorithm for Response Classification

Jon-Paul Paolino

Abstract

In this paper we propose using the k-nearest neighbors (k-NN) algorithm (Cover and Hart, 1967) for classifying and predicting responses to dichotomous items. Using the percent correct statistic, we show how k-NN can be used with Rasch model parameter estimation methods such as joint maximum likelihood estimation (JMLE), conditional maximum likelihood estimation (CMLE), marginal maximum likelihood estimation (MMLE), and marginal Bayes modal estimation (MBME). We further suggest how the algorithm can be used to predict responses on future assessments. The empirical data set used to illustrate this procedure was the fraction subtraction data set from Tatsuoka (1984). Using R software, we show the accuracy and efficacy of k-NN for classifying responses.
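As an illustrative sketch of the general idea — not the paper's exact procedure, and using simulated data rather than the Tatsuoka set — a dichotomous response can be classified by majority vote over its nearest neighbours in the (ability, difficulty) plane:

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch model probability of a correct response for ability theta
    and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def knn_predict(train_X, train_y, query, k=3):
    """Classify a (theta, b) query point by majority vote of its k
    nearest neighbours (Euclidean distance) among labelled points."""
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    return int(np.bincount(nearest, minlength=2).argmax())

# Toy data: (ability, difficulty) pairs with 0/1 responses simulated
# from the Rasch model itself.
rng = np.random.default_rng(0)
theta = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([theta, b])
y = (rng.random(200) < rasch_prob(theta, b)).astype(int)

# Predict the response of an able person (theta = 1.5) to an easy
# item (b = -1.0); the model probability here is about 0.92.
pred = knn_predict(X, y, np.array([1.5, -1.0]), k=5)
```

The feature space and simulation design here are assumptions for illustration; the paper pairs k-NN with the Rasch estimation methods listed above.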

____________________

Exploring Aberrant Responses using Person Fit and Person Response Functions

A. Adrienne Walker, George Engelhard, Jr., Mari-Wells Hedgpeth, and Kenneth D. Royal

Abstract

Person fit statistics provide equivocal interpretations regarding aberrant responses. This study uses person response functions (PRF) to supplement the interpretation of person fit statistics. Sixty-three multiple-choice items were administered to a sample of persons (N=31) who used guessing strategies to answer them. After answering each item, participants indicated which guessing strategy they used. The data were analyzed with a Rasch (1960) model, where the item calibrations were anchored to values obtained when the items were appropriately administered. The participants showed poor model-data fit as expected. Further examination of person misfit using person response functions suggests that PRF can provide information about absolute person fit to a model, whereas fit statistics provide information about relative fit, given the other persons in the testing group. PRF can also provide information about where and how person responses misfit the model. This additional information can assist practitioners in using and interpreting individual scores appropriately.

____________________

Evaluation of the Bifactor Nominal Response Model Analysis of a Health Efficacy Measure

Zexuan Han and Kathleen Suzanne Johnson Preston

Abstract

The bifactor nominal response item response theory (IRT) model, proposed by Cai, Yang, and Hansen (2011), extends Bock’s (1972, 1997) unidimensional nominal response model to multidimensional IRT. This model has not been utilized in any published studies since its original development. In this study, the model was applied to data from a sample of college students (n = 799) to evaluate the psychometric properties of a health efficacy measure. The nominal response model has the unique capability of estimating the functioning of each individual response category, and higher response categories were found to function better in this study. Poor-functioning categories were identified and combined into their adjacent categories. Items with the revised response format showed improved functioning. The bifactor nominal response model is a useful tool for evaluating bifactor scales with ordered but non-equivalently functioning categories.

____________________

Measurement Properties of the Nordic Questionnaire for Psychological and Social Factors at Work: A Rasch Analysis

C. Røe, K. Myhre, G. H. Marchand, B. Lau, G. Leivseth, and E. Bautz-Holter

Abstract

The main aim of this study was to evaluate the measurement properties of the Nordic Questionnaire for Psychological and Social Factors at Work (QPS Nordic) and its domains of demand, control, and support. The Rasch analysis (RUMM 2030) was based on responses from 226 subjects with back pain who completed the QPS Nordic dimensions of demand, control, and social support (30 items) at one-year follow-up. The Rasch analysis revealed disordered thresholds in 25 of the 30 items. The domains of demand, control, and support fit the Rasch model when analyzed separately. The demand domain was well targeted, whereas patients with current neck and back pain had lower control and higher support than reflected by the questions. Two items revealed DIF by gender; otherwise, invariance to age, gender, occupation, and sick leave was documented. The demand, control, and support domains of QPS Nordic comprised unidimensional constructs with adequate measurement properties.

____________________

Ben Wright: A wisp of greatness: Brief photographic review of his life and times

Nikolaus Bezruczko

Abstract

With Ben Wright's death in October 2015, the Rasch measurement community lost its most enthusiastic and dedicated supporter. In honor of Ben's great contribution to Rasch measurement, Nick Bezruczko was invited to write a memorial article about Ben's life, education, and scholarship.

____________________

Vol. 17, No. 3, Fall 2016

Accounting for Local Dependence with the Rasch Model: The Paradox of Information Increase

David Andrich

Abstract

Test theories imply statistical local independence. Where local independence is violated, models of modern test theory that account for the violation have been proposed. One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation between two items in the dichotomous Rasch model, this paper derives three related implications. First, it formalises how the polytomous Rasch model for an item constituted by summing the scores of the dependent items absorbs the dependence into its threshold structure. Second, it shows that, as a consequence, the unit when the dependence is accounted for is not the same as if the items had no response dependence. Third, it explains the paradox, known but not explained in the literature, that the greater the dependence of the constituent items, the greater the apparent information in the constituted polytomous item, when it should provide less information.
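For reference, the derivation builds on the standard forms of the dichotomous and polytomous Rasch models (written here in common notation; these are textbook formulations, not equations reproduced from the paper):

```latex
% Dichotomous Rasch model for person n (ability \beta_n) and item i
% (difficulty \delta_i)
\Pr(X_{ni}=1) = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)}

% Polytomous Rasch model with thresholds \tau_1,\dots,\tau_m for an
% item with maximum score m (the sum over an empty index set is 0)
\Pr(X_{ni}=x) =
  \frac{\exp\!\bigl(x(\beta_n - \delta_i) - \sum_{k=1}^{x}\tau_k\bigr)}
       {\sum_{j=0}^{m}\exp\!\bigl(j(\beta_n - \delta_i) - \sum_{k=1}^{j}\tau_k\bigr)}
```

The paper's result concerns how response dependence between two dichotomous items is absorbed into the thresholds \(\tau_k\) of the summed polytomous item.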

____________________

Applying the Many-Facet Rasch Measurement Model to Explore Reviewer Ratings of Conference Proposals

Kelly D. Bradley, Michael R. Peabody, and Richard K. Mensah

Abstract

For academic conferences, submitted proposals are often judged against identified criteria, using a rating scale, by reviewers who have a shared interest and expertise in the area under consideration. Given the multiple and varied reviewers, an analysis of psychometric properties such as rater severity and consistency is important. However, many of the problems that plague the conference proposal selection process are the same issues that plague survey research: rater bias/severity, misuse of the rating scale, and the use of raw scores as measures. We propose the use of the many-facet Rasch measurement model (MFRM) to combat these shortcomings and improve the quality of the conference proposal selection process. A set of American Educational Research Association (AERA) Special Interest Group (SIG) proposals is used as an example. The results identify proposals that were accepted based on the mean of summed raw scores, but when the MFRM is applied to adjust for judge severity, the rank order of the proposals is substantially altered.

____________________

Sample Size and Probability Threshold Considerations with the Tailored Data Method

Adam E. Wyse

Abstract

This article discusses sample size and probability threshold considerations in the use of the tailored data method with the Rasch model. In the tailored data method, one performs an initial Rasch analysis and then reanalyzes the data after setting to missing those item responses that fall below a chosen probability threshold. A simple analytical formula is provided that can be used to check whether the application of the tailored data method with a chosen probability threshold will create situations in which the number of remaining item responses for the Rasch calibration will or will not meet minimum sample size requirements. The formula is illustrated using a real data example from a medical imaging licensure exam with several different probability thresholds. As the probability threshold was increased, more item responses were set to missing, and the parameter standard errors and item difficulty estimates also tended to increase. Some consideration should therefore be given to the chosen probability threshold and how it interacts with potential examinee sample sizes and the accuracy of parameter estimates when calibrating data with the tailored data method.
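The masking step of the tailored data method, as described in the abstract, can be sketched as follows (a minimal illustration with simulated data and hypothetical variable names; the article's analytical sample size formula is not reproduced here):

```python
import numpy as np

def tailor(responses, theta, b, threshold=0.1):
    """Set to missing (NaN) any item response whose Rasch-model
    probability of success falls below the chosen threshold.

    responses : (persons, items) 0/1 matrix
    theta     : person ability estimates from an initial calibration
    b         : item difficulty estimates from an initial calibration
    """
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    tailored = responses.astype(float)
    tailored[p < threshold] = np.nan
    return tailored

# Simulated example: 100 persons, 20 items spanning -2 to 2 logits.
rng = np.random.default_rng(1)
theta = rng.normal(size=100)
b = np.linspace(-2, 2, 20)
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
resp = (rng.random((100, 20)) < p).astype(int)

out = tailor(resp, theta, b, threshold=0.3)
remaining = np.sum(~np.isnan(out), axis=0)  # responses left per item
```

Raising the threshold can only mask more responses, which is why the remaining per-item counts must be checked against minimum calibration sample sizes.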

____________________

Development of an Upper Extremity Function Measurement Model

Ickpyo Hong, Annie N. Simpson, Chih-Ying Li, and Craig A. Velozo

Abstract

This study demonstrated the development of a measurement model for gross upper-extremity function (GUE). The dependent variable was the Rasch calibration of the 27 ICF-GUE test items. The predictors were object weight, lifting distance from the floor, carrying, and lifting. Multiple regression was used to investigate the contribution each independent variable makes to the model with 203 outpatients. Object weight and lifting distance were the only statistically and clinically significant independent variables in the model, accounting for 83% of the variance (p < 0.01). The model indicates that with each one-pound increase in object weight, item challenge increases by 0.16 logits (p < 0.00), and with each one-inch increase in distance lifted from the floor, item challenge increases by 0.02 logits (p < 0.02). The findings suggest that the majority of the variance of the measurement model for the ICF-GUE can be explained by object weight and distance lifted from the floor.
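The reported slopes imply a simple prediction rule for item difficulty; a sketch under the abstract's coefficients (the intercept is not reported, so it is a placeholder argument):

```python
def predicted_item_challenge(weight_lb, lift_distance_in, intercept=0.0):
    """Predicted ICF-GUE item difficulty in logits, using the slopes
    reported in the abstract: 0.16 logits per pound of object weight
    and 0.02 logits per inch of lifting distance from the floor. The
    intercept is not given in the abstract and defaults to a
    placeholder of 0."""
    return intercept + 0.16 * weight_lb + 0.02 * lift_distance_in

# e.g. a 10 lb object lifted 30 inches: 1.6 + 0.6 = 2.2 logits above
# the (unknown) baseline
delta = predicted_item_challenge(10, 30)
```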

____________________

Differential Item Functioning (DIF) and Subsequent Bias in Group Comparisons using a Composite Measurement Scale: A Simulation Study

Alexandra Rouquette, Jean-Benoit Hardouin, and Joël Coste

Abstract

Objective. To determine the conditions in which the estimation of a difference between groups for a construct evaluated using a composite measurement scale is biased if the presence of differential item functioning (DIF) is not taken into account. Methods. Datasets were generated using the Partial Credit Model to simulate 642 realistic scenarios. The effect of seven factors on the bias in the estimated difference between groups was evaluated using ANOVA: sample size, true difference between groups, number of items in the scale, proportion of items showing DIF, DIF size for these items, position of these items’ location parameters along the latent trait, and uniform/non-uniform DIF. Results. For uniform DIF, only the DIF size and the proportion of items showing DIF (and their interaction term) had meaningful effects. The effect of non-uniform DIF was negligible. Conclusion. The measurement bias resulting from DIF was quantified in various realistic conditions of composite measurement scale use.

____________________

The Self-assessment Practices of Hong Kong Secondary Students: Findings with a New Instrument

Zi Yan

Abstract

Self-assessment is a core skill that enables students to engage in self-regulated learning. The purpose of this study was to examine the psychometric properties of a Self-assessment Practice Scale and to depict the characteristics of self-assessment practices of Hong Kong secondary students using this newly developed instrument. A total of 6,125 students from 10 Hong Kong secondary schools completed the survey. Both Rasch and factor analyses revealed a two-dimensional scale structure (i.e., Self-directed Feedback Seeking and Self-reflection). The two subscales demonstrated acceptable psychometric properties, and suggestions for further improvement were proposed. The findings indicated that, in general, students were quite used to engaging in self-reflection based on available feedback, but they were less disposed to taking the initiative to seek feedback on their own performance. Key demographic variables, e.g., gender and year level, played important roles in students’ self-assessment practices. Girls had significantly higher self-assessment measures on both scales than did boys. Junior students had higher measures on both scales than did their senior counterparts. Implications and directions for future research are discussed.

____________________

The Measurement Properties of the Assessing Math Concepts’ Assessments of Primary Students’ Number Sense Skills

Christie Martin, Richard Lambert, Drew Polly, Chuang Wang, and David Pugalee

Abstract

The purpose of this study was to examine the measurement properties of the Assessing Math Concepts AMC Anywhere Hiding and Ten Frame assessments, formative assessments of primary students’ number sense skills. Each assessment has two parts, where Part 1 is intended to assess foundational skills for Part 2. Part 1 includes manipulatives, whereas Part 2 does not. Student data from 228 kindergarten through second-grade teachers, with a total of 3,666 students, were analyzed using Rasch scaling. Data analyses indicated that when the two assessments were examined separately, the intended order of item difficulty was clear. When the parts of both assessments were analyzed together, the items in Part 2 were not consistently more difficult than the items in Part 1. This suggests an alternative sequence of tasks, in which students may progress from working with a specific number with manipulatives to working without manipulatives, rather than working with a variety of numbers with manipulatives before moving on to assessments without manipulatives.

____________________

Rasch Analysis of the Malaysian Secondary School Student Leadership Inventory (M3SLI)

Mei-Teng Ling and Vincent Pang

Abstract

The importance of instilling leadership skills in students has always been a main subject of discussion in Malaysia. The Malaysian Secondary School Students’ Leadership Inventory (M3SLI) is an instrument that was pilot tested in 2013. The main purpose of this study was to examine and optimize the functioning of the rating scale categories in the M3SLI by investigating rating scale category counts, average and expected rating scale category measures, and step calibrations. Specifically, the study aimed to (1) identify whether the five-point rating scale was functioning as intended and (2) review the effect of a rating scale category revision on the psychometric characteristics of the M3SLI. The study was carried out on students aged 13 to 18 years (n = 2183), selected by stratified random sampling in 26 public schools in Sabah, Malaysia, with the results analysed using Winsteps. The study found that the rating scale of the Personality and Values constructs needed to be modified, while the scale for Leadership Skills was maintained. Future studies should investigate other psychometric properties, such as differential item functioning (DIF) based on demographic variables like gender, school location, and form, prior to the use of the instrument.

____________________

Vol. 17, No. 4, Winter 2016

Does Instruction Affect the Underlying Dimensionality of a Kinesiology Test?

Nikolaus Bezruczko, Eva Frank, and Kyle Perkins

Abstract

Does effective instruction, which changes students’ knowledge and possibly alters their cognitive functions, also affect the dimensionality of an achievement test? This question was examined by the parameterization of kinesiology test items (n = 42) with a Rasch dichotomous model, followed by an investigation of dimensionality in a pre- and post-test quasi-experimental study design. College students (n = 108) provided responses to kinesiology achievement test items. The stability of item difficulties, gender differences, and the interaction of item content categories with dimensionality were then examined. In addition, a PCA/t-test protocol was implemented to examine dimensionality threats from the item residuals. Internal construct validity was investigated by regressing item content components on calibrated item difficulties. Measurement model item residuals were also investigated with statistical decomposition methods. In general, the results showed significant student achievement gains between pre- and post-testing, and dimensionality disturbances were relatively minor. The amount of unexpected item “shift” in an un-equated measurement dimension between pre- and post-testing was less than ten percent of the total items and largely concentrated among several unrelated items. An unexpected finding was a residual cluster consisting of several items testing related technical content. Complicating interpretation, these items tended to appear near the end of the test, which implicates test position as a threat to measurement equivalence. In general, the results across several methods did not tend to identify common threats and instead pointed to multiple sources of threats with varying degrees of prominence. These results suggest that conventional approaches to measurement equivalence that emphasize expedient overall procedures, such as DIF, IRT, and factor analysis, are probably capturing isolated sources of variability. Their implementation probably improves measurement equivalence, but with substantial residual sources undetected.

____________________

Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (RUMM) Program in Health Outcome Measurement

Peter Hagell and Albert Westergren

Abstract

Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics and the effects of algebraic sample size adjustments. Simulated Rasch-model-fitting data for 25-item dichotomous scales, with sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N ≥ 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful in avoiding such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes of around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors, under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).

____________________

Simultaneous Ability and Difficulty Estimation Via the Linear Discriminant Function

Jon-Paul Paolino

Abstract

In this paper, parameter estimation of the dichotomous Rasch model (Rasch, 1960) using the linear discriminant function (Fisher, 1936) is presented. This is accomplished by considering the scored item responses to be distinct groups and using a design matrix that is identical to one used in logistic regression for joint maximum likelihood estimation. The real dataset that was examined was the fraction subtraction dataset from Tatsuoka (1984). Through simulation, parameter estimation accuracy using the linear discriminant function was compared to joint maximum likelihood estimation using logistic regression. Using the linear discriminant function, person ability estimates from perfect total scores and total response scores of zero were estimable without using an ad hoc procedure, which is a well-known shortcoming of logistic regression based joint maximum likelihood estimation. Finally, computation of a closed-form solution for parameter estimation using the linear discriminant function is discussed.
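The logistic-regression formulation of joint maximum likelihood (JML) that serves as the paper's comparison baseline can be sketched as follows. This is not the paper's linear discriminant method; it is a minimal illustration, on hypothetical data, of the design matrix the abstract refers to: responses stacked in long format, person dummies carrying abilities, and item dummies carrying difficulties (item 1 fixed at 0 for identification). Note that persons with perfect or zero total scores must be dropped first, which is exactly the shortcoming the paper's method avoids.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 0/1 response matrix (persons x items); values are illustrative.
rng = np.random.default_rng(2)
n_persons, n_items = 30, 10
theta_true = rng.normal(size=n_persons)
b_true = np.linspace(-1.5, 1.5, n_items)
prob = 1.0 / (1.0 + np.exp(-(theta_true[:, None] - b_true[None, :])))
resp = (rng.random(prob.shape) < prob).astype(int)

# JML via logistic regression cannot estimate persons with all-0 or all-1
# total scores (their estimates diverge), so those rows are removed.
keep = (resp.sum(axis=1) > 0) & (resp.sum(axis=1) < n_items)
resp = resp[keep]
n_kept = resp.shape[0]

# Long format: one row per person-item response. Person dummies carry theta;
# item dummies (items 2..n, coded -1) carry difficulties, with b_1 fixed at 0.
rows, y = [], []
for i in range(n_kept):
    for j in range(n_items):
        person = np.zeros(n_kept)
        person[i] = 1.0
        item = np.zeros(n_items - 1)
        if j > 0:
            item[j - 1] = -1.0
        rows.append(np.concatenate([person, item]))
        y.append(resp[i, j])
X, y = np.array(rows), np.array(y)

# A very large C makes the default L2 penalty negligible, approximating
# unpenalized JML estimation.
clf = LogisticRegression(C=1e6, fit_intercept=False, max_iter=5000).fit(X, y)
theta_hat = clf.coef_[0][:n_kept]   # person ability estimates
b_hat = clf.coef_[0][n_kept:]       # item difficulties relative to item 1
```

The paper's contribution is replacing this iterative fit with linear-discriminant-based estimation, which handles the extreme total scores dropped above and admits a closed-form solution.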

____________________

Examining Class Differences in Method Effects Related to Negative Wording: An Example using Rasch Mixture Modeling

Grant B. Morgan, Christine DiStefano, and Robert W. Motl

Abstract

This study presents a mixture Rasch-based approach to investigating method effects associated with negatively worded items to illustrate how responses to this method effect vary depending on group characteristics. Using college students’ responses on the Rosenberg Self-Esteem scale (Rosenberg, 1989), four latent classes were identified using six personality measures associated with the presence of this method effect. In addition, Rasch-based parameter estimates suggested latent classes differed in their use of the scale, showing that the method effect associated with negatively worded items may be more prominent for subjects possessing selected personality traits than for others. The mixture model approach to investigating method effects provides a way to address systematic methodological variation that is left unaccounted for when a heterogeneous population is analyzed as one group.

____________________

Assessment of Acute Trauma Exposure Response for FIRE-EMS Personnel

Melissa C. Hofmann

Abstract

Purpose. The purpose of this study was to develop an instrument that measures response to acute trauma exposure for firefighter and emergency medical service (EMS) personnel. The Acute Trauma Exposure Response Scale (ATERS) was intended to assess firefighter and EMS personnel response to acute trauma exposure from analytical, emotional, and physical perspectives. Methods. Data were analyzed on 97 firefighter and EMS personnel employed by a fire department in a midsized city in a western state. Principal component analysis of residuals (PCAR) using Winsteps software was employed to discover which variables in the set formed logical subsets independent of one another, and analyses included item analyses and assessment of internal consistency reliability (Cronbach’s alpha). Rasch analysis included examination of dimensionality, person and item reliability, scale use and function, and construct validity including person-item fit statistics. Results. Principal component analysis of residuals revealed three primary scales, which were termed Emotional Psyche, Coping Ability, and Support Systems. Rasch analyses showed the ATERS performance to be acceptable as a new pilot measure with three distinct scales, with reliability of person separation of .81 for Emotional Psyche, .66 for Coping Ability, and .63 for Support Systems (Nunnally and Bernstein, 1994; Carmines and Zeller, 1979; DeVellis, 2012). The Rasch item reliability was .96 for Emotional Psyche, .95 for Coping Ability, and .97 for Support Systems. Response scale use and function was appropriate for each subscale. Validity was supported through PCA by evidence of good internal consistency. High item correlations indicated the items for each subscale were measuring a single construct. Likewise, Rasch analyses provided evidence of validity through an even spread of person ability to item difficulty for each of the three constructs. Good item fit provided evidence of construct-relevant variance, and the absence of gaps along the unidimensional continuum indicated each construct was represented adequately. Conclusion. The ATERS performs well as a measure of acute trauma exposure response for three primary constructs, Emotional Psyche, Coping Ability, and Support Systems, with good Rasch person internal consistency reliability and factor structure. Items were deleted for each scale following PCA and Rasch analyses due to misfit and low loadings. Further research is recommended to optimally represent each construct with regard to person-item fit. Fire departments may utilize results of this study to assess current program effectiveness. Through evaluation, departments may incorporate programs and resources that are more effective at reducing stress associated with acute trauma, thereby increasing employees’ overall job satisfaction and performance.

____________________

A Rasch Rating Scale Analysis of the Presence of Nursing Scale-RN

Carol T. Kostovich, Beyza Aksu Dünya, Lee A. Schmidt, and Eileen G. Collins

Abstract

The phenomenon of nursing presence encompasses the emotional connection between nurse and patient, and technical skills performed by the nurse. The Presence of Nursing Scale-RN version (PONS-RN) was developed to measure nurses’ perceptions of their ability to be present to their patients. This study summarizes the process of re-evaluation of the psychometric properties of the PONS-RN instrument. A sample of 76 registered nurses providing direct patient care responded to the 31-item questionnaire. The Rasch rating scale model was used for assessing construct validity of PONS-RN data. A principal component analysis (PCA) of residuals supported the appropriateness of the subscales defined by a 2-dimensional structure. The results of item and person fit analysis, rating scale functioning analysis, and reliability analysis demonstrated that the 31-item Presence of Nursing Scale-RN instrument yielded measures with high validity and reliability as two sub-scales.

____________________

Assessing the Psychometric Properties of Alternative Items for Certification

Mary Anne Krogh and Timothy Muckle

Abstract

Alternative items were added as scored items to the National Certification Examination for Nurse Anesthetists (NCE) in 2010. A common concern related to the new items has been their measurement attributes. This study was undertaken to evaluate the psychometric impact of adding these items to the examination. Candidates had a significantly higher ability estimate on alternative items than on multiple choice questions, and 6.7% of test candidates performed significantly differently on alternative item formats. Ability estimates from the alternative items and the multiple choice questions correlated at r = .58. The alternative items took significantly longer to answer than standard multiple choice questions and discriminated to a higher degree than MCQs. The alternative items exhibited unidimensionality to the same degree as MCQs, and the BIC confirmed the Rasch model as acceptable for scoring. The new item types were found to have acceptable attributes for inclusion in the certification program.

____________________

Likert is Pronounced “LICK-urt” not “LIE-kurt” and the Data are Ordinal not Interval

Patty Kero and Daniel Lee

Abstract

Likert-type scales are popular in educational research and are often analyzed using parametric tests. Implied in this kind of study is a general assumption that these data are interval in nature. The authors contend that this is an incorrect supposition, as Likert-type data are actually ordinal; hence any analysis should be restricted to non-parametric investigations. Such confusion is understandable, as Likert-type responses are assigned numbers signifying varying degrees of agreement with respect to behaviors or attitudes, giving rise to a certain quantitative air to these data. Such responses are qualitative, with meaning limited specifically to the choices available to the respondent; no more and no less. The mode, rather than the mean, is the preferable measure of central tendency, and the standard deviation is likewise inappropriate for ordinal data. Non-parametric analysis ensures future researchers do not mistakenly infer their results are replicable beyond that of their sample. Regrettably, Likert scales simply cannot meet this standard of reliability.
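The ordinal treatment the authors recommend can be sketched briefly. The snippet below uses hypothetical Likert responses (the data and group labels are illustrative only): it reports the mode rather than the mean, and compares two groups with the Mann-Whitney U test, which relies only on rank order and so makes no interval-scale assumption about the response codes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical 5-point Likert responses from two groups (ordinal codes 1..5).
group_a = rng.choice([1, 2, 3, 4, 5], size=40, p=[0.05, 0.10, 0.20, 0.35, 0.30])
group_b = rng.choice([1, 2, 3, 4, 5], size=40, p=[0.20, 0.30, 0.25, 0.15, 0.10])

# Mode as the measure of central tendency for ordinal data (not the mean).
mode_a = int(np.bincount(group_a).argmax())

# Mann-Whitney U compares the groups using only rank order, so it does not
# treat the distances between response categories as meaningful.
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(mode_a, u_stat, p_value)
```

Replacing a t-test with a rank-based test like this is the concrete analytic consequence of treating Likert responses as ordinal rather than interval.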

____________________
