Journal of Applied Measurement
P.O. Box 1283
Maple Grove, MN 55311
Volume 14, 2013 Article Abstracts
Vol. 14, No. 1 Spring 2013
A Bootstrap Approach to Evaluating Person and Item Fit to the Rasch Model
Edward W. Wolfe
Abstract
Historically, rule-of-thumb critical values have been employed for interpreting fit statistics that depict anomalous
person and item response patterns in applications of the Rasch model. Unfortunately, prior research has shown
that these values are not appropriate in many contexts. This article introduces a bootstrap procedure for identifying
reasonable critical values for Rasch fit statistics and compares the results of that procedure to applications
of rule-of-thumb critical values for three example datasets. The results indicate that rule-of-thumb values may
over- or under-identify the number of misfitting items or persons.
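To make the bootstrap procedure concrete, the sketch below simulates model-conforming datasets from calibrated parameters and takes empirical quantiles of an item fit statistic as critical values. It is a minimal illustration in Python, assuming a dichotomous Rasch model and outfit mean squares; the parameter values and function names are illustrative rather than taken from the article, and a fuller implementation would re-estimate person and item parameters for each replicate.

import numpy as np

rng = np.random.default_rng(0)

def rasch_prob(theta, b):
    # P(X = 1) under the dichotomous Rasch model; persons in rows, items in columns.
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def item_outfit(x, p):
    # Outfit mean square per item: average squared standardized residual.
    z2 = (x - p) ** 2 / (p * (1.0 - p))
    return z2.mean(axis=0)

# Calibrated parameters from the observed data (placeholder values here).
theta_hat = rng.normal(0.0, 1.0, size=500)   # person measures
b_hat = np.linspace(-2.0, 2.0, 20)           # item difficulties
p = rasch_prob(theta_hat, b_hat)

# Simulate B model-conforming datasets and collect the fit statistics.
B = 1000
boot = np.empty((B, b_hat.size))
for r in range(B):
    x = rng.binomial(1, p)
    boot[r] = item_outfit(x, p)

# Empirical quantiles serve as dataset-specific critical values,
# replacing rule-of-thumb cutoffs such as 0.7 and 1.3.
crit_lo, crit_hi = np.percentile(boot, [2.5, 97.5], axis=0)
print(crit_lo.round(2), crit_hi.round(2))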
****
Using the Rasch Measurement Model to Design a Report Writing Assessment Instrument
Wayne R. Carlson
Abstract
This paper describes how the Rasch measurement model was used to develop an assessment instrument designed
to measure student ability to write law enforcement incident and investigative reports. The ability to
write reports is a requirement of all law enforcement recruits in the state of Michigan and is a part of the state’s
mandatory basic training curriculum, which is promulgated by the Michigan Commission on Law Enforcement
Standards (MCOLES). Recently, MCOLES conducted research to modernize its training and testing in the area
of report writing. A structured validation process was used, which included: a) an examination of the job tasks
of a patrol officer, b) input from content experts, c) a review of the professional research, and d) the creation of
an instrument to measure student competency. The Rasch model addressed several measurement principles that
were central to construct validity, which were particularly useful for assessing student performances. Based on
the results of the report writing validation project, the state established a legitimate connectivity between the
report writing standard and the essential job functions of a patrol officer in Michigan. The project also produced
an authentic instrument for measuring minimum levels of report writing competency, which generated results
that are valid for inferences of student ability. Ultimately, the state of Michigan must ensure the safety of its
citizens by licensing only those patrol officers who possess a minimum level of core competency. Maintaining
the validity and reliability of both the training and testing processes can ensure that the system for producing
such candidates functions as intended.
****
Using Multidimensional Rasch to Enhance Measurement Precision:
Initial Results from Simulation and Empirical Studies
Magdalena Mo Ching Mok and Kun Xu
Abstract
This study aimed to explore the effect on measurement precision of multidimensional, as compared with unidimensional,
Rasch measurement for constructing measures from multidimensional Likert-type scales. Many
educational and psychological tests are multidimensional, but common practice is to ignore correlations among the latent traits in these multidimensional scales in the measurement process. This practice may have serious validity and reliability implications. This study made use of both empirical data from 208,083 students and simulated data generated under 24 systematic combinations, each replicated 1,000 times, of three conditions (sample size, degree of dimensionality, and scale length) to compare unidimensional and multidimensional approaches and to identify the effects of sample size, dimensionality, and scale length on measurement precision. Results showed that the multidimensional Rasch approach yielded more precise estimates than did the unidimensional approach when the two dimensions were strongly correlated. The effect was more pronounced for long scales.
****
Using the Dichotomous Rasch Model to Analyze Polytomous Items
Qingping He and Chris Wheadon
Abstract
One of the most important applications of the Rasch measurement models in educational assessment is the
equating of tests. An important feature of attainment tests is the use of both dichotomous and polytomous items.
The partial credit model (PCM) developed by Masters (1982) represents an extension of the dichotomous Rasch
model for analysing polytomous item data. The dichotomous Rasch model has been used primarily to analyse
dichotomous item data. Whilst the partial credit model can provide detailed information on the performance of
individual score categories of polytomous items, it is mathematically more complex to use than the dichotomous
Rasch model and can, under certain circumstances, present difficulties in interpreting item measures and
in practical applications. This study explores the potential of using the dichotomous Rasch model to analyse
polytomous items and equate tests. Results obtained from a simulation study and from analysing the data of a
science achievement test indicate that the partial credit model and the dichotomous Rasch model produce similar
item and person measures and equivalent cut scores on different test forms.
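For reference, the two models being compared can be written side by side. For person n and item i, the dichotomous Rasch model and Masters's (1982) partial credit model for an item with score categories 0, ..., M_i are, in LaTeX notation,

$$P(X_{ni}=1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}, \qquad P(X_{ni}=k) = \frac{\exp \sum_{j=0}^{k} (\theta_n - \delta_{ij})}{\sum_{m=0}^{M_i} \exp \sum_{j=0}^{m} (\theta_n - \delta_{ij})},$$

where \theta_n is the person measure, \delta_i the item difficulty, \delta_{ij} the j-th step difficulty, and the j = 0 term is defined to be zero. The PCM reduces to the dichotomous model when M_i = 1.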
****
With Hiccups and Bumps: The Development of a Rasch-based Instrument
to Measure Elementary Students’ Understanding of the Nature of Science
Shelagh M. Peoples, Laura M. O’Dwyer, Katherine A. Shields, and Yang Wang
Abstract
This research describes the development process, psychometric analyses, and partial validation study of a theoretically-grounded Rasch-based instrument, the Nature of Science Instrument-Elementary (NOSI-E). The NOSI-E was
designed to measure elementary students’ understanding of the Nature of Science (NOS). Evidence is provided for
three of the six validity aspects (content, substantive and generalizability) needed to support the construct validity
of the NOSI-E. A future article will examine the structural and external validity aspects. Rasch modeling proved
especially productive in scale improvement efforts. The instrument, designed for large-scale assessment use, is
conceptualized using five construct domains. Data from 741 elementary students were used to pilot the Rasch
scale, with continuous improvements made over three successive administrations. The psychometric properties
of the NOSI-E instrument are consistent with the basic assumptions of Rasch measurement, namely that the
items are well-fitting and invariant. Items from each of the five domains (Empirical, Theory-Laden, Certainty,
Inventive, and Socially and Culturally Embedded) are spread along the scale’s continuum and appear to overlap
well. Most importantly, the scale seems appropriately calibrated and responsive for elementary school-aged
children, the target age group. As a result, the NOSI-E should prove beneficial for science education research.
As the United States’ science education reform efforts move toward students’ learning science through engaging
in authentic scientific practices (NRC, 2011), it will be important to assess whether this new approach to teaching
science is effective. The NOSI-E can be used as one measure of whether this reform effort has an impact.
****
Application of Single-level and Multi-level Rasch Models using the lme4 Package
Iasonas Lamprianou
Abstract
The aim of the article is to illustrate how researchers may use the lme4 package to run multilevel Rasch models. The lme4 package is popular open-source software, frequently used by researchers around the world to fit generalized mixed-effects models with crossed or partially crossed random effects. The article starts with a short discussion of the reasons why a researcher might sometimes be motivated to use a multilevel Rasch model and presents a practical example using empirical data. The main features of the lme4 package are presented, and finally, the paper presents information about other open-source software that could alternatively be used to fit multilevel Rasch models.
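As background for readers, the Rasch model is a generalized linear (mixed) model with a logit link, which is the formulation lme4 exploits; in lme4's R syntax this is typically written along the lines of glmer(resp ~ -1 + item + (1 | person), family = binomial). The minimal sketch below uses Python's statsmodels, rather than R, to fit the analogous fixed-effects (joint maximum likelihood) formulation on simulated data; the data and variable names are illustrative, not from the article, and a true multilevel analysis would add further random effects (e.g., for schools).

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulate dichotomous responses from a Rasch model (illustrative only).
n_persons, n_items = 200, 10
theta = rng.normal(0.0, 1.0, n_persons)      # person abilities
b = np.linspace(-1.5, 1.5, n_items)          # item difficulties
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
x = rng.binomial(1, p)

# Long format, one row per person-item response, as lme4 also expects.
df = pd.DataFrame({
    "resp": x.ravel(),
    "person": np.repeat(np.arange(n_persons), n_items),
    "item": np.tile(np.arange(n_items), n_persons),
})

# Drop persons with perfect or zero scores; their estimates diverge.
totals = df.groupby("person")["resp"].transform("sum")
df = df[(totals > 0) & (totals < n_items)]

# Logistic model with item and person dummies; item coefficients
# estimate easiness up to the scale's arbitrary origin, so negated
# coefficients are difficulties. (A GLMM, as fitted by glmer, would
# instead treat persons as random effects.)
fit = smf.glm("resp ~ 0 + C(item) + C(person)", data=df,
              family=sm.families.Binomial()).fit()
print((-fit.params.filter(like="item")).round(2))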
****
Rasch Modeling to Assess Albanian and South African Learners’ Preferences
for Real-life Situations to be Used in Mathematics: A Pilot Study
Suela Kacerja, Cyril Julie, and Said Hadjerrouit
Abstract
This paper reports on an investigation of the real-life situations that students in grades 8 and 9 in South Africa and Albania prefer to use in Mathematics. Rasch modeling techniques are used to assess the functioning of the instrument, which measures the order of preference that learners from the two countries have for contextual situations. For both cohorts, the data fit the Rasch model. The differential item functioning (DIF) analysis identified three items operating differentially for the two cohorts. Explanations for these differences are provided in terms of differences in the experiences learners in the two countries have had with some of the contextual situations. Implications for the interpretation of international comparative tests are offered, as are possibilities for the cross-country development of curriculum materials related to contexts that learners prefer to use in Mathematics.
****
Vol. 14, No. 2 Summer 2013
Adaptive Testing for Psychological Assessment: How Many Items Are Enough to Run an Adaptive Testing Algorithm?
Michaela M. Wagner-Menghin and Geoff N. Masters
Abstract
Although the principles of adaptive testing were established in the psychometric literature many years ago (e.g., Weiss, 1977), and the practice of adaptive testing is established in educational assessment, it is not yet widespread in psychological assessment. One obstacle to adaptive psychological testing is a lack of clarity about the number of items necessary to run an adaptive algorithm. The study explores the relationship between item bank size, test
length and measurement precision. Simulated adaptive test runs (allowing a maximum of 30 items per person)
out of an item bank with 10 items per ability level (covering .5 logits, 150 items total) yield a standard error
of measurement (SEM) of .47 (.39) after an average of 20 (29) items for 85-93% (64-82%) of the simulated
rectangular sample. Expanding the bank to 20 items per level (300 items total) did not improve the algorithm’s
performance significantly. With a small item bank (5 items per ability level, 75 items total) it is possible to reach the same SEM as with a conventional test but with fewer items, or a better SEM with the same number of items.
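To connect item counts with the reported SEM values, note that under the Rasch model the standard error of a person measure is the inverse square root of the test information summed over the administered items; in LaTeX notation,

$$\mathrm{SEM}(\hat{\theta}) = \Bigl( \sum_i P_i(\hat{\theta})\,[1 - P_i(\hat{\theta})] \Bigr)^{-1/2}.$$

An SEM of .47 therefore corresponds to test information of about 1/.47^2 ≈ 4.5; since a dichotomous Rasch item contributes at most .25 information (at P = .5), roughly 18 perfectly targeted items are needed, consistent with the average of 20 adaptive items reported above.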
****
DIF Cancellation in the Rasch Model
Adam E. Wyse
Abstract
Differential item functioning (DIF) cancellation occurs when the cumulative effect of an item or set of items
exhibiting DIF against one subgroup cancels with other items that exhibit DIF against the comparison group
and hence results in non-existent DIF at the test level. This paper investigates DIF cancellation in the context
of Rasch measurement. It is shown that this phenomenon is not a property of the Rasch model, but rather, a
function of the manner in which item parameters are estimated and the way that DIF impacts these estimates.
The conditions under which DIF cancellation would exist when using the Rasch model are suggested and a
proof is provided to support this suggestion. Empirical examples are provided to refute prior suggestions that
DIF cancellation always exists if the Rasch model is used.
****
Multidimensional Diagnostic Perspective on Academic Achievement Goal Orientation Structure, Using the Rasch Measurement Models
Daeryong Seo, Husein Taherbhai, and Insu Paek
Abstract
This study is designed to investigate a multidimensional structure of academic achievement goal orientations
from a diagnostic perspective, using the Rasch measurement models. A data set of Korean students who responded
to the Patterns of Adaptive Learning Survey (PALS) was analyzed. Both consecutive unidimensional
and multidimensional Rasch measurement models were applied for comparative purposes. Each goal orientation
dimension (i.e., the attitude) was standardized and then classified into three categorical levels, i.e., low, middle
and high. These categorizations of goal dimensions were used to examine the role of students' performance-approach goals on mathematics achievement in relation to the other achievement goals. Results indicate that
the multidimensional partial credit model was the best model with respect to the fit of the data to the models.
Findings of the current study also demonstrate that practitioners who need specific feedback for instruction and/
or intervention can benefit from the multidimensional approach.
****
An Extension of a Bayesian Approach to Detect Differential Item Functioning
Sandip Sinharay
Abstract
The application of the existing test statistics to determine differential item functioning (DIF) requires large
samples, but test administrators often face the challenge of detecting DIF with small samples. One advantage
of a Bayesian approach over a frequentist approach is that the former can incorporate, in the form of a prior
distribution, existing information on the inference problem at hand. Sinharay, Dorans, Grant, and Blew (2009)
suggested the use of information from past data sets as a prior distribution in a Bayesian DIF analysis. This
paper suggests an extension of the method of Sinharay et al. (2009). The suggested extension is compared to
the existing DIF detection methods in a realistic simulation study.
****
The Development of the de Morton Mobility Index (DEMMI) in an Older Acute Medical Population: Item Reduction using the Rasch Model (Part 1)
Natalie A. de Morton, Megan Davidson, and Jennifer L. Keating
Abstract
The DEMMI (de Morton Mobility Index) is a new and advanced instrument for measuring the mobility of all older
adults across clinical settings. It overcomes practical and clinimetric limitations of existing mobility instruments.
This study reports the process of item reduction using the Rasch model in the development of the DEMMI. Prior
to this study, qualitative methods were employed to generate a pool of 51 items for potential inclusion in the
DEMMI. The aim of this study was to reduce the item set to a unidimensional subset of items that ranged across
the mobility spectrum from bed bound to high levels of independent mobility. Fifty-one physical performance
mobility items were tested in a sample of older acute medical patients. A total of 215 mobility assessments were
performed. Seventeen mobility items that spanned the mobility spectrum were selected for inclusion in the new
instrument. The 17-item scale fitted the Rasch model. Items operated consistently across the mobility spectrum
regardless of patient age, gender, cognition, primary language or time of administration during hospitalisation.
Using the Rasch model, an interval level scoring system was developed with a score range of 0 to 100.
****
A Comparison of Confirmatory Factor Analysis and Multidimensional Rasch Models to Investigate the Dimensionality of Test-Taking Motivation
Christine E. DeMars
Abstract
Using a scale of test-taking motivation designed to have multiple factors, results are compared from a confirmatory
factor analysis (CFA) using LISREL and a multidimensional Rasch partial credit model using ConQuest. Both
types of analyses work with latent factors and allow the comparison of nested models. CFA most typically models a linear relationship between observed and latent variables, while Rasch models specify a non-linear one. The CFA software provides many more measures of overall
fit than ConQuest, which is focused more on the fit of individual items. Despite the conceptual differences in
these techniques, the results were similar. The data fit a three-dimensional model better than the one-dimensional
or two-dimensional models also hypothesized, although some misfit remained.
****
Measuring Alternative Learning Outcomes: Dispositions to Study in Higher Education
Maria Pampaka, Julian Williams, Graeme Hutcheson, Laura Black, Pauline Davis, Paul Hernandez-Martinez, and Geoff Wake
Abstract
In this paper we describe the validation of two scales constructed to measure pre-university students’ changing
disposition (i) to enter Higher Education (HE) and (ii) to further study mathematically-demanding subjects. Items
were selected drawing on interview data, and on a model of disposition as socially- as well as self- attributed.
Rasch analyses showed that the two scales each produce robust one-dimensional measures of what we call 'strength of commitment to enter HE' and 'disposition to study mathematically-demanding subjects further', respectively. However, the former scale was initially found to suffer psychometrically from a ceiling effect,
which we ‘corrected’ by adding some harder items at a later data point, and revised the scale according to our
interpretation of subsequent results. We finally discuss the potential significance of the constructed measures of
learning outcomes, as variables in monitoring or even explaining students’ progress into different subjects in HE.
****
Vol. 14, No. 3 Fall 2013
The Development of the de Morton Mobility Index (DEMMI) in an
Independent Sample of Older Acute Medical Patients: Refinement and Validation using the Rasch Model (Part 2)
Natalie A. de Morton, Megan Davidson, and Jennifer L. Keating
Abstract
This study describes the refinement and validation of the 17-item DEMMI in an independent sample of older acute
medical patients. Instrument refinement was based on Rasch analysis and input from clinicians and researchers.
The refined DEMMI was tested on 106 older general medical patients and a total of 312 mobility assessments
were conducted. Based on the results of this study, a further two items were removed and the 15-item DEMMI was
adopted. The Rasch measurement properties of the DEMMI were consistent with estimates obtained from the
instrument development sample. No differential item functioning was identified and an interval level scoring
system was established. The DEMMI is the first mobility instrument for older people to be developed, refined
and validated using the Rasch model. This study confirms that the DEMMI provides clinicians and researchers
with a unidimensional instrument for measuring and monitoring changes in mobility of hospitalised older acute
medical patients.
****
Rasch Modeling of Accuracy and Confidence Measures from Cognitive Tests
Insu Paek, Jihyun Lee, Lazar Stankov, and Mark Wilson
Abstract
IRT models have not been rigorously applied in studies of the relationship between test-takers' confidence and accuracy. This study applied Rasch measurement models to investigate the relationship between test-takers' confidence and accuracy on English proficiency tests, proposing potentially useful measures of under- or overconfidence. The Rasch approach provided the scaffolding to formulate indices that can assess the
discrepancy between confidence and accuracy at the item or total test level, as well as at particular ability levels
locally. In addition, a “disattenuated” measure of association between accuracy and confidence, which takes
measurement error into account, was obtained through a multidimensional Rasch modeling of the two constructs
where the latent variance-covariance structure is directly estimated from the data. The results indicate that the
participants tend to show overconfidence bias in their own cognitive abilities.
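For readers unfamiliar with disattenuation, the classical correction divides the observed correlation by the square root of the product of the two reliabilities; in LaTeX notation,

$$r_{T_x T_y} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}},$$

where r_{xy} is the observed accuracy-confidence correlation and r_{xx}, r_{yy} are the score reliabilities. The multidimensional Rasch approach described above reaches the same goal more directly by estimating the latent variance-covariance matrix of the two constructs within a single model.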
****
Baselines for the Pan-Canadian Science Curriculum Framework
Xiufeng Liu
Abstract
Using a Canadian student achievement assessment database, the Science Achievement Indicators Program
(SAIP), and employing the Rasch partial credit measurement model, this study estimated the difficulties of
items corresponding to the learning outcomes in the Pan-Canadian science curriculum framework and the latent
abilities of students of grades 7, 8, 10, 11, 12 and OAC (Ontario Academic Course). The above estimates serve
as baselines for validating the Pan-Canadian science curriculum framework in terms of the learning progression
of learning outcomes and expected mastery of learning outcomes by grade. It was found that there was no statistically significant progression in learning outcomes from grades 4-6 to grades 7-9, or from grades 7-9 to grades 10-12, and that the curriculum framework sets mastery expectations about two grades higher than students' potential abilities. In light of these findings, this paper discusses theoretical issues related to deciding the progression of learning outcomes and setting expectations of student mastery of learning outcomes, and highlights the importance of using national assessment data to establish baselines for these purposes. This paper concludes with
recommendations for further validating the Pan-Canadian science curriculum frameworks.
****
An Experimental Study Using Rasch Analysis to Compare Absolute Magnitude Estimation and Categorical Rating Scaling as Applied in Survey Research
Kristin L. K. Koskey, Toni A. Sondergeld, Svetlana A. Beltyukova, and Christine M. Fox
Abstract
Limited research has applied a measurement model to compare the rating scale functioning of categorical rating
scaling (CRS) and absolute magnitude estimation scaling (MES) when rating subjective stimuli. We used an
experimental design and applied the Rasch model to the survey data, with each respondent rating items using
MES and one of four commonly used agreement-disagreement rating scales. The results indicated that the CRS
and MES data were comparable in person and item separation and reliability when the respondents’ scales were
known. MES had lower standard errors for persons and items; however, MES had disordered step calibrations. Finally, the respondents reported preferring CRS to MES.
****
Developing Two Instruments to Measure Attitudes of Vietnamese Parents and Students toward Schooling
Thi Kim Cuc Nguyen and Patrick Griffin
Abstract
The attitudes of parents and students towards schooling are often considered to be important factors associated
with students’ educational outcomes. This article presents the process of constructing and calibrating two scales
to measure the attitudes of students and parents in Vietnam, and then linking these two scales to compare the
two groups. A set of items that covered both development and opportunity aspects of education was designed.
After the items were trialled, a final version of 13 items was compiled. The two scales yielded scores that were
shown to have logical, face, content and construct validity.
****
The Tendency of Individuals to Respond to High-Stakes Tests in Idiosyncratic Ways
Iasonas Lamprianou
Abstract
It has been frequently suggested that personal characteristics (e.g., language deficiencies, atypical schooling) may be responsible for the tendency of individuals to answer with aberrant response patterns on high-stakes tests.
This has not, however, been adequately validated using empirical data. This research uses datasets from seven
mathematics, English and science papers to investigate the consistency with which individuals respond aberrantly
across papers. Pupils who responded aberrantly on one paper were more likely to do so on other papers
on the same subject. Also, pupils who responded aberrantly on one paper of one subject were more likely to do
so on papers of another subject. Logistic multilevel models using the generation of aberrant response patterns
as a dependent variable have suggested non-negligible intra-pupil and intra-school correlations.
****
Development and Validation of the Sense of Competence Scale, Revised
Cara McFadden, Gary Skaggs, and Steven Janosik
Abstract
The purpose of this study was to develop an instrument to measure the sense of competence of traditional age
college students across the dimensions that define the construct. The Sense of Competence Scale-Revised (SCS-R)
was developed to provide a measure of Chickering’s (1969) first vector, an important psychosocial construct.
Administrators can use data from the instrument to modify an institution’s academic and social environment
to enhance the development of the intellectual, physical, and interpersonal competencies of college students.
During the development and validation, various aspects of the SCS-R were examined in accordance with the
validity framework outlined by Messick (1995). Of the six types of validity evidence proposed by Messick
(1995), four were the primary focus: content, substantive, structural and generalizability. The evidence generated
from the study suggested that the chosen items for the SCS-R support the validity of estimates of a student’s
personal assessment of their sense of competence.
****
Vol. 14, No. 4 Winter 2013
Application of the Rasch Model to Measuring the Performance of Cognitive Radios
Edward W. Wolfe, Carl B. Dietrich, and Garrett Vanhoy
Abstract
Cognitive radios (CRs) are recent technological developments that rely on artificial intelligence to adapt a radio’s
performance to suit environmental demands, such as sharing radio frequencies with other radios. Measuring the
performance of the cognitive engines (CEs) that underlie a CR’s performance is a challenge for those developing
CR technology. This simulation study illustrates how the Rasch model can be applied to the evaluation
of CRs. We simulated the responses of 50 CEs to 35 performance tasks and applied the Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM) to those data. Our results indicate that CEs based
on different algorithms may exhibit differential performance across manipulated performance task parameters.
We found that a multidimensional mixture model may provide the best fit to the simulated data, and that the two simulated algorithms may respond differently to tasks that emphasize achieving high data throughput with less emphasis on power conservation than they do to other combinations of performance task characteristics.
****
Properties of Rasch Residual Fit Statistics
Margaret Wu and Raymond J. Adams
Abstract
This paper examines the residual-based fit statistics commonly used in Rasch measurement. In particular, the
paper analytically examines some of the theoretical properties of the residual-based fit statistics with a view to
establishing the inferences that can be made using these fit statistics. More specifically, the relationships between
the distributional properties of the fit statistics and sample size are discussed; some research that erroneously
concludes that residual-based fit statistics are unstable is reviewed; and finally, it is analytically illustrated that,
for dichotomous items, residual-based fit statistics provide a measure of the relative slope of empirical item
characteristic curves. With a clear understanding of the theoretical properties of the fit statistics, the use and
limitations of these statistics can be placed in the right light.
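For orientation, the residual-based statistics under discussion are conventionally defined, for a dichotomous item i administered to persons n = 1, ..., N, as, in LaTeX notation,

$$z_{ni} = \frac{x_{ni} - P_{ni}}{\sqrt{P_{ni}(1 - P_{ni})}}, \qquad \mathrm{Outfit}_i = \frac{1}{N} \sum_{n} z_{ni}^2, \qquad \mathrm{Infit}_i = \frac{\sum_{n} W_{ni}\, z_{ni}^2}{\sum_{n} W_{ni}},$$

where P_{ni} is the model probability of a correct response and W_{ni} = P_{ni}(1 - P_{ni}) is its model variance. The dependence of the null distributions of these statistics on sample size is what the paper examines analytically.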
****
Validating Workplace Performance Assessments in Health Sciences Students: A Case Study from Speech Pathology
Sue McAllister, Michelle Lincoln, Alison Ferguson, and Lindy McAllister
Abstract
Valid assessment of health science students’ ability to perform in the real world of workplace practice is critical
for promoting quality learning and ultimately certifying students as fit to enter the world of professional practice.
Current practice in performance assessment in the health sciences field has been hampered by multiple issues
regarding assessment content and process. Evidence for the validity of scores derived from assessment tools is usually evaluated against traditional validity categories, with reliability evidence privileged over validity evidence, resulting in the paradoxical effect of compromising the assessment validity and learning processes the assessments seek to promote. Furthermore, the dominant statistical approaches used to validate scores from these assessments fall under the umbrella of classical test theory. This paper reports on the successful national
development and validation of measures derived from an assessment of Australian speech pathology students’
performance in the workplace. Validation of these measures considered each of Messick’s interrelated validity
evidence categories and included using evidence generated through Rasch analyses to support score interpretation
and related action. This research demonstrated that it is possible to develop an assessment of real, complex, work-based performance of speech pathology students that generates valid measures without compromising the learning processes the assessment seeks to promote. The process described provides a model for other health
professional education programs to trial.
****
Rasch Analysis for the Evaluation of Rank of Student Response Time in Multiple Choice Examinations
James J. Thompson, Tong Yang, and Sheila W. Chauvin
Abstract
The availability of computerized testing has broadened the scope of person assessment beyond the usual accuracy-ability domain to include response time analyses. Because there are contexts in which speed is important, e.g.,
medical practice, it is important to develop tools by which individuals can be evaluated for speed. In this paper,
the ability of Rasch measurement to convert ordinal nonparametric rankings of speed to measures is examined
and compared to similar measures derived from parametric analysis of response times (pace) and semi-parametric
logarithmic time-scaling procedures. Assuming that similar spans of the measures were used, non-parametric
methods of raw ranking or percentile-ranking of persons by questions gave statistically acceptable person estimates
of speed virtually identical to the parametric or semi-parametric methods. Because no assumptions were
made about the underlying time distributions with ranking, generality of conclusions was enhanced. The main
drawbacks of the non-parametric ranking procedures were the lack of information on question duration and the model's overall assignment of variance to the person-by-question interaction.
****
Assessing DIF Among Small Samples with Separate Calibration t and Mantel-Haenszel Chi-Square Statistics in the Rasch Model
Ira Bernstein, Ellery Samuels, Ada Woo, and Sarah L. Hagge
Abstract
The National Council Licensure Examination (NCLEX) program has evaluated differential item functioning
(DIF) using the Mantel-Haenszel (M-H) chi-square statistic. Since a Rasch model is assumed, DIF implies a difference
in item difficulty between a reference group, e.g., White applicants, and a focal group, e.g., African-American
applicants. The National Council of State Boards of Nursing (NCSBN) is planning to change the statistic used
to evaluate DIF on the NCLEX from M-H to the separate calibration t-test (t). In theory, M-H and t should yield identical results in large samples if the assumptions of the Rasch model hold (Linacre and Wright, 1989; see also Smith, 1996). However, as is true throughout statistics, "how large is large" is undefined, so it is quite
possible that systematic differences exist in relatively smaller samples. This paper compares M-H and t in four
sets of computer simulations. Three simulations used a ten-item test with nine “fair” items and one potentially
containing DIF. To address instability that may result from a ten-item test, the fourth used a 30-item test with 29
“fair” items and one potentially containing DIF. Depending upon the simulation, the magnitude of population
DIF (0, .5, 1.0, and 1.5 z-score units), the ability difference between the focal and reference group (–1, 0, and 1
z-score units), the focal group size (0, 10, 20, 40, 50, 80, 160, and 1000), and the reference group size (500 and
1000) were varied. The results were that (a) differences in estimated DIF between the M-H and t statistics are generally small; (b) t tends to estimate lower chance probabilities than M-H with small sample sizes; (c) neither method is likely to detect DIF, especially when it is of slight magnitude and the focal group is small; and (d) M-H does marginally better than t at detecting DIF, but this improvement is also limited to very small focal group sizes.
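For reference, the separate calibration t statistic compared in these simulations takes the difference between an item's difficulty estimates from independent calibrations in the focal (F) and reference (R) groups, scaled by their combined standard errors; in LaTeX notation,

$$t_i = \frac{\hat{d}_{iF} - \hat{d}_{iR}}{\sqrt{SE_{iF}^2 + SE_{iR}^2}}.$$

Large absolute values of t_i flag the item as functioning differentially between the groups.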
****
Application of Latent Variable Models to the Rosenberg Self-Esteem Scale
Shing-On Leung and Hui-Ping Wu
Abstract
Latent variable models (LVM) are applied to the Rosenberg Self-Esteem Scale (RSES). Parameter estimates automatically take negative signs, hence no recoding is necessary for negatively scored items. Bad items can be located through parameter estimates, item characteristic curves, and other measures. Two factors are extracted, one reflecting self-esteem and the other the tendency to take moderate views, with the latter not often covered in previous studies. A goodness-of-fit measure based on two-way margins is used, but more work is needed. Results show that the scaling provided by models with a more formal statistical grounding correlates highly with the conventional method, which may provide justification for the usual practice.
****
A Rasch Analysis of the Statistical Anxiety Rating Scale
Eric D. Teman
Abstract
The conceptualization of a distinct construct known as statistics anxiety has led to the development of numerous
rating scales, including the Statistical Anxiety Rating Scale (STARS), designed to assess levels of statistics
anxiety. In the current study, the STARS was administered to a sample of 423 undergraduate and graduate students
from a midsized, western United States university. The Rasch measurement rating scale model was used
to analyze scores from the STARS. Misfitting items were removed from the analysis. In general, items from the
six subscales represented a broad range of abilities, with the major exception being a lack of items at the lower
extremes of the subscales. Additionally, a differential item functioning (DIF) analysis was performed across sex
and student classification. Several items displayed DIF, which indicates subgroups may ascribe different meanings
to those items. The paper concludes with several recommendations for researchers considering using the STARS.
****