Journal of Applied Measurement

P.O. Box 1283

Maple Grove, MN 55311

 


 

Volume 19 Article Abstracts

 

Vol. 19, No. 1, Spring 2018

The Impact of Missing Values and Single Imputation upon Rasch Analysis Outcomes: A Simulation Study

Carolina Saskia Fellinghauer, Birgit Prodinger, and Alan Tennant

Abstract

Imputation has become common practice with the availability of easy-to-use algorithms and software. This study aims to determine whether different imputation strategies are robust to the extent and type of missingness, local item dependencies (LID), differential item functioning (DIF), and misfit when conducting a Rasch analysis. Four samples were simulated, representing a sample with good metric properties, a sample with LID, a sample with DIF, and a sample with both LID and DIF. Missing values were generated in increasing proportions and were either missing at random or missing completely at random. Four imputation techniques were applied before Rasch analysis, and the deviation of the results and the quality of fit were compared. Imputation strategies performed well with less than 15% missingness, but the analysis with missing values left in place performed best in recovering statistical estimates. The best strategy when conducting a Rasch analysis is therefore to analyze the data with missing values. If imputation is necessary for some reason, we recommend using the expectation-maximization algorithm.

____________________

Methods for the Comparison of Differential Item Functioning across Assessments

W. Holmes Finch, Maria Hernández Finch, Brian F. French, David E. McIntosh, and Lauren Moss

Abstract

An important aspect of the educational and psychological evaluation of individuals is the selection of scales with appropriate evidence of reliability and validity for inferences and uses of the scores for the population of interest. One key aspect of validity is the degree to which a scale fairly assesses the construct(s) of interest for members of different subgroups within the population. Typically, this issue is addressed statistically through assessment of differential item functioning (DIF) of individual items, or differential test functioning (DTF) of sets of items within the same measure. When selecting an assessment to use for a given application (e.g., measuring intelligence), or which form of an assessment to use for a test administration, researchers need to consider the extent to which the scales work with all members of the population. Little research has examined methods for comparing the amount or magnitude of DIF/DTF present in two or more assessments when deciding which assessment to use. The current study made use of 7 different statistics for this purpose, in the context of intelligence testing. Results demonstrate that by using a variety of effect sizes, the researcher can gain insights into not only which scales may contain the least amount of DTF, but also how they differ with regard to the way in which the DTF manifests itself.

____________________

Equating Errors and Scale Drift in Linked-Chain IRT Equating with Mixed-Format Tests

Bo Hu

Abstract

In linked-chain equating, equating errors may accumulate and cause scale drift. This simulation study extends the investigation of scale drift in linked-chain equating to mixed-format tests. Specifically, the impact of the equating method and the characteristics of the anchor test and the equating chain on equating errors and scale drift in IRT true score equating is examined. To evaluate equating results, a new method is used to derive true linking coefficients. The results indicate that the characteristic curve methods produce more accurate and reliable equating results than the moment methods. Although using more anchor items or an anchor test configuration with more IRT parameters can lower the variability of equating results, neither helps control equating bias. Additionally, scale drift increases when an equating chain grows longer or when poorly calibrated test forms are added to the chain. The role of calibration precision in evaluating equating results is highlighted.

____________________

Validation of Response Similarity Analysis for the Detection of Academic Cheating: An Experimental Study

Georgios D. Sideridis and Cengiz Zopluoglu

Abstract

The purpose of the present study was to evaluate various analytical means of detecting academic cheating in an experimental setting. The omega index was compared and contrasted against a gold criterion of academic cheating, defined as a discrepant score between two administrations, in an experimental study with real test takers. Participants were 164 elementary school students who were administered a mathematics exam followed by an equivalent mock exam under conditions of strict and relaxed invigilation, respectively. Discrepant scores were defined as exceeding 7 responses in either direction (correct or incorrect) beyond what was expected due to chance. Results indicated that the omega index successfully captured more than 39% of the cases that exceeded the conventional ±7 discrepancy criterion. It is concluded that response similarity analysis may be an important tool for detecting academic cheating.

____________________

Rasch Analysis of the Teachers’ Knowledge and Use of Data and Assessment (tKUDA) Measure

Courtney Donovan

Abstract

Teachers are expected to use data and assessments to drive their instruction. At the classroom level, this is accomplished through the assessment process. The Teachers’ Knowledge and Use of Data and Assessment (tKUDA) measure was created to capture teachers’ knowledge and use of this assessment process. This paper explores the measure’s utility using Rasch analysis. Evidence of reliability and validity was seen for both the knowledge and use factors. The rating scale was used as expected, and item analyses demonstrate good spread, with a few items identified for future revision. Item difficulty and results are connected back to the literature. Findings support the use of this measure to identify teachers’ knowledge and use of data and assessment in classroom practice.

____________________

Psychometric Properties and Differential Item Functioning of a Web-Based Assessment of Children’s Social Perspective-Taking

Beyza Aksu Dunya, Clark McKown, and Everett V. Smith

Abstract

Social perspective-taking (SPT), which involves the ability to infer others’ intentions, is a consequential social cognitive process. The purpose of this study is to evaluate the psychometric properties of a web-based social perspective-taking (SELweb SPT) assessment designed for children in kindergarten through third grade. Data were collected from two separate samples of children; the first sample included 3224 children and the second sample included 4419 children. Data were calibrated using the Rasch dichotomous model (Rasch, 1960). Differential item and test functioning were also evaluated across gender and ethnicity groups. Across both samples, we found evidence of consistent item fit, a unidimensional item structure, and adequate item targeting overall, although poor targeting at the high and low ends of the ability range suggests that more items are needed to distinguish low- and high-ability respondents. Analyses of DIF found some significant item-level DIF across gender, but no DIF across ethnicity. The analyses of person measure calibrations with and without DIF items evidenced negligible differential test functioning (DTF) across gender and ethnicity groups in both samples.

____________________

Assessment of Test Items with Rasch Measurement Model

Patrick U. Osadebe

Abstract

The study was carried out to assess the difficulty index of each item of an Economics Achievement Test using the Rasch model. The infit and outfit statistics as well as the reliability of the test were determined. Three research questions were drawn to guide the study. A sample of 200 students was randomly selected using simple random sampling by balloting and proportionate stratified random sampling. The instrument of the study was an Economics Achievement Test with 100 items. The test has face and content validity, and a reliability coefficient of 0.86 established using the Kuder-Richardson 20 method. The Rasch software Winsteps, version 3.75, was used to analyse the data collected. The results identified the difficulty index of each item, and the infit and outfit MNSQ and ZSTD statistics were determined. The reliability of the Economics Achievement Test was estimated. It was recommended, among other things, that the Rasch model be used when assessing the item difficulty of a test in order to ensure the stability of item parameters.

____________________

 

Vol. 19, No. 2, Summer 2018

Comparing Disability Levels for Community-dwelling Adults in the United States and the Republic of Korea using the Rasch Model

Ickpyo Hong, Annie N. Simpson, Kit N. Simpson, Sandra S. Brotherton, and Craig A. Velozo

Abstract

This study compared disability levels between community-dwelling adults in the United States and South Korea using two national surveys, the United States and Korean National Health and Nutrition Examination Surveys (NHANES and KNHANES). The Rasch common-item equating method was used to create a common measurement framework and compare average disability levels. Disability levels in the two countries were also estimated using the current disability estimation method (the percentage of people reporting a disability on a single question). Based on the current estimation method, a higher percentage of American adults (20.5%) showed disability than Korean adults (9.6%); however, using the Rasch model, American adults had significantly less disability (Mean = –3.00 logits, SD = 1.67) than Korean adults (Mean = –2.48 logits, SD = 2.13). Complementary to comparisons of the frequency of disability, comparison of the combined magnitude and strength of disability across countries provides new information that may better inform public health and policy decisions.

____________________

Using the Rasch Model to Investigate Inter-board Comparability of Examination Standards in GCSE

Qingping He and Michelle Meadows

Abstract

By treating each examination as a polytomous item and the grade a student achieved in the exam as a score on the item, the partial credit model (PCM) was used to analyse data from examinations in 16 GCSE subjects taken by 16-year-olds in England. These examinations are provided by four different exam boards. By further treating students taking exams in the same subject but provided by different exam boards as different subgroups, differential category functioning (DCF) analysis was used to investigate the comparability of standards at specific grades between the exam boards. For most of the grades across the examinations, the magnitude of the DCF effect with respect to exam boards was small for the majority of the subjects studied, with the differences between grade difficulties for individual exam boards and the all-board difficulty being less than one fifth of a grade. The effect of DCF varies between subjects and between grades within the same subject, with higher grades generally more comparable in standards across the exam boards than lower grades.

____________________

Using Repeated Ratings to Improve Measurement Precision in Incomplete Rating Designs

Eli Jones and Stefanie A. Wind

Abstract

When selecting a design for rater-mediated assessments, one important consideration is the number of raters who rate each examinee. In balancing costs and rater-coverage, rating designs are often implemented wherein only a portion of the examinees are rated by each judge, resulting in large amounts of missing data. One drawback to these sparse rating designs is the reduced precision of examinee ability estimates they provide. When increasing the number of raters per examinee is not feasible, another option may be to increase the number of ratings provided by each rater per examinee. This study applies a Rasch model to explore the effect of increasing the number of rating occasions used by raters to judge examinee proficiency. We used a simulation study to approximate a sparse but connected rater network with a sequentially increasing number of repeated ratings per examinee. The generated data were used to explore the influence of repeated ratings on the precision of rater, examinee, and task parameter estimates as measured by parameter standard errors, the correlation of sparse parameter estimates to true estimates, and the root mean square error of parameter estimates. Results suggest that increasing the number of rating occasions significantly improves the precision of examinee and rater parameter estimates. Results also suggest that parameter recovery levels of rater and task estimates are quite robust to reductions in the number of repeated ratings, although examinee parameter estimates are more sensitive to them. Implications for research and practice in the context of rater-mediated assessment designs are discussed.

____________________

The Impact of Differential Item Functioning on the Warwick-Edinburgh Mental Well-Being Scale

Hong Eng Goh, Ida Marais, and Michael Ireland

Abstract

Establishing the internal validity of psychometric instruments is an important research priority, and is especially vital for instruments that are used to collect data to guide public policy decisions. The Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) is a well-established and widely used instrument for assessing individual differences in wellbeing. The current analyses were motivated by concerns that mental wellbeing items that refer to interpersonal relationships (Items 9 and 12) may operate differently for those in a relationship compared to those not in a relationship. To assess this, the present study used item characteristic curves (ICC) and ANOVA of residuals to scrutinize differential item functioning (DIF) of the 14 WEMWBS items by participant relationship status (n with partner = 261, n without partner = 210). Items 5, 9, and 12 showed evidence of DIF that impacted group mean differences. DIF for Item 5 (“energy to spare”) was unexpected; however, a plausible explanation is discussed. For participants at the same level of mental wellbeing, those in a relationship scored higher on Items 9 and 12 than those not in a relationship, suggesting these items are sensitive to non-wellbeing variance associated with relationship status. Implications and future research directions are discussed.

____________________

Rasch Analysis of the Brief Symptom Inventory-18 with African Americans

Ruth Chu-Lien Chao, Kathy Green, and Kranti Dugar

Abstract

Although the United States offers some of the most advanced psychological services in the world, not everyone in the U.S. shares equally in these services, and health disparities persist when assessments do not appropriately measure different populations’ mental health problems. To address this assessment issue, we conducted factor and Rasch analyses to assess the psychometric characteristics of the Brief Symptom Inventory-18 (BSI-18) and to evaluate whether the BSI is culturally appropriate for assessing African Americans’ psychological distress. The dimensional structure of the BSI was first identified and held up under cross-validation with a second subsample; the measure was unidimensional among African Americans. Our results also suggested minimal person separation, stability across subsamples, and little differential item functioning. Most African Americans identified themselves at the low end of the categories on a 0-4 rating scale, indicating low endorsement of the BSI items. Rasch analyses were completed with the original scale and with the scale collapsed to three points, with some increase in separation and reliability for the collapsed scale. Differences in mean person position were found for mental health-related variables, consistent with hypotheses. Implications for theory and research on multicultural health scales are discussed, as are the effects of severe item skewness on the analyses.

____________________

Development and Calibration of Chemistry Items to Create an Item Bank, using the Rasch Measurement Model

Joseph N. Njiru and Joseph T. Romanoski

Abstract

This article describes the development and calibration of items from the 1997 to 2006 Tertiary Entrance Exams (TEE) in Chemistry, conducted by the Curriculum Council of Western Australia, for the purpose of establishing a Chemistry item bank. Only items that met the strict Rasch measurement criterion of ordered thresholds were included, and item residuals and chi-square conformity of the items were likewise scrutinized. Further, specialist experts in chemistry were employed to ascertain the qualitative properties of the items, particularly the item wording, so as to provide accurate item descriptors. An item bank of 174 items was created. This item bank may now be used by teachers to develop class assessments in Chemistry and/or for classroom diagnostic purposes.

____________________

Psychometric Evaluation of the Revised Current Statistics Self-efficacy (CSSE-26) in a Graduate Student Population using Rasch Analysis

Pei-Chin Lu, Samantha Estrada, and Steven Pulos

Abstract

The Current Statistics Self-Efficacy (CSSE) scale, developed by Finney and Schraw (2003), is a 14-item instrument for assessing students’ statistics self-efficacy. No previous research has used Rasch measurement models to evaluate the psychometric structure of its scores at the item level, and only a few studies have applied the CSSE in a graduate school setting. A modified 30-item CSSE scale was tested on a graduate student population (N = 179). Rasch rating scale analysis identified 26 items forming a unidimensional measure. The assumptions of sample-free and test-free measurement were confirmed, showing that scores from the CSSE-26 are reliable and valid for assessing graduate students’ level of statistics self-efficacy. Findings suggest the CSSE-26 could help facilitate professors’ understanding and enhancement of students’ statistics self-efficacy.

____________________

 

Vol. 19, No. 3, Fall 2018

The Impact of Levels of Discrimination on Vertical Equating in the Rasch Model

Stephen M. Humphry

Abstract

Aligning scales in vertical equating carries a number of challenges for practitioners in contexts such as large-scale testing. This paper examines the impact of high and low discrimination on the results of vertical equating when the Rasch model is applied. A first simulation study shows that different levels of discrimination introduce systematic error into estimates. A second simulation study shows that, for the purpose of vertical equating, items with high or low discrimination provide information about translation constants that contains systematic error. The impact of differential item discrimination on vertical equating is then illustrated with a real data set from a large-scale testing program, with vertical links between grade 3 and grade 5 numeracy tests. Implications of the results for practitioners conducting vertical equating with the Rasch model, including for monitoring progress over time, are identified. Implications for other item response models are also discussed.

____________________

Person-Level Analysis of the Effect of Cognitive Loading by Question Difficulty and Question Time Intensity on Didactic Examination Fluency (Speed-Accuracy Tradeoff)

James J. Thompson

Abstract

Fluency may be considered a conjoint measure of work product quality and speed. It is especially useful in educational and medical settings for evaluating expertise and/or competence. In this paper, didactic exams were used to model fluency. Binned propensity matching on question difficulty and time intensity was used to define a “load” variable and to construct fluency (sum correct / elapsed response time). The analysis yielded response surfaces representing speed-accuracy tradeoffs. Person-by-load fluency matrices behaved well in Rasch analysis and warranted the definition of a person fluency variable (“skill”). A path model with skill and load as mediators substantially described the fluency data, and the indirect paths through skill and load dominated direct variable effects. This is supportive evidence that skill and load have stand-alone merit. Therefore, it appears that the constructs of skill, load, and fluency could provide psychometrically defensible descriptors when utilized in appropriate contexts.

____________________

Detecting Rater Effects under Rating Designs with Varying Levels of Missingness

Rose E. Stafford, Edward W. Wolfe, Jodi M. Casabianca, and Tian Song

Abstract

Previous research has shown that indices obtained from partial credit model (PCM) estimates can detect severity and centrality rater effects, though it remains unknown how rater effect detection is impacted by the missingness inherent in double-scoring rating designs. This simulation study evaluated the impact of missing data on the detection of rater severity and centrality. Data were generated for each rater effect type and varied in rater pool quality, rater effect prevalence and magnitude, and extent of missingness. Raters were flagged using rater location as a severity indicator and the standard deviation of rater thresholds as a centrality indicator. Two methods of identifying extreme scores on these indices were compared. Results indicate that both methods produce low Type I and Type II error rates (i.e., incorrectly flagging non-effect raters and failing to flag effect raters, respectively) and that the presence of missing data has negligible impact on the detection of severe and central raters.

____________________

A Rasch Model Analysis of the Emotion Regulation Questionnaire

Michael J. Ireland, Hong Eng Goh, and Ida Marais

Abstract

The 10-item Emotion Regulation Questionnaire (ERQ) was developed to measure individual differences in the tendency to use two common emotion regulation strategies: cognitive reappraisal and suppression. The current study examined the psychometric properties of the ERQ in a heterogeneous community sample of 713 residents (64.9% female) using the polytomous Rasch model. The results showed that the 10-item ERQ was multidimensional and supported two distinct factors. The reappraisal and suppression subscales were each found to be unidimensional and to fit the Rasch model. No evidence of local dependence was observed, and the five response categories functioned as intended. Differential item functioning (DIF) was assessed across subsamples defined by gender, self-reported symptoms of mental illness, regular meditation practice, and age group; no evidence emerged of items functioning differently across any of these groups. Based on Rasch measure scores, a number of meaningful group differences in person location emerged. Less use of reappraisal was reported by younger adults, non-meditators, and those reporting symptoms of mental illness. Non-meditators also reported greater use of suppression compared with regular meditators; no other age, gender, or symptomatic group differences emerged for suppression.

____________________

Measuring the Impact of Caring for a Spouse with Alzheimer’s Disease: Validation of the Alzheimer’s Patient Partners Life Impact Questionnaire (APPLIQue)

Peter Hagell, Matthew Rouse, and Stephen P. McKenna

Abstract

Alzheimer’s disease (AD) is the most common form of dementia, characterized by cognitive, psychiatric, and behavioral symptoms and increasing dependency. Family members typically assume increasing caregiving responsibilities, with considerable impact on quality of life (QoL). This article describes the testing of a needs-based QoL questionnaire for AD family caregivers. Initial analyses, according to Rasch measurement theory, suggested that items applied to spousal rather than non-spousal caregivers. Following removal of non-spousal responders, a 25-item questionnaire was identified that exhibited acceptable model fit, a mean (SD) person location of 0.194 (1.42) logits, residual correlations less than or equal to 0.173, and absence of DIF by age, gender, or administration. Reliability was 0.85. This new measure, the Alzheimer’s Patient Partners Life Impact Questionnaire (APPLIQue), may fill an important gap in assessing the impact of AD on spousal caregivers and the outcomes of interventions aimed at caregivers as well as persons with AD.

____________________

Psychometric Properties of an Instrument to Measure Word Problem Solving Skills in Mathematics

Chunlian Jiang, Do-Hong Kim, and Chuang Wang

Abstract

The purpose of this study is to examine the psychometric properties of an instrument measuring word problem solving skills in mathematics related to speed with 706 sixth-grade Chinese and Singaporean students. Rasch measurement models were applied to examine reliability, unidimensionality, rating scale functioning, item difficulty, and person ability. Differential item functioning (DIF) analysis was also performed to examine differences in item difficulty estimates between Chinese and Singaporean students. Results suggest that the data satisfied the unidimensionality requirements of the Rasch model and that most of the item difficulty measures aligned with the person ability distribution. The instrument demonstrated adequate reliability, and fit statistics were within acceptable limits for the vast majority of items, with a few exceptions. The rating scale structure functioned properly, although the middle categories had very few observations; deleting misfitting cases and collapsing the middle categories slightly improved the psychometric properties. DIF analysis revealed that four items were more difficult for Chinese students whereas two other items were more difficult for Singaporean students. Results also indicated that the Chinese participants scored higher than the Singaporean participants on 11 of the 14 items, and the Singaporean students scored higher than their Chinese counterparts on the other 3 items. The validation of this instrument has implications for the teaching and learning of mathematical word problems in practice.

____________________

Evaluation of Self-management Program Outcomes: Adaptation and Testing of a Swedish Version of the Health Education Impact Questionnaire (heiQ)

Christine Kumlien, Michael Miller, Cecilia Fagerstrom, and Peter Hagell

Abstract

Self-management programs require a range of indicators to evaluate their outcomes. The Health Education Impact Questionnaire (heiQ) was developed to meet this need. The heiQ contains 40 items with 4 response categories, representing eight scales. We developed a Swedish version of the heiQ and tested it through cognitive interviews (n = 15) and psychometrically (n = 177) using classical test theory (CTT) and Rasch measurement theory (RMT). The Swedish heiQ was easily understood by interviewees and met CTT criteria, with supported scaling assumptions (corrected item-total correlations of at least 0.37) and reliability (ordinal alpha of at least 0.78). Acceptable RMT fit provided general support for the measurement properties of the eight heiQ scales. However, there were signs of malfunctioning response categories for four items in two scales, and of suboptimal item coverage of the measurement continua. The Swedish heiQ appears comparable to other available language versions. Further efforts may be needed to optimize response categories and measurement precision.

____________________

Developing and Validating a Scientific Multi-Text Reading Comprehension Assessment: The Case of the Dispute over Whether to Continue the Fourth Nuclear Power Plant Construction in Taiwan

Lin Hsiao-Hui and Yuh-tsuen Tzeng

Abstract

This study aimed to advance the Scientific Multi-Text Reading Comprehension Assessment (SMTRCA) by developing a rubric consisting of 4 subscales: information retrieval, information generalization, information interpretation, and information integration. The assessment tool included 11 closed-ended and 8 open-ended items and their rubric. Two texts describing opposing views in the dispute over whether to continue the Fourth Nuclear Power Plant construction in Taiwan were developed, and 1535 grade 5-9 students read the two texts in a counterbalanced order and answered the test items. First, the results showed that the Cronbach’s alpha values were above .9, indicating very good intra-rater consistency, and the Kendall coefficient of concordance for inter-rater reliability was larger than .8, denoting a consistent scoring pattern between raters. Second, many-facet Rasch measurement analysis showed that there were significant differences in rater severity, and both severe and lenient raters could distinguish high- versus low-ability students effectively. A comparison of the rating scale model and the partial credit model indicated that each rater had a unique rating scale structure, meaning that the rating procedures involve human interpretation and evaluation during scoring, making it difficult to reach a machine-like level of consistency; this is, however, in line with expectations for typical human judgment processes. Third, the Cronbach’s alpha coefficients for the full assessment were above .85, denoting that the SMTRCA has high internal consistency. Finally, confirmatory factor analysis showed an acceptable goodness of fit for the SMTRCA. These results suggest that the SMTRCA is a useful tool for measuring multi-text reading comprehension abilities.

____________________

 

Vol. 19, No. 4, Winter 2018

Hierarchical and Higher-Order Factor Structures in the Rasch Tradition: A Didactic

Perman Gochyyev and Mark Wilson

Abstract

In this paper, we consider hierarchical and higher-order factor models and the relationship between them, and, in particular, we use Rasch models to focus on the exploration of these models. We present these models and their similarities and differences from within the Rasch modeling perspective and discuss their use in various settings. One motivation for this work is that certain well-known similarities and differences between the equivalent models in the two-parameter logistic (2PL) approach do not apply in the Rasch modeling tradition. Another motivation is that there is some ambiguity as to the potential uses of these models, and we seek to clarify those uses. In recent work in the Item Response Theory (IRT) literature, the estimation of these models has mostly been presented using the Bayesian framework; here we show the use of these models with traditional maximum likelihood methods. We also show how to re-parameterize these models, which in some cases can improve estimation and convergence. These alternative parameterizations are also useful in “translating” suggestions for 2PL models to the Rasch tradition (since these suggestions involve the interpretation of item discriminations, which are required to be unity in the Rasch tradition), and can also be used to clarify the relationship among the models. We discuss the use of these models for modeling multidimensionality and testlet effects and compare the interpretation of the obtained solutions to that for the multidimensional Rasch model, a more common approach for accounting for multidimensionality in the Rasch tradition. We demonstrate the use of these models using the partial credit model.

____________________

Factor Structure of the Community Reintegration of Service-Members (CRIS) in Veterans with Blast-Related Mild Traumatic Brain Injury

J. Kay Waid-Ebbs, Pey-Shan Wen, David P. Graham, Kathleen Ray, Audrey J. Leroux, Maureen K. O’Connor, and Drew Helmer

Abstract

Veterans with blast-related mild traumatic brain injury (mTBI) report difficulty engaging in life roles, also referred to as participation. Current measures are either global or lack comprehensive coverage of life roles, and they have not been validated in Veterans with mTBI. The Community Reintegration of Service-Members instrument (CRIS) is a promising measure that was specifically developed for Veterans using a well-formulated conceptual framework and Rasch analysis. However, the CRIS has not been validated in Veterans with mTBI. Two data sets were combined for 191 Veterans with blast-related mTBI to conduct a confirmatory factor analysis of the CRIS. Thirty-three items with high residuals or low loadings were removed to improve model fit. The remaining items demonstrated high correlations between subscales (0.87-0.89) and high test-retest reliability (0.85 to 0.95). Mean scores were better for Veterans without Post Traumatic Stress Disorder (PTSD) or depression than for Veterans with PTSD or depression. The refined CRIS offers a valid, comprehensive measure of participation for Veterans with blast-related mTBI. Future directions include examining aspects of participation that may not be covered by the CRIS for Veterans with mTBI.

____________________

Examination of Item Quality in a State-Wide Music Assessment Program using Rasch Methodology

Yin Burgess, Jin Liu, and Mihaela Ene

Abstract

Students’ academic performance has been routinely assessed in various subjects, including arts education. The current study uses Rasch methodology to investigate item quality for an annual state-wide arts assessment program administered to 4th grade students. All multiple-choice items were previously analyzed within the true score theory (TST) framework to examine item difficulty, differential item functioning (DIF), and distractor quality. However, these traditional methods are sample-specific, and score interpretations are limited to the particular group being tested. Rasch methodology provides a sample-free framework for item analysis, with the advantage of producing sample-invariant item parameters and using goodness-of-fit criteria to detect problematic items, leading to more accurate item analysis results. Study results suggest that the majority of the items performed well and that the test was appropriate for its intended audience and evaluation purpose. The results also support the interpretation of test scores and the use of this assessment program.

____________________

Validation of an Instrument to Evaluate Students’ Perception of Virtual Manipulatives in Learning Mathematics

Fereshteh Zeynivandnezhad

Abstract

The advent of new technologies has seen physical manipulatives, which are physical models representing concepts, replaced by virtual manipulatives, which many learners and teachers find useful in the mathematics classroom. The current study investigated students’ motivation to engage with virtual manipulatives as a tool in mathematics education. Activity theory was used to develop a multicomponential survey of virtual manipulatives in education, which was administered to 442 Iranian high school students with the aim of examining students’ perception of various aspects of the manipulatives. Using the Rasch-Andrich rating scale model (RSM), an item response theory model, the psychometric features of the instrument were examined: item endorsability, learners’ ability, fit, and unidimensionality. The validated instrument can be used to identify factors that could improve students’ perceptions of virtual manipulatives in the mathematics classroom.

____________________

Psychometric Properties and Convergent Validity of the Chinese Version of the Rosenberg Self-Esteem Scale

Meng-Ting Lo, Ssu-Kuang Chen, and Ann A. O’Connell

Abstract

The present study used the Rasch rating scale model (RSM) to reassess the psychometric properties of the Chinese version of the Rosenberg Self-Esteem Scale (RSES) among 501 Grade 10 students in Taiwan. Reliability, dimensionality, and differential item functioning were examined. The dimensionality assumption was met after excluding item 8 (“I wish I could have more respect for myself.”). The successive response categories for item 7 (“I feel that I am a person of worth, at least on an equal plane with others.”) were not located in the expected order. After eliminating items 7 and 8 from the analysis, the remaining 8-item RSES had acceptable fit statistics, good content coverage, and high categorical omega and Rasch person and item reliability. The five response categories performed well, and evidence for convergent validity was established through the high correlation between RSES and psychological well-being scores. Implications and recommendations for instrument users are discussed.

____________________

Rasch Analysis of the Revised Two-Factor Study Process Questionnaire: A Validation Study

Vernon Mogol, Yan Chen, Marcus Henning, Andy Wearn, Jennifer Weller, Jill Yielder, and Warwick Bagg

Abstract

The Revised Two-Factor Study Process Questionnaire (R-SPQ-2F) was developed in 1998 using true score theory to measure students’ deep approaches (DA) and surface approaches (SA) to learning. Using Rasch analyses, this study aimed to 1) validate the R-SPQ-2F’s two-factor structure, and 2) explore whether the full scale (FS), after reverse scoring responses to SA items, could measure learning approach as a unidimensional construct. University students (N = 327) completed an online version of the R-SPQ-2F. The researchers validated the R-SPQ-2F by showing that items on the three rating scales (DA, SA, and FS) had acceptable fit; that both DA and FS, but not SA, showed acceptable targeting; and that all three scales had acceptable reliabilities (0.74-0.79). The DA and SA scales, but not the FS, satisfied the unidimensionality requirement, supporting the claim that student approaches to learning are represented by DA and SA as separate constructs.

____________________

A Measurement Model of City-Based Consumer Patriotism in Developing Countries: The Case of Vietnam

Ngoc Chu Nguyen Mong and Trong Hoang

Abstract

This study examined a measurement model for the construct of consumer patriotism in the context of city-based consumers in Vietnam, a developing country, and the linkage of consumer patriotism with consumer ethnocentrism. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were conducted to assess the measurement model, and a mediation test using a multiple regression procedure was utilised to test the hypothesis of the model. Two studies were carried out: a preliminary study with a convenience sample of 230 people and a full study with a probability sample of 300 people. Both studies showed an acceptable fit for the measurement model of consumer patriotism. In addition, consumer patriotism was found to be a mediator in the connection between natural patriotism and ethnocentrism for city-based Vietnamese consumers.

____________________
