Journal of Applied Measurement
P.O. Box 1283
Maple Grove, MN 55311
Volume 16, 2015 Article Abstracts
Vol. 16, No. 1 Winter 2015
A Mathematical Theory of Ability Measure Based on Partial Credit Item Responses
Nan L. Kong
Abstract
This paper defines a measure of examinees’ abilities based on partial-credit item responses, using additivity, the fundamental property of a measure. The fundamental properties of this newly defined ability measure are established by mathematical proof. The paper also shows that interactive ability and conditional ability are measurable with additivity. Finally, it examines the ability measures associated with subscales and their decompositions.
____________________
Differential Item Functioning Analysis by Applying Multiple Comparison Procedures
Paolo Eusebi and Svend Kreiner
Abstract
Analysis within a Rasch measurement framework aims at the development of valid and objective test scores. One requirement of both validity and objectivity is that items do not show evidence of differential item functioning (DIF). A number of procedures exist for the assessment of DIF, including those based on analysis of contingency tables by Mantel-Haenszel tests and partial gamma coefficients.
The aim of this paper is to illustrate Multiple Comparison Procedures (MCP) for the analysis of DIF relative to a variable that defines a very large number of groups with an unclear ordering with respect to the DIF effect. We propose a single-step procedure controlling the false discovery rate for DIF detection. The procedure applies to both dichotomous and polytomous items. In addition to providing evidence against a hypothesis of no DIF, the procedure also provides information on subsets of groups that are homogeneous with respect to the DIF effect. A stepwise MCP for this purpose is also introduced.
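The abstract does not specify the authors' single-step FDR procedure, so it is not reproduced here. As a point of reference only, the sketch below implements the classic Benjamini-Hochberg step-up procedure. It is a different, step-up method that also controls the false discovery rate. It is applied to hypothetical p-values such as those from pairwise DIF comparisons between groups, and the function name and the level `q` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ranks by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is at or below its BH critical value rank*q/m.
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max, reported in original order.
    return sorted(order[:k_max])

# Hypothetical p-values from eight pairwise DIF comparisons.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p))  # [0, 1]
```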
____________________
Visually Discriminating Upper Case Letters, Lower Case Letters and Numbers
Janet Richmond, Russell F. Waugh, and Deslea Konza
Abstract
English literacy and numeracy are important for successful learning, and testing students’ literacy and numeracy standards enables early identification and remediation of children who have difficulty. Rasch measures were created with the RUMM2020 computer program for the perceptual constructs of visual discrimination of upper case letters, lower case letters and numbers. Thirty items for Visual Discrimination of Upper Case Letters (VDUCL), 36 for Visual Discrimination of Lower Case Letters (VDLCL) and 20 for Visual Discrimination of Numbers (VDN) were presented to 324 Pre-Primary through Year 4 children, aged 4-9 years. All students attended school in Perth, Western Australia. Eighteen of the original 30 VDUCL items, 31 of the original 36 VDLCL items and 13 of the original 20 VDN items were used to create linear scales (the others were deleted due to misfit), and these clearly showed which letters and numbers children found easy and which they found hard.
____________________
Testing the Multidimensionality of the Inventory of School Motivation in a Dutch Student Sample
Hanke Korpershoek, Kun Xu, Magdalena Mo Ching Mok, Dennis M. McInerney, and Greetje van der Werf
Abstract
A factor analytic and a Rasch measurement approach were applied to evaluate the multidimensional nature of the school motivation construct among more than 7,000 Dutch secondary school students. The Inventory of School Motivation (McInerney and Ali, 2006) was used, which is intended to measure four motivation dimensions (mastery, performance, social, and extrinsic motivation), each comprising two first-order factors. One unidimensional model and three multidimensional models (4-factor, 8-factor, higher order) were fit to the data. Results of both approaches showed that the multidimensional models validly represented school motivation among Dutch secondary school pupils, whereas the fit of the unidimensional model was poor. The differences in model fit between the three multidimensional models were small, although the two approaches favoured different models. The need to improve some of the items and to increase the measurement precision of several first-order factors is discussed.
____________________
Measuring Teaching Assistants’ Efficacy Using the Rasch Model
Zi Yan, Chun Wai Lum, Rick Tze Leung Lui, Steven Sing Wa Chu, and Ming Lui
Abstract
Teaching assistants (TAs) play an influential role in primary and secondary schools. However, the literature says little about TAs’ efficacy, and to date no instrument is available for measuring it. The present study aims to develop and validate a scale (Teaching Assistant Efficacy Scale, TAES) for measuring TAs’ efficacy on identified capabilities. A total of 531 teaching assistants from Hong Kong schools participated in the survey. The multidimensional Rasch model was used to analyse the data. The results supported a five-dimension structure of TAs’ efficacy. The final 30-item version of the TAES assesses TAs’ efficacy in learning support, teaching support, behaviour management, cooperation, and administrative support. The Rasch reliabilities for all five dimensions were around 0.90. The six-category response structure worked well for the scale. Further research is recommended to validate and test the robustness of the TAES both in Hong Kong and elsewhere.
____________________
Detecting Measurement Disturbance Effects: The Graphical Display of Item Characteristics
Randall E. Schumacker
Abstract
Traditional identification of misfitting items in Rasch measurement models has relied on the standardized (z) Infit and Outfit statistics. A more recent approach, made possible by Winsteps, is to specify “group = 0” in the control file and then view the item characteristic curve for each item against the true probability curve. The graphical display reveals whether an item follows the true probability curve or deviates substantially from it, thus indicating measurement disturbance. The item response probabilities and logit abilities are easily copied into data vectors in R and then graphed. An example control file, the output item data, and the subsequent preparation of an overlay graph for misfitting items are presented using Winsteps and R. For comparison purposes, the data are also analyzed using a multidimensional (MD) mapping procedure.
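The comparison can be sketched in Python rather than R, assuming a dichotomous item. The model curve is the Rasch probability exp(θ - b) / (1 + exp(θ - b)). The empirical curve is the observed proportion correct within ability bins. This is not how Winsteps itself computes its empirical curves, and the bin width, function names and data are hypothetical choices for illustration.

```python
import math

def rasch_probability(theta, b):
    """Model probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def empirical_icc(thetas, responses, bin_width=0.5):
    """Observed proportion correct per ability bin.

    Returns (bin_midpoint, proportion_correct, n_persons) tuples sorted by ability.
    """
    bins = {}
    for theta, x in zip(thetas, responses):
        key = math.floor(theta / bin_width)          # index of the bin containing theta
        correct, count = bins.get(key, (0, 0))
        bins[key] = (correct + x, count + 1)
    return [((key + 0.5) * bin_width, correct / count, count)
            for key, (correct, count) in sorted(bins.items())]

# Hypothetical abilities (logits) and responses to one item of difficulty 0.0.
thetas = [-1.2, -0.9, 0.1, 0.3, 1.1, 1.4]
responses = [0, 1, 0, 1, 1, 1]
for mid, observed, n in empirical_icc(thetas, responses, bin_width=1.0):
    print(f"{mid:+.1f}  observed={observed:.2f}  model={rasch_probability(mid, 0.0):.2f}  n={n}")
```

Large gaps between a bin's observed proportion and the model probability at the bin midpoint are the kind of deviation the overlay graph is meant to reveal. With only a few persons per bin, such gaps are noisy.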
____________________
Criteria Weighting with Respect to Institution’s Goals for Faculty Selection
Sheu Hua Chen, Yen Ting Chen, and Hong Tau Lee
Abstract
Employers frequently select one employee from among numerous candidates. They have to evaluate these candidates on multiple criteria, which raises the problem of how to determine the relative importance of those criteria. Traditionally, when engaging a new employee, the employer develops a set of criteria and their associated weightings in accordance with the institution’s goals. However, the weighting also reflects the priority of those goals, and this is frequently ignored. It is therefore necessary to recheck whether the set of weightings appropriately reflects the priority of the institution’s goals. In this research, we propose a mechanism for reviewing the criteria weighting to see whether it adequately satisfies the institution’s actual goals. This double-check procedure can further help employers select appropriate personnel for their institutions.
____________________
Gendered Language Attitudes: Exploring Language as a Gendered Construct using Rasch Measurement Theory
Kris A. Knisely and Stefanie A. Wind
Abstract
Gendered language attitudes (GLAs) are gender-based perceptions of language varieties based on connections between gender-related and linguistic characteristics of individuals, including the perception of language varieties as possessing degrees of masculinity and femininity. This study combines substantive theory about language learning and gender with a model based on Rasch measurement theory to explore the psychometric properties of a new measure of GLAs. Findings suggest that GLAs constitute a unidimensional construct and that the items can be used to describe differences among students in the strength of their GLAs. Implications for research, theory, and practice are discussed. Special emphasis is given to the teaching and learning of languages.
____________________
Vol. 16, No. 2 Spring 2015
Implications of Removing Random Guessing from Rasch Item Estimates in Vertical Scaling
Ida Marais
Abstract
Large-scale testing programs often involve a number of assessments that include multiple choice items administered to students in different grades. The Rasch model is sometimes used to transform the raw test scores onto a common vertical scale of proficiency. However, with multiple choice items students may guess, and the Rasch model makes no provision for guessing. In this study, a procedure for removing random guessing from Rasch item estimates is applied to two assessments. The results showed that, when there was guessing, the vertical scale of proficiency was compressed. Moreover, the highly proficient students were penalised more than the low proficiency students were advantaged by guessing. After the effect of guessing was removed from the estimates, the vertical scale was more spread out. Also, because proficient students answer the more difficult items correctly at a greater rate than less proficient students, they obtained the greatest benefit when the effect of guessing had been removed from the estimates of these items.
____________________
Funding Medical Research Projects: Taking into Account Referees’ Severity and Consistency through Many-Faceted Rasch Modeling of Projects’ Scores
Luigi Tesio, Anna Simone, Mariusz T. Grzeda, Michela Ponzio, Gabriele Dati, Paola Zaratin, Laura Perucca, and Mario A. Battaglia
Abstract
Funding decisions for research projects often rely on scores assigned by a panel of experts (referees). The nonlinear nature of raw scores and the severity and inconsistency of individual raters may generate unfair numeric project rankings. Rasch measurement (“many-facets” version, MFRM) provides a valid alternative to raw scoring. MFRM was applied to the scores achieved by 75 research projects on multiple sclerosis submitted in response to a previous annual call by FISM, the Italian Foundation for Multiple Sclerosis. This made it possible to simulate, a posteriori, the impact of MFRM on the funding scenario.
The applications were each scored by 2 to 4 independent referees (total = 131) on a 10-item, 0-3 rating scale called FISM-ProQual-P. The rotation plan ensured “connection” of all pairs of projects through at least 1 shared referee. The questionnaire satisfactorily fulfilled the stringent criteria of Rasch measurement for psychometric quality (unidimensionality, reliability and data-model fit). Arbitrarily, 2 acceptability thresholds were set: a raw score of 21/30 and the equivalent Rasch measure of 61.5/100. When the cut-off was switched from score to measure, 8 out of 18 acceptable projects had to be rejected, while 15 rejected projects became eligible for funding. Some referees, of various severity, were grossly inconsistent (z-std fit indexes <–1.9 or >1.9).
The FISM-ProQual-P questionnaire seems a valid and reliable scale. MFRM may help the decision-making process not only for allocating funds to MS research projects but also in other fields. In repeated assessment exercises it can help the selection of reliable referees. Their severity can be steadily “calibrated”, thus obviating the need to “connect” them with other referees assessing the same projects.
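For orientation, a common statement of the many-facet Rasch model (after Linacre) is shown below, written for projects, items and referees. The abstract does not give the exact specification the authors used, so the facet labels are an assumption.

```latex
\ln\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here P_{nijk} is the probability that project n receives category k (of 0-3) on item i from referee j. B_n is the project's measure, D_i the item's difficulty, C_j the referee's severity, and F_k the difficulty of the step from category k-1 to k. Because C_j enters as a separate term, a project's measure is adjusted for the severity of the particular referees who scored it.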
____________________
A Family of Rater Accuracy Models
Edward W. Wolfe, Hong Jiao, and Tian Song
Abstract
Engelhard (1996) proposed a rater accuracy model (RAM) as a means of evaluating rater accuracy in rating data, but very little research exists to determine the efficacy of that model. The RAM requires a transformation of the raw score data to accuracy measures by comparing rater-assigned scores to true scores. Indices computed from raw scores also exist for measuring rater effects, but these indices ignore deviations of rater-assigned scores from true scores. This paper compares the efficacy of two versions of the RAM (based on dichotomized and polytomized deviations of rater-assigned scores from true scores) with that of two versions of raw score rater effect models (i.e., a Rasch partial credit model, PCM, and a Rasch rating scale model, RSM). Simulated data are used to demonstrate the efficacy with which these four models detect and differentiate three rater effects: severity, centrality, and inaccuracy. Results indicate that the RAMs are able to detect rater severity and inaccuracy, though not to differentiate between them, and are unable to detect rater centrality. The PCM and RSM, on the other hand, are able to both detect and differentiate all three of these rater effects. However, the RSM and PCM do not take true scores into account and may, therefore, be misleading when pervasive trends exist in the rater-assigned data.
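The transformation the RAM requires can be sketched as follows. The exact coding used by Engelhard (1996) and by the authors is not given in the abstract. The dichotomous rule (1 for exact agreement with the true score, else 0) and the polytomous rule (a reverse-scored absolute deviation) are illustrative assumptions, as are the function names and ratings.

```python
def dichotomous_accuracy(rater_scores, true_scores):
    """1 where the rater's score equals the true score, 0 otherwise (assumed coding)."""
    return [1 if r == t else 0 for r, t in zip(rater_scores, true_scores)]

def polytomous_accuracy(rater_scores, true_scores, max_deviation):
    """Reverse-scored absolute deviation (assumed coding): max_deviation for exact
    agreement, one point less per score point of disagreement, floored at 0."""
    return [max(max_deviation - abs(r - t), 0)
            for r, t in zip(rater_scores, true_scores)]

# Hypothetical ratings of four responses on a 1-4 scale.
rater = [3, 2, 4, 1]
truth = [3, 3, 2, 1]
print(dichotomous_accuracy(rater, truth))    # [1, 0, 0, 1]
print(polytomous_accuracy(rater, truth, 3))  # [3, 2, 1, 3]
```

In this coding, a score one point above the true score and a score one point below it receive the same accuracy value, because only the size of the deviation is kept.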
____________________
Using PISA as an International Benchmark in Standard Setting
Gary W. Phillips and Tao Jiang
Abstract
This study describes how the Programme for International Student Assessment (PISA) can be used to internationally benchmark state performance standards. The process is accomplished in three steps. First, PISA items are embedded in the administration of the state assessment and calibrated on the state scale. Second, the international item calibrations are used to link the state scale to the PISA scale through common item linking. Third, the statistical linking results are used as part of the state standard setting process to help standard setting panelists determine how high their state standards need to be in order to be internationally competitive. This process was carried out in Delaware, Hawaii, and Oregon in three subjects (science, mathematics, and reading), with initial results reported by Phillips and Jiang (2011). An in-depth discussion of methods and results is reported in this article for one subject (mathematics) and one state (Hawaii).
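Step two can be illustrated with mean-shift (mean/mean) common-item linking, which is one standard choice for Rasch-calibrated scales. The abstract does not say which linking method the authors used. PISA's transformation from logits to its published reporting scale is omitted, and the item values, cut score and function names below are hypothetical.

```python
from statistics import mean

def linking_shift(state_difficulties, pisa_difficulties):
    """Mean-shift constant that places state-scale logits on the PISA logit metric.

    Both lists hold difficulties of the same embedded (common) items, in the same order.
    """
    if len(state_difficulties) != len(pisa_difficulties) or not state_difficulties:
        raise ValueError("need one difficulty per common item on each scale")
    return mean(pisa_difficulties) - mean(state_difficulties)

def state_to_pisa(logit_on_state_scale, shift):
    """Express a state-scale logit (for example, a cut score) on the PISA logit metric."""
    return logit_on_state_scale + shift

# Hypothetical difficulties of four embedded PISA items on each scale.
state = [-0.5, 0.2, 1.0, 1.3]
pisa = [0.1, 0.7, 1.6, 2.0]
shift = linking_shift(state, pisa)           # about 0.6 logits
print(round(state_to_pisa(0.8, shift), 2))   # hypothetical state cut of 0.8 -> 1.4
```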
____________________
Investigating the Function of Content and Argumentation Items in a Science Test: A Multidimensional Approach
Shih-Ying Yao, Mark Wilson, J. Bryan Henderson, and Jonathan Osborne
Abstract
The latest national science framework has formally stated the need for developing assessments that test both students’ content knowledge and their scientific practices. In response to this call, a science assessment has been developed that consists of (a) content items that measure students’ understanding of a grade eight physics topic and (b) argumentation items that measure students’ argumentation competency. This paper investigates the function of these content and argumentation items within a multidimensional measurement framework from two perspectives. First, we performed a dimensionality analysis to investigate whether the relationship between the content and argumentation items conformed to the test design. Second, we conducted a differential item functioning analysis in the multidimensional framework to examine whether any content or argumentation item unfairly favored students with an advanced level of English literacy. The methods and findings of this study could inform future research on the validation of assessments measuring higher-order and complex abilities.
____________________
Using a Rasch Model to Account for Guessing as a Source of Low Discrimination
Stephen Humphry
Abstract
The most common approach to modelling item discrimination and guessing for multiple-choice questions is the three parameter logistic (3PL) model. However, proponents of Rasch models generally avoid using the 3PL model because modelling guessing entails sacrificing the distinctive property and advantages of Rasch models. One approach to dealing with guessing based on the application of Rasch models is to omit responses in which guessing appears to play a significant role. However, this approach entails loss of information, and it does not account for variable item discrimination. It has been shown, though, that provided specific constraints are met, it is possible to parameterize discrimination while preserving the distinctive property of Rasch models. This article proposes an approach that uses Rasch models to account for guessing on standard multiple-choice items simply by treating it as a source of low item discrimination. Technical considerations are noted, although a detailed examination of them is beyond the scope of this article.
____________________
Chi-Squared Test of Fit and Sample Size — A Comparison between a Random Sample Approach and a Chi-Square Value Adjustment Method
Daniel Bergh
Abstract
Chi-square statistics are commonly used for tests of fit of measurement models. Chi-square is also sensitive to sample size, which is why several approaches to handling large samples in the analysis of fit have been developed. One strategy for handling the sample size problem may be to adjust the sample size in the analysis of fit. An alternative is to adopt a random sample approach. The purpose of this study was to analyze and compare these two strategies using simulated data.
Given an original sample size of 21,000, for reductions of sample size down to the order of 5,000, the adjusted sample size function works as well as the random sample approach. In contrast, when adjustments are applied to smaller sample sizes, the adjustment function is less effective at approximating the chi-square value for an actual random sample of the relevant size. Hence, fit is exaggerated and misfit underestimated when the adjusted sample size function is used. Although there are large differences in chi-square values between the two approaches at smaller sample sizes, the inferences based on the p-values may be the same.
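The two strategies can be sketched under an explicit assumption: the adjustment is taken to rescale the chi-square statistic in proportion to the ratio of target to original sample size. The abstract does not give the exact form of the adjustment function, so this form, the function names and the numbers are illustrative only.

```python
import random

def adjusted_chi_square(chi_square, n_original, n_target):
    """Rescale a fit chi-square from n_original to n_target persons (assumed proportional form)."""
    if n_original <= 0 or n_target <= 0:
        raise ValueError("sample sizes must be positive")
    return chi_square * n_target / n_original

def random_subsample(person_ids, n_target, seed=None):
    """Simple random sample of persons without replacement; the model is then refitted on it."""
    rng = random.Random(seed)
    return rng.sample(person_ids, n_target)

# Hypothetical chi-square of 840 from the full sample of 21,000, scaled to 5,000.
print(adjusted_chi_square(840.0, 21000, 5000))  # 200.0
```

This may be one reason the two approaches diverge at small target sizes, under the assumed form. If a statistic is roughly its degrees of freedom plus a misfit term that grows with N, proportional rescaling also shrinks the degrees-of-freedom part. The adjusted value can then fall below what a genuine random subsample of that size would produce.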
____________________
Properties of the Tampa Scale for Kinesiophobia across Workers with Different Pain Experiences and Cultural Backgrounds: A Rasch Analysis
M. B. Jørgensen, E. Damsgård, A. Holtermann, A. Anke, K. Søgaard, and C. Røe
Abstract
The main aim of this study was to evaluate whether the construct validity of the Tampa Scale for Kinesiophobia (TSK) is consistent with respect to its scaling properties, unidimensionality and targeting among workers with different levels of pain. The 311 participating Danish workers reported kinesiophobia on the TSK (13-statement version) and the number of days with pain during the past year (less than 8 days, less than 90 days and greater than 90 days). A Rasch analysis was used to evaluate the measurement properties of the TSK in the workers across pain levels, ages, genders and ethnicities. The TSK did not fit the Rasch model, but removing one item resolved the misfit. Invariance was found across pain levels, ages and genders. Thus, with a few modifications, the TSK was shown to capture a unidimensional construct of fear of movement in workers with different pain levels, ages, and genders.
____________________
Vol. 16, No. 3 Fall 2015
Comparison of Models and Indices for Detecting Rater Centrality
Edward W. Wolfe and Tian Song
Abstract
To date, much of the research concerning rater effects has focused on rater severity/leniency. Consequently, other potentially important rater effects have largely been ignored by those conducting operational scoring projects. This simulation study compares four rater centrality indices (rater fit, residual-expected correlations, rater slope, and rater threshold variance) in terms of their Type I and Type II error rates under varying levels of centrality magnitude, centrality pervasiveness, and rating scale construction when each of four latent trait models is fitted to the simulated data (the Rasch rating scale and partial credit models and the generalized rating scale and partial credit models). Results indicate that the residual-expected correlation may be the most appropriately sensitive index of rater centrality under most conditions.
____________________
Measuring Psychosocial Impact of CBRN Incidents by the Rasch Model
Stef van Buuren and Diederik J. D. Wijnmalen
Abstract
An effective response to chemical, biological, radiological and nuclear (CBRN) incidents requires capability planning based upon an assessment of risks in which all types of possible consequences of such incidents have been taken into account. CBRN incidents can have a wide range of consequences, of which psychological and social effects (possibly leading to societal unrest) are often pointed out as very likely to occur. The goal of our research was to establish an objective measurement of the psychosocial impact of CBRN incidents using the Rasch model. We created a list of eleven items, each of which tapped into an aspect of the psychosocial impact of incidents. Eleven judges scored ten CBRN scenarios on this list of items. Two items needed to be removed due to misfit. The resulting nine-item test fitted the Rasch model well. Three items showed mild forms of differential item functioning but were retained in the test. The reliability of the instrument was 0.83. The scale can be used to quantitatively measure the inherently qualitative nature of the psychosocial impact of CBRN incident scenarios in order to better compare this type of impact with quantitative impact types such as number of casualties, costs, etc. Administration of the scale is simple and takes about one minute per scenario. We recommend wider use of the Rasch model for improving the quality of total impact measurement when both qualitative and quantitative types of impact must be considered.
____________________
Using the Partial Credit Model to Evaluate the Student Engagement in Mathematics Scale
Micela Leis, Karen M. Schmidt, and Sara E. Rimm-Kaufman
Abstract
The Student Engagement in Mathematics Scale (SEMS) is a self-report measure that was created to assess three dimensions of student engagement (social, emotional, and cognitive) in mathematics based on a single day of class. In the current study, the SEMS was administered to a sample of 360 fifth graders from a large Mid-Atlantic district. The Rasch partial credit model (PCM) was used to analyze the psychometric properties of each sub-dimension of the SEMS. Misfitting items were removed from the final analysis. In general, items represented a range of engagement levels. Results show that the SEMS is an effective measure for researchers and practitioners to assess upper elementary school students’ perception of their engagement in math. The paper concludes with several recommendations for researchers considering using the SEMS.
____________________
Estimation of Parameters of the Rasch Model and Comparison of Groups in Presence of Locally Dependent Items
Mohand-Larbi Feddag, Myriam Blanchin, Véronique Sébille, and Jean-Benoit Hardouin
Abstract
Measurement specialists routinely assume that examinee responses to test items are independent of one another. However, previous research has shown that many tests contain item dependencies, and not accounting for these dependencies leads to misleading estimates of item and person parameters. In this paper, marginal maximum likelihood estimation of the Rasch model under violation of local independence is studied. The power of the Wald test of a group effect parameter on the latent traits in cross-sectional studies is examined under both the local independence and the local item dependence assumptions. The results are illustrated with simulation studies.
____________________
Help Me Tell My Story: Development of an Oral Language Measurement Scale
Patrick Charles, Michelle Belisle, Kevin Tonita, and Julie Smith
Abstract
Help Me Tell My Story (HMTMS) is an assessment tool that uses a holistic approach and an electronic application to measure the oral language development of pre-kindergarten and kindergarten children. It includes access to an online portal that provides meaningful information to caregivers, educators and administrators. This study examines the psychometric characteristics of one of the five questionnaires included in the HMTMS assessment, which explores the ability of children to talk to family members, friends and teachers. It uses the unrestricted (partial credit) version of the Rasch model to analyse data from 844 children. Results indicate that, although we obtained a modest reliability index, the scale’s psychometric characteristics are within effective ranges, as no response dependency was found and the items constitute a unidimensional scale. There is no differential item functioning (DIF) related to gender, grade level or ethnicity on this scale. Thus, this assessment tool is appropriate for use in early years oral language measurement.
____________________
A Dual-purpose Rasch Model with Joint Maximum Likelihood Estimation
Xiao Luo and John T. Willse
Abstract
In practice, there is a growing need to report both an overall score for ranking/decision-making purposes and subscores for diagnostic purposes. The Rasch model with subdimensions (RMS) was employed in this study to address this problem. A joint maximum likelihood estimation (JMLE) procedure was proposed to obtain computationally efficient estimates for this model. A simulation study was conducted to investigate the properties of this model with the JMLE procedure under conditions of varying sample size, test length and subdimension loading structure. Results indicated that, in general, parameters were estimated well using the JMLE procedure. The item parameters and overall ability parameters in the RMS were consistent with the parameters obtained from the Rasch model.
____________________
Using Rasch Analysis to Evaluate Accuracy of Individual Activities of Daily Living (ADL) and Instrumental Activities of Daily Living (IADL) for Disability Measurement
Bruce Friedman and Yanen Li
Abstract
Our study objectives were to examine the accuracy of individual activities of daily living (ADLs) and instrumental ADLs (IADLs) for disability measurement, and to determine whether dependence or difficulty is more useful for disability measurement. We analyzed data from 499 patients with 2+ ADLs or 3+ IADLs who participated in a home visiting nurse intervention study, and whose function had been assessed at study baseline and at 22 months. Rasch analysis was used to evaluate the accuracy of 24 individual ADL and IADL items. The individual items differed in the amount of information provided in measuring functional disability along the range of disability, providing much more information in (usually) one part of the range. While nearly all of the Item Information Curves (IICs) for the ADL dependence, IADL difficulty, and IADL dependence items were unimodal with one information peak each, the IICs for ADL difficulty exhibited a bimodal pattern with two peaks. Which of the individual items performed better in disability measurement varied by the extent of functional disability (i.e., by how disabled the patients were). The information peaks of most ADLs and many IADLs rise or drop steeply over a relatively short distance. Thus, whether dependence or difficulty is superior often changes very quickly along the disability continuum. There was considerable heterogeneity in which individual items provided the most and the least information at the three points of interest examined across the disability range (–2 SD units, mean, +2 SD units). While the disability region (low, medium, and high disability) for which each individual item provided the most information remained quite stable between baseline and 22 months for ADL difficulty, IADL difficulty, and IADL dependence, relatively large shifts occurred for ADL dependence items. At the disability mean, dependence items offered more information for assessment than difficulty items. While ADLs also provided more information at –2 and +2 SD units, there was more heterogeneity at these points for IADLs, with little difference between dependence and difficulty assessment for some IADLs.
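Item information curves can be sketched for a Rasch partial credit item, whose information at ability θ equals the variance of the item score at θ. The abstract does not say which Rasch model or threshold values the authors used. The thresholds below are hypothetical. They show how widely separated thresholds can produce a bimodal curve, like the one reported for ADL difficulty, while closely spaced thresholds give a single peak.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities for a partial credit item with step difficulties `thresholds`."""
    # Category k has log-numerator sum_{j<=k} (theta - delta_j); category 0 has 0.
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def item_information(theta, thresholds):
    """Information of a Rasch (partial credit) item: the variance of the item score at theta."""
    probs = pcm_probabilities(theta, thresholds)
    mean_score = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean_score) ** 2 * p for k, p in enumerate(probs))

# Hypothetical three-category item with widely separated thresholds: two information peaks.
print([round(item_information(t, [-3.0, 3.0]), 3) for t in (-3.0, 0.0, 3.0)])
# [0.252, 0.091, 0.252]
```

With a single threshold, the function reduces to the dichotomous information P(1 - P), which peaks at 0.25 where θ equals the item difficulty.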
____________________
Vol. 16, No. 4 Winter 2015
Using the Rasch Model to Measure the Extent to which Students Work Conceptually with Mathematics
Eivind Kaspersen
Abstract
Differences between working conceptually and procedurally with mathematics are well documented. In short, working procedurally can be characterized as learning and applying ‘rules without reason.’ Working conceptually, in contrast, means creating and applying a web of knowledge. To continue this line of research, an instrument is desirable that can measure the level of conceptual work and that is based on the basic requirements of measurement. Accordingly, this paper presents a Rasch-calibrated instrument that measures the extent to which students work conceptually with mathematics. From a sample of 133 student teachers and 185 Civil Engineering students, 20 items were found to be productive for measurement.
____________________
Rasch Model Parameter Estimation via the Elastic Net
Jon-Paul Paolino
Abstract
In this paper we investigate a novel method, penalized joint maximum likelihood estimation (PJMLE), for estimating the parameters of the Rasch model (Rasch, 1960). Here we use joint maximum likelihood estimation (JMLE) along with elastic net penalization using the glmnet package (Friedman, Hastie, and Tibshirani, 2010) in R to obtain estimates of item difficulties and examinee abilities. Through simulation we compared the accuracy of PJMLE with that of conditional maximum likelihood estimation (CMLE), marginal maximum likelihood estimation (MMLE), and marginal Bayes modal estimation (MBME). We show that PJMLE successfully estimates the parameters of a Rasch model when the number of items is greater than the number of examinees, a situation in which traditional estimation techniques fall short. We further show that PJMLE performs similarly to traditional techniques when the number of examinees is greater than the number of assessment items, without specifying a mixing distribution or a prior distribution.
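The sketch below shows elastic-net penalized JMLE for the dichotomous Rasch model in pure Python. It is not the author's implementation. glmnet fits a penalized logistic regression by coordinate descent and scales the log-likelihood by the number of observations. This sketch instead uses plain proximal gradient descent on the unscaled log-likelihood and penalizes every person and item parameter. The tuning values `lam` and `alpha`, the function names and the toy responses are illustrative assumptions.

```python
import math

def soft_threshold(value, amount):
    """Proximal operator of amount * |value|: shrink value toward zero by amount."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def pjmle_rasch(responses, lam=0.1, alpha=0.5, iterations=500):
    """Elastic-net penalized JML estimates for the dichotomous Rasch model.

    Minimizes  -loglik + lam * sum(alpha * |w| + (1 - alpha) / 2 * w**2)
    over all person abilities theta and item difficulties b (logit = theta - b)
    by proximal gradient descent. `responses` is a complete persons-by-items
    list of 0/1 scores.
    """
    n_persons, n_items = len(responses), len(responses[0])
    theta = [0.0] * n_persons
    b = [0.0] * n_items
    ridge = lam * (1 - alpha)
    # Step = 1 / (Lipschitz bound of the smooth part's gradient) for complete data.
    step = 1.0 / (0.25 * (n_persons + n_items) + ridge)
    for _ in range(iterations):
        grad_theta = [ridge * t for t in theta]  # gradient of the ridge term
        grad_b = [ridge * d for d in b]
        for n, row in enumerate(responses):
            for i, x in enumerate(row):
                p = 1.0 / (1.0 + math.exp(-(theta[n] - b[i])))
                grad_theta[n] += p - x  # d(-loglik)/d theta_n
                grad_b[i] += x - p      # d(-loglik)/d b_i
        # Gradient step on the smooth part, then the L1 proximal step (soft-thresholding).
        theta = [soft_threshold(t - step * g, step * lam * alpha)
                 for t, g in zip(theta, grad_theta)]
        b = [soft_threshold(d - step * g, step * lam * alpha)
             for d, g in zip(b, grad_b)]
    return theta, b

# Hypothetical toy data: 6 persons x 4 items, items ordered from easy to hard.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
theta, b = pjmle_rasch(responses)
print([round(d, 2) for d in b])
```

The penalty keeps every estimate finite. Persons with perfect or zero scores, who have no finite estimate under unpenalized JMLE, therefore still receive estimates. In the toy data, these are the third and sixth rows.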
____________________
A Rasch Analysis of the KeyMath-3 Diagnostic Assessment
Helyn Kim, Karen M. Schmidt, William M. Murrah, Claire E. Cameron, and David Grissmer
Abstract
Effectively assessing children’s academic development can help school professionals make placement decisions and prepare appropriate instructional supports. The KeyMath-3 Diagnostic Assessment (Connolly, 2008) is a widely used assessment of children’s mathematical abilities; however, despite its wide use, the measurement properties of the KeyMath-3 DA have not been examined beyond the development and standardization phases. The current study conducted a Rasch analysis of the Basic Concepts content area of the KeyMath-3 DA in a diverse sample of 308 young children to assess the quality of the assessment. Rasch analytic procedures examined unidimensionality, item and person fit statistics, reliability, and item hierarchy. Misfitting items were further examined, and response patterns were modified. In general, results show that the Basic Concepts subscale is a good measure of the underlying construct of young children’s understanding of basic concepts in mathematics. Implications are discussed.
____________________
Psychometric Properties of the Attitudes toward Physical Activity Scale: A Rasch Analysis Based on Data From Five Locations
Magdalena Mo Ching Mok, Ming Kai Chin, Shihui Chen, Arunas Emeljanovas, Brigita Mieziene, Michal Bronikowski, Ida Laudanska-Krzeminska, Ivana Milanovic, Milan Pasic, Govindasamy Balasekaran, Kia Wang Phua, and Daga Makaza
Abstract
This article describes the development and validation of the Attitudes toward Physical Activity Scale (APAS), which measures the attitudes, beliefs, and self-efficacy of primary school children toward physical activity. The framework included: physical fitness, self-efficacy, personal best goal orientation in physical activity, interest in physical activity, importance of physical activity, benefits of physical activity, contributions of video exercise to learning in school subjects, contributions of video exercise to learning about health, and environmental support. The sample comprised 630 school students between grades 1 and 7 from five countries, namely Lithuania (29%), Poland (26%), Serbia (19%), Singapore (16%) and Zimbabwe (11%). Rasch analysis found empirical evidence in support of the measurement validity of the APAS in terms of Rasch item reliabilities, unidimensionality, effectiveness of response categories, and absence of gender differential item functioning (DIF). The validation of the APAS according to the Rasch model means that a dependable tool has been established for gauging the effectiveness of intervention programmes on the physical activity of primary school children in classroom settings in various locations around the world.
____________________
Development and Analysis of a Scale for Measuring Teachers’ Sense of Efficacy in Urban Schools (SEUS)
Mary Garner, Julie Kokan, Kathy Annis, Mark Baker, Maggie Phillips, Catherine Head, Doug Hearrington, Daniel Yanosky, and Marie Holbein
Abstract
Research on teacher self-efficacy has a long history that can be traced back to Bandura (1986), and self-efficacy has been shown to be linked to teacher performance. This article presents evidence for teacher self-efficacy in urban schools, a construct that is separate from but related to the more general construct of teacher self-efficacy. An instrument was developed and validated by a team of university faculty, urban teachers, and school administrators. The Teachers’ Sense of Efficacy in Urban Schools (SEUS) is a 15-item instrument designed to address factors that are important for success in teaching in an urban environment, including working effectively with English language learners, students with disabilities, and economically disadvantaged students, as well as cultural diversity, literacy, technology, differentiation, and assessment data. The present study analyzes the SEUS on multiple levels, using the Rasch partial credit model.
____________________
Testing the Multidimensionality in Teacher Interpersonal Behavior: Validating the Questionnaire on Teacher Interaction Using the Rasch Measurement Model
Gavin W. Fulmer and Quek Choon Lang
Abstract
This study investigated the perceptions of 1235 students of their form teachers’ interpersonal behaviors across 40 classrooms in 24 Singaporean secondary schools. The 32-item Questionnaire on Teacher Interaction (QTI) was administered to obtain initial quantitative data on teacher behaviors as perceived by the students in these classrooms. The eight scales of the QTI are: Leadership, Helping/Friendly, Understanding, Student Responsibility/Freedom, Uncertain, Dissatisfied, Admonishing, and Strict. The Rasch measurement model was used to estimate students’ traits with respect to each subscale, and then to examine the proposed multidimensional structure of the QTI. Findings demonstrate overall good fit of the responses to the Rasch model for each subscale. Findings also support the hypothesized relationships among the eight dimensions proposed for the QTI.
____________________
Planning a Study for Testing the Rasch Model given Missing Values due to the use of Test-booklets
Takuya Yanagida, Klaus D. Kubinger, and Dieter Rasch
Abstract
Although achievement tests in psychological and educational contexts are very often calibrated with the Rasch model, data sampling is rarely designed according to statistical foundations. However, Kubinger, Rasch, and Yanagida (2009, 2011) suggested an approach for determining sample size according to given Type-I and Type-II risks and a certain effect of model contradiction when testing the Rasch model. The approach uses a three-way analysis of variance design with mixed classification. So far, their simulation studies have dealt with complete data, meaning every examinee is administered all of the items of an item pool. The simulation study presented in this paper deals with the practically relevant case, particularly for large-scale assessments, in which items are presented in several test booklets. As a consequence, there are missing values by design. The question, therefore, is whether the approach works in this case as well. Besides the fact that the data are not normally distributed but dichotomous (an examinee either solves an item or fails to solve it), each cell of the three-way analysis of variance design contains only a single entry, if any at all, due to missing values. Hence, the required distribution of the test statistic may not hold, in contrast to the case with no missing values. The result of our simulation study, although it applies only to a very specific scenario, is that the approach does indeed work: whether test booklets are used or every examinee is administered all of the items makes no difference to the actual Type-I risk or to the power of the test, given nearly the same amount of information from examinees per item. However, as the results are limited to a specific scenario, we currently recommend that interested researchers simulate their own scenario in advance.
____________________
The Reliability and Validity of the Power-Load-Margin Inventory: A Rasch Analysis
Patrick C. Hardigan, Stanley R. Cohen, and Kathleen P. Hagen
Abstract
Margin is a function of the relationship of stress to strength. The greater the margin, the more likely students are to navigate academic structures successfully. This study examined the psychometric properties of a newly created instrument designed to measure margin: the Power-Load-Margin Inventory (PLMI). The PLMI was created using eight domains: (A) Student’s aptitude and ability, (B) Course structure, (C) External motivation, (D) Student health, (E) Instructor style, (F) Internal motivation, (G) Life opportunities, and (H) University support structure. A three-point response scale was used to measure the domains: (1) stress, (2) neither stress nor strength, and (3) strength. The PLMI was administered to 586 medical, dental, and pharmacy students. A Rasch rating scale model was used to examine the psychometric properties of the PLMI. The PLMI demonstrated acceptable psychometric properties for use with pharmacy, dental, and medical students. The PLMI’s primary weakness was the reliability of its subscales, which we attribute to the small number of items per subscale.
____________________