Journal of Applied Measurement
P.O. Box 1283
Maple Grove, MN 55311
Volume 18, 2017, Article Abstracts
Vol. 18, No. 1, Spring 2017
Constructing an Outcome Measure of Occupational Experience: An Application of Rasch Measurement Methods
Brett Berg, Karen Atler, and Anne G. Fisher
Abstract
Rasch methods were used to evaluate and further develop the Daily Experiences of Pleasure, Productivity,
and Restoration Profile (PPR Profile) into a health outcome measure of occupational experience. Analyses of
263 participant PPR Profiles focused on rating scale structure, dimensionality, and reliability. All rating scale
categories increased with the intended meaning of the scales, but only 20 of the 21 category measures fit the
Rasch rating scale model (RRSM). Several items also did not fit the RRSM and results of residual principal
components analyses suggested possible second dimensions in each scale. More importantly, reliability coefficients
were very low and participants could not be separated into more than one group as demonstrated by low
person separation indices. The authors offer several recommendations for the next steps in the development of
the PPR Profile as a health outcome measure of occupational experience.
____________________
Comparing Imputation Methods for Trait Estimation Using the Rating Scale Model
Rose E. Stafford, Christopher R. Runyon, Jodi M. Casabianca, and Barbara G. Dodd
Abstract
This study examined the performance of four methods of handling missing data for discrete response options on
a questionnaire: (1) ignoring the missingness (using only the observed items to estimate trait levels); (2) nearest-neighbor hot deck imputation; (3) multiple hot deck imputation; and (4) semi-parametric multiple imputation.
A simulation study examining three questionnaire lengths (41-, 20-, and 10-item) crossed with three levels of
missingness (10%, 25%, and 40%) was conducted to see which methods best recovered trait estimates when
data were missing completely at random and the polytomous items were scored with Andrich’s (1978) rating
scale model. The results showed that ignoring the missingness and semi-parametric imputation best recovered
known trait levels across all conditions, with the semi-parametric technique providing the most precise trait
estimates. This study demonstrates the power of specific objectivity in Rasch measurement, as ignoring the
missingness leads to generally unbiased trait estimates.
____________________
Rasch Analysis of a Behavioral Checklist for the Assessment of Pain in Critically Ill Adults
Christophe Chénier, Gilles Raîche, Nadine Talbot, Bianca Carignan, and Céline Gélinas
Abstract
Patients hospitalized in the intensive care unit (ICU) are often unable to report their pain, which is a problem
since untreated pain is associated with negative health outcomes. The use of behavioral pain scales is recommended for detecting the presence of pain in this vulnerable population. Previous validation studies have used classical techniques, and several psychometric properties remain unknown. In this paper, data obtained from a behavioral checklist of dichotomized items were used to evaluate the instrument's dimensionality, its
construct validity and its capacity to distinguish between levels of pain by using Rasch measurement. A sample
of 239 ICU patients was used to collect the data. Results showed that, while unidimensionality was acceptable,
concerns remained about local independence and item fit indices: a third of the items showed misfit. Finally, while item reliability was high (0.97), person reliability was rather low (0.62), and only 1.28 strata of pain could be distinguished. The narrow range of pain levels in the sample could explain this poor
performance and further study is needed, with a sample exhibiting a wider range of pain levels.
____________________
Scale Anchoring with the Rasch Model
Adam E. Wyse
Abstract
Scale anchoring is a method for giving additional meaning to scores at different points along a score scale by identifying representative items associated with those scores. These items are then analyzed to write statements of what types of performance can be expected of a person with a given score, helping test takers and other stakeholders better understand what it means to achieve the different scores. This article
provides simple formulas that can be used to identify possible items to serve as scale anchors with the Rasch
model. Specific attention is given to practical considerations and challenges that may be encountered when applying
the formulas in different contexts. An illustrative example using data from a medical imaging certification
program demonstrates how the formulas can be applied in practice.
____________________
Evaluating Model-Data Fit by Comparing Parametric and Nonparametric Item Response Functions: Application of a Tukey-Hann Procedure
Jeremy Kyle Jennings and George Engelhard, Jr.
Abstract
This study describes an approach for examining model-data fit for the dichotomous Rasch model using Tukey-Hann item response functions (TH-IRFs). The procedure proposed in this paper is based on an iterative version of a smoothing technique proposed by Tukey (1977) for estimating nonparametric item response functions (IRFs).
A root integrated squared error (RISE) statistic (Douglas and Cohen, 2001) is used to compare the TH-IRFs to
the Rasch IRFs. Data from undergraduate students at a large university are used to demonstrate this iterative
smoothing technique. The RISE statistic is used to compare the item response functions and assess model-data fit. A comparison between the residual-based Infit and Outfit statistics and the RISE statistics is also examined.
The results suggest that the RISE statistic and TH-IRFs provide a useful analytical and graphical approach for
evaluating item fit. Implications for research, theory and practice related to model-data fit are discussed.
____________________
Rasch Derived Teachers’ Emotions Questionnaire
Kristin L. K. Koskey, Renee R. Mudrey, and Wondimu Ahmed
Abstract
The purpose of this research was to estimate the reliability of the scores produced from, and the validity of the inferences drawn from, the revised 90-item Teachers' Emotion Questionnaire, which consists of three measures: frequency of emotional expressivity, self-efficacy for regulation of emotional expressivity when teaching, and self-efficacy for regulation of context-specific emotional expressivity. There is a lack of instruments assessing teachers' regulation and communication of their emotions. One hundred seventeen practicing teachers participated
in this study at Time 1 and 46 at Time 2. Rasch rating scale analyses indicated sufficient item and person
separation and reliability and some support for the construct validity of the inferences drawn from the measures.
Test re-test reliability for the person estimates was supported for all three measures over a four-week period:
r = .592, p < .001, r = .473, p < .01, and r = .641, p < .001, respectively. Concurrent validity for the self-efficacy
for regulation of emotional expressivity when teaching measure with the re-appraisal and suppression sub-scales
on the Emotional Regulation Questionnaire (Gross and John, 2003) was supported at Time 1. Modifications to
rating scales and future directions for assessing teachers’ emotions based on these results are discussed.
____________________
Measuring Alcohol Marketing Engagement: The Development and Psychometric Properties of the Alcohol Marketing Engagement Scale
Angela Robertson, David T. Morse, Kristina Hood, and Courtney Walker
Abstract
Ample evidence exists in support of the influence of media, both traditional and electronic, on perceptions
and engagement with alcohol marketing. We describe the development, calibration, and evidence for technical
quality and utility for a new measure, the Alcohol Marketing Engagement Scale. Using two samples of college
undergraduates (n1 = 199, n2 = 732), we collected field test responses to a total of 13 items. Initial support for
scale validity is presented via correlations with attributes previously shown to be related to alcohol engagement.
While the joint map of estimated scale locations of items and respondents indicates the need for further scale
development, the results of the present analyses are promising. Implications for use in research are discussed.
____________________
Vol. 18, No. 2, Summer 2017
Developing an Engineering Design Process Assessment using Mixed Methods
Stefanie A. Wind, Meltem Alemdar, Jeremy A. Lingle, Jessica D. Gale, and Roxanne A. Moore
Abstract
Recent reforms in science education worldwide include an emphasis on engineering design as a key component of student proficiency in the Science, Technology, Engineering, and Mathematics disciplines. However, relatively little attention has been directed to the development of psychometrically sound assessments for engineering. This study demonstrates the use of mixed methods to guide the development and revision of K-12 Engineering Design Process (EDP) assessment items. Using results from a middle-school EDP assessment, this study illustrates the combination of quantitative and qualitative techniques to inform item development and revisions. Overall conclusions suggest that the combination of quantitative and qualitative evidence provides an in-depth picture of item quality that can be used to inform the revision and development of EDP assessment items. Researchers and practitioners can use the methods illustrated here to gather validity evidence to support the interpretation and use of new and existing assessments.
____________________
Psychometric Validation of the 10-item Connor-Davidson Resilience Scale
John Ehrich, Angela Mornane, and Tim Powers
Abstract
Resilience is the personality trait of having positive dispositions that enable individuals to cope with stressful situations. Hence, a reliable resilience scale can provide useful information for understanding and treating individuals suffering from stress and trauma. The 10-item Connor-Davidson Resilience Scale (CD-RISC-10) is a candidate scale. However, very little psychometric research has been conducted on this scale and, moreover, psychometric analyses to date have not been conclusive. To obtain further evidence of the scale's psychometric properties, we tested the CD-RISC-10 on 288 adult Education major students at an Australian university using both traditional (factor analytic) and modern (Rasch) measurement approaches. Factor analyses indicated good psychometric functioning of the scale. However, Rasch modelling revealed evidence of item misfit and multiple dimensions. Optimal performance was achieved after the removal of two misfitting items, indicating a well-functioning 8-item scale.
____________________
The Use of Differential Item Functioning (DIF) Analysis to Distinguish Between Similar Job Roles
Nicole M. Risk and James R. Fidler
Abstract
Two primary roles in the clinical laboratory are those of Medical Technologist (MT) and Medical Laboratory Technician (MLT). Job analyses, which form the foundation of test blueprints employed for credentialing practitioners, suggest a reasonable amount of overlap in the tasks performed by MTs and MLTs. However, credentialing assessments must clearly distinguish between the two roles and ensure that they address competencies appropriate to each practice designation. Differential item functioning (DIF) analysis techniques were applied to explore and differentiate the two laboratory practitioner job roles as an aspect of examination development. Results from the analysis suggest a high degree of similarity between these two groups in terms of scope of tasks performed. Subject matter expert interpretation suggests that the assessments are more appropriately differentiated by underlying level of task knowledge rather than scope of tasks. DIF may be applicable to other exploratory investigations that seek to differentiate job roles that comprise common competencies.
____________________
PSM7 and PSM8: Validating Two Problem-solving Measures
Jonathan D. Bostic, Toni A. Sondergeld, Timothy Folger, and Lance Kruse
Abstract
New mathematics standards were adopted broadly across the United States of America between 2011 and 2013. Problem solving is a central facet of these new standards. Given new standards and the prominence of mathematical problem solving, there is a need for valid and reliable assessments that measure students' abilities related to those standards. Moreover, Rasch measurement techniques support psychometric analyses during validation studies, effectively measuring students' and items' properties in ways not afforded by true score theory. This manuscript builds upon past research (see Bostic and Sondergeld, 2015a, 2015b) with a validity study of two related problem-solving measures for grades seven and eight. Results from this validation study indicated that the problem-solving measures for grades seven and eight had sufficient evidence for their use.
____________________
Infit and Outfit: Interpreting Statistical Significance and Magnitude of Misfit in Conjunction
Christine E. DeMars
Abstract
In many areas of statistics it is common practice to present both a statistical significance test and an effect size. In contrast, for the Infit and Outfit indices of item misfit, it has historically been common to focus on either the mean square (MS; an index of the magnitude of misfit) or the statistical significance, but not both. If the statistical significance and effect size are to be used together, it is important not only that the Type I error rate matches the nominal alpha level, but also that, for any given magnitude of misfit, the expected value of the MS is independent of sample size. This study confirmed that the average MS for several simulated misfitting items was nearly the same for large and small samples, although necessarily the variance depended on sample size. Thus, if the item fit is statistically significant, the MS appears to be a reasonable index for judging the magnitude of the misfit in the sample, although one must recognize that the estimate of the magnitude will be less stable in small samples, as is true for all effect sizes.
____________________
Measuring Health-related Transportation Barriers in Urban Settings
Sara M. Locatelli, Lisa K. Sharp, Saming T. Syed, Shikhi Bhansari, and Ben S. Gerber
Abstract
Access to reliable transportation is important for people with chronic diseases considering the need for frequent medical visits and for medications from the pharmacy. Understanding of the extent to which transportation barriers, including lack of transportation, contribute to poor health outcomes has been hindered by a lack of consistency in measuring or operationally defining “transportation barriers.” The current study uses the Rasch measurement model to examine the psychometric properties of a new measure designed to capture types of transportation and associated barriers within an urban context. Two hundred forty-four adults with type 2 diabetes were recruited from within an academic medical center in Chicago and completed the newly developed transportation questions as part of a larger National Institutes of Health funded study (ClinicalTrials.gov identifier: NCT01498159). Results suggested a two subscale structure that reflected 1) general transportation barriers and 2) public transportation barriers.
____________________
General Ability or Distinct Scholastic Aptitudes? A Multidimensional Validity Analysis of a Psychometric Higher-Education Entrance Test
Dvir Kleper and Noa Saka
Abstract
The present study explored the construct validity of the Psychometric Entrance Test (PET) for higher education in Israel, as represented by the factorial structure of the scholastic aptitudes it measures, and focused on whether the test presents a single measure of overall ability or a measure of the fields of knowledge that are being tested. In Study 1, we used Exploratory Factor Analysis to generate hypotheses regarding the factorial structure of the test. In Study 2, Confirmatory Factor Analysis was carried out to compare competing models that were constructed based on theoretical considerations and the results of Study 1. The findings indicated that a two-layered hierarchical model, encompassing both a General Ability factor and three scholastic domain-specific factors (Verbal Reasoning, Quantitative Reasoning, and English), showed the best fit. Within the framework of the CFA, several statistical procedures were applied to assess reliability (indicator and complexity) and validity (convergent and divergent).
____________________
Factorial and Item-Level Invariance of a Principal Perspectives Survey: German and U.S. Principals
Chuang Wang, Dawson R. Hancock, and Ulrich Müller
Abstract
This study examined the factorial and item-level invariance of a survey of principals’ job satisfaction and perspectives about reasons and barriers to becoming a principal with a sample of US principals and another sample of German principals. Confirmatory factor analysis (CFA) and differential item functioning (DIF) analysis were employed at the test and item level, respectively. A single group CFA was conducted first, and the model was found to fit the data collected. The factorial invariance between the German and the US principals was tested through three steps: (a) configural invariance; (b) measurement invariance; and (c) structural invariance. The results suggest that the survey is a viable measure of principals’ job satisfaction and perspectives about reasons and barriers to becoming a principal because principals from two different cultures shared a similar pattern on all three constructs. The DIF analysis further revealed that 22 out of the 28 items functioned similarly between German and US principals.
____________________
Vol. 18, No. 3, Fall 2017
A Facets Analysis of Analytic vs. Holistic Scoring of Identical Short Constructed-Response Items: Different Outcomes and Their Implications for Scoring Rubric Development
Milja Curcin and Ezekiel Sweiry
Abstract
In scoring short constructed-response items it may be possible to apply different rubric types depending on the trait of achievement assessed. Rating scale and partial credit many-facet Rasch models (MFRM) were used to investigate whether levels-based (holistic) and hybrid (analytic) scoring rubrics functioned interchangeably when scoring short-response English reading comprehension test items. Whereas most research in similar contexts has focused solely on rater reliability, the use of MFRM in this study enabled examination of both the reliability and the rating scale functioning of the scoring rubrics in parallel. It also enabled consideration of their effects on item and examinee parameters. This more comprehensive approach allowed the findings to be linked to general notions of rubric construct-relevance and score interpretation, and demonstrated an approach to generating evidence for a more balanced consideration of the advantages and disadvantages of each rubric in terms of both reliability and validity.
____________________
Q-Matrix Optimization Based on the Linear Logistic Test Model
Lin Ma and Kathy E. Green
Abstract
This study explored optimization of item-attribute matrices with the linear logistic test model (Fischer, 1973), with optimal models explaining more variance in item difficulty due to identified item attributes. Data were 8th-grade mathematics test item responses from two TIMSS 2007 booklets. The study investigated three categories of attributes (content, cognitive process, and comprehensive cognitive process) at two grain levels (larger, smaller) and also compared results with random attribute matrices. The proposed attributes accounted for most of the variance in item difficulty for the two assessment booklets (81% and 65%). The variance explained by the content attributes was small (13%–31%), while the comprehensive cognitive process attributes explained much more variance than either the content or the cognitive process attributes. The variances explained at the two grain levels were similar to each other. However, the attributes did not predict the item difficulties of the two assessment booklets equally.
____________________
Mapping a Data Modeling and Statistical Reasoning Learning Progression using Unidimensional and Multidimensional Item Response Models
Robert Schwartz, Elizabeth Ayers, and Mark Wilson
Abstract
Data modeling is an approach to teaching basic concepts of data and statistics in middle school that helps students transform their initial, and often misguided, understandings of variability and chance into forms of reasoning that coordinate chance with variability, by designing learning environments that allow students to invent and revise models. The Assessing Data Modeling and Statistical Reasoning (ADMSR) project is a collaborative effort between measurement and learning specialists that has developed a curricular and embedded assessment system based on a framework of seven constructs that describe the elements of statistical learning. Taken together, the seven constructs form a learning progression. There are different ways to conceive and measure learning progressions. The approach used by the ADMSR project followed the “four building blocks” approach outlined by the Berkeley Evaluation and Assessment Research (BEAR) Center and the BEAR Assessment System. The final building block of this approach involves the application of a measurement model. This paper focuses on the application of unidimensional and multidimensional item response theory (IRT) measurement models to the data from the ADMSR project. Unidimensional IRT models are applied to aid in construct development and validation, to see whether the proposed theory of development presented by the construct map is supported by the results from an administration of the instrument. Multidimensional IRT measurement models are applied to examine the relationships between the seven constructs in the ADMSR learning progression. When applying the multidimensional model, specific links between levels of the constructs are analyzed across constructs after the application of a technique to align the seven dimensions.
____________________
Psychometric Properties of the Classroom Assessment Scoring System (Pre-K): Implications for Measuring Interaction Quality in Diverse Early Childhood Settings
Dan Cloney, Cuc Nguyen, Raymond J Adams, Collette Tayler, Gordon Cleveland, and Karen Thorpe
Abstract
The Classroom Assessment Scoring System (CLASS) is an observational instrument assessing the nature of everyday interactions in educational settings. The instrument has strong theoretical groundings; however, prior empirical validation of the CLASS has exposed some psychometric weaknesses. Further, the instrument has not been the subject of psychometric analysis at the indicator level. Using a large dataset including observations of 993 Australian classrooms, confirmatory factor analysis is used to replicate findings from the few existing validation studies. Item response modelling is used to examine individual indicator behaviour. Latent growth models are used to produce new findings about estimating factor scores. Findings show that the CLASS exhibits stable psychometric properties within classrooms over repeated observations. Model fit is improved and factor scores are more reliable when the repeated observations made in administering the CLASS are accounted for statistically. It is recommended that researchers enforce a fixed number of repeated observations to minimise bias.
____________________
Ordered Partition Model for Confidence Marking Modeling
Oliver Prosperi
Abstract
Confidence marking is increasingly used in multiple choice testing situations, but when the Rasch measurement model is applied to the data, only the binary data are used, discarding the information given by the confidence marking. This study shows how Wilson’s ordered partition model (OPM), a member of the Rasch family of models, can be used to model the confidence information. The result is a model in strict relation to the binary Rasch model, since the Rasch ICCs are “split” into a set of curves, each representing a confidence level. The new model provides a set of item parameters that map the probability of being in each confidence level in relation to the test-taker’s ability. The study provides a powerful diagnostic tool for assessing item difficulty, overconfidence, and misuse of confidence levels, as well as whether a question is particularly tricky or creates a lot of doubt.
____________________
Development of an Item Bank for the Assessment of Knowledge on Biology in Argentine University Students
Marcos Cupani, Tatiana Castro Zamparella, Gisella Piumatti, and Grupo Vinculado
Abstract
The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants’ test burden. This study aims to develop a bank of items to measure knowledge of biology using the Rasch model. The sample consisted of 1219 participants who studied in different faculties of the National University of Córdoba (mean age = 21.85 years, SD = 4.66; 66.9% women). The items were organized in different forms and into separate subtests, with some common items across subtests. The students were told they had to answer 60 questions on knowledge of biology. Evaluation of Rasch model fit (Zstd within ±2.0), differential item functioning, dimensionality, local independence, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 180 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. The contribution of this work is significant in the field of educational assessment in Argentina.
____________________
Vol. 18, No. 4, Winter 2017
The Effects of Item Placement in the Young Schema Questionnaire
Victoria K. Moir, Christopher W. Lee, and Ida Marais
Abstract
The Young Schema Questionnaire (YSQ) was developed to measure ‘Early Maladaptive Schemas’ (EMS), a
construct central to Schema Therapy (ST). Traditionally YSQ items were placed in a grouped format for each
schema but in recent versions of the questionnaire, items are presented in a random order. This study investigates
the effect of item placement on the psychometric properties of the questionnaire. On different occasions,
participants completed two versions of the YSQ short form, one with items grouped according to schemas and
another where items were placed in a random order. Responses were analysed using the polytomous Rasch model
of measurement (partial credit parameterization). Results show that the two versions are not psychometrically equivalent. The grouped format showed greater differences between the clinical and non-clinical group means than the random format, as well as greater person separation. There was also more response dependence between items in the grouped format, which has been linked to inflated reliability indices.
____________________
Stability of INFIT and OUTFIT Compared to Simulated Estimates in an Applied Setting
Kari J. Hodge and Grant B. Morgan
Abstract
Residual-based fit statistics are commonly used as an indication of the extent to which the item response data fit the Rasch model. Fit statistic estimates are influenced by sample size, and rule-of-thumb cutoffs may result in incorrect conclusions about the extent to which the model fits the data. Estimates obtained in this analysis were compared to 250 simulated data sets to examine the stability of the estimates. All INFIT estimates were within the rule-of-thumb range of 0.7 to 1.3. However, only 82% of the INFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items’ INFIT distributions using this 95% confidence-like interval, an 18 percentage point difference in items classified as acceptable. Forty-eight percent of OUTFIT estimates fell within the 0.7 to 1.3 rule-of-thumb range, whereas 34% of OUTFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items’ OUTFIT distributions, a 13 percentage point difference in items classified as acceptable. When using the rule-of-thumb ranges for fit estimates, the magnitude of misfit was smaller than with the 95% confidence interval of the simulated distribution. The findings indicate that using confidence intervals as critical values for fit statistics leads to different model-data fit conclusions than traditional rule-of-thumb critical values.
____________________
Approximate Functional Relationship between IRT and CTT Item Discrimination Indices: A Simulation, Validation, and Practical Extension of Lord’s (1980) Formula
John T. Kulas, Jeffrey A. Smith, and Hui Xu
Abstract
Lord (1980) presented a purely conceptual equation to approximate the nonlinear functional relationship between
classical test theory (CTT; aka true score theory) and item response theory (IRT) item discrimination indices.
The current project proposes a modification to his equation that makes it useful in practice. The suggested
modification acknowledges the more common contemporary CTT discrimination index of a corrected item-total
correlation and incorporates item difficulty. We simulated slightly over 768 trillion individual item responses
to uncover a best-fitting empirical function relating the IRT and CTT discrimination indices. To evaluate the
effectiveness of the function, we applied it to real-world test data from 16 workforce and educational tests. Our
modification results in shifted functional asymptotes, slopes, and points of inflection across item difficulties.
Validation with the workforce and educational tests suggests good prediction under common assumption testing
conditions (approximately normal distribution of abilities and moderate item difficulties) and greater precision
than Lord’s (1980) formula.
____________________
Social Desirability Amongst Chinese Teachers
Randall E. Schumacker and Cathy Ka Weng Hoi
Abstract
Research has suggested that self-reported responses on surveys can be affected by a participant’s tendency
toward social desirability, which would prevent them from revealing their true feelings or behaviors. Researchers
should provide evidence that their results have not been affected by socially desirable responses using the
Marlowe–Crowne Social Desirability Scale (MC-SDS). Past research has used the 33-item original form and 13-item short form of the MC-SDS, although a few researchers have found questionable validation of the 13-item MC-SDS in several populations. Traditional factor analysis failed to converge on a factor structure. Therefore, the current research was conducted using a Rasch dichotomous model analysis on the original 33-item MC-SDS, a 20-item MC-SDS, and the 13-item MC-SDS. Findings indicated that the 33-item MC-SDS had several
overlapping items, the 20-item MC-SDS provided a somewhat meaningful measure of social desirability, and the
13-item MC-SDS did not provide sufficient item distribution or separation to produce a meaningful measure. A
researcher should check on the factor structure of the MC-SDS when using it in their research, especially with
different cultural populations.
____________________
I’m scared to go to School! Capturing the Effects of Chronic Daily Fears on Students’ Concept of Self
Rense Lange, Cynthia Martínez-Garrido, and Alexandre Ventura
Abstract
Students may experience considerable fear and stress in school settings, and based on Dweck’s (2006) notion
of “mindset” we hypothesized that fear introduces qualitative changes in students’ self-concepts. Hypotheses
were tested on 3847 third-grade students from nine Iberoamerican countries (Bolivia, Chile, Colombia, Cuba,
Ecuador, Panama, Peru, Spain, and Venezuela), who completed Murillo’s (2007) adaptation of Marsh’s (1988)
SDQ-I. Rasch scaling indicated that the information-content of High-Fear students’ ratings was more localized
across the latent dimension than was that of Low-Fear students, and their ratings also showed less cognitive
variety. The resulting measurement distortions could be captured via logistic regression over the ratings’ residuals.
Also, using training and validation samples (with 60% and 40% of all cases, respectively), it proved possible to predict students’ fear levels and their gender. We see the present findings as a first step towards implementing
an online warning and detection system for signs of bullying among students.
____________________
Confidence to Perform in the Global Marketplace: Constructing and Validating a Survey Instrument for Community College Students
Snejana Slantcheva-Durst and Mingyang Liu
Abstract
This article discusses the construction and validation of an instrument to gauge community college students’
confidence to perform in the global marketplace. The instrument was designed to capture students’ beliefs
in their own abilities to successfully carry out job-related tasks in cross-cultural work environments that are globally interconnected and constantly in flux. The instrument items emerged from a comprehensive review
of literature, nationwide workforce skills initiatives, rounds of expert panel analyses, and focus groups. Items
were formulated within Bandura’s framework of self-efficacy, and the instrument was analyzed with Rasch
measurement. The Rasch analysis, conducted on a sample of 741 students, provided evidence of the content
validity of the items, the generalizability of the measure, and its external validity. The instrument can offer
useful feedback to community college internationalization-focused staff in their efforts to assess outcomes of
international initiatives for community college students, thus supporting program assessment, evaluation of
student growth, and institutional decision-making.
____________________
Measuring Anger Types among Malaysian Adolescents using the Rasch Model
Ahmad Zamri Khairani, Nor Shafrin Ahmad, and Mohd Zahuri Khairani
Abstract
Adolescence is an important transitional phase in human development in which adolescents experience physiological as well as psychological changes. Nevertheless, these changes are often not well understood by teachers, parents, and even the adolescents themselves. Thus, conflicts arise, and adolescents are affected by the conflict both physically and emotionally. An important emotional state that results from this conflict is anger. This article describes the development and validation of the 34-item Adolescent Anger Inventory (AAI) to measure types of anger among Malaysian adolescents. A sample of 2,834 adolescents in secondary school provided responses that were analyzed using the Rasch measurement framework. The four response categories worked satisfactorily for the scale developed. A total of 11 items did not fit the model’s expectations and were thus dropped from the final scale. The scale also demonstrated satisfactory reliability and separation evidence. In addition, items in the AAI showed no evidence of DIF between 14- and 16-year-old adolescents. Nevertheless, the AAI did not have sufficient items to target adolescents with a high level of physically aggressive anger.
____________________