Journal of Applied Measurement

P.O. Box 1283

Maple Grove, MN 55311

 


 

Volume 18, 2017, Article Abstracts

 

Vol. 18, No. 1, Spring 2017

Constructing an Outcome Measure of Occupational Experience: An Application of Rasch Measurement Methods

Brett Berg, Karen Atler, and Anne G. Fisher

Abstract

Rasch methods were used to evaluate and further develop the Daily Experiences of Pleasure, Productivity, and Restoration Profile (PPR Profile) into a health outcome measure of occupational experience. Analyses of 263 participant PPR Profiles focused on rating scale structure, dimensionality, and reliability. All rating scale categories increased with the intended meaning of the scales, but only 20 of the 21 category measures fit the Rasch rating scale model (RRSM). Several items also did not fit the RRSM and results of residual principal components analyses suggested possible second dimensions in each scale. More importantly, reliability coefficients were very low and participants could not be separated into more than one group as demonstrated by low person separation indices. The authors offer several recommendations for the next steps in the development of the PPR Profile as a health outcome measure of occupational experience.

____________________

Comparing Imputation Methods for Trait Estimation Using the Rating Scale Model

Rose E. Stafford, Christopher R. Runyon, Jodi M. Casabianca, and Barbara G. Dodd

Abstract

This study examined the performance of four methods of handling missing data for discrete response options on a questionnaire: (1) ignoring the missingness (using only the observed items to estimate trait levels); (2) nearest-neighbor hot deck imputation; (3) multiple hot deck imputation; and (4) semi-parametric multiple imputation. A simulation study examining three questionnaire lengths (41-, 20-, and 10-item) crossed with three levels of missingness (10%, 25%, and 40%) was conducted to see which methods best recovered trait estimates when data were missing completely at random and the polytomous items were scored with Andrich’s (1978) rating scale model. The results showed that ignoring the missingness and semi-parametric imputation best recovered known trait levels across all conditions, with the semi-parametric technique providing the most precise trait estimates. This study demonstrates the power of specific objectivity in Rasch measurement, as ignoring the missingness leads to generally unbiased trait estimates.
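To make option (1), ignoring the missingness, concrete, the following minimal sketch (assuming already-estimated item parameters and using illustrative names and values) shows grid-search maximum-likelihood trait estimation under a rating scale model in which missing responses simply drop out of the likelihood; this is what allows trait estimates to remain essentially unbiased when data are missing completely at random.

    import numpy as np

    # Hedged illustration, not the authors' code: ML trait estimation under
    # Andrich's (1978) rating scale model using only the observed responses.

    def rsm_category_probs(theta, delta, taus):
        # P(X = k), k = 0..m: numerator exp(k*(theta - delta) - sum_{j<=k} tau_j)
        steps = np.concatenate(([0.0], np.cumsum(theta - delta - taus)))
        num = np.exp(steps - steps.max())   # subtract max for numerical stability
        return num / num.sum()

    def estimate_theta(responses, deltas, taus, grid=np.linspace(-4, 4, 321)):
        # Grid-search ML estimate of theta; NaN (missing) responses are skipped.
        loglik = np.zeros_like(grid)
        for i, x in enumerate(responses):
            if np.isnan(x):
                continue                    # "ignoring the missingness"
            for g, theta in enumerate(grid):
                p = rsm_category_probs(theta, deltas[i], taus)
                loglik[g] += np.log(p[int(x)])
        return grid[np.argmax(loglik)]

    # Toy example: 5 items on a 0-3 scale, two responses missing.
    deltas = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # illustrative item locations
    taus = np.array([-1.2, 0.1, 1.1])                # shared threshold parameters
    resp = np.array([3, 2, np.nan, 1, np.nan])
    print(estimate_theta(resp, deltas, taus))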

____________________

Rasch Analysis of a Behavioral Checklist for the Assessment of Pain in Critically Ill Adults

Christophe Chénier, Gilles Raîche, Nadine Talbot, Bianca Carignan, and Céline Gélinas

Abstract

Patients hospitalized in the intensive care unit (ICU) are often unable to report their pain, which is a problem because untreated pain is associated with negative health outcomes. The use of behavioral pain scales is recommended for detecting the presence of pain in this vulnerable population. Previous validation studies have used classical techniques, and several psychometric properties remain unknown. In this paper, data obtained from a behavioral checklist of dichotomized items were used to evaluate the instrument’s dimensionality, its construct validity, and its capacity to distinguish between levels of pain using Rasch measurement. Data were collected from a sample of 239 ICU patients. Results showed that, while unidimensionality was acceptable, concerns remained about local independence and item fit indices. A third of the items showed misfit. Finally, while item measures had high reliability (0.97), person measures had rather low reliability (0.62) and only 1.28 strata of pain could be distinguished. The narrow range of pain levels in the sample could explain this poor performance, and further study is needed with a sample exhibiting a wider range of pain levels.

____________________

Scale Anchoring with the Rasch Model

Adam E. Wyse

Abstract

Scale anchoring is a method to provide additional meaning to particular scores at different points along a score scale by identifying representative items associated with the particular scores. These items are then analyzed to write statements of what types of performance can be expected of a person with the particular scores to help test takers and other stakeholders better understand what it means to achieve the different scores. This article provides simple formulas that can be used to identify possible items to serve as scale anchors with the Rasch model. Specific attention is given to practical considerations and challenges that may be encountered when applying the formulas in different contexts. An illustrative example using data from a medical imaging certification program demonstrates how the formulas can be applied in practice.
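The formulas themselves are not reproduced in the abstract. As a hedged illustration of the general idea, one common way to operationalize scale anchoring under the Rasch model is a response-probability band: an item is a candidate anchor for a particular score point when the model-implied probability of success at that point falls within a target range. The sketch below implements that generic approach; the probability bounds, difficulties, and function names are illustrative assumptions, not the article's formulas.

    import numpy as np

    def candidate_anchors(item_difficulties, theta_star, p_low=0.50, p_high=0.80):
        # Flag items whose Rasch success probability at theta_star lies in
        # [p_low, p_high], i.e. items informative at that score point.
        b = np.asarray(item_difficulties)
        p = 1.0 / (1.0 + np.exp(-(theta_star - b)))
        return np.where((p >= p_low) & (p <= p_high))[0]

    difficulties = np.array([-1.5, -0.4, 0.0, 0.6, 1.2, 2.0])  # illustrative
    print(candidate_anchors(difficulties, theta_star=0.5))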

____________________

Evaluating Model-Data Fit by Comparing Parametric and Nonparametric Item Response Functions: Application of a Tukey-Hann Procedure

Jeremy Kyle Jennings and George Engelhard, Jr

Abstract

This study describes an approach for examining model-data fit for the dichotomous Rasch model using Tukey-Hann item response functions (TH-IRFs). The procedure proposed in this paper is based on an iterative version of a smoothing technique proposed by Tukey (1977) for estimating nonparametric item response functions (IRFs). A root integrated squared error (RISE) statistic (Douglas and Cohen, 2001) is used to compare the TH-IRFs to the Rasch IRFs. Data from undergraduate students at a large university are used to demonstrate this iterative smoothing technique. The RISE statistic is used for comparing the item response functions to assess model-data fit. A comparison between the residual-based Infit and Outfit statistics and the RISE statistic is also examined. The results suggest that the RISE statistic and TH-IRFs provide a useful analytical and graphical approach for evaluating item fit. Implications for research, theory, and practice related to model-data fit are discussed.
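As a rough sketch of the comparison described here, a RISE-type statistic integrates the squared gap between a nonparametric item response function (such as a Tukey-Hann smoothed IRF) and the parametric Rasch IRF over the ability distribution. The code below is a hedged approximation under assumed standard-normal weights and a stand-in smoothed IRF; it is not the authors' implementation.

    import numpy as np

    def rasch_irf(theta, b):
        return 1.0 / (1.0 + np.exp(-(theta - b)))

    def rise(theta_grid, nonparametric_irf, b, weights=None):
        # Root integrated squared error between two IRFs over a theta grid.
        if weights is None:
            weights = np.exp(-0.5 * theta_grid**2)   # assumed N(0,1)-shaped weights
        weights = weights / weights.sum()
        gap = nonparametric_irf - rasch_irf(theta_grid, b)
        return np.sqrt(np.sum(weights * gap**2))

    theta = np.linspace(-3, 3, 61)
    smoothed = rasch_irf(theta, 0.2) + 0.03 * np.sin(theta)   # stand-in TH-IRF
    print(rise(theta, smoothed, b=0.0))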

____________________

Rasch Derived Teachers’ Emotions Questionnaire

Kristin L. K. Koskey, Renee R. Mudrey, and Wondimu Ahmed

Abstract

The purpose of this research was to estimate the reliability of the scores produced from, and the validity of the inferences drawn from, the revised 90-item Teachers’ Emotion Questionnaire, which consists of three measures: frequency of emotional expressivity, self-efficacy for regulation of emotional expressivity when teaching, and self-efficacy for regulation of context-specific emotional expressivity. There is a lack of instruments assessing teachers’ regulation and communication of their emotions. One hundred seventeen practicing teachers participated in this study at Time 1 and 46 at Time 2. Rasch rating scale analyses indicated sufficient item and person separation and reliability, and some support for the construct validity of the inferences drawn from the measures. Test-retest reliability for the person estimates was supported for all three measures over a four-week period: r = .592, p < .001; r = .473, p < .01; and r = .641, p < .001, respectively. Concurrent validity of the self-efficacy for regulation of emotional expressivity when teaching measure with the reappraisal and suppression subscales of the Emotional Regulation Questionnaire (Gross and John, 2003) was supported at Time 1. Modifications to rating scales and future directions for assessing teachers’ emotions based on these results are discussed.

____________________

Measuring Alcohol Marketing Engagement: The Development and Psychometric Properties of the Alcohol Marketing Engagement Scale

Angela Robertson, David T. Morse, Kristina Hood, and Courtney Walker

Abstract

Ample evidence exists in support of the influence of media, both traditional and electronic, on perceptions and engagement with alcohol marketing. We describe the development, calibration, and evidence for technical quality and utility for a new measure, the Alcohol Marketing Engagement Scale. Using two samples of college undergraduates (n1 = 199, n2 = 732), we collected field test responses to a total of 13 items. Initial support for scale validity is presented via correlations with attributes previously shown to be related to alcohol engagement. While the joint map of estimated scale locations of items and respondents indicates the need for further scale development, the results of the present analyses are promising. Implications for use in research are discussed.

____________________

 

Vol. 18, No. 2, Summer 2017

Developing an Engineering Design Process Assessment using Mixed Methods

Stefanie A. Wind, Meltem Alemdar, Jeremy A. Lingle, Jessica D. Gale, and Roxanne A. Moore

Abstract

Recent reforms in science education worldwide include an emphasis on engineering design as a key component of student proficiency in the Science, Technology, Engineering, and Mathematics disciplines. However, relatively little attention has been directed to the development of psychometrically sound assessments for engineering. This study demonstrates the use of mixed methods to guide the development and revision of K-12 Engineering Design Process (EDP) assessment items. Using results from a middle-school EDP assessment, this study illustrates the combination of quantitative and qualitative techniques to inform item development and revisions. Overall conclusions suggest that the combination of quantitative and qualitative evidence provides an in-depth picture of item quality that can be used to inform the revision and development of EDP assessment items. Researchers and practitioners can use the methods illustrated here to gather validity evidence to support the interpretation and use of new and existing assessments.

____________________

Psychometric Validation of the 10-item Connor-Davidson Resilience Scale

John Ehrich, Angela Mornane, and Tim Powers

Abstract

Resilience is the personality trait of having positive dispositions that enable individuals to cope with stressful situations. Hence, a reliable resilience scale can provide useful information for understanding and treating individuals suffering from stress and trauma. The 10-item Connor-Davidson Resilience Scale (CD-RISC-10) is a candidate scale. However, very little psychometric research has been conducted on this scale, and the psychometric analyses conducted to date have not been conclusive. To attain further evidence of the scale’s psychometric properties, we tested the CD-RISC-10 on 288 adult students majoring in Education at an Australian university using both traditional (factor analyses) and modern (Rasch) measurement approaches. Factor analyses indicated good psychometric functioning of the scale. However, Rasch modelling revealed evidence of item misfit and multiple dimensions. Optimal performance was achieved after the removal of two misfitting items, indicating a well-functioning 8-item scale.

____________________

The Use of Differential Item Functioning (DIF) Analysis to Distinguish Between Similar Job Roles

Nicole M. Risk and James R. Fidler

Abstract

Two primary roles in the clinical laboratory are those of Medical Technologist (MT) and Medical Laboratory Technician (MLT). Job analyses, which form the foundation of test blueprints employed for credentialing practitioners, suggest a reasonable amount of overlap in the tasks performed by MTs and MLTs. However, credentialing assessments must clearly distinguish between the two roles and ensure that they address competencies appropriate to each practice designation. Differential item functioning (DIF) analysis techniques were applied to explore and differentiate the two laboratory practitioner job roles as an aspect of examination development. Results from the analysis suggest a high degree of similarity between these two groups in terms of scope of tasks performed. Subject matter expert interpretation suggests that the assessments are more appropriately differentiated by underlying level of task knowledge rather than scope of tasks. DIF may be applicable to other exploratory investigations that seek to differentiate job roles that comprise common competencies.

____________________

PSM7 and PSM8: Validating Two Problem-solving Measures

Jonathan D. Bostic, Toni A. Sondergeld, Timothy Folger, and Lance Kruse

Abstract

New mathematics standards were adopted broadly across the United States of America between 2011 and 2013. Problem solving is a central facet of these new standards. Given the new standards and the prominence of mathematical problem solving, there is a need for valid and reliable assessments that measure students’ abilities related to those standards. Moreover, Rasch measurement techniques support psychometric analyses during validation studies, effectively measuring students’ and items’ properties in ways not afforded by true score theory. This manuscript builds upon past research (see Bostic and Sondergeld, 2015a, 2015b) with a validity study of two related problem-solving measures for grades seven and eight. Results from this validation study indicated that there was sufficient evidence to support the use of the problem-solving measures for grades seven and eight.

____________________

Infit and Outfit: Interpreting Statistical Significance and Magnitude of Misfit in Conjunction

Christine E. DeMars

Abstract

In many areas of statistics it is common practice to present both a statistical significance test and an effect size. In contrast, for the Infit and Outfit indices of item misfit, it has historically been common to focus on either the mean square (MS; an index of the magnitude of misfit) or the statistical significance, but not both. If the statistical significance and effect size are to be used together, it is important not only that the Type I error rate matches the nominal alpha level, but also that, for any given magnitude of misfit, the expected value of the MS is independent of sample size. This study confirmed that the average MS for several simulated misfitting items was nearly the same for large and small samples, although necessarily the variance depended on sample size. Thus, if the item fit is statistically significant, the MS appears to be a reasonable index for judging the magnitude of the misfit in the sample, although one must recognize that the estimate of the magnitude will be less stable in small samples, as is true for all effect sizes.
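For readers less familiar with these indices, the sketch below shows how the Outfit (unweighted) and Infit (information-weighted) mean squares are conventionally computed for a dichotomous Rasch model from standardized residuals. It assumes person and item parameter estimates are already in hand; the data and variable names are illustrative.

    import numpy as np

    def item_fit(X, theta, b):
        # X: persons-by-items 0/1 matrix; theta, b: person and item estimates.
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))   # expected scores
        w = p * (1.0 - p)                                          # model variances
        z2 = (X - p) ** 2 / w                                      # squared standardized residuals
        outfit = z2.mean(axis=0)                                   # unweighted mean square
        infit = ((X - p) ** 2).sum(axis=0) / w.sum(axis=0)         # information-weighted mean square
        return infit, outfit

    rng = np.random.default_rng(1)
    theta = rng.normal(size=500)
    b = np.array([-1.0, 0.0, 1.0])
    p_true = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    X = (rng.random((500, 3)) < p_true).astype(float)
    print(item_fit(X, theta, b))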

____________________

Measuring Health-related Transportation Barriers in Urban Settings

Sara M. Locatelli, Lisa K. Sharp, Saming T. Syed, Shikhi Bhansari, and Ben S. Gerber

Abstract

Access to reliable transportation is important for people with chronic diseases considering the need for frequent medical visits and for medications from the pharmacy. Understanding of the extent to which transportation barriers, including lack of transportation, contribute to poor health outcomes has been hindered by a lack of consistency in measuring or operationally defining “transportation barriers.” The current study uses the Rasch measurement model to examine the psychometric properties of a new measure designed to capture types of transportation and associated barriers within an urban context. Two hundred forty-four adults with type 2 diabetes were recruited from within an academic medical center in Chicago and completed the newly developed transportation questions as part of a larger National Institutes of Health funded study (ClinicalTrials.gov identifier: NCT01498159). Results suggested a two subscale structure that reflected 1) general transportation barriers and 2) public transportation barriers.

____________________

General Ability or Distinct Scholastic Aptitudes? A Multidimensional Validity Analysis of a Psychometric Higher-Education Entrance Test

Dvir Kleper and Noa Saka

Abstract

The present study explored the construct validity of the Psychometric Entrance Test (PET) for higher education in Israel, as represented by the factorial structure of the scholastic aptitudes it measures, and focused on whether the test presents a single measure of overall ability or a measure of the fields of knowledge that are being tested. In Study 1, we used Exploratory Factor Analysis to generate hypotheses regarding the factorial structure of the test. In Study 2, Confirmatory Factor Analysis was carried out to compare competing models that were constructed based on theoretical considerations and the results of Study 1. The findings indicated that a two-layered hierarchical model, encompassing both a General Ability factor and three scholastic domain-specific factors (Verbal Reasoning, Quantitative Reasoning, and English), showed the best fit. Within the framework of the CFA, several statistical procedures were applied to assess reliability (indicator and complexity) and validity (convergent and divergent).

____________________

Factorial and Item-Level Invariance of a Principal Perspectives Survey: German and U.S. Principals

Chuang Wang, Dawson R. Hancock, and Ulrich Müller

Abstract

This study examined the factorial and item-level invariance of a survey of principals’ job satisfaction and perspectives about reasons and barriers to becoming a principal with a sample of US principals and another sample of German principals. Confirmatory factor analysis (CFA) and differential item functioning (DIF) analysis were employed at the test and item level, respectively. A single group CFA was conducted first, and the model was found to fit the data collected. The factorial invariance between the German and the US principals was tested through three steps: (a) configural invariance; (b) measurement invariance; and (c) structural invariance. The results suggest that the survey is a viable measure of principals’ job satisfaction and perspectives about reasons and barriers to becoming a principal because principals from two different cultures shared a similar pattern on all three constructs. The DIF analysis further revealed that 22 out of the 28 items functioned similarly between German and US principals.

____________________

 

Vol. 18, No. 3, Fall 2017

A Facets Analysis of Analytic vs. Holistic Scoring of Identical Short Constructed-Response Items: Different Outcomes and Their Implications for Scoring Rubric Development

Milja Curcin and Ezekiel Sweiry

Abstract

In scoring short constructed-response items, it may be possible to apply different rubric types depending on the trait of achievement assessed. Rating scale and partial credit many-facet Rasch models (MFRM) were used to investigate whether levels-based (holistic) and hybrid (analytic) scoring rubrics functioned interchangeably when scoring short-response English reading comprehension test items. Whereas most research in similar contexts has focused solely on rater reliability, the use of MFRM in this study enabled examination of both the reliability and the rating scale functioning of the scoring rubrics in parallel. It also enabled consideration of their effects on item and examinee parameters. This more comprehensive approach allowed the findings to be linked to general notions of rubric construct-relevance and score interpretation, and demonstrated an approach to generating evidence for a more balanced consideration of the advantages and disadvantages of each rubric in terms of both reliability and validity.

____________________

Q-Matrix Optimization Based on the Linear Logistic Test Model

Lin Ma and Kathy E. Green

Abstract

This study explored optimization of item-attribute matrices with the linear logistic test model (Fischer, 1973), with optimal models explaining more variance in item difficulty due to identified item attributes. Data were 8th-grade mathematics test item responses from two TIMSS 2007 booklets. The study investigated three categories of attributes (content, cognitive process, and comprehensive cognitive process) at two grain levels (larger, smaller) and also compared results with random attribute matrices. The proposed attributes accounted for most of the variance in item difficulty for the two assessment booklets (81% and 65%). The variance explained by the content attributes was very small (13%–31%), and the comprehensive cognitive process attributes explained much more variance than the content and cognitive process attributes. The variances explained at the two grain levels were similar to each other. However, the attributes did not predict the item difficulties of the two assessment booklets equally.

____________________

Mapping a Data Modeling and Statistical Reasoning Learning Progression using Unidimensional and Multidimensional Item Response Models

Robert Schwartz, Elizabeth Ayers, and Mark Wilson

Abstract

Data modeling is an approach to teaching basic concepts of data and statistics in middle school that helps students transform their initial, and often misguided, understandings of variability and chance into forms of reasoning that coordinate chance with variability, by designing learning environments that allow students to invent and revise models. The Assessing Data Modeling and Statistical Reasoning (ADMSR) project is a collaborative effort between measurement and learning specialists that has developed a curricular and embedded assessment system based on a framework of seven constructs describing the elements of statistical learning. Taken together, the seven constructs form a learning progression. There are different ways to conceive of and measure learning progressions. The approach used by the ADMSR project followed the “four building blocks” approach outlined by the Berkeley Evaluation and Assessment Research (BEAR) Center and the BEAR Assessment System, the final building block of which involves the application of a measurement model. This paper focuses on the application of unidimensional and multidimensional item response theory (IRT) measurement models to data from the ADMSR project. Unidimensional IRT models are applied to aid construct development and validation, to see whether the proposed theory of development presented by the construct map is supported by results from an administration of the instrument. Multidimensional IRT measurement models are applied to examine the relationships between the seven constructs in the ADMSR learning progression. When applying the multidimensional model, specific links between levels of the constructs are analyzed across constructs after applying a technique to align the seven dimensions.

____________________

Psychometric Properties of the Classroom Assessment Scoring System (Pre-K): Implications for Measuring Interaction Quality in Diverse Early Childhood Settings

Dan Cloney, Cuc Nguyen, Raymond J Adams, Collette Tayler, Gordon Cleveland, and Karen Thorpe

Abstract

The Classroom Assessment Scoring System (CLASS) is an observational instrument assessing the nature of everyday interactions in educational settings. The instrument has strong theoretical groundings; however, prior empirical validation of the CLASS has exposed some psychometric weaknesses. Further, the instrument has not been the subject of psychometric analysis at the indicator level. Using a large dataset including observations of 993 Australian classrooms, confirmatory factor analysis is used to replicate findings from the few existing validation studies. Item response modelling is used to examine individual indicator behaviour. Latent growth models are used to produce new findings about estimating factor scores. Findings show that the CLASS exhibits stable psychometric properties within classrooms over repeated observations. Model fit is improved and factor scores are more reliable when the repeated observations made in administering the CLASS are accounted for statistically. It is recommended that researchers enforce a fixed number of repeated observations to minimise bias.

____________________

Ordered Partition Model for Confidence Marking Modeling

Oliver Prosperi

Abstract

Confidence marking is increasingly used in multiple-choice testing situations, but when the Rasch measurement model is applied to the data, only the binary data are used, discarding the information given by the confidence marking. This study shows how Wilson’s ordered partition model (OPM), a member of the Rasch family of models, can be used to model the confidence information. The result is a model that stands in strict relation to the binary Rasch model, since the Rasch ICCs are “split” into a set of curves, each representing a confidence level. The new model provides a set of item parameters that map the probability of being in each confidence level in relation to the test-taker’s ability. The study provides a powerful diagnostic tool for assessing item difficulty, overconfidence, and misuse of confidence levels, and for detecting questions that are particularly tricky or create a lot of doubt.

____________________

Development of an Item Bank for the Assessment of Knowledge on Biology in Argentine University Students

Marcos Cupani, Tatiana Castro Zamparella, Gisella Piumatti, and Grupo Vinculado

Abstract

The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants’ test burden. This study aims to develop a bank of items to measure the level of Knowledge on Biology using the Rasch model. The sample consisted of 1,219 participants who studied in different faculties of the National University of Córdoba (mean age = 21.85 years, SD = 4.66; 66.9% women). The items were organized in different forms and into separate subtests, with some common items across subtests. The students were asked to answer 60 questions on knowledge of biology. Evaluation of Rasch model fit (Zstd within ±2.0), differential item functioning, dimensionality, local independence, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 180 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. The contribution of this work is significant in the field of educational assessment in Argentina.

____________________

 

Vol. 18, No. 4, Winter 2017

The Effects of Item Placement in the Young Schema Questionnaire

Victoria K. Moir, Christopher W. Lee, and Ida Marais

Abstract

The Young Schema Questionnaire (YSQ) was developed to measure ‘Early Maladaptive Schemas’ (EMS), a construct central to Schema Therapy (ST). Traditionally, YSQ items were placed in a grouped format for each schema, but in recent versions of the questionnaire items are presented in a random order. This study investigates the effect of item placement on the psychometric properties of the questionnaire. On different occasions, participants completed two versions of the YSQ short form, one with items grouped according to schemas and another in which items were placed in a random order. Responses were analysed using the polytomous Rasch model of measurement (partial credit parameterization). Results show that the two versions are not psychometrically equivalent. There were greater differences between the clinical and non-clinical group means for the grouped format than for the random format, as well as greater person separation. There was more response dependence between items in the grouped format, which has been linked to inflated reliability indices.

____________________

Stability of INFIT and OUTFIT Compared to Simulated Estimates in Applied Setting

Kari J. Hodge and Grant B. Morgan

Abstract

Residual-based fit statistics are commonly used as an indication of the extent to which item response data fit the Rasch model. Fit statistic estimates are influenced by sample size, and rule-of-thumb critical values may result in incorrect conclusions about the extent to which the model fits the data. Estimates obtained in this analysis were compared to 250 simulated data sets to examine the stability of the estimates. All INFIT estimates were within the rule-of-thumb range of 0.7 to 1.3. However, only 82% of the INFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items’ INFIT distributions using this 95% confidence-like interval, an 18 percentage point difference in items that were classified as acceptable. Forty-eight percent of OUTFIT estimates fell within the 0.7 to 1.3 rule-of-thumb range, whereas 34% of OUTFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items’ OUTFIT distributions, a 13 percentage point difference in items that were classified as acceptable. When using the rule-of-thumb ranges for fit estimates, the magnitude of misfit was smaller than with the 95% confidence interval of the simulated distribution. The findings indicate that the use of confidence intervals as critical values for fit statistics leads to different model-data fit conclusions than traditional rule-of-thumb critical values.

____________________

Approximate Functional Relationship between IRT and CTT Item Discrimination Indices: A Simulation, Validation, and Practical Extension of Lord’s (1980) Formula

John T. Kulas, Jeffrey A. Smith, and Hui Xu

Abstract

Lord (1980) presented a purely conceptual equation to approximate the nonlinear functional relationship between classical test theory (CTT; also known as true score theory) and item response theory (IRT) item discrimination indices. The current project proposes a modification to his equation that makes it useful in practice. The suggested modification acknowledges the more common contemporary CTT discrimination index of a corrected item-total correlation and incorporates item difficulty. We simulated slightly over 768 trillion individual item responses to uncover a best-fitting empirical function relating the IRT and CTT discrimination indices. To evaluate the effectiveness of the function, we applied it to real-world test data from 16 workforce and educational tests. Our modification results in shifted functional asymptotes, slopes, and points of inflection across item difficulties. Validation with the workforce and educational tests suggests good prediction under common assumption testing conditions (approximately normal distribution of abilities and moderate item difficulties) and greater precision than Lord’s (1980) formula.
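For reference, the classic form of Lord's (1980) approximation, as it is commonly cited, relates the normal-ogive item discrimination to the item's biserial correlation with the total score,

    a_i \approx \frac{r_i}{\sqrt{1 - r_i^{2}}}

where r_i is the biserial correlation of item i with the total score. The authors' modified function, which substitutes the corrected item-total correlation and incorporates item difficulty, is not reproduced in the abstract.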

____________________

Social Desirability Amongst Chinese Teachers

Randall E. Schumacker and Cathy Ka Weng Hoi

Abstract

Research has suggested that self-reported responses on surveys can be affected by a participant’s tendency toward social desirability, which would prevent them from revealing their true feelings or behaviors. Researchers should provide evidence that their results have not been affected by socially desirable responses using the Marlowe–Crowne Social Desirability Scale (MC-SDS). Past research has used the 33-item original form and 13-item short form of the MC-SDS, although a few researchers have found questionable validation of the 13-item MC-SDS in several populations. Traditional factor analysis failed to converge on a factor structure. Therefore, the current research was conducted using a Rasch dichotomous model analysis on the original 33-item MC-SDS, a 20-item MC-SDS, and the 13-item MC-SDS. Findings indicated that the 33-item MC-SDS had several overlapping items, the 20-item MC-SDS provided a somewhat meaningful measure of social desirability, and the 13-item MC-SDS did not provide sufficient item distribution or separation to produce a meaningful measure. Researchers should check the factor structure of the MC-SDS when using it in their research, especially with different cultural populations.

____________________

I’m scared to go to School! Capturing the Effects of Chronic Daily Fears on Students’ Concept of Self

Rense Lange, Cynthia Martínez-Garrido, and Alexandre Ventura

Abstract

Students may experience considerable fear and stress in school settings, and based on Dweck’s (2006) notion of “mindset,” we hypothesized that fear introduces qualitative changes in students’ self-concepts. Hypotheses were tested on 3847 third-grade students from nine Ibero-American countries (Bolivia, Chile, Colombia, Cuba, Ecuador, Panama, Peru, Spain, and Venezuela), who completed Murillo’s (2007) adaptation of Marsh’s (1988) SDQ-I. Rasch scaling indicated that the information-content of High-Fear students’ ratings was more localized across the latent dimension than was that of Low-Fear students, and their ratings also showed less cognitive variety. The resulting measurement distortions could be captured via logistic regression over the ratings’ residuals. Also, using training and validation samples (with 60% and 40% of all cases, respectively), it proved possible to predict students’ fear levels and their gender. We see the present findings as a first step towards implementing an online warning and detection system for signs of bullying among students.

____________________

Confidence to Perform in the Global Marketplace: Constructing and Validating a Survey Instrument for Community College Students

Snejana Slantcheva-Durst and Mingyang Liu

Abstract

This article discusses the construction and validation of an instrument to gauge community college students’ confidence to perform in the global marketplace. The instrument was designed to capture students’ beliefs in their own abilities to successfully carry out job-related tasks in cross-cultural work environments that are globally interconnected and constantly in flux. The instrument items emerged from a comprehensive review of literature, nationwide workforce skills initiatives, rounds of expert panel analyses, and focus groups. Items were formulated within Bandura’s framework of self-efficacy, and the instrument was analyzed with Rasch measurement. The Rasch analysis, conducted on a sample of 741 students, provided evidence of the content validity of the items, the generalizability of the measure, and its external validity. The instrument can offer useful feedback to community college internationalization-focused staff in their efforts to assess outcomes of international initiatives for community college students, thus supporting program assessment, evaluation of student growth, and institutional decision-making.

____________________

Measuring Anger Types among Malaysian Adolescents using the Rasch Model

Ahmad Zamri Khairani, Nor Shafrin Ahmad, and Mohd Zahuri Khairani

Abstract

Adolescence is an important transitional phase in human development, during which individuals experience physiological as well as psychological changes. Nevertheless, these changes are often poorly understood by teachers, parents, and even the adolescents themselves. Thus, conflicts arise, and adolescents are affected by these conflicts physically and emotionally. An important emotional state resulting from such conflict is anger. This article describes the development and validation of the 34-item Adolescent Anger Inventory (AAI) to measure types of anger among Malaysian adolescents. A sample of 2,834 secondary school adolescents provided responses that were analyzed within the Rasch measurement framework. The four response categories worked satisfactorily for the scale developed. A total of 11 items did not fit the model’s expectations and were thus dropped from the final scale. The scale also demonstrated satisfactory reliability and separation evidence. In addition, items in the AAI showed no evidence of DIF between 14- and 16-year-old adolescents. Nevertheless, the AAI did not have sufficient items to target adolescents with a high level of physically aggressive anger.

____________________
