Sunday, January 18, 2015

Measuring college learning outcomes: psychometry 101

Pressure is growing for outcomes testing in higher education. Already hundreds of schools allow graduating seniors to take the CLA+ (Collegiate Learning Assessment Plus) to provide evidence of important job skills. I doubt that the CLA+ adds much information about an applicant's abilities beyond what can be obtained from existing cognitive tests such as the SAT, ACT, and GRE. But those tests have plenty of enemies, creating a business opportunity for shiny new assessments. The results covered below should hold no surprises for anyone modestly familiar with modern psychometrics.
Forbes: More people than ever are asking the question: is college worth it? Take a look at the numbers: 81 percent of respondents in a 2008 survey of the general public agreed that a college education was a good investment, but that number was down to 57 percent in 2012. A recent Wells Fargo study reported that one-third of Millennials regret going to college, and instead say they would have been better off working and earning money. This rhetoric is reflected in the reality of declining enrollment: one survey of colleges showed that enrollment for spring 2013 was down 2.3 percent from spring 2012, a trend that has held for consecutive years.

Meanwhile, the wave of keeping colleges accountable for their outcomes continues to crest, even from the left. A recent Brookings Institution study concluded, “While the average return to obtaining a college degree is clearly positive, we emphasize that it is not universally so.” President Obama, a major recipient of plaudits and campaign dollars from the academic left, has called for a government-authored set of rankings for American colleges and universities that rewards performance and punishes failure: “It is time to stop subsidizing schools that are not producing good results, and reward schools that deliver for American students and our future,” he said.

President Obama’s impulse to define and reward value in higher education was correct, but a government-rankings system is not a sufficient corrective for the enormity of the problem. There is no panacea for reforming higher education, but the CLA+ exam has potential to be a very useful step. ...
More from the Wall Street Journal.
WSJ: A survey of business owners to be released next week by the American Association of Colleges and Universities also found that nine out of 10 employers judge recent college graduates as poorly prepared for the work force in such areas as critical thinking, communication and problem solving.

“Employers are saying I don’t care about all the knowledge you learned because it’s going to be out of date two minutes after you graduate ... I care about whether you can continue to learn over time and solve complex problems,” said Debra Humphreys, vice president for policy and public engagement at AAC&U, which represents more than 1,300 schools.

The CLA+ [Collegiate Learning Assessment Plus] is graded on a scale of 400 to 1600. In the fall of 2013, freshmen averaged a score of 1039, and graduating seniors averaged 1128, a gain of 89 points.

CAE says that improvement is evidence of the worth of a degree. “Colleges and universities are contributing considerably to the development of key skills that can make graduates stand out in a competitive labor market,” the report said.

Mr. Arum was skeptical of the advantages accrued. Because the test was administered over one academic year, it was taken by two groups of people. A total of 18,178 freshmen took the test and 13,474 seniors. That mismatch suggested a selection bias to Mr. Arum.

What exactly are these college learning assessments? They measure general skills that employers deem important, but not narrow subject matter expertise -- some of which is economically valuable (e.g., C++ coding) and some much less so (e.g., detailed knowledge about the Reformation). Of course, narrow job-essential knowledge can be tested separately.
What Does CLA+ Measure?

CLA+ [Collegiate Learning Assessment Plus] is designed specifically to measure critical-thinking and written-communication skills that other assessments cannot. CAE has found that these are the skills that most accurately attest to a student’s readiness to enter the workforce. In the era of Google, the ability to recall facts and data is not as crucial as it once was. As our technology evolves to meet certain needs of the workplace, so too must our thinking about success and career readiness. Therefore, the skills taught in higher education are changing; less emphasis is placed on content-specific knowledge, and more is placed on critical-thinking skills, such as scientific and quantitative reasoning, analysis and problem solving, and writing effectiveness and mechanics. That is why CLA+ focuses on these skills and why CAE believes employers should use this tool during recruitment efforts.

Two important questions:

1) Are the CLA+ and related assessments measuring something other than the general cognitive ability of individuals who have had many years (K-12 plus at least some college) of education?

2) By how much does a college education improve CLA+ scores?

The study below, which involved 13 colleges (ranging from MIT, Michigan, and Minnesota to Cal State Northridge, Alabama A&M, ...), gives some hints at the answers.
Test Validity Study (TVS) Report

This study examined whether commonly used measures of college-level general educational outcomes provide comparable information about student learning. Specifically, do the students and schools earning high scores on one such test also tend to earn high scores on other tests designed to assess the same or different skills? And, are the strengths of these relationships related to the particular tests used, the skills (or “constructs”) these tests are designed to measure (e.g., critical thinking, mathematics, or writing), the format they use to assess these skills (multiple-choice or constructed-response), or the tests’ publishers? We also investigated whether the difference in mean scores between freshmen and seniors was larger on some tests than on others. Finally, we estimated the reliability of the school mean scores on each measure to assess the confidence that can be placed in the test results.

Effect sizes are modest. The quantity "d+" in the table below is the average score difference between the seniors and the freshmen tested, in units of standard deviations. An individual's score as a freshman is probably a very good predictor of their score as a senior. (To put it crudely, additional years of expensive post-secondary education do not increase cognitive ability by very much. What cognitive tests measure is fairly stable, despite the efforts of educators.)

Note that weaker students drop out between freshman and senior year, so the senior population is academically stronger than the freshman population. To correct for this, the researchers adjusted the effect sizes. The adjustment used was simply the average SAT score difference (in SD units) between seniors and freshmen in each school's sample (students who survive to senior year tend to have higher SAT scores -- go figure!). In other words, to get their final results, the researchers implicitly acknowledged that these new tests are largely measuring the same general cognitive abilities as the SAT!
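To make the adjustment concrete, here is a minimal sketch in Python. It is not the TVS authors' code. It assumes the correction is a plain subtraction of the SAT gap from the raw gap, and it uses a pooled standard deviation. The function names and all scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_difference(seniors, freshmen):
    """Mean difference (seniors minus freshmen) in pooled-SD units."""
    n_s, n_f = len(seniors), len(freshmen)
    pooled_var = (
        (n_s - 1) * stdev(seniors) ** 2 + (n_f - 1) * stdev(freshmen) ** 2
    ) / (n_s + n_f - 2)
    return (mean(seniors) - mean(freshmen)) / sqrt(pooled_var)


def adjusted_effect_size(test_seniors, test_freshmen, sat_seniors, sat_freshmen):
    """Raw senior-freshman gap on the outcome test minus the SAT gap.

    Assumption: the adjustment is a simple subtraction in SD units.
    """
    raw_gap = standardized_difference(test_seniors, test_freshmen)
    sat_gap = standardized_difference(sat_seniors, sat_freshmen)
    return raw_gap - sat_gap


# Toy scores for one hypothetical school (not real data).
test_freshmen = [900, 1000, 1100]
test_seniors = [1000, 1100, 1200]
sat_freshmen = [1000, 1100, 1200]
sat_seniors = [1050, 1150, 1250]

raw = standardized_difference(test_seniors, test_freshmen)
adjusted = adjusted_effect_size(
    test_seniors, test_freshmen, sat_seniors, sat_freshmen
)
print(f"raw gap: {raw:.2f} SD, adjusted gap: {adjusted:.2f} SD")
```

With these toy numbers, half of the raw senior-freshman gap disappears once the SAT difference between the two samples is subtracted. That is the point of the correction: part of the apparent "gain" is just a stronger surviving cohort.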

Below are school-level correlations and reliabilities on various assessments, which show that cognitive constructs ("critical thinking", "mathematics", etc.) are consistently evaluated regardless of the specific test used. Hint: ACT, SAT, GRE, PISA, etc. would have worked just as well ...

The results below are also good evidence for a school-level general factor of ability = "G". The researchers don't release specific numbers, but I'd guess MIT has a much higher G than some of the lower ranked schools, and that the value of G can be deduced just from the school's average SAT score.
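For readers who want to see what a "general factor" means operationally, here is a rough sketch. A single dominant first principal component of the correlation matrix is one common signature of such a factor. The correlation values below are invented for illustration and are not the TVS numbers. The function name and the pure-Python power iteration are my own choices.

```python
from math import sqrt


def first_component(corr, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix,
    found by power iteration starting from a uniform vector."""
    n = len(corr)
    v = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, v


# Hypothetical school-level correlations among four assessments.
HYPOTHETICAL_CORR = [
    [1.00, 0.90, 0.80, 0.85],
    [0.90, 1.00, 0.75, 0.80],
    [0.80, 0.75, 1.00, 0.70],
    [0.85, 0.80, 0.70, 1.00],
]

eigenvalue, loadings = first_component(HYPOTHETICAL_CORR)
# For a correlation matrix the trace equals the number of variables,
# so eigenvalue / n is the share of variance on the first component.
variance_share = eigenvalue / len(HYPOTHETICAL_CORR)
print(f"first component carries {variance_share:.0%} of the variance")
print("loadings:", [round(x, 2) for x in loadings])
```

When every assessment loads with the same sign on one component that carries most of the variance, the different tests are largely ranking schools along a single dimension.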

Does the CLA have validity in predicting job and life outcomes? Again, experienced psychometricians know the answer, but stay tuned as data gradually accumulate.
Documenting Uncertain Times: Post-graduate Transitions of the Academically Adrift Cohort

Graduates who scored in the bottom quintile of the CLA were three times more likely to be unemployed than those who scored in the top quintile on the CLA (9.6 percent compared to 3.1 percent), twice as likely to be living at home (35 percent compared to 18 percent) and significantly more likely to have amassed credit card debt (51 percent compared to 37 percent).
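The ratios quoted above are easy to check from the percentages given. A small sketch (the helper name and dictionary layout are mine; the percentages are the ones in the quote):

```python
def risk_ratio(bottom_pct, top_pct):
    """How many times more common an outcome is in the bottom quintile."""
    return bottom_pct / top_pct


# (bottom quintile %, top quintile %) as quoted above.
outcomes = {
    "unemployed": (9.6, 3.1),
    "living at home": (35.0, 18.0),
    "credit card debt": (51.0, 37.0),
}

ratios = {name: risk_ratio(b, t) for name, (b, t) in outcomes.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratios come out to roughly 3.1, 1.9, and 1.4, which matches the "three times" and "twice" wording for the first two outcomes.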


Richard Seiter said...

Any thoughts on how the ceiling of the current SAT (e.g. see the Infoproc discussion at ) affects the utility of average SAT score as a metric for the G defined above?

Steve, one of the things I like/appreciate most about your blog is the summaries and discussion of psychometric research. Since you're more familiar with this area than I am, can you comment on this (not footnoted) statement in the TVS report? "However, when the individual student is the unit of analysis, multiple-choice measures are known to yield more reliable scores per hour of testing time than do constructed-response measures." How well established is this and what is the magnitude of the difference?

Does anyone know why Table 2a (student-level equivalent of 2b above) of the TVS report omits correlations for what appear to be different portions (subtests?) of the tests? (e.g. CLA PT vs. CLA CA or CAAP Reading/Science)

steve hsu said...

"However, when the individual student is the unit of analysis, multiple-choice measures are known to yield more reliable scores per hour of testing time than do constructed-response measures."

Don't know of specific studies, but this seems very plausible. See, for example, the writing portion of the SAT.
