Key URLs and Links from Talks

Brian Huot:
The Big Test by Nicholas Lemann
On a Scale: A Social History of Writing Assessment in America by Norbert Elliot
Standards For Educational And Psychological Testing 1999 by AERA
Assessing Writing: A Critical Sourcebook by Brian Huot and Peggy O'Neill

Bob Cummings/Ron Balthazor:
No Gr_du_te Left Behind by James Traub
EMMA, UGA's electronic and e-portfolio environment

Marti Singer:
GSU's Critical Thinking Through Writing Project



Wednesday, October 24, 2007

8:45 – 10:15: Understanding and Using Mandates for Writing Assessment as Opportunities for Improving Our Programs and Teaching

Brian Huot, Kent State University Department of English

Along with Peggy O'Neill, Brian Huot is the editor of a soon-to-be-published collection of essays on assessment from Bedford/St. Martin's called Assessing Writing: A Critical Sourcebook.

To learn more about Brian and Peggy, see Assessing Writing: About the Editors.

To see an annotated table of contents see Assessing Writing: Contents.

To read the introduction to the collection see Assessing Writing: The Introduction.



Brian's talk:

Raw, directly typed notes. Typos and errors are mine (Nick Carbone), not Brian's.

History
A history of the search for reliability, mainly inter-rater reliability. If you had a target, reliability would be how often you hit the same part of the target; it's about consistency more than being dead center.
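
Since reliability here is just consistency between readers, a toy calculation may make it concrete. Below is a minimal sketch of my own (not from the talk), assuming an invented 1–6 holistic scale and invented scores, showing one common way to express inter-rater reliability: the rate at which two readers' scores land within a point of each other.

```python
# Minimal illustration with invented data: inter-rater reliability expressed
# as exact-plus-adjacent agreement between two readers' holistic scores.

rater_a = [4, 3, 5, 2, 4, 6, 3, 4]
rater_b = [4, 4, 5, 2, 3, 6, 3, 5]

def adjacent_agreement(a, b, tolerance=1):
    """Share of essays where the two scores differ by at most `tolerance` points."""
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)

print(f"Exact-plus-adjacent agreement: {adjacent_agreement(rater_a, rater_b):.2f}")
```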

1912: Starch and Elliott show how English teachers don't agree on the grades they give students.

1930s: The SAT was new and could help with scholarships. It was used for students applying for scholarships, whose information had to come in sooner.

December 1941 (WWII): The SAT becomes part of the national security zeitgeist. The SAT grows as the CEEB gives the ECT (English Composition Test): they write the prompts, and teachers read and grade. No reliability.

Nicholas Lemann, _The Big Test_: ETS founded in 1947.

Norbert Elliot, _On a Scale_

In the 1950s it was common to assess writing with no writing at all.

Now we have computer scoring: e-rater, ACCUPLACER. The scores are more reliable than what you get with human readers (but at the sacrifice of validity).

We've moved from an arena where you have people who cannot agree to a machine that always agrees with itself.

Validity
Intelligence testing takes off at the turn of the century in response to laws demanding universal education. Children whose parents never went to school come in, and they're hard to teach; what worked for the privileged didn't work for the masses. So testing began: intelligence testing starts measuring students to find out why they can't learn.
We don't hear validity defined, and we don't hear a lot about it; we trusted testmakers to affirm that their tests were valid for measuring the thing they purport to measure.

Validity: how well a particular measure correlates with other measures of the same thing. It becomes a circle of one test checking another.

Traditional approach: does a test measure what it purports to measure?

In the '50s a test is valid if it serves the purpose for which it is used. This raises a question of worth and value: what's the value of this test? Do you have a rationale for having this assessment? How will it improve teaching and learning?

Three keys:

Content validity – does it cover the content and skills it purports to measure? Are writers writing?

Criterion validity – is it a consistent measure, and does it match other measures?

Construct validity (conformity of the measure with the phenomenon of interest) – does it support a theory of what good writing is?


COMPASS test: an indirect test of writing (yeah, right). Instead of a grammar test, it's an untimed editing test on a computer, and that information is used to place you in writing courses (750,000 students take it).

To claim validity, you take a group of essays, and the correlation between COMPASS scores and scores on the essay test was close enough to count as a match (criterion validity).
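
As a rough illustration of that move (my sketch with invented numbers, not data from the talk), criterion validity of this kind boils down to computing a correlation between the indirect measure and the direct one, and offering a high r as the evidence:

```python
# Minimal illustration with invented data: criterion validity argued as the
# correlation between an indirect measure (editing-test score) and a direct
# measure (human-scored essays).
from statistics import correlation  # Pearson's r; Python 3.10+

editing_scores = [62, 75, 48, 90, 81, 55, 70, 66]          # indirect measure
essay_scores   = [3.0, 4.0, 2.5, 5.5, 5.0, 3.0, 4.0, 3.5]  # human readers

r = correlation(editing_scores, essay_scores)
print(f"Pearson r between editing test and essay scores: {r:.2f}")
```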

Validity is the degree to which an argument can be made for an integrated judgment about the assessment:

- it's about judgment
- it's about decision making
- it's not about the measures you use, it's about the decisions you make
- validity is partial and always ongoing
- it assesses the assessment
- it's about consequences

For example, if instructors will do some things and not others, you can't force them or measure the thing they won't do.

Reliability is subsumed under validity; any argument about validity must consider reliability. Newer assessment models where folks don't give scores work better. So instead of holistic scoring, use a scheme that asks teachers which class a student should be placed in.

If assessment has no benefits for teaching and learning, don't do it. But assessment is part of our jobs and can have a positive impact.

If you know a program is working well, assessment can protect it. Make your own assessment, or assessment will be imposed on you.

The _Standards for Educational and Psychological Testing_ (1999) is not from NCTE or CCC; it comes from the measurement community and says what counts as ethical and responsible use of assessment.

Research

Kent State started a new program: they moved the second course to sophomore year, and moved from a literature-based course to a process-based comp program with more computer use and multimodal texts.

Prepare folder:

syllabus – all assignments and handouts
samples of student writing: above, at, and below what students should be doing
a one-page self-assessment of how the class went from the teacher's point of view

Pay $100 per folder. Ten teams with four people on each team, and every team will read four portfolios. Pay readers $100 to read. Cost for a large program: $15,000.00.

This will give them a snapshot of what their program is doing, and they will get a sense of how well courses are meeting the requirements and goals of the new curriculum. It is kept separate from personnel evaluation.

People will also get a sense of cool ideas from other classes.

This will give a sense of what's going on.

They will write a report to the administration on what is going on. If they see patterns where certain goals aren't being met, they can address that.

No performance data from students except for the samples. In year five they will do deeper research on student performance.

Opportunity

Assessment can be a community-building activity. Teachers can talk about teaching and students, but it needs to be done so people don't feel under the gun. Research in the second year is not evaluative but descriptive, because they are asking people to change the way they're teaching. It's a radical change for some, for example teaching in a computer classroom. So an adjustment period is required, and it needs nurturing, not chain-ganging, of instructors.

2 comments:

Anonymous said...

Table 1:
Most interested in the community that will be built out of assessment.

Most hoped that the curriculum will be improved.

GSU is in the midst of a huge SACS moment, so these are university-wide considerations.

Table 2:
Disagreement about what is measurable. Can you measure "intellectual ambition," for example? Useful to do to make sure comp isn't just about mechanics and grammar.

How to work with adjuncts and TAs. Also how to help developmental students get into the program and succeed.

Donna Sewell said...

I particularly enjoyed hearing about the research Brian is doing to find out what's going on in first-year writing. It's exactly the kind of thing we're trying to do at Valdosta State (gather data), but we haven't added the research component of teams reading folders.