Burning red flags: State wants to drop norm-referenced tests
Brief background: Some members of the education establishment have an unfortunate hostility toward student testing. It should be put to rest. While it is true there are bad tests and good tests, testing itself is an invaluable tool for every education stakeholder (students, teachers, parents, school leaders, etc.).
The purpose of testing is simple: Once we’ve clearly defined what it is we expect our schools, teachers and curriculum to deliver (i.e., what students should know and be able to do), we need to measure our progress toward that goal. Broadly speaking, teachers and parents need tests to show the strengths and weaknesses of individual students; school leaders need tests to show the strengths and weaknesses of individual teachers and programs; and state policymakers need tests to show the impact of state-level policy and spending.
No single test can provide accurate measurement from all of these important angles, which is why at least three types of tests are used during a student’s academic career: norm-referenced, criterion-referenced and performance-based. Each one offers important data. In a nutshell:
Norm-referenced tests are a cost-effective way to get a broad snapshot of student mastery over foundational skills/knowledge and a reliable comparison between students nationwide. Scoring is generally reported as a percentile rank on a scale of 1 to 99. If a student scores in the 63rd percentile, it means he or she scored as well as or better than 63 percent of the students taking the test. Norm-referenced tests are particularly valuable for state policymakers who don’t need (and can’t make much use of) detailed student- and classroom-level data.
Criterion-referenced tests measure a student’s ability to demonstrate knowledge and understanding of academic content (math, history, spelling, etc.). Scores on these tests represent an absolute value and typically show percentages of right and wrong answers. These are particularly valuable for teachers and parents who need to know what specific skills and content each student has mastered or needs to improve. Criterion-referenced tests can also be used to measure the academic value each student receives from a year’s worth of instruction (through pre- and post-testing), which is important for teacher evaluation.
Performance-based tests require students to show their ability to understand and perform specific tasks (completing a science experiment, running a mile, painting a picture, etc.). These are valuable tools for teachers.
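To make the percentile idea behind norm-referenced scoring concrete, here is a minimal sketch in Python. It assumes a small, made-up norm group of raw scores; the function name and the numbers are hypothetical, and real test publishers rely on their own norming tables (which typically report ranks from 1 to 99) rather than a calculation this simple.

```python
from bisect import bisect_right

def percentile_rank(score, norm_scores):
    """Percent of the norm group that scored at or below `score`."""
    ordered = sorted(norm_scores)
    # bisect_right counts how many sorted scores are <= score.
    at_or_below = bisect_right(ordered, score)
    return 100.0 * at_or_below / len(ordered)

# Hypothetical norm group of eight raw scores (not real ITBS data).
norm_group = [12, 15, 18, 20, 22, 25, 27, 30]

print(percentile_rank(22, norm_group))  # 62.5
```

A student with a raw score of 22 did as well as or better than five of the eight students in this invented group, which is where the 62.5 comes from.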
As Washington’s Superintendent of Public Instruction, Terry Bergeson, notes on her website: “No single test can tell you everything about a child’s performance. Looking at information from a variety of tests and assessment tools remains the best way for parents and classroom teachers to really see how well individual students are learning. . . . [A] series of norm-referenced tests and the National Assessment of Educational Progress are important to the goal of having a balanced and valid system for the measurement of student achievement.”
We agree with Superintendent Bergeson.
The matter at hand: Unfortunately, Superintendent Bergeson’s clear understanding of the need for multiple tests has not stopped her from teaming up with former Governor Gary Locke in drafting two legislative bills (HB 1068 and SB 5071) to end Washington’s participation in the norm-referenced Iowa Test of Basic Skills (ITBS). If the bills succeed, the legislature will be burning some of the red flags signaling serious problems in our state’s public schools.
Scuttling the state’s norm-referenced tests isn’t a new idea. Proponents claim it will save the state money and free more hours for classroom instruction. Many members of the education establishment declared the tests unnecessary when the Washington Assessment of Student Learning (WASL) was designed and adopted in the mid-1990s. At the time, legislators refused to trust the state’s evaluation of student achievement to a single “untested” test that many critics decried as invalid and unreliable.
Concerns about the WASL have not been put to rest nearly ten years later. Test designers attempted to incorporate all three kinds of testing (norm-referenced, criterion-referenced and performance-based) into one high-stakes evaluation, an impossible feat that leaves us with an unwieldy, expensive and unreliable (subjectively graded) test that many students, teachers and parents have come to dread.
As for saving money and classroom time, the state spends an average of $54 per student to administer the WASL, which generally takes the better part of a week. Meanwhile, administering the ITBS costs between two and three dollars per student and can be completed in a couple of afternoons.
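The size of that cost gap is easy to check. The short Python sketch below uses only the per-student figures quoted above; the enrollment of 1,000 students is a hypothetical number chosen to make the totals concrete, and the variable and function names are ours.

```python
# Per-student administration costs quoted in the text (US dollars).
WASL_COST = 54.0
ITBS_COST_RANGE = (2.0, 3.0)

def cost_multiple(wasl_cost, itbs_cost):
    """How many times more the WASL costs per student than the ITBS."""
    return wasl_cost / itbs_cost

low_multiple = cost_multiple(WASL_COST, ITBS_COST_RANGE[1])   # against $3
high_multiple = cost_multiple(WASL_COST, ITBS_COST_RANGE[0])  # against $2
print(low_multiple, high_multiple)  # 18.0 27.0

# Hypothetical enrollment, used only for illustration.
students = 1000
print(WASL_COST * students)                      # 54000.0
print([c * students for c in ITBS_COST_RANGE])   # [2000.0, 3000.0]
```

On these figures the WASL costs roughly 18 to 27 times as much per student as the ITBS.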
Further, legislators are right to want a safeguard when it comes to consistent and accurate data about student achievement. EFF recently issued a report showing how Superintendent Bergeson’s decision to lower standards on the reading and math portions of the WASL has allowed schools around the state to claim dramatic “improvements” in student achievement with no actual change in student performance.
A quick comparison between the state’s ITBS and WASL trends shows that, while WASL scores have risen significantly in all grades since the test was adopted, ITBS scores have seen only modest gains in grade three, remained flat in grade six, and dropped slightly in grade nine. This is a red flag.
WASL Scores, Percent of Students Proficient

Grade     Subject   1996-97  1997-98  1998-99  1999-00  2000-01  2001-02  2002-03  2003-04*
Grade 4   Reading   47.9%    55.6%    59.1%    65.8%    66.1%    65.6%    66.7%    74.4%
Grade 4   Math      21.4%    31.2%    37.3%    41.8%    43.4%    51.8%    55.2%    59.9%
Grade 4   Writing   42.8%    36.7%    32.6%    39.4%    43.3%    49.5%    53.6%    55.8%
Grade 5   Science   -        -        -        -        -        -        -        28.2%
Grade 7   Reading   -        38.4%    40.8%    41.5%    39.8%    44.5%    47.9%    60.4%
Grade 7   Math      -        20.1%    24.2%    28.2%    27.4%    30.4%    36.8%    46.3%
Grade 7   Writing   -        31.3%    37.1%    42.6%    48.5%    53.0%    54.7%    58.0%
Grade 8   Science   -        -        -        -        -        -        35.8%    39.4%
Grade 10  Reading   -        -        51.4%    59.8%    62.4%    59.2%    60.0%    64.4%
Grade 10  Math      -        -        33.0%    35.0%    38.9%    37.3%    39.4%    43.9%
Grade 10  Writing   -        -        41.1%    31.7%    46.9%    54.3%    60.5%    65.2%
Grade 10  Science   -        -        -        -        -        -        31.8%    32.2%

(A dash marks a grade and subject with no score listed for that year.)
Source: Office of the Superintendent of Public Instruction.
* Proficiency standards lowered, resulting in up to 12% more students meeting standard in some subjects/grades.
Iowa Test of Basic Skills Scores (NPR** Rank)

Grade    Subject                1998-99  1999-00  2000-01  2001-02  2002-03  2003-04
Grade 3  Reading                55       56       57       57       58       58
Grade 3  Math                   60       63       64       66       67       67
Grade 6  Reading                -        54       53       54       55       55
Grade 6  Math                   -        56       56       58       58       58
Grade 6  Language               -        56       54       56       56       55
Grade 9  Reading                -        54       53       54       53       53
Grade 9  Expression             -        55       54       55       54       54
Grade 9  Quantitative Thinking  -        60       59       59       59       59

(A dash marks a grade and subject with no score listed for that year.)
* ITBS = Iowa Test of Basic Skills
** NPR = National Percentile Rank
The story of student achievement as told by the ITBS is not as flattering, but it matches the trends on Washington’s other student test, the National Assessment of Educational Progress (NAEP).
National Assessment of Educational Progress Scores, Percent Proficient

Subject  Grade      1994  1996  1998  2002  2003
Math     4th grade  -     21%   -     -     36%
Math     8th grade  -     26%   -     -     32%
Reading  4th grade  27%   -     30%   35%   38%
Reading  8th grade  -     -     32%   37%   33%

(A dash marks a grade and subject with no score listed for that year.)
* NAEP = National Assessment of Educational Progress
Conclusion: Good tests are invaluable tools for education stakeholders. We strongly recommend that legislators maintain Washington’s participation in the norm-referenced ITBS, or adopt an equally valid and reliable diagnostic or value-added test to ensure accurate data about student academic achievement. Dropping the tests would mean burning red flags. We need to heed them instead.
At a March 23, 2005, House Appropriations hearing on a bill to gut the voter-approved I-601 spending limit, Rep. Jim McIntire (D) asked a supporter of I-601’s two-thirds supermajority requirement for the legislature to raise taxes the following question:
"Can you name a time when we [legislators] have actually not just set it [supermajority requirement] aside by majority vote? I mean, this is in many respects a procedural motion that has no bearing. It’s a statutory constraint that cannot constrain any legislature that chooses as a majority to set it aside . . . have we ever used a supermajority [to raise taxes]?"