Validating Assessments for Students With Disabilities
Defining What Students Learn
The most fundamental step in any testing program that aims for a valid assessment of student learning is to define the academic content (American Educational Research Association et al., 1999, chap. 1). Designing tests that help assess student learning depends on understanding what students learn, or are supposed to learn, in school. A basic education typically includes acquiring both knowledge and skills. A convenient way to think about achievement in this context is to imagine a domain of objectives that students learn to accomplish. In elementary mathematics, for example, this domain might include adding, subtracting, multiplying, and dividing whole numbers, fractions, and decimals. Other parts of this domain are linked to student learning objectives based on state content standards. A test score represents a level of achievement in that domain: for example, how many tasks students can perform or how well they can perform them. Levels of learning (for example, beginning, intermediate, and advanced) can be set on the assumption that students go through natural stages of learning. For purposes of accountability, we tend to describe achievement levels with terms such as those shown in Table 1.
Table 1
Academic Achievement Standards
| Levels (note 1) |
| Advanced: Well above the minimum acceptable level of mastery of the material in the state's academic content standards |
| Proficient: Mastery of the material in the state's academic content standards |
| Basic: Progress of lower achieving students toward mastering the proficient and advanced levels of achievement |
These levels are defined by a panel of subject-matter experts whose experience with students should enable them to make valid determinations of the cut scores that separate achievement into these levels. Typically, the process for establishing these achievement standards includes a number of steps that provide some of the procedural evidence for validation. The process eventually results in an impact analysis that provides the statistical evidence for validation. Both types of evidence are further discussed below.
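To make the role of cut scores concrete, the following is a minimal sketch of how two cut scores divide a score scale into the three levels of Table 1, and of how the share of students at each level could be tabulated. The cut values (60 and 85 on a 0 to 100 scale), the function names, and the sample scores are all hypothetical; real cut scores come from the expert panel, and a full impact analysis involves more than this tabulation.

```python
from bisect import bisect_right
from collections import Counter

# Level names follow Table 1, ordered from lowest to highest.
LEVELS = ["Basic", "Proficient", "Advanced"]

# Hypothetical cut scores on a 0-100 scale (assumption, not from the source):
# a score of 60 or more is Proficient, a score of 85 or more is Advanced.
CUT_SCORES = [60, 85]


def achievement_level(score, cuts=CUT_SCORES, levels=LEVELS):
    """Return the achievement level whose score band contains `score`."""
    # bisect_right counts how many cut scores are at or below the score.
    return levels[bisect_right(cuts, score)]


def impact_table(scores, cuts=CUT_SCORES, levels=LEVELS):
    """Return the proportion of scores that fall into each level."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(achievement_level(s, cuts, levels) for s in scores)
    return {level: counts[level] / len(scores) for level in levels}


# Hypothetical scores for five students.
print(impact_table([40, 60, 70, 85, 90]))
```

Moving either cut score changes the proportions in every adjacent level, which is why panels usually review this kind of tabulation before the cut scores are final.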
Traditional criterion-referenced and domain-referenced testing, which were popular in the past, featured tests designed to measure a student's status with regard to a large domain of knowledge and skills in which every piece of knowledge and every skill was tied to a student learning objective. We could isolate the knowledge and skills well, and we could teach them effectively. The items on each achievement test were intended to be a representative sample of that domain, so a test score would signify the level of achievement in the domain. For example, a score of 75 percent might lead to the conclusion that the student could perform proficiently on 75 percent of all items in that domain even though not all items had been presented to the student.
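The 75 percent example can be written out as a short calculation. This is only an illustrative sketch: the 40-item test, the 2,000-item domain, and the function name are hypothetical, and the inference holds only if the test items are a representative sample of the domain.

```python
def estimate_domain_score(responses):
    """Proportion of sampled items answered correctly.

    Under representative sampling, this proportion is read as an
    estimate of the proportion of the whole domain the student
    could perform.
    """
    if not responses:
        raise ValueError("at least one item response is required")
    return sum(responses) / len(responses)


# Hypothetical 40-item test: 30 items correct, 10 incorrect.
responses = [True] * 30 + [False] * 10
proportion = estimate_domain_score(responses)

# Hypothetical domain of 2,000 items, most of which were never presented.
domain_size = 2000
print(f"Observed score: {proportion:.0%}")
print(f"Estimated items in the domain the student could perform: "
      f"{proportion * domain_size:.0f}")
```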
Most state content standards contain objectives that identify knowledge and skills that fit into this view of achievement. Table 2 shows examples of student learning objectives for three subjects that reflect knowledge and skills needed by all students to succeed in school.
Table 2
Objectives for Knowledge and Skills
| Subject Matter | Example of a Student Learning Objective |
| Reading | Identify main characters in a story. |
| Reading | Identify a cause-and-effect relationship in a story. |
| Reading | Identify root words. |
| Reading | Distinguish fact from opinion. |
| Writing | Proofread a text. |
| Writing | Place commas correctly. |
| Writing | Use active voice as appropriate to purpose. |
| Writing | Spell correctly. |
| Mathematics | Identify examples of mathematics terms (e.g., logic, manipulative, pi, integer, scatter plot). |
| Mathematics | Construct a bar graph with data provided. |
| Mathematics | Order numbers from low to high. |
For many reasons, the most desirable format for measuring knowledge and skills is multiple choice or selected response (Haladyna, 2004). The main reason is that the multiple-choice format provides the best chance for very good sampling from a domain, which usually allows tests to be more reliable. Other reasons include logistics and costs: multiple-choice tests are usually substantially less expensive than other formats. Nevertheless, for some important skills (e.g., reading skills such as phonemic awareness and reading fluency), multiple-choice tests are not suitable. Instead, brief constructed tasks are necessary.
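One way to read the sampling argument is that a format that fits more items into the same testing time yields a more precise estimate of the domain score. The sketch below uses the standard error of a proportion under a simple binomial model. It assumes items are sampled at random from the domain and answered independently, which is a simplification, and it is not a reliability coefficient. The test lengths and the 75 percent domain score are hypothetical.

```python
import math


def standard_error(domain_proportion, n_items):
    """Standard error of an observed proportion-correct score.

    Simple binomial model (assumption): each of the n_items sampled
    items is answered correctly with probability domain_proportion,
    independently of the other items.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    p = domain_proportion
    return math.sqrt(p * (1 - p) / n_items)


# Hypothetical comparison for a student whose true domain score is 75 percent.
for n_items in (5, 20, 80):
    se = standard_error(0.75, n_items)
    print(f"{n_items:3d} items: standard error about {se:.3f}")
```

Under this model, quadrupling the number of items halves the standard error, so a test that samples only a handful of tasks gives a much noisier picture of the same domain.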
1. These terms are used in U.S. Department of Education, Standards and assessment peer review guidance, p. 2.