NEETS and the Multi-dimensional Aspects of Educational Assessment

Abraham I. Felipe
abefelipe@yahoo.com

Background

Nap Imperial’s suggestion to examine the meanings and practices of assessment in education in the Philippines is very timely. The Presidential Task Force for Education (PTFE) has just finished its work, a legislative agenda based on its recommendations has been adopted, and one agenda item related to assessment is being fleshed out.

This item is about the establishment of NEETS (National Educational Evaluation and Testing System, lately also called NERETS, for National Educational Research, Evaluation and Testing System). The idea was for NEETS to be an autonomous body concerned with quality at the basic, vocational-technical and higher education levels, which it was mandated to assess, “coordinate” and “harmonize”.

Consistent with its proposed name, NEETS made known its plan to use test data as its main raw material for conducting assessment. From NEETS’ viewpoint, the tests to be so used would have to be centrally or nationally developed, not just borrowed from this or that source. For this reason, NEETS initially proposed to develop many kinds of tests — achievement, diagnostic, aptitude, creativity and other kinds — for children and students at different levels of education, in the hope of being relevant to their varied needs.

This plan predictably threatened many colleges and universities, private and state alike, which had been satisfied all along with their home-grown testing systems. It also threatened private outfits providing test services to schools, for it would render their products irrelevant.

The Presidential Assistant (PA) for Education[1], however, has kept open alternative plans for NEETS. The PA took notice of the old plan’s possible effect on the competitiveness of players in the educational testing field, which could tilt the field to favor some against others. This paper intends to help players find their respective roles in educational assessment, consistent with their professional histories and competences but within a framework based on the right of government to assess the effects of schooling and of education in general. The hope is for NEETS, when finally implemented, to be a welcome development.

The Many Dimensions of “Assessment”

“Assessment” is often contentious for two related reasons. First, it invariably leads to classification – good or bad, passing or failing, acceptable or not acceptable, and various degrees in between. Second, the outcome is often affected by the assessment methods used. We seek a less contentious system, wherein outcomes are less dependent on methods. We think this is possible if we classify outcomes and show that some outcomes are relevant to some people and some needs, while other outcomes are relevant to others. This hopefully will reduce touchiness. A unified system is possible because all assessment practices have one common denominator — they all involve a comparison of an observation (for example, a measurement) with a standard. After recognizing that, it is easier to show that what makes assessment traditions different merely starts from differences in the measures they use.[2]

Before continuing, let me first comment on one popular belief and one very common practice in the Philippines. The common belief is that testing in and by itself is assessment. It is not. Testing is only one technology for taking measurements. Testing leads to assessment only when standards are applied to the measures afterwards.

The common practice concerns the rating of “75” as a passing mark. The reason why testing has been able to masquerade here as assessment is the common practice of evaluating, rating or judging a performance or an output as “OK” or “PASSING” if on a numerical scale it tipped the magical number “75” (or, conversely, “FAILING” if it did not reach “75”). In other words, a tipping point of “75” was considered inherent in the test measure. This local practice is odd because teacher training institutions have a science-based[3] course in their curricula, often labeled “measurement, or assessment, and evaluation”, which, when properly taught, would give no special room for “75”. The unexamined meaning of “75” has lived on, abetted by the silence of both specialists and high education officials on the matter. No one can yet be quoted as pointing out the arbitrariness of that rating by relating it to the ease or difficulty of a test, or to the stinginess or generosity of a teacher. The type of assessment involving the special use of “75” is not the type of assessment I will discuss here.
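To make the arbitrariness concrete, here is a minimal sketch (in Python, with invented numbers) of how a fixed cut-off of “75” classifies the same underlying competence differently depending only on how easy or hard the test happens to be:

```python
# Hypothetical illustration: a fixed cut-off of 75 classifies the same
# examinee differently depending on test difficulty. All numbers are
# invented for illustration; nothing here comes from an actual test.

def percent_score(raw_correct: int, n_items: int) -> float:
    """Transmute a raw score to the conventional 0-100 scale."""
    return 100.0 * raw_correct / n_items

CUTOFF = 75.0  # the "magical number" under discussion

# The same student, with the same mastery of the subject, sits an easy
# version and a hard version of a 40-item test.
easy_version = percent_score(raw_correct=32, n_items=40)  # 80.0
hard_version = percent_score(raw_correct=26, n_items=40)  # 65.0

for label, score in (("easy version", easy_version), ("hard version", hard_version)):
    verdict = "PASSING" if score >= CUTOFF else "FAILING"
    print(f"{label}: {score:.1f} -> {verdict}")
```

The verdict flips from PASSING to FAILING without any change in the student, which is precisely the arbitrariness described above.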

The purpose of education is learning, and assessment seeks to determine the degree to which this purpose is achieved. Many assessment practices have been developed as a result. Some practices utilize samples of individual behavior or performance that can be reasonably interpreted as learning. The foremost example is the personal, often unstandardized, eyeball-to-eyeball classroom assessment by teachers based on recitation and other classroom participation. A second, related example differs only in being more formal: the test-based type of assessment conducted by an office higher than the teacher (perhaps by the school or even by DepEd). This is the type that is usually called assessment. Unlike the first case, test-based assessment is no longer eyeball-to-eyeball between teacher and students.

A different type of assessment practice does not use samples of individual behavior. Instead, it uses aggregations of certain elements in the school environment relevant to the management of schooling; these aggregations are then used in (or as) ratios. Examples are the student/teacher ratio, the student/classroom ratio, the pupil/book ratio, survival rate, passing rate, participation rate, and others. Test-based practitioners do not usually recognize ratio users as doing assessment work. Many fail to appreciate the assumption that unfavorable ratios are generally inimical to learning. An integrated view of assessment must assign a place for those who use these ratios. For a start, these ratios could be studied in relation to traditional measures of learning, as sketched below.
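Here is a minimal sketch of what such a study could look like. The school names, figures and variable names are illustrative assumptions, not actual DepEd data:

```python
# Relating environment ratios to a traditional learning measure across a
# handful of hypothetical schools. Requires Python 3.10+ for
# statistics.correlation (Pearson's r).
from statistics import correlation

schools = {
    "students_per_teacher":   [25, 40, 55, 32, 48, 60],
    "students_per_classroom": [30, 45, 70, 38, 52, 75],
    "mean_achievement":       [82.0, 74.5, 66.0, 79.0, 70.5, 63.0],
}

for ratio_name in ("students_per_teacher", "students_per_classroom"):
    r = correlation(schools[ratio_name], schools["mean_achievement"])
    print(f"{ratio_name} vs mean_achievement: r = {r:.2f}")
```

A consistently negative r across many schools would lend empirical weight to the assumption that unfavorable ratios are inimical to learning.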

In order to have a unified system of assessment, I suggest a functional view based on aspects of every assessment practice. Four aspects are considered important for this purpose: (a) the measures used in the assessment, (b) the entity that has use for the results of assessment, (c) the level of the educational system at which assessment will be conducted, and (d) the frequency/timing of the assessment. One can view these four as four dimensions[4] of assessment, a suggestion given so that NEETS would have a practical scheme for clarifying responsibilities with other players. The first two dimensions will be discussed in more detail because of their novelty as dimensions of assessment; the last two are already familiar.
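One concrete way to work with the scheme (and with the matrix representation suggested in footnote 4) is to treat each assessment activity as a point in a four-dimensional space. The sketch below is only an illustration; the field names and example entries are my assumptions, not a fixed vocabulary:

```python
# Each assessment activity classified along the four proposed dimensions.
from typing import NamedTuple

class Assessment(NamedTuple):
    measure: str  # e.g., classroom test, portfolio, access ratio
    user: str     # teacher, school, or the system "owner"
    level: str    # basic, voc-tech, or higher education
    timing: str   # formative or summative

examples = [
    Assessment("recitation/quiz", "teacher", "basic", "formative"),
    Assessment("standardized achievement test", "system owner", "basic", "summative"),
    Assessment("participation rate", "system owner", "basic", "summative"),
]

for a in examples:
    print(a)
```

Tabulating activities this way makes overlaps and gaps among players visible at a glance.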

The Four Dimensions of Assessment

A. The measures used in the assessment

Measures are the first of the four dimensions proposed. A postulate in education is that its purpose is learning. Since learning, however, is not directly measurable, indirect measures are used instead. Here in the choice of measures is where differences start.

The most common measures are indications of learning such as correct answers to questions in recitations, quizzes and formal class examinations. These cognitive or intellective measures are easy to get. Intuitively, they indicate learning. Their main drawback is a lack of standardization, which makes fair inter-individual comparisons difficult. Their one plus factor is the possible use of a teacher’s “clinical eye” for details and nuances in behavior.

A close kin of the classroom test is the formalized test used in large-scale testing. Large-scale testing usually standardizes the procedures for collecting and processing data and interprets the results using standardized parameters. It therefore leads to fairer inter-individual comparisons.

An interesting tool for getting measures of what one knows is the so-called “portfolio” of performance. Here, the individual presents a collection of what he considers to be his accomplishments – grades, awards, musical talents, participation in a basketball league, invention of an efficient fish hook, and other real-world achievements. Doubtless, portfolios contain more in-depth information about individuals, but they are inefficient for large-scale comparisons and they suffer from the need to authenticate ownership of claimed achievements.

Other measures are found in the literature. Of these, access measures are common (proxy indices of access include participation rate, survival rate, retention rate, drop-out rate, etc.). Often unstated is their assumption that without access, there will be negligible learning.

Most access measures can be obtained within DepEd itself. An exception is participation rate, the percentage of school-age children who are in school. It can be obtained only with the help of the Census and Statistics Office.

Except for access measures, other measures are collected and studied for certain information about learning. The favorite information sought is quality of learning. In education, quality is the grail. A few disagree on the ground that quality is too mushy and that the science in education needs something more solid. Their alternatives are more quantitative indices: efficiency (what was learned out of what was taught – a ratio) and effectiveness (what was retained, after a time frame, of what was taught – another ratio). Between efficiency and effectiveness, more technical problems beset the latter because of the qualification “retained after a time frame”.
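Stated as simple ratios, the two indices look as follows. This is a hedged sketch with hypothetical counts; how “taught”, “learned” and “retained” would be operationalized is itself an assumption:

```python
# The two quantitative indices named above, computed over hypothetical
# counts of curriculum objectives.

def efficiency(learned: int, taught: int) -> float:
    """Share of what was taught that was learned."""
    return learned / taught

def effectiveness(retained: int, taught: int) -> float:
    """Share of what was taught that is still retained after a time frame."""
    return retained / taught

taught, learned, retained_after_a_year = 50, 38, 30
print(f"efficiency:    {efficiency(learned, taught):.2f}")                   # 0.76
print(f"effectiveness: {effectiveness(retained_after_a_year, taught):.2f}")  # 0.60
```

The extra burden on effectiveness is visible even in this toy example: it requires a second, delayed measurement (“retained after a year”), which is exactly where the technical problems enter.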

Many assessment projects deal, in principle, with the efficiency of schooling. This conclusion, however, is offered with tongue in cheek, because it is rarely made clear that students were tested on what they were taught, a requirement of the concept of efficiency.

Determining what students learned of the things taught them is a very respectable objective. As a matter of fact, it is the type of assessment that should be done first. When this is left unclear, one cannot tell whether a specific assessment is an assessment of education or merely of schooling. Past studies should be critically reviewed so that future assessments can be properly guided.

In the not too distant future, we will have to prepare to assess whether education helps attain goals that are more societal in nature. Does our system develop manpower with marketable competences, whether in or outside the Philippines? Does our system develop a citizenry that is maka-bayan, maka-tao, maka-kalikasan at maka-dios? Based on the constitutional foundations of our system, these are legitimate questions.

B. The user of the assessment

The user, the entity that benefits from the assessment, is the second dimension. For practical purposes, three users are important – the teacher as individual, the school as institution, and the “owner” of the whole system. The “system’s owner” is the institution administering or managing the whole system at the national level (i.e., DepEd for basic education, CHED for higher education, TESDA for voc-tech education). For each user, assessment is useful because of the information or feedback it provides.

1. The teacher as user. For a teacher, assessment is useful when it provides feedback on how he is doing in his task. What is useful to him is information that (a) tells him of any discrepancy between his accomplishment and his teaching objectives, and (b) suggests other things he could do to improve performance.

The objective of the teacher is to teach specific subject matter effectively or efficiently. This objective usually addresses quality. What will be useful to him is feedback on his practices (e.g., how frequently he gives quizzes, whether or not the results of quizzes are discussed with the students concerned, the references he uses, the learning aids and technologies he uses, the number of pages of reading materials he assigns, the time allotted to a topic, and so on). Information on participation rate, drop-out rate, retention rate, etc. in the system will not help him in his work; it will not be material to him.

2. The school as user. The institution may set its own objectives such as a certain level of academic performance compared with the previous year or with other schools, a level of religious devotion of its students, an increase in civic participation, an improvement in its drop-out rate, and so on. Information on the degree of accomplishment of its objectives would be relevant. Information on teacher practices matters only to the extent that it affects institutional objectives. In general, individual institutions will not be concerned with the results of system-level assessments.
3. The “owner” as user. Any feedback on the status of system-wide goals such as educational quality and access, or manpower marketability and absorption, is useful to the owner.

Rationale for the “Users” dimension:

The user dimension is incorporated in order to call attention to certain variables that affect assessment outcomes but are often overlooked. Each type of user has effective control over certain variables. The purpose of a user dimension is to remind each user what he could do to further improve the assessment results.

For example, it is the teacher who has immediate and direct control over the teaching and learning process at the classroom level. Therefore, he should be conscious that his teaching methods, practices, incentive schemes, grading system, demeanor, training, motivations, personal goals, competences and other attributes are variables that directly affect his effectiveness. By doing something about the variables at his command, he will be able to influence learning in the classroom.

Nowadays, classroom-level assessments are no longer initiatives of individual teachers only. Lately there have been classroom-level assessments designed for large-scale implementation.  Project BEAM (Basic Education Assistance for Mindanao) is an example. Learning improvements purportedly traceable to the assessment process in BEAM have been reported.

In the users dimension, the level next to the classroom is the school. Like the classroom teacher, the school affects the teaching-learning process albeit at another level. It has effective control over several variables — policies on student admission and faculty hiring, relevant student and faculty incentives, salaries, grading system, academic loads, attendance policies, policies on supervision of instruction, policies on reviews for tests, textbook policies, texts required, library policies and stocks, reading and reference materials, availability of technologies, class hours, type (semestral, trimestral or quarterly) and number of contact hours per school term, practices about storms and floods, policies on extra-curricular activities, incentives for meritorious performance, penalty systems and many more.

Examples of miniature systems that set minimum acceptable practices/standards for groups (not individual schools only) are accreditation and quality assurance. Examples of assessors of student characteristics (such as academic preparedness, aptitudes and other student attributes) are test service outfits like the Asian Psychological Services and Assessment Corporation (APSA) and the Center for Educational Measurement (CEM).

As assessment tools, tests have been criticized for giving very little information. This charge was true in the past but is no longer. As tools to assess individual attributes, tests transmit little information about the examinee when only one number (such as the test score) is given. Even when this one number is broken down into two or three components, the information is still sparse.

Recently, there have been breakthroughs in extracting and reporting information. Now it is possible to compare one student with a specified group on each test item used (and not only on the total test score), a boon to guidance and admissions workers. With the help of new technologies, it is possible to assess one whole batch of admissions applicants relative to last year’s batch, or any specific batch, or several past batches, on all or a combination of items in the test. The single test score has already been made obsolete, although it continues to be used by some practitioners. With these new developments, test data can now contribute more to planning and decision-making. This use of test data at the level of institutions can be exported easily to the national level.
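A minimal sketch of such item-level reporting (all data invented): compare one examinee’s answer on each item with a reference group’s proportion correct, instead of collapsing everything into a single score:

```python
# Item-level comparison of one student against a reference group.
# Rows are examinees, columns are items; 1 = correct, 0 = incorrect.
reference_group = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
]
student = [1, 0, 1, 1, 0]

n = len(reference_group)
for i, answer in enumerate(student):
    p_correct = sum(row[i] for row in reference_group) / n  # item "difficulty"
    note = "above group" if answer > p_correct else "at/below group"
    print(f"item {i + 1}: student={answer}, group p={p_correct:.2f} ({note})")
```

The same per-item aggregation, applied to a whole applicant batch instead of one student, supports the batch-to-batch comparisons mentioned above.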

As it is with institutions, some variables for improving the system may be under the primary or even exclusive control of the system’s owner. Examples are the school calendar, medium of instruction, grading system, minimum teacher qualifications, promotions, hiring and salaries, incentives, school contributions, policies on uniforms, policies on student guidance, system of permissible penalties and many more.

C. Level of the education system at which assessment is being done

This is the third dimension of assessment. As explained earlier, this dimension will be discussed only briefly because it is already quite familiar.

“Level” here refers to the three levels of basic, voc-tech, and higher education. The purpose of classifying the levels this way is to recognize who has the right to receive the assessment results and the attendant responsibility to follow them up.

The third dimension of assessment means that classroom assessments, where the teacher is the user, may be done at all three levels of the education system — the basic, voc-tech or higher education levels. The scenario is classroom assessment in all schools at all levels.

Likewise, institutional assessments, where the school is the user, may be done at the same three levels — basic, voc-tech and higher education. The scenario is institutional assessment for all schools.

Finally, system-wide assessment may be done for the same three levels. The scenario is system-wide assessments in basic, voc-tech and higher education. System-wide assessments in higher education will likely be designed along program lines (e.g., electronics technician training, auto-mechanics, accountancy, engineering, law, nursing, etc.).

D. The timing and frequency dimension

This dimension refers to the popular classification of assessments as formative or summative. The terms formative and summative refer to the purpose of the assessment.

The purpose in formative assessment is to influence the rate and direction of learning. Information about learning achievements is usually collected at several points in time. Done this way, both user (usually the teacher) and student are given data at different points in time with which to calculate whatever adjustments they need to make in order to best achieve their ends.
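A hedged sketch of what such time-wise feedback could look like (the target level and quiz scores are hypothetical):

```python
# Tracking mastery over the term against a target, so that teacher and
# student can see the remaining gap while there is still time to adjust.
target = 0.80  # desired proportion of objectives mastered by term's end
quiz_history = [("week 2", 0.55), ("week 4", 0.62), ("week 6", 0.71)]

for week, mastered in quiz_history:
    gap = target - mastered
    status = "on track" if gap <= 0.10 else "needs adjustment"
    print(f"{week}: mastered {mastered:.0%}, gap {gap:+.0%} -> {status}")
```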

A quasi-formative assessment is diagnostic testing. As the term suggests, the purpose of testing is diagnostic. Usually, testing is not repeated unless there is doubt about the way the test was administered or the technical suitability of the test. The assumption in diagnostic testing is that the qualities sought are not on-and-off but are relatively enduring qualities. Examples are cognitive styles, aptitudes and personality attributes relevant to learning. These qualities may be sought on the assumption that they could help the teacher design better teaching strategies.

When information is collected only at the end of a time frame, the only purpose served is to estimate the final position in relation to a reference point (i.e., how much of the objective was attained).

The formative type is interactive between user and student. The most interactive formative type usually takes place at the classroom level, where the teacher may use many types of devices to get feedback — recitation, long tests, short quizzes, home assignments, portfolios of accomplishments, contributions to group work. As discussed earlier, formative assessment is on the ground, in the here and now.

In contrast, summative assessment is more formal; the user and the student may be distant, not eyeball-to-eyeball with each other. The common form of summative assessment is the final examination.

Conclusion

Assessment is one tool used to address the quality of education. The first issue about quality is what children learn from the school curriculum and the quality of what they learn. This becomes urgent in the light of the expansion of our school system and its attendant demand on resources. As a result of population growth, assessing the system has become more complex. This complexity calls for government to acknowledge the existence of various initiatives to assess it, and then to encourage and support them. The game here is addition, not subtraction. Each player can apply his unique competencies to his part of the totality. Each should enjoy the right to develop appropriate assessment tools, limited only by the right of government to assess the system of education.

There is urgency in having professional assessments of the effects of instruction in our schools using measures of quality and access. We suffer from a scarcity, not a surplus, of reliable and validated information about our educational system.

There is a need to encourage teachers and institutions to use the variables at their command in order to improve quality at their respective levels.

Government must start addressing education’s assessment functions for the larger society, such as those contained in the notions of the “responsible citizen”, “the productive Filipino who can compete and survive in the global economy”, or “one who is maka-tao, maka-bayan, maka-kalikasan at maka-dios”. The required expertise for these topics may be scarcer, but the topics are legitimate and the questions important.

[1] Dr. Mona D. Valisno

[2] “Assessment” is often used interchangeably with “evaluation”. A stricter usage limits the latter to cases wherein the standard has a more affective (i.e., “good-bad”) tone. Shorn of this tone, the terms are synonymous.
[3] Statistics provides the theoretical basis of testing.

[4] The concept of “dimension” is used here as “vectors” are used in mathematics. In this paper, the assumption is that the dimensions are mutually independent. To visualize: a 2-dimensional space gives a plane; a 3-dimensional space, a solid. The 4-dimensional space implied by the present proposal has no corresponding visual representation. My suggestion is to represent the 4-D space by matrices. Alternatively, a taxonomy can be used to conceptualize assessment; here, the taxonomic attributes would be the dimensions.
