The emphasis in previous chapters centered on learning, communicating, and the teaching process. In this chapter, we will discuss the instructor's role as a critic, describe several methods of evaluation, and show how to conduct effective evaluations.

Since every student is different and each learning situation is unique, the actual outcome may not be entirely as expected. The instructor must be able to appraise student performance and convey this information back to the student. This is an informal critique, which is a part of each lesson. The critique should be used by the instructor to summarize and close out one lesson, and prepare the student for the next lesson. Formal evaluations are used periodically throughout a course, and at the end of the course, to measure and document whether or not the course objectives have been met.

The Instructor as a Critic

Although this chapter deals with the critique primarily from the standpoint of the instructor in the classroom, the techniques and methods described also apply to the aircraft maintenance instructor in the shop and to the flight instructor in the aircraft or in the briefing area. No skill is more important to an instructor than the ability to analyze, appraise, and judge student performance. The student quite naturally looks to the instructor for guidance, analysis, and appraisal, as well as suggestions for improvement and encouragement. This feedback from instructor to student is called a critique.

A critique may be oral, written, or both. It should come immediately after a student's performance, while the details of the performance are easy to recall. An instructor may critique any activity which a student performs or practices to improve skill, proficiency, and learning. A critique may be conducted in private or before the entire class. A critique presented before the entire class can be beneficial to every student in the classroom as well as to the student who performed the exercise or assignment. In this case, however, the instructor should be judicious and avoid embarrassing the student in front of the whole class.

Two common misconceptions about the critique should be corrected at the outset. First, a critique is not a step in the grading process. It is a step in the learning process. Second, a critique is not necessarily negative in content. It considers the good along with the bad, the individual parts, relationships of the individual parts, and the overall performance. A critique can, and usually should, be as varied in content as the performance being critiqued.

Purpose of a Critique

A critique should provide the students with something constructive upon which they can work or build. It should provide direction and guidance to raise their level of performance. Students must understand the purpose of the critique; otherwise, they will be unlikely to accept the criticism offered and little improvement will result.

A critique also can be used as a tool for reteaching. Although not all critiques lend themselves to reteaching, the instructor should be alert to the possibility and take advantage of the opportunity when it arises. If, for example, several students falter when they reach the same step in a weight-and-balance problem, the instructor might recognize the need for a more detailed explanation, another demonstration of the step, or special emphasis in the critiques of subsequent performance.

Characteristics of an Effective Critique

In order to provide direction and raise the students' level of performance, the critique must be factual and be aligned with the completion standards of the lesson. This, of course, is because the critique is a part of the learning process. Some of the requirements for an effective critique are shown in figure 6-1.


The effective critique is focused on student performance. It should be objective, and not reflect the personal opinions, likes, dislikes, and biases of the instructor. For example, if a student accomplishes a complicated flight planning problem, it would hardly be fair for the instructor to criticize the student's personality traits unless they interfered with the performance itself. Instructors sometimes permit their judgment to be influenced by their general impression of the student, favorable or unfavorable. Sympathy or over-identification with a student, to such a degree that it influences objectivity, is known as "halo error." A conflict of personalities can also distort an opinion. If a critique is to be objective, it must be honest; it must be based on the performance as it was, not as it could have been, or as the instructor and student wished that it had been.


The instructor needs to examine the entire performance of a student and the context in which it is accomplished. Sometimes a good student will turn in a poor performance and a poor student will turn in a good one. A friendly student may suddenly become hostile, or a hostile student may suddenly become friendly and cooperative. The instructor must fit the tone, technique, and content of the critique to the occasion, as well as the student. A critique should be designed and executed so that the instructor can allow for variables. Again and again, the instructor is faced with the problem of what to say, what to omit, what to stress, and what to minimize. The challenge of the critique for an instructor is to determine what to say at the proper moment. An effective critique is one that is flexible enough to satisfy the requirements of the moment.


Before students willingly accept their instructor's criticism, they must first accept the instructor. Students must have confidence in the instructor's qualifications, teaching ability, sincerity, competence, and authority. Usually, instructors have the opportunity to establish themselves with their students before the formal critiquing situation arises. If this is not the case, however, the instructor's manner, attitude, and readily apparent familiarity with the subject at hand must serve instead. Critiques do not have to be all sweetness and light, nor do they have to curry favor with students. If a critique is presented fairly, with authority, conviction, sincerity, and from a position of recognizable competence, the student probably will accept it as such. Instructors should not rely on their position to make a critique more acceptable to their students. While such factors usually operate to the instructor's advantage, acceptability depends on more active and demonstrable qualities than on simply being the instructor.


A comprehensive critique is not necessarily a long one, nor must it treat every aspect of the performance in detail. The instructor must decide whether the greater benefit will come from a discussion of a few major points or a number of minor points. The instructor might critique what most needs improvement, or only what the student can reasonably be expected to improve. An effective critique covers strengths as well as weaknesses. How to balance the two is a decision that only the instructor can make. To dwell on the excellence of a performance while neglecting the portion that should be improved is a disservice to the student.


A critique is pointless unless the student profits from it. Praise for praise's sake is of no value, but praise should be included to show how to capitalize on things that are done well. The praise can then be used to inspire the student to improve in areas of lesser accomplishment. By the same token, it is not enough to identify a fault or weakness. The instructor should give positive guidance for correcting the fault and strengthening the weakness. Negative criticism that does not point toward improvement or a higher level of performance should be omitted from a critique altogether.


Unless a critique follows some pattern of organization, a series of otherwise valid comments may lose their impact. Almost any pattern is acceptable as long as it is logical and makes sense to the student as well as to the instructor. An effective organizational pattern might be the sequence of the performance itself. Sometimes a critique can profitably begin at the point where a demonstration failed and work backward through the steps that led to the failure. A success can be analyzed in similar fashion. Sometimes a defect is so glaring or the consequences so great that it overshadows the rest of the performance and can serve as the core of a critique. Breaking the whole into parts or building the parts into a whole has strong possibilities. Whatever the organization of the critique, the instructor should be flexible enough to change so the student can follow and understand it.


An effective critique reflects the instructor's thoughtfulness toward the student's need for self-esteem, recognition, and approval from others. The instructor should never minimize the inherent dignity and importance of the individual. Ridicule, anger, or fun at the expense of the student have no place in a critique. On occasion, an instructor may need to criticize a student in private. In some cases, discretion may rule out any criticism at all. For example, criticism does not help a student whose performance is impaired by a physiological defect. While being straightforward and honest, the instructor should always respect the student's personal feelings.


The instructor's comments and recommendations should be specific, rather than general. The student needs to focus on something concrete. A statement such as, "Your second weld wasn't as good as your first," has little constructive value. Instead, tell the student why it was not as good and how to improve the weld. If the instructor has a clear, well-founded, and supportable idea in mind, it should be expressed with firmness and authority in terms that cannot be misunderstood. Students cannot act on recommendations unless they know specifically what the recommendations are. At the conclusion of a critique, students should have no doubt what they did well and what they did poorly and, most importantly, specifically how they can improve.

Webmaster's Note: An acronym such as "COWFACTS" is commonly used to memorize these characteristics. As I learned it, the "W" stands for "Well Organized"; your CFI may teach a different one.

Methods of Critique

The critique of student performance is always the instructor's responsibility, and it can never be delegated in its entirety. The instructor can add interest and variety to the criticism through the use of imagination and by drawing on the talents, ideas, and opinions of others. There are several useful methods of conducting a critique.

Instructor/Student Critique

The instructor leads a group discussion in which members of the class are invited to offer criticism of a performance. This method should be controlled carefully and directed with a firm purpose. It should be organized and not allowed to degenerate into a random free-for-all.

Student-Led Critique

The instructor asks a student to lead the critique. The instructor can specify the pattern of organization and the techniques or can leave it to the discretion of the student leader. Because of the inexperience of the participants in the lesson area, student-led critiques may not be efficient, but they can generate student interest and learning and, on the whole, be effective.

Small Group Critique

For this method, the class is divided into small groups and each group is assigned a specific area to analyze. These groups must present their findings to the class. Frequently, it is desirable for the instructor to furnish the criteria and guidelines. The combined reports from the groups can result in a comprehensive critique.

Individual Student Critique by Another Student

The instructor also may require another student to present the entire critique. A variation is for the instructor to ask a number of students questions about the manner and quality of performance. Discussion of the performance, and of the critique, can often allow the group to accept more ownership of the ideas expressed. As with all critiques incorporating student participation, it is important that the instructor maintain firm control over the process.


Self-Critique

A student is required to critique personal performance. Like all other methods, a self-critique must be controlled and supervised by the instructor. Whatever the methods employed, the instructor must not leave controversial issues unresolved, nor erroneous impressions uncorrected. The instructor must make allowances for the student's relative inexperience. Normally, the instructor should reserve time at the end of the student critique to cover those areas that might have been omitted, not emphasized sufficiently, or considered worth repeating.

Written Critique

Written critiques have three advantages. First, the instructor can devote more time and thought to a written critique than to an oral critique in the classroom. Second, the students can keep written critiques and refer to them whenever they wish. Third, when the instructor requires all the students to write a critique of a performance, the student-performer has the permanent record of the suggestions, recommendations, and opinions of all the other students. The disadvantage of a written critique is that other members of the class do not benefit.

Ground Rules for Critiquing

There are a number of rules and techniques to keep in mind when conducting a critique. The following list can be applied, regardless of the type of critiquing activity.

Except in rare and unusual instances, do not extend the critique beyond its scheduled time and into the time allotted for other activities. A point of diminishing returns can be reached quickly.
Avoid trying to cover too much. A few well-made points will usually be more beneficial than a large number of points that are not developed adequately.
Allow time for a summary of the critique to reemphasize the most important things a student should remember.
Avoid dogmatic or absolute statements, remembering that most rules have exceptions.
Avoid controversies with the class, and do not get into the delicate position of taking sides with group factions.
Never allow yourself to be maneuvered into the unpleasant position of defending criticism. If the criticism is honest, objective, constructive, and comprehensive, no defense should be necessary.
If part of the critique is written, make certain that it is consistent with the oral portion.

Although, at times, a critique may seem like an evaluation, it is not. Both student and instructor should consider it as an integral part of the lesson. It normally is a wrap-up of the lesson. A good critique closes the chapter on the lesson and sets the stage for the next lesson. Since the critique is a part of the lesson, it should be limited to what transpired during that lesson. In contrast, an evaluation is more far reaching than a critique because it normally covers several lessons.


Evaluation

Whenever learning takes place, the result is a definable, observable, measurable change in behavior. The purpose of an evaluation is to determine how a student is progressing in the course. Evaluation is concerned with defining, observing, and measuring or judging this new behavior. Evaluation normally occurs before, during, and after instruction; it is an integral part of the learning process. During instruction, some sort of evaluation is essential to determine what the students are learning and how well they are learning it. The instructor's evaluation may be the result of observations of the students' overall performance, or it may be accomplished as either a spontaneous or planned evaluation, such as an oral quiz, written test, or skill performance test.

Oral Quizzes

The most used means of evaluation is the direct or indirect oral questioning of students by the instructor. Questions may be loosely classified as fact questions and thought questions. The answer to a fact question is based on memory or recall. This type of question usually concerns who, what, when, and where. Thought questions usually involve why or how, and require the student to combine knowledge of facts with an ability to analyze situations, solve problems, and arrive at conclusions. Proper quizzing by the instructor can have a number of desirable results.

Reveals the effectiveness of the instructor's training procedures.
Checks the student's retention of what has been learned.
Reviews material already covered by the student.
Can be used to retain the student's interest and stimulate thinking.
Emphasizes the important points of training.
Identifies points that need more emphasis.
Checks the student's comprehension of what has been learned.
Promotes active student participation, which is important to effective learning.

Characteristics of Effective Questions

An effective oral quiz requires some preparation. The instructor should devise and write pertinent questions in advance. One method is to place them in the lesson plan. Prepared questions merely serve as a framework, and as the lesson progresses, should be supplemented by such impromptu questions as the instructor considers appropriate. Usually an effective question has only one correct answer. This is always true of good questions of the objective type and generally will be true of all good questions, although the one correct answer to a thought question may sometimes be expressed in a variety of ways. To be effective, questions must apply to the subject of instruction. Unless the question pertains strictly to the particular training being conducted, it serves only to confuse the students and divert their thoughts to an unrelated subject. An effective question should be brief and concise, but also clear and definite. Enough words must be used to establish the conditions or significant circumstances exactly, so that instructor and students will have the same mental picture.

To be effective, questions must be adapted to the ability, experience, and stage of training of the students. Effective questions center on only one idea. A single question should be limited to who, what, when, where, how, or why, not a combination. Effective questions must present a challenge to the students. Questions of suitable difficulty serve to stimulate learning. Effective questions demand and deserve the use of proper English.

Types of Questions to Avoid

Asking, "Do you understand?" or "Do you have any questions?" has no place in effective quizzing. Assurance by the students that they do understand or that they have no questions provides no evidence of their comprehension, or that they even know the subject under discussion. Other typical types of questions that must be avoided are provided in the following list.

Puzzle- "What is the first action you should take if a conventional gear airplane with a weak right brake is swerving left in a right crosswind during a full-flap, power-on wheel landing?"
Oversize- "What do you do before beginning an engine overhaul?"
Toss-up- "In an emergency, should you squawk 7700 or pick a landing spot?"
Bewilderment- "In reading the altimeter-you know you set a sensitive altimeter for the nearest station pressure-if you take temperature into account, as when flying from a cold air mass through a warm front, what precaution should you take when in a mountainous area?"
Trick questions-These questions will cause the students to develop the feeling that they are engaged in a battle of wits with the instructor, and the whole significance of the subject of the instruction involved will be lost.

An example of a trick question would be where the alternatives are 1, 2, 3, and 4, but they are placed in the following form.

A. 4
B. 3
C. 2
D. 1

The only reason for reversing the order of choices is to trick the student into inadvertently answering incorrectly. Instructors often justify the use of trick questions as testing for attention to detail. If attention to detail is an objective, detailed construction of alternatives is preferable to trick questions.

Irrelevant questions-The teaching process must be an orderly procedure of building one block of learning upon another in logical progression, until a desired goal is reached. Diversions, which introduce unrelated facts and thoughts, will only obscure this orderly process and slow the student's progress. Answers to unrelated questions are not helpful in evaluating the student's knowledge of the subject at hand. An example of an irrelevant question would be to ask a question about tire inflation during a test on the timing of magnetos.

Answering Questions from Students

Responses to student questions must also conform with certain considerations if answering is to be an effective teaching method. The question must be clearly understood by the instructor before an answer is attempted. The instructor should display interest in the student's question and frame an answer that is as direct and accurate as possible. After the instructor completes a response, it should be determined whether or not the student's request for information has been completely answered, and if the student is satisfied with the answer.

Sometimes it may be unwise to introduce the more complicated or advanced considerations necessary to completely answer a student's question at the current point in training. In this case, the instructor should carefully explain to the student that the question was good and pertinent, but that a detailed answer would, at this time, unnecessarily complicate the learning tasks. The instructor should advise the student to reintroduce the question later at the appropriate point in training, if it does not become resolved in the normal course of instruction.

Occasionally, a student asks a question that the instructor cannot answer. In such cases, the instructor should freely admit not knowing the answer, but should promise to get the answer or, if practicable, offer to help the student look it up in available references.

In all quizzing conducted as a portion of the instruction process, "yes" and "no" answers should be avoided. Questions should be framed so that the desired answers are specific and factual. Questions should also be constructed to avoid one-word answers, since such answers might be the product of a good guess and not be truly representative of student learning or ability. If a one-word answer is received, the instructor should follow up with additional questions to get a better idea of the student's comprehension of the material.

Written Tests

As evaluation devices, written tests are only as good as the knowledge and proficiency of the test writer. This section is intended to provide the aviation instructor with only the basic concepts of written test design. There are many excellent publications available to the aviation instructor on test administration, test scoring, grade assignment, whole test analysis, and test item analysis. Refer to the reference section at the end of this handbook for testing and test writing publications.

Characteristics of a Good Test

A test is a set of questions, problems, or exercises for determining whether a person has a particular knowledge or skill. A test can consist of just one test item, but it usually consists of a number of test items. A test item measures a single objective and calls for a single response. The test could be as simple as the correct answer to an essay question or as complex as completing a knowledge or practical test. Regardless of the underlying purpose, effective tests share certain characteristics.

Reliability is the degree to which test results are consistent with repeated measurements. If identical measurements are obtained every time a certain instrument is applied to a certain dimension, the instrument is considered reliable. An unreliable instrument cannot be depended upon to yield consistent results. An altimeter that has worn moving parts, a steel tape that expands and contracts with temperature changes, or cloth tapes that are affected by humidity cannot be expected to yield reliable measurements. While no instrument is perfectly reliable, it is obvious that some instruments are more reliable than others. For example, a laboratory balance is more reliable than a bathroom scale for measuring weight.

The reliability of an instrument can be estimated by numerous measurements of the same object. For example, a rough measure of the reliability of a thermometer can be obtained by taking several consecutive readings of the temperature of a fluid held at a constant temperature. Except for the errors made by the person taking the readings, the difference between the highest and lowest readings can be considered a range of unreliability in the thermometer.
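The thermometer example can be sketched in a few lines of Python. This is only an illustration: the function name and the readings are hypothetical, and it assumes the readings were taken by a careful observer so that reading errors can be ignored.

```python
def reading_range(readings):
    """Return the spread between the highest and lowest readings."""
    if not readings:
        raise ValueError("at least one reading is required")
    return max(readings) - min(readings)


# Hypothetical consecutive readings (degrees C) of a fluid held at a
# constant temperature; these values are made up for illustration.
readings_c = [40.0, 40.5, 39.5, 40.0, 40.5]

spread = reading_range(readings_c)
print(f"Range of unreliability: {spread} degrees C")
```

A smaller spread suggests a more reliable instrument; the sketch says nothing about whether the readings are accurate, only whether they are consistent.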

Reliability has the same meaning whether applied to written tests or to balances, thermometers, and altimeters. The reliability of a written test is judged by whether it gives consistent measurement to a particular individual or group. Measuring the reliability of a written test is, however, not as straightforward as it is for the measuring devices we have discussed. In an educational setting, knowledge, skills, and understanding do not remain constant. Students can be expected to improve their scores between attempts at taking the same test because the first test serves as a learning device. The student gains new knowledge and understanding. If a written test consistently rates the members of a group in a certain rank order, the reliability is probably acceptable, even though the scores of the students have increased overall.
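One way to check whether a test keeps a group in roughly the same rank order is a rank correlation. The sketch below uses the Spearman rank correlation, which is a technique chosen here for illustration and is not named in this chapter. The function names and scores are hypothetical, and the simple formula used assumes no two students have the same score on an attempt.

```python
def ranks(scores):
    """Rank scores from highest (rank 1) to lowest; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_order_agreement(first, second):
    """Spearman rank correlation between two attempts (no-ties formula)."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equally long score lists, two students minimum")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(first), ranks(second)))
    return 1 - (6 * d_squared) / (n * (n * n - 1))


# Hypothetical scores for the same five students on two attempts.
# Every score rises on the second attempt, but the rank order is unchanged.
first_attempt = [62, 70, 75, 81, 90]
second_attempt = [70, 78, 84, 88, 97]

print(rank_order_agreement(first_attempt, second_attempt))
```

A value near 1 means the test ranked the students consistently even though the scores increased overall; a value near 0 or below suggests the ranking is not stable.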

Validity is the extent to which a test measures what it is supposed to measure. If a maintenance technician intends to measure the diameter of a bearing with a micrometer, it must be determined that the contacting surfaces of the bearing and the micrometer are free of grease and dirt. Otherwise, the measurement will include the diameter of the bearing and the thickness of the extraneous matter, and it will be invalid.

A test used in educational evaluation follows the same principles of validity. Evaluations used in the classroom are valid only to the extent that they measure achievement of the objectives of instruction.

A rough estimate of the content validity of a classroom test may be obtained from the judgments of several competent instructors. To estimate validity, they should read the test critically and consider its content relative to the stated objectives of the instruction. Items that do not pertain directly to the objectives of the course should be modified or eliminated. Validity is the most important consideration in test evaluation. The instructor must carefully consider whether the test actually measures what it is supposed to measure.

Usability refers to the functionality of tests. A usable written test is easy to give if it is printed in a type size large enough for the students to read easily. The wording of both the directions for taking the test and of the test items themselves needs to be clear and concise. Graphics, charts, and illustrations, which are appropriate to the test items, must be clearly drawn, and the test should be easily graded.

Objectivity describes singleness of scoring of a test; it does not reflect the biases of the person grading the test. Later in the discussion, you will find that supply-type test items are very difficult to grade with complete objectivity. An example of this is essay questions. It is nearly impossible to prevent an instructor's own knowledge and experience in the subject area, writing style, or grammar from affecting the grade awarded. Selection-type test items, such as true-false or multiple-choice, are much easier to grade objectively.

Comprehensiveness is the degree to which a test measures the overall objectives. Suppose, for example, an aircraft maintenance technician wants to measure the compression of an aircraft engine. Measuring the compression on a single cylinder would not provide an indication of the entire engine. Only by measuring the compression of every cylinder would the test be comprehensive enough to indicate the compression condition of the engine.

In classroom evaluation, a test must sample an appropriate cross-section of the objectives of instruction. The comprehensiveness of a test is the degree to which the scope of the course objectives is tested. Sometimes it will not be possible to have test questions measuring all objectives of the course. At these times, the evaluation is but a sample of the entire course. Just as the technician cannot judge the whole engine from a single cylinder, the instructor has to make certain that the evaluation includes a representative and comprehensive sampling of the objectives of the course. In both instances, the evaluators must deliberately take comprehensive samples in order to realistically measure the overall condition or achievement being evaluated.

Discrimination is the degree to which a test distinguishes the difference between students. For example, a machinist wishes to measure six bearings that are slightly graduated in size. If a ruler is used to measure the diameters of the bearings, little difference will be found between the smallest bearing and the second smallest one. If the machinist compares the third bearing with the first bearing, slight differences in size might be detected, but the ruler could not be depended on for accurately assorting the six bearings. However, if the machinist measures with a micrometer, which can measure very fine graduations, the diameters of the first and second bearing, the second and third bearing, and so on, can be easily differentiated.

In classroom evaluation, a test must be able to measure small differences in achievement in relation to the objectives of the course. When a test is constructed to identify the difference in the achievement of students, it has three features.

There is a wide range of scores.
All levels of difficulty are included.
Each item distinguishes between the students who are low and those who are high in achievement of the course objectives.
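The third feature can be checked item by item. A common approach, which this chapter does not name and which is used here only as an assumption, is to compare how the highest and lowest overall scorers did on a single item. The function name, the group size, and the data below are all hypothetical.

```python
def discrimination_index(item_correct, total_scores, group_size):
    """Share of top scorers who got the item right minus the share of
    bottom scorers who did. Ranges from -1.0 to 1.0."""
    if len(item_correct) != len(total_scores):
        raise ValueError("one item result is needed per student")
    if group_size < 1 or 2 * group_size > len(total_scores):
        raise ValueError("top and bottom groups must not overlap")
    # Order student indexes from highest to lowest total score.
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    top = order[:group_size]
    bottom = order[-group_size:]
    top_correct = sum(item_correct[i] for i in top)
    bottom_correct = sum(item_correct[i] for i in bottom)
    return (top_correct - bottom_correct) / group_size


# Hypothetical class of six: total test scores, and whether each student
# answered one particular item correctly (1) or not (0).
total_scores = [95, 88, 82, 74, 65, 58]
item_correct = [1, 1, 1, 0, 1, 0]

print(discrimination_index(item_correct, total_scores, group_size=3))
```

A clearly positive value suggests the item separates high achievers from low achievers. A value near zero means it does not, and a negative value suggests the item may be misleading or miskeyed.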

Test Development

When testing aviation students, the instructor is usually concerned more with criterion-referenced testing than norm-referenced testing. Norm-referenced testing measures a student's performance against the performance of other students. Criterion-referenced testing evaluates each student's performance against a carefully written, measurable standard or criterion. There is little or no concern about the student's performance in relation to the performance of other students. The FAA knowledge and practical tests for pilots and aircraft maintenance technicians are all criterion referenced because in aviation training, it is necessary to measure student performance against a high standard of proficiency consistent with safety.
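The difference between the two approaches can be illustrated with a short Python sketch. The function names, the class scores, and the passing score of 70 are hypothetical values chosen for illustration, not FAA standards.

```python
def criterion_referenced(score, passing_score):
    """Judge a score against a fixed standard, ignoring other students."""
    return score >= passing_score


def norm_referenced(score, all_scores):
    """Fraction of the group that scored below this student."""
    below = sum(1 for s in all_scores if s < score)
    return below / len(all_scores)


# Hypothetical class results and a hypothetical standard.
class_scores = [55, 60, 62, 65, 68]
passing_score = 70

best = max(class_scores)
print(norm_referenced(best, class_scores))         # standing within the group
print(criterion_referenced(best, passing_score))   # met the standard or not
```

In this made-up class, the best student outscores most of the group yet still falls short of the standard, which is why a criterion-referenced measure is the one that matters when safety depends on proficiency.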

The aviation instructor constructs tests to measure progress toward the standards that will eventually be measured at the conclusion of the training. For example, during an early stage of flight training, the flight instructor must administer a presolo written exam to student pilots. Since tests are an integral part of the instructional process, it is important for the aviation instructor to be well informed about recommended testing procedures.

Aviation instructors can follow a four-step process when developing a test. This process is useful for tests that apply to the cognitive and affective domains of learning, and also can be used for skill testing in the psychomotor domain. The development process for criterion-referenced tests follows a general-to-specific pattern.

Determine Level-of-Learning Objectives

The first step in developing a test is to state the individual objectives as general, level-of-learning objectives. The objectives should measure one of the learning levels of the cognitive, affective, or psychomotor domains described in Chapter 1. The levels of cognitive learning include knowledge, comprehension, application, analysis, synthesis, and evaluation. For the comprehension or understanding level, an objective could be stated as, "Describe how to perform a compression test on an aircraft reciprocating engine." This objective requires a student to explain how to do a compression test, but not necessarily perform a compression test (application level). Further, the student would not be expected to compare the results of compression tests on different engines (analysis level), design a compression test for a different type of engine (synthesis or correlation level), or interpret the results of the compression test (evaluation level). A general level-of-learning objective is a good starting point for developing a test because it defines the scope of the learning task.

List Indicators/Samples of Desired Behavior

The second step is to list the indicators or samples of behavior that will give the best indication of the achievement of the objective. Level-of-learning objectives often cannot be directly measured. As a result, behaviors that can be measured are selected in order to give the best evidence of learning. For example, if the instructor is expecting the student to display the comprehension level of learning on compression testing, some of the specific test question answers should describe appropriate tools and equipment, the proper equipment setup, appropriate safety procedures, and the steps used to obtain compression readings. The overall test must be comprehensive enough to give a true representation of the learning to be measured. It is not usually feasible to measure every aspect of a level-of-learning objective, but by carefully choosing samples of behavior, the instructor can obtain adequate evidence of learning.

Establish Criterion Objectives

The next step in the test development process is to define criterion (performance-based) objectives. In addition to the behavior expected, criterion objectives state the conditions under which the behavior is to be performed and the criteria that must be met. If the instructor developed performance-based objectives during the creation of lesson plans, criterion objectives have already been formulated. The criterion objective provides the framework for developing the test items used to measure the level-of-learning objectives. In the compression test example, a criterion objective to measure the comprehension level of learning might be stated as, "The student will demonstrate comprehension of compression test procedures for reciprocating aircraft engines by completing a quiz with a minimum passing score of 70%."

Develop Criterion-Referenced Test Items

The last step is to develop criterion-referenced test items. The actual development of the test questions is covered in the remainder of this chapter. While developing questions, the instructor should attempt to measure the behaviors described in the criterion objective(s). The questions in the exam for the compression test example should cover all of the areas necessary to give evidence of comprehending the procedure. The results of the test (questions missed) identify areas that were not adequately covered.

Performance-based objectives serve as a reference for the development of test items. If the test is the presolo knowledge test, the objectives are for the student to comprehend the regulations, the local area, the aircraft type, and the procedures to be used. The test should measure the student's knowledge in these specific areas. Individual instructors should develop their own tests to measure the progress of their students. If the test is to measure the readiness of a student to take a knowledge test, it should be based on the objectives of all the lessons the student has received.

Another source of test items is the FAA knowledge test guide for a particular knowledge test. The sample questions in these guides are designed to measure the level of learning desired for pilots or aviation maintenance technicians. As a result, they are a good source of example questions to be used in measuring a student's preparedness to take the knowledge test.

However, care must be taken not to teach the questions themselves, so that the student does not merely memorize answers or the letter of the answer. When using questions from any source, whether from a publisher or developed by individual instructors, periodically revising the questions used and changing the letters and positions of the answers will encourage learning the material rather than learning the test.
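Changing the letters and positions of the answers is easy to automate when questions are kept on a computer. The following is a minimal sketch, assuming each item is stored as a stem, one correct response, and a list of distractors. The function name and the sample item are hypothetical.

```python
import random
import string


def shuffle_alternatives(stem, correct, distractors, rng):
    """Return the item with its alternatives reordered and re-lettered.

    The keyed response is tracked by content, so the answer key follows
    the correct alternative to whatever letter it lands on.
    """
    alternatives = [correct] + list(distractors)
    rng.shuffle(alternatives)
    lettered = dict(zip(string.ascii_uppercase, alternatives))
    key = next(letter for letter, text in lettered.items() if text == correct)
    return {"stem": stem, "alternatives": lettered, "key": key}


# Sample item invented for illustration only.
item = shuffle_alternatives(
    stem="Which type of test item requires the student to furnish a response?",
    correct="Supply-type",
    distractors=["True-false", "Multiple-choice", "Matching"],
    rng=random.Random(),  # pass random.Random(seed) for a repeatable order
)
```

Numeric alternatives are a special case. If they are to stay in ascending or descending order, only the values offered should be revised, not their order.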

Written Test Items

Written questions include two general categories, the supply-type item and the selection-type item. Supply-type test items require the student to furnish a response in the form of a word, sentence, or paragraph. Selection-type test items require the student to select from two or more alternatives. See Appendix A for sample test items.

Supply Type

The supply-type item may be required where a selection-type cannot be devised to properly measure student knowledge. The supply-type requires the students to organize their knowledge. It demands an ability to express ideas that is not required for a selection-type item. This type item is valuable in measuring the students' generalized understanding of a subject.

On the other hand, a supply-type item may evaluate the students' ability to write rather than their specific knowledge of the subject matter. It places a premium on neatness and penmanship. The main disadvantage of supply-type tests is that they cannot be graded with uniformity. There is no assurance that the grade assigned is the grade deserved by the student. The same test graded by different instructors would probably be assigned different scores. Even the same test graded by the same instructor on consecutive days might be assigned altogether different scores. Still another disadvantage of a supply-type test is the time required by the student to complete it and the time required by the instructor to grade it. Everything considered, the disadvantages of the supply-type test appear to exceed the advantages to such an extent that instructors prefer to use the selection-type test. It should be noted that although selection-type tests are best in many cases, there are times where the supply-type is desirable. This would be when there is a need to thoroughly determine the knowledge of a person in a particular subject area. An example of this would be the presolo knowledge exam where it would be difficult to determine knowledge of procedures strictly with selection-type test items.

Selection Type

Written tests made up of selection-type items are highly objective. That is, the results of such a test would be graded the same regardless of the student taking the test or the person grading it. Tests that include only selection-type items make it possible to directly compare student accomplishment. For example, it is possible to compare the performance of students within one class to students in a different class, or students under one instructor with those under another instructor. By using selection-type items, the instructor can test on many more areas of knowledge in a given time than could be done by requiring the student to supply written responses. This increase in comprehensiveness can be expected to increase validity and discrimination. Another advantage is that selection-type tests are well adapted to statistical item analysis.


True-False

The true-false test item consists of a statement followed by an opportunity for the student to determine whether the statement is true or false. This item type, with all its variations, has a wide range of usage. It is well adapted for testing knowledge of facts and details, especially when there are only two possible answers. The chief disadvantage is that true-false questions create the greatest probability of guessing.
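A short calculation shows how large the guessing effect can be. The sketch below assumes a hypothetical 10-item quiz on which 7 correct answers are needed, and a student who guesses blindly on every item. It compares true-false items with four-alternative items using the binomial formula. The quiz length and passing count are assumptions made for the example.

```python
from math import comb


def chance_by_guessing(num_items, needed_correct, num_alternatives):
    """Probability of reaching the needed score by blind guessing alone."""
    p = 1 / num_alternatives  # chance of guessing any single item correctly
    return sum(
        comb(num_items, k) * p**k * (1 - p) ** (num_items - k)
        for k in range(needed_correct, num_items + 1)
    )


true_false = chance_by_guessing(10, 7, 2)   # about 0.17
four_option = chance_by_guessing(10, 7, 4)  # about 0.0035
```

Under these assumptions, a student who knows nothing has roughly a 1 in 6 chance of reaching the score on true-false items, against roughly 1 in 285 on four-alternative items.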

True-false test items are probably used and misused more than any other selection-type item. Frequently, instructors select sentences more or less at random from textual material and make half of them false by inserting negatives. When tests are constructed in this way, the principal attribute being measured is rote memory rather than knowledge of the subject. Such test construction has aroused antagonism toward selection tests in general and true-false questions in particular. It has also decreased the validity of educational evaluations. Some of the principles that should be followed in the construction of true-false items are contained in the accompanying list.

Include only one idea in each statement.
Use original statements rather than verbatim text.
Statements should be entirely true or entirely false.
Avoid the unnecessary use of negatives. They tend to confuse the reader.
If negatives must be used, underline or otherwise emphasize the negative.
Avoid involved statements. Keep wording and sentence structure as simple as possible. Make statements both definite and clear.
Avoid the use of ambiguous words and terms (some, any, generally, most times, etc.).
Whenever possible, use terms which mean the same thing to all students.
Avoid absolutes (all, every, only, no, never, etc.). These words are known as determiners and provide clues to the correct answer. Since unequivocally true or false statements are rare, statements containing absolutes are usually false.
Avoid patterns in the sequence of correct responses because students can often identify the patterns. Instructors sometimes deliberately use patterns to make hand scoring easier. This is a poor practice.
Make statements brief and about the same length. Some instructors unconsciously make true statements longer than false ones. Students are quick to take advantage of this tendency.
If a statement is controversial (sources have differing information), the source of the statement should be listed.
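Two of the principles above, avoiding absolutes and avoiding ambiguous words, lend themselves to a simple automated check when draft statements are kept on a computer. The sketch below flags only the example words named in the list. The function name is hypothetical, and a flagged word is a prompt for the instructor to review the statement, not proof that it is poorly written.

```python
import re

# Word lists copied from the examples given in the principles above.
ABSOLUTES = ["all", "every", "only", "no", "never"]
AMBIGUOUS = ["some", "any", "generally", "most times"]


def flag_statement(statement):
    """Return the listed words or phrases found in a true-false statement."""
    text = statement.lower()
    found = {}
    for label, terms in (("absolute", ABSOLUTES), ("ambiguous", AMBIGUOUS)):
        # \b anchors match whole words, so "any" is not found inside "many".
        hits = [t for t in terms if re.search(rf"\b{re.escape(t)}\b", text)]
        if hits:
            found[label] = hits
    return found


# Draft statements invented only to exercise the check.
flags = flag_statement("All engines generally require a compression test.")
clean = flag_statement("The student records each cylinder reading.")
```

Here `flags` reports "all" as an absolute and "generally" as ambiguous, while `clean` is empty.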

Multiple Choice

A multiple-choice test item consists of two parts: the stem, which includes the question, statement, or problem, and a list of alternatives or responses. Incorrect answers are called distractors. When properly devised and constructed, multiple-choice items offer several advantages that make this type more widely used and versatile than either the matching or the true-false items.

Multiple-choice test questions may be used to determine student achievement, ranging from acquisition of facts to understanding, reasoning, and ability to apply what has been learned. It is appropriate to use when the question, statement, or problem has the following characteristics.

Has a built-in and unique solution such as a specific application of laws or principles.
May be clearly limited by the wording of the item so that the student must choose the best of several offered solutions rather than a universal solution.
Is such that several options are plausible, or even scientifically accurate, but the student may be asked to identify the one most pertinent.
Has several pertinent solutions, and the student may be asked to identify the most appropriate solution.

Three major difficulties are common in the construction of multiple-choice test items. One is the development of a question or an item stem that must be expressed clearly and without ambiguity. Another requirement is that the statement of an answer or correct response cannot be refuted. Finally, the distractors must be written in such a way that they will be attractive to those students who do not possess the knowledge or understanding necessary to recognize the keyed response.

As mentioned previously, a multiple-choice item stem may take several basic forms.

It may be a direct question followed by several possible answers.
It may be an incomplete sentence followed by several possible phrases that complete the sentence.
It may be a stated problem based on an accompanying graph, diagram, or other artwork followed by the correct response and the distractors.

The student may be asked to select the one choice which is the correct answer or completion, the one choice that is an incorrect answer or completion, or the one choice which is best of the answers presented in the test item. Beginning test writers find it easier to write items in the question form. In general, the form with the options as answers to a question is preferable to the form that uses an incomplete statement as the stem. It is more easily phrased and is more natural for the student to read. Less likely to contain ambiguities, it usually results in more similarity between the options and gives fewer clues to the correct response. Samples of multiple-choice questions can be found in Appendix A.

When multiple-choice questions are used, three or four alternatives are generally provided. It is usually difficult to construct more than four convincing responses; that is, responses which appear to be correct to a person who has not mastered the subject matter.

Students are not supposed to guess the correct option; they should select an alternative only if they know it is correct. Therefore it is considered ethical to mislead the unsuccessful student into selecting an incorrect alternative. An effective and valid means of diverting the student from the correct response is to use common student errors as distractors. For example, if writing a question on the conversion of degrees Celsius to degrees Fahrenheit, providing alternatives derived by using incorrect formulas would be logical, since using the wrong formula is a common student error.
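The Celsius-to-Fahrenheit example can be sketched as follows. The correct formula is F = C x 9/5 + 32. The three wrong formulas are assumed examples of student errors chosen for illustration, not a researched list, and the function names are hypothetical. The alternatives are listed in ascending order, as is usual for numeric responses.

```python
import string


def correct_conversion(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Assumed examples of wrong formulas a student might use; illustrative only.
WRONG_FORMULAS = [
    lambda c: c * 9 / 5,         # forgets to add 32
    lambda c: c * 5 / 9 + 32,    # inverts the 9/5 ratio
    lambda c: (c + 32) * 9 / 5,  # adds 32 before multiplying
]


def build_alternatives(celsius):
    """Return lettered alternatives in ascending order, plus the key letter."""
    correct = round(correct_conversion(celsius), 1)
    # A set drops any distractor that happens to equal the correct value.
    values = {correct} | {round(f(celsius), 1) for f in WRONG_FORMULAS}
    lettered = dict(zip(string.ascii_uppercase, sorted(values)))
    key = next(letter for letter, value in lettered.items() if value == correct)
    return lettered, key


alternatives, key = build_alternatives(20)
# alternatives == {"A": 36.0, "B": 43.1, "C": 68.0, "D": 93.6}, key == "C"
```

Each distractor is a value that a student would actually compute by making the corresponding error, which is what makes it attractive to someone who has not mastered the conversion.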

Items intended to measure the knowledge level of learning should have only one correct alternative; all other alternatives should be clearly incorrect. When items are to measure achievement at a higher level of learning, some or all of the alternatives should be acceptable responses, but one should be clearly better than the others. In either case, the instructions given should direct the student to select the best alternative. Some of the principles that should be followed in the construction of multiple-choice items are contained in the following list.

Make each item independent of every other item in the test. Do not permit one question to reveal, or depend on, the correct answer to another question. If items are to be interrelated, it becomes impossible to pinpoint specific deficiencies in either students or instructors.
Design questions that call for essential knowledge rather than for abstract background knowledge or unimportant facts.
State each question in language appropriate to the students. Failure to do so can result in decreased validity of the test, since the ability to understand the language will be measured as well as the subject-matter knowledge or achievement.
Include sketches, diagrams, or pictures when they can present a situation more vividly than words. They generally speed the testing process, add interest, and help to avoid reading difficulties and technical language. A common criticism of written tests is the reliance placed on the reading ability of the student. The validity of the examination may be decreased unless reading ability is an objective of the course or test.
When a negative is used, emphasize the negative word or phrase by underlining, boldfacing, italicizing, or printing in a different color. A student who is pressed for time may identify the wrong response simply because the negative form is overlooked. To whatever extent this occurs, the validity of the test is decreased.
Questions containing double negatives invariably cause confusion. If a word such as "not" or "false" appears in the stem, avoid using another negative word in the stem or any of the responses.
Trick questions, unimportant details, ambiguities, and leading questions should be avoided, since they do not contribute to effective evaluation in any way. Instead, they tend to confuse and antagonize the student. Instructors often justify use of trick questions as testing for attention to detail. If attention to detail is an objective, detailed construction of alternatives is preferable to trick questions.


In preparing the stem of a multiple-choice item, the following general principles should be applied. These principles will help to ensure that the test item is valid.

The stem of the question should clearly present the central problem or idea. The function of the stem is to set the stage for the alternatives that follow.
The stem should contain only material relevant to its solution, unless the selection of what is relevant is part of the problem.
The stem should be worded in such a way that it does not give away the correct response. Avoid the use of determiners such as clue words or phrases.
Put everything that pertains to all alternatives in the stem of the item. This helps to avoid repetitious alternatives and saves time.
Generally avoid using "a" or "an" at the end of the stem. They may give away the correct choice. Every alternative should grammatically fit with the stem of the item.


The alternatives in a multiple-choice test item are as important as the stem. They should be formulated with care; simply being incorrect should not be the only criterion for the distracting alternatives. Some distractors which can be used are listed below.

An incorrect response which is related to the situation and which sounds convincing to the untutored.
A common misconception.
A statement which is true but does not satisfy the requirements of the problem.
A statement which is either too broad or too narrow for the requirements of the problem.

Research of instructor-made tests reveals that, in general, correct alternatives are longer than incorrect ones. When alternatives are numbers, they should generally be listed in ascending or descending order of magnitude or length.


Matching

A matching test item consists of two lists which may include a combination of words, terms, illustrations, phrases, or sentences. The student is asked to match alternatives in one list with related alternatives in a second list. In reality, matching exercises are a collection of related multiple-choice items. In a given period of time, more samples of a student's knowledge usually can be measured with matching items than with multiple-choice items. The matching item is particularly good for measuring a student's ability to recognize relationships and to make associations between terms, parts, words, phrases, clauses, or symbols listed in one column with related items in another column. Matching reduces the probability of guessing correct responses, especially if alternatives may be used more than once. The testing time can also be used more efficiently. Some of the principles that should be followed in the construction of matching items are included below.

Give specific and complete instructions. Do not make the student guess what is required.
Test only essential information; never test unimportant details.
Use closely related materials throughout an item. If students can divide the alternatives into distinct groups, the item is reduced to several multiple-choice items with few alternatives, and the possibility of guessing is distinctly increased.
Make all alternatives credible responses to each element in the first column, wherever possible, to minimize guessing by elimination.
Use language the student can understand. By reducing language barriers, both the validity and reliability of the test will be improved.
Arrange the alternatives in some sensible order. An alphabetical arrangement is common.

Matching-type test items are either equal column or unequal column. An equal column test item has the same number of alternatives in each column. When using this form, always provide for some items in the response column to be used more than once, or not at all, to preclude guessing by elimination. Unequal column type test items have more alternatives in the second column than in the first and are generally preferable to equal columns. Samples of the two forms of matching-item questions can be found in Appendix A.

Developing a Test Item Bank

Developing a test item bank is one of the instructor's most difficult tasks. Besides requiring considerable time and effort, this task demands a mastery of the subject, an ability to write clearly, and an ability to visualize realistic situations for use in developing problems. Because it is so difficult to develop good test items, a semipermanent record of items that have been developed is desirable. One way of preserving test items is to record the test item, along with the analysis of each question, on a set of cards. If questions are maintained on a computer, provisions could be made to include appropriate analysis gathered, thus creating a useful database. In either case, a pool of test questions is created after a large group of questions has been assembled. As long as precautions are taken to safeguard the security of items in the pool, the existence of the pool lightens the instructor's burden of continuously preparing new items.
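When the bank is kept on a computer, each record can carry its own analysis. The sketch below is one possible layout, not a prescribed format. The class name, the field names, and the choice of "proportion of students answering correctly" as the analysis figure are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BankedItem:
    """One test item plus a running tally of how students have answered it."""
    stem: str
    key: str        # letter of the keyed (correct) response
    objective: str  # the objective this item is meant to measure
    times_used: int = 0
    times_correct: int = 0

    def record(self, answer):
        """Tally one student's response to this item."""
        self.times_used += 1
        if answer == self.key:
            self.times_correct += 1

    def proportion_correct(self):
        """Share of recorded responses that matched the key, or None if unused."""
        if self.times_used == 0:
            return None
        return self.times_correct / self.times_used


# Hypothetical item and responses, invented for illustration.
item = BankedItem(
    stem="Which type of test item requires the student to furnish a response?",
    key="B",
    objective="Distinguish supply-type from selection-type items",
)
for answer in ["B", "B", "A", "B"]:
    item.record(answer)
share = item.proportion_correct()  # 0.75
```

An item that nearly every student misses, or that nearly every student answers correctly, is a candidate for review the next time the bank is revised.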

Principles to Follow

Regardless of item type or form, the following principles should be followed in writing new items. The list also applies to reviewing and revising existing items.

Each item should test a concept or idea that is important for the student to know, understand, or be able to apply.
Each item must be stated so that everyone who is competent in the subject-matter area would agree on the correct response.
Each item should be stated in language the student will understand.
The wording of the item should be simple, direct, and free of ambiguity. The wording should be edited for brevity. Unnecessary words merely delay the student.
Sketches, diagrams, or pictures should be included when they are necessary for the student to visualize the problem correctly or when they will add realism.
Each item should present a problem that demands knowledge of the subject or course. No item that can be responded to solely on the basis of general knowledge should be included in an achievement test.

Presolo Knowledge Tests

Title 14 of the Code of Federal Regulations (14 CFR) part 61 requires the satisfactory completion of a presolo knowledge test prior to solo flight. The presolo knowledge test is required to be administered, graded, and all incorrect answers reviewed by the instructor providing the training prior to endorsing the student pilot certificate and logbook for solo flight. The regulation states that the presolo knowledge test must include questions applicable to 14 CFR parts 61 and 91 and on the flight characteristics and operational limitations of the make and model aircraft to be flown. This allows the flight instructor the flexibility to develop a presolo written test which not only evaluates the student's knowledge on general operating rules, but on the specific environment in which the student will be operating and on the particular make and model of aircraft to be flown.

The content and number of test questions are to be determined by the flight instructor. An adequate sampling of the general operating rules should be included. In addition, a sufficient number of specific questions should be asked to ensure the student has the knowledge to safely operate the aircraft in the local environment.

The regulation requires a presolo knowledge test for each make and model of aircraft to be soloed. Because of the varying complexity of aircraft and operating environments, the flight instructor will have to use good judgment in developing the test. For instance, a student who would be operating from a controlled airport located near a terminal control area or airport radar service area should have adequate knowledge to operate safely in the environment prior to solo. Likewise, a student operating from a high elevation airport might need emphasis placed on the effects of density altitude. Specific questions should be asked to fit the situation.

The specific procedures for developing test questions have been covered earlier in this chapter, but a review of some items as they apply to the presolo knowledge test is in order. Though selection-type test items are easier to grade, it is recommended that supply-type test items be used for the portions of the presolo knowledge test where specific knowledge is to be tested. One problem with supply-type test items is difficulty in assigning the appropriate grade. Since the purpose of this test is to determine if a student pilot is ready to solo, no specific grade is assigned. The purpose of the test is to determine fitness for solo and not to assign a grade relative to a student's peers. Since solo flight requires a thorough working knowledge of the different conditions likely to be encountered on the solo flight, it is important that the test properly evaluate this area. In this way, the instructor can see any areas that are not adequately understood and can then cover them in the review of the test. Selection-type test items do not allow the instructor to evaluate the student's knowledge beyond the immediate scope of the test items. An example of a supply-type test question would be to ask the student to, "Explain the procedures for entering the traffic pattern for Runway 26." The supply-type test item measures much more adequately the knowledge of the student, and lends itself very well to presolo testing.

Though supply-type test items allow broad questions to be asked, it is probably not possible to cover every conceivable circumstance to be encountered on a solo flight. The instructor must devise the test so the general operating rules are adequately sampled to ensure the overall objective of a safe solo flight is measured. The test also should ask a sufficient number of specific questions to determine that the student has the knowledge to safely operate the aircraft in the local area.

The instructor should keep a record of the test results for at least three (3) years. The record should at least include the date, name of the student, and the results of the test.

Performance Tests

The flight instructor does not administer the practical test for a pilot certificate, nor does the aviation maintenance instructor administer the oral and practical exam for certification as an aviation maintenance technician. Aviation instructors do get involved with the same skill or performance testing that is measured in these tests. Performance testing is desirable for evaluating training that involves an operation, a procedure, or a process. The job of the instructor is to prepare the student to take these tests. Therefore, each element of the practical test will have been evaluated prior to an applicant taking the practical exam.

Practical tests for maintenance technicians and pilots are criterion-referenced tests. The practical tests are criterion-referenced because the objective is for all successful applicants to meet the high standards of knowledge, skill, and safety required by the Federal Aviation Regulations.

The purpose of the practical test standards (PTS) is to delineate the standards by which FAA inspectors and designated pilot examiners conduct tests for ratings and certificates. The standards are in accordance with the requirements of 14 CFR parts 61, 91, and other FAA publications including the Aeronautical Information Manual and pertinent advisory circulars and handbooks. The objective of the PTS is to ensure the certification of pilots at a high level of performance and proficiency, consistent with safety.

The practical test standards for aeronautical certificates and ratings include AREAS OF OPERATION and TASKS that reflect the requirements of the FAA publications mentioned above. Areas of operation define phases of the practical test arranged in a logical sequence within each standard. They usually begin with Preflight Preparation and end with Postflight Procedures. Tasks are titles of knowledge areas, flight procedures, or maneuvers appropriate to an area of operation. Included are references to the applicable regulations or publications. Private pilot applicants are evaluated in all tasks of each area of operation. Flight instructor applicants are evaluated on one or more tasks in each area of operation. In addition, certain tasks are required to be covered and are identified by notes immediately following the area of operation titles.

An instructor is responsible for training the applicants to acceptable standards in all subject matter areas, procedures, and maneuvers included in the TASKS within each AREA OF OPERATION in the appropriate practical test standard. Because of the impact of their teaching activities in developing safe, proficient pilots, flight instructors should exhibit a high level of knowledge, skill, and the ability to impart that knowledge and skill to the students.

Since every task in the PTS may be covered on the check ride, the instructor must evaluate all of the tasks before certifying the applicant to take the practical test. While this evaluation will not be totally formal in nature, it should adhere to criterion-referenced testing. Practical test standards are available from several aviation publishers and are a good reference to use when preparing a student for the practical test. Although the instructor should always train the student to the very highest level possible, the evaluation of the student is only in relation to the standards listed in the PTS. The instructor, and the examiner, should also keep in mind that the standards are set at a level that is already very high. They are not minimum standards and they do not represent a floor of acceptability. In other words, the standards are the acceptable level that must be met and there are no requirements to exceed them.


Copyright 1999-2007 Dynamic Flight, Inc. All rights reserved.
Page Last Updated on: Nov-06-2017