0706-Assessment Methods for Integrated Undergraduate Medical Education

Paper presented at a Medical Education Workshop held at the Faculty of Medicine, University of Science and Technology, Sana'a, Yemen, 14-27 June 2007, by Professor Omar Hasan Kasule Sr. MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Professor of Epidemiology and Islamic Medicine at the Institute of Medicine, Universiti Brunei Darussalam. EM: omarkasule@yahoo.com WEB: http://omarkasule.tripod.com


This presentation describes undergraduate medical examinations and discusses the strengths and weaknesses of various examination methods. The methods covered are those that the author considers the best: multiple choice questions, short answer questions, the objective structured clinical examination, and the log book. Other methods have deliberately been left out. The presentation also covers writing and validating examination questions, appointment of examiners, examination regulations, statistical adjustment of scores, transparency of examination results, and student appeals.




Description of continuous assessment: Continuous assessment occurs throughout the semester. The results of the consecutive assessments are accumulated and are used, either singly or in combination with another method of assessment, to grade the student as performing satisfactorily or unsatisfactorily. The assessment can take the form of observing and recording student participation in and contribution to classroom activities. It may take the form of a short test or quiz at the end of a module of study. The test could also be given at set time intervals, for example weekly, monthly, or quarterly. Another form of assessment could be a project undertaken throughout the semester, whose various stages are assessed as they are completed.


Strengths and weaknesses of continuous assessment: Continuous assessment is the best form of assessment because it has several strengths. It keeps the student on his toes all the time instead of relaxing and then studying hard only at the end of the semester to prepare for an examination. It takes away the stress of examinations given in a short period at the end of the semester. Another advantage of continuous assessment is the immediate feedback that the teacher receives about the results of the assessment. The teacher can then take measures to address any deficiencies in student understanding of the material taught. The results of continuous assessment represent the work done in the whole semester because students are tested on each unit as it is taught. A major disadvantage of continuous assessment is that students do not have an opportunity to integrate knowledge acquired over the whole semester.



End of semester assessment is the traditional form of assessment. Students put in extra effort towards the end of the semester to prepare for examinations. It is the opposite of continuous assessment: its weaknesses are the strengths of continuous assessment, and its strengths are the weaknesses of continuous assessment. End of semester assessment has a large potential for biased assessment because it is selective: only a few items of what was covered in the semester are examined. A weak student who happens to be good only on the items that were asked could appear to perform well. A strong student who for some reason failed to prepare adequately for the items selected for the examination could be assessed as weak. Students can, using cumulative experience of previous examinations, spot or guess what is likely to be asked and prepare for that while ignoring the rest of the curriculum. On the other hand, end of semester assessment has less possibility of bias by the examiner than continuous assessment, especially where the continuous assessment is not in an objective form such as multiple choice questions.




Oral examinations can be very good for assessment but are not popular. The reason for this is suspicion of examiner bias. I however think that we can develop the oral examination to overcome these biases. Many examination questions can be written on index cards and a candidate is asked to pick a certain number of cards at random. He is then given some time to prepare the answer. The marking, carried out by two examiners, can be based on pre-determined model answers: when the candidate mentions the points expected he gets credit, and he loses credit for failure to mention expected points. The examiners have an opportunity to cross-examine the candidate if the response is vague and it is difficult to determine whether the candidate knows the facts or not. To have a permanent record of the examination in case a third examiner is needed, especially on appeal by a failing candidate, the examination can be recorded on video or audio.



Written examinations are the most popular form of assessment. They enjoy several advantages. The candidate has time to think about the questions. If the first question is not clear at the start, the candidate can leave it and try the next question while thinking about the first one. The candidate also has the opportunity to reread previous answers and may correct or improve them. The written examination provides a permanent record that can be reviewed later in case of disputes or for purposes of quality control. The disadvantage of a written examination is that it assesses the literacy of the student as well as knowledge of subject matter. Students who are knowledgeable may not be able to demonstrate their knowledge fully because of limitations in expressing themselves in writing.



Assessment of skills acquired is best done by asking the candidate to perform or demonstrate the skills taught. The likelihood of examiner bias is decreased by using a structured assessment: points of assessment are written in advance, and credit is given as the student performs according to expectation.




The multiple choice question (MCQ) method of assessment is, in my view, one of the best assessment methods if used correctly. Its major advantage is that it separates the ability to express oneself in written or spoken language from knowledge of facts. The facts are written out and the student need only separate the true from the untrue. MCQ examinations are easy to mark, with results available a few minutes after the examination using optical scanning technology. A disadvantage of the MCQ technique is that students can identify the right alternative by using logical exclusion and sometimes by 'gut decisions' when they actually do not have full knowledge or understanding.


MCQs are avoided by many examiners because it is difficult to write good MCQ items. Some avoid them because of lack of familiarity; others do not like them because they are used to more traditional forms of examination.



The problem-based question (PBQ) is the ideal in clinical or quasi-clinical medical examinations because it simulates the actual diagnostic and management processes that occur in the clinical situation. In this form of assessment a candidate is given background information and is required to identify the problem. The initial information is not adequate to define and describe the problem fully, and the candidate has to formulate various hypotheses. More information is then released progressively. With each new release of information the candidate is able to eliminate some hypotheses until he is left with the most likely one. The examination is not confined to formulating and eliminating hypotheses; questions on facts related to the problem under study may also be asked.



Short answer questions have largely replaced the traditional essay form of assessment. The candidate is asked several short questions, each requiring an answer of about a paragraph. The questions could be free-standing and unrelated to one another. A preferred approach is to give the candidate background information that provides a context, and then ask 5-10 questions all related to that one context.



The objective structured clinical examination (OSCE) has largely replaced the traditional clinical examination, in which a candidate was given a long or a short case to take a history, undertake a physical examination, reach a provisional diagnosis, and discuss the findings with the examiner. A major disadvantage of the traditional method was that candidates did not get comparable cases. Some would pass by doing well on relatively easy cases whereas others would fail because they were given more difficult cases. The examiners for different candidates were also not the same, raising issues of objectivity, comparability, and fairness. The OSCE approach seeks to overcome these disadvantages by presenting all candidates with exactly the same clinical problems and having each item examined by the same examiner.


An OSCE consists of 10-20 stations, each lasting 5-10 minutes. The purpose of the OSCE is to test clinical skills as well as communication skills. The candidate is given clear written instructions on what to do. The examiner observes the candidate with minimal interference. If there are any questions, they must be standardized for all candidates. Each station should cover as few skills as possible. A checklist of items to be marked is provided to the examiner. Any necessary equipment and supplies are made available. The patient may be real, a simulated patient, or a mannequin. Simulated patients are given clear written instructions and background information, and an attempt is made to make them as realistic as possible. Simulated patients have an advantage over real patients in that all candidates are presented with the same standard situation. Items are scored 2, 1, or 0.
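The checklist scoring described above (each item marked 2, 1, or 0) can be sketched as follows. This is a minimal illustration, not the paper's own procedure; the station names and item scores are invented.

```python
# Hypothetical sketch of OSCE checklist scoring: each checklist item is
# scored 2 (done well), 1 (done partially), or 0 (not done). Station
# scores are summed and converted to an overall percentage.

def score_station(checklist_scores):
    """Sum item scores for one station; each item must be 2, 1, or 0."""
    assert all(s in (0, 1, 2) for s in checklist_scores)
    return sum(checklist_scores)

def score_candidate(stations):
    """stations: dict mapping station name -> list of item scores."""
    totals = {name: score_station(items) for name, items in stations.items()}
    grand_total = sum(totals.values())
    max_possible = sum(2 * len(items) for items in stations.values())
    percent = 100.0 * grand_total / max_possible
    return totals, round(percent, 1)

# Illustrative candidate: two stations with 5 and 4 checklist items.
candidate = {
    "history taking": [2, 1, 2, 2, 0],
    "blood pressure measurement": [2, 2, 1, 2],
}
totals, percent = score_candidate(candidate)
print(totals, percent)  # per-station sums and overall percentage
```

Because every candidate is marked against the same checklist, two examiners scoring the same performance should produce very similar totals, which is the objectivity the OSCE aims for.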



Students engaged in clinical attachments are asked to maintain a record of all cases seen and what they did with them. The log book is then assessed by the examiners during and after the period of clinical attachment, and they can reach an opinion on whether the student obtained sufficient clinical experience. The log book has to be an authentic record. Some bad students, and these are usually very few, can cheat by recording cases they did not see or 'creating' details about cases seen that are not true. It is therefore necessary that log books be written up immediately and be available for inspection at random. If the log book is examined as soon as the record is made, the examiner has the option of going to the ward to verify that the facts written about a particular case are correct. In this way cheating can be discouraged. Another way of discovering cheating is to question the student about the case; inconsistencies can be discovered very easily by an experienced examiner.



Writing good examination questions is not easy. Lecturers need to attend regular workshops during the year to upgrade their skills in question item writing. These workshops should be practical and hands-on: participating lecturers should write question items that are critiqued by colleagues during the workshop. The workshop should be moderated by a person with experience in question item writing, but the level of expertise needed is not very high because the workshop is essentially about learning from one another. Besides critiquing the questions of fellow lecturers, the workshop can also critique examination questions of other universities and examination bodies that are readily available on the internet. This critique gives the workshop participants a benchmark against which to work.



There are many approaches to writing questions and we cannot prescribe any one approach. The author's preferred method is for the lecturer to write questions immediately after teaching a topic, while the material taught is still fresh in the mind. It is even preferable that he have the teaching material in front of him to make sure that all that is asked was actually taught. The questions should also mirror the way the material was presented. These questions are accumulated so that at the end of the semester the lecturer has a wide range of questions to choose from. This type of examination setting is thus very individualized and quite customized, and differs from that of public examinations (in schools or professional bodies) in which the writers of the questions are not the same as the teachers.



The best practice is to have a bank of examination questions from which questions for each examination are selected randomly. Building up a question bank takes time: about 5 years may pass before a faculty has a respectable bank. Questions for each examination are discussed thoroughly by members of the department. The discussion should include trying to answer them from the students' point of view to discover inconsistencies and vagueness. The questions should then be tested on students, either in continuous assessments or in end-of-semester examinations, and graded in terms of ease or difficulty according to student performance. Questions that are consistently answered wrongly by many students should be identified and the reasons for the failure investigated. The questions can then be changed or modified before being put in the bank. The question bank must be renewed continuously as the curriculum changes and as the material taught changes with the growth of scientific knowledge.
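Grading questions by ease and difficulty from student performance can be done with a simple facility index: the proportion of candidates who answered the item correctly. The sketch below is illustrative only; the cutoff values are assumptions, not figures from the paper.

```python
# Illustrative item analysis for a question bank: the facility index of an
# item is the fraction of candidates who answered it correctly. Items with
# a very low index are flagged for review before banking. The 0.8 and 0.3
# cutoffs are assumed for illustration.

def facility_index(responses):
    """responses: list of booleans, True if the candidate answered correctly."""
    return sum(responses) / len(responses)

def classify_item(responses, easy_cutoff=0.8, hard_cutoff=0.3):
    p = facility_index(responses)
    if p >= easy_cutoff:
        return "easy"
    if p <= hard_cutoff:
        return "hard (review before banking)"
    return "moderate"

# 10 candidates attempt one MCQ item; 9 answer correctly.
item_responses = [True] * 9 + [False]
print(classify_item(item_responses))  # -> easy
```

An item consistently classified as hard would prompt the department to ask whether the question is ambiguous, whether the material was taught, or whether the topic is genuinely difficult.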




Teachers at the university level enjoy academic freedom. This means that they have the right to teach what they want and in the way they want. They also have the right to examine or assess their students in the way they like. In practice there are policies and procedures for examinations set by each faculty. These, however, do not violate the principle of academic freedom because the teachers took part in formulating those policies and procedures.

Each examination should have internal examiners who are the lecturers who taught the subject, and it is they who must set the examination questions. The questions should be vetted by colleagues in the department. The process of vetting should essentially be feedback to the person who formulated the questions so that he can go back and improve them. The process of critiquing and rewriting questions can be repeated several times until the questions are perfected. At no stage in this process should the lecturer be marginalized. When the question is formulated in its final version, the lecturer responsible is asked to write a list of points that would be accepted as a correct response. The purpose of this model answer is to cross-check the suitability of the question: if the model answer is incongruent with the question, then we know that there is something wrong with the question. The lecturer who wrote the question is also the one to mark it, unless overwhelming numbers of students necessitate more than one lecturer marking the question. In any case all those who mark must be lecturers who teach that specific course to the candidates. It is a major mistake to take the model answer and give it to someone who does not teach the course to mark with. This could result in unfair assessment of the candidates.



As part of quality control and benchmarking, lecturers who teach the subject at other universities can be asked to be external examiners. The external examiner must be involved in the process of writing the questions: he must review them and give his input before they are finalized. He can then mark the scripts alongside the internal examiners. The marks awarded per item by the internal and external examiners are compared. Where a wide divergence is seen, the two should confer to establish the cause of the difference; through discussion they should be able to reach a compromise. The external examiner is also expected to submit a written evaluation of the examination process with recommendations for improvement.



Each faculty should have written examination policies and procedures. These could cover, inter alia, the following matters:

  • Appointment of examiners
  • Minimum attendance
  • Absence from examinations
  • Allocation of grades and passing mark
  • Supplementary examinations
  • Instructions to candidates
  • Breach of examination regulations
  • Release of results
  • Appeal process




Despite the best efforts to make examinations comparable in difficulty from year to year, the examination in one year may turn out to be more difficult than in other years. For comparability, adjustments may be made to the pass mark and to the actual scores. This has to be done to ensure justice and fairness for the students.



Setting the pass mark determines the proportions of false negative and false positive results, as shown in the table below (a false positive is an incompetent candidate who passes; a false negative is a competent candidate who fails):

                    Passes exam        Fails exam
  Competent         Clear pass         False negative
  Not competent     False positive     Clear fail


There are several ways of setting the pass mark. A fixed percentage could be set for passing, say 90%: candidates' marks are arranged in descending order and the bottom 10% fail irrespective of their scores. This method is used by certain professional examinations but is patently unfair.
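The norm-referenced rule just described (the bottom 10% fail regardless of score) can be sketched as follows; the marks are invented for illustration.

```python
# Sketch of norm-referenced failing: a fixed fraction of candidates at the
# bottom of the ranking fail irrespective of their actual marks. This
# illustrates why the method is unfair: the same mark can pass one year
# and fail the next, depending only on the cohort.

def norm_referenced_fail(marks, fail_fraction=0.10):
    """Return the marks that fall in the failing bottom fraction."""
    ranked = sorted(marks)                       # ascending order
    n_fail = int(round(len(ranked) * fail_fraction))
    return ranked[:n_fail]

marks = [45, 52, 58, 61, 64, 67, 70, 74, 81, 90]
print(norm_referenced_fail(marks))  # -> [45]: the lowest candidate fails
```

Note that in a strong cohort the failing candidate could hold a mark that would comfortably pass under a 50% criterion.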


A second method is to set a criterion for passing by fixing a mark above which a candidate passes and below which a candidate fails. The criterion of 50% has traditionally been used. In some cases faculties of medicine have set a higher criterion, such as 60%, to ensure higher standards. There is yet another system in which the pass mark remains 50% but any score below B (80-89%) or C (70-79%) requires that the examination be repeated.


Other methods of determining the pass mark are the Angoff method (the pass mark is the judged score of a borderline student) and the Ebel method (a pass mark for each question based on its level of difficulty). I do not recommend these because they are very laborious to implement, involve subjective judgment that could create unintended bias, and lack consistency from year to year.
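To make the Angoff procedure concrete, here is a hedged sketch: each judge estimates, for every item, the probability that a borderline candidate would answer it correctly; averaging across judges and summing over items gives the pass mark. The judges and figures below are invented for illustration.

```python
# Minimal sketch of an Angoff standard-setting calculation. Each inner list
# holds one judge's per-item probability estimates for a borderline
# candidate; the pass mark is the per-item mean summed over all items.

def angoff_pass_mark(judgements):
    """judgements: one list of per-item probabilities per judge."""
    n_judges = len(judgements)
    n_items = len(judgements[0])
    # mean of the judges' estimates for each item, then sum over items
    per_item = [sum(judge[i] for judge in judgements) / n_judges
                for i in range(n_items)]
    return sum(per_item)

# Two judges rate a 4-item paper worth 1 mark per item.
judge_a = [0.6, 0.5, 0.8, 0.4]
judge_b = [0.7, 0.5, 0.6, 0.5]
print(round(angoff_pass_mark([judge_a, judge_b]), 2))  # pass mark out of 4
```

The subjectivity the text objects to is visible in the inputs: the pass mark moves whenever the judges' probability estimates change, and a real examination would need many judges and items.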



The marks can also be statistically adjusted using a normal curve; this is the most objective and reliable method. Using the cohort mean and standard deviation, a standard score is computed for each candidate, and the marks can then be fitted onto a normal curve whose mean is either higher or lower than that of the raw examination marks.
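The adjustment described above amounts to a linear z-score transformation: each raw mark is standardized against the cohort mean and standard deviation, then mapped onto a target distribution. The target mean and standard deviation of 60 and 10 below are assumptions for illustration, not values from the paper.

```python
# Sketch of normal-curve (z-score) mark adjustment: standardize each raw
# mark against the cohort, then rescale onto an assumed target
# distribution (mean 60, SD 10). Useful when a paper turns out harder
# than in other years and raw marks run low.
import statistics

def adjust_marks(raw, target_mean=60.0, target_sd=10.0):
    mean = statistics.mean(raw)
    sd = statistics.pstdev(raw)          # population SD of the cohort
    return [round(target_mean + target_sd * (x - mean) / sd, 1) for x in raw]

raw = [35, 40, 45, 50, 55]               # a hard year: low raw marks
print(adjust_marks(raw))                 # rescaled around a mean of 60
```

The transformation preserves each candidate's rank and relative distance from the mean, so it adjusts for a hard paper without reordering the candidates.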



For quality control, and to assure fairness for all involved in the assessment process (lecturers and students), transparency must exist at all levels of the examination. The examination questions must be vetted by a committee as explained above. One of the issues checked is whether the questions reflect what was taught; in some cases the teaching material may be examined to ensure this. When the examination is marked, the results are presented to a departmental or faculty meeting at which the breakdown of marks for each candidate is available. All members of the meeting are free to raise questions and seek clarifications. The answer scripts should also be available in case someone wants to cross-check. After endorsement at the department or faculty level, the results are submitted to the university senate for endorsement. Only then can the results of the examination be released officially.



As a further measure of transparency, candidates should be given an opportunity to appeal against their examination scores. For this reason the answer scripts, and any recording of an oral examination, should be kept for a period of not less than 3 years to enable remarking. In case of an appeal, the lecturer who marked the answer script is first asked to recheck it, because he may find a mistake that can be corrected easily. If he cannot resolve the matter, another examiner or two who teach the subject are asked to recheck the script. Opening the door to student appeals can open a floodgate that cannot be controlled easily, because students tend to think that they performed better than they were awarded. One way of restricting the appeal process to genuine cases is to allow only those who failed to appeal. Another is to tell students that an appeal, if allowed, will nullify the first grade and that the paper will be remarked with the possibility of the mark being either raised or lowered. In some cases students may be asked to pay a small fee to submit an appeal.

© Professor Omar Hasan Kasule, Sr. June 2007