- Published 2010

Knowledge acquisition is when we ask experts questions and put the answers into the computer system. Since this is a very time-consuming task, it is desirable to minimize the effort of an expert. As a crude estimate for this effort, we can take the number of binary (yes-no) questions that we ask. The procedure that minimizes this number is binary search. This approach does not take into account that people often feel more comfortable answering "yes" than answering "no". So, to make our estimates more realistic, we take into consideration that a negative answer requires more effort. This paper describes a procedure that minimizes the effort of an expert. We also estimate the effort of this optimal search procedure.

Introduction to the Problem

Informal Introduction

Knowledge acquisition is when we ask experts questions and put the answers into the computer system. It is a very time-consuming and therefore expensive task. Thus, it is desirable to minimize the effort of an expert.

How do we estimate this effort? A reasonable way is to estimate the effort by the number of questions. Of course, we can always ask just one question, like "Please explain everything you know". A more reasonable idea is therefore to estimate the effort by the total number of binary questions, i.e., yes-no questions, for which there are exactly two answers: "yes" and "no".

We will consider only the case when we know all possible alternatives and want to know which one of them is correct. For example, suppose we know that we need to prescribe an analgesic to a patient, but we do not know which one to prescribe for this particular patient. If we have four alternatives, what is the right sequence of questions to ask in order to minimize the number of questions?

There exists a methodology, called binary search, that helps to choose the minimal number of questions: if initially we had $N$ mutually exclusive alternatives, then we ask the first yes-no question so that for half of these alternatives the answer is "yes" and for the other half the answer is "no".
This way, at each step we halve the set of alternatives, and in $\log_2 N$ questions we narrow it down to one. It has been proved that this sequence of questions uses the smallest possible number of questions. This is explained in practically any book on algorithms and complexity (see, e.g., [?]).

The main problem with this approach is that it does not take into consideration a well-known psychological fact: most people feel more comfortable answering "yes" than answering "no" (see, e.g., [?]). One of the reasons for this phenomenon is the following. The expert's time is valuable; because of this, the expert is usually asked to help only in the most complex situations. For example, a medical expert would normally be called when an unusual situation happens. In this case, the expert expects to find competent people who generally know the answers to typical questions in his area of expertise, but who are puzzled by this particular unusual problem. In such situations, an expert is usually informed about the previous decisions and the ideas for future decisions, and usually he approves most of these decisions. If it so happens that half of the previous decisions were wrong, it usually means that the previous decision makers were incompetent; in such situations, the expert feels that his valuable time was wasted, because the appropriate solution is not to call a highly skilled expert, but rather to replace the existing decision makers with more competent people.

Similarly, when a knowledge engineer who interviews the expert asks him questions for which most answers are "yes", this shows that the knowledge engineer already has some preliminary knowledge of the area and is appropriately asking these questions to improve this knowledge. If, on the other hand, the knowledge engineer starts asking random questions, this indicates that the engineer did not even bother to get some preliminary knowledge, and therefore the highly skilled expert is inappropriately used to answer questions some of which could be answered by simply consulting a textbook or a less skilled professional. The larger the number of negative answers, the more discomfort the expert will experience, and the larger the effort he will have to make to continue this interview.

In view of this phenomenon, instead of minimizing the total number of questions, it is more reasonable to minimize the effort of an expert and, in calculating this effort, to assign more weight to "no" answers than to "yes" ones. In this paper, we will formalize and solve this problem.

Comment. Our preliminary results first appeared in [?].

Towards Formalizing the Problem

In order to formalize the problem of selecting the best search procedure, we must first formalize the notion of a search procedure. Initially, we have some finite set of alternatives between which we will choose. We will denote this set by $S$ ($S$ for "set"). If this set contains more than one alternative, then we must ask an expert a question, and the question is supplied by the search procedure. The effect of this question is that the original set of alternatives is separated into two subsets, $S = A_1 \cup A_0$: the set of all alternatives for which the answer is "yes", which we will denote by $A_1$ (1 stands for "yes", just like in the majority of computers); and the set of all alternatives for which the answer is "no", which we will denote by $A_0$.

After asking this question, we thus know whether the initially unknown alternative $a \in S$ belongs to the set $A_1$ or to the set $A_0$. This is the only result of asking the question, so for our purposes it does not matter how exactly the question was formulated; what matters is how the answer to this question divides the set of possible alternatives, i.e., what the sets $A_1$ and $A_0$ are.

In principle, it is possible to ask a question in such an ambiguous way that for certain alternatives $a \in S$ both answers "yes" and "no" are possible, i.e., in mathematical terms, $A_1 \cap A_0 \neq \emptyset$. However, we are looking for the optimal (fastest) ways of eliciting knowledge. So if we ask instead a new question in which $A'_1 = A_1$ and $A'_0 = S \setminus A_1$, then, since $A'_0 \subset A_0$, this new question narrows down the alternatives even better. Therefore, since we are interested in finding the fastest elicitation method, it is sufficient to consider unambiguous questions, for which $A_1 \cap A_0 = \emptyset$.

It is also possible, in principle, to have trivial questions to which the answer is always "yes" or always "no", i.e., for which either $A_1 = S$ and $A_0 = \emptyset$, or $A_0 = S$ and $A_1 = \emptyset$. Such a trivial question does not add any information and can therefore be skipped. Therefore, since we are interested in the fastest knowledge elicitation, it makes sense to consider only pairs $\langle A_1, A_0 \rangle$ for which both sets $A_1$ and $A_0$ are non-empty.

If each of the sets $A_1$ and $A_0$ contains only one alternative, then there is no need to ask any further questions. If one or both of the resulting sets $A_1$ and $A_0$ contain more than one alternative, then we must continue asking questions. If the answer to the first question was "yes", i.e., if we are in the set $A_1$, then after the second question the set $A_1$ is divided into two subsets: the set of all alternatives that correspond to the answer "yes" to both questions, which we will denote by $A_{11}$; and the set of all alternatives for which the answers to the first two questions are, correspondingly, "yes" and "no", which we will denote by $A_{10}$.

In general, for every sequence $\alpha_k$ of $k$ 0s and 1s, $A_{\alpha_k}$ will denote the set of all alternatives that are still possible after we have received the answers $\alpha_k$ to the first $k$ questions. In particular, for the empty sequence $\alpha_0$ we have $A_{\alpha_0} = S$. Similarly to the above, we can argue that, since we select an optimal search procedure, it is sufficient to consider, for every $\alpha$, only subdivisions $A_\alpha = A_{\alpha 1} \cup A_{\alpha 0}$ for which $A_{\alpha 1} \cap A_{\alpha 0} = \emptyset$, $A_{\alpha 1} \neq \emptyset$, and $A_{\alpha 0} \neq \emptyset$.

For each search procedure $P$ and for every alternative $a \in S$, there exists a sequence $\alpha_k$ for which $A_{\alpha_k} = \{a\}$; we will denote this sequence by $\alpha(a, P)$. We will assign to each "no" answer a weight $W_0$ and to each "yes" answer a weight $W_1 < W_0$. Then, for each alternative $a$, we can compute the total effort by adding the weights corresponding to all the answers from the sequence $\alpha(a, P)$. For different alternatives, the effort may be different. As a numerical characteristic of the quality of a search procedure, we will take the worst-case effort, i.e., the largest of the efforts corresponding to different alternatives. Our goal is to find the search procedure for which this effort is the smallest possible.

Now we are ready for the formal definitions.
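The worst-case effort criterion described above can be illustrated numerically. The following is a minimal sketch, not the procedure derived in the paper: it assumes answer sequences are written as strings of "1" (yes) and "0" (no), and it finds the smallest worst-case effort by brute-force dynamic programming over the size of the "yes" subset, relying on the assumption that only the sizes of the subsets matter. The function names and the parameters `w_yes` and `w_no` (standing for the weights of "yes" and "no" answers) are hypothetical.

```python
def worst_case_effort(answer_sequences, w_yes, w_no):
    """Largest total effort over all alternatives.

    answer_sequences maps each alternative to its answer string,
    with "1" for a yes answer and "0" for a no answer (assumed encoding).
    """
    return max(
        sum(w_yes if answer == "1" else w_no for answer in sequence)
        for sequence in answer_sequences.values()
    )


def min_worst_case_effort(n, w_yes, w_no):
    """Smallest achievable worst-case effort for n >= 1 alternatives.

    Brute-force sketch: best[m] is the optimum for m alternatives.
    A question splits m alternatives into k "yes" and m - k "no"
    alternatives, both non-empty; the worst case is the larger branch.
    """
    best = [0] * (n + 1)  # best[1] = 0: a single alternative needs no question
    for m in range(2, n + 1):
        best[m] = min(
            max(w_yes + best[k], w_no + best[m - k])
            for k in range(1, m)
        )
    return best[n]


# Four alternatives identified by plain binary search.
binary_search = {"a": "11", "b": "10", "c": "01", "d": "00"}
print(worst_case_effort(binary_search, w_yes=1, w_no=2))
print(min_worst_case_effort(5, w_yes=1, w_no=2))
```

With equal weights, `min_worst_case_effort(4, 1, 1)` reduces to counting questions and agrees with the $\log_2 N$ bound of binary search. The quadratic loop is only meant for small $N$; it makes the optimization target concrete rather than offering an efficient algorithm.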

@inproceedings{Nguyen2010AsymmetricIM,
title={Asymmetric Information Measures: How to Extract Knowledge From an Expert},
author={Hung T. Nguyen and Elizabeth N Kamoro and Vladik Kreinovich},
year={2010}
}