[HCI] 8. Evaluation Techniques
[toc]
Evaluation Techniques
- Evaluation
- Tests usability and functionality of system
- Occurs in laboratory, field and/or in collaboration with users
- Evaluates both design and implementation
- Should be considered at all stages in the design life cycle
Goals of Evaluation
- Assess extent of system functionality
- Assess effect of interface on user
- Identify specific problems
Evaluating Designs
- Expert-based Evaluation
- Cognitive Walkthrough
- Heuristic Evaluation
- Review-based Evaluation
- Model-based Evaluation
- User-based Evaluation
Cognitive Walkthrough
- Proposed by Polson et al.
- Evaluates design on how well it supports user in learning task
- Usually performed by expert in cognitive psychology
- Expert ‘walks though’ design to identify potential problems using psychological principles
- 가이드 제공
- 심리학적 원리를 사용하여 잠재적인 문제를 식별하기 위한 전문가 ‘자세한 설명’ 설계
- Forms used to guide analysis
Cognitive Walkthrough (cont)
- For each task walkthrough considers
- What impact will interaction have on user?
- What cognitive processes are required?
- What learning problems may occur?
- Analysis focuses on goals and knowledge: does the design lead the user to generate the correct goals?
Heuristic Evaluation
의식의 흐름보다 이런 건 꼭 하면 좋겠다.(좀 더 형식적인)
- Proposed by Nielsen and Molich.
- Usability criteria (heuristics) are identified
- Design examined by experts to see if these are violated
- Example heuristics
- System behaviour is predictable (예측가능성)
- System behaviour is consistent (일관성)
- Feedback is provided (피드백 제공 여부)
- Heuristic evaluation ‘debugs’ design.
Ten Usability Heuristics for UI Design
- Visibility of system status
- 현재 내가 진행하는 프로세스가 보이는지
- Match between system and the real world
- 유저 입장에서 현실세계와 시스템을 매칭할 수 있는지
- User control and freedom
- Consistency and standards
- Error prevention
- Recognition rather than recall
- 메뉴를 보고 바로 무슨 행동을 하는 지 알수 있도록
- Flexibility and efficiency of use
- Aesthetic and minimalist design
- Help users recognize, diagnose, and recover from errors
- Help and documentation
Review-based evaluation
- Experimental results and empirical evidence from the literature (e.g., from psychology, HCI, etc) can be used to support or refute parts of design.
- It is expensive to repeat experiments continually and therefore a review of relevant literature can save resources (e.g., effort, time, finances, etc).
- 실험을 지속적으로 반복하는 것은 비용이 많이 들기 때문에 관련 문헌을 검토하면 자원을 절약할 수 있다.
- However, care should be taken to ensure results are transferable to the new design (e.g., note the design in consideration, the user audience, the assumptions made, etc).
- 그러나 결과가 새로운 설계로 이전될 수 있도록 주의해야 한다.
Model-based evaluation
- Cognitive models can be used to filter design options e.g. GOMS (Goals, Operators, Methods and Selection) model can be used to predict user performance with a user interface, keystroke-level model (KLM) can be used to predict performance for low-level tasks.
- Dialog models (e.g. STN: state transition model) can be used to evaluate dialog problems in a user interface e.g. unreachable states, circular dialogs, etc.
GOMS & KLM
- KLM or KLM-GOMS
- Belongs to the family of GOMS models
- Find more efficient ways to complete a task by analyzing
- the steps required in the process
- The keystroke-level model consists of six operators
- K: Press a Key or button time varies with user skill
- P: Point with a mouse
- H: Home to/from keyboard or other device
- D(n,l): Draw n straight lines of total length l
- M: Mentally prepare
- R(t): Response by system
User-based Evaluation
- User-based evaluation basically is evaluation through user participation, i.e. evaluation that involves the people for whom the system is intended; the users.
- User-based evaluation techniques include: experimental methods, observational methods, query techniques (e.g., questionnaires and interviews), physiological monitoring methods (e.g., eye tracking, measuring skin conductance, measuring heart rate).
- User-based methods can be conducted in the laboratory and/or in the field.
- Two Cases
- Laboratory Studies
- Field Studies
Laboratory Studies
- Advantages:
- Specialist & equipment available
- Uninterrupted environment
- Disadvantages:
- Lack of context
- Difficult to observe several users cooperating
- Appropriate
- If system location is dangerous or impractical for constrained single user systems to allow controlled manipulation of use
Field Studies
- Advantages:
- Natural environment
- Context retained (though observation may alter it)
- Longitudinal(장기적인) studies possible
- Disadvantages:
- Distractions
- Noise
- Appropriate
- Where context is crucial for longitudinal studies
Evaluating Implementations
- Requires an artefact:
- Simulation
- Prototype
- Full implementation
Experimental Evaluation
- Controlled evaluation of specific aspects of interactive behaviour
- Evaluator chooses hypothesis to be tested
- A number of experimental conditions are considered which differ only in the value of some controlled variable.
- Changes in behavioural measure are attributed to different conditions
Experimental Factors
- Subjects(실험대상)
- Who – representative, sufficient sample
- Variables
- Things to modify and measure
- Hypothesis
- What you’d like to show
- Experimental design
- How you are going to do it
Variables
- Independent Variable (IV) - 입력
- Characteristic changed to produce different conditions
- 다른 조건을 생성하도록 변경된 특성
- e.g. interface style, number of menu items
- Characteristic changed to produce different conditions
- Dependent Variable (DV) - 출력
- Characteristics measured in the experiment(실험에서 측정된 특성)
- e.g. time taken, number of errors.
Hypothesis
- Prediction of Outcome
- Framed in terms of IV and DV
- e.g. “error rate will increase as font size decreases”
- Null hypothesis:
- States no difference between conditions
- Aim is to disprove this (이것을 반증하는 것이 목표)
- e.g. null hyp. = “no change with font size”
Experimental Design
- Within groups design
- Each subject performs experiment under each condition.
- Transfer of learning possible (학습 이전 가능)
- Less costly and less likely to suffer from user variation.
- Between groups design
- Each subject performs under only one condition
- No transfer of learning
- More users required
- Variation can bias results.
Analysis of Data
- Before you start to do any statistics:
- Look at data
- Save original data
- Choice of statistical technique depends on
- Type of data
- Information required
- Type of data
- Discrete - finite number of values
- Continuous - any value
Analysis - Types of Test
- Parametric
- 가정이 좀 있는 경우
- Assume normal distribution
- Robust
- Powerful
- Non-parametric
- Do not assume normal distribution
- chi-square test
- Less powerful
- More reliable
- Do not assume normal distribution
- Contingency(보정) table
- Classify data by discrete attributes
- Count number of data items in each group
Analysis of Data (cont.)
- What information is required?
- Is there a difference?
- How big is the difference?
- How accurate is the estimate?
- Parametric and non-parametric tests mainly address first of these
Experimental Studies on Groups
- More difficult than single-user experiments
- Problems with:
- Subject groups
- Choice of task
- Data gathering
- Analysis
Subject Groups
- Larger number of subjects
- more expensive
- Longer time to `settle down’ … even more variation!
- Difficult to timetable
- So … often only three or four groups
The Task
- Must encourage cooperation(협력을 지향해야 함.)
- Perhaps involve multiple channels
- Options:
- Creative task e.g. ‘write a short report on …’
- Decision games e.g. desert survival task
- Control task e.g. ARKola bottling plant
Data Gathering
- Several video cameras + Direct logging of application
- Problems:
- Synchronisation
- Sheer volume!(엄청난 양의 데이터)
- One solution:
- Record from each perspective(필요한 상황에 따라 하나씩 조정)
Analysis
- N.B. Vast variation between groups(그룹 간 엄청난 변동)
- e.g. democratic / dominandt’
- Solutions:
- Within groups experiments
- Micro-analysis (e.g., gaps in speech)
- Anecdotal and qualitative analysis(입증되지 않고 질적인 분석)
- Look at interactions between group and media
- Controlled experiments may ‘waste’ resources!
- if the number of experimental groups is limited
Field Studies
- Experiments dominated by group formation
- Field studies more realistic:
- Distributed cognition => work studied in context
- Real action is situated action
- Physical and social environment both crucial
- Contrast:
- Psychology – controlled experiment(통제 실험)
- Sociology and anthropology – open study and rich data
- 사회학과 인류학 – 개방형 연구와 풍부한 데이터
Observational Methods
- Think Aloud
- Cooperative evaluation
- Protocol analysis
- Automated analysis
- Post-task walkthroughs
Think Aloud
- 유저가 자기가 하는 생각을 계속 description
- 사용자는 실험자가 준 과제를 수행하면서 드는 생각을 바로바로 말하고, 실험자는 그런 사용자를 관찰하는 실험중의 약속을 말합니다
- User observed performing task(사용자가 작업 수행 중임을 확인함)
- User asked to describe what he is doing and why, what he thinks is happening etc.
- 사용자는 자신이 무엇을 하고 있는지, 왜 그런지, 무슨 일이 일어나고 있다고 생각하는지 등을 설명하도록 요청된다.
- Advantages
- Simplicity - requires little expertise (적은 전문성을 요구)
- Can provide useful insight
- Can show how system is actually used
- Disadvantages
- Subjective
- Selective
- Act of describing may alter task performance
- 기술하는 행위는 작업 성과를 변화시킬 수 있다.
Cooperative Evaluation
- Variation on think aloud
- User collaborates in evaluation(사용자가 평가에 공동 작업)
- Both user and evaluator can ask each other questions throughout
- 사용자와 평가자 모두 내내 서로에게 질문을 할 수 있다.
- Additional advantages
- Less constrained and easier to use
- User is encouraged to criticize(비판) system
- Clarification possible(명확화 가능)
Protocol Analysis
- Paper and pencil – cheap, limited to writing speed
- Audio – good for think aloud, difficult to match with other protocols
- Video – accurate and realistic, needs special equipment, obtrusive(눈에 띄는) sometimes
- Computer logging – automatic and unobtrusive, large amounts of data difficult to analyze
- User notebooks – coarse and subjective, useful insights, good for longitudinal studies
- Mixed use in practice.
- Audio/video transcription difficult and requires skill.
- Some automatic support tools available
Automated Analysis – Video Annotation
- Interface
- How many objects should we annotate(주석을 달다) at once?
- How do we visualize space and time?
- How do we select key frames?
- Crowdsourcing
- How do we split up work?
- How do we do quality control?
- Interpolation / Tracking
- How do we learn the appearance of an object?
- How do we interpolate to minimize effort?
Post-task Walkthroughs
- User reacts on action after the event
- Useful to identify reasons for actions and alternatives considered
- 검토된 조치 및 대안에 대한 이유를 식별하는 데 유용함
- Necessary in cases where think aloud is not possible
- think aloud가 불가능한 경우에 필요하다.
- Transcript played back to participant for comment
- 설명을 위해 참가자에게 대화 내용을 재생했습니다.
- Immediately => fresh in mind
- Delayed => evaluator has time to identify questions
- Advantages
- Analyst has time to focus on relevant incidents
- Avoid excessive interruption of task
- 과도한 작업 중단 방지
- Disadvantages
- Lack of freshness
- May be post-hoc interpretation of events
- 이벤트에 대한 사후 해석일 수 있습니다.
- 이벤트 끝나고 나서 자기가 다시 평가하는 것에 묶임?(bias 되어 버림)
Query Techniques
- Interviews
- Questionnaires
Interviews
- Analyst questions user on one-to-one basis usually based on prepared questions
- Informal, subjective and relatively cheap
- Advantages
- Can be varied to suit context(상황에 맞게 변경 가능)
- Issues can be explored more fully(문제를 보다 완벽하게 탐색할 수 있습니다.)
- Can elicit user views and identify unanticipated problems
- 사용자 뷰를 유도하고 예상치 못한 문제를 식별할 수 있습니다.
- Disadvantages
- Very subjective(매우 주관적)
- Time consuming
Questionnaires
- Set of fixed questions given to users
- Advantages
- Quick and reaches large user group
- Can be analyzed more rigorously(엄격하게)
- Disadvantages
- Less flexible
- Less probing
Questionnaires (cont.)
- Need careful design
- What information is required?
- How are answers to be analyzed?
- Styles of question
- General
- Open-ended
- Scalar
- Multi-choice
- Ranked
Physiological Methods
- Eye tracking
- Physiological measurement
Eye Tracking
- Head or desk mounted equipment tracks the position of the eye
- Eye movement reflects the amount of cognitive processing a display requires
- Measurements include
- Fixations: eye maintains stable position. Number and duration indicate level of difficulty with display
- Saccades: rapid eye movement from one point of interest to another
- 청각 또는 시자극에 의해 무의식적으로 시 선을 옮기는 것
- Scan paths: moving straight to a target with a short fixation at the target is optimal
- 표적에 짧은 고정으로 표적으로 바로 이동하는 것이 최적입니다.
Physiological Measurements
- Emotional response linked to physical changes
- These may help determine a user’s reaction to an interface
- Measurements include:
- Heart activity, including blood pressure, volume and pulse.
- Activity of sweat glands(땀샘의 활동): Galvanic Skin Response (GSR)
- Electrical activity in muscle: electromyogram (EMG)
- Electrical activity in brain: electroencephalogram (EEG)
- Some difficulty in interpreting these physiological responses - more research needed
Choosing an Evaluation Method
- When in process: design vs. implementation
- Style of evaluation: laboratory vs. field
- How objective: subjective vs. objective
- Type of measures: qualitative vs. quantitative
- Level of information: high level vs. low level
- Level of interference: obtrusive vs. unobtrusive
- Resources available: time, subjects, equipment, expertise
댓글남기기