[HCI] 8. Evaluation Techniques

Evaluation Techniques

Evaluation
- Tests the usability and functionality of the system
- Occurs in the laboratory, in the field and/or in collaboration with users
- Evaluates both design and implementation
- Should be considered at all stages in the design life cycle

Goals of Evaluation

Assess extent of system functionality
Assess effect of interface on user
Identify specific problems

Evaluating Designs

Expert-based Evaluation
- Cognitive Walkthrough
- Heuristic Evaluation
- Review-based Evaluation
Model-based Evaluation
User-based Evaluation

Cognitive Walkthrough

Proposed by Polson et al.
- Evaluates the design on how well it supports the user in learning the task
- Usually performed by an expert in cognitive psychology
- Expert ‘walks through’ the design to identify potential problems using psychological principles
- Forms are used to guide the analysis

Cognitive Walkthrough (cont)

For each task, the walkthrough considers:
- What impact will interaction have on user?
- What cognitive processes are required?
- What learning problems may occur?
Analysis focuses on goals and knowledge: does the design lead the user to generate the correct goals?

Heuristic Evaluation

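Rather than a free-flowing, stream-of-consciousness review, it checks the design against things that should always hold (a more formal approach).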

Proposed by Nielsen and Molich.
Usability criteria (heuristics) are identified
Design examined by experts to see if these are violated
Example heuristics
- System behaviour is predictable
- System behaviour is consistent
- Feedback is provided
Heuristic evaluation ‘debugs’ design.

Ten Usability Heuristics for UI Design

Visibility of system status
- Whether users can see the process they are currently carrying out
Match between system and the real world
- Whether, from the user's perspective, the system can be matched to the real world
User control and freedom
Consistency and standards
Error prevention
Recognition rather than recall
- So that users can tell straight away from a menu what action it performs
Flexibility and efficiency of use
Aesthetic and minimalist design
Help users recognize, diagnose, and recover from errors
Help and documentation
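Since heuristic evaluation ‘debugs’ a design against a fixed checklist, the findings from several experts can be pooled per heuristic. The sketch below is a hypothetical way to do that in Python: the `Finding` record, the 0-4 severity field and the sample findings are assumptions for illustration, not part of Nielsen and Molich's method as described in these notes.

```python
from collections import Counter
from dataclasses import dataclass

# The ten heuristics listed above, used as the evaluation checklist.
HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]


@dataclass(frozen=True)
class Finding:
    """One violation reported by one evaluator (hypothetical record)."""
    evaluator: str
    heuristic: str
    location: str  # where in the design the problem was seen
    severity: int  # assumed scale: 0 (not a problem) .. 4 (catastrophe)


def summarize(findings):
    """Count violations per heuristic and keep the worst severity reported."""
    counts = Counter()
    worst = {}
    for f in findings:
        if f.heuristic not in HEURISTICS:
            raise ValueError(f"unknown heuristic: {f.heuristic}")
        counts[f.heuristic] += 1
        worst[f.heuristic] = max(worst.get(f.heuristic, 0), f.severity)
    # Most frequently violated heuristics first.
    return [(h, n, worst[h]) for h, n in counts.most_common()]


findings = [
    Finding("A", "Visibility of system status", "upload page", 3),
    Finding("B", "Visibility of system status", "upload page", 2),
    Finding("B", "Consistency and standards", "settings menu", 1),
]
for heuristic, n, sev in summarize(findings):
    print(f"{heuristic}: {n} report(s), max severity {sev}")
```

Several independent evaluators are used because each one tends to find only part of the problems; pooling shows which heuristics are violated most often.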

Review-based evaluation

Experimental results and empirical evidence from the literature (e.g., from psychology, HCI, etc.) can be used to support or refute parts of a design.
It is expensive to repeat experiments continually, and therefore a review of relevant literature can save resources (e.g., effort, time, finances, etc.).
However, care should be taken to ensure results are transferable to the new design (e.g., note the design in consideration, the user audience, the assumptions made, etc.).

Model-based evaluation

Cognitive models can be used to filter design options. For example, the GOMS (Goals, Operators, Methods and Selection rules) model can be used to predict user performance with a user interface, and the keystroke-level model (KLM) can be used to predict performance for low-level tasks.
Dialog models (e.g., STN: state transition network) can be used to evaluate dialog problems in a user interface, e.g., unreachable states, circular dialogs, etc.
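The two STN problems named above can be checked mechanically by treating the dialog as a directed graph. In this minimal Python sketch, a state is unreachable if no path from the start state leads to it, and a circular dialog shows up as a set of "trapped" states from which the finish state can never be reached. The example dialog and the helper names (`reachable`, `check_dialog`) are hypothetical.

```python
from collections import deque


def reachable(stn, start):
    """All states reachable from `start` by following transitions (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in stn.get(state, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_dialog(stn, start, finish):
    """Return (unreachable states, trapped states) for a state transition network.

    stn maps each state to {user action: next state}.
    Unreachable: no path from `start` leads to the state.
    Trapped: no path from the state leads to `finish` (e.g. a circular dialog).
    """
    states = set(stn)
    for targets in stn.values():
        states.update(targets.values())
    unreachable = states - reachable(stn, start)
    trapped = {s for s in states if finish not in reachable(stn, s)}
    return unreachable, trapped


# Hypothetical dialog for a small editor.
stn = {
    "menu": {"open": "file_dialog", "quit": "exit"},
    "file_dialog": {"ok": "editor", "cancel": "menu"},
    "editor": {"help": "help_page"},
    "help_page": {"back": "editor"},  # editor <-> help loop never returns to menu
    "settings": {"back": "menu"},     # no transition leads here
}
unreachable, trapped = check_dialog(stn, start="menu", finish="exit")
print(sorted(unreachable))  # ['settings']
print(sorted(trapped))      # ['editor', 'help_page']
```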

GOMS & KLM

KLM or KLM-GOMS
- Belongs to the family of GOMS models
- Finds more efficient ways to complete a task by analyzing the steps required in the process
The keystroke-level model consists of six operators
- K: Press a key or button (time varies with user skill)
- P: Point with a mouse
- H: Home to/from keyboard or other device
- D(n,l): Draw n straight lines of total length l
- M: Mentally prepare
- R(t): Response by system
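A KLM prediction is simply the sum of the operator times along the task's operator sequence. The Python sketch below assumes the operator times commonly tabulated from Card, Moran and Newell (K = 0.2 s for an average skilled typist, P = 1.1 s, H = 0.4 s, M = 1.35 s, D(n, l) = 0.9n + 0.16l with l in cm, R(t) = t); these values and the example task are assumptions, and K in particular should be chosen to match the users' skill.

```python
# Commonly cited KLM operator times in seconds (Card, Moran & Newell).
# K depends on typing skill; 0.2 s is an assumed "average skilled typist" value.
OPERATOR_TIMES = {"K": 0.20, "P": 1.10, "H": 0.40, "M": 1.35}


def klm_time(sequence):
    """Predict execution time for a sequence of KLM operators.

    Items are either one-letter operator codes ("K", "P", "H", "M") or tuples:
      ("D", n, l) -> draw n straight lines of total length l (cm): 0.9n + 0.16l
      ("R", t)    -> system response time t seconds, taken as given
    """
    total = 0.0
    for op in sequence:
        if isinstance(op, tuple):
            code, *args = op
            if code == "D":
                n, length = args
                total += 0.9 * n + 0.16 * length
            elif code == "R":
                total += args[0]
            else:
                raise ValueError(f"unknown operator: {code}")
        else:
            total += OPERATOR_TIMES[op]
    return total


# Hypothetical task: move hand to mouse, think, point at a menu item, click,
# wait for the system, then return to the keyboard, think, and type "hi" + Enter.
task = ["H", "M", "P", "K", ("R", 0.5), "H", "M", "K", "K", "K"]
print(f"{klm_time(task):.2f} s")  # 5.90 s
```

Comparing the totals for two alternative operator sequences for the same task is how KLM points to the more efficient design.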

User-based Evaluation

User-based evaluation is evaluation through user participation, i.e., evaluation that involves the people for whom the system is intended: the users.
User-based evaluation techniques include: experimental methods, observational methods, query techniques (e.g., questionnaires and interviews), physiological monitoring methods (e.g., eye tracking, measuring skin conductance, measuring heart rate).
User-based methods can be conducted in the laboratory and/or in the field.
Two Cases
- Laboratory Studies
- Field Studies

Laboratory Studies

Advantages:
- Specialist & equipment available
- Uninterrupted environment
Disadvantages:
- Lack of context
- Difficult to observe several users cooperating
Appropriate:
- If the system location is dangerous or impractical
- For constrained single-user systems
- To allow controlled manipulation of use

Field Studies

Advantages:
- Natural environment
- Context retained (though observation may alter it)
- Longitudinal (long-term) studies possible
Disadvantages:
- Distractions
- Noise
Appropriate:
- Where context is crucial
- For longitudinal studies

Evaluating Implementations

Requires an artefact:
- Simulation
- Prototype
- Full implementation

Experimental Evaluation

Controlled evaluation of specific aspects of interactive behaviour
Evaluator chooses hypothesis to be tested
A number of experimental conditions are considered which differ only in the value of some controlled variable.
Changes in behavioural measure are attributed to different conditions

Experimental Factors

Subjects (experimental participants)
- Who – representative, sufficient sample
Variables
- Things to modify and measure
Hypothesis
- What you’d like to show
Experimental design
- How you are going to do it

Variables

Independent Variable (IV) - input
- Characteristic changed to produce different conditions
- e.g. interface style, number of menu items
Dependent Variable (DV) - output
- Characteristics measured in the experiment
- e.g. time taken, number of errors.

Hypothesis

Prediction of Outcome
- Framed in terms of IV and DV
- e.g. “error rate will increase as font size decreases”
Null hypothesis:
- States no difference between conditions
- Aim is to disprove this
- e.g. null hyp. = “no change with font size”

Experimental Design

Within groups design
- Each subject performs experiment under each condition.
- Transfer of learning possible
- Less costly and less likely to suffer from user variation.
Between groups design
- Each subject performs under only one condition
- No transfer of learning
- More users required
- Variation can bias results.
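A common way to limit the transfer-of-learning problem in a within-groups design (not stated in the notes above) is to counterbalance the order in which subjects meet the conditions. This minimal Python sketch builds a cyclic Latin square, so each condition appears in every position equally often, and assigns its rows to subjects in turn; the subject and condition names are hypothetical.

```python
def latin_square_orders(conditions):
    """Cyclic Latin square: row i is the condition list rotated by i.

    Each condition appears exactly once in every position across the rows,
    so order (and learning) effects are spread evenly over the conditions.
    """
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]


def assign_orders(subjects, conditions):
    """Give each subject a presentation order, cycling through the square's rows."""
    rows = latin_square_orders(conditions)
    return {s: rows[k % len(rows)] for k, s in enumerate(subjects)}


orders = assign_orders(["S1", "S2", "S3", "S4", "S5", "S6"],
                       ["menu A", "menu B", "menu C"])
for subject, order in orders.items():
    print(subject, order)
```

A cyclic square balances position but not which condition immediately precedes which (here menu A is always followed by menu B); a balanced Latin square is the usual refinement when carry-over between specific pairs matters. For a between-groups design, the analogous step is assigning each subject to a single condition at random.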

Analysis of Data

Before you start to do any statistics:
- Look at data
- Save original data
Choice of statistical technique depends on
- Type of data
- Information required
Type of data:
- Discrete - finite number of values
- Continuous - any value

Analysis - Types of Test

Parametric
- Make assumptions about the data
- Assume normal distribution
- Robust
- Powerful
Non-parametric
- Do not assume normal distribution
  - e.g. chi-square test
- Less powerful
- More reliable
Contingency table
- Classify data by discrete attributes
- Count number of data items in each group
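Once discrete data has been counted into a contingency table, a chi-square test asks whether the row and column attributes are associated. This minimal Python sketch computes the Pearson chi-square statistic using expected count = row total × column total / grand total; the counts are hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts.

    Expected count for each cell = row total * column total / grand total.
    Returns (statistic, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = interface A / B, columns = task completed / not completed
table = [[30, 10],
         [20, 20]]
stat, dof = chi_square(table)
print(f"chi2 = {stat:.3f}, dof = {dof}")  # chi2 = 5.333, dof = 1
```

The critical value for 1 degree of freedom at the 0.05 level is 3.841, so for these made-up counts the null hypothesis of no association would be rejected. `scipy.stats.chi2_contingency` also gives a p-value, but by default it applies Yates' continuity correction to 2×2 tables, so its statistic will differ from this uncorrected one.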

Analysis of Data (cont.)

What information is required?
- Is there a difference?
- How big is the difference?
- How accurate is the estimate?
Parametric and non-parametric tests mainly address the first of these
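The three questions above can each be given a number for a simple two-condition, between-groups comparison. This Python sketch (standard library only) uses Welch's t statistic, a parametric test that assumes normality, for "is there a difference?", the raw mean difference and Cohen's d for "how big?", and the standard error of the difference for "how accurate?". The choice of these particular measures and the task-time data are assumptions for illustration.

```python
import math
from statistics import mean, variance


def compare_groups(a, b):
    """Summarize a two-condition, between-groups comparison.

    - Is there a difference?      Welch's t statistic (parametric)
    - How big is the difference?  mean difference and Cohen's d (pooled SD)
    - How accurate is the estimate? standard error of the mean difference
    """
    n1, n2 = len(a), len(b)
    diff = mean(a) - mean(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se = math.sqrt(v1 / n1 + v2 / n2)
    t = diff / se
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return {"diff": diff, "t": t, "cohen_d": diff / pooled_sd, "se": se}


# Hypothetical task times (s) under two interface styles
old_ui = [10, 12, 14]
new_ui = [6, 8, 10]
print(compare_groups(old_ui, new_ui))
```

For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic; for a within-groups design the paired `scipy.stats.ttest_rel` is the analogue.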

Experimental Studies on Groups

More difficult than single-user experiments
Problems with:
- Subject groups
- Choice of task
- Data gathering
- Analysis

Subject Groups

Larger number of subjects => more expensive
Longer time to ‘settle down’ … even more variation!
Difficult to timetable
So … often only three or four groups

The Task

Must encourage cooperation
Perhaps involve multiple channels
Options:
- Creative task e.g. ‘write a short report on …’
- Decision games e.g. desert survival task
- Control task e.g. ARKola bottling plant

Data Gathering

Several video cameras + direct logging of the application
Problems:
- Synchronisation
- Sheer volume of data!
One solution:
- Record from each perspective (adjusting each recording to what the situation requires)
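The synchronisation problem amounts to putting events from every recording onto one shared timeline. This minimal Python sketch assumes each source's clock offset is already known (for example, measured from one event visible in every recording) and merges the already-sorted streams with `heapq.merge`; the source names, offsets and events are hypothetical.

```python
import heapq


def merge_streams(streams, offsets):
    """Merge several time-sorted event streams onto one common timeline.

    streams: {source: [(local_time_s, event), ...]}, each list sorted by time.
    offsets: {source: seconds to add to that source's clock to align it}
        (assumed known, e.g. measured from one event visible in every source).
    Returns a list of (aligned_time_s, source, event) in time order.
    """
    shifted = [
        [(t + offsets[source], source, event) for t, event in events]
        for source, events in streams.items()
    ]
    # heapq.merge interleaves already-sorted inputs; comparing only the time
    # avoids ever comparing the event payloads themselves.
    return list(heapq.merge(*shifted, key=lambda item: item[0]))


streams = {
    "camera_1": [(0.0, "user reaches for mouse"), (4.0, "group discussion starts")],
    "app_log": [(100.5, "menu opened"), (103.0, "file saved")],
}
offsets = {"camera_1": 0.0, "app_log": -100.0}  # app clock runs 100 s ahead
for t, source, event in merge_streams(streams, offsets):
    print(f"{t:6.1f}  {source:9}  {event}")
```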

Analysis

N.B. Vast variation between groups
- e.g. democratic / dominant
Solutions:
- Within groups experiments
- Micro-analysis (e.g., gaps in speech)
- Anecdotal and qualitative analysis
Look at interactions between group and media
Controlled experiments may ‘waste’ resources!
- if the number of experimental groups is limited

Field Studies

Experiments dominated by group formation
Field studies more realistic:
- Distributed cognition => work studied in context
- Real action is situated action
- Physical and social environment both crucial
Contrast:
- Psychology – controlled experiment
- Sociology and anthropology – open study and rich data

Observational Methods

Think Aloud
Cooperative evaluation
Protocol analysis
Automated analysis
Post-task walkthroughs

Think Aloud

The user continuously describes what they are thinking
- An agreed experimental procedure: the user says their thoughts aloud as they arise while performing a task set by the experimenter, and the experimenter observes the user
User observed performing task
User asked to describe what they are doing and why, what they think is happening, etc.
Advantages
- Simplicity - requires little expertise
- Can provide useful insight
- Can show how system is actually used
Disadvantages
- Subjective
- Selective
- Act of describing may alter task performance

Cooperative Evaluation

Variation on think aloud
User collaborates in evaluation
Both user and evaluator can ask each other questions throughout
Additional advantages
- Less constrained and easier to use
- User is encouraged to criticize system
- Clarification possible

Protocol Analysis

Paper and pencil – cheap, limited to writing speed
Audio – good for think aloud, difficult to match with other protocols
Video – accurate and realistic, needs special equipment, sometimes obtrusive
Computer logging – automatic and unobtrusive, large amounts of data difficult to analyze
User notebooks – coarse and subjective, useful insights, good for longitudinal studies
Mixed use in practice.
- Audio/video transcription difficult and requires skill.
- Some automatic support tools available

Automated Analysis – Video Annotation

Interface
- How many objects should we annotate at once?
- How do we visualize space and time?
- How do we select key frames?
Crowdsourcing
- How do we split up work?
- How do we do quality control?
Interpolation / Tracking
- How do we learn the appearance of an object?
- How do we interpolate to minimize effort?
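One simple baseline for the interpolation question is to have a person annotate only key frames and fill in the frames between them by linear interpolation of the bounding box; tracking-based methods can do better when the object's motion is not linear. The Python sketch below assumes boxes are stored as `(x, y, width, height)` and uses hypothetical frame numbers and coordinates.

```python
def interpolate_boxes(key_a, key_b):
    """Linearly interpolate bounding boxes between two annotated key frames.

    key_a, key_b: (frame_number, (x, y, width, height)) with key_a before key_b.
    Returns {frame: box} for every frame strictly between the two key frames.
    """
    (fa, box_a), (fb, box_b) = key_a, key_b
    if fb <= fa:
        raise ValueError("key frames must be in increasing order")
    boxes = {}
    for f in range(fa + 1, fb):
        alpha = (f - fa) / (fb - fa)  # 0 at key_a, 1 at key_b
        boxes[f] = tuple(a + alpha * (b - a) for a, b in zip(box_a, box_b))
    return boxes


# A worker annotates frames 0 and 4 by hand; frames 1-3 are filled in.
filled = interpolate_boxes((0, (10, 20, 50, 50)), (4, (50, 20, 50, 70)))
print(filled[2])  # (30.0, 20.0, 50.0, 60.0)
```

The effort saved grows with the gap between key frames, at the cost of drift whenever the real motion between them is not a straight line.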

Post-task Walkthroughs

User reflects on actions after the event
Useful to identify reasons for actions and alternatives considered
Necessary in cases where think aloud is not possible
Transcript played back to participant for comment
- Immediately => fresh in mind
- Delayed => evaluator has time to identify questions
Advantages
- Analyst has time to focus on relevant incidents
- Avoid excessive interruption of task
Disadvantages
- Lack of freshness
- May be post-hoc interpretation of events
  - Re-evaluating the events afterwards, the participant may be biased by their own later interpretation

Query Techniques

Interviews
Questionnaires

Interviews

Analyst questions the user on a one-to-one basis, usually based on prepared questions
Informal, subjective and relatively cheap
Advantages
- Can be varied to suit context
- Issues can be explored more fully
- Can elicit user views and identify unanticipated problems
Disadvantages
- Very subjective
- Time consuming

Questionnaires

Set of fixed questions given to users
Advantages
- Quick and reaches large user group
- Can be analyzed more rigorously
Disadvantages
- Less flexible
- Less probing

Questionnaires (cont.)

Need careful design
- What information is required?
- How are answers to be analyzed?
Styles of question
- General
- Open-ended
- Scalar
- Multi-choice
- Ranked
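Scalar questions are what make questionnaires easy to analyze rigorously. As one example (not mentioned in these notes), the widely used System Usability Scale (SUS) has ten 1-5 scalar items that alternate between positive and negative wording; this Python sketch applies its standard scoring, giving a 0-100 score. The sample answers are hypothetical.

```python
def sus_score(responses):
    """Score one System Usability Scale (SUS) questionnaire.

    responses: ten answers on a 1-5 scale, in questionnaire order.
    Odd-numbered items are positively worded: contribution = answer - 1.
    Even-numbered items are negatively worded: contribution = 5 - answer.
    The summed contributions (0-40) are multiplied by 2.5 to give 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten answers between 1 and 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        total += (answer - 1) if item % 2 == 1 else (5 - answer)
    return total * 2.5


print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```

Alternating the wording is a questionnaire-design choice that discourages respondents from ticking the same column throughout; the scoring has to undo it, which is why "how are answers to be analyzed?" must be settled before the questionnaire is issued.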

Physiological Methods

Eye tracking
Physiological measurement

Eye Tracking

Head or desk mounted equipment tracks the position of the eye
Eye movement reflects the amount of cognitive processing a display requires
Measurements include
- Fixations: eye maintains stable position. Number and duration indicate level of difficulty with display
- Saccades: rapid eye movement from one point of interest to another
  - e.g. gaze shifting, often involuntarily, in response to an auditory or visual stimulus
- Scan paths: moving straight to a target with a short fixation at the target is optimal
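Fixations and saccades have to be separated from raw gaze samples before they can be counted or timed. One standard approach is velocity-threshold identification (I-VT): slow point-to-point movement is treated as fixation, fast movement as saccade, and each run of slow samples is collapsed into one fixation at its centroid. In this minimal Python sketch the threshold, the `min_samples` filter and the pixel-based sample data are assumptions; real systems usually express velocity in degrees of visual angle per second.

```python
import math


def _fixation_from_run(run, min_samples):
    """Collapse a run of slow samples into one fixation, or none if it is too short."""
    if len(run) < min_samples:
        return []
    xs = [p[1] for p in run]
    ys = [p[2] for p in run]
    return [(run[0][0], run[-1][0], sum(xs) / len(xs), sum(ys) / len(ys))]


def ivt_fixations(samples, velocity_threshold, min_samples=2):
    """Velocity-threshold identification (I-VT) of fixations in gaze data.

    samples: (time_s, x, y) gaze points sorted by time.
    velocity_threshold: point-to-point speed (position units per second)
        above which a sample is treated as part of a saccade.
    min_samples: shortest run of slow samples reported as a fixation.
    Returns fixations as (start_time, end_time, centroid_x, centroid_y).
    """
    # Label each sample by the speed of the movement arriving at it;
    # the first sample has no predecessor and is treated as slow.
    slow = [True] if samples else []
    for prev, cur in zip(samples, samples[1:]):
        speed = math.dist(prev[1:], cur[1:]) / (cur[0] - prev[0])
        slow.append(speed <= velocity_threshold)

    fixations, run = [], []
    for point, is_slow in zip(samples, slow):
        if is_slow:
            run.append(point)  # still fixating: extend the current run
        else:
            fixations += _fixation_from_run(run, min_samples)  # saccade ends it
            run = []
    fixations += _fixation_from_run(run, min_samples)
    return fixations


# Hypothetical 100 Hz gaze samples in screen pixels.
gaze = [(0.00, 100, 100), (0.01, 101, 100), (0.02, 100, 101),  # fixation 1
        (0.03, 300, 200),                                      # saccade
        (0.04, 301, 200), (0.05, 300, 201), (0.06, 301, 201)]  # fixation 2
for fixation in ivt_fixations(gaze, velocity_threshold=1000):
    print(fixation)
```

The number and duration of the resulting fixations are the measures the notes link to display difficulty, and the ordered fixation centroids form the scan path. Because samples are labelled by their incoming speed, the landing sample of each saccade is left out of the following fixation, a simplification of this sketch.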

Physiological Measurements

Emotional response linked to physical changes
These may help determine a user’s reaction to an interface
Measurements include:
- Heart activity, including blood pressure, volume and pulse.
- Activity of sweat glands: Galvanic Skin Response (GSR)
- Electrical activity in muscle: electromyogram (EMG)
- Electrical activity in brain: electroencephalogram (EEG)
Some difficulty in interpreting these physiological responses - more research needed

Choosing an Evaluation Method

When in process: design vs. implementation
Style of evaluation: laboratory vs. field
How objective: subjective vs. objective
Type of measures: qualitative vs. quantitative
Level of information: high level vs. low level
Level of interference: obtrusive vs. unobtrusive
Resources available: time, subjects, equipment, expertise

Kang Chang Ryong