9 분 소요

[toc]

Evaluation Techniques

  • Evaluation
    • Tests usability and functionality of system
    • Occurs in laboratory, field and/or in collaboration with users
    • Evaluates both design and implementation
    • Should be considered at all stages in the design life cycle


Goals of Evaluation

  • Assess extent of system functionality
  • Assess effect of interface on user
  • Identify specific problems


Evaluating Designs

  • Expert-based Evaluation
    • Cognitive Walkthrough
    • Heuristic Evaluation
    • Review-based Evaluation
  • Model-based Evaluation
  • User-based Evaluation


Cognitive Walkthrough

  • Proposed by Polson et al.
    • Evaluates design on how well it supports user in learning task
    • Usually performed by expert in cognitive psychology
    • Expert ‘walks though’ design to identify potential problems using psychological principles
      • 가이드 제공
      • 심리학적 원리를 사용하여 잠재적인 문제를 식별하기 위한 전문가 ‘자세한 설명’ 설계
    • Forms used to guide analysis


Cognitive Walkthrough (cont)

  • For each task walkthrough considers
    • What impact will interaction have on user?
    • What cognitive processes are required?
    • What learning problems may occur?
  • Analysis focuses on goals and knowledge: does the design lead the user to generate the correct goals?


Heuristic Evaluation

의식의 흐름보다 이런 건 꼭 하면 좋겠다.(좀 더 형식적인)

  • Proposed by Nielsen and Molich.
  • Usability criteria (heuristics) are identified
  • Design examined by experts to see if these are violated
  • Example heuristics
    • System behaviour is predictable (예측가능성)
    • System behaviour is consistent (일관성)
    • Feedback is provided (피드백 제공 여부)
  • Heuristic evaluation ‘debugs’ design.


Ten Usability Heuristics for UI Design

  1. Visibility of system status
    • 현재 내가 진행하는 프로세스가 보이는지
  2. Match between system and the real world
    • 유저 입장에서 현실세계와 시스템을 매칭할 수 있는지
  3. User control and freedom
  4. Consistency and standards
  5. Error prevention
  6. Recognition rather than recall
    • 메뉴를 보고 바로 무슨 행동을 하는 지 알수 있도록
  7. Flexibility and efficiency of use
  8. Aesthetic and minimalist design
  9. Help users recognize, diagnose, and recover from errors
  10. Help and documentation


Review-based evaluation

  • Experimental results and empirical evidence from the literature (e.g., from psychology, HCI, etc) can be used to support or refute parts of design.
  • It is expensive to repeat experiments continually and therefore a review of relevant literature can save resources (e.g., effort, time, finances, etc).
    • 실험을 지속적으로 반복하는 것은 비용이 많이 들기 때문에 관련 문헌을 검토하면 자원을 절약할 수 있다.
  • However, care should be taken to ensure results are transferable to the new design (e.g., note the design in consideration, the user audience, the assumptions made, etc).
    • 그러나 결과가 새로운 설계로 이전될 수 있도록 주의해야 한다.


Model-based evaluation

  • Cognitive models can be used to filter design options e.g. GOMS (Goals, Operators, Methods and Selection) model can be used to predict user performance with a user interface, keystroke-level model (KLM) can be used to predict performance for low-level tasks.
  • Dialog models (e.g. STN: state transition model) can be used to evaluate dialog problems in a user interface e.g. unreachable states, circular dialogs, etc.

image-20221010143954829


GOMS & KLM

  • KLM or KLM-GOMS
    • Belongs to the family of GOMS models
    • Find more efficient ways to complete a task by analyzing
    • the steps required in the process
  • The keystroke-level model consists of six operators
    • K: Press a Key or button time varies with user skill
    • P: Point with a mouse
    • H: Home to/from keyboard or other device
    • D(n,l): Draw n straight lines of total length l
    • M: Mentally prepare
    • R(t): Response by system


User-based Evaluation

  • User-based evaluation basically is evaluation through user participation, i.e. evaluation that involves the people for whom the system is intended; the users.
  • User-based evaluation techniques include: experimental methods, observational methods, query techniques (e.g., questionnaires and interviews), physiological monitoring methods (e.g., eye tracking, measuring skin conductance, measuring heart rate).
  • User-based methods can be conducted in the laboratory and/or in the field.
  • Two Cases
    • Laboratory Studies
    • Field Studies


Laboratory Studies

  • Advantages:
    • Specialist & equipment available
    • Uninterrupted environment
  • Disadvantages:
    • Lack of context
    • Difficult to observe several users cooperating
  • Appropriate
    • If system location is dangerous or impractical for constrained single user systems to allow controlled manipulation of use


Field Studies

  • Advantages:
    • Natural environment
    • Context retained (though observation may alter it)
    • Longitudinal(장기적인) studies possible
  • Disadvantages:
    • Distractions
    • Noise
  • Appropriate
    • Where context is crucial for longitudinal studies


Evaluating Implementations

  • Requires an artefact:
    • Simulation
    • Prototype
    • Full implementation


Experimental Evaluation

  • Controlled evaluation of specific aspects of interactive behaviour
  • Evaluator chooses hypothesis to be tested
  • A number of experimental conditions are considered which differ only in the value of some controlled variable.
  • Changes in behavioural measure are attributed to different conditions


Experimental Factors

  • Subjects(실험대상)
    • Who – representative, sufficient sample
  • Variables
    • Things to modify and measure
  • Hypothesis
    • What you’d like to show
  • Experimental design
    • How you are going to do it


Variables

  • Independent Variable (IV) - 입력
    • Characteristic changed to produce different conditions
      • 다른 조건을 생성하도록 변경된 특성
    • e.g. interface style, number of menu items
  • Dependent Variable (DV) - 출력
    • Characteristics measured in the experiment(실험에서 측정된 특성)
    • e.g. time taken, number of errors.


Hypothesis

  • Prediction of Outcome
    • Framed in terms of IV and DV
    • e.g. “error rate will increase as font size decreases”
  • Null hypothesis:
    • States no difference between conditions
    • Aim is to disprove this (이것을 반증하는 것이 목표)
    • e.g. null hyp. = “no change with font size”


Experimental Design

  • Within groups design
    • Each subject performs experiment under each condition.
    • Transfer of learning possible (학습 이전 가능)
    • Less costly and less likely to suffer from user variation.
  • Between groups design
    • Each subject performs under only one condition
    • No transfer of learning
    • More users required
    • Variation can bias results.


Analysis of Data

  • Before you start to do any statistics:
    • Look at data
    • Save original data
  • Choice of statistical technique depends on
    • Type of data
    • Information required
    • Type of data
      • Discrete - finite number of values
      • Continuous - any value


Analysis - Types of Test

  • Parametric
    • 가정이 좀 있는 경우
    • Assume normal distribution
    • Robust
    • Powerful
  • Non-parametric
    • Do not assume normal distribution
      • chi-square test
    • Less powerful
    • More reliable
  • Contingency(보정) table
    • Classify data by discrete attributes
    • Count number of data items in each group


Analysis of Data (cont.)

  • What information is required?
    • Is there a difference?
    • How big is the difference?
    • How accurate is the estimate?
  • Parametric and non-parametric tests mainly address first of these


Experimental Studies on Groups

  • More difficult than single-user experiments
  • Problems with:
    • Subject groups
    • Choice of task
    • Data gathering
    • Analysis


Subject Groups

  • Larger number of subjects
  • more expensive
  • Longer time to `settle down’ … even more variation!
  • Difficult to timetable
  • So … often only three or four groups


The Task

  • Must encourage cooperation(협력을 지향해야 함.)
  • Perhaps involve multiple channels
  • Options:
    • Creative task e.g. ‘write a short report on …’
    • Decision games e.g. desert survival task
    • Control task e.g. ARKola bottling plant


Data Gathering

  • Several video cameras + Direct logging of application
  • Problems:
    • Synchronisation
    • Sheer volume!(엄청난 양의 데이터)
  • One solution:
    • Record from each perspective(필요한 상황에 따라 하나씩 조정)


Analysis

  • N.B. Vast variation between groups(그룹 간 엄청난 변동)
    • e.g. democratic / dominandt’
  • Solutions:
    • Within groups experiments
    • Micro-analysis (e.g., gaps in speech)
    • Anecdotal and qualitative analysis(입증되지 않고 질적인 분석)
  • Look at interactions between group and media
  • Controlled experiments may ‘waste’ resources!
    • if the number of experimental groups is limited


Field Studies

  • Experiments dominated by group formation
  • Field studies more realistic:
    • Distributed cognition => work studied in context
    • Real action is situated action
    • Physical and social environment both crucial
  • Contrast:
    • Psychology – controlled experiment(통제 실험)
    • Sociology and anthropology – open study and rich data
      • 사회학과 인류학 – 개방형 연구와 풍부한 데이터


Observational Methods

  • Think Aloud
  • Cooperative evaluation
  • Protocol analysis
  • Automated analysis
  • Post-task walkthroughs


Think Aloud

  • 유저가 자기가 하는 생각을 계속 description
    • 사용자는 실험자가 준 과제를 수행하면서 드는 생각을 바로바로 말하고, 실험자는 그런 사용자를 관찰하는 실험중의 약속을 말합니다
  • User observed performing task(사용자가 작업 수행 중임을 확인함)
  • User asked to describe what he is doing and why, what he thinks is happening etc.
    • 사용자는 자신이 무엇을 하고 있는지, 왜 그런지, 무슨 일이 일어나고 있다고 생각하는지 등을 설명하도록 요청된다.
  • Advantages
    • Simplicity - requires little expertise (적은 전문성을 요구)
    • Can provide useful insight
    • Can show how system is actually used
  • Disadvantages
    • Subjective
    • Selective
    • Act of describing may alter task performance
      • 기술하는 행위는 작업 성과를 변화시킬 수 있다.


Cooperative Evaluation

  • Variation on think aloud
  • User collaborates in evaluation(사용자가 평가에 공동 작업)
  • Both user and evaluator can ask each other questions throughout
    • 사용자와 평가자 모두 내내 서로에게 질문을 할 수 있다.
  • Additional advantages
    • Less constrained and easier to use
    • User is encouraged to criticize(비판) system
    • Clarification possible(명확화 가능)


Protocol Analysis

  • Paper and pencil – cheap, limited to writing speed
  • Audio – good for think aloud, difficult to match with other protocols
  • Video – accurate and realistic, needs special equipment, obtrusive(눈에 띄는) sometimes
  • Computer logging – automatic and unobtrusive, large amounts of data difficult to analyze
  • User notebooks – coarse and subjective, useful insights, good for longitudinal studies
  • Mixed use in practice.
    • Audio/video transcription difficult and requires skill.
    • Some automatic support tools available


Automated Analysis – Video Annotation

  • Interface
    • How many objects should we annotate(주석을 달다) at once?
    • How do we visualize space and time?
    • How do we select key frames?
  • Crowdsourcing
    • How do we split up work?
    • How do we do quality control?
  • Interpolation / Tracking
    • How do we learn the appearance of an object?
    • How do we interpolate to minimize effort?


Post-task Walkthroughs

  • User reacts on action after the event
  • Useful to identify reasons for actions and alternatives considered
    • 검토된 조치 및 대안에 대한 이유를 식별하는 데 유용함
  • Necessary in cases where think aloud is not possible
    • think aloud가 불가능한 경우에 필요하다.
  • Transcript played back to participant for comment
    • 설명을 위해 참가자에게 대화 내용을 재생했습니다.
    • Immediately => fresh in mind
    • Delayed => evaluator has time to identify questions
  • Advantages
    • Analyst has time to focus on relevant incidents
    • Avoid excessive interruption of task
      • 과도한 작업 중단 방지
  • Disadvantages
    • Lack of freshness
    • May be post-hoc interpretation of events
      • 이벤트에 대한 사후 해석일 수 있습니다.
    • 이벤트 끝나고 나서 자기가 다시 평가하는 것에 묶임?(bias 되어 버림)


Query Techniques

  • Interviews
  • Questionnaires


Interviews

  • Analyst questions user on one-to-one basis usually based on prepared questions
  • Informal, subjective and relatively cheap
  • Advantages
    • Can be varied to suit context(상황에 맞게 변경 가능)
    • Issues can be explored more fully(문제를 보다 완벽하게 탐색할 수 있습니다.)
    • Can elicit user views and identify unanticipated problems
      • 사용자 뷰를 유도하고 예상치 못한 문제를 식별할 수 있습니다.
  • Disadvantages
    • Very subjective(매우 주관적)
    • Time consuming


Questionnaires

  • Set of fixed questions given to users
  • Advantages
    • Quick and reaches large user group
    • Can be analyzed more rigorously(엄격하게)
  • Disadvantages
    • Less flexible
    • Less probing


Questionnaires (cont.)

  • Need careful design
    • What information is required?
    • How are answers to be analyzed?
  • Styles of question
    • General
    • Open-ended
    • Scalar
    • Multi-choice
    • Ranked


Physiological Methods

  • Eye tracking
  • Physiological measurement


Eye Tracking

  • Head or desk mounted equipment tracks the position of the eye
  • Eye movement reflects the amount of cognitive processing a display requires
  • Measurements include
    • Fixations: eye maintains stable position. Number and duration indicate level of difficulty with display
    • Saccades: rapid eye movement from one point of interest to another
      • 청각 또는 시자극에 의해 무의식적으로 시 선을 옮기는 것
    • Scan paths: moving straight to a target with a short fixation at the target is optimal
      • 표적에 짧은 고정으로 표적으로 바로 이동하는 것이 최적입니다.


Physiological Measurements

  • Emotional response linked to physical changes
  • These may help determine a user’s reaction to an interface
  • Measurements include:
    • Heart activity, including blood pressure, volume and pulse.
    • Activity of sweat glands(땀샘의 활동): Galvanic Skin Response (GSR)
    • Electrical activity in muscle: electromyogram (EMG)
    • Electrical activity in brain: electroencephalogram (EEG)
  • Some difficulty in interpreting these physiological responses - more research needed


Choosing an Evaluation Method

  • When in process: design vs. implementation
  • Style of evaluation: laboratory vs. field
  • How objective: subjective vs. objective
  • Type of measures: qualitative vs. quantitative
  • Level of information: high level vs. low level
  • Level of interference: obtrusive vs. unobtrusive
  • Resources available: time, subjects, equipment, expertise

댓글남기기