Tutorial Outline

Introduction to Technology-Assisted Review (TAR)

  • What is a TAR task
  • Comparison to other IR tasks
  • Application areas
    • Law: litigation, antitrust, investigations
    • Systematic reviews in medicine
    • Content moderation
    • Data set annotation
    • Sunshine laws, declassification, and archival tasks
    • Patent search and other high recall review tasks
  • History

Dimensions of TAR Tasks

  • Volume and temporal characteristics of data
  • Time constraints
  • Reviewer characteristics
  • Cost structure and constraints
  • Nature of classification task (single, multiple, and cascaded classifications)

TAR Workflows

  • Importance of workflow design in human-in-the-loop (HITL) systems
  • One-phase vs. two-phase workflows
  • Quality vs. quantity of training
  • Pipeline workflows
  • Collection segmentation and multi-technique workflows
  • When to stop?

Technology: Basics

  • Review software and traditional review workflows
  • Duplicate detection, aggregation, and propagation
  • Search and querying
  • Unsupervised learning and visual analytics

Technology: Supervised Learning

  • Basics of text classification
  • Data modeling and task definition
  • Prioritization vs. classification
  • Reviewing and labeling in TAR workflows
  • Relevance feedback and other active learning approaches
  • Implications of transductive context
  • Classifier reuse and transfer learning
  • Research questions

Evaluation

  • Effectiveness measures
  • Sample-based estimation of effectiveness
  • Impact of category prevalence
  • Cost measures
  • Evaluating progress within a TAR project
  • Collection segmentation and evaluation
  • Choosing and tuning methods across multiple projects
  • Research questions

Stopping Rules

  • Stopping rules, cutoffs, and workflow design
  • Cost targets and effectiveness targets
  • The cost landscape
  • Distinctions among stopping rules
    • Interventional, standoff, and hybrid rules
    • Gold standard vs. self-evaluation rules
    • Certification vs. heuristic rules
  • Example stopping rules
  • Research questions

Societal Context

  • TAR and the ethical obligations of attorneys
  • Bias and ethics issues in TAR for monitoring and surveillance
  • Implications of TAR for evidence-based medicine
  • Controversies in automated content moderation
  • Research questions

Summary and Future

  • TAR research and industry practice
  • Challenges in access to data
  • The potential for interdisciplinary TAR research