CS561 Topics in Data Privacy

Fall 2024


Course Description

This course explores both threats to privacy and solutions to the data privacy problem. In the first half of the course, we will demonstrate that traditional approaches to protecting privacy, such as anonymization, are subject to powerful attacks that reveal individuals’ sensitive data. We will cover more recent approaches for protecting privacy, including k-anonymity and l-diversity. We will also discuss a variety of privacy-enhancing technologies, such as secure multi-party computation, zero-knowledge proofs, homomorphic encryption, and Tor.

The second half of the course will focus on the current de facto standard for data privacy: differential privacy (DP). We will cover the fundamentals of DP, including its formal definitions, composition theorems, and basic algorithms that satisfy DP. We will also cover the local model of DP, which is used in industry to collect sensitive data from users. Time permitting, we will cover more advanced applications of DP, including synthetic data generation and machine learning on private data. Coursework will include implementing privacy algorithms and running simulations in popular data science languages such as Python and Julia. No prior background in data privacy is assumed.

  • Class Time: T&R 2:50 – 4:15 PM
  • Location: Library North 1120
  • Instructor: Zeyu Ding
    • Email: dding1@binghamton.edu
    • Office: EB N34
    • Office hours: Wed 1:00 – 3:00 PM or by appointment
  • TA: TBA
    • Email: TBA

Learning Objectives

By the end of this course, you will be able to:

  • Identify and demonstrate risks to privacy in data science settings
  • Define and apply formal notions of privacy, including k-anonymity and differential privacy
  • Match the appropriate differential privacy technology to a given application
  • Safely implement privacy solutions, and experimentally validate the performance and utility of algorithms
  • Understand differential privacy at a level sufficient to engage in research about best practices in implementation, apply the material in practice, and/or connect it to other areas
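
To give a sense of what "implement and experimentally validate" can look like, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is an illustration only and is not taken from the course materials: the function names `laplace_noise` and `private_count`, the synthetic age data, and the chosen epsilon values are all assumptions made for this example. The simulation checks that the average error shrinks as epsilon grows, which is the basic privacy-utility trade-off.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count using the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data, made up for this illustration.
    ages = [rng.randint(18, 90) for _ in range(1000)]
    true_count = sum(1 for a in ages if a >= 65)

    # Smaller epsilon means stronger privacy and larger expected error.
    for eps in (0.1, 1.0, 10.0):
        errors = [
            abs(private_count(ages, lambda a: a >= 65, eps, rng) - true_count)
            for _ in range(2000)
        ]
        print(f"epsilon={eps}: mean absolute error = {statistics.mean(errors):.3f}")
```

The expected absolute error of Laplace noise equals its scale, so the printed averages should land near 1/epsilon for each setting. This sketch uses ordinary floating-point sampling, which is fine for a classroom simulation but is not a hardened implementation.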

Prerequisites

  • Programming language: Python or another language for data science
  • Data Structures and Algorithms: CS 240 or equivalent
  • Linear Algebra: MATH 304 or equivalent
  • Probability and Statistics: MATH 327 or MATH 448 or equivalent

Textbooks and Other Materials

  • No textbook is required. The following is the main reference book for differential privacy:
  • The following resources may also be useful for additional reading:
  • Lecture notes and supplemental materials
    • Lecture notes for each chapter (in PDF format) and relevant supplemental materials will be posted on Brightspace before lectures. Lecture notes do not substitute for class attendance, since (i) they do not contain every explanation and analysis given in class, and (ii) significant parts of lectures, including discussions and presentations, may not come from the lecture notes.
  • Additional materials will be added as appropriate