Teaching Responsible Data Science: Charting New Pedagogical Territory


Julia Stoyanovich, director of the Center for Responsible AI (R/AI) at NYU Tandon, and assistant professor of computer science and engineering and of data science, co-authored this paper with Armanda Lewis, a graduate student pursuing her master’s at the NYU Center for Data Science.

The authors detail their development of and pedagogy for a technical course focused on responsible data science, which tackles the issues of ethics in AI, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection.

The ability to interpret machine-assisted decision-making is an important component of responsible data science, and it offers a useful lens on other responsible data science topics, including privacy and fairness. The researchers’ study presents best practices for teaching technical data science and AI courses that focus on interpretability, and ties responsible data science to current learning science and learning analytics research.

The work also explores the use of “nutritional labels” — a family of interpretability tools that are gaining popularity in responsible data science research and practice — for interpreting machine learning models.

  • In the paper, the investigators describe a unique course on responsible data science that is geared toward technical students and incorporates topics from social science, ethics and law.
  • The work connects theories and advances within the learning sciences to the teaching of responsible data science, specifically interpretability, which allows humans to understand, trust and, if necessary, contest the computational process and its outcomes. The study asserts that interpretability is central to the critical study of the underlying computational elements of machine learning platforms.
  • The collaborators assert that they are among the first to consider the pedagogical implications of responsible data science, drawing parallels between cutting-edge data science research and cutting-edge educational research within the fields of learning sciences, artificial intelligence in education, and learning analytics and knowledge.

Additionally, the authors propose a set of pedagogical techniques for teaching the interpretability of data and models, positioning interpretability as a central integrative component of responsible data science.