Resources and Help

Google Group

We have created a Google Group, Data Science for Business, for discussions of the book.  This group is a place to share figures, slides, assignments, project ideas, data, code, and so on.  Foster and I both read it.  If you have a question, that's probably the best place to ask.

This link should take you to the Google Group page in a browser.  You'll have to request access, but you should be approved quickly.

Google Group for Instructors

There is a companion group, Data Science for Business (INSTRS), intended for instructors only.  You'll be asked to provide some basic info about your course before you're admitted.  That private group contains teaching-related material such as slides, assignments, projects, and exam questions.  It was created because some instructors want to share teaching material with other instructors without making it public.  If you've adopted Data Science for Business as a course text, or are considering adopting it, please join that group too.

Companion Hands-on Course (Python)

The book mentions “material for learning additional skills and concepts (for example, scripting in Python, Unix command-line processing, datafiles, ..., and other related topics).”  That material comes from Practical Data Science, a separate course taught at NYU/Stern by Josh Attenberg and Foster Provost.  It is a hands-on companion course to Data Science for Business.  The Fall 2012 course notes are available online here.

Naive Bayes Spreadsheet

Toward the end of Chapter 9 ("Evidence and Probabilities") we mention a Naive Bayes spreadsheet.  We currently consider it damaged, as it has problems of presentation that will have to be re-thought.  Fixing it is on our list, but in the meantime we discourage people from downloading it.

Case Study for Teaching Data Science

An extensively documented case study for teaching data science, with data and Jupyter notebooks, is available as an article in the journal Big Data.


In our book we try to illustrate what we can, but sometimes animations or interactive visualizations are better.  Here are pointers to useful visualizations we've found:

- k-means clustering process

- Conditional probability and Bayes' Rule

- Decision trees

- Bias-variance trade-off