Resources and Help

On this page, you will find Google Groups for discussing the book, companion hands-on courses, example spreadsheets, case studies, and visualizations.

We have created a Google Group, Data Science for Business, for discussions of the book. This group is a place to share figures, slides, assignments, project ideas, data, code, and so on. Foster and I both read it. If you have a question, that’s probably the best place to ask.

This link should take you to the Google Group page in a browser. You’ll have to request access, but you should be approved quickly.

There is a companion group, Data Science for Business (INSTRS), intended for instructors only. You’ll be asked to provide some basic info about your course before you’re admitted. That private group contains teaching-related material such as slides, assignments, projects, and exam questions. We created it because some instructors want to share teaching material with other instructors without making it public. If you’ve adopted Data Science for Business as a course text, or are considering adopting it, please join that group too.

The book mentions “material for learning additional skills and concepts (for example, scripting in Python, Unix command-line processing, datafiles, …, and other related topics).” That material comes from a separate course taught at NYU/Stern by Josh Attenberg and Foster Provost called Practical Data Science. It is a hands-on companion course to Data Science for Business. The Fall 2012 course notes are available online here.

Naive Bayes spreadsheet

Toward the end of Chapter 9 (“Evidence and Probabilities”) we mention a Naive Bayes spreadsheet. We currently consider it damaged: its presentation has problems and will have to be rethought. Fixing it is on our list, but in the meantime we discourage people from downloading it.

An extensively documented case study for teaching data science, with data and Jupyter notebooks, is available as an article in the journal Big Data.

This post covers the business understanding and high-level product formulation (including how the model relates to the product), as well as the initial data understanding.

This post starts to get into the details of problem formulation, feature engineering, and modeling setup, and it gives a first look at evaluation. Most interestingly, it has a set of cool (I think) examples of the non-linear, black-box model in action, using SHAP to understand why the model gives the scores that it does for a number of actual predictions.

Visualizations

In our book we try to illustrate what we can, but sometimes animations or interactive visualizations are better. Here are pointers to useful visualizations we’ve found: