# Statistics Data Science Curriculum (2023-24)

This focused MS track is developed within the structure of the current MS in Statistics and reflects new trends in data science and analytics. Upon successful completion of the Data Science MS degree, students will be prepared to continue to a related doctoral program or to work as a data science professional in industry. Completing the MS degree is not a direct path for admission to the PhD program in Statistics.

This program is not an online degree program.

Coursework

The Data Science track develops strong mathematical, statistical, computational and programming skills, and provides a fundamental data science education through general and focused elective requirements, drawn from courses in data science and other areas of interest.

As defined in the general Graduate Student Requirements, students must maintain a grade point average (GPA) of 3.0 or better, and classes must be taken at the 200 level or higher. Students satisfying the course requirements of the Data Science track do not satisfy the other course requirements for the M.S. in Statistics.

The total number of units in the degree is 45, 36 of which must be taken for a letter grade.

Students must submit an approved Master's Program Proposal, signed by the master's advisor, to the student services officer by the end of the first quarter of the master's degree program. A revised program proposal must be filed whenever there are changes to a student's previously approved program proposal.

There is no thesis requirement.

### Data Science Proposal Forms

Students must demonstrate breadth of knowledge in the field by completing courses in these core areas.

- Mathematical & Statistical Foundations (15 - 16 units)
- Experimentation (3 units)
- Scientific Computing (includes software development & large-scale computing) (6 units minimum)
- Machine Learning Methods & Applications (6 units minimum)
- Practical Component (3 units)
- Elective course in the data sciences (remainder of 45 units)
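
The unit arithmetic behind these areas can be checked with a short script. The sketch below is illustrative only: the area keys, function names, and the sample plan are hypothetical, and it encodes just the minimums listed above together with the 45-unit total and the 36 letter-graded units stated earlier. It is not an official degree audit.

```python
# Minimum units per core area, as listed above. Keys are illustrative names.
AREA_MINIMUMS = {
    "foundations": 15,
    "experimentation": 3,
    "scientific_computing": 6,
    "machine_learning": 6,
    "practical": 3,
}
TOTAL_UNITS = 45          # total units in the degree
MIN_LETTER_GRADED = 36    # units that must be taken for a letter grade


def elective_units_needed(core_units):
    """Elective units needed to bring the planned core units up to 45."""
    planned_core = sum(core_units.get(area, 0) for area in AREA_MINIMUMS)
    return max(0, TOTAL_UNITS - planned_core)


def check_plan(core_units, elective_units, letter_graded_units):
    """Return a list of unmet requirements; an empty list means none found."""
    problems = []
    for area, minimum in AREA_MINIMUMS.items():
        planned = core_units.get(area, 0)
        if planned < minimum:
            problems.append(f"{area}: {planned} of {minimum} units")
    planned_core = sum(core_units.get(area, 0) for area in AREA_MINIMUMS)
    total = planned_core + elective_units
    if total < TOTAL_UNITS:
        problems.append(f"total: {total} of {TOTAL_UNITS} units")
    if letter_graded_units < MIN_LETTER_GRADED:
        problems.append(
            f"letter-graded: {letter_graded_units} of {MIN_LETTER_GRADED} units"
        )
    return problems


# Hypothetical plan, for illustration only.
plan = {
    "foundations": 16,
    "experimentation": 3,
    "scientific_computing": 6,
    "machine_learning": 6,
    "practical": 3,
}
electives = elective_units_needed(plan)
print(electives, check_plan(plan, electives, letter_graded_units=36))
```

With the sample plan, the core areas account for 34 units, so 11 elective units are needed to reach 45. The approved Master's Program Proposal remains the authoritative record.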

### Mathematical and Statistical Foundations (15 - 16 units)

Students must demonstrate foundational knowledge in the field by completing the following courses. Courses in this area must be taken for letter grades.

## Introduction to Statistical Inference (STATS 200)

**Prerequisite: STATS 116.**

OR

## Theory of Statistics I (STATS 300A)

Finite sample optimality of statistical procedures; Decision theory: loss, risk, admissibility; Principles of data reduction: sufficiency, ancillarity, completeness; Statistical models: exponential families, group families, nonparametric families; Point estimation: optimal unbiased and equivariant estimation, Bayes estimation, minimax estimation; Hypothesis testing and confidence intervals: uniformly most powerful tests, uniformly most accurate confidence intervals, optimal unbiased and invariant tests.

**Prerequisites: Real analysis, introductory probability (at the level of STATS 116), and introductory statistics.**

## Introduction to Regression Models and Analysis of Variance (STATS 203)

Modeling and interpretation of observational and experimental data using linear and nonlinear regression methods. Model building and selection methods. Multivariable analysis. Fixed and random effects models. Experimental design.

**Prerequisites: A post-calculus introductory probability course, e.g. STATS 116, basic computer programming knowledge, some familiarity with matrix algebra, and a pre- or co-requisite post-calculus mathematical statistics course, e.g. STATS 200.**

Or STATS 203V (Su)

OR

## Applied Statistics I (STATS 305A)

**Terms: Aut | Units: 3**

## Modern Applied Statistics: Learning (STATS 315A)

**Prerequisites: STATS 305A, 305B, 305C or consent of instructor.**

**Terms: Win | Units: 3**

Data Science students who started Fall 2022 may substitute STATS 315A for CS229.

Those who started Fall 2023 must enroll in STATS 315A.

## Numerical Linear Algebra (CME 302)

**Terms: Aut | Units: 3**

**(Permissible to enroll in Year 2.)**

## Stochastic Methods in Engineering (CME 308)

**Prerequisites: exposure to probability and background in analysis.**

**Terms: Spr | Units: 3**

OR

## Stochastic Processes (STATS 219/MATH 136)

Introduction to measure theory, Lp spaces and Hilbert spaces. Random variables, expectation, conditional expectation, conditional distribution. Uniform integrability, almost sure and Lp convergence. Stochastic processes: definition, stationarity, sample path continuity. Examples: random walk, Markov chains, Gaussian processes, Poisson processes, Martingales. Construction and basic properties of Brownian motion.

**Prerequisite: STATS 116 or MATH 151 or equivalent. Recommended: MATH 115 or equivalent.**

**Terms: Win | Units: 4**

### Experimentation Elective (3 units)

Experimental method and causal considerations are fundamental to data science. The course chosen from this area must be taken for a letter grade.

## Introduction to Causal Inference (STATS 209)

**Prerequisites: basic probability and statistics, familiarity with R.**

**Terms: Aut | Units: 3**

## Design of Experiments (STATS 263/363)

## Applied Causal Inference with Machine Learning and AI (MS&E 228)

**Prerequisites: basic knowledge of probability and statistics. Recommended: 226 or equivalent.**

### Scientific Computing (6 units)

Software Development (3 units)

Large Scale Computing (3 units)

### Software Development (3 units)

**2022-23 and 2023-24 - CME 212 will not be offered. See instructions below.**

**In lieu of CME 212 (not offered 2022-23), students must take an additional 3 units from the list of Scientific Computing.**

Courses in this area must be taken for letter grades.

To ensure that students have a strong foundation in programming, the track requires 3 units of software development (CME 212) and a minimum of 3 units of scientific computing.

- Students who do not start the program with a strong computational and/or programming background will take an extra 3 units to prepare themselves by taking:
  **NOT OFFERED 2023-24: CME211 Programming in C/C++ for Scientists and Engineers** (placement exam in Summer Quarter), or an equivalent course with advisor's approval.

**Software Development: (3 units)**

**In lieu of CME 212 (not offered 2022-23), students must take an additional 3 units from the list of Scientific Computing.**

**Large Scale Computing: (3 units)**

An additional course (3 units) in Large Scale Computing in lieu of the *Software Development* requirement.

6 units total

## Introduction to parallel computing using MPI, OpenMP, and CUDA (CME 213)

**Pre-requisites include C++, templates, debugging, UNIX, makefile, numerical algorithms (differential equations, linear algebra).**

**Terms: Spr | Units: 3**

## Discrete Mathematics and Algorithms (CME 305)

**Prerequisites: CS 261 is highly recommended, although not required.**

**Terms: Win | Units: 3**

## Optimization (CME 307)

**Prerequisites: MATH 113, 115, or equivalent.**

**Terms: Win | Units: 3**

## Distributed Algorithms and Optimization (CME 323)

**Recommended prerequisites: Discrete math at the level of CS 161 and programming at the level of CS 106A.**

**Terms: Spr | Units: 3**

## Convex Optimization I (CME 364A)

**Prerequisite: linear algebra such as EE263, basic probability.**

**Terms: Win, Sum | Units: 3**

## Mining Massive Data Sets (CS 246)

**Prerequisites: At least one of CS107 or CS145.**

**Terms: Win | Units: 3-4**

## NOT OFFERED 2023-24 - Principles of Data-Intensive Systems (CS 245)

**Terms: Win | Units: 3-4**

### Machine Learning Methods & Applications (6–9 units)

Courses in this area must be taken for letter grades. *Courses outside this list are subject to approval.*

## NOT OFFERED 2023-24 - Modern Applied Statistics: Data Mining (STATS 315B)

**Terms: Spr | Units: 3**

## Artificial Intelligence: Principles and Techniques (CS 221)

**Prerequisites: CS 103 or CS 103B/X, CS 106B or CS 106X, CS 109, and CS 161 (algorithms, probability, and object-oriented programming in Python). We highly recommend comfort with these concepts before taking the course, as we will be building on them with little review.**

**Terms: Aut, Spr | Units: 3-4**

## Natural Language Processing with Deep Learning (CS 224N)

**Terms: Win | Units: 3-4**

## Machine Learning (CS/STATS 229)

**Terms: Aut, Spr, Sum | Units: 3-4**

## Deep Learning (CS 230)

**Prerequisites: Familiarity with programming in Python and Linear Algebra (matrix / vector multiplications). CS 229 may be taken concurrently.**

**Terms: Aut, Spr | Units: 3-4**

## Deep Learning for Computer Vision (CS 231N)

**Prerequisites: Proficiency in Python; CS131 and CS229 or equivalents; MATH21 or equivalent, linear algebra.**

**Terms: Spr | Units: 3-4**

## Reinforcement Learning (CS 234)

**Prerequisites: proficiency in Python, CS 229 or equivalents or permission of the instructor; linear algebra, basic probability.**

**Terms: Win | Units: 3**

## Deep Generative Models (CS 236)

**Prerequisites: Basic knowledge about machine learning from at least one of CS 221, 228, 229 or 230. Students will work with computational and mathematical models and should have a basic knowledge of probabilities and calculus. Proficiency in some programming language, preferably Python, required.**

**Terms: Aut | Units: 3**

### Practical Component or Capstone Project

Students are required to take a minimum of 3 units of practical component, which may include any combination of:

A capstone project, supervised by a faculty member and approved by the student's advisor. The capstone project should be computational in nature. Students should submit a one-page proposal, supported by the faculty member and sent to the student's Data Science advisor for approval (at least one quarter prior to the start of the project).

- Master's Research: **STATS 299 Independent Study**. In consultation with your advisor, independent study/directed reading with permission of statistics faculty (repeatable).
- **BIODS 232: Consulting Workshop on Biomedical Data Science** Units: 1–2
  - Gain practical industry experience and exposure to the organization, its industry, and the space in which it operates. Build relationships in the organization and industry, and gain an understanding of related career paths.
- **Applied Data Science** with ICME and industry partners (CME 218).
  - Industry research teams to tackle complex computational challenges.

- **Stanford ML group - AI for Health Care Bootcamp**
  - Students collaborate closely with postdocs, PhD students from Professor Andrew Ng's lab, the AIMI Center, and faculty members in medicine.
  - Students should consider the bootcamp their primary academic engagement (30 hours per week) outside of 1 or 2 courses. We encourage students to sign up for research credits (CS 199, CS 399, etc.).
  - Application will open soon in Fall 2023.
  - Pre-reqs:
    - Students with a background in artificial intelligence, software engineering or medicine are encouraged to apply. The bootcamp is suited for students who have taken machine learning and software engineering courses.

- [Have not confirmed 2023-24 course will be offered] **Stanford ML group - AI for Climate Change Bootcamp**
  - Students work closely with PhD students in Professor Andrew Ng's lab and with faculty members in climate change-related fields.
  - The AICC bootcamp is an intense two-quarter program where students work on high-impact research problems at the intersection of AI and climate change.
  - Students should consider the bootcamp their primary academic engagement (30 hours per week) outside of 1 or 2 courses.
  - We encourage students to sign up for research credits (CS 199, CS 399, etc.).
  - Pre-reqs:
    - Students with a background in artificial intelligence, software engineering or medicine are encouraged to apply. This role is suited for students who have taken machine learning and software engineering courses.

- Other courses that have a strong hands-on and practical component, such as **STATS 390 Consulting Workshop** (repeatable).
  - This class requires mastery of Statistics at the (graduate) level necessary to provide consultation to fellow members of the university.
  - Students attend weekly lectures on Friday to discuss consulting cases and various statistical techniques that arise frequently in consulting.

*When Offered* - **Data Driven Impact (ALP 301)**

- Xplore Projects (CME 291) Units: 3 | Repeatable 2 times (up to 6 units total) - **Enrollment by application only.**
  - Autumn projects include:
    - IDM Gates Foundation Disease Eradication
    - Lawrence Berkeley
    - Mathworks Speech/Human Motion
    - Multimodal Emotion Classification
    - Sandia Global Climate
    - Stanford ML in Genomics
    - Stanford COVID Lung Imaging
    - The Ocean Cleanup Beach Analysis
    - World Bank Health Systems

*RETIRED - Analytics Accelerator (CME 217) Units: 3 | Repeatable 2 times (up to 6 units total) >> REPLACED by **Xplore Projects with ICME (CME 291)**. Multidisciplinary graduate-level course offering real-world project-based research. Students work in dynamic teams with the support of course faculty and outside analytics experts to scope and research projects, apply a computational and data analytics lens, and follow design thinking methodology. **Enrollment by application only.***

### Data Science Electives (6–9 units)

In consultation with the student's program advisor, the student selects courses within the realm of data science to fulfill the remaining coursework required for the degree.

**Minimum 6 units of elective coursework.**

The following courses may also be taken for elective credit:

## Programming Methodology (CS 106A)

**Prerequisites: No prior programming experience required.**

**Terms: Aut, Win, Spr | Units: 3**

## Programming Abstractions (CS 106B)

**Prerequisites: 106A or equivalent.**

**Terms: Aut, Win, Spr | Units: 3**

## Computer Organization and Systems (CS 107)

**Prerequisites: 106B, or consent of instructor.**

**Terms: Aut, Win, Spr | Units: 3**

## NOT OFFERED 2023-24 - Software Development for Scientists and Engineers (CME 211)

**Prerequisites: introductory programming course equivalent to CS 106A or instructor consent.**

**Terms: Aut | Units: 3**