MSDS IST 687: Intro to Data Science

MSDS - Q1: IST687

COURSE DESCRIPTION:

The course introduces students to applied examples of data collection, processing, transformation, management, and analysis, giving them a hands-on introduction to data science. Students will explore key concepts related to data science, including applied statistics, information visualization, text mining, and machine learning. R, the open-source statistical analysis and visualization system, will be used throughout the course. R is reckoned by many to be the most popular choice among data analysts worldwide, and knowledge of and skill with it is considered a valuable, marketable job skill for most data scientists.

LEARNING OBJECTIVES:

At the end of the course, students are expected to understand:

  • Essential concepts and characteristics of data
  • Scripting/code development for data management using R and RStudio
  • Principles and practices in data screening, cleaning, and linking
  • Communication of results to decision makers

At the end of the course, students are expected to be able to:

  • Identify a problem and the data needed for addressing the problem
  • Perform basic computational scripting using R and other optional tools
  • Transform data through processing, linking, aggregation, summarization, and searching
  • Organize and manage data at various stages of a project lifecycle
  • Determine appropriate techniques for analyzing data

CLASS DELIVERABLES:

CODE:

Class Outline

Assignments and Grading

  • Week 1 Homework
  • Week 2 Homework
  • Week 3 Homework
  • Week 4 Homework
  • Week 5 Homework
  • Week 6 Homework
  • Week 7 Homework
  • Week 8 Homework
  • Week 9 Homework
  • Week 10 Homework
  • Week 1 Lab Assignment
  • Week 2 Lab Assignment
  • Week 3 Lab Assignment
  • Week 4 Lab Assignment
  • Week 5 Lab Assignment
  • Week 6 Lab Assignment
  • Week 7 Lab Assignment
  • Week 8 Lab Assignment
  • Week 9 Lab Assignment
  • Week 10 Lab Assignment
  • Midterm Quiz
  • Project Update 1
  • Project Update 2
  • Project Update 3
  • Final Project
  • Participation

Week 1 | Introduction

  • 1.1 Week 1 Readings
  • 1.2 Data Science: Many Skills
  • 1.3 Data Science: Example
  • 1.4 Data Science Overview, Part II
  • 1.5 Getting Started With R
  • 1.6 Why R?
  • 1.7 Install R and RStudio
  • 1.8 R Overview

Week 2 | Rows and Columns

  • 2.1 Week 2 Introduction
  • 2.2 Week 2 Readings
  • 2.3 An R Overview
  • 2.4 Data Modeling Overview
  • 2.5 Playing With Vectors
  • 2.6 Rows and Columns
  • 2.7 Dataframes in R
  • 2.8 Using Data Frames
  • 2.9 Sorting Data Frames

Week 3 | Writing Functions and Descriptive Stats

  • 3.1 Week 3 Introduction
  • 3.2 Week 3 Readings
  • 3.3 Writing Functions
  • 3.4 R: Writing Functions
  • 3.5 Descriptive Stats Overview
  • 3.6 Descriptive Stats Example
  • 3.7 Using Descriptive Stats
  • 3.8 Real-World Examples

Week 4 | Sampling

  • 4.1 Week 4 Introduction
  • 4.2 Week 4 Readings
  • 4.3 Introduction to Sampling
  • 4.4 Flipping Coins
  • 4.5 Replicating Samples
  • 4.6 Central Limit Theorem
  • 4.7 Sampling Population
  • 4.8 Sampling Distribution of the Mean
  • 4.9 Sampling From a Jar
  • 4.10 Mystery Samples
  • 4.11 More on Sampling

Week 5 | Accessing Data

  • 5.1 Week 5 Introduction
  • 5.2 Week 5 Readings
  • 5.3 Introduction to Importing Data
  • 5.4 Importing Spreadsheets
  • 5.5 R & SQL
  • 5.6 SQL
  • 5.7 JSON
  • 5.8 JSON

Week 6 | Pictures Versus Numbers

  • 6.1 Week 6 Introduction
  • 6.2 Week 6 Readings
  • 6.3 Example Visualizations
  • 6.4 Analyzing Visualizations
  • 6.5 Play With Visualizations
  • 6.6 Introduction to Visualizations
  • 6.7 Introduction to the GGPLOT Package
  • 6.8 Histograms and Line Charts
  • 6.9 Ten Principles in Visualization
  • 6.10 Bar and Scatter Plots

Week 7 | Map Mashup

  • 7.1 Week 7 Introduction
  • 7.2 Week 7 Readings
  • 7.3 Example Map Visualizations
  • 7.4 GGPLOT Map Introduction
  • 7.5 An Example Map Mash-Up
  • 7.6 Creating Maps
  • 7.7 Points on a Map
  • 7.8 Map Zoom

Week 8 | Linear Modeling

  • 8.1 Week 8 Introduction
  • 8.2 Week 8 Readings
  • 8.3 Introduction to Linear Regression
  • 8.4 Model Overview
  • 8.5 Working Through an Example
  • 8.6 Working Through a Refined Example
  • 8.7 Working With Linear Models

Week 9 | Data Mining

  • 9.1 Week 9 Introduction
  • 9.2 Week 9 Readings
  • 9.3 Data Mining Overview
  • 9.4 Association Rule Mining
  • 9.5 Association Rule Mining in R
  • 9.6 Using Association Rule Mining
  • 9.7 SVM Overview
  • 9.8 SVM Example
  • 9.9 R: SVM Example

Week 10 | Text Mining

  • 10.1 Week 10 Introduction
  • 10.2 Week 10 Readings
  • 10.3 Word Clouds
  • 10.4 R: Word Clouds
  • 10.5 Sentiment Analysis
  • 10.6 R: Sentiment Analysis
  • 10.7 Topics in R
  • 10.8 Going Forward in Data Science