Detailed course content
Introduction
Introducing data science and R
- What are statistics, data mining, machine learning…
- Data science projects and their lifetime
- Introducing R
- R tools
- R data structures
- Lab 1
Data overview
- Datasets, cases and variables
- Types of variables
- Introductory statistics for discrete variables
- Descriptive statistics for continuous variables
- Basic graphs
- Sampling, confidence level, confidence interval
- Lab 2
Data preparation
- Derived variables
- Missing values and outliers
- Smoothing and normalization
- Time series
- Training and test sets
- Lab 3
Associations between two variables and visualizations of associations
- Covariance and correlation
- Contingency tables and chi-squared test
- T-test and analysis of variance
- Bayesian inference
- Linear models
- Lab 4
Feature selection and matrix operations
- Feature selection in linear models
- Basic matrix algebra
- Principal component analysis
- Exploratory factor analysis
- Lab 5
Unsupervised learning
- Hierarchical clustering
- K-means clustering
- Association rules
- Lab 6
Supervised learning
- Neural networks
- Logistic regression
- Decision and regression trees
- Random forests
- Gradient-boosted trees
- K-nearest neighbors
- Lab 7
Modern topics
- Support vector machines
- Time series
- Text mining
- Deep learning
- Reinforcement learning
- Lab 8
R in SQL Server and MS BI
- ML Services (In-Database) structure
- Executing external scripts in SQL Server
- Storing a model and performing native predictions
- R in Azure ML and Power BI
- Lab 9