- Instructor: Dr Karunakar Karunakar
- Students: 35624
- Duration: 10 weeks

- Introduction to Data Science,
- Importance of Data Science,
- Statistical and analytical methods,
- Deploying Data Science for Business Intelligence,
- Transforming data,
- Machine learning
- Introduction to Recommender systems

- How Data Science solves real world problems,
- Data Science Project Life Cycle,
- Principles of Data Science,
- Introduction to various BI and analytical tools,
- Data collection,
- Introduction to statistical packages,
- Data visualization tools,
- R Programming,
- Predictive modelling,
- Machine learning,
- Artificial intelligence
- Statistical analysis.

- Converting data into useful information,
- Collecting the data,
- Understand the data,
- Finding useful information in the data,
- Interpreting the data,
- Visualizing the data

- Descriptive statistics,
- Key terms in statistics,
- Variable

- Dot Plots,
- Histogram,
- Stemplots,
- Box and whisker plots,
- Outlier detection from box plots
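
As an illustration of outlier detection from box plots, here is a minimal Python sketch of the common 1.5 × IQR whisker rule. The helper name `iqr_outliers` and the sample values are made up for this example, and the quartile method is the default of Python's `statistics.quantiles`; other tools may compute quartiles slightly differently.

```python
import statistics

def iqr_outliers(values):
    """Return the values that fall outside the 1.5 * IQR fences of a box plot."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 58]  # hypothetical data
outliers = iqr_outliers(sample)
print(outliers)
```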

- What is probability?,
- Set & rules of probability,
- Bayes Theorem
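
Bayes' theorem can be illustrated with a short Python sketch. The function name `bayes_posterior` and the screening-test numbers are assumptions chosen only for illustration; the formula itself is P(A|B) = P(B|A) P(A) / P(B).

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Compute P(A|B) from P(A), P(B|A), and P(B|not A)."""
    # P(B) via the law of total probability
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical screening test: 1% prevalence, 95% sensitivity, 5% false positives
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))
```

Even with a fairly accurate test, the posterior stays low here because the prior is small.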

- Probability Distributions,
- Few Examples,
- Student t-distribution,
- Sampling distribution,
- Poisson distribution

- Stratified Sampling,
- Proportionate Sampling,
- Systematic Sampling,
- P-value

- Cross Tables,
- Bivariate Analysis,
- Multivariate Analysis,
- Dependence and Independence tests (Chi-Square),
- Analysis of Variance,
- Correlation between nominal variables

- Boxplot in R programming,
- Understanding distribution and percentile,
- Identifying outliers,
- RStudio tool,
- Various types of distribution, such as normal, uniform, and skewed.

## R Programming

- R language for statistical programming, the various features of R,
- Introduction to RStudio,
- The statistical packages, familiarity with different data types and functions,
- Learning to deploy them in various scenarios,
- Using SQL to apply the ‘join’ function,
- Components of RStudio such as the code editor, visualization, and debugging tools,
- Learn about rbind.

- R functions, code compilation, and data in a well-defined format called R packages,
- Learn about R package structure,
- Package metadata and testing,
- CRAN (Comprehensive R Archive Network),
- Vector creation and variable value assignment.

- R functionality,
- The rep function,
- Generating Repeats,
- Sorting and generating Factor Levels,
- Transpose and Stack Function.

- Introduction to matrix and vector in R,
- Understanding various functions such as merge,
- strsplit,
- Matrix manipulation, rowSums,
- rowMeans,
- colMeans,
- colSums,
- Sequencing,
- Repetition,
- Indexing and other functions.

- Understanding subscripts in plots in R,
- How to obtain parts of vectors,
- Using subscripts with arrays, as logical variables, and with lists,
- Understanding how to read data from external files.

- Generate plot in R,
- Graphs,
- Bar Plots,
- Line Plots,
- Histogram,
- Components of Pie Chart.

- Understanding the Analysis of Variance (ANOVA) statistical technique,
- Working with Pie Charts, Histograms,
- Deploying ANOVA with R,
- One-way ANOVA, two-way ANOVA.

- K-Means Clustering for Cluster & Affinity Analysis,
- Cluster Algorithm,
- Cohesive subset of items,
- Solving clustering issues,
- Working with large datasets,
- Association rule mining and affinity analysis for data mining and learning co-occurrence relationships.

- Introduction to Association Rule Mining,
- The various concepts of Association Rule Mining,
- Various methods to predict relations between variables in large datasets,
- The algorithm and rules of Association Rule Mining, understanding single cardinality.

- Understanding what is Simple Linear Regression,
- The equation of a line, slope, and y-intercept,
- Regression line,
- Deploying analysis using Regression,
- The least squares criterion,
- Interpreting the results, standard error of the estimate, and measures of variation.

- Scatter Plots,
- Two variable Relationship,
- Simple Linear Regression analysis,
- Line of best fit

- Deep understanding of the measure of variation,
- The concept of the coefficient of determination,
- F-Test,
- The test statistic with an F-distribution,
- Advanced regression in R,
- Prediction with linear regression.

- Logistic Regression Mean,
- Logistic Regression in R.

- Advanced logistic regression,
- Understanding how to do prediction using logistic regression,
- Ensuring the model is accurate,
- Understanding sensitivity and specificity, confusion matrix,
- What is ROC (receiver operating characteristic), a graphical plot illustrating the performance of a binary classifier system,
- ROC curve in R for determining sensitivity/specificity trade-offs for a binary classifier.

- Detailed understanding of ROC,
- Area under ROC Curve,
- Converting the variable, data set partitioning,
- Understanding how to check for multicollinearity,
- How two or more variables are highly correlated,
- Building of model, advanced data set partitioning,
- Interpreting of the output,
- Predicting the output, detailed confusion matrix,
- Deploying the Hosmer-Lemeshow test for checking whether the observed event rates match the expected event rates.

- Data analysis with R,
- Understanding the Wald test,
- McFadden’s pseudo R-squared,
- The significance of the area under ROC Curve,
- Kolmogorov-Smirnov chart, based on a non-parametric test of one-dimensional probability distributions.

- Connecting to various databases from the R environment,
- Deploying the ODBC tables for reading the data,
- Visualization of the performance of the algorithm using Confusion Matrix.

- Creating an integrated environment for deploying R on Hadoop platform,
- Working with RHadoop, the rmr package, and the R and Hadoop Integrated Programming Environment (RHIPE),
- R programming for MapReduce jobs and Hadoop execution.

## Python Programming

- Hello, World!,
- Variables and Types,
- Lists,
- Basic Operators,
- String Formatting,
- Basic String Operations,
- Conditions,
- Loops,
- Functions,
- Classes and Objects,
- Dictionaries,
- Modules and Packages

- NumPy Arrays,
- Pandas Basics

- Generators,
- List Comprehensions,
- Multiple Function Arguments,
- Regular Expressions,
- Exception Handling,
- Sets,
- Serialization,
- Partial functions,
- Code Introspection,
- Closures,
- Decorators

- Deploying machine learning for data analysis,
- Solving business problems,
- Using algorithms for searching patterns in data,
- Relationship between variables,
- Multivariate analysis, interpreting correlation,
- Negative correlation.

- Key phases of data transformation: data mapping and code generation,
- Data Processing operation,
- Data patterns, data sampling,
- Sampling distribution, normal and continuous variable,
- Data extrapolation, regression,
- Linear regression model.
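
The linear regression model listed above can be sketched in plain Python using the least squares formulas for slope and intercept. The function name `fit_line` and the data points are hypothetical, chosen only to show the calculation.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]                 # hypothetical predictor values
ys = [2.1, 4.1, 6.2, 7.9, 10.2]      # hypothetical responses
slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
```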

- Data analysis,
- Hypothesis testing,
- Simple linear regression,
- Chi-square for assessing compatibility between theoretical and observed data,
- Implementing data testing on data warehouse,
- Validating data,
- Checking for accuracy,
- Data operational monitoring capabilities.
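
To make the chi-square item concrete, the sketch below computes the goodness-of-fit statistic, the sum of (observed − expected)² / expected, in plain Python. The function name `chi_square_statistic` and the die-roll counts are invented for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 16, 25, 19, 20]  # hypothetical counts from 120 die rolls
expected = [20] * 6                  # a fair die predicts 20 per face
stat = chi_square_statistic(observed, expected)
print(stat)

# With 6 categories there are 5 degrees of freedom; the 5% critical value
# is about 11.07, so a statistic well below that does not contradict fairness.
```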

- Various techniques of data modelling and generating algorithms,
- Methods of business prediction,
- Prediction approaches, data sampling,
- Disproportionate sampling,
- Data modelling rules, data iteration,
- Deploying data for mission-critical applications

- Working with large data sets in data warehouses,
- Data clustering, grouping,
- Horizontal & vertical slicing,
- Data sharding in partitioning,
- Clustering algorithms,
- K-means Clustering for analysis and data mining,
- Exclusive clustering,
- Hierarchical clustering,
- Mahout Clustering algorithm and Probabilistic Clustering,
- Nearest neighbour search,
- Pattern recognition, and statistical classification.
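
As a rough illustration of K-means clustering, the following pure-Python sketch runs the basic assign-and-update loop (Lloyd's algorithm) with fixed starting centroids. The function name `k_means`, the points, and the starting centroids are all assumptions for this example; library implementations add smarter initialization and convergence checks.

```python
import math

def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute the means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]  # hypothetical data
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```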
