# DATA SCIENCE PROFESSIONAL CERTIFICATE PROGRAM

### 80 Hours

### DATA SCIENCE PROFESSIONAL INTRODUCTION

The Data Science Professional Certificate course is designed to accelerate your career in data science, starting from the basics of Statistics, Data Management and Analytics and moving on to advanced topics like Neural Networks, Machine Learning and Big Data. We offer short-duration, in-person, hands-on data science training that will get you started with practical data science in just one week.

Data scientists are some of the most sought-after professionals in the world of big data analysis. Companies are pulling out all the stops to efficiently analyze the data that their business generates. Every company, government program or institution that uses data is looking to hire data scientists. At any given time, job portals list over 100,000 open data science positions worldwide.

A data scientist makes data science sing by mastering math, statistics and computer programming with tools such as Python, R and Hadoop, and derives insights using the same level of business understanding and gut instinct that drives executive decisions. A data scientist is a high-ranking professional with an intense curiosity to make discoveries in the world of big data, using technologies like Hadoop, Python, NoSQL, Machine Learning and Statistical Analysis that make taming big data possible for businesses.

Data Analytics is a promising field due to the exponential growth of information created with today’s sophisticated technologies. This course offers working professionals the opportunity to acquire the knowledge and skills that support the management and use of big data.

If you want to learn the fundamentals of data analytics and embark on this exciting path, our Data Science Professional program is for you. Upon successful completion of the certificate program, you will be able to:

Collect, clean, model, and report data as well as build data products

Analyze data to support informed, data-driven decisions

### CERTIFICATION PARTNER

The Data Science Professional certification course is authorized by and co-created with Merkat Intellekt Technologies Private Limited as the Knowledge Partner, and comes with a cutting-edge, industry-aligned curriculum and learning methodology.

Merkat Intellekt Technologies is a digital transformation company that helps its clients transform their business, from procurement and operations to taking a product to market in a short amount of time, with services across business process management, technology, analytics, and organizational design.

Upon completion of the course, you will benefit in terms of:

Sharing of Case Studies

You will build multiple projects based on real-life scenarios. Merkat Intellekt Technologies will assist in evaluating project submissions and provide constructive feedback.

SME Lectures

Senior leaders will conduct guest lectures on key trends and real-world challenges facing the industry and mentor you towards job-readiness.

Industry Approved Curriculum

You will learn the in-demand skills and sought-after tools and techniques required by the Data Science industry through interactive case studies and hands-on projects.

Deployment in our In-house Projects

After you complete the course and our exam, we will issue a Data Science Professional Certificate and conduct an interview. Candidates who clear the interview will be deployed in our in-house projects with a job offer.

### DEDICATED SUPPORT

One-on-one personal attention for every student

Q&A forum for real-time doubt resolution

Peer-to-Peer networking opportunity

Access to course content

Post-course support

### CAREER GUIDANCE

Continuous mentoring by industry experts

Personalized resume building exercise

Mock interviews with hiring experts

Hiring partnership with 100+ companies

### WHO CAN LEARN OUR DATA SCIENCE COURSE?

The Data Science Professional certificate is ideal for students and experienced professionals who are interested in working in the analytics industry and are keen on enhancing their technical skills and business understanding of data science. There are no particular prerequisites for this training course, although a love of mathematics is helpful.

Experienced Professionals

Professionals who are looking to up-skill or change career paths. Technical experience is a plus.

Job Seekers

Recent graduates with a Bachelor's or Master's degree in Science, Math, Statistics, Engineering, Finance or Computer Applications/IT.

Global Certifications

Those looking to enhance their resumes and build a portfolio of demonstrable work in one of the most coveted professions of this century.

### ADVANTAGES OF THIS DATA SCIENCE PROFESSIONAL CERTIFICATE PROGRAM

Busy professionals find this program meets their professional goals while providing a flexible learning experience. As a student in the program, you’ll enjoy:

Courses delivered in a 100% classroom learning format

Teaching by top faculty from our digital transformation partner, Merkat Intellekt Technologies

An in-house job offer facility at our Path2learn center

Courses that focus on the most important and marketable components of data analytics

Pre- and post-course support at our Path2learn center, including use of our labs after the course if you need any clarifications

Free project work after the course, organized by our experts

The freedom to set your own pace towards completion of the certification

### WHY SHOULD YOU TAKE THE DATA SCIENTIST CERTIFICATION COURSE?

Data Scientist is the best job of the 21st century – Harvard Business Review

Global Big Data market to reach $122B in revenue by 2025 – Frost & Sullivan

The US alone could face a shortage of 140,000 to 190,000 people with deep analytical skills by 2018 – McKinsey

If you are keen on growing your career, you should seriously consider the fast-growing domain of data science. Path2learn offers some of the best growth opportunities and salaries in the technology domain.

## COURSE OUTLINE

Introduction to Data Science with R

What is Data Science, significance of Data Science in today’s digitally driven world, applications of Data Science, lifecycle of Data Science, components of the Data Science lifecycle, introduction to big data and Hadoop, introduction to Machine Learning and Deep Learning, introduction to R programming and RStudio.

Hands-on Exercise – Installation of RStudio, implementing simple mathematical operations and logic using R operators, loops, if statements and switch cases.

Data Exploration

Introduction to data exploration, what is exploratory data analysis, importing and exporting data to/from external sources, dataframes, working with dataframes, accessing individual elements, vectors and factors, operators, in-built functions, conditional and looping statements, user-defined functions, matrix, list and array.

Hands-on Exercise – Accessing individual elements of customer churn data, modifying and extracting the results from the dataset using user-defined functions in R.

Data Manipulation

Need for Data Manipulation, Introduction to the dplyr package, Selecting one or more columns with the select() function, Filtering out records on the basis of a condition with the filter() function, Adding new columns with the mutate() function, Sampling & Counting with the sample_n(), sample_frac() & count() functions, Getting summarized results with the summarise() function, Combining different functions with the pipe operator, Implementing SQL-like operations with sqldf, Text Mining with stringr & wordcloud, Data Manipulation with the data.table package, Working with dates with the lubridate package.

Hands-on Exercise – Implementing dplyr to perform various operations for abstracting over how data is manipulated and stored.

Data Visualization

Introduction to visualization, different types of graphs, introduction to the grammar of graphics and the ggplot2 package, understanding categorical distributions with geom_bar(), understanding numerical distributions with geom_histogram(), building frequency polygons with geom_freqpoly(), making scatter plots with geom_point(), multivariate analysis with geom_boxplot(), univariate analysis with bar plots, histograms and density plots, multivariate distributions with scatter plots and smooth lines, continuous vs categorical variables with box plots, subgrouping the plots, working with coordinates and the theme() layer to make graphs more presentable, introduction to plotly and its various plots, visualization with the ggvis package, geographic visualization with ggmap(), building web applications with Shiny.

Hands-on Exercise – Creating data visualizations to understand the customer churn ratio using charts built with ggplot2, and using Plotly for importing and analyzing data in grids. You will visualize tenure, monthly charges, total charges and other individual columns using scatter plots.

Introduction to Statistics

Why do we need Statistics?, Categories of Statistics, Statistical Terminologies, Types of Data, Measures of Central Tendency, Measures of Spread, Correlation & Covariance, Standardization & Normalization, Probability & Types of Probability, Hypothesis Testing, Chi-Square testing, ANOVA, normal distribution, binomial distribution.

Hands-on Exercise – Building a statistical analysis model that uses quantifications, representations, experimental data for gathering, reviewing, analyzing and drawing conclusions from data.

Machine Learning

Introduction to Machine Learning, introduction to Linear Regression, predictive modeling with Linear Regression, simple and multiple Linear Regression, concepts and detailed formulas, assumptions and residual diagnostics in Linear Regression, residuals, qqnorm(), qqline(), understanding the fit of the model, building a simple linear model, predicting results and finding the p-value, understanding the summary results with the null hypothesis, p-value and F-statistic, building linear models with multiple independent variables, introduction to Logistic Regression, comparing Linear Regression and Logistic Regression, bivariate and multivariate Logistic Regression, confusion matrix and accuracy of the model, threshold evaluation with ROCR, uses of Poisson Regression, bivariate and multivariate Poisson Regression, implementing Poisson Regression in R.

Hands-on Exercise – Modeling the relationship within the data using linear predictor functions. Implementing Linear and Logistic Regression in R by building a model with ‘tenure’ as the dependent variable and multiple independent variables.

Logistic Regression

Introduction to Logistic Regression, Logistic Regression concepts, Linear vs Logistic Regression, math behind Logistic Regression, detailed formulas, logit function and odds, bivariate Logistic Regression, Poisson Regression, building a simple “binomial” model and predicting results, evaluating the built model with the confusion matrix, accuracy, true positive rate and false positive rate, threshold evaluation with ROCR, finding the right threshold by building the ROC plot, cross validation and multivariate Logistic Regression, building logistic models with multiple independent variables, real-life applications of Logistic Regression.

Hands-on Exercise – Implementing predictive analytics by describing the data and explaining the relationship between one dependent binary variable and one or more independent variables. You will use glm() to build a model and use ‘Churn’ as the dependent variable.
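For orientation, the "logit function and odds" named in this module can be written as follows. This is the standard textbook form, not taken from the course material; the symbols $p$ (probability of the positive class, for example churn), $x_1, \dots, x_k$ (independent variables) and $\beta_0, \dots, \beta_k$ (coefficients) are notation assumed here for illustration.

```latex
\text{odds} = \frac{p}{1 - p},
\qquad
\operatorname{logit}(p) = \ln\frac{p}{1 - p}
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
```

Under these definitions, a probability of $p = 0.8$ corresponds to odds of $0.8 / 0.2 = 4$, that is, four to one.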

Decision Trees & Random Forest

What is classification and the different classification techniques, introduction to Decision Trees, algorithm for decision tree induction, building a decision tree in R, creating a perfect Decision Tree, Confusion Matrix, Regression trees vs Classification trees, impurity functions (entropy, information gain and Gini index) and how each is used to find the right split of a node, overfitting and pruning, pre-pruning, post-pruning, cost-complexity pruning, pruning a decision tree and predicting values, introduction to ensembles of trees and bagging, Random Forest concept, implementing Random Forest in R, finding the right number of trees and evaluating performance metrics, what is Naive Bayes, computing probabilities, Laplace correction, implementing Naive Bayes in R, what is the KNN algorithm, implementing KNN in R, what is a Support Vector Machine, implementing SVM in R, what is XGBoost, implementing XGBoost in R.

Hands-on Exercise – Implementing Random Forest for both regression and classification problems. You will build a tree with ‘churn’ as the dependent variable, prune it, and build a Random Forest with the right number of trees, using ROCR for performance metrics.
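As a brief reference for the impurity functions listed in this module, the usual textbook definitions are sketched below. The notation is assumed here and does not come from the course material: $S$ is the set of records at a node, $p_i$ is the share of records in $S$ belonging to class $i$, and $S_v$ is the subset of $S$ for which attribute $A$ takes value $v$.

```latex
H(S) = -\sum_{i} p_i \log_2 p_i
\qquad
\mathrm{Gini}(S) = 1 - \sum_{i} p_i^{2}
\qquad
\mathrm{IG}(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v)
```

For a node split evenly between two classes ($p_1 = p_2 = 0.5$), these give $H(S) = 1$ bit and $\mathrm{Gini}(S) = 0.5$; a pure node scores $0$ on both.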

Unsupervised learning

Introduction to Unsupervised Learning, what is Clustering and its use cases, feature extraction and clustering algorithms, what is K-means Clustering, theoretical aspects of K-means and the K-means process flow, implementing K-means in R on the dataset and finding the right number of clusters using a scree plot, what is Canopy Clustering, what is Hierarchical Clustering, implementing Hierarchical Clustering in R and reading dendrograms, Principal Component Analysis explained in detail, implementing PCA in R.

Hands-on Exercise – Deploying unsupervised learning with R to achieve clustering and dimensionality reduction, K-means clustering for visualizing and interpreting results for the customer churn data.

Association Rule Mining & Market Basket Analysis

Introduction to Association Rule Mining and Market Basket Analysis, measures of Association Rule Mining: Support, Confidence and Lift, the Apriori algorithm and implementing it in R, introduction to Recommendation Engines, user-based and item-based collaborative filtering, implementing user-based and item-based Recommendation Engines in R, Recommendation Engine use cases.

Hands-on Exercise – Deploying association analysis as a rule-based machine learning method, identifying strong rules discovered in databases using measures of interestingness.
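The three measures named in this module (Support, Confidence and Lift) are commonly defined as below for a rule $A \Rightarrow B$. The notation is assumed here for illustration: $N$ is the total number of transactions and $n(\cdot)$ counts the transactions containing the given items.

```latex
\operatorname{supp}(A \Rightarrow B) = \frac{n(A \cup B)}{N}
\qquad
\operatorname{conf}(A \Rightarrow B) = \frac{n(A \cup B)}{n(A)}
\qquad
\operatorname{lift}(A \Rightarrow B) = \frac{\operatorname{conf}(A \Rightarrow B)}{n(B) / N}
```

With made-up numbers, say $N = 100$ transactions, of which 20 contain $A$, 25 contain $B$ and 10 contain both, the rule has support $0.10$, confidence $10/20 = 0.5$ and lift $0.5 / 0.25 = 2$. A lift above 1 suggests that $A$ and $B$ appear together more often than they would if they were independent.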

Time Series Analysis

What is Time Series, techniques and applications, components of Time Series, moving average, smoothing techniques, exponential smoothing, univariate time series models, multivariate time series analysis, ARIMA model, Time Series in R, sentiment analysis in R (Twitter sentiment analysis), text analysis.

Hands-on Exercise – Analyzing time series data (a sequence of measurements that follow a non-random order) to identify the nature of the phenomenon and to forecast future values in the series.

Introduction to Artificial Intelligence

Introducing Artificial Intelligence and Deep Learning, what is an Artificial Neural Network, TensorFlow – computational framework for building AI models, fundamentals of building ANN using TensorFlow, working with TensorFlow in R.

This course is an excellent way for participants to learn a new skill or continue expanding an existing skill. As one of the most popular courses we offer, there are classes available at many different times to fit all types of schedules. If you’re interested in learning more about this course, reach out and we’d be happy to help.