Data Science
CD-501 (Theory of Computation)
CD-502 (Machine Learning)
CD-503 (A) (Data Mining & Warehousing)
CD-503 (B) (Pattern Recognition)
CD-503 (C) (Introduction to Toolkits for Data Science)
Syllabus: CD-501 (Theory of Computation)
- Unit-I Introduction to Automata Theory: Review of sets, mathematical formal proofs including proof by induction and by contradiction. Introduction to languages, grammars and automata: alphabet, representation of language and grammar, types of automata, finite automata as language acceptor and translator, Moore and Mealy machines, composite machine, conversion from Mealy to Moore and vice versa.
- Unit-II Types of Finite Automata: Non-Deterministic Finite Automata (NDFA), Deterministic Finite Automata (DFA), conversion of NDFA to DFA, minimization of automata, regular expressions, applications of regular expressions, Arden’s theorem. Meaning of union, intersection, concatenation and closure, two-way DFA.
- Unit-III Grammars: Types of grammar: context-sensitive grammar, context-free grammar, and regular grammar. Derivation trees, ambiguity in grammar, simplification of context-free grammar, conversion of grammar to automata and vice versa, Chomsky hierarchy of grammars, Chomsky normal form and Greibach normal form.
- Unit-IV Pushdown Automata: Examples of PDA, deterministic and non-deterministic PDAs, context-free grammar, parsing, ambiguity, normal forms of CFGs, CFG to NPDA, NPDA to CFG, equivalence of CFGs and PDAs, Petri net model.
- Unit-V Turing Machine: Turing machine as acceptor, recognizing a language, universal TMs, linear bounded automata, context-sensitive languages, recursive and recursively enumerable languages, unrestricted grammars. Halting problem of Turing machines and the Post correspondence problem, concept of solvability and unsolvability, Church’s thesis, complexity theory: P and NP problems.
Syllabus: CD-502 (Machine Learning)
- UNIT-I Introduction to Machine Learning, Machine Learning life cycle, Types of Machine Learning Systems (supervised and unsupervised learning, batch and online learning, instance-based and model-based learning), scope and limitations, Challenges of Machine Learning, data visualization, hypothesis function and testing, data pre-processing, data augmentation, normalizing data sets, Bias-Variance tradeoff, Relation between AI (Artificial Intelligence), ML (Machine Learning), DL (Deep Learning) and DS (Data Science).
- UNIT-II Clustering in Machine Learning: Types of Clustering Methods: Partitioning Clustering, Distribution Model-Based Clustering, Hierarchical Clustering, Fuzzy Clustering. BIRCH Algorithm, CURE Algorithm. Gaussian Mixture Models and Expectation Maximization. Parameter estimation: MLE, MAP. Applications of Clustering.
- UNIT-III Classification Algorithms: Logistic Regression, Decision Tree Classification, Neural Network, K-Nearest Neighbors (K-NN), Support Vector Machine, Naive Bayes (Gaussian, Multinomial, Bernoulli). Performance Measures: Confusion Matrix, Classification Accuracy, Classification Report: Precision, Recall, F1 score and Support.
- UNIT-IV Ensemble Learning and Random Forest: Introduction to Ensemble Learning, Basic Ensemble Techniques (Max Voting, Averaging, Weighted Average), Voting Classifiers, Bagging and Pasting, Out-of-Bag Evaluation, Random Patches and Random Subspaces, Random Forests (Extra-Trees, Feature Importance), Boosting (AdaBoost, Gradient Boosting), Stacking.
- UNIT-V Dimensionality Reduction: The Curse of Dimensionality, Main Approaches for Dimensionality Reduction (Projection, Manifold Learning). PCA: Preserving the Variance, Principal Components, Projecting Down to d Dimensions, Explained Variance Ratio, Choosing the Right Number of Dimensions, PCA for Compression, Randomized PCA, Incremental PCA. Kernel PCA: Selecting a Kernel and Tuning Hyperparameters. Learning Theory: PAC and VC models.
Syllabus: CD-503 (A) (Data Mining & Warehousing)
- Unit 1: Data Warehousing: Introduction, Delivery Process, Data Warehouse Architecture, Data Preprocessing: Data Cleaning, Data Integration and Transformation, Data Reduction. Data Warehouse Design: Data warehouse schema, Partitioning strategy, Data warehouse implementation, Data Marts, Metadata, Example of a Multidimensional Data Model, Introduction to Pattern Warehousing.
- Unit 2: OLAP Systems: Basic concepts, OLAP queries, Types of OLAP servers, OLAP operations, etc. Data Warehouse Hardware and Operational Design: Security, Backup and Recovery.
- Unit 3: Introduction to Data & Data Mining: Data Types, Quality of data, Data Preprocessing, Similarity measures, Summary statistics, Data distributions, Basic data mining tasks, Data mining vs. knowledge discovery in databases. Issues in Data Mining, Introduction to Fuzzy sets and fuzzy logic.
- Unit 4: Supervised Learning (Classification): Statistical-based algorithms, Distance-based algorithms, Decision tree-based algorithms, Neural network-based algorithms, Rule-based algorithms, Probabilistic Classifiers.
- Unit 5: Clustering & Association Rule Mining: Hierarchical algorithms, Partitional algorithms, Clustering large databases: BIRCH, DBSCAN, CURE algorithms. Association rules: Parallel and distributed algorithms such as the Apriori and FP-growth algorithms.
Syllabus: CD-503 (B) (Pattern Recognition)
- Unit-I Introduction: Definitions, datasets for pattern recognition, application areas and examples of pattern recognition, design principles of a pattern recognition system, classification and clustering, supervised learning, unsupervised learning and adaptation, pattern recognition approaches, decision boundaries, decision regions, metric spaces, distances.
- Unit-II Classification: Introduction, applications of classification, types of classification, decision tree, naïve Bayes, logistic regression, support vector machine, random forest, K-Nearest Neighbour classifier and variants, efficient algorithms for nearest neighbour classification, different approaches to prototype selection, combination of classifiers, training set, test set, standardization and normalization.
- Unit-III Different Paradigms of Pattern Recognition, Representations of Patterns and Classes, Unsupervised Learning & Clustering: Criterion functions for clustering, Clustering Techniques: Iterative square-error partitional clustering (K-means), hierarchical clustering, cluster validation.
- Unit-IV Introduction to feature extraction and feature selection, types of feature extraction, problem statement and uses, Algorithms: Branch and bound algorithm, sequential forward/backward selection algorithms, (l,r) algorithm.
- Unit-V Recent advances in Pattern Recognition, Structural PR, SVMs, FCM, Soft computing and neuro-fuzzy techniques, real-life examples, Histogram rules, Density Estimation, Nearest Neighbor Rule, Fuzzy classification.
Syllabus: CD-503 (C) (Introduction to Toolkits for Data Science)
- Unit 1: Python for Data Science: Review of NumPy, Pandas and scikit-learn. Supervised Learning Techniques, packages/toolkits for regression and classification: Decision Trees, Naive Bayes Classification, Support Vector Machines, Random Forest, Neural Network, Ensemble Methods, Ordinary Least Squares Regression, Logistic Regression, etc. Unsupervised Learning, Clustering: k-means, adaptive hierarchical clustering, Gaussian mixture, Optimization Using Evolutionary Techniques, etc.
- Unit 2: R for Data Science: Basics of R and RStudio. R data structures: vectors, factors, lists, arrays, matrices, and data frames. Working with data: Import data into R and visualize data. Data Analytics Software: Weka, Orange, RapidMiner, Minitab, Power BI, GitHub, Google Colab.
- Unit 3: Introduction to Deep Learning: Basics of TensorFlow and Keras, basics of PyTorch, performing style transfer from one image to another, performing text generation and sentiment analysis with PyTorch. Neural networks that recognize objects, improving the accuracy of object recognition using CNNs, using pre-trained models to build state-of-the-art classifiers, saving and loading models, time series forecasting with RNNs and LSTMs.
- Unit 4: Introduction to Time Series Analysis: Time series regression and exploratory data analysis toolkits: ARMA/ARIMA models, model identification/estimation/linear operators, Fourier analysis, spectral estimation, and state-space models.
- Unit 5: Cloud Computing for Data Science: Implementation of Machine Learning and Deep Learning on the AWS/Azure platforms.