Data Science and Machine Learning (DAMA) – MODULES

DAMA501 Linear Algebra and Calculus

Module Code: DAMA501

ECTS Credit Points: 15

Module Type: Compulsory/Elective

Semester in which it is offered: 1st/3rd semester

Language: English

Module Outline

Purpose: Students will learn the basic mathematical tools necessary for Machine Learning (ML). These include basic concepts from linear algebra, such as vectors, matrices and the operations on them. From calculus, students will be introduced to functions of several real variables and to the gradient and the directional derivative, which are applied in backpropagation algorithms. Overall, students without prior knowledge of these mathematical areas will be able to build the background needed to understand ML techniques, while students with prior mathematical knowledge will be able to go much deeper into the application of mathematics in ML. The mathematical study will be supplemented by computational software that enables both analytical and numerical evaluations.

The key subjects of the module are “Linear Algebra” and “Calculus”.

Learning Outcomes:

Knowledge:

Upon successful completion of the Module, students will be able to:

Recognize that linear algebra and vector calculus are basic mathematical pillars of machine learning.
Summarize basic notions of vector spaces.
Outline the concepts of norm of a vector and of inner product between two vectors.
Explain what an orthonormal basis is in a vector space and describe the orthogonal complement of a subspace of the vector space.
Recall the definition of the trace and the determinant of a matrix.
Explain the concepts of eigenvalues and eigenvectors of square matrices.
Outline the concept of the gradient of a function of many variables and describe its geometric significance.
Summarize the gradient of matrices and its geometric significance.
Summarize the concept of backpropagation.

Skills:

Upon successful completion of the Module, students will be able to:

Carry out core vector–matrix operations—addition, multiplication, transposition, inversion, trace and determinant—both analytically and with computational tools such as SageMath/NumPy.
Compute and interpret norms, inner products and distances in ℝⁿ vector spaces, using these measures to assess similarity and orthogonality in data representations.
Solve systems of linear equations and perform matrix factorizations to support dimensionality-reduction, stability analysis and numerical optimisation workflows.
Evaluate eigenvalues and eigenvectors.
Derive and implement back-propagation updates for simple feed-forward neural networks, translating analytical derivatives into executable code.
Apply change-of-basis and coordinate-transformation techniques (orthogonal/orthonormal, diagonalisation) to simplify problem formulations and reveal latent structure in datasets.
Leverage computational mathematics environments (e.g., SageMath) to experiment with vector fields, level-set visualisations and optimisation trajectories, validating analytical results numerically.
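
As a rough illustration of the kind of computation these skills refer to, the sketch below evaluates a norm, an inner product, the trace, the determinant and the eigen-decomposition of a small symmetric matrix with NumPy. It is not taken from the module's materials; the matrix `A` and the vectors `u` and `v` are arbitrary values chosen for the example.

```python
import numpy as np

# Example data (assumed for illustration only).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # symmetric 2x2 matrix
u = np.array([3.0, 4.0])
v = np.array([-4.0, 3.0])

norm_u = np.linalg.norm(u)        # Euclidean norm of u
inner_uv = float(u @ v)           # inner product; zero means u and v are orthogonal
trace_A = np.trace(A)             # sum of the diagonal entries
det_A = np.linalg.det(A)          # determinant

# eigh is intended for symmetric matrices: it returns the eigenvalues in
# ascending order and an orthonormal set of eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Change of basis: in the eigenvector basis, A becomes diagonal.
A_diagonal = eigenvectors.T @ A @ eigenvectors
```

For a symmetric matrix the trace equals the sum of the eigenvalues and the determinant equals their product, which gives a quick numerical check of an analytical result.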

Competences:

Upon successful completion of the Module, students will be able to:

Apply core mathematical tools (e.g., linear algebra, calculus) to analyze and interpret machine learning models.
Select and apply appropriate matrix decomposition techniques in practical data scenarios.
Use vector space concepts (orthogonality, inner products, basis changes) in interpreting and simplifying machine learning problems.
Evaluate the significance of gradients and backpropagation in optimizing learning algorithms.
Use computational tools (SageMath) autonomously to explore mathematical properties relevant to machine learning.

Evaluation: Written assignments completed during the academic semester constitute 30 percent of each student’s grade, provided that a pass is obtained in the final or resit examination. The final exam grade constitutes 70 percent of the student’s final course grade. For further information, please go to the HOU Study Guide.

Prerequisites: There are no prerequisites for this module.

Teaching Method: Distance learning via the HOU Distance Learning Platform, together with Group Counseling Meetings (tele-OSS).

DAMA502 Statistics and Optimization

Module Code: DAMA502

ECTS Credit Points: 15

Module Type: Compulsory/Elective

Semester in which it is offered: 1st/2nd/3rd semester

Language: English

Module Outline

Purpose: Students will learn the basic mathematical tools necessary for Machine Learning (ML). These include basic concepts from probability theory, introductory statistics and convex optimization. Students will also learn basic visualization techniques for 1D and 2D data. Overall, students without prior knowledge of these mathematical areas will be able to build the background needed to understand ML techniques, while students with prior mathematical knowledge will be able to go much deeper into the application of mathematics in ML. The mathematical study will be supplemented by computational software that enables both analytical and numerical evaluations.

The key subjects of the module are “Probability Theory and Statistics”, “Convex Optimization” and “Visualization”.

Learning Outcomes:

Knowledge:

Upon successful completion of the Module, students will be able to:

Recognize that basic mathematical pillars for machine learning are probability theory, statistics and optimization and apply analytical and computational tools.
Summarize the properties of univariate and multivariate Gaussian distributions, find marginals and conditionals as well as transformations of the Gaussian function.
Describe the Bernoulli and binomial distributions and detail the Beta distribution.
Summarize the conjugate priors connected through Bayes’ theorem.
Explain what a sufficient statistic is and outline the exponential family of distributions.
Perform a change of random variables and find the new distribution function.
List basic statistical analysis techniques.
Perform hypothesis testing.
Identify outliers.
Recall how to find minima of a function of a single variable.
Summarize the procedure to find the minimum of a multivariate function using the gradient descent algorithm.
Explain how to perform stochastic gradient descent and what its advantages and limitations are compared to the gradient descent method.
Describe what Lagrange multipliers are and explain how they are used in constrained optimization.
Describe convex optimization.
Use SageMath to find the minimum of a multivariate function and to minimize a function with constraints.
Evaluate the performance of a model.
Acquire knowledge on how to extract useful information from data visualization.
Understand how to visualize 1D and 2D data using binning, density plots, scatter plots, and box plots.
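
To make the gradient descent outcome above concrete, here is a minimal sketch of the algorithm applied to a simple convex function of two variables. It is written in Python with NumPy rather than SageMath, and the objective `f`, the starting point, the learning rate and the number of steps are assumptions chosen only for illustration.

```python
import numpy as np

def f(p):
    """Convex example objective with its minimum at (1, -2)."""
    x, y = p
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2

def grad_f(p):
    """Analytical gradient of f."""
    x, y = p
    return np.array([2.0 * (x - 1.0), 4.0 * (y + 2.0)])

def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient with a fixed learning rate."""
    p = np.asarray(start, dtype=float)
    for _ in range(steps):
        p = p - learning_rate * grad(p)
    return p

minimum = gradient_descent(grad_f, start=[5.0, 5.0])
```

Stochastic gradient descent follows the same update rule but replaces the exact gradient with an estimate computed from a random subset of the data, which makes each step cheaper at the cost of a noisier trajectory.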

Skills:

Upon successful completion of the Module, students will be able to:

Perform change of variables in probability distributions and derive the resulting distribution functions.
Conduct hypothesis testing using standard statistical procedures.
Detect and interpret outliers in datasets.
Use SageMath to minimize multivariate functions and solve constrained optimization problems.
Evaluate the performance of machine learning models using optimization and statistical metrics.
Generate appropriate visualizations (e.g., histograms, density plots, scatter plots) for 1D and 2D data using appropriate tools.

Competences:

Upon successful completion of the Module, students will be able to:

Select and apply appropriate probabilistic, statistical, and optimization methods to analyze and solve problems in machine learning.
Combine statistical reasoning with computational tools to interpret real-world data and make informed decisions.
Critically assess the suitability of optimization algorithms (e.g., SGD vs. GD) for training specific machine learning models.
Solve practical optimization problems using constrained methods (e.g., Lagrange multipliers) within a machine learning context.
Interpret and communicate findings from data visualizations to support data-driven conclusions.
Work autonomously with mathematical software tools (e.g., SageMath) to analyze data and validate learning algorithms.

Prerequisites: There are no prerequisites for this module.

Teaching Method: Distance learning via the HOU Distance Learning Platform, together with Group Counseling Meetings (tele-OSS).

DAMA503: Programming, Databases and Algorithms

Module Code: DAMA503

ECTS Credit Points: 15

Module Type: Compulsory

Semester in which it is offered: 1st semester

Language: English

Module Outline

Purpose: The goal of this module is to help students comprehend foundational concepts to prepare them appropriately for specialized knowledge on data science and machine learning in subsequent modules. It attempts to function as a bridge between introductory and more advanced data science courses of the program.

The module starts with a presentation of fundamental algorithms and data structures and their role in the context of data science tasks. Algorithms (searching, sorting, recursion, graph algorithms) and data structures (stacks, queues, linked lists, trees, hash tables, sparse matrices) will be presented in terms of their complexity. Next, database systems will be discussed to show students how to understand, query and manipulate structured data (tables, keys, normalization, SQL). This section will also cover NoSQL databases. The module continues by offering practical skills for collaborative and maintainable coding in data science projects that make use of databases, data structures and algorithms. The module concludes with practical examples of Data Science applications that aim to function as a bridge to more advanced concepts.

Learning Outcomes:

Knowledge:

Upon successful completion of the Module, students will be able to:

Understand and explain the role basic data structures play in data science tasks.

Skills:

Upon successful completion of the Module, students will be able to:

Perform analysis of data structures in terms of their time and space complexity.
Use the appropriate data structures and algorithms to address data science tasks.
Apply recursion in the proper contexts.
Design databases to store structured data.
Filter data from structured databases using SQL.
Handle unstructured data using NoSQL databases.
Use version control systems to develop data science projects.
Integrate software management tools into the development process.

Competences:

Upon successful completion of the Module, students will be able to:

Evaluate different data structures and algorithms for data science tasks.
Evaluate data stores in terms of the ability to store structured and unstructured data.
Evaluate software development tools.

Prerequisites: There are no prerequisites for this module.

Teaching Method: Distance learning via the HOU Distance Learning Platform, together with Group Counseling Meetings (tele-OSS).

DAMA510: Machine Learning

Module Code: DAMA510

ECTS Credit Points: 15

Module Type: Compulsory

Semester in which it is offered: 2nd semester

Language: English

Module Outline

Purpose: Students will acquire a background in the algorithmic aspects and the computational requirements of key data science and machine learning approaches. They will learn fundamental concepts and principles that underlie the techniques for extracting knowledge from data. They will also become acquainted with a number of practical considerations regarding the analysis and interpretation of data, the assessment of the quality of the input data and the derivation of insights from the results of mining the data. After completing this module, they will be able to apply theory and use languages, algorithms and tools to solve real-world problems, and to interpret and communicate findings to any kind of audience.

Subjects covered:

Data preprocessing, Feature engineering, Outlier detection, Dimensionality reduction, Clustering, Frequent itemsets, Association rules, Decision Trees, Regression, Support vector machines, Neural networks.

Learning Outcomes:

Knowledge:

Upon successful completion of the Module, students will be able to:

Explain the key phases of the data science process and the role of the data scientist.

Skills:

Upon successful completion of the Module, students will be able to:

Assess the quality and characteristics of input data, such as data types, missing values and outliers.
Apply data preprocessing techniques, such as data cleaning, transformation, and feature scaling, using appropriate tools and languages.
Perform dimensionality reduction to reduce complexity in high-dimensional data.
Compute similarity and distance measures for numerical and categorical attributes.
Apply clustering algorithms to discover groupings in unlabeled data.
Apply frequent itemset mining and association rule learning to extract patterns from transactional data.
Apply regression and classification models to labeled datasets.
Analyze dataset characteristics and prepare data for supervised learning by handling imbalance, selecting attributes, and encoding variables.
Select and engineer relevant features using dimensionality reduction and feature selection techniques to improve predictive accuracy.
Apply learning based on Support Vector Machines and neural networks to classification tasks, and analyze their performance and behavior.

Competences:

Upon successful completion of the Module, students will be able to:

Communicate insights and results effectively, using appropriate visualization tools.
Evaluate the suitability of clustering paradigms for different problems.
Evaluate association rule models based on validation metrics and domain relevance.
Evaluate model performance using a variety of metrics.

Prerequisites: There are no prerequisites for this module.

Teaching Method: Distance learning via the HOU Distance Learning Platform, together with Group Counseling Meetings (tele-OSS).

DAMA600 Mining of Massive Datasets

Module Code: DAMA600

ECTS Credit Points: 15

Module Type: Compulsory

Semester in which it is offered: 2nd/3rd semester

Language: English

Module Outline

Purpose: This module equips students with specialized knowledge in mining and analyzing massive datasets, focusing on scalable algorithms and big data frameworks. Unlike traditional data science courses, it emphasizes techniques designed to handle data that exceeds the capacity of main memory and must be processed using distributed systems. Students will explore the architecture and principles of systems like MapReduce and Spark, which support large-scale data processing. They will learn methods for efficient similarity search, including minhashing and locality-sensitive hashing, tailored to high-dimensional data. The course covers algorithms for mining frequent patterns and association rules at scale, going beyond conventional in-memory approaches.

In the context of streaming data, students will understand models and techniques for real-time processing, such as sketches and approximate counting. A strong emphasis is placed on mining structured data like graphs, where students will study PageRank, HITS, community detection, and triangle counting—especially relevant in web and social network analysis. The course also introduces scalable recommendation systems using collaborative filtering and matrix factorization methods. Techniques for dimensionality reduction, such as CUR decompositions and random projections, are discussed with an emphasis on their scalability and suitability for large datasets. Machine learning content focuses on the efficient implementation of classification and clustering algorithms for massive datasets.

Students will also examine how to design algorithms under resource constraints, and how trade-offs in approximation, speed, and accuracy are managed at scale. Throughout the course, theoretical foundations are paired with practical assignments involving large datasets and distributed environments. Unlike DAMA510, which focuses on statistical models and introductory machine learning, this course prioritizes the engineering and algorithmic challenges of working with truly massive data. By the end of the module, students will be capable of designing, implementing, and evaluating scalable data mining pipelines using contemporary frameworks.

Learning Outcomes:

Knowledge:

Upon successful completion of the Module, students will be able to:

Describe the challenges of mining large-scale datasets and discuss the relevant computing architectures.
Define and apply similarity measures, and use techniques such as shingling and minhashing for data comparison.
Design locality-sensitive hashing (LSH) schemes to perform efficient similarity search.
Use and evaluate scalable algorithms for frequent itemset mining.
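
The shingling and minhashing outcome above can be sketched as follows. This is a small single-machine illustration, not the module's reference implementation: the function names, the shingle length `k`, the number of hash functions and the linear hash family built on `zlib.crc32` are all assumptions made for the example.

```python
import random
import zlib

def shingles(text, k=3):
    """Set of all length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)

def minhash_signature(items, num_hashes=128, seed=0):
    """Minhash signature of a non-empty set of strings.

    Each hash function has the form (a * x + b) mod prime, where x is a
    stable integer id derived from the shingle via CRC32.
    """
    rng = random.Random(seed)
    prime = (1 << 61) - 1
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(num_hashes)]
    ids = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % prime for x in ids) for a, b in params]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

doc_a = shingles("mining of massive datasets")
doc_b = shingles("mining massive data sets")
exact = jaccard(doc_a, doc_b)
estimate = estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
```

The signatures have a fixed length regardless of document size, so comparing them is much cheaper than comparing the full shingle sets. Locality-sensitive hashing then groups signatures into bands so that only likely-similar pairs are compared at all.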

Skills:

Upon successful completion of the Module, students will be able to:

Analyze and model large-scale graph data using algorithms like PageRank and community detection.
Implement data stream processing algorithms using sampling and sketching methods (e.g., Count-Min Sketch).
Apply clustering techniques adapted for large datasets, including K-means and hierarchical clustering methods.
Develop scalable recommender systems using collaborative filtering and matrix factorization.

Competences:

Upon successful completion of the Module, students will be able to:

Understand dimensionality reduction methods including SVD and CUR decompositions.
Describe and implement scalable classification methods for large data (e.g., decision trees, naïve Bayes).
Apply machine learning algorithms in a distributed computing framework like MapReduce.
Evaluate the efficiency, scalability, and applicability of massive data mining techniques in real-world scenarios.

Prerequisites: There are no prerequisites for this module.

Teaching Method: Distance learning via the HOU Distance Learning Platform, together with Group Counseling Meetings (tele-OSS).

DAMA610 Deep Learning

Module Code: DAMA610

ECTS Credit Points: 15

Module Type: Compulsory

Semester in which it is offered: 3rd semester

Language: English

Module Outline

Purpose: Students will be able to implement deep machine learning methods in Jupyter notebooks, use Scikit-Learn, TensorFlow/Keras and PyTorch, and write and execute Python code. Students are expected to be familiar with linear and nonlinear regression and support vector machines, and to be able to perform model regularization and implement decision trees and ensemble learning in the form of random forests. They are also expected to know how to perform dimensionality reduction and use principal component analysis. The module focuses on neural network methods and deep learning, including fully connected deep networks, convolutional neural networks, pre-trained models, large language models, autoencoders and generative models. Recurrent neural networks, physics-informed neural networks and restricted Boltzmann machines complete the material of the module. DAMA610 builds heavily on DAMA510, and after its completion students will be able to apply the mathematical tools acquired in the latter to real-world data problems.

Learning Outcomes:

Knowledge:

Upon successful completion of the Module, students will be able to:

Understand the concepts of Supervised, Unsupervised, Self-Supervised and Reinforcement Learning.
Understand Transfer Learning and utilize pre-trained models for relevant tasks.
Understand the concept of Generative Models.

Skills:

Upon successful completion of the Module, students will be able to:

Implement machine learning and deep learning models and perform hyperparameter tuning, using Jupyter notebooks, Scikit-learn, TensorFlow/Keras, or PyTorch.
Use Recurrent Neural Networks (RNNs) and evaluate their effectiveness.
Apply Convolutional Neural Networks (CNNs) to specific data sets and describe their structure.
Use Large Language Models (LLMs) for a collection of language tasks.
Perform self-supervised learning by implementing autoencoders.
Generate images with Generative Adversarial Networks (GANs) and assess the output quality.
Understand the concept of Diffusion Models and identify emerging use cases.
Apply reinforcement learning to specific problems.
Use Physics-Informed Neural Networks (PINNs) in scientific applications.
Describe Boltzmann and restricted Boltzmann machines and their role in deep learning.

Competences:

Upon successful completion of the Module, students will be able to:

Evaluate the effectiveness of Artificial Neural Networks in different contexts.
Assess the quality of outputs generated by GANs.
Utilize pre-trained models for relevant tasks using Transfer Learning.

Prerequisites: There are no prerequisites for this module.

Teaching Method: Distance learning via the HOU Distance Learning Platform, together with Group Counseling Meetings (tele-OSS).

DAMA700: Applied Research and Development: Systems Practicum

Module Code: DAMA700

ECTS Credit Points: 15

Module Type: Elective

Semester in which it is offered: 3rd semester

Language: English

Module Outline

Purpose: This module offers students a hands-on opportunity to design, develop, and evaluate intelligent systems in a real-world or research-driven context. Building on prior knowledge of machine learning and deep learning, students will undertake a guided project that emphasizes applied research and systems integration. Projects may involve real datasets, interdisciplinary components, or collaboration with academic labs or industry partners. The module focuses on solving open-ended problems using advanced computational methods, system-level thinking, and iterative experimentation. Students will be expected to document their development process, evaluate their system’s performance, and communicate outcomes effectively. Through this practicum, learners gain critical experience in bridging the gap between theory and practice, managing uncertainty, and delivering functional, research-informed solutions. This module is ideal for students preparing for roles in applied AI, system prototyping, or R&D-focused careers.

Learning Outcomes:

Knowledge:

Upon successful completion of the Module, students will be able to:

Describe the complete applied-research lifecycle—from problem formulation and literature review to experimental design, implementation, evaluation and dissemination—within the context of intelligent systems.
Explain the architecture of end-to-end machine-learning systems, including data ingestion, preprocessing, model training, validation, deployment and post-deployment monitoring.
Explain techniques for systematic hyper-parameter tuning, model selection and benchmarking under computational, time and resource constraints.

Skills:

Upon successful completion of the Module, students will be able to:

Implement a working prototype of a machine learning pipeline using appropriate tools, libraries, and platforms.
Evaluate system performance using defined metrics and identify areas for refinement.
Communicate technical work clearly through structured reports, code documentation, and presentations.

Competences:

Upon successful completion of the Module, students will be able to:

Analyze a domain-specific problem and formulate system-level research or development objectives.
Design a solution architecture that integrates machine learning or deep learning components.
Collaborate effectively in a team environment, integrating contributions and managing responsibilities.

Prerequisites: There are no prerequisites for this module.

Teaching Method: Distance learning via the HOU Distance Learning Platform, together with Group Counseling Meetings (tele-OSS).