# Computational methods for large databases

Computational power and approaches based on bioinformatics tools and algorithms, machine learning (ML) or artificial intelligence (AI) are gaining access to the health care systems. Likewise, our possibilities for evidence generation are growing.

ML techniques can be categorized into supervised and unsupervised learning approaches.

**Machine Learning Main Techniques:**

**Supervised Techniques**- Classification

- Regression

**Unsupervised Techniques**- Clustering

- Association Rules

**Supervised and Unsupervised Learning
Algorithms **

The availability of the outcome of interest is the big difference between both approaches. In fact, while in supervised learning algorithms the outcome is given to the algorithm – which can use it as gold standard to convert the input features into the outcome – in unsupervised learning approaches are methods where an algorithm must learn to model the underlying distribution of data elements given input features, but no outcome variable. Supervised methods can be costly and resource intense as they may require human expert input for defining and preparing a gold standard (i.e. the output label). In contrast, unsupervised methods rely purely on the quantity and quality of data for the training process - do not require the manual cost and effort required to develop a gold standard, which can lead to weaker performance.

For supervised approaches, Classification (which can predict a discrete or categorical output variable) and Regression (predicts a numerical continuous output variable) models are the major categories. Some classification models are

(1) **Simple logistic** – uses a logistic function
which is used to predict the outcome variable;

(2) **Support vector machines** - identifies an
optimal hyperplane (a subspace whose dimension is -1 of its ambient space)
capable of separating data into each outcome;

(3) **Decision trees** – generally
predicts the value of an outcome by learning decision rules inferred from the
training dataset. Among examples of Regression Algorithms are simple logistic
regression and random forest regression.

Overall, the process in supervised ML encompasses the following steps:

- During training, the model is given both the features and the labels and learns how to map the former to the latter.
- A trained model is evaluated on a testing set, where we only give it the features, and it makes predictions.
- Then, the predictions are compared with the known labels for the testing set to calculate accuracy.