Computational methods for large databases
Computational power and approaches based on bioinformatics tools and algorithms, machine learning (ML), or artificial intelligence (AI) are increasingly being adopted in health care systems. As a result, the possibilities for evidence generation are growing as well.
ML techniques can be categorized into supervised and unsupervised learning approaches.
Machine Learning Main Techniques:
- Supervised Techniques
  - Classification
  - Regression
- Unsupervised Techniques
  - Clustering
  - Association Rules
Supervised and Unsupervised Learning Algorithms
The main difference between the two approaches is whether the outcome of interest is available. In supervised learning, the outcome is given to the algorithm, which can use it as a gold standard for learning how to map the input features to the outcome. In unsupervised learning, the algorithm must learn to model the underlying distribution of the data from the input features alone, with no outcome variable. Supervised methods can be costly and resource intensive, as they may require human expert input to define and prepare the gold standard (i.e. the output label). In contrast, unsupervised methods rely purely on the quantity and quality of the data used for training. They avoid the manual cost and effort of developing a gold standard, but the absence of labels can lead to weaker performance.
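To make the unsupervised setting concrete, the following is a minimal sketch of clustering in plain Python. It assumes a toy one-dimensional k-means; the function name `kmeans_1d`, the initialisation rule, and the `measurements` values are all hypothetical choices made for illustration. The point to notice is that no outcome variable is passed in: the groups are inferred from the input values alone.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Toy 1-D k-means: groups unlabeled values into k clusters."""
    # Start from k evenly spaced points of the sorted data (an arbitrary choice).
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [ordered[round(i * step)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]

    assignments = [
        min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
    ]
    return centroids, assignments


# Hypothetical unlabeled measurements; no outcome variable is supplied.
measurements = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centroids, assignments = kmeans_1d(measurements, k=2)
print(centroids, assignments)
```

On this toy input the first three values end up in one cluster and the last three in the other. Whether such clusters are meaningful still has to be judged separately, since there is no gold standard to compare them against.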
For supervised approaches, the major categories are classification models, which predict a discrete or categorical output variable, and regression models, which predict a continuous numerical output variable. Some classification models are:
(1) Simple logistic regression: uses a logistic function to predict the outcome variable;
(2) Support vector machines: identify an optimal hyperplane (a subspace whose dimension is one less than that of its ambient space) capable of separating the data by outcome class;
(3) Decision trees: predict the value of an outcome by learning decision rules inferred from the training dataset.
Examples of regression algorithms include simple linear regression and random forest regression.
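As an illustration of the first two classifiers, the sketch below shows how a logistic function turns a linear score into a probability, and how the sign of the same score says on which side of a hyperplane a point lies. The weights and bias are hand-picked, hypothetical values rather than parameters learned from data, and the function names are chosen for this example only. A real support vector machine would additionally search for the hyperplane with the largest margin, which is not shown here.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function: maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(weights, bias, features):
    """Signed score w . x + b; the hyperplane is where this equals 0."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_logistic(weights, bias, features, threshold=0.5):
    """Return (class label, probability) from a logistic model."""
    probability = logistic(linear_score(weights, bias, features))
    return (1 if probability >= threshold else 0), probability


def predict_hyperplane(weights, bias, features):
    """Return class 1 on the non-negative side of the hyperplane, else 0."""
    return 1 if linear_score(weights, bias, features) >= 0 else 0


# Hypothetical, hand-picked parameters (not learned from data).
weights, bias = [2.0, -1.0], -0.5

label, probability = predict_logistic(weights, bias, [1.0, 0.5])
side = predict_hyperplane(weights, bias, [0.0, 1.0])
print(label, round(probability, 3), side)
```

With a threshold of 0.5, the logistic prediction and the hyperplane rule agree, because the logistic function returns 0.5 exactly when the linear score is 0.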
Overall, the process in supervised ML encompasses the following steps:
- During training, the model is given both the features and the labels and learns how to map the former to the latter.
- The trained model is then evaluated on a testing set: it is given only the features and makes predictions.
- Then, the predictions are compared with the known labels for the testing set to calculate accuracy.
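The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: a nearest-centroid classifier stands in for whatever model is actually used, and the feature values, the labels "low" and "high", and all function names are hypothetical.

```python
def train_nearest_centroid(features, labels):
    """Training: learn one mean feature vector (centroid) per label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    """Prediction: return the label whose centroid is closest to x."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: squared_distance(model[y]))


def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data, already split into training and testing sets.
train_X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]]
train_y = ["low", "low", "low", "high", "high", "high"]
test_X = [[1.1, 0.9], [3.9, 4.1], [2.0, 2.0], [3.0, 3.2]]
test_y = ["low", "high", "low", "low"]

# Step 1: training uses both the features and the labels.
model = train_nearest_centroid(train_X, train_y)
# Step 2: the trained model sees only the test features.
predictions = [predict(model, x) for x in test_X]
# Step 3: predictions are compared with the known test labels.
test_accuracy = accuracy(predictions, test_y)
print(predictions, test_accuracy)
```

Here the last test point is predicted as "high" although its known label is "low", so the accuracy on this toy testing set is 0.75. Keeping the testing labels out of the training step is what makes this figure an estimate of performance on unseen data.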