Abstract

Dimension reduction methods reduce the complexity of a data space, extract meaningful information, and lower the demand for computational resources, which makes them useful tools in research. This thesis compares four dimension reduction methods on common social science datasets and then evaluates Principal Component Analysis (PCA) against feature selection measures. In the first part, I compare PCA, t-SNE, UMAP, and an autoencoder on the Human Development Indicator (HDI) dataset through visualization and clustering, and show that PCA tends to be more appropriate for this type of dataset. In the second part, I compare PCA with feature selection measures (L1, L2, and tree or random forest selection) on the HDI, IMDB movie review, and image datasets, all of which are representative of common social science datasets. The measures performed differently depending on the characteristics of each dataset.
