Abstract
Machine learning has established itself as a breakthrough technology in a number of domains, such as computer vision and natural language processing, matching and sometimes exceeding human-level performance on certain supervised learning tasks. A major factor driving these success stories is the effort undertaken in designing artificial neural network architectures that are particularly well equipped to handle specialized tasks. For example, convolutional neural networks (CNNs) were designed to exploit spatial locality and other properties assumed in images, whereas recurrent neural networks and attention mechanisms are used with sequential data. However, the majority of available datasets do not exhibit such special structure and are presented in plain tabular form; on tabular datasets, the most commonly used artificial neural network is the multi-layer perceptron (MLP).
Just as choosing the right learner architecture can be considered part of the training process and is essential for good performance, we believe that additional effort on the data representation side can further improve the learning outcome. The underlying assumption when using artificial neural networks is that representation learning happens automatically from raw data, unlike classical machine learning methods, which are still widely applied to tabular datasets and are often accompanied by feature selection and transformation techniques. We hypothesize that certain feature transformations can enhance the representation learning process and the subsequent machine learning outcome achieved by artificial neural networks, and we propose three feature transformation techniques. The key principle underlying these transformations is to highlight the entities and relationships described by the data. The first transformation leverages domain knowledge by manually designing visual representations of the features, to be used by a two-dimensional CNN. The second transformation stays within the tabular space and generates a partitioning of the input feature vector such that each partition represents an entity or a relationship, to be used by a modular MLP (an MLP with multiple input layers). The third transformation generates a permutation of the feature vector in which related features are neighbors, to be used with a one-dimensional CNN that captures the implicit feature groups. We provide empirical evidence suggesting that such transformations yield better results than baseline MLPs and existing methods, while requiring less time, less data, and fewer parameters.
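To make the second and third transformations concrete, the following is a minimal sketch assuming a PyTorch-style implementation; the class names, partition sizes, layer widths, and kernel size are illustrative placeholders rather than the configurations used in this work.

```python
# Minimal sketch of the two tabular transformations described above (assumed
# PyTorch implementation; all sizes below are hypothetical).
import torch
import torch.nn as nn


class ModularMLP(nn.Module):
    """MLP with one input branch per feature partition (entity or relationship)."""

    def __init__(self, partition_sizes, hidden_dim=16, num_classes=2):
        super().__init__()
        # One small input layer per partition of the feature vector.
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(p, hidden_dim), nn.ReLU()) for p in partition_sizes]
        )
        self.head = nn.Linear(hidden_dim * len(partition_sizes), num_classes)

    def forward(self, x_parts):
        # x_parts: list of tensors, one per partition, each of shape (batch, p_i).
        encoded = [branch(x) for branch, x in zip(self.branches, x_parts)]
        return self.head(torch.cat(encoded, dim=1))


class PermutedConv1dNet(nn.Module):
    """1-D CNN applied to a feature vector permuted so related features are adjacent."""

    def __init__(self, permutation, num_classes=2):
        super().__init__()
        self.register_buffer("permutation", torch.tensor(permutation, dtype=torch.long))
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(8, num_classes)

    def forward(self, x):
        # x: (batch, num_features); reorder columns, then add a channel dimension.
        x = x[:, self.permutation].unsqueeze(1)
        return self.head(self.conv(x).squeeze(-1))


if __name__ == "__main__":
    batch = torch.randn(4, 10)                       # 10 raw tabular features
    parts = torch.split(batch, [4, 3, 3], dim=1)     # hypothetical partitioning
    print(ModularMLP([4, 3, 3])(list(parts)).shape)  # -> (4, 2)
    perm = list(range(10))                           # hypothetical permutation
    print(PermutedConv1dNet(perm)(batch).shape)      # -> (4, 2)
```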
We propose a method to automate the transformation process and evaluate it empirically on synthetic and real-world datasets. The synthetic datasets are designed to allow different levels of representation of the underlying entities and relationships, providing additional insight into the learning process. The real-world datasets are drawn from experiments reported for other approaches that propose automatic feature transformation techniques to enhance deep learning performance. Our results show a clear advantage over these approaches, not only in the learning outcome and the resources required, but also in the simplicity of the method.