Abstract
Knowledge Graph Embedding algorithms learn low-dimensional vector representations for facts in a Knowledge Graph (where each fact is a relation edge between head and tail nodes). Learned embeddings are increasingly adapted for downstream tasks such as question answering, statement classification, recommendation engines, and even auto-completion of incomplete Knowledge Graphs. In the real world, training datasets of facts (in domains like news events, public records, or user actions) can be inaccurate due to sampling issues, bias, malicious intent, or noise, making it important to design for robustness to corrupted data. This paper investigates how two popular Knowledge Graph Embedding algorithms (TransE, DistMult) respond to deliberate corruption of training data, using the Freebase15K-237 dataset. A novel true/false triplet classification evaluation demonstrates the utility of learned embeddings as features for truth prediction, showing surprising robustness to corrupted data. Negative sampling is used to operationalize different types of common-sense reasoning in both learning (model training) and testing (model evaluation), demonstrating the usefulness of naive negative sampling strategies, contrary to prevailing intuitions in the literature. These results frame a social science interpretation of the Knowledge Graph Embedding training process, with implications for designing for robustness against inaccuracies in Knowledge Graph data.
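For readers unfamiliar with the terms above, the following is a minimal sketch (not the paper's implementation) of the TransE and DistMult scoring functions and of "naive" negative sampling by uniform head/tail corruption; the entity counts, dimensions, and toy triple are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def transe_score(h, r, t):
    # TransE: a true triple should satisfy h + r ~ t, so a smaller
    # translation distance ||h + r - t|| means a more plausible fact.
    return -np.linalg.norm(h + r - t, ord=1)

def distmult_score(h, r, t):
    # DistMult: a trilinear product <h, r, t>; larger means more plausible.
    return np.sum(h * r * t)

def naive_negative(triple, num_entities):
    # Corrupt a true (head, relation, tail) triple by replacing the head
    # or the tail with a uniformly random entity id. No filtering is done,
    # so a "negative" can occasionally be a true fact -- the naive
    # strategy the abstract refers to.
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.integers(num_entities), r, t)
    return (h, r, rng.integers(num_entities))

# Toy usage with random embeddings (sizes are arbitrary assumptions):
dim, num_entities, num_relations = 50, 1000, 237
E = rng.normal(size=(num_entities, dim))   # entity embeddings
R = rng.normal(size=(num_relations, dim))  # relation embeddings

pos = (3, 7, 42)                           # a hypothetical true triple
neg = naive_negative(pos, num_entities)    # its corrupted counterpart
print(transe_score(E[pos[0]], R[pos[1]], E[pos[2]]),
      transe_score(E[neg[0]], R[neg[1]], E[neg[2]]))
```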