This thesis explores the theoretical foundations of distributed,machine learning with practical considerations. Distributed learning,systems are able to handle data sets that cannot be processed on a,single machine, and utilize parallel computing resources to speed up,the learning process. However, it also brings unique challenges due to,the characteristics of modern data sets and distributed computing,infrastructure: on one hand, machines are required to being able to,extract meaningful parsimonious structures from the large-scale,,high-dimensional data; on the other hand, the heterogeneity in,distributed data sets enforce us to consider flexible models that are,adaptive to each local machine; last but not least, the overall,effectiveness in distributed learning systems depends on the,efficiency of learning algorithms on multiple resources: computing,,communication, sample and memory, and we need to design algorithms,that balance multiple efficiency constraints.,In this thesis, we considered distributed machine learning under both,homogeneous and coupled setting. In the homogeneous setting, each,machine has access to an independent local data set drawn from the,same source distribution, while in the coupled scenario, the local,data sets might drawn from different distributions and the goal is to,extract the common structure through distributed learning. In this,thesis, we study the trade-offs between sample complexity,,computational cost, communication and memory efficiency in both,settings; we propose novel methods that effectively leverage the,similarity/relatedness structure between machines for several,distributed learning problems, with improved theoretical guarantees.,We also examine the practical performance of the proposed approaches,with existing methods via numerical experiments.