We present several approaches to learning generative models for high-dimensional data. Estimating the underlying distribution of a given dataset is one of the most fundamental problems in statistics: once a distribution estimate is obtained, we can in principle derive any population statistic. In particular, we can synthesize data, impute missing values, detect outliers, denoise observations, and perform classification. High-dimensional distribution estimation is challenging due to the curse of dimensionality: without appropriate model assumptions, the number of samples required for accurate estimation grows exponentially with the data dimension. The main theme of this dissertation is to decompose the estimation problem into simpler subproblems. Specifically, we consider mixture models, in which the data points are divided into clusters (i.e., the data matrix is split row-wise) and a separate model is learned for each cluster. We also use coarse-to-fine strategies that start from a vague description of the data and successively refine it. Another modeling technique we study is the partitioning of the variables into subsets (i.e., the data matrix is split column-wise), so that separate models can be learned for these smaller subsets of variables. A related approach is that of compositional models, in which multiple local data models are combined into a global data model according to a specified composition rule. A further model family we explore is that of autoregressive models, which exploit the fact that any high-dimensional distribution can be decomposed into a product of univariate conditional distributions.
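To make the row-wise (mixture) decomposition concrete, the following is a minimal sketch, not one of the dissertation's models: a two-component Gaussian mixture in which each cluster has its own simple density, and a sample is drawn by first picking a cluster and then drawing from that cluster's model. The component weights, means, and standard deviations are illustrative assumptions.

```python
import numpy as np

# Hypothetical two-component Gaussian mixture (illustrative parameters):
# p(x) = pi_0 * N(x; -2, 1) + pi_1 * N(x; 3, 0.5)
rng = np.random.default_rng(1)
weights = np.array([0.4, 0.6])          # mixture weights pi_k
means = np.array([-2.0, 3.0])           # per-cluster means mu_k
stds = np.array([1.0, 0.5])             # per-cluster standard deviations

# Sampling: first pick a cluster index z, then draw from that cluster's model
z = rng.choice(2, size=10_000, p=weights)
x = rng.normal(means[z], stds[z])

# Sanity check: the sample mean approaches the mixture mean sum_k pi_k * mu_k
assert abs(x.mean() - weights @ means) < 0.1
```

Learning such a model from data would additionally require inferring the cluster assignments (e.g., via expectation-maximization), which is exactly the "simpler subproblems" decomposition described above.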
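The autoregressive factorization mentioned above is the chain rule of probability, p(x_1, ..., x_d) = p(x_1) p(x_2 | x_1) ... p(x_d | x_1, ..., x_{d-1}). The following small sketch (an illustrative check, not part of the dissertation) verifies this identity numerically for a random discrete joint distribution over three variables.

```python
import numpy as np

# Random joint distribution p(x1, x2, x3) over a small discrete space
rng = np.random.default_rng(0)
joint = rng.random((2, 3, 4))
joint /= joint.sum()  # normalize so entries sum to 1

# Marginals and conditionals computed from the joint
p1 = joint.sum(axis=(1, 2))             # p(x1)
p12 = joint.sum(axis=2)                 # p(x1, x2)
cond2 = p12 / p1[:, None]               # p(x2 | x1)
cond3 = joint / p12[:, :, None]         # p(x3 | x1, x2)

# Reassemble the joint as a product of univariate conditionals
reconstructed = p1[:, None, None] * cond2[:, :, None] * cond3
assert np.allclose(reconstructed, joint)
```

An autoregressive model exploits this by learning each univariate conditional separately, turning one d-dimensional estimation problem into d one-dimensional ones.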