Statistical estimation problems in multivariate analysis and machine learning often seek linear relations among variables. This translates to finding an affine subspace from the sample data set that, in an appropriate sense, either best represents the data set or best separates it into components. In other words, statistical estimation problems are optimization problems on the affine Grassmannian, a noncompact smooth manifold that parameterizes all affine subspaces of a fixed dimension. The affine Grassmannian is a natural generalization of Euclidean space, whose points are the $0$-dimensional affine subspaces. The main objective of the first part of this work is to show that, like Euclidean space, the affine Grassmannian can serve as a concrete computational platform for data analytic problems: points on the affine Grassmannian can be concretely represented and readily manipulated; distances, metrics, probability densities, geodesics, exponential maps, parallel transports, etc., all have closed-form expressions that can be easily calculated; and optimization algorithms, including steepest descent, Newton, and conjugate gradient methods, have efficient affine Grassmannian analogues that use only standard numerical linear algebra.

We then extend the framework to nested sequences of linear subspaces, which represent the variables in different regimes. Exploring the multiscale representation of the data revealed by these problems requires a systematic study of such nested sequences of linear subspaces, which form a compact Riemannian manifold called the flag manifold.
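As a concrete illustration of the claim that distances on the affine Grassmannian reduce to standard numerical linear algebra, the following sketch computes a distance between two affine subspaces by embedding each $k$-dimensional affine subspace $\mathrm{span}(A) + b$ of $\mathbb{R}^n$ as a $(k+1)$-plane in $\mathbb{R}^{n+1}$ and measuring principal angles. This is a hedged sketch under that common embedding; the function names are illustrative, not notation from this work.

```python
import numpy as np

def embed(A, b):
    """Embed the affine subspace span(A) + b, with A an n-by-k orthonormal
    basis and b a displacement vector, as an orthonormal (n+1)-by-(k+1)
    matrix, i.e. a point on the Grassmannian of (k+1)-planes in R^(n+1)."""
    b0 = b - A @ (A.T @ b)                 # displacement orthogonal to span(A)
    s = np.sqrt(1.0 + b0 @ b0)
    top = np.column_stack([A, b0 / s])
    bottom = np.append(np.zeros(A.shape[1]), 1.0 / s)
    return np.vstack([top, bottom])

def affine_grassmann_dist(A1, b1, A2, b2):
    """Distance between two affine subspaces, computed from the principal
    angles between their embedded images via one small SVD."""
    Y1, Y2 = embed(A1, b1), embed(A2, b2)
    sigma = np.linalg.svd(Y1.T @ Y2, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))   # principal angles
    return float(np.sqrt(np.sum(theta ** 2)))

# Two parallel lines in R^2, one through the origin and one through (0, 1):
A = np.array([[1.0], [0.0]])
print(affine_grassmann_dist(A, np.zeros(2), A, np.array([0.0, 1.0])))  # pi/4
```

The entire computation uses only a projection, a normalization, and one SVD, consistent with the assertion that no machinery beyond standard matrix algebra is needed.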
The main goal of this second part is to show that the flag manifold can be represented concisely by matrix groups and computed with easily, and that optimization on the flag manifold can be performed with matrix operations, bridging the gap between algebra and geometry.

Lastly, we study Yates's algorithm, which was first proposed to exploit the structure of a full factorial designed experiment to obtain least squares estimates of the effects of all factors and their relevant interactions. In short, it is an organized way to perform iterated summations that avoids repeated computation. Many well-known algorithms, including the fast Fourier transform and the Walsh transform, turn out to be special cases of Yates's method. Here we show that Yates's algorithm is optimal in the sense of a contraction of tensors but may be improved when considered from the perspective of bilinear complexity. We also show that it is a projection of a tensor network and point out its relationship to tensor trains and tree tensor networks.
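To make the iterated-summation structure concrete, here is a minimal sketch of Yates's algorithm for a $2^k$ full factorial design: $k$ passes, each forming pairwise sums and pairwise differences of the responses, so every partial sum is computed exactly once. This is an illustrative sketch, not the formulation used in this work.

```python
import numpy as np

def yates(y):
    """Yates's algorithm for a 2^k full factorial design.  The responses y
    must be given in standard (Yates) order.  Each of the k passes replaces
    y by its pairwise sums followed by its pairwise differences; the result
    is the grand total followed by the contrast of each effect and
    interaction."""
    y = np.asarray(y, dtype=float)
    k = int(np.log2(len(y)))
    assert len(y) == 2 ** k, "length must be a power of two"
    for _ in range(k):
        y = np.concatenate([y[0::2] + y[1::2],   # pairwise sums
                            y[1::2] - y[0::2]])  # pairwise differences
    return y

# 2^2 design with responses for runs (1), a, b, ab in standard order:
print(yates([1, 2, 3, 4]))   # [10.  2.  4.  0.] -> total, A, B, AB contrasts
```

Each pass touches every response once, so the whole computation costs $O(k\,2^k)$ additions instead of the $O(4^k)$ of the naive contrast-by-contrast evaluation, which is the same recursive halving that underlies the fast Fourier and Walsh transforms mentioned above.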