Abstract
This paper proposes fuzzy clustering as a statistical technique for analyzing neighborhoods in a manner that acknowledges three simple yet difficult-to-account-for characteristics: a universal dataset that covers the entirety of the US; categorical classifications that utilize a range of variables; and an acknowledgement of the heterogeneity within classifications. The paper is structured in two sections. The first posits that fuzzy clustering performs better than k-means clustering, the most widely used dimension reduction technique. To demonstrate this, two comparative models are created using census data from 2010 and 2020. Through these models, clustering fit is tested with a t-test of the within-cluster sum-of-squares, demonstrating that fuzzy clustering provides an optimal output. Predictive power is then tested by analyzing neighborhood transitions between 2010 and 2020, where lower BIC scores show that latent minority membership provides significant added value. The second section focuses on identifying the added value of fuzzy clustering: it identifies differing characteristics within a single neighborhood typology and shows how neighborhoods that exhibit multiple characteristics are distributed across different metropolitan areas. In conclusion, the paper shifts the scale of the analysis to the metropolitan region of Chicago, and then to the neighborhood of Hyde Park, where it concretizes how fuzzy clustering identifies census tracts that do not neatly align with a single category.
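To illustrate the distinction the abstract draws between hard and fuzzy classification, the following is a minimal sketch and not the paper's implementation. It assumes the standard fuzzy c-means membership formula with a fuzzifier of m = 2. The cluster centers and the three "tracts" are hypothetical toy values, and the function names are invented for illustration.

```python
import math

def fuzzy_memberships(points, centers, m=2.0):
    """Fuzzy c-means membership degrees of each point in each cluster.

    Standard update: u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    where d_i is the Euclidean distance from the point to center i.
    The fuzzifier m = 2 is an assumption, not a value from the paper.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # Point sits exactly on a center: split membership among those centers.
            zeros = [d == 0.0 for d in dists]
            row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
        else:
            row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships

def hard_assignments(points, centers):
    """k-means-style assignment: index of the nearest center (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]

# Hypothetical toy data: two cluster centers and three tracts on one axis.
centers = [(0.0, 0.0), (4.0, 0.0)]
tracts = [(1.0, 0.0), (2.0, 0.0), (3.5, 0.0)]

u = fuzzy_memberships(tracts, centers)
labels = hard_assignments(tracts, centers)

for tract, row, label in zip(tracts, u, labels):
    print(tract, "hard label:", label, "memberships:", [round(x, 2) for x in row])
```

In this toy case the middle tract lies equally far from both centers. A hard assignment must still place it in one cluster, while the fuzzy memberships split evenly between the two, which is the kind of tract the abstract describes as not neatly aligning with a single category.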