Quantum computing is positioned to be a transformative technology for key areas of scientific interest such as quantum chemistry, materials science, and cryptography. At the hardware level, recent years have seen broad progress across a variety of qubit modalities; however, error rates remain orders of magnitude higher than what is necessary for large-scale quantum programs. This gap in error rates motivates the use of quantum error correction (QEC) and the realization of fault-tolerant quantum computing (FTQC). Yet although the theory of FTQC is promising, effectively connecting it to real devices poses significant challenges. In this dissertation, we study opportunities for more efficient and scalable FTQC architectures through the co-design of quantum hardware and software. We first look at how unique hardware capabilities in quantum computers made from 2D arrays of optically trapped atoms can improve QEC performance. For computation, we study how longer-range interactions between atoms can enable a novel embedding of QEC blocks on physical qubits that improves logical-level connectivity. For memory, we study how mid-circuit movement of atoms can effectively implement QEC codes with higher encoding rates, creating a qubit-efficient memory system. We next consider the performance impact of real-time classical systems in FTQC architectures and explore the use of speculation to improve large-scale decoding systems. Finally, we study how software systems can be used to improve QEC designs. We consider the problem of QEC circuit optimization and study an automated approach based on iterative optimization of performance-reducing subcircuits.