Let W be a binary-input memoryless symmetric (BMS) channel with Shannon capacity I ( W ) and fix any 0"> 0 . We construct, for any sufficiently small 0"> 0 , binary linear codes of block length O (1 2+ ) and rate I ( W ) − that enable reliable communication on W with quasi-linear time encoding and decoding. Shannon's noisy coding theorem established the existence of such codes (without efficient constructions or decoding) with block length O (1 2 ) . This quadratic dependence on the gap to capacity is known to be best possible. Our result thus yields a constructive version of Shannon's theorem with near-optimal convergence to capacity as a function of the block length. This resolves a central theoretical challenge associated with the attainment of Shannon capacity. Previously such a result was only known for the erasure channel.Our codes are a variant of Arikan’s polar codes based on multiple carefully constructed local kernels, one for each intermediate channel that arises in the decoding. A crucial ingredient in the analysis is a strong converse of the noisy coding theorem when communicating using random linear codes on arbitrary BMS channels. Our converse theorem shows extreme unpredictability of even a single message bit for random coding at rates slightly above capacity.