Millimeter-wave (mm-wave) systems rely on narrow-beams to cope with the severe signal attenuation in the mm-wave frequency band. However, susceptibility to beam mis- alignment due to mobility or blockage requires the use of beam-alignment schemes, with huge cost in terms of overhead and use of system resources. In this paper, a beam-alignment scheme is proposed based on Bayesian multi-armed bandits, with the goal to maximize the alignment probability and the data-communication throughput. A Bayesian approach is proposed, by considering the state as a posterior distribution over angles of arrival (AoA) and of departure (AoD), given the history of feedback signaling and of beam pairs scanned by the base- station (BS) and the user-end (UE). A simplified sufficient statistic for optimal control is identified, in the form of preference of BS-UE beam pairs. By bounding a value function, the second-best preference policy is formulated, which strikes an optimal balance between exploration and exploitation by selecting the beam pair with the current second-best preference. Through Monte-Carlo simulation with analog beamforming, the superior performance of the second-best preference policy is demonstrated in comparison to existing schemes based on first-best preference, linear Thompson sampling, and upper confidence bounds, with up to 7%, 10% and 30% improvements in alignment probability, respectively.