Deploying large-scale antenna arrays is a key characteristic of current and future wireless communication systems. However, under non-ideal practical conditions, such as unknown array geometry or hardware impairments, accurate channel state information is hard to acquire. This impedes the design of the beamforming/combining vectors that are crucial for fully exploiting the potential of large-scale MIMO systems or for combating the high path loss in millimeter-wave (mmWave) communications. In this paper, we propose a novel solution that leverages deep reinforcement learning (DRL) to learn a beam pattern that is optimized for a group of users without explicit knowledge of their channels. Simulation results show that the developed solution finds a near-optimal beam pattern using quantized phase shifters while requiring only beamforming gain feedback from the users.
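
As a minimal illustration of the two quantities named above, and not of the proposed DRL algorithm itself, the following Python sketch builds a beamforming vector from quantized phase-shifter settings and evaluates the scalar beamforming gain that a user could feed back. The function names (`quantized_beam`, `beamforming_gain`, `average_gain`), the unit-norm scaling, the uniform phase grid, and the averaging over users are assumptions made for this example; they are not taken from the paper.

```python
import cmath
import math
import random


def quantized_beam(phase_indices, bits):
    """Build a unit-norm beam from b-bit phase-shifter settings.

    Assumption: each phase shifter picks one of 2**bits uniformly
    spaced phases, and the vector is scaled by 1/sqrt(M).
    """
    levels = 2 ** bits
    scale = 1.0 / math.sqrt(len(phase_indices))
    return [scale * cmath.exp(2j * math.pi * (k % levels) / levels)
            for k in phase_indices]


def beamforming_gain(w, h):
    """Return |w^H h|^2, the only quantity assumed to be fed back."""
    inner = sum(wi.conjugate() * hi for wi, hi in zip(w, h))
    return abs(inner) ** 2


def average_gain(w, channels):
    """Average the gain over a group of users (hypothetical objective)."""
    return sum(beamforming_gain(w, h) for h in channels) / len(channels)


if __name__ == "__main__":
    random.seed(0)
    num_antennas, bits = 8, 3
    # Hypothetical user channels, used only to produce feedback values.
    channels = [[complex(random.gauss(0, 1), random.gauss(0, 1))
                 for _ in range(num_antennas)] for _ in range(4)]
    # One candidate beam pattern: a random choice of quantized phases.
    indices = [random.randrange(2 ** bits) for _ in range(num_antennas)]
    beam = quantized_beam(indices, bits)
    print("average gain feedback:", average_gain(beam, channels))
```

In this sketch the channels appear only inside the gain computation; a learning agent in such a setup would see the phase indices it chose and the returned gain, never the channel vectors themselves.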