In recent years, the volume of network traffic directed at cloud data centers has fluctuated with user demand. This traffic is bursty, and handling it requires careful attention. Because the traffic is variable, some requests must be re-allocated on the fly, and the resulting resource-management effort degrades performance. Since appropriate solutions can be proposed only with an understanding of the workload and the environment, Reinforcement Learning (RL) is the predominantly used strategy. Further, it has been shown that Poisson arrival models do not capture real-world burstiness. We therefore address a two-fold problem: (i) the traffic requires a new modelling approach that can characterize burstiness, and (ii) the load must be balanced so as to maximize the provider's reward under such conditions. In this paper, we propose a novel yet simple traffic model that enables burst detection based on an index policy. We show that throughput constraints play a crucial role in scheduling and that our proposed RL technique produces reliable results in this setting. Our RL algorithm decides which instance of the request traffic should be processed so that the cloud provider can maximize its profit and the decisions made in hindsight are unbiased. We compare the proposed policy with two state-of-the-art approaches and draw key inferences as to why an index policy performs better in scenarios that demand RL. When the bursty workload exceeds a saturation limit of 150%, we observe average wait times more than five times shorter than those of conventional policies.
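The claim that Poisson arrivals do not capture burstiness can be illustrated with a minimal sketch. It is not the traffic model proposed in this paper: the two-state (calm/burst) modulated arrival process, the function names, and all rates and switching probabilities are assumptions chosen only for illustration. The sketch compares the index of dispersion (variance-to-mean ratio of per-slot arrival counts), which is close to 1 for a Poisson process and grows well above 1 when the arrival rate alternates between calm and burst phases.

```python
import random


def poisson_counts(rate, n_slots, rng):
    """Arrivals per unit-time slot for a homogeneous Poisson process."""
    counts = []
    for _ in range(n_slots):
        # Count the exponential inter-arrival gaps that fit in one unit slot.
        t, k = rng.expovariate(rate), 0
        while t < 1.0:
            k += 1
            t += rng.expovariate(rate)
        counts.append(k)
    return counts


def bursty_counts(low, high, p_switch, n_slots, rng):
    """Arrivals per slot when the rate flips between a calm and a burst phase.

    Hypothetical two-state modulated process, used here only as a stand-in
    for bursty traffic; it is not the model proposed in the paper.
    """
    counts, in_burst = [], False
    for _ in range(n_slots):
        if rng.random() < p_switch:
            in_burst = not in_burst
        rate = high if in_burst else low
        counts.extend(poisson_counts(rate, 1, rng))
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio of the counts (about 1 for Poisson arrivals)."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return var / mean


if __name__ == "__main__":
    rng = random.Random(0)
    smooth = poisson_counts(10.0, 5000, rng)
    bursty = bursty_counts(2.0, 18.0, 0.05, 5000, rng)
    print("Poisson dispersion index:", round(dispersion_index(smooth), 2))
    print("Bursty dispersion index: ", round(dispersion_index(bursty), 2))
```

Both processes here have a similar long-run mean arrival rate, so a model fitted only to the mean would treat them alike; the dispersion index is one simple statistic that separates them.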