We consider a reinforcement learning (RL) based joint cache placement and delivery (CPD) policy for cellular networks with limited caching capacity at both the Base Stations (BSs) and the User Equipments (UEs). The dynamics of the users' file preferences are modeled by a Markov process. A user's requests depend on its current preferences and on the contents of its own cache. We assume probabilistic models for cache placement at both the UEs and the BSs. When the network receives a request for an uncached file, it fetches the file from the core network via a backhaul link. File delivery is based on network-level orthogonal multipoint multicast transmissions: all BSs caching a specific file transmit it collaboratively in a dedicated resource, and successful file reception depends on the state of the wireless channels. We design the CPD policy using an Actor-Critic RL framework with two neural networks, taking into account both the user Quality of Service and the backhaul load. Simulation results illustrate the merits of the devised CPD policy.
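To make the model ingredients above concrete, the following is a minimal Python sketch, not the implementation used in this work. It simulates a user whose preference state evolves as a Markov chain, draws a UE cache from per-file caching probabilities, and generates requests that skip locally cached files. The two-state chain, the popularity profiles, the caching probabilities, the systematic-sampling style cache draw, and all names (`step_preference`, `sample_cache`, `request`) are illustrative assumptions.

```python
import math
import random

# All numbers below are hypothetical and chosen only for illustration.
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]        # preference-state Markov chain
POPULARITY = [[0.6, 0.2, 0.1, 0.1],          # file popularity in state 0
              [0.1, 0.1, 0.2, 0.6]]          # file popularity in state 1
UE_CACHE_PROBS = [0.5, 0.5, 0.5, 0.5]        # sums to the cache capacity (2)


def step_preference(state, transition, rng):
    """Advance the preference Markov chain by one step."""
    return rng.choices(range(len(transition)), weights=transition[state])[0]


def sample_cache(probs, rng):
    """Draw a cache in which file f is stored with probability probs[f].

    Assumes each probs[f] <= 1 and sum(probs) equals the integer capacity.
    The probabilities are laid end to end on a line; a file is cached when
    its interval contains one of the points u, u + 1, u + 2, ...
    """
    u = rng.random()
    cached, start = set(), 0.0
    for f, p in enumerate(probs):
        end = start + p
        point = u + math.ceil(start - u)  # first grid point at or after start
        if start <= point < end:
            cached.add(f)
        start = end
    return cached


def request(popularity, cached, rng):
    """Pick a file by current popularity, skipping cached files (assumption)."""
    files = [f for f in range(len(popularity)) if f not in cached]
    weights = [popularity[f] for f in files]
    return rng.choices(files, weights=weights)[0]


rng = random.Random(0)
state = 0
cache = sample_cache(UE_CACHE_PROBS, rng)
requests = []
for _ in range(5):
    state = step_preference(state, TRANSITION, rng)
    requests.append(request(POPULARITY[state], cache, rng))
print("UE cache:", sorted(cache), "requests:", requests)
```

In an RL setting such as the one considered here, the caching probabilities would be outputs of the learned policy rather than fixed constants; they are fixed in this sketch only to keep it short.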