The availability of affordable image and video capture devices, together with the rapid development of social networking and content-sharing websites, has led to the creation of a new type of content: social media. A system serving an end user’s search query should return images that are not only relevant but also diverse, so that together they give a well-rounded description of the query. Previous state-of-the-art methods used the number of views of an image as one of the key parameters for achieving diverse results. This parameter, while improving the overall results, can omit the most recently taken images, which is a drawback when the user is interested in recent images. Secondly, all prior work considered only a single clustering algorithm for diversification. The performance of most clustering techniques is highly data-dependent, so a single technique may prove ineffective across different kinds of datasets. The main focus of this paper is to build a visual description of a landmark location by choosing, from community-contributed datasets, diverse pictures that best capture the details of the queried location. To this end, an end-to-end framework has been built to retrieve results that are both relevant and diverse. Different retrieval re-ranking and diversification strategies are evaluated to find a balance between relevance and diversity. Clustering techniques are employed to improve diversity. A unique fusion approach has been adopted to overcome the dilemma of selecting an appropriate clustering technique and its parameters for a given dataset. Extensive experiments have been conducted on the Flickr Div150Cred dataset. The system achieves results on par with the state-of-the-art work on the MediaEval Challenge, and it does so without using “Number of Views”, one of the key parameters that contribute to the improved overall metric results.
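
To make the idea of cluster-based diversification concrete, the following is a minimal sketch of one common strategy: given a relevance-ranked list and a cluster label for each image, select images round-robin across clusters so that the top results cover as many clusters as possible. It is an illustration under stated assumptions, not the framework evaluated in this paper; the function name `diversify`, the `cluster_of` mapping, and the round-robin rule are all hypothetical, and the fusion of several clustering techniques is not shown.

```python
from collections import defaultdict
from itertools import zip_longest

_MISSING = object()  # sentinel marking clusters that have run out of items


def diversify(ranked_ids, cluster_of, k):
    """Return up to k items, cycling over clusters in relevance order.

    ranked_ids: item ids sorted from most to least relevant (assumed input).
    cluster_of: dict mapping each item id to a cluster label (assumed to come
                from any clustering step; hypothetical here).
    """
    # Group items by cluster, keeping the relevance order inside each cluster.
    clusters = defaultdict(list)
    for item in ranked_ids:
        clusters[cluster_of[item]].append(item)

    # Dicts preserve insertion order, so clusters are visited in the order
    # of their best-ranked member. Each round takes one item per cluster.
    selected = []
    for round_items in zip_longest(*clusters.values(), fillvalue=_MISSING):
        for item in round_items:
            if item is not _MISSING:
                selected.append(item)
    return selected[:k]


ranked = ["a", "b", "c", "d", "e", "f"]
labels = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
top4 = diversify(ranked, labels, 4)
```

With these toy inputs, the first three selected images each come from a different cluster, whereas the plain relevance ranking would have returned three images from cluster 0. The trade-off between relevance and diversity then depends on how the clusters are formed and how many items are taken per round.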