Cloud computing often uses a multi-tenancy architecture in which tenants share system software. To support dynamically growing demand from multiple tenants, cloud service providers must duplicate computing resources to cope with fluctuations in tenant requests. In existing cloud environments such as Google App Engine, this is currently handled by virtualization and duplication at the application level. However, duplicating at the application level alone can waste significant resources, because the entire application is duplicated even when only part of it is overloaded. This paper proposes a two-tier SaaS scaling and scheduling architecture that works at both the service level and the application level to save resources; the key idea is to add resources only to the bottleneck components. Several duplication strategies are proposed, including lazy duplication and proactive duplication, to achieve better system performance. Additionally, a resource allocation algorithm is proposed for a clustered cloud environment. Experimental results show that the proposed algorithms achieve a better resource utilization rate.
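To make the key idea concrete, the following minimal sketch contrasts the resource cost of duplicating a whole application with the cost of duplicating only its bottleneck services. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the `Service` fields, the cost model, the example numbers, and the utilization thresholds used to distinguish lazy from proactive duplication are all hypothetical.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Service:
    """One component of a SaaS application (all fields are hypothetical)."""
    name: str
    load: float      # current demand, e.g. requests per second
    capacity: float  # demand that a single instance can serve
    cost: float      # resource units consumed by a single instance


def instances_needed(svc: Service) -> int:
    """Smallest number of instances that covers the service's load."""
    return max(1, ceil(svc.load / svc.capacity))


def application_level_cost(services: list[Service]) -> float:
    """Cost when the whole application is copied until its busiest
    service is covered, so every service is duplicated with it."""
    copies = max(instances_needed(s) for s in services)
    return copies * sum(s.cost for s in services)


def service_level_cost(services: list[Service]) -> float:
    """Cost when each service is duplicated independently, so only
    the bottleneck services receive extra instances."""
    return sum(instances_needed(s) * s.cost for s in services)


def should_duplicate(svc: Service, instances: int, threshold: float) -> bool:
    """Assumed trigger rule: duplicate when utilization exceeds `threshold`.
    A threshold of 1.0 stands in for a lazy policy (wait until overloaded);
    a lower threshold such as 0.8 stands in for a proactive policy."""
    utilization = svc.load / (instances * svc.capacity)
    return utilization > threshold


# Hypothetical application with a single bottleneck service ("search").
demo = [
    Service("auth", load=50, capacity=100, cost=2.0),
    Service("search", load=350, capacity=100, cost=4.0),
    Service("report", load=80, capacity=100, cost=3.0),
]

if __name__ == "__main__":
    print("application-level cost:", application_level_cost(demo))
    print("service-level cost:", service_level_cost(demo))
```

In this toy setting only `search` needs four instances, so service-level duplication uses 21 resource units where whole-application duplication uses 36. The gap grows as the bottleneck becomes a smaller share of the application's total footprint.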