Recent studies have shown that Alzheimer's disease (AD) is related to alterations in brain connectivity networks. One type of connectivity, called effective connectivity, is defined as the directional relationship between brain regions and is essential to brain function. However, few studies have modeled the effective connectivity of AD or characterized how it differs from that of normal controls (NC). In this paper, we investigate sparse Bayesian Networks (BNs) for effective connectivity modeling. Specifically, we propose a novel formulation for the structure learning of BNs, which involves one L1-norm penalty term to impose sparsity and another penalty to ensure that the learned BN is a directed acyclic graph, a required property of BNs. We show, through both theoretical analysis and extensive experiments on eleven moderate and large benchmark networks with various sample sizes, that the proposed method achieves considerably better learning accuracy and scalability than ten competing algorithms. We apply the proposed method to FDG-PET images of 42 AD and 67 NC subjects and identify effective connectivity models for AD and NC, respectively. Our study reveals that the effective connectivity of AD differs from that of NC in many ways, including global-scale effective connectivity; intra-lobe, inter-lobe, and inter-hemispheric effective connectivity distributions; and the effective connectivity associated with specific brain regions. These findings are consistent with the known pathology and clinical progression of AD and will contribute to AD knowledge discovery.
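To make the two ingredients named above concrete, the following is a minimal sketch of an L1-penalized score for a linear-Gaussian BN. It is not the formulation proposed in this paper: the function names (`l1_penalty`, `is_dag`, `penalized_score`), the squared-error loss, and the weight-matrix convention `B[i][j]` for an edge from node i to node j are all assumptions made for illustration. In particular, the sketch replaces the acyclicity penalty with a hard check based on Kahn's topological sort, which is a simpler stand-in.

```python
# Illustrative sketch only; names and the score are assumptions,
# not the formulation proposed in the paper.

def l1_penalty(B):
    """Sum of absolute edge weights; B[i][j] is the weight of edge i -> j."""
    return sum(abs(w) for row in B for w in row)


def is_dag(B, tol=1e-12):
    """Kahn's algorithm: True if the nonzero pattern of B has no directed cycle."""
    n = len(B)
    # In-degree of each node, counting only edges with nonzero weight.
    indeg = [sum(1 for i in range(n) if abs(B[i][j]) > tol) for j in range(n)]
    queue = [j for j in range(n) if indeg[j] == 0]
    visited = 0
    while queue:
        i = queue.pop()
        visited += 1
        for j in range(n):
            if abs(B[i][j]) > tol:
                indeg[j] -= 1
                if indeg[j] == 0:
                    queue.append(j)
    # Every node is visited exactly when the graph is acyclic.
    return visited == n


def penalized_score(X, B, lam):
    """Squared residuals of each variable given its parents, plus lam * L1.

    Returns infinity for cyclic graphs (a hard constraint standing in
    for an acyclicity penalty).
    """
    if not is_dag(B):
        return float("inf")
    n = len(B)
    rss = 0.0
    for x in X:  # each x is one sample, a list of n values
        for j in range(n):
            pred = sum(B[i][j] * x[i] for i in range(n))
            rss += (x[j] - pred) ** 2
    return rss + lam * l1_penalty(B)
```

A structure search would compare `penalized_score` across candidate weight matrices; a larger `lam` favors graphs with fewer and weaker edges, and any candidate containing a directed cycle is rejected outright.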