A distributed Canny edge detector: Algorithm and FPGA implementation

Qian Xu, Srenivas Varadarajan, Chaitali Chakrabarti, Lina Karam

Research output: Contribution to journal › Article › peer-review

135 Scopus citations


The Canny edge detector is one of the most widely used edge detection algorithms because of its superior performance. However, it is computationally more intensive than other edge detection algorithms, and it also has a higher latency because it relies on frame-level statistics. In this paper, we propose a mechanism to implement the Canny algorithm at the block level without any loss in edge detection performance compared with the original frame-level Canny algorithm. Directly applying the original Canny algorithm at the block level leads to excessive edges in smooth regions and to a loss of significant edges in highly detailed regions, because the original Canny algorithm computes the high and low thresholds from frame-level statistics. To solve this problem, we present a distributed Canny edge detection algorithm that adaptively computes the edge detection thresholds based on the block type and the local distribution of the gradients in the image block. In addition, the new algorithm uses a nonuniform gradient magnitude histogram to compute block-based hysteresis thresholds. The resulting block-based algorithm has a significantly reduced latency and can be easily integrated with other block-based image codecs. It supports fast edge detection of high-resolution images and videos, including full HD, because the latency is now a function of the block size instead of the frame size. In addition, quantitative conformance evaluations and subjective tests show that the edge detection performance of the proposed algorithm is better than that of the original frame-based algorithm, especially when noise is present in the images. Finally, the algorithm is implemented using a 32-computing-engine architecture and is synthesized on the Xilinx Virtex-5 FPGA. The synthesized architecture takes only 0.721 ms (including the SRAM read/write time and the computation time) to detect edges of \(512\times 512\) images in the USC SIPI database when clocked at 100 MHz and is faster than existing FPGA and GPU implementations.
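As a rough illustration of what computing hysteresis thresholds per block rather than per frame can look like, the following minimal Python sketch derives a high and low threshold from each block's own gradient-magnitude histogram. It is not the authors' algorithm: the function names, the `edge_fraction` and `low_ratio` parameters, and the uniform histogram bins are all assumptions made for illustration. The paper selects thresholds according to block type and uses a nonuniform histogram, neither of which is reproduced here.

```python
def block_thresholds(grad_mag, edge_fraction, low_ratio=0.4, bins=64):
    """Return (low, high) hysteresis thresholds for one image block.

    grad_mag      : list of rows of gradient magnitudes for the block
    edge_fraction : assumed fraction of block pixels allowed above `high`
                    (hypothetical; the paper ties this to the block type)
    low_ratio     : assumed ratio between the low and high thresholds
    bins          : number of uniform histogram bins (a simplification;
                    the paper uses a nonuniform histogram)
    """
    values = [g for row in grad_mag for g in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Flat block: no gradient exceeds `hi`, so no pixel is a strong edge.
        return low_ratio * hi, hi

    # Build a uniform histogram of the block's gradient magnitudes.
    width = (hi - lo) / bins
    hist = [0] * bins
    for g in values:
        hist[min(int((g - lo) / width), bins - 1)] += 1

    # Walk the cumulative histogram until the non-edge fraction is covered.
    target = (1.0 - edge_fraction) * len(values)
    running = 0
    for i, count in enumerate(hist):
        running += count
        if running >= target:
            break

    high = lo + (i + 1) * width  # upper edge of the selected bin
    return low_ratio * high, high


def frame_thresholds(grad_mag, block, edge_fraction):
    """Compute thresholds independently for every block x block tile."""
    out = {}
    for r in range(0, len(grad_mag), block):
        for c in range(0, len(grad_mag[0]), block):
            tile = [row[c:c + block] for row in grad_mag[r:r + block]]
            out[(r, c)] = block_thresholds(tile, edge_fraction)
    return out
```

Because each tile needs only its own gradients, its thresholds can be computed as soon as that tile is available, which is the property the abstract credits for making latency depend on the block size rather than the frame size.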

Original language: English (US)
Article number: 6774938
Pages (from-to): 2944-2960
Number of pages: 17
Journal: IEEE Transactions on Image Processing
Issue number: 7
State: Published - Jul 2014


Keywords

  • Canny edge detector
  • Distributed image processing
  • FPGA
  • high throughput
  • parallel processing

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design

