

# HIGHLY DECOMPOSED CORDIC BASED LOW COMPLEXITY DCT ARCHITECTURE FOR IMAGE PROCESSING APPLICATION

D. SATHIYAPRAGASAM<sup>1</sup>, V.ANANDKUMAR<sup>2</sup>

<sup>1</sup>PG Student, VLSI Design, <sup>2</sup>Assistant Professor (Sr.G), Department of ECE, Surya Group of Institutions, India

## Abstract:

Field programmable gate arrays (FPGAs) are well suited to the implementation of DCT-based digital image compression. In this paper, a highly decomposed DCT algorithm with a CORDIC unit is proposed for 2-D DCT computation. Area-efficient, rotation-based processing elements are used to perform the matrix multiplication. Several multiplications are carried out simultaneously under the control of a pipelined finite state machine (FSM), giving optimized high-speed multiplication and overcoming the throughput problem. We extend the methodology to the design of a low-power, area-efficient DCT for image compression that uses only shift registers, adders/subtractors, and special interconnections. Through hardware synthesis we show that shift-and-add based DCT computation is more efficient than the conventional multiplier-based approach. Accuracy is measured by computing the PSNR of the reconstructed image with respect to the original image. Compared with previous work, the proposed method achieves better performance with considerable quality enhancement.

### I. INTRODUCTION

Multimedia data processing, which encompasses almost every aspect of our daily life such as communication, broadcasting, data search, advertisement, and video games, has become an integral part of our lifestyle. The most significant part of multimedia systems is applications involving image or video, which require computationally intensive data processing. Moreover, as the use of mobile devices increases exponentially, there is a growing demand for multimedia applications to run on these portable devices. The discrete cosine transform (DCT) is one of the major compression schemes owing to its near-optimal performance and its high energy compaction efficiency. The principal advantage of image transformation is the removal of redundancy between neighboring pixels. This leads to uncorrelated transform coefficients, which can be encoded independently. The DCT has this decorrelation property.

To discard an appropriate amount of information, the compressor divides each DCT output value by a quantization coefficient and rounds the result to an integer. The larger the quantization coefficient, the more data is lost, because the actual DCT value is represented less and less accurately. Each of the 64 positions of the DCT output block has its own quantization coefficient, with the higher-order terms being quantized more heavily than the low-order terms, i.e. the higher-order terms have larger quantization coefficients. The resulting coefficients contain a significant amount of redundant data. Huffman compression losslessly removes these redundancies, resulting in smaller data.

The human eye is able to distinguish small differences in brightness over a relatively large area, but it is not as good at distinguishing the exact strength of a high-frequency brightness variation. Hence the amount of information in the high-frequency components can be greatly reduced. This is achieved by simply dividing each component in the frequency domain by a constant for that component, and then rounding to the nearest integer. This rounding operation is the only lossy operation in the whole process if the DCT computation is performed with sufficiently high precision. As a result, many of the higher-frequency components are typically rounded to zero, and many of the rest become small positive or negative numbers, which take far fewer bits to represent.

Entropy coding is a form of lossless data compression. The steps involved in entropy encoding are the arrangement of the image components in zigzag order followed by run-length encoding (RLE). The RLE step groups similar frequencies together, inserts length-coded runs of zeros, and then applies Huffman encoding to what is left. The arithmetic coding technique is mathematically superior to Huffman coding, and the JPEG standard also allows it, though it is not mandatory.
Arithmetic coding typically makes files about 5-7% smaller. The previous quantized DC coefficient is used to predict the current quantized DC coefficient, and the difference between the two is encoded rather than the actual value. The encoding of the 63 quantized AC coefficients does not use such prediction differencing.

DCT algorithms with multiplier units are chosen for high-speed operation [5], but they tend to consume more power. Hence, multiplier-less distributed arithmetic (DA) based algorithms were proposed. A novel reconfigurable adder-based architecture for DA, realizing the inner product that is the key computation in many digital signal processing applications, has also been proposed. In ROM-based DA architectures, the input signal correlations and quantization are exploited to reduce the arithmetic operations. The drawback of DA-based architectures is that, as the number of inputs and the internal precision increase, a large ROM is needed, which increases the hardware complexity. Several NEDA-based algorithms, which are ROM-free and eliminate the redundancy in the conventional DA-based DCT, have been proposed [2], [3]. The architecture presented here uses an algorithmic strength reduction technique to reduce the complexity of the 1-D DCT computation. This reduction in complexity reduces the power consumption. The speed of the architecture is increased by pipelining.
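
The quantization, zigzag ordering, run-length coding, and DC differencing steps described in this section can be sketched as follows. This is a minimal illustration rather than the encoder used in this work: the function names, the small 2x2 block, and the quantization values are assumptions chosen for readability, and the `(0, 0)` end-of-block marker follows common JPEG practice.

```python
def quantize(block, qtable):
    """Divide each DCT value by its quantization coefficient and round."""
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return [[round(v / q) for v, q in zip(brow, qrow)]
            for brow, qrow in zip(block, qtable)]


def zigzag(block):
    """Read a square block in zigzag order (low frequencies first)."""
    n = len(block)
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; odd diagonals run downward, even ones upward.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )
    return [block[r][c] for r, c in order]


def run_length(ac_values):
    """Encode AC coefficients as (zeros_before, value) pairs."""
    pairs, run = [], 0
    for v in ac_values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # end-of-block marker (assumed convention)
    return pairs


def dc_differences(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    prev, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


# Hypothetical 2x2 example (a real JPEG block is 8x8 with 64 coefficients).
coeffs = [[100, 10], [-9, 3]]
qtable = [[16, 11], [12, 14]]
scan = zigzag(quantize(coeffs, qtable))
dc, ac = scan[0], scan[1:]
encoded_ac = run_length(ac)
```

Only the rounding inside `quantize` discards information; the remaining steps are reversible reorderings and recodings, which is why the text calls entropy coding lossless.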

### II. RELATED WORK AND CONTRIBUTIONS

The Joint Photographic Experts Group (JPEG) has been working to establish the first international compression standard for continuous-tone still images, both grayscale and color [8]. JPEG's proposed standard aims to be generic, to support a wide variety of applications for continuous-tone images. To meet the differing needs of these applications, the standard includes two basic compression methods, each with various modes of operation. A DCT-based method is specified for "lossy" compression, and a predictive method for "lossless" compression. JPEG features a simple lossy technique known as the baseline method, which is a subset of the other DCT-based modes of operation. The baseline method is the most widely implemented JPEG method to date and is sufficient in its own right for a large number of applications. The overview in [8] describes the JPEG standard and focuses in detail on the baseline method.

H.264/AVC is the newest video coding standard of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non-conversational" (storage, broadcast, or streaming) applications. The overview in [7] describes the profiles and applications of the standard and outlines the history of the standardization process.

Low-power operation and support for dual video standards are becoming a standard requirement, especially in mobile applications [5]. Due to the advent of the newly announced H.264, a generic problem of standard incompatibility has appeared between H.264 and the prevalent MPEG-x video standards, which must be resolved on both algorithmic and architectural levels.

A novel discrete cosine transform (DCT) architecture that allows aggressive voltage scaling for low-power dissipation, even under process parameter variations and with minimal overhead compared to existing techniques, is presented in [4]. Under a scaled supply voltage and/or variations in process parameters, any possible delay errors appear only on the long paths, which are designed to contribute less to output quality. That architecture allows a graceful degradation in the peak SNR (PSNR) under aggressive voltage scaling as well as extreme process variations. Results show that even under large process variations and aggressive supply voltage scaling, there is a gradual degradation of image quality with considerable power savings, when compared to existing implementations in a 90-nm process technology.

### III. PROPOSED APPROACH

Raw image data used in applications such as high-definition television, video conferencing, and computer communication requires large storage and high-speed channels to handle huge volumes of data. To reduce the storage and channel bandwidth requirements to manageable levels, data compression techniques are essential. To meet the timing requirements of high-speed applications, the compression itself has to be performed at high speed. The speed of the system depends on the time taken by the longest data path, known as the critical path. To increase the speed of the architecture, pipelining can be used. Pipelining introduces latches along the data path so as to shorten the critical path, and a shorter critical path allows a higher operating speed. Pipelining latches can only be placed across a feed-forward cutset of the architecture graph. The clock period is then limited by the critical path, which may lie between an input and a latch, a latch and an output, two latches, or an input and an output.

In this architecture, the pipelining is of the fine-grain type, as the pipeline latches are introduced inside the 1-D module. The critical path of the 1-D module runs from the input to the output. This includes the computation time of the pre-computing modules and of the final adders. Referring to Figures 3.3 and 3.4, the critical path time runs from the onset of the input to the arrival of the output. The speed can be improved if latches are introduced between the shifter and adder modules of the pre-computing modules, as shown in Figure 4.2. After pipelining, the critical path runs from the input (X) to the 3X generator.
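
The effect of placing a latch between the shifter and the adder can be illustrated with a small cycle-by-cycle software model. This is only a behavioural sketch under the assumption that the 3X generator computes 3X as (X << 1) + X; the function name `pipelined_3x` and the use of `None` for an empty latch are illustrative choices, not part of the hardware design.

```python
def pipelined_3x(samples):
    """Cycle-by-cycle model of a 3X generator with one pipeline latch."""
    latch = None      # pipeline register between the shifter and the adder
    outputs = []
    # One extra cycle (None input) flushes the last value out of the latch.
    for x in list(samples) + [None]:
        # Stage 2 (adder): works on what was latched in the previous cycle.
        if latch is not None:
            x_reg, x_shifted = latch
            outputs.append(x_shifted + x_reg)      # 2X + X = 3X
        # Stage 1 (shifter): its result is latched for the next cycle.
        latch = (x, x << 1) if x is not None else None
    return outputs


results = pipelined_3x([1, -2, 5])
```

Each result appears one cycle after its input enters the shifter, but a new input is accepted every cycle, so the longest combinational path per cycle is a single shift or a single addition rather than both.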



FIG. 1 DCT BLOCK DIAGRAM





FIG. 2 BLOCK DIAGRAM OF PIPELINED PRE-COMPUTING MODULES

#### A) INVERSE DCT ARCHITECTURE

The future of hand-held portable devices that can receive compressed image/video content likely depends on finding effective energy management techniques for standards such as the MPEG-2 video decoder. The Inverse Discrete Cosine Transform (IDCT) is the most computationally intensive portion of the image/video decoder. Thus, in terms of energy conservation, it is desirable to use a low-complexity approximation. The technique used here is strength reduction, which removes the redundant calculations.

#### B) ODD AND EVEN DCT ALGORITHM

The 2-D IDCT is obtained by row-column decomposition. In the inverse DCT, the column-wise transform is performed first, and the row transform is then applied to the transposed output of the column transform. Each 1-D IDCT is split into odd and even matrices, as in the direct 2-D DCT architecture. This reduces the complexity of the architecture. An N-point inverse 2-D DCT is given by

$$x(i,j) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)C(v)X(u,v)\cos\frac{(2i+1)u\pi}{2N} \times \cos\frac{(2j+1)v\pi}{2N}$$
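
As a floating-point reference for the equation above, the following sketch evaluates the 2-D IDCT both directly and by row-column decomposition. It assumes the usual normalization $C(k) = 1/\sqrt{2}$ for $k = 0$ and $C(k) = 1$ otherwise, which the text does not state explicitly; all function names are illustrative, and the hardware described here uses shift-and-add arithmetic rather than floating point.

```python
import math


def c(k):
    """Normalization factor C(k); the 1/sqrt(2) value at k = 0 is assumed."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def basis(i, u, n):
    """Cosine basis term cos((2i + 1) * u * pi / (2N))."""
    return math.cos((2 * i + 1) * u * math.pi / (2 * n))


def idct2_direct(X):
    """Evaluate the double sum of the 2-D IDCT equation term by term."""
    n = len(X)
    return [[(2.0 / n) * sum(c(u) * c(v) * X[u][v]
                             * basis(i, u, n) * basis(j, v, n)
                             for u in range(n) for v in range(n))
             for j in range(n)] for i in range(n)]


def idct1(vec):
    """1-D IDCT with the matching sqrt(2/N) scaling."""
    n = len(vec)
    return [math.sqrt(2.0 / n)
            * sum(c(u) * vec[u] * basis(i, u, n) for u in range(n))
            for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def idct2_row_column(X):
    """Column-wise 1-D IDCT, transpose, then 1-D IDCT along the rows."""
    column_pass = [idct1(col) for col in transpose(X)]     # indexed [v][i]
    return [idct1(row) for row in transpose(column_pass)]  # indexed [i][j]


# A block holding only a DC coefficient reconstructs to a flat image.
dc_only = [[8.0, 0.0, 0.0, 0.0]] + [[0.0] * 4 for _ in range(3)]
flat = idct2_row_column(dc_only)
```

The row-column form needs two passes of N one-dimensional transforms instead of an N x N double sum per output sample, which is the reason the architecture reuses a 1-D IDCT module.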











FIG. 5 THE EVEN DCT ARCHITECTURE

The even DCT contains the DC part of the transformed image, i.e. of the image in the frequency domain. Hence care should be taken in calculating the Z0 value. The corresponding cosine basis is d, which is represented as $d = 2^5 + 2^3 + 2^2 + 1$. The pre-computing unit of the even DCT consists of 1A, -1A, and 3A only.
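
The decomposition of d can be turned into a multiplier-free product as sketched below. The grouping of the $2^3 + 2^2$ terms into a shifted 3A is an assumption about how the pre-computed 3A value would be used, and the function name `times_d` is illustrative.

```python
def times_d(a):
    """Multiply a by d = 2^5 + 2^3 + 2^2 + 1 = 45 using shifts and adds only."""
    a3 = (a << 1) + a                    # 3A from the pre-computing unit
    # 2^3 + 2^2 = 12 = 3 * 2^2, so the two middle terms collapse into 3A << 2.
    return (a << 5) + (a3 << 2) + a      # 32A + 12A + A = 45A


product = times_d(7)
scaled_basis = 45 / 64   # 0.703125, assuming d carries six fractional bits
```

With this grouping the product needs two adders after the pre-computing unit instead of three, and no multiplier. Under the assumed six-bit scaling, the value 0.703125 is close to the 0.7033 and 0.7070 entries listed in the even basis table.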

Paper ID # IC15039



#### TABLE I ODD BASIS

| Real Valued Basis | Binary Equivalent |
|-------------------|-------------------|
| 0.8348            | 110 1111          |
| 0.8377            | 110 1011          |
| 0.5530            | 100 0110          |
| 0.2431            | 001 1111          |
| 0.2360            | 001 1110          |
| 0.8859            | 111 0001          |
| 0.5557            | 100 0111          |
| 0.2386            | 001 1110          |
| 0.8270            | 110 1001          |
| 0.8914            | 111 0010          |

#### TABLE II EVEN BASIS

| Real Valued Basis | Binary Equivalent |
|-------------------|-------------------|
| 0.7033            | 101 1010          |
| 0.9271            | 111 0110          |
| 0.7112            | 101 1011          |
| 0.3835            | 011 0001          |
| 0.7120            | 101 1011          |
| 0.7104            | 101 1011          |
| 0.7070            | 101 1010          |
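
The binary equivalents in the basis tables can be read back as numbers with the short helper below. It assumes that each entry is an unsigned fixed-point fraction whose bits all lie to the right of the binary point (seven bits, so a weight of 1/128 per unit); this interpretation is inferred from the table values and is not stated in the text, and the name `decode_fraction` is illustrative.

```python
def decode_fraction(bits):
    """Interpret a bit string such as '101 1010' as an unsigned binary fraction."""
    digits = bits.replace(" ", "")
    # All bits are fractional, so divide by 2 ** (number of bits).
    return int(digits, 2) / (1 << len(digits))


value = decode_fraction("101 1010")   # 90 / 128
```

Under this assumption the entry `101 1010` decodes to 0.703125, so the seven-bit representation carries a quantization error of a few thousandths relative to the real-valued basis.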

#### TABLE III COMPARISON TABLE

| S. No. | Parameter | Existing Method                     | Proposed Method |
|--------|-----------|-------------------------------------|-----------------|
| 1      | Area      | 60 embedded multipliers must be used | Nil             |
| 2      | Frequency | 158.35 MHz                          | 156.52 MHz      |
| 3      | Power     | 28.26 mW                            | 24.99 mW        |

### IV. SIMULATION RESULTS AND VERIFICATION

a) Simulation output for original spatial data sequence



b) Simulation output for 2-D DCT coefficients



c) Reconstructed spatial data sequence (8x8)



d) Image analysis results

i) Original image



ii) Image reconstructed from DCT coefficients
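
The abstract states that accuracy is measured by the PSNR of the reconstructed image against the original. A minimal sketch of that metric follows, assuming 8-bit grayscale pixels (peak value 255) and images supplied as equally sized nested lists; the function name `psnr` and the sample pixel values are illustrative and are not measurements from this work.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [(o - r) ** 2
                      for orow, rrow in zip(original, reconstructed)
                      for o, r in zip(orow, rrow)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf          # identical images: no reconstruction error
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical 2x2 example with a mean squared error of 1.5.
score = psnr([[10, 20], [30, 40]], [[11, 19], [30, 42]])
```

A higher PSNR means the reconstructed image is closer to the original, so the metric gives a single figure for the quality loss introduced by the fixed-point DCT and IDCT.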






#### CONCLUSION

The 2-D DCT and IDCT architectures, which adopt an algorithmic strength reduction technique to reduce device utilization and thereby keep the power consumption low, have been designed. The DCT computation is performed with sufficiently high precision, yielding acceptable quality. The pipelined 2-D DCT architecture achieves a maximum operating frequency of 295.95 MHz. The first eight 2-D coefficients arrive at the nineteenth clock cycle, and the full 64 coefficients take about 26 clock cycles to compute. Future work can be oriented towards developing an encoder by designing a quantizer based on the strength reduction technique, together with an entropy encoder.

#### REFERENCES

[1] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Comput., vol. 23, no. 1, pp. 90-93, Jan. 1974.

[2] J. R. Cavallaro and F. T. Luk, "CORDIC arithmetic for an SVD processor," J. Parallel Distrib. Comput., vol. 5, no. 3, pp. 271-290, Jun. 1988.

[3] Y. Giegand, K. J. Sullivan, P. Bjontegarrd, and A. Luthra, "Video encoder and decoder using image processing standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 500-576, 2003.

[4] G. Karakonstantis, N. Banerjee, and K. Roy, "Process-variation resilient and voltage-scalable DCT architecture for robust low-power computing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 10, pp. 1461-1470, Oct. 2010.

[5] T. Liu, S. Wang, and C. Lee, "A low-power dual-mode video decoder for mobile applications," IEEE Commun. Mag., vol. 44, no. 8, pp. 119-126, Aug. 2006.

[6] J. R. Qaballero and F. T. Muk, "H/AVC video coding standard," J. Parallel Distrib. Comput., vol. 5, no. 3, pp. 271-290, Jun. 1988.

[7] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560-576, Jul. 2003.

[8] G. K. Wallace, "The JPEG still picture compression standard," IEEE Trans. Consum. Electron., vol. 38, no. 1, pp. 18-34, Feb. 1992.