# Low Power 8, 16 & 32 bit ALU Design Using Clock Gating

R Keerthi Kiran, Dr. A B Kalpana

**Abstract**: Most people today use some kind of portable electronic device, and a major limitation of such devices is battery life. This problem motivates the design of a low power ALU, which is an integral part of any processor. This paper presents a low power, energy efficient ALU that uses the clock gating technique to reduce dynamic power consumption. 8-bit, 16-bit and 32-bit ALUs are designed with and without clock gating, and their power consumption is analyzed at different clock frequencies. The results show that clock gating reduces dynamic power consumption to a great extent, and that the percentage of power reduction increases with the size of the ALU.


Index Terms— Low Power, Clock Gating, Dynamic Power, Clock Power, Energy Efficient, ALU.

## 1 INTRODUCTION

In today's world of rapid growth and development, the power required by electronic devices is very high, and the pollution caused by power generation affects the environment. Electronics are embedded in almost all electromechanical devices. Designing energy efficient devices increases the run time of battery-dependent devices, decreases the power consumption of other electronic devices, and hence reduces their carbon footprint.

A lot of research is being conducted on low power design approaches, and clock gating is one of the most prominent. An ALU contains many arithmetic and logic modules, but at any given time only one of them is required to compute a result. In a conventional ALU design, however, all the modules are executed for every operation, and a MUX selects the output of the required module. For example, if an ALU has 16 modules and the addition of two numbers is needed, all 16 modules operate and only the output of the adder module is selected. Executing the modules that are not required for the operation wastes power.

In a synchronous design a clock signal is needed to synchronize all the modules and signals. The clock signal carries no information of its own and is used only for synchronization. The power consumed by the clock network increases with circuit size and with operating frequency. In most cases clock power is about 20-30% of total power consumption.

Clock gating is a technique that saves both unnecessary clock power and logic power by disabling the modules that are not required for a given operation. Synchronous digital circuits are designed to operate on every rising or falling edge of the clock. By holding the clock at a single state (high or low, without letting it toggle) for the modules that are not required at that instant, the flip-flops in those modules do not have to switch state. Switching state requires power, so when no switching occurs the switching power approaches zero.
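The link between clock activity and switching power can be made explicit with the standard first-order expression for CMOS dynamic power. This expression is not stated in the paper; the symbols $\alpha$ (switching activity factor), $C_L$ (switched capacitance), $V_{DD}$ (supply voltage) and $f$ (clock frequency) are introduced here only for illustration.

$$P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f$$

Holding the clock of an idle module at a constant level drives $\alpha$ toward zero for that module's clock network and flip-flops, which is the sense in which its switching power nears zero.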

Clock gating is not free from trade-offs: extra gates have to be added to the circuit to decide when a module is to be disabled. Clock gating also requires that the modules have some kind of enable input that determines when they are gated. The technique is most useful where there are many modules that are not required to operate all the time.

The effect of clock gating on an 8-bit ALU at different clock frequencies is explained and analyzed in [1], but only for a single ALU size. In this paper, 8-bit, 16-bit and 32-bit ALUs with and without clock gating are designed, and their power consumption is analyzed not only at different clock frequencies but also for different ALU sizes. The different styles in which clock gating can be applied are explained in [2]: latch-based, latch-free and flip-flop-based designs. Different power optimization schemes that can be used to reduce the total power consumption of a circuit, namely clock gating, clock enable and blocking inputs, are discussed in [3].

Application-dependent power reduction techniques are discussed in [4], [5]. In [4], it is shown how power consumption can be reduced by using a chain structure rather than a tree structure. In [5], the use of power efficient dynamic modules for different applications, based on a detailed analysis of dynamic circuits, is described. Both [4] and [5], however, are application specific.

A power reduction method using Vedic modules is demonstrated in [6], [7].

- R Keerthi Kiran is currently pursuing a master's degree program in VLSI & Embedded Systems at Bangalore Institute of Technology, India. E-mail: rkeerthikiran67@gmail.com
- Dr. A B Kalpana is an Assistant Professor in the Department of Electronics and Communication Engineering, Bangalore Institute of Technology, India. E-mail: abkalpana@gmail.com

International Journal of Scientific & Engineering Research, Volume 6, Issue 8, August-2015 ISSN 2229-5518

## 2 METHODOLOGY

### 2.1 Clock Gating

Clock gating is a popular technique used in many synchronous circuits to reduce dynamic power dissipation. Different methods of clock gating are explained in [2]. Fig 1 shows the general functional block of a synchronous circuit, in which the clock is fed directly to the functional unit, so the clock signal is delivered to the module all the time. Fig 2 shows how clock gating is applied. The FCG (Functional Clock Gating) block generates an enable signal based on which operation has to be performed. The CGL (Clock Gating Logic), in its simplest form an AND gate, generates a GCLK (gated clock) signal from clk and the FCG signal, and GCLK is fed into the functional unit. Whenever the output of FCG is high, GCLK follows clk and the functional unit is enabled and performs its task. Whenever the output of FCG is low, GCLK is low, the functional unit does not perform any operation, and the registers, flip-flops and clock network inside the functional unit do not undergo any switching activity, which reduces overall power consumption.



Fig 1. General functional block diagram



Fig 2. Functional block diagram with clock gating
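The behaviour of the AND-based CGL can be illustrated with a small software model. The following Python sketch is not taken from the paper: the sampled waveform representation, the function names and the example enable pattern are assumptions made for illustration, and toggle counting is only a rough proxy for switching activity.

```python
def gated_clock(clk, enable):
    """AND-based clock gating logic (CGL): GCLK = clk AND enable."""
    return [c & e for c, e in zip(clk, enable)]


def count_toggles(signal):
    """Count level changes, a rough proxy for switching activity."""
    return sum(1 for prev, cur in zip(signal, signal[1:]) if prev != cur)


# Eight clock cycles, sampled twice per cycle (high phase, then low phase).
clk = [1, 0] * 8
# Hypothetical FCG output: the module is needed only during cycles 2 and 3.
enable = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

gclk = gated_clock(clk, enable)
print("clk toggles:", count_toggles(clk))
print("gclk toggles:", count_toggles(gclk))
```

In this model the enable changes only at cycle boundaries. In hardware, an enable that changes while clk is high can produce glitches on a plain AND gate, which is one reason the latch-based styles reviewed in [2] exist.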

### 2.2 Block Diagram of ALU

Fig 3 shows the top level RTL (Register Transfer Level) schematic of the 32 bit ALU, both with and without clock gating. To implement the ALU with clock gating, the same select lines that are used for the MUX can be used to derive the clock gating logic, so no extra inputs are required. Fig 4 shows the detailed schematic of the 8 bit ALU without clock gating and Fig 5 shows the detailed schematic of the 8 bit ALU with clock gating. Comparing Fig 4 and Fig 5, it is clear that clock gating is slightly more complex to implement, as FCG and CGL are needed for every module that is to be gated.



Fig 3. Top level RTL schematic of 32 bit ALU with and without clock gating



Fig 4. Detailed schematic of 8 bit ALU without clock gating

IJSER © 2015 http://www.ijser.org



Fig 5. Detailed schematic of 8 bit ALU with clock gating

The ALU implemented in this paper consists of 16 modules. Table 1 lists the ALU operations for the different combinations of the select lines. For the ADDER/SUBTRACTOR module, Cin must be 0 for addition and 1 for subtraction. For the other modules the value of Cin does not matter.

## 3 RESULTS AND DISCUSSION

The design targets FPGA implementation and was developed using the Xilinx ISE Design Suite 14.1 tool. The Artix-7 family was chosen as it has a sufficient number of bonded I/Os to support even the 32 bit ALU of this design. Although the design is implemented on an FPGA platform, the same design can be used for ASIC implementation.

Table 1: ALU OPERATIONS BASED ON SELECT LINES

| S3 | S2 | S1 | S0 | OPERATION   |
|----|----|----|----|-------------|
| 0  | 0  | 0  | 0  | AND         |
| 0  | 0  | 0  | 1  | OR          |
| 0  | 0  | 1  | 0  | XOR         |
| 0  | 0  | 1  | 1  | XNOR        |
| 0  | 1  | 0  | 0  | INVERT A    |
| 0  | 1  | 0  | 1  | ADD/SUB     |
| 0  | 1  | 1  | 0  | MULTIPLY    |
| 0  | 1  | 1  | 1  | COMPARATOR  |
| 1  | 0  | 0  | 0  | INC/DEC A   |
| 1  | 0  | 0  | 1  | INC/DEC B   |
| 1  | 0  | 1  | 0  | INVERT A    |
| 1  | 0  | 1  | 1  | SHIFT A/L L |
| 1  | 1  | 0  | 0  | SHIFT LR    |
| 1  | 1  | 0  | 1  | SHIFT AR    |
| 1  | 1  | 1  | 0  | ROTATE L    |
| 1  | 1  | 1  | 1  | ROTATE R    |
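Since the select lines that drive the MUX can also drive the clock gating logic, the FCG stage can be pictured as a 4-to-16 one-hot decoder. The Python sketch below is a behavioural illustration under that assumption, not the paper's RTL; the names `OPERATIONS`, `fcg_enables` and `selected_operation` are hypothetical, and the operation names are copied as printed in Table 1.

```python
# Operation names as printed in Table 1, indexed by the value of S3 S2 S1 S0.
OPERATIONS = [
    "AND", "OR", "XOR", "XNOR",
    "INVERT A", "ADD/SUB", "MULTIPLY", "COMPARATOR",
    "INC/DEC A", "INC/DEC B", "INVERT A", "SHIFT A/L L",
    "SHIFT LR", "SHIFT AR", "ROTATE L", "ROTATE R",
]


def fcg_enables(s3, s2, s1, s0):
    """Decode the four select lines into 16 one-hot module enables."""
    index = (s3 << 3) | (s2 << 2) | (s1 << 1) | s0
    return [int(i == index) for i in range(len(OPERATIONS))]


def selected_operation(s3, s2, s1, s0):
    """Return the name of the only module whose clock is left running."""
    return OPERATIONS[fcg_enables(s3, s2, s1, s0).index(1)]


enables = fcg_enables(0, 1, 0, 1)
print(enables)
print(selected_operation(0, 1, 0, 1))
```

Each enable would then be combined with clk in that module's CGL, so that for any select value exactly one of the 16 modules receives a toggling clock.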

The maximum operating frequency is 455 MHz for the 8-bit ALU, 410 MHz for the 16-bit ALU and 311 MHz for the 32-bit ALU, in each case both with and without clock gating. Running the power analysis above the maximum operating frequency would violate the timing constraints, so power is analyzed only up to 300 MHz, a frequency that all three ALU sizes can meet.

Tables 2 to 7 give the power analysis of the different ALU sizes with and without clock gating. All powers in Tables 2 to 7 are in watts. Comparing Table 3 with Table 2, both for the 8 bit ALU, shows a significant power reduction in the ALU where clock gating is used. For every ALU size, clock gating reduces all the components of dynamic power, i.e. clock power, logic power, signal power and IO power.

Table 2: POWER CONSUMPTION OF 8-BIT ALU WITHOUT CLOCK GATING. ALL POWERS ARE IN WATTS

| Frequency / Period | Clock Power | Logic Power | Signal Power | IO Power | Total Dynamic Power |
|--------------------|-------------|-------------|--------------|----------|---------------------|
| 100 MHz / 10 ns    | 0.004       | 0.002       | 0.002        | 0.009    | 0.017               |
| 150 MHz / 6.667 ns | 0.006       | 0.003       | 0.004        | 0.014    | 0.027               |
| 200 MHz / 5 ns     | 0.008       | 0.004       | 0.005        | 0.018    | 0.035               |
| 250 MHz / 4 ns     | 0.010       | 0.005       | 0.006        | 0.023    | 0.044               |
| 300 MHz / 3.33 ns  | 0.012       | 0.006       | 0.007        | 0.027    | 0.052               |


Table 3: POWER CONSUMPTION OF 8-BIT ALU WITH CLOCK GATING. ALL POWERS ARE IN WATTS

| Frequency / Period | Clock Power | Logic Power | Signal Power | IO Power | Total Dynamic Power |
|--------------------|-------------|-------------|--------------|----------|---------------------|
| 100 MHz / 10 ns    | 0.003       | 0.001       | 0.001        | 0.005    | 0.010               |
| 150 MHz / 6.667 ns | 0.005       | 0.002       | 0.002        | 0.008    | 0.017               |
| 200 MHz / 5 ns     | 0.006       | 0.002       | 0.003        | 0.011    | 0.022               |
| 250 MHz / 4 ns     | 0.008       | 0.003       | 0.004        | 0.014    | 0.029               |
| 300 MHz / 3.33 ns  | 0.009       | 0.003       | 0.004        | 0.017    | 0.033               |

Table 4: POWER CONSUMPTION OF 16-BIT ALU WITHOUT CLOCK GATING. ALL POWERS ARE IN WATTS

| Frequency / Period | Clock Power | Logic Power | Signal Power | IO Power | Total Dynamic Power |
|--------------------|-------------|-------------|--------------|----------|---------------------|
| 100 MHz / 10 ns    | 0.011       | 0.008       | 0.008        | 0.017    | 0.044               |
| 150 MHz / 6.667 ns | 0.016       | 0.011       | 0.012        | 0.025    | 0.064               |
| 200 MHz / 5 ns     | 0.021       | 0.015       | 0.015        | 0.034    | 0.085               |
| 250 MHz / 4 ns     | 0.027       | 0.019       | 0.020        | 0.042    | 0.108               |
| 300 MHz / 3.33 ns  | 0.040       | 0.022       | 0.027        | 0.051    | 0.140               |

Table 5: POWER CONSUMPTION OF 16-BIT ALU WITH CLOCK GATING. ALL POWERS ARE IN WATTS

| Frequency / Period | Clock Power | Logic Power | Signal Power | IO Power | Total Dynamic Power |
|--------------------|-------------|-------------|--------------|----------|---------------------|
| 100 MHz / 10 ns    | 0.005       | 0.002       | 0.003        | 0.010    | 0.020               |
| 150 MHz / 6.667 ns | 0.007       | 0.003       | 0.005        | 0.016    | 0.031               |
| 200 MHz / 5 ns     | 0.009       | 0.005       | 0.006        | 0.022    | 0.042               |
| 250 MHz / 4 ns     | 0.012       | 0.006       | 0.008        | 0.028    | 0.054               |
| 300 MHz / 3.33 ns  | 0.014       | 0.007       | 0.010        | 0.035    | 0.066               |

Fig 6 to Fig 8 give the graphical representation of total dynamic power for the 8-bit, 16-bit and 32-bit ALUs with and without clock gating. All powers indicated are in watts. With the inclusion of the clock gating circuit, the total dynamic power consumption at 300 MHz is reduced by 36.53% for the 8-bit ALU, by 52.85% for the 16-bit ALU and by 64.53% for the 32-bit ALU.
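The quoted percentages follow from the 300 MHz rows of the power tables. The short Python sketch below recomputes them; the function name and the dictionary layout are illustrative choices, not part of the paper, while the power values are the reported total dynamic powers in watts.

```python
def percent_reduction(without_gating, with_gating):
    """Relative saving in percent: (P_without - P_with) / P_without * 100."""
    return (without_gating - with_gating) / without_gating * 100.0


# Total dynamic power in watts at 300 MHz: (without gating, with gating).
TOTAL_DYNAMIC_300MHZ = {
    8: (0.052, 0.033),
    16: (0.140, 0.066),
    32: (0.406, 0.144),
}

reductions = {
    bits: percent_reduction(p_without, p_with)
    for bits, (p_without, p_with) in TOTAL_DYNAMIC_300MHZ.items()
}
for bits, value in reductions.items():
    print(f"{bits}-bit ALU: {value:.2f}% reduction")
```

The computed values agree with the figures quoted above to within rounding in the second decimal place.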

Table 6: POWER CONSUMPTION OF 32-BIT ALU WITHOUT CLOCK GATING. ALL POWERS ARE IN WATTS

| Frequency / Period | Clock Power | Logic Power | Signal Power | IO Power | Total Dynamic Power |
|--------------------|-------------|-------------|--------------|----------|---------------------|
| 100 MHz / 10 ns    | 0.033       | 0.029       | 0.033        | 0.032    | 0.127               |
| 150 MHz / 6.667 ns | 0.049       | 0.044       | 0.049        | 0.048    | 0.190               |
| 200 MHz / 5 ns     | 0.066       | 0.059       | 0.066        | 0.064    | 0.255               |
| 250 MHz / 4 ns     | 0.095       | 0.072       | 0.096        | 0.081    | 0.344               |
| 300 MHz / 3.33 ns  | 0.112       | 0.085       | 0.112        | 0.097    | 0.406               |



Table 7: POWER CONSUMPTION OF 32-BIT ALU WITH CLOCK GATING. ALL POWERS ARE IN WATTS

| Frequency / Period | Clock Power | Logic Power | Signal Power | IO Power | Total Dynamic Power |
|--------------------|-------------|-------------|--------------|----------|---------------------|
| 100 MHz / 10 ns    | 0.010       | 0.006       | 0.010        | 0.020    | 0.046               |
| 150 MHz / 6.667 ns | 0.014       | 0.008       | 0.015        | 0.032    | 0.069               |
| 200 MHz / 5 ns     | 0.019       | 0.011       | 0.020        | 0.044    | 0.094               |
| 250 MHz / 4 ns     | 0.024       | 0.014       | 0.024        | 0.057    | 0.119               |
| 300 MHz / 3.33 ns  | 0.028       | 0.017       | 0.029        | 0.070    | 0.144               |



Fig 6. Total dynamic power consumption of 8-bit ALU


It is also observed that, as the operating frequency is increased, the percentage of total dynamic power saved by the ALU implemented with clock gating decreases compared to the ALU implemented without clock gating, but the total dynamic power consumption of the ALU with clock gating never exceeds that of the ALU without clock gating.



Fig 7. Total dynamic power consumption of 16-bit ALU



Fig 8. Total dynamic power consumption of 32-bit ALU

## 4 TESTING AND VERIFICATION

For the 8 bit ALU, all the modules are individually verified for functionality over all possible test cases, with 100% coverage. After integration, the 8 bit ALU is verified for all test cases with 100% coverage. For the 16 bit ALU, all the modules are individually verified over all possible test cases with 100% coverage; after integration, the 16 bit ALU is verified with random test cases at 70% coverage. For the 32 bit ALU, all the modules are individually verified with random tests covering 60% of the test cases; after integration, the 32 bit ALU is verified with random tests covering 40% of the test cases. An automated test bench was developed for all the ALUs, which checked the result of every test case that was run. The 32 bit ALU was verified for only 40% of the test cases because the number of test cases was too large and would have required a vast amount of time.
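The growth in the number of test cases can be estimated with a short calculation. The Python sketch below assumes the ALU inputs are two operands of the given width, the four select lines and a one-bit Cin; that input list is inferred from the description above rather than stated as a port list in the paper. The helper names and the random stimulus generator are hypothetical and do not represent the authors' test bench.

```python
import random


def exhaustive_cases(width, select_bits=4, carry_bits=1):
    """Number of distinct input vectors for two operands, select and Cin."""
    return 2 ** (2 * width + select_bits + carry_bits)


def random_vectors(width, count, seed=0):
    """Yield reproducible random (a, b, select, cin) stimulus tuples."""
    rng = random.Random(seed)
    for _ in range(count):
        yield (rng.getrandbits(width), rng.getrandbits(width),
               rng.getrandbits(4), rng.getrandbits(1))


for width in (8, 16, 32):
    print(f"{width}-bit ALU: {exhaustive_cases(width):,} input vectors")

# A handful of random stimulus tuples for the 32-bit case.
sample = list(random_vectors(32, 5))
print(sample)
```

Under these assumptions the input space grows from 2^21 vectors for the 8 bit ALU to 2^69 for the 32 bit ALU, which illustrates why exhaustive simulation is practical only for the smallest size and random sampling is used for the larger ones.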

## 5 CONCLUSION

In this paper, energy efficient ALUs have been designed for different ALU sizes using the clock gating concept. Clock gating can be applied efficiently to reduce all the components of dynamic power, i.e. clock power, logic power, signal power and IO power. It is seen that the percentage of total dynamic power saved by clock gating increases with the operand size of the ALU. Also, as the operating frequency is increased, the percentage of total dynamic power saved by the ALU implemented with clock gating decreases compared to the ALU implemented without clock gating, but the total dynamic power consumption of the ALU with clock gating never exceeds that of the ALU without clock gating.

## 6 REFERENCES

- [1] Bishwajeet Pandey, Jyotsana Yadav, M. Pattanaik, and Nitish Rajoria, "Clock Gating Based Energy Efficient ALU Design and Implementation on FPGA", Energy Efficient Technologies for Sustainability (ICEETS), 2013 International Conference, pp. 93 - 97, 2013.
- [2] Jagrit Kathuria, M. Ayoubkhan, and Arti Noor, "A Review of Clock Gating Techniques", MIT International Journal of Electronics and Communication Engineering, pp. 106 - 114, 2011.
- [3] J. P. Oliver, J. Curto, D. Bouvier, M. Ramos, and E. Boemo, "Clock gating and clock enable for FPGA power reduction", 8th Southern Conference on Programmable Logic (SPL), pp. 1 - 5, 2012.
- [4] Yu Zhou and Hui Guo, "Application Specific Low Power ALU Design", IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, pp. 214 - 220, 2008.
- [5] Na Gong, Jinhui Wang, and Ramalingam Sridhar, "Application-Driven Power Efficient ALU Design Methodology for Modern Microprocessors", Quality Electronic Design (ISQED), 2013 14th International Symposium, pp. 184 - 188, 2013.
- [6] M. Ramalatha, K. D. Dayalan, P. Dharani, and S. D. Priya, "High Speed Energy Efficient ALU Design using Vedic Multiplication Techniques", Advances in Computational Tools for Engineering Applications (ACTEA), pp. 600 - 603, 2009.
- [7] Anvesh Kumar and Ashish Raman, "Low Power ALU Design by Ancient Mathematics", Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference, pp. 862 - 865, 2010.