A FPGA Implementation of High Security Hybrid Reconfigurable Cryptographic Processor with RSA and SEA

A. Chitra, K. Sangeethalakshmi, P. Sivalakshmi, R. Shimila

Abstract - Data security is in Demand in everyday life of Digital World, since Digital data’s can be reproduced much easily. To achieve the maximum security required a Parallel Processing. User Reconfigurable Cryptographic RISC Microprocessor is proposed in our paper. Rather than protecting the data using tools and external codes, a microprocessor is specially designed in our project to offer maximum digital security. Cryptographic processor can be classified either as asymmetric cryptography or a symmetric cryptography processor. Asymmetric cryptography has the advantage of Reception security but has the limitation of High resource Utilization. And a symmetric cryptography processor has the limitation of single key security but comparatively has the advantages of low area, resource and power consumption. Thus in this project we are proposing Hybrid architecture in which both the advantage of asymmetric and symmetric cryptographys are combined. For implementation, Asymmetric RSA cryptography and a symmetric lightweight SEA encryption is combined to mutate a reconfigurable Cryptographic processor.

Index words - Data security, Reduced instruction set Computer (RISC), Reconfigurable Architecture, Cryptographic Processor, Scalable Encryption Algorithm (SEA), Rivest-Shamir-Adelman (RSA) cryptosystem.

1 INTRODUCTION

Today more and more sensitive data is stored digitally. Bank accounts, medical records and personal emails are some categories that data must keep secure; this has made cryptography an important research topic. Cryptography is used for confidentiality, authentication, data integrity, and non-repudiation, which can be divided into two families: secret-key cryptography [1]-[3] and public-key cryptography [4]-[6]. Secret-key cryptography, which usually has a relatively compact architecture and smaller key size than public-key cryptography, is often used to encrypt/decrypt sensitive information or documents. Public-key cryptography offers fundamental technology for key agreement, encryption/decryption (two keys), and digital signatures. The use of systems with increasing complexity, which usually are more secure, has a result low throughput rate and more energy consumption. However the evolution of cipher has no practical impact, if it has only theoretical background. Every encryption algorithm should exploit as much as possible the conditions of the specific system without omitting the physical, area and timing limitations. This fact requires new ways in design architectures for secure and reliable crypto systems. After the advent of the internet and especially nowadays, security of data and protection of privacy have become a major concern for everyone’s life, although mathematicians and researchers have been trying to address this problem since the end of the World war II, when cryptology science came out from the ambit of the army to enter the Bell laboratories. DES, RSA and PGP are only the tip of the iceberg of a vast amount of cryptographic algorithms developed by those scientists. The software implementations are 10-100 slower than the hardware.

The prevalence of microprocessors in all aspects of everyday life has led to information being easily and widely distributable in digital format. However, this information can be confidential or copyrighted material, or only intended for access by selected individuals. Therefore it is necessary that microprocessors are available which are capable of efficiently encrypting this information, without detriment to performance. RISC microprocessors are used abundantly in numerous communication systems, such as smart cards, mobile handsets, set-top boxes (STBs) and Personal Digital Assistants (PDAs). Implementing public-key cryptosystems on a general-purpose processor (GPP) is flexible because a variety of cryptosystems can be used at runtime. A drawback of GPP realizations that it generally results in a lower throughput rate and larger power consumption. Considerable effort has been directed toward a fast realization of cryptography algorithms consisting of very large integer operands (up to 4096 bits). For real-time applications, a dedicated hardware implementation is required to speed up the computation of cryptosystems. In particular, an application-specific integrated circuit (ASIC) solution generally leads to a higher throughput rate at a lower cost, but it is inflexible; therefore, it is only applied to a limited subset of cryptosystems. To mitigate the gap between GPP and ASIC realization, application-specific cryptographic processors have been proposed [7]-[9], each with its own instruction-set architecture, data path design, and target applications.


2 PROPOSED CRYPTOGRAPHIC
PROCESSOR

Protecting the digital data through encryption using tools and external codes are highly cost effective and also results in performance degradation. To achieve much efficiency in encryption a reconfigurable cryptographic microprocessor is designed in this project to offer maximum digital security. With the conventional design of SEA [10] and RSA [6] standards as supporting co-processors, a logic module is also been implemented in the design to ensure the robustness of this processor design to serve all kind of encryption and decryption needs. A typical CPU unit with RAM, ALU, PC, Register bank and Buses are included as prioritized units for utilizing the Cryptographic co-processors, which consists of Parallel Processing Unit, Bit permutation unit, sequencing cache and Byte permutation units. A sophisticated instruction sets will be derived to issue control signals to the main processor to initiate and control cryptographic operations. The performance evaluation of this processing design will be analyzed through a programmable FPGA kit.

A cryptographic processor can be classified either as asymmetric cryptography [8] or a symmetric cryptography processor [16]. Thus Asymmetric cryptography has the advantage of Reception security but has the limitation of High resource Utilization. And a symmetric cryptography processor has the limitation of single key security but comparatively has the advantages of low area, resource and power consumption.

3 ALGORITHM AND FORMULATIONS

In this section a brief introduction about RSA and SEA algorithms are provided.

3.1 RSA ALGORITHM

In 1977 Rivest and Adelman [] introduced what has become one of the most successful and widely used public key cryptosystem based on computing modular exponentiations. The algorithm has been implemented in many commercial applications It is based on a very simple number-theoretical idea, and yet it has been able to resist all cryptanalytic attacks. The idea is a clever use of the fact that, while it is easy to multiply two large primes, it is extremely difficult to factorize their product. Thus, the product can be publicized and used as the encryption key. The primes themselves cannot be recovered from the product and are used for decryption. As it was proposed [8].

3.1.1 STEPS IN RSA ALGORITHM

- Generate two large random primes, p and q. Compute n = pq and (φ) phi = (p-1)(q-1). n is known as the modulus
- Choose an integer e, 1 < e < phi , such that gcd (e, phi), e is the encryption exponent.
- Compute the decryption exponent d, 1 < d < phi , such that d = e-1 Mod phi.
- The public key is (n, e) and the private key is (n, d).
- Keep all the values d, p, q and phi secret.
- Send the public key to transmitter and secret key to receiver.
- Transmitter encrypt the original message, c = me Mod n, and send the cipher text to receiver.
- Receiver decrypt cipher text by, m = cd Mod n, and retrieve the original message.

3.1.2 MONTGOMERY MULTIPLICATION ALGORITHM

The two operations the RSA encryption and decryption are mutually inverse. This indicates that the original data can be recovered through the RSA encrypt/decrypt process. The operation of modular exponentiation is carried out iteratively by repeating a modular multiplication and modular squaring operation. The modular multiplication and squaring algorithm are based on Montgomery algorithm. The Montgomery multiplication [14]-[15] requires two multipliers first one is used for modular squaring and second using for modular multiplication. Registers are used to store the result from the parallel and serial way. The parallel register associated with the squarer is used to feed the parallel operand to the squarer while the serial register is used to feed the serial input to the squarer and multiplier. The parallel register associated with the multiplier is used to store the result from the multiplier, which is used as a parallel input to the multiplier.
The Montgomery multiplication is carried out as follows,

\[ C = ME \mod N; \]
\[ P_0 = 2^{n+2} \mod N; \]
\[ C_0 = 2^{n+2} \mod M, C = P_n \]
For \( i = 0 \) to \( n - 1 \);
\[ C_{i+1} = C_i^2 \mod N; \]
if \( e_i = 1 \) then \( P_{i+1} = P_i C_i \mod N, \) else \( P_{i+1} = P_i \).

Let message \( M = 10 \), Public key exponent \( e = 5 \), \( n = 15 \). Without
Montgomery multiplication the equation is calculated as, \( M^e \mod n \Rightarrow 10^5 \mod 15 \Rightarrow (10 \cdot 10 \cdot 10 \cdot 10 \cdot 10) \mod 15 \), which is very difficult to calculate and large buffer
size is needed to calculate the modular exponentiation. But
with Montgomery multiplication algorithm, using squaring
and repeated multiplication reduces the memory size. \( 10^2 \mod 15 = I_1; (I_1 \cdot 10) \mod 15 = I_2; (I_2 \cdot 10) \mod 15 = I_3 \) \( (I_3 \cdot 10) \mod 15 = \) cipher text. Thus the memory required to store the
intermediate results will not exceed more than \( N \), Which will
reduce the overall chip size by reducing the length of the
register size. But it will increase the number of operation to
get the cipher as well as decrypted text.

### 3.2. SEO ALGORITHM

Scalable encryption algorithm (SEA) is a parametric block
cipher for resource-constrained systems (example sensor
networks RFID). It was initially designed as a low-cost
encryption/authentication routine (i.e., with small code size
and memory) targeted for processors with a limited
instruction set (i.e., AND, OR, XOR gates, word rotation,
and modular addition). This algorithm takes the plaintext,
key, and the bus sizes as parameters and, therefore, can be
straightforwardly adapted to various implementation
contexts and/or security requirements.

#### 3.2.1 PARAMETERS AND DEFINITIONS

\( SEAn, b \) operates on various text, key and word
sizes. It is based on a Feistel structure with a variable number
of rounds, and is defined with respect to the following
parameters:

- \( n \): plaintext size, key size.
- \( b \): processor (or word) size.
- \( nb = 2n \): number of words per Feistel branch.
- \( nr \): number of block cipher rounds.

Let \( x \) be an \( n/2 \) bit. In the following we will consider two
representations,

- Bit representation: \( x = x(n - 1) x(n - 2) \ldots x(2) x(1) x(0) \).
- Word representation: \( xW = x_{nb-1} x_{nb-2} \ldots x_2 x_1 x_0 \).

#### 3.2.2 BASIC OPERATIONS

Due to simplicity constraints, SEA is based on a limited
number of elementary operations denoted as follows:

**3.2.2.1 BITWISE XOR \( \oplus \):**

The bitwise XOR is defined on \( n/2 \) vectors,
\( \oplus: Z_{n/2}^2 \times Z_{n/2}^2 \rightarrow Z_{n/2}^2 \)
\( x, y \rightarrow z \leftrightarrow x \oplus y, 0 \leq i \leq n-1 \).

**3.2.2.2 SUBSTITUTION BOX \( S \):**

For efficiency purposes, it is applied bitwise to any set of
three words of data using the following recursive definition:
\( S: Z_{nb2b} \rightarrow Z_{nb2b} \)
\( x \rightarrow x = S(x) \leftrightarrow x_{3i+1} = (x_{3i+2} \land x_{3i}) \oplus x_{3i+1}, \)
\( x_{3i+2} = (x_{3i+2} \lor x_{3i}) \oplus x_{3i+2}, \)
\( 0 \leq i \leq nb/3 - 1 \).

**3.2.2.3 WORD ROTATION \( R \):**

The word rotation is defined on \( nb \)-word vectors:
\( R: Z_{nb2b} \rightarrow Z_{nb2b} \)
\( x \rightarrow y = R(x) \leftrightarrow y_{i+1} = y_i, 0 \leq i \leq nb - 2, \)
\( y_0 = x_{nb-1} \).

**3.2.2.4 BIT ROTATION \( R \):**

The bit rotation is defined on \( nb \)-word vectors:
\( R: Z_{nb2b} \rightarrow Z_{nb2b} \)
\( x \rightarrow y = R(x) \leftrightarrow y_{3i+2} = (y_{3i+2} \land x_{3i}) \oplus x_{3i+2}, \)
\( 0 \leq i \leq nb/3, \) where
\( x_{3i+2} = x_{3i+2} \land x_{3i}, x_{3i+1} = x_{3i+1} \land x_{3i}, \)
\( x_{3i} = x_{3i} \lor x_{3i}, \)
\( 0 \leq i \leq nb/3 - 1 \).

**3.2.2.5 ADDITION MOD2B \( \Box \):**

The mod 2b addition is defined on \( nb \)-word vectors:
\( \Box: Z_{nb2b} \times Z_{nb2b} \rightarrow Z_{nb2b} \)
\( x \rightarrow y = x \Box y \leftrightarrow z = x \Box y, 0 \leq i \leq nb - 1 \).
3.2.2.6 The Round and Key Round: Based on the previous definitions, the encrypt round FE, decrypt round FD and key round FK are pictured in Figure 3 and defined as the functions:

\[ \text{FE}(L_i, R_i, K_i) = (L_{i+1}, R_{i+1}) \]

\[ \text{FD}(L_i, R_i, K_i) = (L_i, R_{i-1}) \]

\[ \text{FK}(K_{L_i}, K_{R_i}, C_i) = (K_{L_{i+1}}, K_{R_{i+1}}) \]

4 HARDWARE DESCRIPTION

Fig 4: Reconfigurable Cryptographic Processor architecture

Fig-4 depicts the block diagram of the proposed reconfigurable cryptographic processor (RCP). The core of our RCP includes 1024 reconfigurable data path, which is used to carry out the required cryptographic operations. That is the functionality of the data path can be reconfigured for performing both symmetric SEA and asymmetric RSA.

4.1 RCP Architecture

As shown in Fig 4, the RCP consists of four main blocks controller unit, registers, I/O interface, and reconfigurable cryptographic unit. The choice of 32-bit system bus is to be compatible with most popular buses. The operand size and internal bus used in the RCP allows users to handle up to 1024-bit operations for cryptography applications. Note that buffers are required to translate data to/from the processor. The controller is responsible for controlling data path and for implementing cryptography algorithms. There are two units in the controller Random logic selection unit and Length and shut down controller. The random selection unit is to select the required cryptographic operation and the shut down controller is used to shut down the unused data path to save power. The registers are designed to store the initial values of algorithmic variables and intermediate results. The reconfigurable data path consists of six local registers (P, Q, N, Z, Pub, Pri), a parallel register, full adder and 1024 reconfigurable cells. The RCs can reconfigure to perform specified cryptographic operations and to set their interconnection with registers P, Q, N, Z, Pub, and Pri. The I/O interface is used for two purposes.
The first one is to transfer data between the system bus and internal bus. The second one aligns data I/O data depending on the predetermined precision, i.e., operand size. Fig 5 shows the I/O structure, which consists of 32-32 buffers and an I/O bus controller for bus selection. Moving 1024 bits of data to/from the RCP thus requires 32 cycles.

5. RESULTS AND PERFORMANCE

The proposed work is implemented in the Quartus II version 10.0 and family cyclone II. The following tables table I, II and III show the synthesis details of RSA.

<table>
<thead>
<tr>
<th>TABLE 1</th>
<th>RSA Mapping summary</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total Logic Elements</td>
<td>56</td>
</tr>
<tr>
<td>Dedicated logic Registers</td>
<td>38</td>
</tr>
<tr>
<td>Combinational functions</td>
<td>56</td>
</tr>
<tr>
<td>Total Pins</td>
<td>17</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>TABLE 2</th>
<th>RSA Power Analysis summary</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total thermal Power dissipation</td>
<td>70.41 mW</td>
</tr>
<tr>
<td>Core Static power Dissipation</td>
<td>47.36 mW</td>
</tr>
<tr>
<td>I/O thermal power dissipation</td>
<td>23.06 mW</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>TABLE 3</th>
<th>RSA Timing Analysis</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency</td>
<td>177 MHZ</td>
</tr>
<tr>
<td>Worst case tco</td>
<td>8.410 ns</td>
</tr>
<tr>
<td>Clock set up</td>
<td>5.649 ns</td>
</tr>
</tbody>
</table>

The following tables show the synthesis summary of SEA algorithm implemented in Quartus II version 10.0 and family Stratix II.

<table>
<thead>
<tr>
<th>TABLE 4</th>
<th>SEA Mapping Summary</th>
</tr>
</thead>
<tbody>
<tr>
<td>Combinational ALUTs</td>
<td>368</td>
</tr>
<tr>
<td>Total pins</td>
<td>196</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>TABLE 5</th>
<th>SEA Power Analysis Summary</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total thermal Power dissipation</td>
<td>330.84 mW</td>
</tr>
<tr>
<td>Core Static power Dissipation</td>
<td>303.05 mW</td>
</tr>
<tr>
<td>I/O thermal power dissipation</td>
<td>27.80 mW</td>
</tr>
</tbody>
</table>

6 CONCLUSIONS

A Hybrid Reconfigurable cryptographic processor that supports both the RSA and SEA cryptographic operations using reconfigurable data path was presented. The reconfigurable data path merged the RSA and SEA to obtain double security to meet the high security demands. Experimental results show that the developed cryptographic processor exhibits obvious speed and performance advantages in comparison with related works and offers more security.

REFERENCE


