**Research Article** 

# Highly efficient design of SDRAM-based CTM

ISSN 1751-858X<br/>Received on 20th March 2018<br/>Revised 3rd January 2019<br/>Accepted on 15th February 2019<br/>E-First on 30th May 2019

Chenguang Guo<sup>1,2</sup>, Jiancheng Xu<sup>1</sup>, Wenyao Xu<sup>3</sup>

<sup>1</sup>School of Electronics and Information, Northwestern Polytechnical University, No.127 West Youyi Road, Beilin District, Xi'an, People's Republic of China

<sup>2</sup>Third Design Department, Beijing Microelectronics Technology Institute, No.J07 Business Park Building, High-tech Zone, Xi'an, People's Republic of China

<sup>3</sup>Department of Computer Science & Engineering, University at Buffalo, the State University of New York, No.330 Davis Hall, Buffalo, USA

E-mail: nvs\_2016@126.com

**Abstract:** Radar raw echo data are usually very large in volume, and a synthetic aperture radar (SAR) imaging system needs an efficient matrix transpose to achieve real-time performance. Modern real-time SAR systems therefore require high-speed, large-capacity devices, usually SDRAM chips, to store the raw echo data and to solve the corner turning problem with an efficient matrix transpose method. By designing data interleaving patterns and controlling the command cycles of the SDRAM chips appropriately, this paper presents a novel matrix transpose method that improves the efficiency of the corner turning memory (CTM) in real-time SAR imaging systems. Board-level verification shows that the efficiency of the new matrix transpose method exceeds 99%, which is higher than that of other typical SDRAM-based CTM design methods and makes it more suitable for real-time SAR imaging systems.

# 1 Introduction

Raw echo data are usually collected into a synthetic aperture radar (SAR) imaging system in the form of a two-dimensional matrix. For a real-time SAR imaging system, image processing time is critical; otherwise the target may be lost. Meanwhile, to obtain high image resolution, the two-dimensional matrix must be decoupled into range line data and azimuth line data, and the matrix data must be converted between range lines and azimuth lines many times by the various SAR image processing algorithms. Matrix transpose is therefore indispensable to range-azimuth line conversion, and the transpose efficiency can determine the real-time performance of the SAR imaging system [1–5].



Fig. 1 RD and CS algorithms

Matrix transpose can be thought of as a mapping between two matrices, $X_{i,j} \rightarrow X_{j,i}$, and is widely used in multidimensional image and signal processing systems such as SAR imaging systems [6, 7]. The transpose is simple when the matrix is small and the data can easily be read into internal memory. For large-matrix applications such as radar image processing, the corner turning problem is essential [8, 9], and the data must be stored in external memories, usually SDRAM chips, which offer larger storage capacity and higher storage speed than SRAM, MRAM etc.

Taking the most commonly used SAR imaging algorithms as examples, the Range-Doppler (RD) and Chirp-Scaling (CS) processing algorithms require several range-azimuth line conversions [10, 11], as shown in Fig. 1. The nodes in Fig. 1 mark the places where the matrix needs to be transposed. Because radar echo data are received continuously, SAR image frames need to be processed in time. Each frame is transposed at least once, and each transpose takes considerable memory access time. Therefore, the efficiency of the corner turning memory (CTM) has a great influence on a real-time SAR imaging system.

SDRAM chips are among the most widely used memories in SAR imaging because of their large storage capacity and high speed. There are many CTM design methods based on SDRAM chips, such as the two-frame method [12], the three-frame method [13], the sub-block matrix mapping method, and so on [14–19]. All of these methods are effective in balancing data input and output. However, none of them makes full use of the device characteristics of SDRAM chips. Additional command cycles generated by page switches, reads, writes, refreshes, and other operations reduce the transpose efficiency, which can degrade the real-time performance of the SAR imaging system.

Based on the device characteristics of SDRAM chips and the rules of two-dimensional matrix transpose, novel read and write interleaving patterns are designed here. The command cycles of the SDRAM chips are fully utilised so that the data bus utilisation is close to full bandwidth, and the refresh operation of the SDRAM chips is avoided.

The rest of this paper is organised as follows. Section 2 introduces the traditional SDRAM-based CTM design methods. Then, a highly efficient SDRAM-based CTM design method is given in Section 3. Section 4 presents a board-level verification platform and its simulation results. Finally, conclusions are drawn in Section 5.

doi: 10.1049/iet-cds.2018.5037

www.ietdl.org

Fig. 2 Two-frame structure CTM

Fig. 3 Three-frame structure CTM

**Fig. 4** 3D architecture of the SDRAM chip (MT48LC8M32B2)

# 2 Traditional SDRAM-based CTM design methods

After a complete two-dimensional matrix has been received line by line, it can be read out in columns. Here, the two-frame method, the three-frame method, and the sub-block matrix mapping method are taken as examples to illustrate the principles of the traditional SDRAM-based CTM design methods.

#### 2.1 Two-frame method

Reference [12] presents a two-frame structure CTM, as shown in Fig. 2. A memory is divided into two equal pages. The X-direction length of each page equals the number of samples ( $N_R$ ) used in range line processing, while the Y-direction length of each page equals half the number of FFT points ( $N_A$ ) used in azimuth line processing. Adjacent image frames use different input and output orders. For example, if the previous frame is written by row and read by column, then the next frame is written by column and read by row, and vice versa. The read speed is twice the write speed.

Because reading and writing take place simultaneously, the two-frame structure CTM needs dual-port memories, which usually have small storage capacity and a high price. A large number of dual-port memories increases the volume and power consumption of the SAR imaging system, so dual-port memory is not suitable for large-capacity applications. In addition, since the read and write speeds differ, the SAR imaging system operates in two clock domains and needs additional logic resources such as FIFOs. Furthermore, if the SAR imaging system uses SDRAM chips as its external memory, the memory may need to be refreshed periodically to maintain data availability, which may reduce the matrix transpose efficiency.

#### 2.2 Three-frame method

Reference [13] presents a three-frame structure CTM, as shown in Fig. 3. A memory is divided into three equal pages. As in the two-frame structure CTM, the X-direction length of each page equals the number of samples ( $N_R$ ) used in range line processing, while the Y-direction length of each page equals half the number of FFT points ( $N_A$ ) used in azimuth line processing. Input and output take place at the same time. When one of the three pages is written by row, the other two pages are read by column. The read speed is twice the write speed.

The three-frame structure CTM also works in two clock domains and needs additional logic resources such as FIFOs. Meanwhile, if the SAR imaging system uses SDRAM chips as its external memory, the refresh operation cannot be ignored.

#### 2.3 Sub-block matrix mapping method

The control commands of SDRAM chips include ACTIVE, READ, WRITE, PRECHARGE, NOP, and so on [20]. Owing to the characteristics of SDRAM chips, access efficiency is high for continuous access and very low for discrete access.

In the sub-block matrix mapping method, the original matrix is divided into many sub-block matrices. If the number of consecutive accesses after each ACTIVE command increases, fewer ACTIVE and PRECHARGE commands are required for the whole matrix, and the impact of the control commands on access efficiency is reduced. The fewer clock cycles the control commands occupy, the higher the access efficiency. This is the basic principle of the SDRAM-based sub-block matrix mapping method.
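The effect of sub-block mapping on the number of ACTIVE commands can be illustrated with a minimal sketch. It uses a simplified model that is not taken from References [14–19]: one matrix element per memory word, a 512-word page per SDRAM row, and a hypothetical 16 × 32 tile stored in each page. The function names and the tile shape are illustrative assumptions only.

```python
def page_switches(addresses, page_words=512):
    """Count how often consecutive accesses land in a different page.

    In this simplified model every switch costs one PRECHARGE/ACTIVE pair.
    """
    switches = 0
    current = None
    for addr in addresses:
        page = addr // page_words
        if page != current:
            switches += 1
            current = page
    return switches


def row_major(r, c, n):
    """Linear address of element (r, c) when the matrix is stored row by row."""
    return r * n + c


def sub_block(r, c, n, bh=16, bw=32):
    """Linear address when each bh x bw tile fills one page (bh * bw words)."""
    tiles_per_row = n // bw
    tile = (r // bh) * tiles_per_row + (c // bw)
    return tile * (bh * bw) + (r % bh) * bw + (c % bw)


N = 4096  # matrix dimension used in this paper

# Reading one column (the transposed direction)
col_row_major = page_switches(row_major(r, 0, N) for r in range(N))
col_sub_block = page_switches(sub_block(r, 0, N) for r in range(N))

# Writing one row
row_row_major = page_switches(row_major(0, c, N) for c in range(N))
row_sub_block = page_switches(sub_block(0, c, N) for c in range(N))
```

Under these assumptions, a column read needs 4096 page switches with row-major storage but only 256 with the tiled mapping, while a row write rises from 8 to 128 switches. The mapping trades a small loss in one direction for a large gain in the other.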

References [14–19] differ in detail, but they all use the sub-block matrix mapping method to realise matrix transpose. By reducing the number of page switches and increasing the number of consecutive accesses, they reduce the ACTIVE and PRECHARGE command cycles, so the control commands occupy only a few clock cycles. However, none of these works makes full use of the device characteristics of SDRAM chips, and their data bus utilisations, shown in Section 4, are not close to full bandwidth.

# 3 Highly efficient SDRAM-based CTM design method

Taking as an example the SDRAM chip MT48LC8M32B2, whose architecture and control commands are similar to those of other SDRAM chips, Fig. 4 shows its three-dimensional (3D) architecture. A read or write operation has three steps: ACTIVE, READ/WRITE, and PRECHARGE. Because each operation has a time delay, the no-operation command (NOP) is also needed [20].

To explain the principle of the highly efficient CTM design method, this paper selects a two-dimensional SAR echo data matrix. The real part and the imaginary part of each datum in the matrix are both 32 bits wide. The size of the matrix is $4096 \times 4096$, so each image frame is 1 Gb. Using burst length (BL) 4 mode and a CAS latency (CL) of 2 clocks, this paper takes the SDRAM chip MT48LC8M32B2, with a capacity of 256 Mb, as the transpose memory. The chip has four banks, as shown in Fig. 4, and the size of each bank is $4096 \times 512 \times 32$ bits. To maintain data availability, the SDRAM chip requires 4096 refresh cycles every 64 ms [20].
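The capacity figures above can be checked with a short calculation. This is a sketch that assumes binary prefixes (1 Gb = 2^30 bits, 1 Mb = 2^20 bits); the variable names are illustrative.

```python
ROWS = COLS = 4096          # matrix size stated in the text
BITS_PER_DATUM = 32 + 32    # 32-bit real part plus 32-bit imaginary part

frame_bits = ROWS * COLS * BITS_PER_DATUM

bank_bits = 4096 * 512 * 32  # rows x columns x word width of one bank
chip_bits = 4 * bank_bits    # four banks per chip

GIBI = 2**30  # assumption: "Gb" is read as 2^30 bits
MEBI = 2**20  # assumption: "Mb" is read as 2^20 bits

frame_gb = frame_bits / GIBI               # frame size in Gb
chip_mb = chip_bits / MEBI                 # chip capacity in Mb
chips_per_frame = frame_bits // chip_bits  # chips needed to hold one frame
```

With these assumptions, one frame is 1 Gb, one chip is 256 Mb, and four chips hold one frame.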

Fig. 5 shows the architecture of the highly efficient CTM design. It consists of eight MT48LC8M32B2 chips and one memory management unit (MMU). The MMU is mainly used for data stream switching and SDRAM address signal generation. If the SDRAM chips on the left side (SDRAM1~SDRAM4) are used to process the previous frame, those on the right side (SDRAM5~SDRAM8) are used to process the next frame, and vice versa. With this ping-pong operating strategy, the two sides work on the same principle. To combine the real part and the imaginary part of each datum conveniently, each side is divided into two groups.



Fig. 5 Architecture of the highly efficient CTM design

Take the two groups of SDRAM chips on the left side as an example. To make full use of the ACTIVE and PRECHARGE command cycles, the write operation alternates between Group 1 and Group 2, and the read operation circulates among the four banks (BANK0~BANK3), as shown in Figs. 6a and b. Therefore, for both reading and writing, the valid data are seamlessly connected, and the data access efficiency is maximised.

The refresh operation has two steps, ACTIVE and PRECHARGE, as shown in Fig. 6*c*, both of which are already included in each read or write operation. This means that each read or write operation also refreshes the selected row. Since access to the SDRAM chips is continuous, if each read or write operation activates a different row, traversing all rows of Group 1 and Group 2 takes about 0.33 ms, which is far less than the 64 ms refresh period. Therefore, an explicit refresh operation can be avoided.
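The text does not spell out how the 0.33 ms figure is obtained. The sketch below shows one set of assumptions that gives a value of that size: a 100 MHz clock (the operating frequency reported in Section 4), 4096 rows, one BL = 4 burst per row visit, and two groups taking turns so that each group opens a new row every 8 clock cycles. These assumptions are illustrative and are not stated in this form in the paper.

```python
CLOCK_HZ = 100e6   # operating frequency reported in Section 4
ROWS = 4096        # rows to traverse (matches the 4096 refresh cycles)
BURST_LENGTH = 4   # BL = 4, one burst occupies 4 data cycles
GROUPS = 2         # assumption: Group 1 and Group 2 alternate burst by burst

# Assumption: a given group activates a new row once per round of the
# alternation, i.e. every BURST_LENGTH * GROUPS clock cycles.
cycles_per_row_visit = BURST_LENGTH * GROUPS

traversal_ms = ROWS * cycles_per_row_visit / CLOCK_HZ * 1e3
refresh_period_ms = 64.0
margin = refresh_period_ms / traversal_ms  # how many traversals fit in 64 ms
```

Under these assumptions the traversal takes about 0.328 ms, so every row is revisited roughly 195 times within one 64 ms refresh period.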

The original matrix data are written by row and read out by column. During writing, the write switches between Group 1 and Group 2 after every four data, and switches among the four banks (BANK0~BANK3) after every four rows of data, which are written to the same bank number. For convenience of description, every four rows of data are divided into two groups (S and S'). Thus, each frame consists of 2048 groups (S0~S1023, S'0~S'1023); the S groups are written to Group 1, and the S' groups are written to Group 2. Because the write operation alternates between Group 1 and Group 2, the valid data are written seamlessly, as shown in Fig. 6a.

The interleaving pattern within Group 1 and Group 2 is shown in Table 1, and the interleaving pattern between Group 1 and Group 2 is shown in Table 2. The image data are denoted by d(x, y), where the abscissa x represents the range line and the ordinate y represents the azimuth line.

During reading, the MMU needs to generate only the starting address, as shown in Fig. 6b. Following the data arrangement of the writing process, the read operation is first performed in Group 1. After every four data are read out, the read switches among the four banks (BANK0~BANK3), and after every four columns of data are read out, it switches between Group 1 and Group 2, until all the matrix data have been read out. Because the read operation circulates among the four banks, the valid data are read out seamlessly with no wasted bus cycles, as shown in Fig. 6b.

**Fig. 6** SDRAM access operations (CL = 2, BL = 4) (*a*) Write operation, (*b*) Read operation, (*c*) Refresh operation

#### Table 1 Interior-group interleaving pattern

| Group 1: S0 |             |             |             |
|-------------|-------------|-------------|-------------|
| d(0:3,0)    | d(0:3,1)    | d(0:3,2)    | d(0:3,3)    |
| d(0:3,8)    | d(0:3,9)    | d(0:3,10)   | d(0:3,11)   |
| ...         | ...         | ...         | ...         |
| d(0:3,4088) | d(0:3,4089) | d(0:3,4090) | d(0:3,4091) |

| Group 2: S'0 |             |             |             |
|--------------|-------------|-------------|-------------|
| d(0:3,4)     | d(0:3,5)    | d(0:3,6)    | d(0:3,7)    |
| d(0:3,12)    | d(0:3,13)   | d(0:3,14)   | d(0:3,15)   |
| ...          | ...         | ...         | ...         |
| d(0:3,4092)  | d(0:3,4093) | d(0:3,4094) | d(0:3,4095) |

#### Table 2 Inter-group interleaving pattern

| BANK0   |      |     |        | BANK1 |      |     |        | BANK2 |      |     |        | BANK3 |      |     |        |
|---------|------|-----|--------|-------|------|-----|--------|-------|------|-----|--------|-------|------|-----|--------|
| Group 1 |      |     |        |       |      |     |        |       |      |     |        |       |      |     |        |
| S0      | S32  | ... | S992   | S1    | S33  | ... | S993   | S2    | S34  | ... | S994   | S3    | S35  | ... | S995   |
| S4      | S36  | ... | S996   | S5    | S37  | ... | S997   | S6    | S38  | ... | S998   | S7    | S39  | ... | S999   |
| ...     |      |     |        |       |      |     |        |       |      |     |        |       |      |     |        |
| S28     | S60  | ... | S1020  | S29   | S61  | ... | S1021  | S30   | S62  | ... | S1022  | S31   | S63  | ... | S1023  |
| Group 2 |      |     |        |       |      |     |        |       |      |     |        |       |      |     |        |
| S'0     | S'32 | ... | S'992  | S'1   | S'33 | ... | S'993  | S'2   | S'34 | ... | S'994  | S'3   | S'35 | ... | S'995  |
| S'4     | S'36 | ... | S'996  | S'5   | S'37 | ... | S'997  | S'6   | S'38 | ... | S'998  | S'7   | S'39 | ... | S'999  |
| ...     |      |     |        |       |      |     |        |       |      |     |        |       |      |     |        |
| S'28    | S'60 | ... | S'1020 | S'29  | S'61 | ... | S'1021 | S'30  | S'62 | ... | S'1022 | S'31  | S'63 | ... | S'1023 |

Fig. 7 Verification platform of real-time SAR imaging system

In each read or write operation, the MMU, following the interleaving patterns in Tables 1 and 2, activates a different row of the SDRAM chip and completes a cyclic traversal of the rows. Meanwhile, the valid data are seamlessly connected. Therefore, the refresh operation can be avoided, and the data access efficiency is maximised.
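The interleaving patterns in Tables 1 and 2 can be summarised as an index mapping. The following is a minimal sketch inferred from the table entries only; it is not the MMU address generator. The function `locate`, the `Location` fields, and the interpretation of `table_row` and `table_col` as positions inside a bank block of Table 2 (rather than physical SDRAM row and column addresses) are illustrative assumptions.

```python
from typing import NamedTuple


class Location(NamedTuple):
    group: int      # 1 for an S group (Group 1), 2 for an S' group (Group 2)
    s_index: int    # k in S_k or S'_k, 0..1023
    bank: int       # BANK0..BANK3
    table_row: int  # row position inside the bank block of Table 2, 0..7
    table_col: int  # column position inside the bank block of Table 2, 0..31


def locate(x: int, y: int) -> Location:
    """Map the indices of d(x, y) to the group and bank suggested by the tables."""
    s_index = x // 4                 # four values of x share one S/S' group (Table 1)
    group = 1 + (y // 4) % 2         # blocks of four y values alternate between groups
    bank = s_index % 4               # S0->BANK0, S1->BANK1, S2->BANK2, S3->BANK3
    table_row = (s_index % 32) // 4  # S0, S4, ..., S28 stack down a bank block
    table_col = s_index // 32        # S0, S32, ..., S992 run across a bank block
    return Location(group, s_index, bank, table_row, table_col)


# Consecutive four-datum bursts along x for a fixed y cycle through the banks
banks_along_x = [locate(x, 0).bank for x in range(0, 20, 4)]

# For a fixed x, the group changes after every four values of y
groups_along_y = [locate(0, y).group for y in range(8)]
```

In this sketch, `banks_along_x` is `[0, 1, 2, 3, 0]` and `groups_along_y` is `[1, 1, 1, 1, 2, 2, 2, 2]`, which matches the bank circulation during reading and the group alternation during writing described above.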

# 4 Board-level verification

The efficiency of a CTM matrix transpose method can be calculated by dividing the measured bandwidth by the theoretical bandwidth. To verify the efficiency of the proposed CTM design method, this paper takes as an example a SAR imaging system with the RD and CS algorithms shown in Fig. 1. The verification platform is shown in Fig. 7. Eight MT48LC8M32B2 chips are used as the transpose memory, and the imaging algorithm and the MMU module are implemented in the FPGA chip XC7K325T. The processing period of a 4K-point FFT/IFFT is 4206 clock cycles, and the operating frequency is 100 MHz. Since each datum in the matrix is 64 bits wide, the theoretical bandwidth of this system is 800 MB/s.

#### Table 3 Execution time for RD algorithm

| Step                       | Execution time, ms |
|----------------------------|--------------------|
| from data input to node 1  | 172.5              |
| from node 1 to data output | 172.5              |

#### Table 4 Execution time for CS algorithm

| Step                       | Execution time, ms |
|----------------------------|--------------------|
| from node 1 to node 2      | 172.5              |
| from node 2 to node 3      | 172.4              |
| from node 3 to data output | 172.4              |

Tables 3 and 4 show the execution times for the RD algorithm and the CS algorithm in Fig. 1. As shown in Fig. 1, modules such as PCMF, AMF, PF, and RMF are executed in parallel with the FFT and IFFT and add no image processing time [17]. Since the execution time of a 4K-point FFT/IFFT is nearly 42.1  $\mu$s, the total execution times for the RD algorithm and the CS algorithm are 345 and 517.3 ms, respectively.
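The timing figures can be cross-checked with a short calculation. The sketch assumes that one pass through the frame performs one 4K-point FFT/IFFT for each of the 4096 lines, back to back; this assumption and the variable names are illustrative and are not stated in this form in the paper.

```python
CLOCK_HZ = 100e6   # operating frequency
FFT_CYCLES = 4206  # processing period of one 4K-point FFT/IFFT
LINES = 4096       # assumption: one FFT/IFFT per line of the 4096 x 4096 frame

fft_us = FFT_CYCLES / CLOCK_HZ * 1e6  # duration of one FFT/IFFT in microseconds
pass_ms = LINES * fft_us / 1e3        # estimated duration of one pass in ms

rd_steps_ms = [172.5, 172.5]          # Table 3
cs_steps_ms = [172.5, 172.4, 172.4]   # Table 4

rd_total_ms = round(sum(rd_steps_ms), 1)
cs_total_ms = round(sum(cs_steps_ms), 1)
```

One FFT/IFFT takes 42.06 µs, and the estimated pass time is about 172.3 ms, slightly below the measured 172.4 to 172.5 ms per step; the sketch does not account for the small remainder. The step times sum to the reported totals of 345 ms and 517.3 ms.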

Table 5 compares the proposed SDRAM-based CTM design method with several typical SDRAM-based CTM design methods. The asterisk symbol (\*) indicates that the item is not reported in the original paper.

Compared with the typical SDRAM-based CTM design methods, the proposed SDRAM-based CTM design method has the highest transpose efficiency. Because continuous image frames must be processed, some redundant control cycles occur when the frame data are switched, so the transpose efficiency is below 100%. Board-level verification shows that the transpose efficiency exceeds 99% and that the processing time of each frame is about 345 ms with the RD algorithm and 517.3 ms with the CS algorithm, which can meet the real-time requirement of the SAR imaging system.

#### Table 5 Comparison of the CTM design methods

| Transpose method              | Theoretical bandwidth, MB/s | Measured bandwidth, MB/s | Efficiency, % |
|-------------------------------|-----------------------------|--------------------------|---------------|
| two-frame [1]                 | 1064                        | 524.8                    | 49.3          |
| three-frame [2]               | 528                         | 340                      | 64            |
| sub-block matrix mapping [3]  | 250                         | 110                      | 44            |
| sub-block matrix mapping [4]  | 6250                        | 3931.3                   | 62.9          |
| sub-block matrix mapping [5]  | 1280                        | 960                      | 75            |
| sub-block matrix mapping [10] | *                           | *                        | 76            |
| sub-block matrix mapping [11] | 1280                        | 960                      | 78            |
| sub-block matrix mapping [12] | 8000                        | 6936                     | 87.6          |
| the proposed method           | 800                         | 799                      | 99.9          |
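The efficiency definition used in this section (measured bandwidth divided by theoretical bandwidth) can be written out as a short calculation. This is a sketch that assumes 1 MB = 10^6 bytes and one 64-bit datum transferred per clock cycle; the function name is illustrative.

```python
DATA_WIDTH_BITS = 64  # each datum is 64 bits wide
CLOCK_HZ = 100e6      # operating frequency

# Assumption: one datum per clock cycle, 1 MB = 10**6 bytes
theoretical_mb_s = DATA_WIDTH_BITS / 8 * CLOCK_HZ / 1e6


def efficiency_percent(measured_mb_s: float, theoretical_mb_s: float) -> float:
    """Transpose efficiency as measured bandwidth over theoretical bandwidth."""
    return 100.0 * measured_mb_s / theoretical_mb_s


proposed = efficiency_percent(799, theoretical_mb_s)  # proposed method, Table 5
two_frame = efficiency_percent(524.8, 1064)           # two-frame row, Table 5
```

The theoretical bandwidth evaluates to 800 MB/s, and the proposed method reaches 99.875%, which rounds to the 99.9% listed in Table 5. The two-frame row evaluates to about 49.3%, in agreement with the table.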

# 5 Conclusions

This paper presents a novel, highly efficient CTM design based on the device characteristics of SDRAM chips and the rules of two-dimensional matrix transpose. The SDRAM access methods and the interleaving patterns have been given. The transpose efficiency exceeds 99%, and no additional logic resources such as FIFOs are needed because the design works in a single clock domain. Furthermore, since the transpose efficiency is close to the limit, the proposed CTM design method is well suited to real-time SAR imaging and to other multidimensional image and signal processing domains.

# 6 References

[1] Jia, G.W., Buchroithner, M.F., Chang, W.G., et al.: 'Fourier-based 2-D imaging algorithm for circular synthetic aperture radar: analysis and application', IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 2016, 9, (1), pp. 475–489

[2] Zhang, L., Wang, G.Y., Qiao, Z.J., et al.: 'Two-stage focusing algorithm for highly squinted synthetic aperture radar imaging', IEEE Trans. Geosci. Remote Sens., 2017, 55, (10), pp. 5547–5562

[3] Bacci, A., Staglianò, D., Giusti, E., et al.: 'Compressive sensing for interferometric inverse synthetic aperture radar applications', IET Radar Sonar Navig., 2016, 10, (8), pp. 1446–1457

[4] Lee, Y.C., Koo, V.C., Chan, Y.K.: 'Design and development of FPGA-based FFT co-processor for synthetic aperture radar (SAR)'. 2017 Progress in Electromagnetics Research Symp. – Fall (PIERS – FALL), Singapore, 2017, pp. 1760–1766

[5] Chen, S., Yuan, Y., Zhang, S.N., et al.: 'A new imaging algorithm for forward-looking missile-borne bistatic SAR', IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 2016, 9, (4), pp. 1543–1552

[6] Chen, B.X., Yang, M.L., Wang, Y., et al.: 'The applications and future of synthetic impulse and aperture radar'. 2016 CIE Int. Conf. on Radar (RADAR), Guangzhou, 2016, pp. 1–5

[7] Cumming, I.G., Wong, F.H.: 'Digital processing of synthetic aperture radar data algorithms and implementation' (Publishing House of Electronics Industry Press, Boston, London, 2012)

[8] Lutomirski, A., Tegmark, M., Sanchez, N.J., et al.: 'Solving the corner-turning problem for large interferometers', Mon. Not. R. Astron. Soc., 2011, 410, (3), pp. 2075–2080

[9] Liu, Y., Xie, Y.Z., Huang, X.B.: 'Implementation of parallel interface and matrix transpose for SAR imaging based on Virtex6 FPGA'. IET Int. Radar Conf. 2013, Xi'an, 2013, pp. 1–4

[10] Bao, Z., Xing, M.D., Wang, T.: 'Radar imaging technology' (Publishing House of Electronics Industry Press, Beijing, China, 2014)

[11] Ren, G., Han, J.Z., Han, C.D.: 'CTM on multiprocessor: solution for bottleneck of SAR'. 2000 5th Int. Conf. on Signal Processing Proc., 16th World Computer Congress 2000, Beijing, 2000, pp. 1915–1920

[12] Lu, S.X., Han, S., Wang, Y.F.: 'The structure and implementation of the two-frame corner turn memory (CTM) in real time imaging of SAR', J. Electron. Inf. Technol., 2005, 27, (8), pp. 1226–1228

[13] Xie, Y.K., Zhang, T., Han, C.D.: 'Design and implementation of matrix transposition unit for real-time SAR image systems', J. Comput. Res. Dev., 2003, 40, (1), pp. 6–11

[14] Bao, S.G., Zhou, H.B.: 'Study and implementation of high efficient matrix transpose for SAR real-time imaging system', Mod. Rad., 2013, 35, (3), pp. 24–27

[15] Lin, T., Xie, Y.Z., Liu, W.: 'Implementation of matrix in-place transpose for real-time SAR imaging system', Comput. Eng., 2013, 39, (6), pp. 319–321

[16] Bian, M.M., Bi, F.K., Wang, J.H.: 'Research and implementation of matrix transpose for real-time SAR imaging system', Comput. Eng. Applic., 2011, 47, (22), pp. 117–119

[17] Liu, X.N., Chen, H., Xie, Y.Z.: 'Research and implementation of CTM for real-time SAR imaging processing'. IET Int. Radar Conf., Xi'an, 2013, pp. 1-

[18] Bian, M.M., Bi, F.K., Liu, F.: 'Matrix transpose methods for SAR imaging system'. IEEE 10th Int. Conf. on Signal Processing Proc., Beijing, 2010, pp. 2176–2179

[19] Lei, K., Wei, G.: 'Design and implementation of matrix transpose based on SMP system'. IET Int. Radar Conf. 2015, Hangzhou, 2015, pp. 1–5

[20] 'MT48LC8M32B2'. Available at http://pdf1.alldatasheet.net/datasheet-pdf/view/116939/MICRON/MT48LC8M32B2.html, accessed October 2004