From the Book:

Aims

This book introduces the concepts and methodologies employed in designing a system-on-chip (SoC) based around a microprocessor core and in designing the microprocessor core itself. The principles of microprocessor design are made concrete by extensive illustrations based upon the ARM.

The aim of the book is to assist the reader in understanding how SoCs and microprocessors are designed and used, and why a modern processor is designed the way that it is. The reader who wishes to know only the general principles should find that the ARM illustrations add substance to issues which can otherwise appear somewhat ethereal; the reader who wishes to understand the design of the ARM should find that the general principles illuminate the rationale for the ARM being as it is.

Other microprocessor architectures are not described in this book. The reader who wishes to make a comparative study of architectures will find the required information on the ARM here but must look elsewhere for information on other designs.

Audience

The book is intended to be of use to two distinct groups of readers:

Professional hardware and software engineers who are tasked with designing an SoC product which incorporates an ARM processor, or who are evaluating the ARM for a product, should find the book helpful in their duties. Although there is considerable overlap with ARM technical publications, this book provides a broader context with more background. It is not a substitute for the manufacturer’s data, since much detail has had to be omitted, but it should be useful as an introductory overview and adjunct to that data.

Students of computer science, computer engineering and electrical engineering should find the material of value at several stages in their courses. Some chapters are closely based on course material previously used in undergraduate teaching; some other material is drawn from a postgraduate course.
Prerequisite knowledge

This book is not intended to be an introductory text on computer architecture or computer logic design. Readers are assumed to have a level of familiarity with these subjects equivalent to that of a second year undergraduate student in computer science or computer engineering. Some first year material is presented, but this is more by way of a refresher than as a first introduction to this material. No prior familiarity with the ARM processor is assumed.

The ARM

On 26 April 1985, the first ARM prototypes arrived at Acorn Computers Limited in Cambridge, England, having been fabricated by VLSI Technology, Inc., in San Jose, California. A few hours later they were running code, and a bottle of Moët & Chandon was opened in celebration. For the remainder of the 1980s the ARM was quietly developed to underpin Acorn’s desktop products which form the basis of educational computing in the UK; over the 1990s, in the care of ARM Limited, the ARM has sprung onto the world stage and has established a market-leading position in high-performance, low-power and low-cost embedded applications.

This prominent market position has increased ARM’s resources and accelerated the rate at which new ARM-based developments appear. The highlights of the last decade of ARM development include: the introduction of the novel compressed instruction format called ‘Thumb’ which reduces cost and power dissipation in small systems; significant steps upwards in performance with the ARM9, ARM10 and ‘StrongARM’ processor families; a state-of-the-art software development and debugging environment; and a very wide range of embedded applications based around ARM processor cores.

Most of the principles of modern SoC and processor design are illustrated somewhere in the ARM family, and ARM has led the way in the introduction of some concepts (such as dynamically decompressing the instruction stream).
The inherent simplicity of the basic 3-stage pipeline ARM core makes it a good pedagogical introduction to real processor design, whereas the debugging of a system based around an ARM core deeply embedded into a complex system chip represents the cutting edge of technological development today.

Book structure

Chapter 1 starts with a refresher on first year undergraduate processor design material. It illustrates the principle of abstraction in hardware design by reviewing the roles of logic and gate-level representations. It then introduces the important concept of the Reduced Instruction Set Computer (RISC) as background for what follows, and closes with some comments on design for low power.

Chapter 2 describes the ARM processor architecture in terms of the concepts introduced in the previous chapter, and Chapter 3 is a gentle introduction to user-level assembly language programming and could be used in first year undergraduate teaching for this purpose.

Chapter 4 describes the organization and implementation of the 3- and 5-stage pipeline ARM processor cores at a level suitable for second year undergraduate teaching, and covers some implementation issues.

Chapters 5 and 6 go into the ARM instruction set architecture in increasing depth. Chapter 5 goes back over the instruction set in more detail than was presented in Chapter 3, including the binary representation of each instruction, and it penetrates more deeply into the corners of the instruction set. It is probably best read once and then used for reference. Chapter 6 backs off a bit to consider what a high-level language (in this case, C) really needs and how those needs are met by the ARM instruction set. This chapter is based on second year undergraduate material.

Chapter 7 introduces the ‘Thumb’ instruction set which is an ARM innovation to address the code density and power requirements of small embedded systems.
It is of peripheral interest to a generic study of computer science, but adds an interesting lateral perspective to a postgraduate course.

Chapter 8 raises the issues involved in debugging systems which use embedded processor cores and in the production testing of board-level systems. These issues are background to Chapter 9 which introduces a number of different ARM integer cores, broadening the theme introduced in Chapter 4 to include cores with ‘Thumb’, debug hardware, and more sophisticated pipeline operation.

Chapter 10 introduces the concept of memory hierarchy, discussing the principles of memory management and caches. Chapter 11 reviews the requirements of a modern operating system at a second year undergraduate level and describes the approach adopted by the ARM to address these requirements. Chapter 12 introduces the integrated ARM CPU cores (including StrongARM) that incorporate full support for memory management.

Chapter 13 covers the issues of designing SoCs with embedded processor cores. Here, the ARM is at the leading edge of technology. Several examples are presented of production embedded system chips to show the solutions that have been developed to the many problems inherent in committing a complex application-specific system to silicon.

Chapter 14 moves away from mainstream ARM developments to describe the asynchronous ARM-compatible processors and systems developed at the University of Manchester, England, during the 1990s. After a decade of research the AMULET technology is, at the time of writing, about to take its first step into the commercial domain. Chapter 14 concludes with a description of the DRACO SoC design, the first commercial application of a 32-bit asynchronous microprocessor.

A short appendix presents the fundamentals of computer logic design and the terminology which is used in Chapter 1.
A glossary of the terms used in the book and a bibliography for further reading are appended at the end of the book, followed by a detailed index.

Course relevance

The chapters are at an appropriate level for use on undergraduate courses as follows:

Year 1: Chapter 1 (basic processor design); Chapter 3 (assembly language programming); Chapter 5 (instruction binaries and reference for assembly language programming).

Year 2: Chapter 4 (simple pipeline processor design); Chapter 6 (architectural support for high-level languages); Chapters 10 and 11 (memory hierarchy and architectural support for operating systems).

Year 3: Chapter 8 (embedded system debug and test); Chapter 9 (advanced pipelined processor design); Chapter 12 (advanced CPUs); Chapter 13 (example embedded systems).

A postgraduate course could follow a theme across several chapters, such as processor design (Chapters 1, 2, 4, 9, 10 and 12), instruction set design (Chapters 2, 3, 5, 6, 7 and 11) or embedded systems (Chapters 2, 4, 5, 8, 9 and 13).

Chapter 14 contains material relevant to a third year undergraduate or advanced postgraduate course on asynchronous design, but a great deal of additional background material (not presented in this book) is also necessary.

Support material

Many of the figures and tables will be made freely available over the Internet for non-commercial use. The only constraint on such use is that this book should be a recommended text for any course which makes use of such material. Information about this and other support material may be found on the World Wide Web at: http:

Any enquiries relating to commercial use must be referred to the publishers. The assertion of the copyright for this book outlined on page iv remains unaffected.

Feedback

The author welcomes feedback on the style and content of this book, and details of any errors that are found. Please email any such information to:

Acknowledgements

Many people have contributed to the success of the ARM over the past decade.
As a policy decision I have not named in the text the individuals with principal responsibilities for the developments described therein since the lists would be long and attempts to abridge them invidious. History has a habit of focusing credit on one or two high-profile individuals, often at the expense of those who keep their heads down to get the job done on time. However, it is not possible to write a book on the ARM without mentioning Sophie Wilson whose original instruction set architecture survives, extended but otherwise largely unscathed, to this day.

I would also like to acknowledge the support received from ARM Limited in giving access to their staff and design documentation, and I am grateful for the help I have received from ARM’s semiconductor partners, particularly VLSI Technology, Inc., which is now wholly owned by Philips Semiconductors.

The book has been considerably enhanced by helpful comments from reviewers of draft versions. I am grateful for the sympathetic reception the drafts received and the direct suggestions for improvement that were returned. The publishers, Addison Wesley Longman Limited, have been very helpful in guiding my responses to these suggestions and in other aspects of authorship.

Lastly I would like to thank my wife, Valerie, and my daughters, Alison and Catherine, who allowed me time off from family duties to write this book.