CES Manager Account: Consortium for Embedded Systems

Project: Research project

Project Details

Description

CES Manager Account: Consortium for Embedded Systems

Design of Dense RFID Systems for Indexing in the Physical World across Space, Time, and Human Experience

Exploring Multicore-based Hardware/Software Architectures for Mobile Edge Computing Devices

Consortium for Embedded Systems -- Master Account Setup

Memory-Aware Compilation for Modern Multi-core Systems

The local memory in each core of modern and future multi-core processors is limited, and is not virtualized. As long as the whole application (code + data) fits into the local memory, everything is fine. However, as soon as the application grows larger than the local memory, the application must explicitly bring the required data into the local memory before it is needed, evicting some less urgently needed data. Doing this at a word granularity amounts to implementing caches in software, i.e., a software cache. Although this approach enables quick porting, it is extremely inefficient. For memory transactions to be efficient in a multi-core architecture, data must be transferred at a much coarser granularity. Efficient, coarse-grain memory management is possible by exploiting the access patterns of application code and data. In well-written code, global data is small and is used extensively throughout the program. Previous research and our observations support the use of a permanent area for global data: the global variables can be mapped to the permanent area at compile time, and the mapping need not change during execution. Application call-graph analysis can determine better mappings for application code and stack data. Heap data must be managed using a least-recently-used policy, and finally, the unpredictable data in all the above categories can be managed through a software cache. This proposed organization of the limited local memory is shown in Figure 2. We have already developed a code management scheme.
This work, "SDRM: Simultaneous Determination of Regions and Function-to-Region Mapping for Scratchpad Memories," was published in the prestigious International Conference on High Performance Computing, 2008. It enables single cores to execute correctly even when the application code is much larger than the limited local memory. The compiler automatically and efficiently prefetches soon-to-be-needed functions into the local memory and evicts the less needed ones to the main memory. This proposal intends to extend the same capabilities to other variable-size data, namely the function stack and the heap variables. The function stack grows and shrinks like a stack, so a circular management scheme, in which only the top few frames reside in the local memory at any time, will work well. However, this scheme is not suitable for heap data. Heap data analysis can be used to prefetch the heap data blocks that will be needed next, and they must be managed in the local memory by a least-recently-used policy. The experimental setup for this project will be the IBM Cell processor in the Sony Playstation 3, and we will modify the GNU GCC compiler to support our code, application stack, and heap data management schemes.

Robust Testing for Reconfigurable Networked Control Systems and Mixed-Signal Systems

The framework of temporal logic robust testing for certain classes of dynamical systems was developed during my PhD thesis work [1]. In detail, in my thesis I developed a robustness theory for Metric Temporal Logic (MTL) and, consequently, a robust testing framework. MTL can be utilized as a real-time design requirements language for any metric system. The main idea in MTL robust testing is to find certificates (formally referred to as approximate bisimulation functions) that demonstrate that any system trajectory starting near the test trajectory stays close to the test trajectory and within the bounds of the robust MTL behavior.
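As a toy illustration of the robustness idea (not the thesis framework itself): for the simple invariant "always |x(t)| < c", the robustness of a sampled trajectory is its worst-case margin from the bound. A positive margin means that any trajectory staying within that margin of the test trajectory also satisfies the property.

```python
# Toy illustration of robustness semantics for the MTL formula G(|x| < c):
# the robustness of a sampled trajectory is the worst-case margin c - |x(t)|.
# A positive value is exactly the radius of the tube of nearby trajectories
# that are guaranteed to satisfy the property as well.

def always_lt_robustness(trajectory, c):
    """Robustness of G(|x| < c) over a sampled trajectory."""
    return min(c - abs(x) for x in trajectory)

# A trajectory that oscillates but stays inside the bound |x| < 1.0:
traj = [0.0, 0.4, -0.6, 0.3, -0.2]
margin = always_lt_robustness(traj, 1.0)
# margin is 0.4: any trajectory staying within 0.4 of this one also satisfies G(|x| < 1)
```

The certificates (approximate bisimulation functions) mentioned above generalize this margin computation from a fixed bound to neighborhoods of arbitrary test trajectories.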
The robust testing framework has been applied to linear time-invariant systems [2], to hybrid systems [3], and to linear parameter-varying systems [4]. Recently, I have explored a different approach [5] for the robustness analysis of model-based generated simulations, based on self-validated arithmetics. The proposed research will be conducted in two parallel tracks. First, there is a need to develop a state-of-the-art analysis/verification toolbox based on the theory of robust testing. In this track, several issues must be addressed: (1) how to perform parallel execution for each test trajectory, (2) how to sample the parameter space efficiently, (3) how to provide coverage guarantees, and (4) how to automatically choose the certificates that describe the behavior of neighborhoods of signals for each class of systems. Moreover, in order to test Simulink models directly, we will have to develop either a Simulink-to-hybrid-automata translator or, preferably, a symbolic model representation similar to [5]. The testing platform will be developed in C/C++ through the Matlab MEX interface. This will enable the verification of large state-space models within practical execution times. Concurrently, we will develop the theory of neighborhood certificates for NCS. Toward that goal we will take advantage of recent advances in the stability of NCS [6]. Reconfigurable NCS will be modeled as dynamic hybrid automata, and the proposed robust testing framework will enable us to study transient and real-time system properties. The theoretical results will be demonstrated on the existing prototype Matlab robust testing toolbox. As a by-product of the theory, we will derive a robust testing Matlab toolbox for mixed-signal systems. Currently, no practical approaches exist for the bounded-time verification of such systems. Initially, the theory will only be developed for linear networked control systems.
However, extensions to non-linear systems are possible either for small-dimensional polynomial systems or for systems whose nonlinearities can be bounded as in [ ].

Robust Testing for Networked Control and Mixed-Signal Systems

The framework of temporal logic robust testing for certain classes of dynamical systems was developed during my PhD thesis work [1]. In detail, in my thesis I developed a robustness theory for Metric Temporal Logic (MTL) and, consequently, a robust testing framework. MTL can be utilized as a real-time design requirements language for any hybrid system. The main idea in MTL robust testing is to find certificates (formally referred to as approximate bisimulation functions) that demonstrate that any system trajectory starting near the test trajectory stays close to the test trajectory and within the bounds of the robust MTL behavior. The robust testing framework has been applied to linear time-invariant systems [2], to hybrid systems [3], and to linear parameter-varying systems [4]. Recently, with my co-authors, I have explored randomized falsification techniques for hybrid systems [5]. The work in [7], which was among the outcomes of last year's NSF IUCRC funded project, is also investigating randomized methods for system falsification. The work proposed here will be a continuation of last year's NSF IUCRC funded project. Namely, approximate bisimulation functions will be studied that target networked control systems operating on machines subject to floating-point computation errors. In addition, focus will be given to the integration of robust [3, 4] and randomized testing techniques [7], with application to the hybrid system falsification problem. The latter approach is particularly useful when verification, i.e., complete coverage of the parameter space, cannot be achieved. In order to improve the verification process, we will also study the problem of modular verification of complicated specifications.
Namely, given a complicated specification in MTL, we will investigate conditions under which the specification can be broken into a number of subspecifications that guarantee the system's correctness with respect to the initial specification. The modular approach can be beneficial in the verification of large-scale systems when the subspecifications refer to different output signals of the system. In this case, automated model reduction techniques can be used that guarantee that the output of the reduced system is close to the output of the initial system. Therefore, the complexity of the verification problem can be reduced.

Memory Optimizations for Limited Local Memory Multi-core Systems

Single-core performance was greatly enhanced by caches in the memory hierarchy. However, as we scale to hundreds and thousands of cores, such a memory architecture does not scale. This is not only because caches are power-hungry, but also because maintaining coherency between the different caches of a multicore system incurs too much performance overhead. Power-efficient and scalable memory architectures, in which each core has a software-controlled local memory, are becoming popular. Examples are the IBM Cell processor in the Sony Playstation 3, the experimental 80-core processor from Intel, and the latest 6-core TI 6472 processor (Figure 1). However, programming such architectures is a challenge, as the burden of memory management shifts to the software. Right now, compilers do not make use of the local memory, and developers have to manually modify their applications to exploit the power and performance advantages of the local memory. As a result, in addition to worrying about the functionality and correctness of the application, developers also have to worry about memory management. We aim to relieve programmers of this burden and develop compiler techniques to automatically exploit the local memories.
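The kind of explicit, coarse-grain management such a compiler would automate can be sketched as a small simulation. This is hypothetical illustration code, not the project's actual scheme: whole blocks move between main memory and a small local store (mimicking DMA transfers), with least-recently-used eviction as described for heap data.

```python
from collections import OrderedDict

# Hypothetical sketch of coarse-grain local-memory management: data moves
# between main memory and a small local store in whole blocks (mimicking DMA),
# and blocks are evicted least-recently-used when the local store is full.

class LocalStore:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()     # block_id -> data, oldest first
        self.dma_transfers = 0

    def access(self, block_id, main_memory):
        if block_id in self.blocks:             # hit: refresh LRU order
            self.blocks.move_to_end(block_id)
        else:                                   # miss: fetch the whole block
            if len(self.blocks) >= self.capacity:
                evicted_id, data = self.blocks.popitem(last=False)
                main_memory[evicted_id] = data  # write back before eviction
                self.dma_transfers += 1
            self.blocks[block_id] = main_memory[block_id]
            self.dma_transfers += 1
        return self.blocks[block_id]

main_mem = {b: f"block{b}" for b in range(8)}
store = LocalStore(capacity_blocks=2)
for b in [0, 1, 0, 2, 3]:   # block 1 is least recently used, so it is evicted first
    store.access(b, main_mem)
```

Each miss costs a whole-block transfer (plus a write-back on eviction); the compiler's job is to place and prefetch blocks so that these transfers are rare.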
The goal of this project is to manage the application data automatically in the local memory in the compiler, transparent to the user, while also providing maximum flexibility in the use of programming constructs such as pointers and dynamic data structures.

Figure 1: 6-core processor from TI. Modern and future processors are coming with software-controlled memories in each core.

IUCRC Consortium for Embedded Systems Proposal 2010-11

[Figure: two example programs with the same call graph but different control flow graphs]

Research Description: The cores in a multi-core architecture can access the local memory in an extremely power- and performance-efficient manner. Local memories are much more efficient than caches primarily because they are almost 30% smaller and consume about 40% less power at the same access speed. There are two main steps in using local memories to achieve these goals:

A Light-Weight Runtime Multi-Tasking Scheduler for Embedded Multi-Core Architectures

PROBLEM STATEMENT: The project aims to address the problem of executing multiple independent applications on an embedded multicore architecture. The various applications compete for the limited processor resources, and a varying subset of these applications may be present under different use cases. In other words, the number of competing applications may change over time. Also, while some of the applications may have performance constraints, others may not have any such requirements. Further, the target embedded multi-core processors that will be considered as part of the project have limited (or no) operating system (OS) services support. Finally, as we are addressing the embedded domain, the execution of the various applications must occur in a power-efficient manner. The objective of the project is to develop and demonstrate a dynamic run-time scheduler for multi-tasking workloads on embedded multi-core processors.
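The core decision such a run-time scheduler must make can be illustrated with a deliberately simple policy. Earliest-deadline-first is one standard choice, shown here only as an assumption, not as the project's deliverable; tasks without timing constraints get an infinite deadline and run only when no constrained task is ready.

```python
import math

# Illustrative sketch (one possible policy, not the project's deliverable):
# an earliest-deadline-first pick among a changing set of ready tasks.
# Tasks without performance constraints use deadline = math.inf, so they
# are dispatched only when no constrained task is waiting.

def pick_next(ready_tasks):
    """ready_tasks: list of (name, deadline_ms) pairs; returns the task to run."""
    return min(ready_tasks, key=lambda t: t[1])[0]

tasks = [("logger", math.inf), ("video_decode", 33.0), ("audio", 10.0)]
next_task = pick_next(tasks)   # "audio": the tightest deadline wins
```

A real embedded scheduler would add per-core queues, migration cost, and power states on top of this selection step.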
Statistical Techniques for Property Exploration of Cyber-Physical Systems

Assume that we would like to analyze an already developed Cyber-Physical System (CPS) and that we are asked to find out which properties it does or does not satisfy. In an ideal scenario, we would also have the formal system specifications that the system should satisfy. In this case, we might be able to extract a hybrid model of the system and perform formal verification or model-based testing against the specifications. However, as is usual in practice, especially in non-safety-critical applications, such formal specifications are not readily available. Therefore, in such cases, we need to explore or discover the properties that the system satisfies. We propose to investigate property exploration techniques for CPS. Formally, we will study the following problem: given a specification in a real-time temporal logic with a number of unknown parameters and a CPS (as a model or an implementation), find the parameters that will make the specification satisfiable on the system.

Feasibility of Integrating Memristors & Threshold Logic for Compact, Low Power Digital Circuits

This is a proposal to integrate two new approaches, both of which are compatible with existing CMOS technology and design methodologies. One is the new technology of resistive memory, or memristor (ReM), and the other is a novel digital circuit architecture that realizes threshold logic. ReM is an enabling technology for 3D integration since the devices are built in metal layers, which enables switches to be fabricated in back-end-of-line (BEOL) processes. This can potentially achieve a significant reduction in area. Furthermore, in many logic architectures, ReM devices may be used in lieu of MOSFETs for many functions. ReM technologies, which are generally compatible with CMOS, when employed in conjunction with circuit architectures (e.g.
threshold logic) offer a potential solution to the future requirements of high-density, low-power digital circuits.

Programming Non-coherent Cache Architectures

RATIONALE AND PROBLEM STATEMENT: Multi-core architectures are becoming popular since they provide a way to improve peak performance without much increase in power consumption. In addition, reliability and temperature issues can be managed at the coarser granularity of the thread level. As we transition from a few cores to many cores, scaling the memory architecture is one of the most difficult challenges. Cache coherency protocols do not scale well with the number of cores, and therefore maintaining the illusion of a single unified memory in hardware is becoming difficult. As a result, the distributed nature of the memory organization is being exposed to the software. A purely distributed-memory multi-core architecture is the Limited Local Memory architecture, e.g., the Cell processor, in which each core can access only its local memory. All the code and data that a core needs have to be present in its local memory. This is simple if the code and data requirements of the task mapped to the core are less than the size of the local memory, but becomes very difficult if the application needs more memory. Then the data/code needed must be brought in before its use through explicit DMA calls, and some other not-so-urgently-needed data may need to be evicted. Compiler techniques to automatically handle this complexity are still a topic of research. One promising scalable memory multi-core architecture is the Non-Coherent Cache (NCC) architecture. This architecture seeks a compromise between shared-memory multi-core architectures and purely distributed-memory multi-core architectures. In the NCC architecture, each core has a cache, but the caches are not coherent. The absence of coherency keeps the caches scalable, while retaining programmability when there is no sharing, or when the shared data is not updated.
The 48-core experimental processor from Intel, the Single-chip Cloud Computer (SCC), has an NCC architecture. A core can access the entire (off-chip) main memory through its L2 and L1 caches. Programming models like scatter-gather and the Message Passing Interface (MPI) are ideally suited to program such architectures. This is because in these paradigms an application is divided into tasks that do not share memory, and any data sharing is explicitly specified in the program using send and receive functions. However, only a small set of application domains write their applications in MPI or scatter-gather paradigms. In addition, purely distributed-memory programming is considered difficult, and also results in duplication of memory. Multithreading is a very popular programming paradigm: extensively used, efficient in memory usage, and arguably easier to use. We want to execute this large repertoire of multi-threaded programs on NCC architectures.

CES Member: Marvell Semiconductor, Inc.

Temporal Logic Testing for Stochastic Cyber-physical Systems

We propose to study stochastic methods for the evaluation of the reliability and performance of Stochastic Cyber-Physical Systems (SCPS) with respect to correctness criteria defined using Metric Temporal Logic (MTL) [1] specifications. First, we will build a library of benchmark problems of small but challenging SCPS and a number of corresponding specifications that cannot be handled by existing methods. We will extend the existing stochastic search methods that are supported by our tool S-TaLiRo to handle noisy cost functions. Then, the performance of the algorithms will be compared using the benchmark library. Finally, we will study the problem of what types of theoretical guarantees we can provide on the performance of some of the resulting stochastic search algorithms.
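The stochastic-search idea can be sketched in miniature: search the parameter space for an input that minimizes a robustness-style cost, where a negative cost certifies a falsifying behavior. The "system" below is a toy stand-in function, not a real SCPS model or the S-TaLiRo algorithm itself.

```python
import random

# Toy sketch of stochastic falsification: randomly sample the parameter
# space and keep the parameter with the lowest robustness cost; a negative
# cost means a property violation has been found.  The "system" here is a
# stand-in function, not a real SCPS model.

def robustness(p):
    # stand-in cost: the property "output stays below 2" for output (p - 3)^2
    return 2.0 - (p - 3.0) ** 2

def falsify(lo, hi, budget=2000, seed=0):
    rng = random.Random(seed)
    best_p, best_cost = None, float("inf")
    for _ in range(budget):
        p = rng.uniform(lo, hi)
        cost = robustness(p)
        if cost < best_cost:
            best_p, best_cost = p, cost
        if best_cost < 0:          # falsifying parameter found: stop early
            break
    return best_p, best_cost

p, cost = falsify(0.0, 10.0)
# cost < 0 here: parameters away from p = 3 violate the property
```

Handling noisy cost functions, as proposed above, would mean the `robustness` evaluation returns a random sample rather than a deterministic value, which is exactly why the search methods need extension.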
Parallelization of Embedded Control Applications on Multi-core Architectures: A Case Study

The project aims to address the problem of parallelizing embedded control applications on multi-core architectures. The problem is particularly challenging due to the limited parallelization opportunities in control algorithms and the need to guarantee timing properties. The problem must be addressed in the context of realistic systems where mature and extensively verified sequential code implementations exist. Thus, the parallelization approach must essentially transform sequential, control-intensive code into a multi-threaded, parallelized version. The timing guarantees that exist on the sequential code must be preserved in the parallelized implementation. As the formal model of the control algorithm is not available, guaranteeing timing properties adds an additional degree of complexity to the problem. The objective of the project is to develop design methodologies and parallelization approaches that are effective for realistic control systems. We aim to accomplish this objective by undertaking a design case study of parallelizing an embedded control application. We will select a suitable control algorithm in consultation with industry members. As part of the study we will also evaluate existing multi-core processors and software stacks for their suitability for safety-critical control applications. We will take a two-tiered approach to the case study: we will first address the parallelization of the numerical, compute-intensive portions of the control algorithm, and then address the control-intensive portions. Outcomes of the project will include a report on the suitability of different multi-core architecture styles for embedded control applications, a parallel implementation of the representative control algorithm, and a report on the design methodology and parallelization approach.
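The first tier, parallelizing numerical portions, rests on the observation that data-parallel loops decompose into independent chunks whose combined result matches the sequential one. The sketch below is illustrative only; the control algorithm is a stand-in sum-of-squares loop, not a real control computation.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative decomposition only (not the project's methodology): a
# data-parallel numerical loop is split into independent chunks, computed
# concurrently, and reduced.  The stand-in workload is a sum of squares.

def chunk_sum(data, lo, hi):
    return sum(x * x for x in data[lo:hi])

def parallel_sum_squares(data, workers=4):
    step = (len(data) + workers - 1) // workers
    bounds = [(i, min(i + step, len(data))) for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda b: chunk_sum(data, *b), bounds)
    return sum(partials)

data = list(range(10))
total = parallel_sum_squares(data)   # equals the sequential result, 285
```

Preserving timing guarantees, the hard part named above, is precisely what this naive decomposition does not address: chunk completion times must be bounded, which is why the case study targets real platforms and software stacks.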
Feasibility of Integrating Memristors and Threshold Logic for Compact, Low Power Digital Circuits

This is a proposal to integrate two approaches in computational logic design, both of which are compatible with existing CMOS technology and design methodologies. One is the new technology of resistive memory, and the other is a novel digital circuit architecture that realizes threshold logic (TL). The effort will extend research on memristor-based TL (MTL) circuits through the development and validation of standard cell libraries, a demonstration of large-scale MTL circuit feasibility by designing multistage threshold logic functions, and the investigation of memristor usage in the broader class of threshold-based neuromorphic circuits.

Design of an Optimal Closed Loop Controller and its Implementation in an OS Scheduler for Dynamic Energy Management in Heterogeneous Multi-Core Processors

Energy efficiency has taken center stage in all aspects of computing, regardless of whether computation is performed on a portable battery-powered device, a desktop, servers in a datacenter, or a supercomputer. Its importance has become even more critical as multi-core processors become the de facto standard of computing systems in all market segments: smartphones, laptops, desktop PCs, and servers. However, sophisticated thermal and energy management support within the latest operating systems is missing. Tightly integrated thermal and energy management within an OS is necessary to maximize performance or energy efficiency given thermal and/or energy constraints. Our proposed research will enable existing operating systems to achieve complete control over dynamic thermal management techniques through the scheduler subsystem.
The aim of this project is to develop an efficient, closed-loop controller that will dynamically control the speeds and voltages of each core and migrate tasks among the cores so as to optimize performance or energy efficiency (performance/watt), or minimize peak temperature, etc. This optimization framework will be based on accurate power and thermal models, which will account for the key deficiencies of existing approaches enumerated above. The thermal-aware, optimal DVFS and task migration algorithms to be developed will be implemented within an operating system scheduler subsystem without compromising any scheduling characteristics such as fairness, interactivity, batch/real-time process support, and load balancing on heterogeneous CMPs. This integrated framework will be flexible enough to be deployed on various processors and operating systems. It will enhance throughput for servers and battery life for smart gadgets without compromising demanded performance or existing scheduling policies.

Synthesis and Design of Robust Threshold Logic Circuits

Embedded systems, which have been widely used in various applications, will rapidly evolve into many forms and be ubiquitously deployed to improve virtually every aspect of human life and society. A significant challenge on the path to this bright future is to develop high-performance and low-power System-on-a-Chip (SoC) devices, which are core components of every modern embedded system. This project tackles this challenge by developing both synthesis and circuit techniques that enable the use of Threshold Logic (TL) circuits to implement complex SoCs for high-performance and low-power embedded systems. Recent research has demonstrated that TL circuits outperform conventional logic circuits in CMOS technologies and can serve as an excellent choice for implementing digital circuits using future nano devices.
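For readers unfamiliar with threshold logic, the standard gate definition (assumed here, not restated in the proposal) is: the output is 1 exactly when a weighted sum of the binary inputs reaches a threshold T. A single such gate can replace several levels of AND/OR gates.

```python
# Standard threshold logic gate (textbook definition, assumed rather than
# taken from the proposal): output is 1 iff the weighted sum of binary
# inputs reaches the threshold T.

def threshold_gate(inputs, weights, T):
    return int(sum(w * x for w, x in zip(weights, inputs)) >= T)

# 3-input majority gate: unit weights, threshold 2
assert threshold_gate((1, 1, 0), (1, 1, 1), 2) == 1

# One weighted gate computes a AND (b OR c): weights (2, 1, 1), threshold 3,
# a function that needs two levels of conventional AND/OR gates.
out = threshold_gate((1, 0, 1), (2, 1, 1), 3)   # a=1, c=1 -> output 1
```

The synthesis problem the project addresses is choosing such weights and thresholds (and gate-level circuit realizations of them) for whole netlists, while keeping the gates robust to process variations.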
The proposed research develops effective TL circuit synthesis algorithms that simultaneously optimize the power, area, speed, and robustness of TL circuits. It develops analytical models and subsequently establishes systematic design and optimization procedures for TL gate design using both CMOS and future nano devices. It also explores new techniques to achieve post-fabrication configuration at the TL gate level to effectively cope with process variations and defects, which are expected to worsen for future atomic-scale devices. Furthermore, the project investigates applying the developed TL circuit techniques in Time-to-Digital Converter (TDC) based Analog-to-Digital Converter (ADC) designs. Finally, the developed techniques will be evaluated using benchmark circuits provided or suggested by member companies of the NSF IUCRC Center for Embedded Systems that have interests and expertise in embedded systems for a wide range of applications.

Intellectual Merit: Despite their excellent potential to outperform conventional logic circuits, TL circuits have not yet been adopted by the semiconductor industry to address the low-power and high-performance challenges in the development of embedded systems. This is mainly due to the lack of effective TL synthesis tools and systematic TL gate design and optimization procedures, and to concerns about circuit robustness. The project takes a unified approach to simultaneously address these difficult challenges. The research will help pave the way for a new paradigm for implementing high-performance, low-power SoCs, which will significantly benefit embedded systems in various aspects. The merits of the proposed approaches include new synthesis algorithms that accommodate large TL functions and cope with process variations, the integration of accurate analytical TL gate models into synthesis procedures, and coping with process variations by exploring post-fabrication configuration at the TL gate level.
In addition, the proposed work will extend TL circuit techniques to the domain of mixed-signal circuit design.

Broader Impact: The proposed research activities will eventually help semiconductor companies produce more reliable and affordable microprocessors and SoC chips, which will directly promote the development of more sophisticated, power-efficient, and miniature embedded systems. This will clearly enrich various aspects of human life and benefit many industries. The proposed research on implementing TL circuits using future nano devices will help the development of future embedded systems with complexity beyond today's imagination. In addition, the project will be used as a vehicle to help students develop interest and the ability to conduct research in post-CMOS technologies. It will also provide research opportunities to undergraduate students. Such experiences will likely inspire their interest in technology and motivate them toward graduate studies. Both institutions have an excellent tradition of recruiting and graduating students from underrepresented groups. Therefore, the project will also positively impact the diversity of the future workforce.

Concurrency and Scheduling Analysis of Real-time Embedded Software on Multi-core Processors

Multicore processors have been widely adopted in server and desktop systems. Given their advantages in performance and energy efficiency, the technology has received great interest in embedded application domains. While porting multi-threaded embedded software to a multicore processor with an SMP-ready RTOS seems straightforward, developers must be cautious: synchronization errors that are benign on a uniprocessor system may surface in multiprocessor execution, and synchronization overhead may limit the scale-up factor. To alleviate these concerns, we propose a research project to develop (1) a scalable race detector for embedded software in a multicore environment.
Taking into account the execution order imposed by the scheduling algorithm and task models, the detector aims to identify different types of races, such as data races, atomicity violations, and order bugs; and (2) an integrated scheduling and synchronization scheme to limit the degree of preemption and blocking, which in turn minimizes the overhead incurred in resource access control.

Performance Optimal Control of a System of Interconnected Components Under Thermal and Energy Constraints

To combat increasingly dense, power-hungry, and thermally intensive logic, SoC designers have begun to implement independent controls (e.g., voltage and frequency scaling) over the various processing elements that make up the system. While these controls help alleviate the problem, the full potential of the SoC under temperature constraints can only be realized through rigorous and dynamic control. This project builds on several significant and successful prior accomplishments of the PI on this topic. First, we developed a comprehensive set of techniques to optimize the performance and energy efficiency of heterogeneous multicore processors by controlling the speeds and voltages of each core and the task allocation. Second, we incorporated the optimization methods and dynamic error correction techniques within a closed-loop controller, and demonstrated the value of the controller on an Intel SandyBridge quad-core processor board. We showed a 30% improvement in energy efficiency compared to the existing governors. Third, we developed a novel task-level power profiler for multi-core processors. Finally, we integrated the closed-loop controller and the task profiler within the Linux OS. The result is a first Linux OS implementation of a controller that optimizes the energy efficiency of multi-core processors. This project proposes to extend the prior efforts on multi-cores to SoCs.
We propose to study the inter-related effects each processing element and the shared interconnect have on the thermal, power, and performance condition of the SoC. These relationships will be developed into accurate but efficient models. Additionally, a real-time optimal control algorithm will be crafted around these models and verified on a real SoC of the industry partner's choosing.

Parallelization of Embedded Control Applications on Multi-core Architectures: A Case Study

The project aims to address the problem of parallelizing existing embedded control applications on multi-core architectures. The problem is particularly challenging due to the limited parallelization opportunities in control algorithms and the need to guarantee timing properties. The problem must be addressed in the context of realistic systems where mature and extensively used and tested code for single-core systems exists. Both the timing and the functional guarantees that exist on the single-core system must be preserved in the parallelized implementation. As the formal model of the control algorithm is not available, guaranteeing both timing properties and functional behavior adds an additional degree of complexity to the problem. The objective of the project is to develop design methodologies and parallelization approaches that are effective for realistic control systems. We aim to accomplish this objective by undertaking a design case study of parallelizing an embedded control application on a multi-core platform. A suitable embedded control application and multi-core platform will be provided by the industry members. We will take a two-tiered approach to the case study: we will first address the parallelization of the numerical, compute-intensive portions of the control algorithm, and then address the control-intensive portions.
Outcomes of the project will include a report on the challenges and the process of porting existing systems onto multi-core architectures, and a tool for performing conformance testing between the single-core and multi-core systems.

Achieving energy efficient mobile computing through explicit data communication and global power management

Today, the mobile phone operating system (OS) plays a central role in power management and optimization. It aims to finish required computations as soon as possible and quickly switch the power-hungry CPU to the various low-power states (P-States). Though one would assume that the CPU is constantly active when a user interacts with his/her mobile device, in reality most of the work is performed by the Special Functional Cores (SFCs) on the mobile platform, e.g., video encoders/decoders, the image and graphics processor, or the 4G/Wi-Fi antenna. For example, during a phone call or while browsing a web page, the CPU is running at a low P-State most of the time. This is because most computations are performed in the SFCs, and the main responsibility of the CPU is to coordinate computations and facilitate data communication between the memory and the SFCs. While OS-oriented power management techniques can reduce the overall power usage, we find that there is still a significant opportunity for power optimization on mobile platforms. This proposal aims to design a power/energy-efficient mobile phone architecture by taking into account (1) user behavior in smartphone usage, and (2) the cost of data communication between the hardware components, such that the CPU does not have to wake up as often for data coordination. We also need to understand the architectural implications of modern smartphone applications, in order to design architectures that are as powerful and responsive as needed, while remaining efficient and energy-conserving for these handheld devices.
Improving Usability of Multi-core DSPs

DSPs, or Digital Signal Processors, are used everywhere, from consumer electronics and military and security equipment to cloud computing. Unlike general-purpose processors, DSP applications demand extremely high power-efficiency. To this end, most DSPs are going multi-core and feature complex memory hierarchies. The latest 6-core DSP from TI, the TI6472, has a hybrid memory model comprising a cache as well as a scratchpad memory (SPM). The idea is to use the SPM for analyzable memory accesses (and thereby improve power and performance) without losing programmability (through the presence of caches): applications can be developed quickly using the caches, and, as and when better power or performance is needed, the application can be analyzed and part of its data shifted to the SPM. A further complication in the TI6472 is that its caches are non-coherent. This means that if one core updates some data in its cache, the update does not automatically propagate to the cache of another core; when the other core later reads the same data, it may get a stale value. This is acceptable if the tasks executing on the DSP cores do not communicate, but if they do communicate, especially when one core writes a value that another core reads, such tasks will not execute correctly. Since coherence is absent in the hardware, it must be implemented in software. To relieve developers of worrying about coherence during application development, we plan to provide it through the compiler. The compiler will discover when communication is needed and transform the application to perform explicit communication, thereby enabling correct execution of multi-threaded programs. In the last year, we developed a first-cut solution for software coherence. This year, we will develop optimizations to improve the performance of compiler-provided coherence.
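The stale-read hazard and the compiler-inserted fix can be sketched with a toy simulation. This is a minimal Python model: the core/cache structure and the `writeback`/`invalidate` operations are illustrative stand-ins for the explicit communication the compiler would emit, not TI's actual cache API.

```python
# Toy model of two DSP cores with private, non-coherent caches
# over a shared memory. Illustrative only.

class Core:
    def __init__(self, shared_mem):
        self.mem = shared_mem
        self.cache = {}          # private cache: address -> value

    def write(self, addr, value):
        self.cache[addr] = value # write lands in the private cache only

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.mem.get(addr, 0)  # fill from memory
        return self.cache[addr]

    # Operations a software-coherence compiler pass would insert:
    def writeback(self, addr):   # flush a dirty line to shared memory
        if addr in self.cache:
            self.mem[addr] = self.cache[addr]

    def invalidate(self, addr):  # drop a possibly-stale line
        self.cache.pop(addr, None)

shared = {"x": 0}
producer, consumer = Core(shared), Core(shared)

consumer.read("x")          # consumer caches x = 0
producer.write("x", 42)     # update sits in producer's cache only
stale = consumer.read("x")  # still sees 0: the stale-read hazard

producer.writeback("x")     # compiler-inserted explicit communication
consumer.invalidate("x")
fresh = consumer.read("x")  # now sees 42
```

Without the `writeback`/`invalidate` pair the consumer keeps reading its cached copy indefinitely, which is exactly the producer-consumer failure described above.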
In particular, we will explore two avenues: i) merging write notices, and ii) static discovery of the absence of conflicts between tasks.

Visual Interface for Metric Temporal Logic Specifications

This is a proposal to develop a computational design tool for Metric Temporal Logic (MTL). The tool provides functions to visualize, create, edit, and test any MTL specification interactively and intuitively through touch gestures. The most challenging part is finding an efficient and effective way to visualize the basic Boolean connectives and temporal operators such as G (always), F (eventually), X (next), and U (until). Using this tool, a user can create and edit MTL specifications on a touch-screen device without any special experience with, or knowledge of, MTL. The tool's interactivity, validity, and usability will be evaluated in a user study. As a case study, we will apply the tool to visualizing specifications for an automobile engine.

Spintronic Threshold Logic Array

We have recently invented a novel circuit architecture for a threshold gate that employs a Spin Transfer Torque Magnetic Tunnel Junction (STT-MTJ) device. An STT-MTJ device is by itself a primitive threshold element. This key characteristic led to the design of an extremely simple multi-input threshold gate, named the STL. An STL cell, which consists of only a few MOSFETs and two STT-MTJs, can realize a logic function that might require several levels of AND/OR gates in conventional logic design. The STT-MTJ devices do not occupy any silicon area; they appear as vias fabricated as part of back-end-of-line (BEOL) fabrication. This proposal is to use STL cells to design a first-of-its-kind, non-volatile, programmable logic-in-memory array. The structure, referred to as the Spintronic Threshold Logic Array (STLA), resembles a conventional RAM with its memory cell replaced by a threshold logic circuit.
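The threshold function an STL cell realizes can be illustrated in a few lines. This is a generic threshold-logic sketch, not a model of the STT-MTJ circuit itself; the weights and threshold are arbitrary examples.

```python
def threshold_gate(inputs, weights, threshold):
    """Output 1 iff the weighted sum of the inputs meets the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# A 3-input majority function, which conventionally needs two levels of
# AND/OR gates, collapses to a single threshold gate:
def majority(a, b, c):
    return threshold_gate([a, b, c], [1, 1, 1], 2)
```

Here `majority(0, 1, 1)` fires while `majority(1, 0, 0)` does not, showing how one threshold element replaces a small AND/OR network.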
Being a non-volatile circuit, the state or result of a computation is retained after power is switched off. This allows an STLA to be powered off in the midst of a computation, for example in response to servicing an interrupt, and to resume the computation later without loss of data. This is a new capability not available in any existing logic architecture and is potentially a paradigm shift in digital system design. Researchers at Qualcomm are interested in evaluating this architecture with the aim of fabricating a prototype circuit.

Yr 5: CES Project: Readout Integrated Circuit for Fast Imager

Cortical Processor based on RRAM

The fundamental goal of this project is to demonstrate an STDP-like learning algorithm on an RRAM array. A small-scale RRAM cross-point array will be designed and fabricated at the ASU nanofab. The size of the array is 10x10, which can represent a small network of 10 input neurons connected to 10 output neurons. This network size is chosen considering the fabrication yield achievable in a university-level cleanroom and the limit on the number of pads that the probe card can access simultaneously. The RRAM devices will be engineered toward synaptic applications. The optimization targets include 1) analog memory behavior with continuous weight modulation approaching 5% accuracy, and 2) low energy consumption per programming spike (<1 pJ/spike). With the optimized RRAM devices, the cross-point array will then be fabricated with probing pads at the edge of the array for external connection. The Spike-Timing-Dependent Plasticity (STDP)-like learning algorithm will be validated on this cross-point array, using software-programmed neurons together with the hardware synaptic RRAM array. The RRAM array will be connected to an arbitrary-waveform pulse generator through the probe card and a switch matrix.
The programming spikes will be generated by the arbitrary-waveform pulse generator and multiplexed onto the RRAM array through the switch matrix. The key steps of the proposed learning algorithm will be realized as follows: 1) weighted sum: spikes are generated according to the image data at the input neurons and transmitted to the RRAM array, and the cross-point array computes the weighted sum of the input spikes for each output neuron; 2) thresholding: the weighted sum is compared in software against the threshold of the output neuron, and if it exceeds the threshold, the output neuron fires and generates spikes; 3) back-propagation: the spikes generated by the output neurons are fed back to the input neurons through the cross-point array; 4) weight modulation: depending on the difference between the original data and the data reconstructed in step 3, the weight of each RRAM synapse in the array is modified by a programming voltage pulse proportional to the reconstruction error.
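The four steps above can be sketched in software as one learning iteration, with a 10x10 weight matrix standing in for the RRAM cross-point array. This is a minimal NumPy sketch under stated assumptions: the threshold, learning rate, and conductance bounds are illustrative, and the software-neuron behavior is simplified to binary firing.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.uniform(0.1, 0.9, size=(10, 10))        # synaptic weights (normalized conductances)
x = rng.integers(0, 2, size=10).astype(float)   # input spike pattern (image data)
theta, lr = 2.5, 0.05                           # firing threshold, learning rate (illustrative)

# 1) weighted sum: the crossbar sums the input spikes through each column,
#    yielding one value per output neuron
weighted_sum = x @ W

# 2) thresholding: software neurons fire where the sum exceeds the threshold
y = (weighted_sum > theta).astype(float)

# 3) back-propagation: output spikes fed back through the array
#    reconstruct the input pattern
x_hat = y @ W.T

# 4) weight modulation: program each synapse proportionally to the
#    reconstruction error (outer product of error and output spikes)
err = x - x_hat
W = np.clip(W + lr * np.outer(err, y), 0.0, 1.0)  # conductance stays in device range
```

In this linear sketch the update shrinks the reconstruction error of the same pattern, which is the qualitative behavior the hardware experiment aims to validate.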
  • I2AV: Integrate, Index, Analyze, and Visualize Energy Data for Data-driven Simulations and Optimizations
  • Yr 6: Automated Testing for Functional Coverage for Cyber-Physical Systems
  • Yr 6: Parallelization of Embedded Control Applications on Multi-core Architectures: A Case Study
  • Yr 6: Energy-aware Application Scheduling for Heterogeneous and Parallel Smart Phone Architectures
  • Yr 6: Performance Optimal Control of a System of Interconnected Components Under Thermal and Energy Constraints (Year II)
  • Yr 6: Design of Ultra-low Power Circuits for Compressive Sensing in Mobile Devices
  • CES Member: Robert Bosch GmbH
  • CES Member: Toyota InfoTechnology Center
  • CES Membership Renewal: Toyota InfoTechnology Center Rev 1
  • CES Membership: Toyota InfoTechnology Center 2020 (Renewal)
  • CES Membership: Toyota InfoTechnology Center 2021 (Renewal)
  • CES Membership Renewal: Toyota 2022
  • CES Member: Cognitive Medical Systems
  • CES Project: Yr 11 Multi-Attribute Circuit Authentication and Reliability Techniques, Revision 3
  • CES Project: Efficient Offloading of Code in Mobile Cloud with Performance, Battery, and Energy Considerations
  • CES Project: Design of an Efficient Multi-Layer LSTM-RNN on a High-End FPGA Targeting Critical Applications
  • CES Project: Coverage Guarantees for Requirements-Guided Falsification
  • CES Project: Design and Analysis of Collision Risk Assessment and Collision Avoidance System for Automobiles
  • CES Membership: Alphacore - Year 10 Renewal
  • CES Membership: Alphacore - Year 11 Renewal
  • CES Project: Charge Sensitive Amplifiers and Read-Out Chain Design for DoE Experiments
  • CES Membership: Ball Aerospace & Technologies Corp
  • CES Membership: Shenzhen Qianhai AnyCheck Information Technology Co., Ltd.
  • CES Membership Renewal: Marvell
  • CES Membership Renewal: Marvell YR 11
  • CES Membership Renewal: Qualcomm
  • CES Membership: Qualcomm 2021 (Renewal)
  • CES Membership: Qualcomm 2021-22 (Renewal)
Status: Active
Effective start/end date: 9/1/07 - 6/30/23

Funding

  • INDUSTRY: Various Consortium Members: $23,048.00
