Research Projects

Brain-Inspired HyperDimensional Computing:
To achieve real-time performance with high energy efficiency, we rethink not only how we accelerate machine learning algorithms in hardware but also how we design the algorithms themselves, using strategies that more closely model the ultimate efficient learning machine: the human brain. My Ph.D. research developed brain-inspired HyperDimensional (HD) computing, an interdisciplinary research area that emerged from theoretical neuroscience. HD computing is motivated by the observation that the human brain operates on high-dimensional representations of data, a consequence of the large size of brain circuits; it accordingly models human memory as points in a high-dimensional space. HD computing mimics several desirable properties of the human brain, including robustness to noise and hardware failure, and single-pass learning, in which training happens in one shot without storing the training data points or running complex gradient-based algorithms. These features make HD computing a promising solution for (1) today's embedded devices with limited storage, battery, and compute resources, and (2) future computing systems in deeply scaled nanotechnology, which will suffer from high noise and variability. I exploited the mathematics and key principles of brain functionality to create cognitive platforms.
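To make single-pass learning concrete, the sketch below shows the core HD classification recipe: encode each sample into a high-dimensional bipolar hypervector, bundle (sum) the hypervectors of each class in one pass over the data, and classify queries by cosine similarity. The dimensionality, random-projection encoder, and toy data are illustrative assumptions for this sketch, not the published algorithms.

    import numpy as np

    # Minimal sketch of single-pass HD classification.
    # D is the hypervector dimensionality; the value is illustrative.
    D = 10_000
    rng = np.random.default_rng(0)

    def make_encoder(n_features):
        # Random bipolar projection: one fixed hypervector per input feature.
        proj = rng.choice([-1, 1], size=(n_features, D))
        return lambda x: np.sign(x @ proj)  # encode a sample into {-1, 0, +1}^D

    def train(encode, X, y, n_classes):
        # Single-pass training: bundle (sum) the encoded samples of each class.
        models = np.zeros((n_classes, D))
        for xi, yi in zip(X, y):
            models[yi] += encode(xi)
        return models

    def classify(encode, models, x):
        # Query by cosine similarity against each class hypervector.
        h = encode(x)
        sims = models @ h / (np.linalg.norm(models, axis=1) * np.linalg.norm(h) + 1e-9)
        return int(np.argmax(sims))

    # Toy usage: two well-separated Gaussian clusters in 8 dimensions.
    X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
    y = np.array([0] * 50 + [1] * 50)
    encode = make_encoder(8)
    models = train(encode, X, y, n_classes=2)
    print(classify(encode, models, rng.normal(3, 1, 8)))  # expected: 1

Because the class models are simple sums of hypervectors, training needs no gradients and no stored training set, and flipping a small fraction of hypervector bits barely moves the cosine similarity, which is where the robustness to noise and hardware failure comes from.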
Our platform includes: (1) novel HD algorithms supporting classification, clustering, regression, and reinforcement learning, the categories of algorithms most commonly used by professional data scientists [D&T'17, DATE'19, DAC'19]; (2) novel HD hardware accelerators that deliver up to three orders of magnitude higher energy efficiency than GPU implementations [HPCA'17, FCCM'19, FPGA'19]; and (3) a software infrastructure that makes it easy for users to integrate HD computing into any system and enables secure distributed learning on encrypted information [CLOUD'19]. Our research opened a new direction in brain-inspired learning that now involves many universities, government agencies, and companies. In addition, DARPA recently opened a new program influenced by my Ph.D. research (Link).

Deep Learning and Big Data Processing Acceleration:
Running data- and memory-intensive workloads on traditional cores results in high energy consumption and slow processing, primarily due to the large amount of data movement between memory and processing units. I have designed a digital Processing In-Memory (PIM) platform capable of accelerating fundamental big data applications in real time with orders of magnitude higher energy efficiency [ISCA'19, HPCA'20, DAC'17]. My design accelerates entire applications directly in storage-class memory without using extra processing cores, opening a new direction toward making PIM technology practical. In contrast to prior methods that enable PIM functionality in the analog domain, we designed the first digital-based PIM architecture that (i) operates on digital data, eliminating the ADC/DAC blocks that dominate the area of analog designs; (ii) addresses the internal data movement issue by computing in place, where the big data is stored; (iii) natively supports the floating-point precision essential for many scientific applications; and (iv) is compatible with any bipolar memory technology, including Intel 3D XPoint. My platform can also accelerate a wide range of big data applications, including machine learning [ISCA'19, HPCA'20, TC'19], query processing [TCAD'18], graph processing [ISLPED'18], and bioinformatics [ISLPED'19]. One particularly successful application of my design is the FloatPIM architecture [ISCA'19], which significantly accelerates state-of-the-art Convolutional Neural Networks (CNNs).
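The block below is a toy software model of the digital PIM idea rather than the hardware itself: rows of a memory array hold bit-vectors, a single in-memory cycle computes a bitwise NOR of two rows across all columns in parallel, and arithmetic is composed from NOR gates in place. The row layout and gate sequence are illustrative assumptions (MAGIC-style NOR is one common digital PIM primitive), not the published FloatPIM design.

    import numpy as np

    # Toy model of digital PIM: each row of a memory block is a bit-vector,
    # and one "cycle" computes a bit-parallel NOR of two rows into a third,
    # in place, across every column at once.

    def pim_nor(mem, a, b, out):
        # One in-memory cycle: bitwise NOR of rows a and b into row out.
        mem[out] = 1 - (mem[a] | mem[b])

    def pim_half_adder(mem, a, b, s, c, t0, t1, t2):
        # Compose NOR gates into a bit-parallel half adder (t* are scratch rows).
        pim_nor(mem, a, b, t0)    # t0 = NOR(a, b)
        pim_nor(mem, a, a, t1)    # t1 = NOT a
        pim_nor(mem, b, b, t2)    # t2 = NOT b
        pim_nor(mem, t1, t2, c)   # carry = a AND b (De Morgan)
        pim_nor(mem, t0, c, s)    # sum = (a OR b) AND NOT (a AND b) = a XOR b

    # 8 rows x 16 columns: 16 half-additions finish in just 5 NOR cycles,
    # without ever moving the data out of the array.
    rng = np.random.default_rng(1)
    mem = rng.integers(0, 2, size=(8, 16))
    pim_half_adder(mem, a=0, b=1, s=2, c=3, t0=4, t1=5, t2=6)
    assert np.array_equal(mem[2], mem[0] ^ mem[1])  # sum bits
    assert np.array_equal(mem[3], mem[0] & mem[1])  # carry bits

The point of the model is the cost structure: latency is counted in gate cycles per row, independent of how many columns (data words) are processed in parallel, which is why in-place digital PIM avoids both the data movement bottleneck and the ADC/DAC overhead of analog designs.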

Online Memorization for Approximation:
Today’s computing systems are designed to deliver only exact solutions. However, many applications, e.g., machine learning workloads, are statistical in nature and do not require exact answers. While prior research has explored approximate computing, most solutions to date (1) are isolated to only a few components of the system stack, and (2) do not learn, and so cannot enable intelligent approximation. The real challenge arises when developers want to employ approximation across multiple layers of the computing stack simultaneously. I proposed a novel architecture with software and hardware support for accelerating learning and multimedia applications on today’s computing systems [DATE'16]. Hardware components are enhanced with the ability to apply self-learning approximation at a quantifiable and controllable cost in accuracy, while software services complement the hardware to ensure the user’s perception is not compromised and the energy savings from approximation are maximized. Concretely, I enhanced computing units (CPUs, GPUs, and DSPs) with a small associative memory placed close to each streaming core [ISLPED'16, TETC'16]. This self-learning associative memory remembers frequent operand patterns and eliminates redundant computation [ISLPED'16]. The main idea of the approximation is to return pre-computed results from the associative memory not only for perfect matches of the operands but also for inexact matches [ISLPED'16].
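The following minimal sketch models that idea in software: a small table caches (operands, result) pairs for an expensive operation and, on a lookup, returns the nearest stored result whenever the operands fall within a distance threshold. The capacity, distance metric, and threshold here are illustrative knobs for trading accuracy against hit rate, not the hardware design published in [ISLPED'16].

    import math

    # Software model of an approximate associative memory placed next to a
    # compute unit: reuse pre-computed results for exact AND near-matching
    # operands, computing only on a true miss.

    class ApproxAssociativeMemory:
        def __init__(self, capacity=32, threshold=0.05):
            self.capacity = capacity
            self.threshold = threshold  # max operand distance for an inexact hit
            self.entries = []           # list of (operands, result); FIFO eviction

        def lookup(self, operands):
            best, best_dist = None, float("inf")
            for ops, result in self.entries:
                dist = max(abs(a - b) for a, b in zip(ops, operands))
                if dist < best_dist:
                    best, best_dist = result, dist
            return best if best_dist <= self.threshold else None

        def insert(self, operands, result):
            if len(self.entries) >= self.capacity:
                self.entries.pop(0)
            self.entries.append((operands, result))

    def cached_op(mem, f, *operands):
        # Return a stored (possibly approximate) result, else compute and cache.
        hit = mem.lookup(operands)
        if hit is not None:
            return hit
        result = f(*operands)
        mem.insert(operands, result)
        return result

    mem = ApproxAssociativeMemory()
    print(cached_op(mem, math.hypot, 3.0, 4.0))    # miss: computes 5.0
    print(cached_op(mem, math.hypot, 3.01, 4.02))  # inexact hit: reuses 5.0

The threshold is the quantifiable accuracy knob: at zero the table behaves as exact memoization, and widening it raises the hit rate (and energy savings) at a bounded, controllable cost in result accuracy.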