Research Projects
Brain-Inspired Hyper-Dimensional Computing:
To achieve real-time performance with high energy efficiency, we rethink not only how we accelerate machine learning algorithms in hardware, but also how we redesign the algorithms themselves using strategies that more closely model the ultimate efficient learning machine: the human brain. We aim to develop brain-inspired HyperDimensional (HD) computing, an interdisciplinary research area that emerged from theoretical neuroscience. HD computing is motivated by the understanding that the human brain operates on high-dimensional representations of data, a consequence of the large size of brain circuits. It therefore models human memory using points of a high-dimensional space. HD computing mimics several desirable properties of the human brain, including robustness to noise and hardware failure, and single-pass learning, where training happens in one shot without storing the training data points or using complex gradient-based algorithms. These features make HD computing a promising solution for (1) today's embedded devices with limited storage, battery, and resources, as well as (2) future computing systems in deeply scaled technologies, which will exhibit high noise and variability. I exploit the mathematics and the key principles of brain functionality to create cognitive platforms.
Our platform includes: (1) novel HDC algorithms supporting classification, clustering, regression, and reinforcement learning, which represent the most popular categories of algorithms used regularly by professional data scientists [D&T'17, DATE'19, DAC'19]; (2) novel HD hardware accelerators capable of up to three orders of magnitude improvement in energy efficiency relative to GPU implementations [HPCA'17, FCCM'19, FPGA'19]; and (3) a software infrastructure that makes it easy for users to integrate HD computing into any system and enables secure distributed learning on encrypted information [CLOUD'19]. Our research opened a new direction in brain-inspired learning that now involves many universities, government agencies, and companies.
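As a concrete illustration of the single-pass learning style described above, the following minimal sketch (NumPy, with a hypothetical dimensionality, random-projection encoding, and toy data, not the exact encoders from the cited papers) trains bipolar class prototypes in one pass and classifies by cosine similarity:

```python
# Minimal sketch of single-pass HD classification (hypothetical parameters, NumPy only).
import numpy as np

D = 10_000          # hypervector dimensionality
rng = np.random.default_rng(0)

def encode(x, proj):
    """Map a feature vector to a bipolar hypervector via random projection."""
    return np.sign(proj @ x)

def train(X, y, num_classes, proj):
    """Single-pass training: bundle (add) encoded samples into class prototypes."""
    prototypes = np.zeros((num_classes, D))
    for xi, yi in zip(X, y):
        prototypes[yi] += encode(xi, proj)
    return prototypes

def predict(x, prototypes, proj):
    """Inference: return the class whose prototype is most similar (cosine)."""
    h = encode(x, proj)
    sims = prototypes @ h / (np.linalg.norm(prototypes, axis=1) * np.linalg.norm(h) + 1e-9)
    return int(np.argmax(sims))

# Toy usage with random data (for illustration only).
X = rng.normal(size=(100, 32)); y = rng.integers(0, 3, size=100)
proj = rng.normal(size=(D, 32))
prototypes = train(X, y, num_classes=3, proj=proj)
print(predict(X[0], prototypes, proj))
```

Note that training never revisits a sample and never computes gradients; robustness comes from the redundancy of the high-dimensional representation.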
Cybersecurity:
The cybersecurity industry has never been more important than it is today. Security issues are a day-to-day struggle for businesses and the Department of Defense, and global cybercrime costs are on the rise. According to a recent report, the global cybersecurity market is expected to grow from $176.6 billion in 2020 to $398.3 billion by 2026. Cybercrime also rose by 600% during the pandemic, leaving businesses more vulnerable than ever to the financial and reputational repercussions of cyber-attacks.
We aim to address security issues from multiple perspectives:
- Security monitoring with human-interpretable reasoning: To extract useful information, security monitoring systems rely on sophisticated and costly machine learning and artificial intelligence algorithms. At the same time, it is difficult to deploy machine learning effectively without a comprehensive, rich, and complete view of the underlying data. We plan to develop a robust, real-time, and transparent security monitoring system. Our framework first represents security-related data in a holographic, high-dimensional space to abstract the knowledge; we then develop cognitive learning algorithms capable of security monitoring and human-like reasoning.
- Crypto-based AI computing on the edge: Edge devices are becoming increasingly pervasive in everyday life, and there is a crucial need to protect data and ensure security at the edge. Secure computation relies on costly cryptographic methods, which add a significant computing burden to edge devices. To ensure theoretical security support for HDC, we introduce a novel framework that enhances HD computing encoding with state-of-the-art encryption methods (a toy sketch of the underlying idea appears after this list). Our solution ensures ultra-lightweight encryption as well as hardware-friendly cognitive operations over encrypted data, thus enabling secure learning on the edge.
- Privacy in AI computation: Privacy is one of the key challenges of machine learning algorithms, and the lack of trustworthy learning limits the use of machine learning in real-world IoT systems. We aim to address two important privacy challenges: data privacy and model privacy. Although many solutions exist, our focus is more foundational, as we look at brain-inspired and symbolic AI models with computational transparency. We exploit this property to develop a fully mathematical model that not only bounds information leakage (the privacy level) but also provides theoretical analysis of information loss.
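The following toy sketch illustrates the kind of property the crypto-based edge framework builds on: a keyed transformation of hypervectors (here a secret permutation plus a sign-flip mask, chosen only for illustration and not the encryption scheme of [CLOUD'19]) preserves dot-product similarity, so similarity-based learning operations can run directly on the protected data:

```python
# Toy sketch: a keyed permutation + sign-flip of hypervectors preserves dot-product
# similarity, so similarity search can run on the transformed data.
# Illustration only; not the actual encryption scheme used in the cited work.
import numpy as np

D = 10_000
rng = np.random.default_rng(1)

# "Key": a secret permutation and a secret bipolar mask.
perm = rng.permutation(D)
mask = rng.choice([-1, 1], size=D)

def protect(h):
    """Apply the keyed transform to a bipolar hypervector."""
    return h[perm] * mask

a = rng.choice([-1, 1], size=D)
b = np.where(rng.random(D) < 0.1, -a, a)   # b is a noisy copy of a

# Dot-product similarity is identical before and after the keyed transform,
# because the mask squares to one and the permutation reorders both vectors alike.
print(np.dot(a, b), np.dot(protect(a), protect(b)))
```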
Big Data Processing Acceleration:
Running data- and memory-intensive workloads on traditional cores results in high energy consumption and slow processing, primarily due to the large amount of data movement between memory and processing units. I have designed a digital Processing In-Memory (PIM) platform capable of accelerating fundamental big data applications in real time with orders of magnitude higher energy efficiency [ISCA'19, HPCA'20, DAC'17]. My design accelerates entire applications directly in storage-class memory without using extra processing cores, and it opened a new direction toward making PIM technology practical. In contrast to prior methods that enable PIM functionality in the analog domain, we designed the first digital PIM architecture that (i) works on digital data, eliminating the ADC/DAC blocks that dominate area; (ii) addresses the internal data movement issue by enabling in-place computation where the big data is stored; (iii) natively supports floating-point precision, which is essential for many scientific applications; and (iv) is compatible with any bipolar memory technology, including Intel 3D XPoint. My platform can accelerate a wide range of big data applications, including machine learning [ISCA'19, HPCA'20, TC'19], query processing [TCAD'18], graph processing [ISLPED'18], and bioinformatics [ISLPED'19]. One particularly successful application of my design is the FloatPIM architecture [ISCA'19], which significantly accelerates state-of-the-art Convolutional Neural Networks (CNNs).
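To make the digital, bulk-bitwise style of computation concrete, the sketch below emulates in software how element-wise integer addition can be carried out bit-serially using only Boolean operations over whole bit-planes. It assumes a hypothetical 8-bit word width and a plain NumPy emulation; it illustrates the computation style, not the circuit-level FloatPIM design:

```python
# Software emulation of digital in-memory computing: element-wise integer addition
# performed bit-serially with only bulk Boolean operations (AND/XOR/OR) over bit-planes.
# Hypothetical 8-bit words; illustrates the style, not the actual circuit design.
import numpy as np

BITS = 8

def to_bits(v):
    """Unpack unsigned integers into a (BITS, n) array of bit-planes, LSB first."""
    return np.array([(v >> i) & 1 for i in range(BITS)], dtype=np.uint8)

def from_bits(planes):
    """Pack bit-planes back into unsigned integers."""
    return sum(planes[i].astype(np.uint32) << i for i in range(BITS))

def pim_add(a, b):
    """Ripple-carry add using only bitwise AND/XOR/OR on whole bit-planes."""
    A, B = to_bits(a), to_bits(b)
    out = np.zeros_like(A)
    carry = np.zeros_like(A[0])
    for i in range(BITS):
        out[i] = A[i] ^ B[i] ^ carry
        carry = (A[i] & B[i]) | (carry & (A[i] ^ B[i]))
    return from_bits(out)

a = np.array([10, 200, 33], dtype=np.uint32)
b = np.array([5, 40, 99], dtype=np.uint32)
print(pim_add(a, b))   # [ 15 240 132]
```

Each loop iteration touches every element of the vectors at once, mirroring how a digital PIM array applies one bulk Boolean operation across many memory rows in parallel without moving the data out of the array.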
Distributed IoT Systems:
Machine learning methods have been widely used to provide high-quality results for many cognitive tasks. Running these sophisticated learning tasks incurs high computational costs to process large amounts of learning data. A common solution is to use the cloud and data centers as the main central computing units. However, with the emergence of the Internet of Things (IoT), this centralized approach faces several scalability challenges. In IoT systems, a large number of embedded devices are deployed to collect data from the environment and produce information. The partial data then need to be aggregated to perform the target learning task in IoT networks at home or even city scale, which leads to significant communication cost and high latency to transfer all data points to a centralized cloud.
However, effective learning in the IoT hierarchy is still an open question. We recognize the following technical challenges in scaling learning tasks across the IoT hierarchy: (i) in practice, each IoT device has different types of sensors that generate heterogeneous features; (ii) edge devices often do not have sufficient resources for online processing of sophisticated learning algorithms; and (iii) when training and inference are performed in a centralized fashion, communication may dominate the total computing cost as the amount of data generated by the swarm of IoT devices grows. Even if the learning tasks could be distributed to edge devices by deploying costly hardware accelerators, a large amount of data needs to be transferred between nodes during model training, e.g., the inputs and outputs of neurons for DNN models.
In addition, reliable communication is not guaranteed, and IoT networks are often deployed under harsh network conditions.
- Distributed learning, beyond federated learning: In this work, we seek to enable distributed learning using the data that each IoT device's heterogeneous sensors generate on the fly. We accelerate the learning tasks by utilizing the IoT devices as federated computing units, i.e., the learning tasks are processed on the local embedded devices in the hierarchy (a minimal sketch of this aggregation idea follows this list).
- Novel communication protocols: To ensure end-to-end system efficiency, the communication and computation systems need to be co-designed. This requires novel communication protocols that are compatible with machine learning algorithms. We also need to design machine learning and network protocols that are naturally robust to information loss.
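As referenced in the distributed-learning item above, the following minimal sketch (a hypothetical HD-style setup with random local data) shows the aggregation idea: each device trains local class prototypes in a single pass, and only the prototypes, never raw samples, are summed at the aggregator:

```python
# Sketch of hierarchical aggregation for HD-style distributed learning:
# each device bundles local class prototypes; the aggregator simply sums them.
# Hypothetical setup with random local data; illustration only.
import numpy as np

D, NUM_CLASSES, FEATURES = 10_000, 3, 16
rng = np.random.default_rng(2)
proj = rng.normal(size=(D, FEATURES))      # shared encoder (e.g., a fixed random seed)

def local_train(X, y):
    """On-device single-pass training: bundle encoded samples into class prototypes."""
    protos = np.zeros((NUM_CLASSES, D))
    for xi, yi in zip(X, y):
        protos[yi] += np.sign(proj @ xi)
    return protos

# Three devices, each with its own (random) local data.
device_models = []
for _ in range(3):
    X = rng.normal(size=(50, FEATURES))
    y = rng.integers(0, NUM_CLASSES, size=50)
    device_models.append(local_train(X, y))

# Aggregation: element-wise sum of prototypes. No raw data leaves the devices,
# and the communication cost is independent of the local dataset sizes.
global_model = np.sum(device_models, axis=0)
print(global_model.shape)   # (3, 10000)
```

Because aggregation is a simple element-wise sum, it also degrades gracefully when some prototype updates are lost in transit, which is one reason such models pair naturally with lossy, lightweight communication protocols.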
Previous Research
Online Memorization for Approximation:
Today's computing systems are designed to deliver only exact solutions. However, many applications, e.g., machine learning applications, are statistical in nature and do not require exact answers. While prior research has explored approximate computing, most solutions to date (1) are isolated to only a few of the components in the system stack and (2) do not learn to enable intelligent approximation. The real challenge arises when developers want to employ approximation across multiple layers of the computing stack simultaneously. I proposed a novel architecture with software and hardware support for the acceleration of learning and multimedia applications on today's computing systems [DATE'16]. Hardware components are enhanced with the ability to adapt self-learning approximation at a quantifiable and controllable cost in terms of accuracy. Software services complement the hardware to ensure the user's perception is not compromised while maximizing the energy savings due to approximation. I enhanced the computing units (CPUs, GPUs, and DSPs) with a small associative memory placed close to each streaming core [ISLPED'16, TETC'16]. This associative memory, with the capability of self-learning, sits beside each computing unit to remember frequent patterns and reduce redundant computations [ISLPED'16]. The main idea of the approximation is to return pre-computed results from the associative memory, not only for exact matches of the operands but also for inexact matches [ISLPED'16]; a software analogue of this inexact-match lookup is sketched below.
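The sketch below is a software analogue of that inexact-match lookup (a hypothetical tolerance and a simple FIFO-evicted table, not the hardware associative memory itself): a query whose operands fall within the tolerance of a stored entry reuses the pre-computed result instead of recomputing.

```python
# Software analogue of approximate associative-memory lookup: return a stored
# result when the operands are "close enough" to a previously computed entry.
# Hypothetical tolerance; the real design does this in hardware next to each core.
import math

class ApproxMemo:
    def __init__(self, fn, tol=0.05, capacity=64):
        self.fn, self.tol, self.capacity = fn, tol, capacity
        self.table = []                    # list of (operands, result)

    def __call__(self, *args):
        for ops, res in self.table:
            # Inexact match: every operand within the tolerance of a stored entry.
            if all(abs(a - o) <= self.tol for a, o in zip(args, ops)):
                return res                 # reuse the pre-computed result
        res = self.fn(*args)               # miss: compute exactly
        if len(self.table) >= self.capacity:
            self.table.pop(0)              # evict the oldest entry
        self.table.append((args, res))
        return res

approx_sin = ApproxMemo(math.sin, tol=0.01)
print(approx_sin(1.000), approx_sin(1.004))   # second call reuses the first result
```

The tolerance plays the role of the quantifiable accuracy knob: widening it increases the hit rate (and energy savings) at a controllable cost in output quality.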