[Introduction from New Wisdom] Recently, a research team from MIT unveiled the Ddog project, which lets a quadruped robot be controlled with nothing more than a pair of glasses, offering new hope to people with severe disabilities.
Remember the AI mind-reading technology from before? Recently, the ability to “make all your wishes come true” has evolved again – humans can directly control robots through their own thoughts!
Researchers from MIT announced the Ddog project, which uses a self-developed brain-computer interface (BCI) device to control Boston Dynamics' robot dog Spot. The robot dog can move to specific areas, fetch objects, or take photos according to the user's thoughts.
Moreover, instead of the sensor-covered headgear previously needed to "read minds", this time the brain-computer interface comes in the form of a pair of wireless glasses (AttentivU)!
The behavior shown in the video may be simple, but the goal of this system is to turn Spot into a basic communication tool for people with conditions such as amyotrophic lateral sclerosis (ALS), cerebral palsy, or spinal cord injury.
The entire system requires only two iPhones and a pair of glasses, yet it can bring practical help and care to people who have nearly lost hope.
And, as the accompanying paper shows, this seemingly simple system is built on very complex engineering.
The Ddog system uses AttentivU as its brain-computer interface: sensors embedded in the glasses frame measure the wearer's electroencephalogram (EEG), or brain activity, and electrooculogram (EOG), or eye movement.
The research is based on MIT's Brain Switch, a real-time closed-loop BCI that allows users to communicate nonverbally and in real time with caregivers.
The Ddog system has a success rate of 83.4%, and this is the first time a wireless, non-visual BCI system has been integrated with Spot in a personal assistant use case.
From this work we can see how brain-computer interface devices have evolved, along with some of the developers' thinking.
Prior to this, the research team had demonstrated brain-computer interface control of smart homes; now they have extended it to a robot that can move and manipulate objects.
These studies offer people with severe disabilities a glimmer of hope for a better life in the future.
Compared to the octopus-like sensor headgear, the glasses below are indeed much cooler.
According to the National Organization for Rare Diseases, there are currently 30,000 people with ALS in the United States, and an estimated 5,000 new cases are diagnosed each year. Additionally, approximately 1 million Americans have cerebral palsy, according to the Cerebral Palsy Guide.
Many of these people have lost or will eventually lose the ability to walk, dress, talk, write, and even breathe.
While communication aids do exist, most are eye-gazing devices that allow users to communicate using a computer. There aren't many systems that allow users to interact with the world around them.
This BCI quadruped robotic system serves as an early prototype and paves the way for the future development of modern personal assistant robots.
Hopefully, we’ll see even more amazing capabilities in future iterations.
Brain-controlled quadruped robot
In this work, researchers explore how wireless and wearable BCI devices can control a four-legged robot, Boston Dynamics' Spot.
The device developed by the researchers measures the user's electroencephalogram (EEG) and electrooculogram (EOG) activity through electrodes embedded in the frame of the glasses.
The user mentally answers a series of questions (“yes” or “no”), each of which corresponds to a set of preset Spot actions.
For example, the questions can prompt Spot to walk across a room, pick up an object (such as a bottle of water), and then bring it to the user.
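As a toy illustration of this yes/no control flow (the questions, action names, and `answer_fn` helper below are hypothetical, not the project's actual code), binary BCI answers could select a preset action sequence like this:

```python
# Minimal sketch (not the actual Ddog code): mapping yes/no BCI answers
# to a preset Spot action sequence. All names here are hypothetical.

PRESET_ACTIONS = {
    "fetch_water": ["walk_to_kitchen", "pick_up_bottle", "return_to_user"],
    "take_photo": ["walk_to_window", "take_photo", "return_to_user"],
}

QUESTIONS = [
    ("Do you want something from the kitchen?", "fetch_water"),
    ("Do you want a photo of the window lounge?", "take_photo"),
]

def select_action_sequence(answer_fn):
    """Ask each yes/no question until one is confirmed.

    `answer_fn(prompt)` is assumed to return True ("yes") or False ("no")
    based on the classified EEG response.
    """
    for prompt, action_key in QUESTIONS:
        if answer_fn(prompt):
            return PRESET_ACTIONS[action_key]
    return []  # no task selected

# Example: simulate a user answering "no" to the first question, "yes" to the second
answers = iter([False, True])
sequence = select_action_sequence(lambda prompt: next(answers))
print(sequence)  # ['walk_to_window', 'take_photo', 'return_to_user']
```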
Robots and BCI
To this day, EEG remains one of the most practical and applicable non-invasive brain-computer interface methods.
BCI systems can be controlled using endogenous (spontaneous) or exogenous (evoked) signals.
In exogenous BCIs, evoked signals occur when a person pays attention to external stimuli, such as visual or auditory cues.
The advantages of this approach include minimal training and high bitrates of up to 60 bits/min, but it requires the user to remain focused on the stimulus, limiting its applicability in real-life settings. Furthermore, users tire quickly when using exogenous BCIs.
In endogenous BCIs, control signals are generated independently of any external stimulus and can be fully executed by the user on demand. For those users with sensory impairments, this provides a more natural and intuitive way of interacting, allowing users to spontaneously issue commands to the system.
However, this method usually requires longer training time and has a lower bit rate.
Robotic applications of brain-computer interfaces are usually aimed at people who need assistance, and they typically involve wheelchairs and exoskeletons.
The figure below shows the latest progress in brain-computer interface and robotics technology as of 2023.
Quadruped robots are often used to support users in complex work environments or defense applications.
One of the most famous quadruped robots is Boston Dynamics' Spot, which can carry up to 15 kilograms of payload and iteratively map maintenance sites such as tunnels. The real estate and mining industries are also adopting four-legged robots like Spot to help monitor job sites with complex logistics.
This work controls the Spot robot with a mobile BCI solution based on mental arithmetic tasks; the overall architecture is named Ddog.
Ddog architecture
The following figure shows the overall structure of Ddog:
Ddog is an autonomous application that enables users to control the Spot robot through input from the BCI, while the application uses voice to provide feedback to the user and their caregivers.
The system is designed to work completely offline or completely online. The online version has a more advanced set of machine learning models, as well as better fine-tuned models, and is more power efficient for local devices.
The entire system is designed for real-world scenarios and allows for rapid iteration on most parts.
On the client side, users interact with the brain-computer interface device (AttentivU) through a mobile application that uses the Bluetooth Low Energy (BLE) protocol to communicate with the device.
The user's mobile device communicates with another phone controlling the Spot robot to enable agency, manipulation, navigation, and ultimately assistance to the user.
Communication between the phones can be via Wi-Fi or mobile networks. The controlling phone establishes a Wi-Fi hotspot, and both Ddog and the user's phone connect to this hotspot. In online mode, the system can also connect to models running in the cloud.
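As a rough illustration of the relay step only (the hotspot address, port, and JSON message format are assumptions, not Ddog's actual protocol), the user's phone could forward a decoded command to the controlling phone like this:

```python
# Illustrative sketch: the user's phone, having decoded a command from the
# BCI, forwards it to the phone controlling Spot over the shared Wi-Fi
# hotspot. The real clients are mobile apps; the address and message
# format here are hypothetical.
import json
import socket

DDOG_PHONE = ("192.168.4.1", 9000)  # hypothetical hotspot address of the controlling phone

def send_command(command: str, payload: dict) -> None:
    """Send one JSON-encoded command to the Spot-controlling phone."""
    message = json.dumps({"command": command, "payload": payload}).encode("utf-8")
    with socket.create_connection(DDOG_PHONE, timeout=5.0) as conn:
        conn.sendall(message)

# Example: forward the action sequence selected from the BCI answers
# send_command("run_sequence", {"actions": ["walk_to_kitchen", "pick_up_bottle"]})
```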
Server
The server side uses Kubernetes (K8s) clusters, each deployed in its own Virtual Private Cloud (VPC).
The cloud side runs within a dedicated VPC, typically deployed in an Availability Zone close to end users, minimizing the response latency of each service.
Each container in the cluster is designed to be single-purpose (a microservice architecture), and each service is a running AI model; their tasks include navigation, mapping, computer vision, manipulation, localization, and agency. A sketch of how a client might call these services follows the service descriptions below.
Mapping: A service that collects information about a robot's surroundings from different sources. It maps static, immovable data (a tree, a building, a wall) but also collects dynamic data that changes over time (a car, a person).
Navigation: Based on map data collected and augmented in previous services, the navigation service is responsible for constructing a path between point A and point B in space and time. It is also responsible for constructing alternative routes, as well as estimating the time required.
Computer Vision: A service that collects visual data from the robot's cameras and augments it with data from mobile phones to generate a spatial and temporal representation. It also attempts to segment each visual point and identify objects.
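As a rough illustration of this per-model microservice split (the endpoint paths, request fields, and in-VPC base URL below are assumptions, not the project's actual API), a client might chain the mapping and navigation services like this:

```python
# Hedged sketch: each microservice runs one model behind its own endpoint.
# Endpoint names and request/response shapes are illustrative assumptions.
import requests

BASE = "https://ddog.example.internal"  # hypothetical in-VPC address

def plan_route(start: dict, goal: dict) -> dict:
    """Fetch the current map from the mapping service, then ask the
    navigation service for a path (and time estimate) from start to goal."""
    world_map = requests.get(f"{BASE}/mapping/latest", timeout=2.0).json()
    plan = requests.post(
        f"{BASE}/navigation/plan",
        json={"map_id": world_map["id"], "start": start, "goal": goal},
        timeout=2.0,
    ).json()
    return plan  # e.g. {"waypoints": [...], "eta_seconds": 42}
```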
The cloud is responsible for training BCI-related models, including electroencephalogram (EEG), electrooculogram (EOG), and inertial measurement unit (IMU).
The offline models deployed on the phones handle data collection and aggregation, while TensorFlow's mobile models (optimized for limited RAM and ARM-based CPUs) are used for real-time inference.
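For the real-time inference step, a minimal sketch using a TensorFlow Lite interpreter is shown below; the model file name and the input shape are hypothetical stand-ins for the project's actual mobile models:

```python
# Illustrative on-device inference with a TensorFlow Lite interpreter.
# The model file and the (1, channels, samples) input shape are hypothetical.
import numpy as np
import tensorflow as tf

def classify_window(window: np.ndarray, model_path: str = "eeg_classifier.tflite") -> np.ndarray:
    """Run one preprocessed EEG window through a TFLite model and return the scores."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], window.astype(np.float32))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])

# Example (assuming the model expects a (1, 2, 256) window):
# scores = classify_window(np.random.randn(1, 2, 256))
```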
Vision and Operations
The first deployed version of the segmentation model was a single TensorFlow 3D model leveraging LiDAR data. The authors then extended this to a few-shot model and enhanced it by running complementary models on Neural Radiance Field (NeRF) and RGBD data.
The raw data collected by Ddog is aggregated from five cameras. Each camera can provide grayscale, fisheye, depth and infrared data. There is also a sixth camera inside the arm's gripper, with 4K resolution and LED capabilities, that works with a pre-trained TensorFlow model to detect objects.
Point clouds are generated from the LiDAR and RGBD data coming from Ddog and the mobile phones. After data acquisition is complete, the data is normalized into a single coordinate system and matched to a global state that brings together all imaging and 3D positioning data.
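As a minimal sketch of this normalization step (assuming NumPy, with placeholder calibration transforms rather than the system's real ones), each sensor's point cloud can be brought into a shared global frame and merged:

```python
# Sketch: merge per-sensor point clouds into one global frame with a
# per-sensor rigid transform. In the real system the transforms come from
# calibration and the robot's localization; here they are placeholders.
import numpy as np

def to_global(points: np.ndarray, rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Apply a rigid transform to an (N, 3) point cloud."""
    return points @ rotation.T + translation

def merge_point_clouds(clouds_with_poses):
    """clouds_with_poses: iterable of (points, rotation, translation) per sensor."""
    return np.vstack([to_global(p, R, t) for p, R, t in clouds_with_poses])

# Example with two toy clouds expressed in their own sensor frames
lidar_cloud = np.random.rand(100, 3)
phone_cloud = np.random.rand(50, 3)
identity, no_offset = np.eye(3), np.zeros(3)
merged = merge_point_clouds([(lidar_cloud, identity, no_offset),
                             (phone_cloud, identity, np.array([1.0, 0.0, 0.0]))])
print(merged.shape)  # (150, 3)
```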
Manipulation depends entirely on the quality of the robotic arm and gripper mounted on Ddog. The gripper pictured below is manufactured by Boston Dynamics.
The use cases were limited to basic interactions with objects in predefined locations.
The authors mapped out a large laboratory space and set it up as an "apartment", with a "kitchen" area (a tray holding different cups and bottles), a "living room" area (a small sofa with pillows and a coffee table), and a "window lounge" area.
The number of use cases is constantly growing, so the only way to cover most of them is to deploy a system to run continuously for a period of time and use the data to optimize such sequences and experiences.
AttentivU
EEG data were collected from the AttentivU device. The electrodes of AttentivU glasses are made of natural silver and are located at TP9 and TP10 according to the international 10-20 electrode placement system. The glasses also include two EOG electrodes located on the nose pads and an EEG reference electrode located at Fpz.
These sensors can provide the information needed and enable real-time, closed-loop intervention when needed.
The device has two modes, EEG and EOG, which can be used to capture signals of attention, engagement, fatigue, and cognitive load in real time. EEG has been used as a neurophysiological indicator of the transition between wakefulness and sleep.
EOG, on the other hand, is based on the measurement of bioelectrical signals induced due to corneal-retinal dipole properties during eye movements. Research shows that eye movements correlate with the type of memory access needed to perform certain tasks and are a good measure of visual engagement, attention, and drowsiness.
Experiment
First, the EEG data is divided into windows: each window is 1 second of EEG data, with 75% overlap with the previous window.
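A minimal sketch of this windowing, assuming NumPy and a hypothetical sampling rate of 256 Hz (the actual rate is not given here):

```python
# Sketch: 1-second windows with 75% overlap, i.e. a hop of 0.25 s.
import numpy as np

def window_eeg(eeg: np.ndarray, fs: int = 256, win_sec: float = 1.0, overlap: float = 0.75):
    """Split (channels, samples) EEG into overlapping windows."""
    win = int(win_sec * fs)            # samples per window
    hop = int(win * (1.0 - overlap))   # step between window starts
    n = eeg.shape[1]
    return np.stack([eeg[:, s:s + win] for s in range(0, n - win + 1, hop)])

# Example: 10 s of 2-channel data at 256 Hz -> 37 windows of shape (2, 256)
eeg = np.random.randn(2, 10 * 256)
windows = window_eeg(eeg)
print(windows.shape)  # (37, 2, 256)
```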
Then comes data preprocessing and cleaning. The data were filtered using a combination of a 50 Hz notch filter and a bandpass filter with a passband of 0.5 Hz to 40 Hz to ensure removal of power line noise and unwanted high frequencies.
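A sketch of this filtering step, assuming SciPy and the same hypothetical 256 Hz sampling rate:

```python
# Sketch: remove 50 Hz power-line noise with a notch filter, then
# band-pass 0.5-40 Hz. The sampling rate is an assumption.
import numpy as np
from scipy import signal

FS = 256  # assumed sampling rate in Hz

def preprocess(eeg: np.ndarray) -> np.ndarray:
    """Apply a 50 Hz notch filter followed by a 0.5-40 Hz band-pass filter."""
    b_notch, a_notch = signal.iirnotch(w0=50.0, Q=30.0, fs=FS)
    cleaned = signal.filtfilt(b_notch, a_notch, eeg, axis=-1)
    sos = signal.butter(4, [0.5, 40.0], btype="bandpass", fs=FS, output="sos")
    return signal.sosfiltfilt(sos, cleaned, axis=-1)

filtered = preprocess(np.random.randn(2, 10 * FS))
```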
Next, the authors created an artifact rejection algorithm. An epoch is rejected if the absolute power difference between two consecutive epochs is greater than a predefined threshold.
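A sketch of this rejection rule; the power definition (mean squared amplitude) and the threshold value are assumptions for illustration:

```python
# Sketch: drop an epoch when its power differs from the previous epoch's
# power by more than a threshold.
import numpy as np

def reject_artifacts(epochs: np.ndarray, threshold: float = 5.0) -> np.ndarray:
    """epochs: (n_epochs, channels, samples). Returns a boolean keep-mask."""
    power = (epochs ** 2).mean(axis=(1, 2))          # one power value per epoch
    keep = np.ones(len(epochs), dtype=bool)
    keep[1:] = np.abs(np.diff(power)) <= threshold   # compare consecutive epochs
    return keep

# Example
epochs = np.random.randn(37, 2, 256)
clean_epochs = epochs[reject_artifacts(epochs)]
```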
In the final classification step, the authors mixed different spectral band-power ratios to track each subject's task-related mental activity. For the MA (mental arithmetic) task the ratio is alpha/delta, for WA it is delta/low beta, and for ME it is delta/alpha.
Then, change point detection algorithms are used to track changes in these ratios. Sudden increases or decreases in these ratios indicate a change in the user's mental state.
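A hedged sketch of these band-power ratios, with a simple jump detector standing in for the change-point algorithm (band edges, sampling rate, and the detector itself are illustrative assumptions):

```python
# Sketch: per-task band-power ratios plus a naive jump detector.
import numpy as np
from scipy import signal

FS = 256  # assumed sampling rate

BANDS = {  # approximate band edges in Hz (assumption)
    "delta": (0.5, 4.0),
    "alpha": (8.0, 12.0),
    "low_beta": (12.0, 20.0),
}

def band_power(epoch: np.ndarray, band: str) -> float:
    """Average power of one (channels, samples) epoch in the given band."""
    freqs, psd = signal.welch(epoch, fs=FS, nperseg=epoch.shape[-1], axis=-1)
    lo, hi = BANDS[band]
    return psd[..., (freqs >= lo) & (freqs <= hi)].mean()

def task_ratio(epoch: np.ndarray, task: str) -> float:
    """Ratios per task: MA = alpha/delta, WA = delta/low beta, ME = delta/alpha."""
    if task == "MA":
        return band_power(epoch, "alpha") / band_power(epoch, "delta")
    if task == "WA":
        return band_power(epoch, "delta") / band_power(epoch, "low_beta")
    return band_power(epoch, "delta") / band_power(epoch, "alpha")  # ME

def detect_state_change(ratios: np.ndarray, jump: float = 0.5) -> np.ndarray:
    """Flag epochs where the ratio jumps sharply relative to the previous epoch."""
    return np.flatnonzero(np.abs(np.diff(ratios)) > jump) + 1
```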
For subjects with ALS, the model achieved 73% accuracy on the MA task, 74% on the WA task, and 60% on the ME task.
References:
https://www.therobotreport.com/ddog-mit-project-connects-brain-computer-interface-spot-robot/