Detecting Cyber Attacks in the Python Package Index (PyPI)

Background

In the fall of 2017 I was looking for a research project in the information security field that would also include aspects of software engineering. At the time, SKCIRT (Slovakia’s National Security Authority) had just published an advisory about 10 Python packages that were typo-squatting popular packages on PyPI (see: SKCIRT Advisory and Ars Technica Article). As I read through the report, I was surprised by how easy this type of attack is to pull off, and I wondered whether there was an easy way to detect it. Before diving into detection, let’s look at how the attack works.

Anatomy of the Attack

A typo-squatting attack proceeds as follows:

  1. The attacker creates a fake Python package with a name similar to an existing package.
  2. The attacker adds malicious code to the setup.py file of the Python package. The setup.py file is executed when the package is installed.
  3. The attacker uploads the package to PyPI and waits for victims to install it.
  4. When a victim installs the package using “pip install”, the malicious code in setup.py executes.
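
Step 1 only works when the fake name is close enough to a real one that a typing slip lands on it. As a rough illustration of how such near-misses could be flagged, here is a minimal sketch using the standard library's difflib. The POPULAR_PACKAGES list, the function name and the 0.8 cutoff are all assumptions made for this example; this is not how PyPI or my tool checks names.

```python
import difflib

# Hypothetical list of well-known package names, for illustration only.
# A real check would need a much larger, curated or statistics-driven list.
POPULAR_PACKAGES = ["django", "requests", "urllib3", "numpy", "setuptools"]


def likely_typosquat_of(name, popular=POPULAR_PACKAGES, cutoff=0.8):
    """Return the popular package that `name` closely resembles, or None."""
    candidate = name.lower()
    if candidate in popular:
        # An exact match is the real package, not a look-alike.
        return None
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    # and keeps only those at or above `cutoff`.
    matches = difflib.get_close_matches(candidate, popular, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for name in ["urlib3", "setup-tools", "django", "flask"]:
        print(name, "->", likely_typosquat_of(name))
```

A similarity cutoff like this is a blunt instrument: set it too low and legitimate packages with similar names get flagged, set it too high and one-character swaps slip through.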

Detecting the Attack

The Python community has mostly focused on prevention techniques, such as checking for and blocking typo-squatted package names. Package signing is not a good defense against this type of attack: it verifies only the identity of a package author and says nothing about that author’s intent, even when the identity is verified. An author reputation system (centralized or crowd-based) could possibly be layered on top of signing to mitigate attacks. Detection, by contrast, can take one of two approaches:

  1. Dynamic Analysis: Install the package in a sandbox and look for indicators of malicious code.
  2. Static Analysis: Analyze the code without executing it to check for indicators of malicious code.

Static Code Analysis Detection Strategy

The main pattern used for detecting malicious code in the Python installer code (setup.py) is code that attempts to establish an outbound network connection. Most malicious code attempts to exfiltrate data, check in with “command and control”, or both, and both of those operations generally require an outbound connection.
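
To make the idea concrete, here is a minimal sketch of one way to spot candidate outbound connections without running anything: parse the installer source and list imports of network-capable modules. The NETWORK_MODULES set and the function name are assumptions chosen for this example, not the actual rules my tool uses, and an import on its own is only a hint, not proof of malicious intent.

```python
import ast

# Assumed, non-exhaustive set of modules that can open outbound connections.
NETWORK_MODULES = {
    "socket", "urllib", "urllib2", "httplib", "http",
    "ftplib", "smtplib", "requests",
}


def find_network_imports(source):
    """Return sorted (line, module) pairs for imports of network-capable modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare on the top-level package, so "urllib.request" matches "urllib".
            if name.split(".")[0] in NETWORK_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


if __name__ == "__main__":
    sample = "import os\nimport urllib.request\nfrom socket import socket\n"
    print(find_network_imports(sample))
```

Note that ast.parse only accepts syntax valid for the interpreter running the scan, so a setup.py written for Python 2 may raise SyntaxError and would need separate handling.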

Implementation

The detection tool is implemented in Python and uses the Abstract Syntax Tree (AST) library to parse Python source code. The main patterns the tool looks for in the source code are outbound network connections, strings executed as code, and obfuscation techniques like base64 encoding.
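
As an illustration of the other two patterns, the sketch below walks the syntax tree with ast.NodeVisitor and records calls that execute strings as code or decode base64 data. The class name, the rule sets and the finding labels are hypothetical and written for this post; the real tool's rules are more involved.

```python
import ast

# Assumed rule sets for this example only.
DYNAMIC_EXEC_NAMES = {"exec", "eval", "compile", "__import__"}
DECODE_NAMES = {"b64decode", "b32decode", "b16decode"}


class SuspiciousCallVisitor(ast.NodeVisitor):
    """Collect (line, kind, name) tuples for calls worth a closer look."""

    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        func = node.func
        # Bare names cover exec(...) and "from base64 import b64decode" usage;
        # attributes cover base64.b64decode(...).
        if isinstance(func, ast.Name):
            called = func.id
        elif isinstance(func, ast.Attribute):
            called = func.attr
        else:
            called = None
        if isinstance(func, ast.Name) and called in DYNAMIC_EXEC_NAMES:
            self.findings.append((node.lineno, "dynamic-exec", called))
        elif called in DECODE_NAMES:
            self.findings.append((node.lineno, "decode", called))
        # Keep walking so nested calls such as exec(b64decode(...)) are seen.
        self.generic_visit(node)


def scan_source(source):
    """Parse `source` and return the visitor's findings in visit order."""
    visitor = SuspiciousCallVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings


if __name__ == "__main__":
    sample = "import base64\nexec(base64.b64decode('cHJpbnQoMSk='))\n"
    for finding in scan_source(sample):
        print(finding)
```

Matching on call names alone is easy to evade (for example by aliasing exec through getattr), which is one reason false positives and false negatives both need ongoing tuning.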

Results

In my initial scan of PyPI (which included approximately 123,000 packages at the time), I detected 11 packages containing malicious code and reported them privately to the PyPI maintainers earlier this year. Judging by the package names, several typo-squatted the popular django package, while several others typo-squatted Python standard library modules.

Future work

Work continues to improve the tool. Obvious avenues of improvement include detection of malicious code in the functional code of packages and reduction of false positives.

Bertus

Software and Security Engineering