
Using Python’s subprocess module

Want to glue Python to the rest of your system? Perhaps you need to call a compiled program, run a shell script, invoke Node.js, call Java, or pipe data to/from R. Enter subprocess. It’s Python’s standard way to start external programs, control their input/output, check return codes, and set environments and timeouts, all through a solid API.

Below is a practical, friendly guide: what subprocess is, why you’d use it, examples, best practices, pitfalls, and which languages/tools you can call from it.

What is subprocess?

subprocess is a standard Python module that lets your Python program spawn new processes, connect to their input/output/error pipes, and obtain their return codes. It replaces older interfaces such as os.system and os.popen with a unified, safer API.

Key high-level primitives:

  • subprocess.run() — simple, recommended for most cases (Python 3.5+).

  • subprocess.Popen — lower-level, use when you need streaming IO, advanced control, or long-lived processes.

  • subprocess.check_output() — capture stdout (older convenience; run(..., capture_output=True) is now preferred).
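
To see how they relate, here is a minimal sketch of the same task done three ways (git --version is just a stand-in command; git is assumed to be installed):

import subprocess

# run(): wait for completion, optionally capture output
done = subprocess.run(["git", "--version"], capture_output=True, text=True)
print(done.stdout.strip())

# check_output(): older shortcut that returns stdout directly
out = subprocess.check_output(["git", "--version"], text=True)

# Popen: a lower-level handle you manage yourself
proc = subprocess.Popen(["git", "--version"], stdout=subprocess.PIPE, text=True)
stdout, _ = proc.communicate()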

Why use subprocess? (Advantages)

  • Language-agnostic interoperability: call any program or script that can run on the host (compiled binaries, Java JARs, Node scripts, shell scripts, R scripts, etc.).

  • Leverage existing tools: reuse battle-tested command-line utilities (ffmpeg, imagemagick, grep, custom compiled tools) instead of re-implementing functionality.

  • Process isolation: external programs run in separate processes — crashes or memory usage are isolated from your Python interpreter.

  • Performance / native code: heavy computation can run in native binaries (C, Rust, Go) and return results quickly to Python.

  • Flexible IO: easily pipe data to/from stdin/stdout/stderr, stream output, or capture output for parsing.

  • Portability: same Python API works across platforms; you only need to change the command arguments where platform differences exist.

  • Security (when used correctly): by passing argument lists rather than a shell string you avoid shell injection risks.
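
To make the security point concrete, here is a minimal sketch of the difference; the filename variable is hypothetical, attacker-controlled input:

import subprocess

filename = "report.txt; rm -rf ~"   # hypothetical untrusted input

# Safe: the list is passed to the program as-is, with no shell parsing
subprocess.run(["cat", filename])   # cat just fails with "no such file"; nothing else runs

# Dangerous: the shell would interpret ';' and run the second command too
# subprocess.run(f"cat {filename}", shell=True)   # never do this with untrusted input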

Which languages / tools are compatible with subprocess?

Short answer: any language or tool that can be invoked from the command line on your system. subprocess does not care about language — it launches executables or interpreter processes.

Common examples:

  • Compiled languages: C, C++, Rust, Go — call the produced executable:

    subprocess.run(["./my_c_program", "arg1"])
    
  • Java: run jars or classes via the java runtime:

    subprocess.run(["java", "-jar", "app.jar", "arg"])
    
  • Node.js / JavaScript:

    subprocess.run(["node", "script.js"])
    
  • Python (other interpreters or scripts):

    subprocess.run(["python3", "other_script.py"])
    
  • Shell scripts / POSIX tools (bash, awk, sed, grep, ls, etc.):

    subprocess.run(["/bin/bash", "script.sh"])
    
  • PowerShell / Batch (Windows):

    subprocess.run(["powershell", "-File", "script.ps1"])
    subprocess.run(["cmd", "/c", "script.bat"])
    
  • Scripting languages: Ruby (ruby script.rb), Perl (perl script.pl), R (Rscript analysis.R) — same pattern.

  • Command-line programs: ffmpeg, curl, git, imagemagick tools, DB clients, etc.

Because subprocess launches processes, any program available in PATH or by absolute path is callable. The only requirement is that the target is runnable on your OS (an .exe on Windows, ELF binary on Linux, or an interpreter that can run the script).
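
A quick sketch of checking availability before calling (ffmpeg is just an example binary here):

import shutil
import subprocess

# shutil.which() resolves the executable on PATH; it returns None if nothing is found
exe = shutil.which("ffmpeg")
if exe is None:
    raise RuntimeError("ffmpeg not found on PATH")

subprocess.run([exe, "-version"], check=True)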

Practical examples

1) Simple command, check exit status

import subprocess

# list files (POSIX)
subprocess.run(["ls", "-la"], check=True)

2) Capture output (text mode)

result = subprocess.run(
    ["git", "rev-parse", "HEAD"],
    capture_output=True,
    text=True,
    check=True
)
commit_hash = result.stdout.strip()

3) Run a Java JAR

subprocess.run(["java", "-jar", "myapp.jar", "--config", "cfg.yml"], check=True)

4) Run Node script and capture stdout

result = subprocess.run(["node", "generate-json.js"], capture_output=True, text=True)
data = result.stdout

5) Stream output (useful for long-running commands)

from subprocess import Popen, PIPE, STDOUT

# Merge stderr into stdout so an unread stderr pipe can't fill up and deadlock us
p = Popen(["tail", "-f", "/var/log/syslog"], stdout=PIPE, stderr=STDOUT, text=True)
for line in p.stdout:
    print("log>", line, end="")
# p.terminate() when done

6) Use timeout and handle exceptions

import subprocess

try:
    subprocess.run(["sleep", "10"], timeout=5)
except subprocess.TimeoutExpired:
    print("Process timed out and was killed")

Best practices & security tips

  • Prefer passing args as a list (["cmd", "arg1"]) rather than a single shell string. This avoids shell interpretation and avoids many injection risks.

  • Avoid shell=True unless you must run shell features (pipes, redirection, shell builtins). If you use it, never pass unsanitized user input into the shell string.

  • Use check=True to raise CalledProcessError when a command fails — makes error-handling explicit.

  • Use capture_output=True or stdout=PIPE/stderr=PIPE carefully; large outputs can fill OS buffers and cause deadlocks. For streaming, read incrementally from pipes.

  • Always handle exceptions: subprocess.CalledProcessError, TimeoutExpired, etc.

  • Prefer shutil.which() to check that an external program exists before calling it.

  • Set env= if you need a custom environment; by default the child inherits the parent’s environment.

  • Be mindful of cross-platform differences: command names and flags differ between Windows and Unix; detect OS with sys.platform and branch accordingly.
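
Putting several of these together in one hedged sketch (my-tool, data.csv, and MY_TOOL_MODE are placeholders, not a real CLI):

import os
import subprocess

# Start from the parent's environment and override only what you need
env = os.environ.copy()
env["MY_TOOL_MODE"] = "batch"   # hypothetical variable, for illustration only

try:
    result = subprocess.run(
        ["my-tool", "--input", "data.csv"],   # hypothetical command
        capture_output=True,
        text=True,
        check=True,     # raise CalledProcessError on a non-zero exit code
        timeout=60,
        env=env,
    )
except FileNotFoundError:
    print("my-tool is not installed or not on PATH")
except subprocess.CalledProcessError as exc:
    print("my-tool failed:", exc.returncode, exc.stderr)
except subprocess.TimeoutExpired:
    print("my-tool took too long")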

Common pitfalls

  • Deadlocks: reading stdout and stderr incorrectly with Popen can block. Use communicate() to avoid deadlocks when both pipes are used (see the sketch after this list).

  • Shell injection: using shell=True with user input is dangerous.

  • Wrong encoding: use text=True (or encoding="utf-8") to get strings instead of bytes.

  • Platform mismatches: ls doesn't exist on Windows; dir is a shell builtin. Use portable alternatives or conditional logic.

  • Large outputs: capturing a multi-GB output in memory will crash your process. Stream instead.
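
For the deadlock pitfall, communicate() drains both pipes for you; a minimal sketch (grep searching the current directory is just an example):

import subprocess

p = subprocess.Popen(
    ["grep", "-r", "TODO", "."],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)
# communicate() reads stdout and stderr together, so neither pipe can fill up and block
out, err = p.communicate()
print(out)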

When to use Popen vs run

  • Use subprocess.run() for simple commands where you want to wait for completion and optionally capture output. It's concise and safe.

  • Use subprocess.Popen when you need non-blocking interaction, streaming output, continuous processes, or piping between multiple processes under fine-grained control.
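
For the "piping between multiple processes" case, here is a minimal sketch that chains two commands the way a shell pipe would (assumes a POSIX system with ps and grep):

from subprocess import Popen, PIPE

# Roughly equivalent to: ps aux | grep python
ps = Popen(["ps", "aux"], stdout=PIPE)
grep = Popen(["grep", "python"], stdin=ps.stdout, stdout=PIPE, text=True)
ps.stdout.close()   # lets ps receive SIGPIPE if grep exits first
output, _ = grep.communicate()
print(output)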

Alternative/related options

  • asyncio.create_subprocess_exec for asynchronous code (when using asyncio); see the sketch after this list.

  • Third-party libraries that wrap process execution and add convenience features (logging, retries, richer streams), but for most tasks subprocess is enough.
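
If you are already inside an asyncio application, the non-blocking equivalent looks roughly like this (a sketch; git is assumed to be installed):

import asyncio

async def get_git_version():
    proc = await asyncio.create_subprocess_exec(
        "git", "--version",
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout.decode().strip()

print(asyncio.run(get_git_version()))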

Example use case: call an R script, read results in Python

import subprocess
import json

# Rscript writes JSON to stdout
proc = subprocess.run(["Rscript", "compute.R", "input.csv"], capture_output=True, text=True, check=True)
r_output = proc.stdout
result = json.loads(r_output)

Summary

subprocess is the go-to tool when your Python app needs to call out to other programs — whether those are binaries compiled from C/Go/Rust, scripts written in Perl/Ruby/Node/Python, or JVM programs. It gives control over IO, environment, and lifecycle, while being language-agnostic: if it can run on the system, Python can launch it, feed it data, and capture its output.

Start small: try subprocess.run(["echo", "hello"], capture_output=True, text=True) and build from there. Remember to avoid shell=True unless necessary, check for executables with shutil.which, and always handle exceptions. Happy process orchestration! 🚀
