Go for heavy tasks: when it beats Python and when it doesn't
Practical comparison of Go and Python in heavy scripts, concurrency, scraping, ETL and batch processes. When the switch is worth it.

I rewrote a Python scraper in Go and it went from 45 minutes to 3. Three minutes. The scraper crawled around 80,000 pages of a public catalogue, parsed the HTML and saved the results in a JSON file in batches. In Python, with requests and BeautifulSoup, it worked fine but was slow. Requests were made sequentially, the GIL limited any attempt at real parallelism and memory usage grew steadily. In Go, with goroutines, net/http and a worker pool, the same job finished in a fraction of the time with stable memory consumption.
But I also tried to rewrite a data pipeline that transformed CSVs with pandas, applied cleaning with numpy and generated reports with matplotlib. And I regretted it. It took me three times as long to write, the code was twice as long and the result was not significantly faster because the bottleneck was disk I/O, not CPU.
And honestly, this is the part nobody tells you when you read the benchmarks: Go is not better than Python at everything. It’s better at specific things. I think knowing when the switch is worth it is what separates a good technical decision from a rewrite that adds nothing.
Where Python starts to struggle
Python is an extraordinary language for prototyping, automation and data science. I have no doubt about that. But it has real limitations when you ask it for sustained performance on heavy tasks.
The GIL: the elephant in the room
The Global Interpreter Lock (GIL) is the mechanism that guarantees only one thread executes Python bytecode at a time. This simplifies the interpreter’s memory management, but has a direct consequence: multithreading in Python doesn’t take advantage of multiple cores for CPU-bound work.
import threading
import time

def cpu_heavy(n):
    """Simulates CPU-bound work."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Four threads, each doing the same CPU-bound job
start = time.time()
threads = []
for _ in range(4):
    t = threading.Thread(target=cpu_heavy, args=(50_000_000,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print(f"4 threads: {time.time() - start:.2f}s")

# Sequential comparison
start = time.time()
for _ in range(4):
    cpu_heavy(50_000_000)
print(f"Sequential: {time.time() - start:.2f}s")
Run this and you’ll see the version with 4 threads takes as long as the sequential one, or longer. The GIL serializes CPU-bound work. You can use multiprocessing, yes, but data passed between processes has to be serialized, and memory usage goes up because each process runs its own interpreter.
For I/O-bound work (HTTP requests, file reads), asyncio and aiohttp are good solutions. But when you mix I/O with significant processing of the received data, Python’s model starts to creak.
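To put a number on the serialization cost mentioned for multiprocessing: arguments and results cross the process boundary as pickled bytes. The sketch below only times that round trip in a single process. The payload is invented for illustration, and the figures will vary by machine.

```python
import pickle
import time

# Hypothetical payload: 200,000 small records, similar to parsed rows.
records = [{"id": i, "amount": i * 1.5, "category": "books"} for i in range(200_000)]

start = time.perf_counter()
blob = pickle.dumps(records)    # done by the sender when handing work to a child process
restored = pickle.loads(blob)   # done by the receiver before it can start working
elapsed = time.perf_counter() - start

print(f"{len(blob) / 1e6:.1f} MB pickled, round trip {elapsed * 1000:.0f} ms")
```

If that round trip is comparable to the work done per batch, spreading the job over processes gains little.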
Deployment: the dependency hell
Anyone who has deployed Python scripts in production knows the ritual:
- You create a venv or use poetry/pipenv/uv.
- You install dependencies. Some need C compilers because they’re wrappers for native libraries.
- You make sure the Python version on the server matches the development one.
- You package everything in a Docker container to keep your sanity.
- The Docker image can weigh up to 800 MB because it includes the interpreter, dependencies and system libraries.
It works, but it has friction. For a production service with CI/CD, it’s manageable. But for a batch processing script that you need to run on 10 different servers, the situation changes. That’s where it starts to hurt.
Memory consumption
Python is not memory-efficient. A dict in Python consumes significantly more memory than an equivalent structure in a statically typed language. When you process millions of records in memory, this matters. I’ve seen batch processing scripts in Python that needed 8 GB of RAM for a job that Go did with 500 MB.
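A rough way to see that overhead is to store the same records as dicts and as instances of a class with __slots__, and compare what the standard tracemalloc module reports. The record fields and the SlimRecord class are invented for this sketch, and absolute numbers depend on the CPython version.

```python
import tracemalloc


class SlimRecord:
    """Hypothetical record type; __slots__ removes the per-instance dict."""
    __slots__ = ("id", "amount", "category")

    def __init__(self, id, amount, category):
        self.id = id
        self.amount = amount
        self.category = category


def measure(build):
    """Return the bytes still allocated once build() has produced its data."""
    tracemalloc.start()
    data = build()  # keep a reference so the records stay alive while we measure
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return current


def as_dicts(n=50_000):
    return [{"id": i, "amount": i * 1.5, "category": "books"} for i in range(n)]


def as_slots(n=50_000):
    return [SlimRecord(i, i * 1.5, "books") for i in range(n)]


dict_bytes = measure(as_dicts)
slots_bytes = measure(as_slots)
print(f"dicts: {dict_bytes / 1e6:.1f} MB, slots: {slots_bytes / 1e6:.1f} MB")
```

The slotted version is smaller, but every field is still a separate Python object, which a struct with inline fields avoids.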
Where Go makes the difference
Go is not a pretty language. It doesn’t have the expressiveness of Python, the ergonomics of Kotlin, or the power of Rust’s type system. I’m not going to pretend otherwise. But for certain heavy tasks, it has real technical advantages worth understanding.
Native concurrency with goroutines
Goroutines are lightweight threads managed by Go’s runtime. They’re cheap to create (a few KB of initial stack), the scheduler distributes them among available cores and communication between them happens with channels. No GIL. No serialization overhead between processes. They simply work.
package main

import (
	"fmt"
	"sync"
	"time"
)

func cpuHeavy(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i * i
	}
	return total
}

func main() {
	// Four goroutines, each doing the same CPU-bound job
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cpuHeavy(50_000_000)
		}()
	}
	wg.Wait()
	fmt.Printf("4 goroutines: %v\n", time.Since(start))

	// Sequential comparison
	start = time.Now()
	for i := 0; i < 4; i++ {
		cpuHeavy(50_000_000)
	}
	fmt.Printf("Sequential: %v\n", time.Since(start))
}
Here you’ll see a real difference. The 4 goroutines run in parallel on different cores. On a 4-core machine, the time with goroutines will be approximately one quarter of the sequential time. In Python, remember, the threaded version was no faster than the sequential one.
If you’re coming from Python and want to understand the concurrency model in Go well, there’s a complete guide. But the summary is: concurrency in Go is a first-class citizen of the language, not a patch on top of a runtime that wasn’t designed for it.
Single binary: compile and run
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o scraper .
scp scraper server:/usr/local/bin/
ssh server "scraper --config /etc/scraper.yaml"
That’s it. With cgo disabled (CGO_ENABLED=0), the result is a static binary that includes everything it needs. No runtime to install, no dependencies, no mandatory containers. It works on any machine with the same OS and architecture. For command-line tools, batch processing scripts and utilities you need to distribute to multiple servers, this is a paradigm shift compared to Python.
Memory efficiency
Go structs have a fixed memory layout and store their fields inline: no per-object dictionary, no boxing of primitive types. Memory management differs too: CPython combines reference counting with a generational cycle collector, while Go uses a concurrent, non-generational mark-and-sweep GC with very short pauses. For processing large volumes of data in memory, the difference is significant.
Practical comparison: HTTP scraping
Let’s go to the case I mentioned at the beginning. Imagine you need to scrape 10,000 URLs, parse the HTML and extract specific data.
Python with asyncio and aiohttp
import asyncio
import aiohttp
from bs4 import BeautifulSoup

async def fetch_and_parse(session, url):
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
            if resp.status != 200:
                # Report non-200 responses instead of silently returning None
                return {"url": url, "error": f"HTTP {resp.status}"}
            html = await resp.text()
            soup = BeautifulSoup(html, "lxml")
            title = soup.find("title")
            return {"url": url, "title": title.text if title else ""}
    except Exception as e:
        return {"url": url, "error": str(e)}

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(10_000)]
    connector = aiohttp.TCPConnector(limit=50)
    async with aiohttp.ClientSession(connector=connector) as session:
        # At most 50 requests in flight at any time
        semaphore = asyncio.Semaphore(50)

        async def bounded_fetch(url):
            async with semaphore:
                return await fetch_and_parse(session, url)

        results = await asyncio.gather(*[bounded_fetch(u) for u in urls])
    print(f"Processed: {len(results)}")

asyncio.run(main())
This code works well. asyncio is efficient for concurrent I/O. But there are nuances:
- BeautifulSoup is synchronous. HTML parsing blocks the event loop. With 10,000 pages, that time accumulates.
- If parsing is heavy (extracting multiple elements, navigating the DOM), you’re CPU-bound inside an async model.
- Memory consumption grows because asyncio.gather keeps all coroutines and their results in memory.
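One common way to soften the first two points is to move parsing off the event loop with loop.run_in_executor. The sketch below is a minimal illustration, not the scraper described above. It uses the standard-library html.parser instead of BeautifulSoup and works on HTML strings that are already downloaded. The names TitleParser, extract_title and parse_all are made up for the example.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html):
    """Synchronous, CPU-bound parsing step."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


async def parse_all(pages, executor):
    """Run every parse in the executor so the event loop stays free for I/O."""
    loop = asyncio.get_running_loop()
    tasks = [loop.run_in_executor(executor, extract_title, page) for page in pages]
    return await asyncio.gather(*tasks)


pages = [f"<html><head><title> Page {i} </title></head></html>" for i in range(3)]
with ThreadPoolExecutor(max_workers=2) as pool:
    titles = asyncio.run(parse_all(pages, pool))
print(titles)
```

A ThreadPoolExecutor keeps the loop responsive, but the parsing still runs under the GIL. Passing a ProcessPoolExecutor instead gives real parallelism at the price of pickling every page across the process boundary, which is the multiprocessing overhead mentioned earlier.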
Go with goroutines and worker pool
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"sync"
	"time"

	"golang.org/x/net/html"
)

type Result struct {
	URL   string
	Title string
	Err   error
}

// extractTitle streams the document and returns the text of the first <title>.
func extractTitle(body io.Reader) string {
	tokenizer := html.NewTokenizer(body)
	inTitle := false
	for {
		tt := tokenizer.Next()
		switch tt {
		case html.ErrorToken:
			// End of input (or a read error): no title found
			return ""
		case html.StartTagToken:
			t := tokenizer.Token()
			if t.Data == "title" {
				inTitle = true
			}
		case html.TextToken:
			if inTitle {
				return strings.TrimSpace(tokenizer.Token().Data)
			}
		}
	}
}

// worker fetches and parses URLs from the channel until it is closed.
func worker(id int, urls <-chan string, results chan<- Result, wg *sync.WaitGroup, client *http.Client) {
	defer wg.Done()
	for url := range urls {
		resp, err := client.Get(url)
		if err != nil {
			results <- Result{URL: url, Err: err}
			continue
		}
		// Mirror the Python version: only parse successful responses
		if resp.StatusCode != http.StatusOK {
			resp.Body.Close()
			results <- Result{URL: url, Err: fmt.Errorf("unexpected status %d", resp.StatusCode)}
			continue
		}
		title := extractTitle(resp.Body)
		resp.Body.Close()
		results <- Result{URL: url, Title: title}
	}
}

func main() {
	start := time.Now()
	urls := make([]string, 10_000)
	for i := range urls {
		urls[i] = fmt.Sprintf("https://example.com/page/%d", i)
	}

	urlChan := make(chan string, 100)
	results := make(chan Result, 100)
	client := &http.Client{Timeout: 10 * time.Second}

	// Start a fixed pool of 50 workers
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go worker(i, urlChan, results, &wg, client)
	}

	// Send URLs
	go func() {
		for _, u := range urls {
			urlChan <- u
		}
		close(urlChan)
	}()

	// Close results when workers finish
	go func() {
		wg.Wait()
		close(results)
	}()

	var collected []Result
	for r := range results {
		collected = append(collected, r)
	}
	fmt.Printf("Processed: %d in %v\n", len(collected), time.Since(start))
}
The key difference here is not just the speed of HTTP requests (both versions can keep 50 in flight). It’s that HTML parsing in Go runs truly in parallel, one page per goroutine, while in Python the BeautifulSoup parsing is serialized on the event loop. With 10,000 pages, that accumulated parsing makes an important difference.
In my real tests with a catalogue scraper:
| Metric | Python (asyncio) | Go (worker pool) |
|---|---|---|
| Total time | ~45 min | ~3 min |
| Peak memory | ~1.2 GB | ~180 MB |
| CPU used | 1 core (GIL) | 4 cores |
| Deployable size | Docker ~800 MB | Binary ~12 MB |
The numbers vary depending on the case, but the ratio is representative. If your scraper has a significant parsing phase, Go will be faster because it truly parallelises.
Practical comparison: CSV/JSON file processing
Another common case: reading a CSV with several million rows, transforming the data and writing the result.
Python with standard csv
import csv
import json
from datetime import datetime

def process_csv(input_path, output_path):
    results = []
    # newline="" is what the csv module expects for files it reads
    with open(input_path, "r", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            # Transformation: clean, convert types, filter
            amount = float(row["amount"])
            if amount <= 0:
                continue
            results.append({
                "id": row["id"],
                "amount": round(amount * 1.21, 2),  # Apply VAT
                "date": datetime.strptime(row["date"], "%Y-%m-%d").isoformat(),
                "category": row["category"].strip().lower(),
            })
    with open(output_path, "w") as f:
        json.dump(results, f)
    print(f"Processed: {len(results)} records")
Go equivalent
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"io"
	"math"
	"os"
	"strconv"
	"strings"
	"time"
)

type Record struct {
	ID       string  `json:"id"`
	Amount   float64 `json:"amount"`
	Date     string  `json:"date"`
	Category string  `json:"category"`
}

func processCSV(inputPath, outputPath string) error {
	f, err := os.Open(inputPath)
	if err != nil {
		return err
	}
	defer f.Close()

	reader := csv.NewReader(f)
	headers, err := reader.Read()
	if err != nil {
		return err
	}

	// Map column indices
	idx := make(map[string]int)
	for i, h := range headers {
		idx[h] = i
	}

	var results []Record
	for {
		row, err := reader.Read()
		if err == io.EOF {
			break // end of file
		}
		if err != nil {
			return err // malformed CSV: fail instead of silently truncating
		}
		// Like the Python version, fail on values that do not parse
		amount, err := strconv.ParseFloat(row[idx["amount"]], 64)
		if err != nil {
			return fmt.Errorf("invalid amount %q: %w", row[idx["amount"]], err)
		}
		if amount <= 0 {
			continue
		}
		date, err := time.Parse("2006-01-02", row[idx["date"]])
		if err != nil {
			return fmt.Errorf("invalid date %q: %w", row[idx["date"]], err)
		}
		results = append(results, Record{
			ID:       row[idx["id"]],
			Amount:   math.Round(amount*1.21*100) / 100, // Apply VAT
			Date:     date.Format(time.RFC3339),
			Category: strings.ToLower(strings.TrimSpace(row[idx["category"]])),
		})
	}

	out, err := os.Create(outputPath)
	if err != nil {
		return err
	}
	defer out.Close()
	return json.NewEncoder(out).Encode(results)
}

func main() {
	start := time.Now()
	if err := processCSV("data.csv", "output.json"); err != nil {
		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("Processed in %v\n", time.Since(start))
}
For a CSV with 5 million rows, Go will be faster (typically 2-4x) and use less memory. But the interesting question is different: for this case, is it worth it?
If your CSV processing script runs once a day as a cron job, and Python does it in 2 minutes, I don’t think there’s any reason to rewrite it in Go so it takes 40 seconds. The Python code is shorter, easier to modify and anyone on the team with data knowledge understands it.
But the situation changes if you process 50-million-row files every hour and the Python process takes 30 minutes and consumes 12 GB of RAM. There the conversion to Go starts to make real economic sense: less compute time, smaller instances, lower infrastructure cost.
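Part of a memory figure like that can come from the shape of the script rather than from the language: the Python version above appends every row to a list before writing anything. A streaming variant is worth measuring before any rewrite. This minimal sketch assumes that JSON Lines output (one object per line) is acceptable downstream. The names transform_row and stream_csv are illustrative.

```python
import csv
import io
import json
from datetime import datetime


def transform_row(row):
    """Same transformation as the batch version; returns None for filtered rows."""
    amount = float(row["amount"])
    if amount <= 0:
        return None
    return {
        "id": row["id"],
        "amount": round(amount * 1.21, 2),  # Apply VAT
        "date": datetime.strptime(row["date"], "%Y-%m-%d").isoformat(),
        "category": row["category"].strip().lower(),
    }


def stream_csv(src, dst):
    """Read rows from src and write JSON Lines to dst, one record at a time."""
    written = 0
    for row in csv.DictReader(src):
        record = transform_row(row)
        if record is None:
            continue
        dst.write(json.dumps(record) + "\n")
        written += 1
    return written


# In-memory demo; with real files, pass open(path, newline="") handles instead.
src = io.StringIO(
    "id,amount,date,category\n"
    "1,100,2024-01-15, Books \n"
    "2,-5,2024-01-16,toys\n"
)
dst = io.StringIO()
count = stream_csv(src, dst)
print(count, dst.getvalue().strip())
```

Memory then stays roughly constant regardless of file size, which changes the comparison: what remains is the per-row CPU cost, and that is the part Go actually speeds up.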
When Go does NOT help: Python’s data ecosystem
And here comes the part that I think Go evangelists tend to omit, and which is important to say clearly. For data, machine learning and analysis tasks, Python has no rival and Go is not a viable alternative.
pandas, numpy and the scientific ecosystem
import pandas as pd

df = pd.read_csv("sales.csv")
# In one chained expression: group, aggregate, filter and sort
summary = (
    df.groupby(["region", "product"])
    .agg(total=("amount", "sum"), orders=("id", "count"))
    .query("total > 10000")
    .sort_values("total", ascending=False)
)
Trying to do this in Go requires implementing the grouping manually, writing the aggregation functions and managing the types of each column. It’s perfectly possible, but the code will be 10 times longer and won’t deliver significantly better performance, because pandas and numpy use C and Fortran under the hood. numpy’s vectorized operations run in compiled code, and many of them release the GIL while they work. When you use df.groupby().sum(), the heavy work is done by optimized C code, not Python.
Machine learning and AI
There’s no Go equivalent of scikit-learn, PyTorch, TensorFlow, Hugging Face or any mature ML framework. There are projects like gonum for linear algebra and gorgonia for neural networks, but they’re years behind Python’s ecosystem. If your heavy task involves training models, doing inference or processing data for ML, the conversation starts and ends in Python.
ETL with complex transformations
For ETL where the transformation requires complex business logic, joins between datasets, data cleaning with domain-specific rules, Python with pandas (or polars, if you need more performance) is more productive. The time you save in execution with Go you lose many times over in development and maintenance time.
The general rule: if the bottleneck is vectorized computation or the library ecosystem, stay in Python. If the bottleneck is concurrency, I/O or deployment, consider Go.
The hybrid approach: the best of both worlds
In practice, and this is something I’ve taken time to accept because you always want a clean stack, the best solution is usually to combine both languages. You don’t have to choose one or the other for everything.
Pattern 1: orchestration in Python, workers in Go
# orchestrator.py
import subprocess
import json

def run_go_worker(input_file, output_file):
    """Calls the Go binary for heavy work."""
    result = subprocess.run(
        ["./go-processor", "--input", input_file, "--output", output_file],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        raise RuntimeError(f"Worker failed: {result.stderr}")
    return json.loads(result.stdout)

# Python handles the orchestration logic
files = get_pending_files()  # Query to DB, S3, etc.
for f in files:
    stats = run_go_worker(f.input_path, f.output_path)
    update_status(f.id, stats)  # Update DB with results
    notify_if_needed(f, stats)  # Send Slack, email, etc.
Python does what it does best: orchestrate, integrate with services, apply business logic. Go does what it does best: process data quickly with real parallelism.
Pattern 2: API in Go, analysis in Python
If you have a service that receives data in real time and needs to process it with low latency, the API can be in Go. But when you need to analyze historical data, generate reports or train models, that part lives in Python.
[Clients] → [Go API: reception + validation + storage]
                          ↓
                      [Database]
                          ↓
        [Python scripts: analysis + reports + ML]
Pattern 3: prototype in Python, rewrite the bottleneck in Go
This is my favorite pattern, and probably the one that has given me the most value. I write the complete script in Python first. I run it, measure it, identify where the real bottleneck is (not the one I imagine, the real one). If that bottleneck is something Go solves well (concurrency, massive I/O, heavy parsing), I rewrite it in Go. If not, I leave it in Python and optimize there.
The key is to measure before rewriting. And honestly, I’ve seen teams (myself included) rewrite an entire system in Go because “Python is slow” and discover that the bottleneck was a poorly optimized SQL query that took the same time in any language.
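A measurement does not need to be elaborate. As a rough sketch, a small context manager around each stage is often enough to show where the time goes. The stage names and the sleep calls standing in for a slow query and a write are invented for the example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)


@contextmanager
def timed(stage):
    """Accumulate wall-clock seconds spent inside the block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start


def run_pipeline():
    with timed("query"):
        time.sleep(0.2)  # stand-in for a slow SQL query (assumption)
    with timed("transform"):
        sum(i * i for i in range(10_000))  # stand-in for CPU work
    with timed("write"):
        time.sleep(0.01)  # stand-in for writing results


run_pipeline()
total = sum(timings.values())
for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:<10} {seconds:6.3f}s  {seconds / total:5.1%}")
bottleneck = max(timings, key=timings.get)
```

For a finer-grained view, the standard cProfile module (python -m cProfile -s cumulative script.py) breaks the time down per function.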
The deployment story: pip+venv+Docker vs binary
This is a point that, in my experience, many people underestimate until they suffer it. But for heavy tasks that run in batch, deployment matters a lot.
The Python path
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "process_batch.py"]

# docker-compose.yml or task definition in ECS/K8s
services:
  batch-processor:
    build: .
    volumes:
      - ./data:/data
    environment:
      - DB_URL=postgresql://...
      - S3_BUCKET=my-bucket
It works. But:
- The image weighs 300-800 MB depending on dependencies.
- The build takes minutes if there are dependencies with native compilation.
- Every time you update a dependency, you rebuild the layers.
- If you need to run it outside Docker (on a bare-metal server, in a cron on a development machine), you have to manage the Python environment.
The Go path
# Makefile
build:
	CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o bin/processor ./cmd/processor

deploy:
	scp bin/processor server:/opt/batch/processor
	ssh server "systemctl restart batch-processor"
Or if you prefer Docker, the multi-stage build is minimal:
FROM golang:1.23 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /processor ./cmd/processor

FROM scratch
COPY --from=builder /processor /processor
ENTRYPOINT ["/processor"]
Final image: less than 15 MB. No operating system, no runtime, no dependencies. The binary is all you need.
For a batch process that you need to run on multiple servers, the difference is enormous. With Go, you copy a file and you’re done. With Python, you need to replicate the environment on each machine or use Docker everywhere.
Decision framework: when to rewrite and when not to
After having made this transition in several projects, I’ve developed a criterion that tries to be pragmatic. I don’t claim it’s a universal rule, but it works as a starting point.
Rewrite in Go when
- The bottleneck is CPU concurrency. Your script needs to do real parallel work (parsing, compressing, transforming) on multiple cores. Python’s GIL prevents it and multiprocessing has too much overhead.
- The bottleneck is massive I/O with processing. You need to make thousands of HTTP requests, read hundreds of files or process data streams, and also do significant work with each result. asyncio handles the I/O well, but processing is still sequential.
- Deployment is a pain. You need to distribute the tool to multiple servers, run it in heterogeneous environments or minimize the production footprint. A static binary simplifies everything.
- Memory consumption is a problem. Your Python script needs large (and expensive) instances just because of interpreter and data structure overhead. Go uses a fraction of the memory for the same work.
- Latency matters. If it’s a service that must respond in milliseconds, Go has a clear advantage over Python due to its instant startup time and predictable performance.
Don’t rewrite when
- You use pandas, numpy or any vectorized computing library. These libraries are already optimized in C/Fortran. You won’t gain significant performance by rewriting in Go.
- Business logic is complex and changes frequently. Python is faster to iterate. If the script changes every week, development speed matters more than execution speed.
- The team doesn’t know Go. And this is something that’s sometimes overlooked: a Python script that everyone on the team can maintain is better than a Go script that only one person understands. The technical debt of a language nobody masters is worse than a few extra minutes of execution.
- The bottleneck is not in the code. If your script takes 10 minutes but 9 are waiting for network or database queries, rewriting in Go saves you one minute. Not worth it.
- The script is simple and runs infrequently. A cron that runs once a day and takes 5 minutes in Python doesn’t need optimization. The rewrite cost never pays off.
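The arithmetic behind the fourth point is worth making explicit. It uses the figures from that bullet (10 minutes in total, 9 of them waiting) and assumes, purely for illustration, that a Go rewrite makes the code portion 10 times faster:

```python
def time_after_rewrite(total_min, waiting_min, code_speedup):
    """Only the time spent in your own code shrinks; the waiting stays the same."""
    code_min = total_min - waiting_min
    return waiting_min + code_min / code_speedup


before = 10.0
after = time_after_rewrite(total_min=10.0, waiting_min=9.0, code_speedup=10.0)
print(f"before: {before:.1f} min, after: {after:.1f} min, "
      f"overall speedup: {before / after:.2f}x")
# prints: before: 10.0 min, after: 9.1 min, overall speedup: 1.10x
```

Even an infinitely fast rewrite could not get below the 9 minutes of waiting. This is Amdahl’s law applied to a script.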
The question you should always ask yourself
If I rewrite this in Go, how much compute time do I save per month and what does that time cost versus the development hours of the rewrite?
If the monthly savings don’t cover the rewrite cost in less than 3-6 months, it’s probably not worth it. Unless there are other factors (reliability, deployment, maintainability) that tip the balance.
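That question reduces to a break-even calculation. The figures below are placeholders, not measurements: plug in your own instance prices, runtimes and hourly cost.

```python
def payback_months(rewrite_hours, hourly_cost, monthly_cost_before, monthly_cost_after):
    """Months until the infrastructure savings cover the cost of the rewrite."""
    monthly_savings = monthly_cost_before - monthly_cost_after
    if monthly_savings <= 0:
        return float("inf")  # the rewrite never pays for itself
    return (rewrite_hours * hourly_cost) / monthly_savings


# Hypothetical numbers: 80 hours of work at 60 per hour,
# compute bill dropping from 1200 to 300 per month.
months = payback_months(80, 60, 1200, 300)
print(f"Payback in {months:.1f} months")
# prints: Payback in 5.3 months
```

With these made-up numbers the rewrite lands inside the 3-6 month window. Halve the savings and it no longer does.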
What moving heavy tasks from Python to Go has taught me
I think Go and Python don’t compete, even though they’re sometimes presented that way. They solve different problems in different ways. The general comparison between Go and Python covers the differences in philosophy and broad use cases. This article focuses on the specific case of heavy tasks, where the decision has direct impact on costs and infrastructure.
After moving several heavy tasks from Python to Go, the picture I’m left with is, I think, quite clear. Although I acknowledge every case has its nuances. For scrapers and crawlers, Go wins clearly: real concurrency, parallel parsing and a deployable binary without dependencies. For batch processing of files, Go pays off when the volume is high and the work per row is significant, but if it’s simple transformation, Python with polars or DuckDB is still sufficient. CLI tools for distribution are another case where Go’s static binary is hard to beat.
On the other hand, for ETL with complex business logic, Python remains my choice. The productivity of pandas and the flexibility of the language compensate for the performance difference. And for ML or data science pipelines, there’s no discussion: Python’s ecosystem has no rival in Go.
Don’t rewrite everything in Go because benchmarks say it’s faster. Measure your specific case, identify the real bottleneck and make the decision based on data, not hype. That’s not because Go isn’t a good tool, but because a rewrite is expensive and incremental migration almost always works better than a full one. Start with the most critical component, not the entire system.


