All the comparisons are with scripting and untyped languages, perhaps for faster development and a more intuitive ecosystem to increase developer productivity.
In the age of vibe coding and AI-assisted coding, is the choice of a scripting, untyped language still justifiable?
If you're building a data system that isn't just for exploratory work, surely modern compiled and typed systems languages like Rust and D make more sense for robustness and stability for the end users?
Even more so with D, where you can even script during the exploratory stage with its built-in REPL facility [1],[2]. This is feasible due to its very fast compile times, unlike Rust, and its more intuitive syntax compared to other typed languages. You can also program with the GC on by default [3].
[1] drepl:
https://github.com/dlang-community/drepl
[2] Why I use the D programming language for scripting:
https://opensource.com/article/21/1/d-scripting
[3] All in on DLang: Why I pivoted to D for web, teaching, and graphics in 2025 and beyond! [PDF]
https://dconf.org/2025/slides/shah.pdf
Seems like it's going to be a tough sell to get people to want to write
[...]
instead of
[...]
They seem to ignore the existence of Spark, so even if you specifically want to use the JVM it feels clearer and simpler:
[...]
Couldn't agree more. R and dplyr's ability to pass column names as unquoted objects actually reduces cognitive load for new people so much (pure anecdata, nothing to back this up except lots of teaching people).
And that's on top of the vastly simpler syntax compared to what's being shown here.
I’ve built many different kinds of software (backend, frontend, 3D games, CLI tools, a code editor, and more) with Clojure and have been using it for over a decade now.
I can confidently say that, among the list I mentioned, it’s the best for data manipulation/transformation. Thanks to the author for presenting it clearly and showing how the libraries and code look across different languages, all of which do a great job.
But Clojure has its own special place (maybe in my heart as well :). I think Clojure should be used more in the data science space. Thanks to the JVM, it can be very performant (I’m looking at you, Python).
There was XLISP-STAT before R, but the scientists have spoken. They don't like the parentheses.
Having "NA" treated as nil/null/None by default seems like it would cause the Namibia problem!
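For anyone who hasn't hit it: the Namibia problem is the ISO country code "NA" being silently read as a missing value. A minimal sketch in plain Python, assuming a loader that applies a default missing-value list. The `load_rows` helper and its `na_values` parameter are hypothetical, written only for illustration and not the API of any library mentioned in this thread, and the two-row CSV is made up.

```python
import csv
import io

# Made-up sample data: Namibia's ISO country code is the literal string "NA".
RAW = "country,code\nNamibia,NA\nNorway,NO\n"

def load_rows(text, na_values=("NA",)):
    """Hypothetical loader: any cell found in na_values becomes None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({k: (None if v in na_values else v) for k, v in row.items()})
    return rows

naive = load_rows(RAW)                 # default: "NA" is read as missing
strict = load_rows(RAW, na_values=())  # nothing is treated as missing

print(naive[0])   # {'country': 'Namibia', 'code': None}
print(strict[0])  # {'country': 'Namibia', 'code': 'NA'}
```

The usual way out is to make the missing-value markers explicit per column rather than relying on a global default.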
Good pandas and polars code should also be written in an immutable way...
Good Python code can exist, but Python makes it so easy to write bad code that good Python rarely exists.
Agree. While it is common to see code like these pandas examples, it is very possible to write these manipulations so that they return a new frame or view without changing the inputs.
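A rough sketch of that non-mutating style, in plain Python rather than pandas so it runs without extra dependencies. The helper names (`with_column`, `keep_where`) and the sample rows are invented for illustration; the point is only that each step returns new data and leaves its input alone.

```python
def with_column(rows, name, fn):
    """Return new rows with one extra column; the input rows are not modified."""
    return [{**row, name: fn(row)} for row in rows]

def keep_where(rows, predicate):
    """Return a new list holding only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

# Made-up sample rows, not real measurements.
penguins = [
    {"species": "Adelie", "bill_length": 39.0, "bill_depth": 18.0},
    {"species": "Gentoo", "bill_length": 46.0, "bill_depth": 13.0},
]

# Each step builds on the previous result instead of editing `penguins` in place.
with_ratio = with_column(
    penguins, "bill_ratio", lambda r: r["bill_length"] / r["bill_depth"]
)
result = keep_where(with_ratio, lambda r: r["bill_ratio"] > 3)

print([r["species"] for r in result])  # ['Gentoo']
print("bill_ratio" in penguins[0])     # False: the input is unchanged
```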
Clojure never got the data science crowd even though the language is genuinely good for it. Always felt like a distribution problem more than a technical one.
In this very post you can see why: the dplyr code is just so much more readable. Like a lot of Python, dplyr reads almost like pseudocode: take this dataset, select the columns that start with "bill", then filter so that bill_length is less than 30. So simple and so little fluff!
Julia's Tidier.jl ecosystem is getting there too. It uses macros to mimic this 'special' evaluation framework of R, so the code is also readable in a similar way.
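To make the "reads like pseudocode" pipeline concrete (select the columns that start with "bill", then keep rows where bill_length is under 30), here is a hedged sketch in plain Python. It is not the dplyr or Tidier.jl code from the post; `select_starts_with`, `filter_rows`, and the sample rows are hypothetical stand-ins.

```python
def select_starts_with(rows, prefix):
    """Keep only the columns whose name starts with the given prefix."""
    return [{k: v for k, v in row.items() if k.startswith(prefix)} for row in rows]

def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]

# Made-up sample rows for illustration.
dataset = [
    {"species": "A", "bill_length": 28.5, "bill_depth": 17.0, "body_mass": 3400},
    {"species": "B", "bill_length": 41.0, "bill_depth": 18.0, "body_mass": 3700},
]

# Take the dataset, select the "bill" columns, then filter on bill_length.
bills = select_starts_with(dataset, "bill")
short_bills = filter_rows(bills, lambda r: r["bill_length"] < 30)

print(short_bills)  # [{'bill_length': 28.5, 'bill_depth': 17.0}]
```

Note the lambda and the quoted column name: that is exactly the fluff that unquoted column names in dplyr let you skip.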
> is just so much more readable
I thought that too before I learned Clojure; now I find them equally readable.
Unfortunately, having to mess around with a JVM is a tough sell for a lot of data analysis folks. I'm not saying it's rational or right, but a lot of people hear "JVM" and they go "no thank you". Personally I think it's a non-issue, but you have to meet people where they are.
The irony, given the mess of Python setup, where there are companies whose whole business is solving Python tooling.
I dunno, if you can slog through the Python ecosystem then the JVM is starting to look not so bad. Plus with Clojure you don't need to deal with the headache and heartache that is Maven.
Meanwhile, I find it very annoying to deal with the litany of Python versions, the distinction between global packages and user packages, and needing to manage virtual environments just to run scripts. That being said, I am not an expert, but that's always been my experience when I need to do anything Python-related.
idk, I don't think I've had to do anything beyond install the JVM to work with Clojure. I'm not really a fan of the clj command's flag choices though (-M, -X, etc. all make no sense).
I always wished Incanter took off.
Interesting perspective. Clojure’s immutable, functional approach makes data wrangling feel very different from the more imperative style of R and Python.