My policy is to never let pipeline DSLs contain any actual logic outside orchestration for the task, relying solely on one-liner build or test commands. If the task is more complicated than a one-liner, make a script for it in the repo to make it a one-liner. Doesn't matter if it's GitHub Actions, Jenkins, Azure DevOps (which has super cursed yaml), etc.
This in turn means that you can do what the pipeline does with a one-liner too, whether manually, from a vscode launch command, a git hook, etc.
This same approach can fix the mess of path-specific validation too - write a regular script (shell, python, JS, whatever you fancy) that checks what has changed and calls the appropriate validation script. The GitHub action is only used to run the script on PR and to prepare the CI container for whatever the script needs, and the same pipeline will always run.
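A minimal sketch of such a dispatcher, in Python. The path patterns and script names are made up for illustration, and `origin/main` as the base branch is an assumption about the repo; only `git diff --name-only` is a real command.

```python
import subprocess
from fnmatch import fnmatchcase

# Hypothetical mapping from path patterns to the validation script for that area.
# Note that fnmatch's "*" also matches "/" so "docs/*" covers nested files.
VALIDATORS = {
    "docs/*": ["./scripts/check-docs.sh"],
    "src/*": ["./scripts/run-tests.sh"],
    "*.tf": ["./scripts/validate-terraform.sh"],
}


def changed_files(base="origin/main"):
    """List files changed between the merge base with `base` and HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def select_validators(paths, validators=VALIDATORS):
    """Return the commands whose pattern matches at least one changed path."""
    return [
        command
        for pattern, command in validators.items()
        if any(fnmatchcase(path, pattern) for path in paths)
    ]


def main():
    # Same entry point locally, in a git hook, and in the CI job.
    for command in select_validators(changed_files()):
        subprocess.run(command, check=True)
```

In a real script `main()` would be called from an `if __name__ == "__main__":` guard, and the workflow would need only a single step that runs the script.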
The reason why many CI configs devolve into such a mess isn't typically that they don't extract complicated logic into scripts, it's about all the interactions with the CI system itself. This includes caching, sharing of artifacts, generating reports, configuring permissions, ordering of jobs, deciding when which jobs will run, deciding what to do when jobs fail, etc. All of this can get quite messy in a large enough project.
It never becomes unbearably messy this way though.
The reason it gets unbearably messy is because most people google "how to do x in github actions" (e.g. send a slack message) and there is a way and it's almost always worse than scripting it yourself.
The reason it gets unbearably messy is that GitHub has constructed an ecosystem that encourages developers to write Turing complete imperative behavior into YAML without providing the same language constructs/tooling that a proper adult language provides to encourage code reuse and debugging.
Without tooling like this any sufficiently complex system is guaranteed to evolve into a spaghetti mess, because no sane way exists to maintain such a system at scale without proper tooling, which one would need to hand roll themselves against a giant, ever changing mostly undocumented black box proprietary system (GitHub Actions). Someone tried to do this, the project is called “act”. The results are described by the author in the article as “subpar”.
The only sane way to use GitHub Actions at scale is to take the subset of its features that you leverage to perform the execution (event triggers, runs-on, etc) and only use those features, and farm out all the rest of the work in something that is actually maintainable eg Buildkit, Bazel, Gradle etc
Which I feel is a recurring lesson in CI in general. CI systems have to be scriptable to get their job done because they can't anticipate every build system. With even terrible scriptability comes Turing completeness, because it is hard to avoid, a natural law. Eventually someone figures out how to make it do wild things. Eventually those wild things become someone's requirements in their CI pipeline. Eventually someone is blog posting about how terrible that entire CI system is because of how baroque their pipeline has become and how many crazy scripts it has that are hard to test and harder to fix.
Caching and sharing artifacts is usually the main culprit. My company has been using https://nx.dev/ for that. It works locally as well as in CI, and it just works.
Our NX is pointed to store artifacts in GHA, but our GHA scripts don't do any caching directly, it is all handled by NX. It works so well I would even consider pulling a nodejs environment to run it in non-nodejs projects (although I haven't tried, probably would run into some problems).
It is somewhat heavy on configuration, but it just moves the complexity from CI configuration to NX configuration (which is nicer IMO). Our CI pipelines are super fast if you don't hit one of the slow-compiling parts of our codebase.
With NX your local dev environment can pull cached items that were built by previous CI jobs or by other devs. We have some native C++ dependencies that are kind of a pain to build locally, so our dev machines pull the binaries built by other devs (all devs and CI share the same cache-artifact storage). That makes developing locally a lot easier as well: I don't even remember the last time I had to build the native C++ stuff myself, since I don't work on it.
Do you know the criteria used to pick the nx.dev? That is, do you pay for their Cloud, or do you do some plumbing yourselves to make it work on GitHub and other things?
Looks interesting. We’ve picked tools based on time saved without too much extra knowledge or overhead required, so this may prove promising.
To be honest, I wasn't the one who added it and have only occasionally made some small changes to the NX configuration. I don't think we pay for their Cloud; I think all our artifacts are stored in the GHA caching system and we pull them using our GitHub SSH keys, although I don't know exactly how that was set up. The fact that someone set it up and I just started using it and it just works is a testament to how well it works.
NX is good because it does the caching part of CI in a way that works both locally and on CI. But of course it doesn't really help at all with the other points raised by the OP.
One interesting thing about NX as well is that it helps you manage your own local build chain. In the example I mentioned above, when I run a project that requires the C++ native dependency, that project gets built automatically (or rather my computer pulls the built binaries from the remote cache).
But for all of this to work you need to set up these dependency chains explicitly in your NX configuration, but that is formalizing an actual requirement instead of leaving it implicit (or in Makefiles or in scripts that only run in CI).
I do have to say that our NX configuration is quite long though, but I feel that once you start using NX it is just too tempting to split your project up in individual cacheable steps even if said steps are very fast to run and produce no artifacts. Although you don't have to.
For example we have separate steps for linting, typescript type-checking, code formatting, unit testing for each unique project in our mono-repo. In practice they could be all the same step because they all get invalidated at the same time (basically on any file change).
It works just fine - you have ci scripts for test-group-1, test-group-2, and so on. Your data collection will need to aggregate data from them all, but that is something most data collection systems have (and at least the ones I know of also allow individuals to upload their local test data, meaning you can test that upload locally). If you break those test groups up right, most developers will know which they should run as a result of their changes (if your tests are not so long that developers wouldn't run them all before pushing then you shouldn't shard anyway, though it may be reasonable to say your CI shards are still longer than what a developer would run locally).
I have had many tests which manipulate the global environment (integration tests should do this, though I'm not convinced the distinction between integration and unit tests is valuable) so the ability to run the same tests as CI is very helpful in finding these problems and verifying you fixed them.
I’ll go so far as to say the massive add on/plugin list and featuritis of CI/CD tools is actively harmful to the sanity of your team.
The only functionality a CI tool should be providing is:
- starting and running an environment to build shit in
- accurately tracking success or failure
- accurate association of builds with artifacts
- telemetry (either their own or integration) and audit trails
- correlation with project planning software
- scheduled builds
- build chaining
That’s a lot, but it’s a lot less than any CI tool made in the last 15 years does, and that’s enough.
There’s a big difference for instance between having a tool that understands Maven information enough to present a build summary, and one with a Maven fetch/push task. The latter is a black box you can’t test locally, and your lead devs can’t either, so when it breaks, it triggers helplessness.
If the only answer to a build failure is to stare at config and wait for enlightenment, you fucked up.
100%. The ci/cd job should be nothing more than a wrapper around the actual logic which is code in your repo.
I write a script called `deploy.sh` which is my wrapper for my ci/cd jobs. It takes options and uses those options to find the piece of code to run.
The ci/cd job can be parameterized or matrixed. The eventually-run individual jobs have arguments, and those are passed to deploy.sh. Secrets/environment variables are set from the ci/cd system, also parameterized/matrixed (or alternately, a self-hosted runner can provide deploy.sh access to a vault).
End result: from my laptop I can run `deploy.sh deploy --env test --modules webserver` to deploy the webserver to test, and the CI/CD job also runs the same job the same way. The only thing I maintain that's CI/CD-specific is the GitHub Action-specific logic of how to get ready to run `deploy.sh`, which I write once and never change. Thus I could use 20 different CI/CD systems, but never have to refactor my actual deployment code, which also always works on my laptop. Vendor lock-in is impossible, thanks to a little abstraction.
(If you have ever worked with a team with 1,000 Jenkins jobs and the team has basically decided they can never move off of Jenkins because it would take too much work to rewrite all the jobs, you'll understand why I do it this way)
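For illustration, here is roughly what the argument handling of such a wrapper could look like. The original is a shell script whose internals aren't shown, so this is a Python stand-in under assumptions: the `deploy` action and the `--env` / `--modules` options come from the command above, everything else (function names, the tuple shape) is hypothetical.

```python
import argparse


def build_parser():
    """Command-line interface mirroring `deploy.sh deploy --env test --modules webserver`."""
    parser = argparse.ArgumentParser(prog="deploy")
    actions = parser.add_subparsers(dest="action", required=True)
    deploy = actions.add_parser("deploy")
    deploy.add_argument("--env", required=True)
    deploy.add_argument("--modules", nargs="+", required=True)
    return parser


def plan(argv):
    """Translate arguments into (action, env, module) tuples, one per module."""
    args = build_parser().parse_args(argv)
    return [(args.action, args.env, module) for module in args.modules]
```

The CI job and the laptop would both go through `plan(...)` with the same arguments; how each tuple is mapped to the code that actually deploys is left out because the comment doesn't describe it.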
Hey if you’ve never heard of it consider using just[0], it’s a better makefile and supports shell scripting explicitly (so at least equivalent in power, though so is Make)
The shell also supports shell scripting! You don't need Just or Make
Especially for Github Actions, which is stateless. If you want to reuse computation within their VMs (i.e. not do a fresh build / test / whatever), you can't rely on Just or Make
A problem with Make is that it literally shells out, and the syntax collides. For example, the PID in Make is $$$$, because it's $$ in shell, and then you have to escape $ as $$ with Make.
I believe Just has similar syntax collisions. It's fine for simple things, but when it gets complex, now you have {{ just vars }} as well as $shell_vars.
It's simpler to "just" use shell vars, and to "just" use shell.
Shell already has a lot of footguns, and both Just and Make only add to that, because they add their own syntax on top, while also depending on shell.
I don't typically use .PHONY as my targets aren't the same name as files and performance isn't an issue.
Here is an example of a "complex" Makefile I use to help manage Helm deployments (https://github.com/peterwwillis/devops-infrastructure/blob/m...). It uses canned recipes, functions (for loops), default targets, it includes targets and variables from other Makefiles, conditionally crafts argument lists, and more. (It's intended to be a "default" Makefile that is overridden by an additional Makefile.inc file)
I could absolutely rewrite that in a shell script, but I would need to add a ton of additional code to match the existing functionality. Lines of code (and complexity) correlate with bugs, so fewer lines of code = fewer bugs, so it's easier to maintain, even considering the Make-specific knowledge required.
They say "use the best tool for the job". As far as I've found, for a job like that, Make fits the best. If some day somebody completely re-writes all the functionality of Make in a less-obnoxious way, I'll use that.
> I don't typically use .PHONY as my targets aren't the same name as files and performance isn't an issue.
They are still phony at heart in this case, even if you don't declare them .PHONY.
Make really wants to produce files; if your targets don't produce the files they are named for you are going to run into trouble (or have to be rather careful to avoid the sharp edges).
> They say "use the best tool for the job". As far as I've found, for a job like that, Make fits the best. If some day somebody completely re-writes all the functionality of Make in a less-obnoxious way, I'll use that.
I discovered Just with a similar comment in Hacker News and I want to add my +1.
It is so much better to run scripts with Just than it is doing it with Make. And although I frankly tend to prefer using a bash script directly (much as described by the parent commenter), Just is much less terrible than Make.
Now the only problem is convincing teams to stop following the Make dogma, because it is so massively ingrained and it has so many problems and weirdnesses that just don't add anything if you just want a command executor.
The PHONY stuff, the variable escaping, the every-line-is-a-separate-shell, and just a lot of stuff that doesn't help at all.
Make has a lot of features that you don't use at first, but do end up using eventually, that Just does not support (because it's trying to be simpler). If you learn it formally (go through the whole HTML manual) it's not hard to use, and you can always refer back to the manual for a forgotten detail.
I don't understand why this is not the evident approach for everyone writing GitHub Actions/GitLab CI/CD yaml etc....
I've struggled in some teams to explain why it's better to extract your commands into scripts (you can run ShellCheck on them, scripts are simple to run locally, etc.) instead of writing a Frankenstein of YAML and shell commands. I hope someday to find authoritative guidelines on writing pipelines that promote this approach, so at least I can point to that link instead of defending myself for being a dinosaur!
In a previous job we had a team tasked with designing these "modern" CI/CD pipeline solutions, mostly meant for Kubernetes, but it was supposed to work for everything. They had such a hard-on for tools that would run each step as a separate isolated task and did not want pipelines to "devolve" into shell scripts.
Getting anything done in such environments is just a pain. You spend more time fighting the systems than you do actually solving problems. It is my opinion that a CI/CD system needs just the following features: triggers (source code repo, HTTP endpoints or manually triggered), secret management and shell script execution. That's it, you can build anything using that.
I think what they really wanted was something like bazel. The only real benefit I can think right now for not "devolving" into shell scripts is distributed caching with hermetic builds. It has very real benefits but it also requires real effort to work correctly.
I just joined as the enterprise architect for a company that has never had one. There is an existing devops team that is making everyone pull their hair out, and I haven't had a single spare minute to dig into their mess, but this sounds eerily familiar.
The job of senior people should mostly be to make sure the organisation runs smoothly.
If no one else is doing anything about the mess, then it falls to the senior person to sort it out.
As a rule of thumb:
- Ideally your people do the Right Thing by themselves by the magic of 'leadership'.
- Second best: you chase the people to do the Right Thing.
- Third: you as the senior person do the Right Thing.
- Least ideal: no one fixes the mess nor implements the proper way.
I guess some people can achieve the ideal outcome with pure charisma (or fear?) alone, but I find that occasionally getting your hands dirty (option 3) helps earn the respect to make the 'leadership' work. It can also help ground you in the reality of the day to day work.
However, you are right that a senior person shouldn't get bogged down with such work. You need to cut your losses at some point.
Where I work, which granted is a very large company, the enterprise architects focus on ERP processes, logistic flows, how prices flow from the system where they are managed to the places that need them, and so on. They are several levels removed from devops teams. DevOps concerns are maybe handled by tech leads, system architects or technical product managers.
Makes sense. From datavirtue's comment it sounded like they joined a much smaller outfit without much in terms of established _working_ procedures here.
Mostly agreed, but (maybe orthogonal) IME, popular CI/CD vendors like TeamCity* can make even basic things like shell script execution problematic.
* TC offers sh, full stop. If you want to script something that depends on bash, it's a PITA and you end up with a kludge to run bash in sh in docker in docker.
Your "docker in docker" comment makes me wonder if you're conflating the image that you are using, that just happens to be run by TeamCity, versus some inherent limitation of TC. I believe a boatload of the Hashicorp images ship with dash, or busybox or worse, and practically anything named "slim" or "alpine" similarly
My "favorite" is when I see people go all in, writing thousands of lines of Jenkins-flavor Groovy that parses JSON build specifications of arbitrary complexity to sort out how to build that particular project.
"But then we can reuse the same pipeline for all our projects!"
> "But then we can reuse the same pipeline for all our projects!"
oh god just reading that gave me PTSD flash backs.
At $priorGig there was the "omni-chart". It was a helm chart that was so complex it needed to be wrapped in terraform and used composable terraform modules w/ user var overrides as needed.
Debugging anything about it meant clearing your calendar for the day and probably the following day, too.
I think I can summarize it in a rough, general way.
CI/CD is a method to automate tasks in the background that you would otherwise run on your laptop. The outputs
of the tasks are used as quality gates for merging commits, and for deployments.
- Step 1. Your "laptop in the cloud" requires some configuration (credentials, installed software, cached artifacts)
  before a job can be run.
  - Requires logic specific to the CI/CD system
- Step 2. Running many jobs in parallel, passing data from step to step, etc requires some instructions.
  - Requires logic specific to the CI/CD system
- Step 3. The job itself is the execution of a program (or programs), with some inputs and outputs.
  - Works the same on any computer (assuming the same software, environment, inputs, etc)
  - Using a container in Step 1 makes this practical and easy
- Step 4. After the job finishes, artifacts need to be saved, results collected, and notifications sent.
  - Some steps are specific to the CI/CD system, others can be a reusable job
Step 3 does not require being hard-coded into the config format of the CI/CD system. If it is instead
just executable code in the repo, it allows developers to use (and work on) the code locally without
the CI/CD system being involved. It also allows moving to a different CI/CD system without ever rewriting
all the jobs; the only thing that needs to be rewritten are the CI/CD-specific parts, which should be
generic and apply to all jobs pretty much the same.
Moving the CI/CD-specific parts to a central library of configuration allows you to write some code
once and reuse it many times (making it DRY). CircleCI Orbs, GitHub Actions, Jenkins Shared Libraries/
Groovy Libraries, etc are examples of these. Write your code once, fix a bug once, reuse it everywhere.
To make the thing actually fast at scale, a lot of the logic ends up being specific to the provider; requiring tokens, artifacts etc that aren't available locally. You end up with something that tries to detect if you're running locally or in CI, and then you end up in exactly the same situation.
You are right, and this is where a little bit of engineering comes in. Push as much of the logic as you can into scripts (shell, Python, whatever) that you can run locally, perhaps in Docker. All the tokens, variables, artifacts etc should act as inputs or parameters to your scripts. You have several mechanisms at your disposal: command line arguments, environment variables, config files, etc. Those are all well understood, universal, and language and environment agnostic, to an extent.
The trick is to NOT have your script depend on the specifics of the environment, but reverse the dependency. So replace all `If CI then Run X else if Local Run Y` with the ability to configure the script to run X or Y, then let the CI configure X and local configure Y. For example.
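One way that inversion could look, as a minimal sketch. The variable names `BUILD_CACHE_DIR` and `BUILD_NOTIFY` and the step strings are invented for illustration; the point is only that the script reads configuration and never asks "am I in CI?".

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildSettings:
    cache_dir: str
    notify: str  # "none" or the name of a notification backend


def load_settings(environ=None):
    """Read settings from the environment; defaults suit a plain local run."""
    environ = os.environ if environ is None else environ
    return BuildSettings(
        cache_dir=environ.get("BUILD_CACHE_DIR", ".cache"),  # hypothetical variable
        notify=environ.get("BUILD_NOTIFY", "none"),          # hypothetical variable
    )


def build_steps(settings):
    """Derive the steps from settings alone, with no CI detection anywhere."""
    steps = [f"restore cache from {settings.cache_dir}", "compile", "test"]
    if settings.notify != "none":
        steps.append(f"notify via {settings.notify}")
    return steps
```

The CI config would export `BUILD_CACHE_DIR` and `BUILD_NOTIFY`; a developer can set the same variables locally to reproduce exactly what CI does.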
I'm not saying it is always easy and obvious. For bigger builds, you often really want caching and have shitloads of secrets and configurations going on. You want to only build what is needed, so you need something like a DAG. It can get complex fast. The trick is making it only as complex as it needs be, and only as reusable as and when it is actually re-used.
> The trick is to NOT have your script depend on the specifics of the environment, but reverse the dependency. So replace all `If CI then Run X else if Local Run Y` with the ability to configure the script to run X or Y, then let the CI configure X and local configure Y. For example.
> I'm not saying it is always easy and obvious. For bigger builds, you often really want caching and have shitloads of secrets and configurations going on.
Here's the thing. When you don't want secrets, or caching, or messaging, or conditional deploys, none of this matters. Your build can be make/mvn/go/cargo and it just works, and is easy. It only gets messy when you want to do "detect changes since last run and run tests on those components", or "don't build the moon and the stars, pull that dependency in CI, as local users have it built already and it won't change." And the way to handle those situations involves running different targets/scripts/whatever and not what is actually in your CI environment.
I've lost count of how many deploys have been marked as failed in my career because the shell script for posting updates to slack has an error, and that's not used in the local runs of CI.
What you _actually_ need is a staging branch of your CI and a dry-run flag on all of your state modifying commands. Then, none of this matters.
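A rough sketch of the dry-run half of that idea, assuming each command is tagged by hand as state-modifying or not (the `mutates_state` flag and the example steps are made up; the child commands are harmless Python one-liners standing in for real tools).

```python
import shlex
import subprocess
import sys


def run(command, *, dry_run, mutates_state=False):
    """Run `command`; in dry-run mode, only print commands that would change state.

    Returns True if the command was executed, False if it was skipped.
    """
    if dry_run and mutates_state:
        print("DRY RUN, would execute:", shlex.join(command))
        return False
    subprocess.run(command, check=True)
    return True


def example_pipeline(dry_run):
    # Read-only step: always runs, so a dry run still exercises real code.
    ran_check = run([sys.executable, "-c", "print('checking')"], dry_run=dry_run)
    # State-modifying step (stand-in for a deploy or a Slack post): skipped on dry runs.
    ran_deploy = run([sys.executable, "-c", "print('deploying')"],
                     dry_run=dry_run, mutates_state=True)
    return ran_check, ran_deploy
```

With this shape, the Slack-posting code path at least gets parsed and reached on every local run, even if the final call is skipped.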
A shell script has many extremely sharp edges like dealing with stdin, stderr, stdout, subprocesses, exit codes, environmental variables, etc.
Most programmers have never written a shell script and writing CI files is already frustrating because sometimes you have to deploy, run, fix, deploy, run, fix, which means nobody is going to stop in the middle of that and try to learn shell scripting.
Instead, they copy commands from their terminal into the file and the CI runner takes care of all the rough edges.
I ALWAYS advise writing a shell script but I know it's because I actually know how to write them. But I guess that's why some people are paid more big bux.
GitHub's CI yaml also accepts eg Python. (Or anything else, actually.)
That's generally a bit less convenient, ie it takes a few more lines, but it has significantly fewer sharp edges than your typical shell script. And more people have written Python scripts, I guess?
I find that Python scripts that deal with calling other programs have even more sharp edges because now you have to deal with stdin, stderr and stdout much more explicitly. Now you need to both know shell scripting AND Python.
Python’s subprocess has communicate(), check_output() and other helpers which take care of a lot but (a) you need to know what method you should actually call and (b) if you need to do something outside of that, you have to use Popen directly and it’s much more work than just writing a shell script. All doable if you understand pipes and all that but if you don’t, you’ll be just throwing stuff at the wall until something sticks.
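To make the two cases concrete, here is a small sketch using only standard `subprocess` calls. The helper names are mine, and the child processes are Python one-liners so the example doesn't depend on any shell tools being installed.

```python
import subprocess
import sys


def capture_stdout(command):
    """Run to completion, raise CalledProcessError on a non-zero exit, return stdout."""
    return subprocess.check_output(command, text=True)


def pipe_through(command, data):
    """Feed `data` to the child's stdin and collect (exit code, stdout, stderr)."""
    process = subprocess.Popen(
        command,
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() writes stdin and drains both output pipes without deadlocking.
    stdout, stderr = process.communicate(data)
    return process.returncode, stdout, stderr


# Stand-in for an external filter such as `tr a-z A-Z`.
UPPERCASE = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
```

`capture_stdout` is the simple case the helpers cover; `pipe_through` is about as small as the Popen route gets, and it already needs three pipes spelled out.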
1) If possible, don't run shell scripts with Python. Evaluate why you are trying to do that and don't.
2) Python has a bunch of infrastructure compared to shell, you can use it. Shell scripts don't.
3) Apply the same principle you used for the script to what it calls: CI calls the control script for the job, and the script calls tools/libraries for the heavy lifting.
Often the shell script just calls a python/Ruby/rust exec anyways...
Shell scripts are for scripting...the shell. Nothing else.
Your average person will be blindsided either way; at least one way they have a much better set of tools to help them out once they are blindsided.
Yes, but at least that's all fairly obvious---you might not know how to solve the problem, but at least you know you have a problem that needs solving. Compare that to the hidden pitfalls of eg dealing with whitespace in filenames in shell scripts. Or misspelled variable names that accidentally refer to non-existent variables but get treated as if they are set to be "".
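A small sketch of how those two pitfalls play out on the Python side. The helper names are made up; the behaviour shown (argument lists are not word-split, missing mapping keys raise `KeyError`) is standard.

```python
import os
import subprocess
import sys


def count_arguments(*arguments):
    """Pass `arguments` to a child process as a list and report how many it received."""
    child = "import sys; print(len(sys.argv) - 1)"
    output = subprocess.check_output([sys.executable, "-c", child, *arguments], text=True)
    return int(output)


def require_env(name, environ=None):
    """Look up a required variable; a missing or misspelled name raises KeyError."""
    environ = os.environ if environ is None else environ
    return environ[name]
```

A filename with a space stays one argument because no shell ever re-parses it, and a typo in a variable name fails loudly instead of silently becoming an empty string.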
This all reminds me of the systemd ini-like syntax vs shell scripts debate. Shell scripts are superior, of course, but they do require deeper knowledge of unix-like systems.
I've been working with Linux since I was 10 (I'm much older now), and I still don't think I "know Linux". The upper bound on understanding it is incredibly high. Where do you draw the line?
> [...] instead of writing a Frankenstein of YAML and shell commands.
The 'Frankenstein' approach isn't what makes it necessarily worse. Eg Makefiles work like that, too, and while I have my reservations about Make, it's not really because they embed shell scripts.
it can be quite hard to write proper scripts that work consistently... different shells have different behaviours, availability of local tools, paths, etc
and it feels like fighting against the flow when you're trying to make it reusable across many repos
If you aim for quicker turn-around (eg. just running a single test in <1s), you'll have to either aggressively optimize containers (which is pretty non-idiomatic with Docker containers in particular), or do away with them.
> I've rarely seen a feedback loop with containers that's not longer than 10s only due to containerization itself
Sounds like a skill issue tbh.
`time podman run --rm -it fedora:latest echo hello` will return in a few milliseconds, whatever delay you are complaining about would be from the application running in the container.
I am talking about either, because the GP post was about "containerizing a build environment": you need your project built to either run it in CI or locally.
Why would it be slow?
It needs to be rebuilt? (on a fast moving project with mid-sized or large team, you'll get dependency or Dockerfile changes frequently)
It needs to restart a bunch of dependent services?
Container itself is slow to initialize?
Caching of Docker layers is tricky, silly (you re-arrange a single command line and poof, it's invalidated, including all the layers after) and hard to make the most of.
If you can't get a single test running in <1s, you are never going to get a full test suite running in a couple of seconds, and never be able to do an urgent deploy in <30s.
That might be the case if Docker did in fact guarantee (or at least make it easy to guarantee) deterministic builds -- but it doesn't really even try:
1. Image tags ("latest", etc.) can change over time. Does any layer in your Dockerfile -- including inside transitive deps -- build on an existing layer identified by tag? If so, you never had reproducibility.
2. Plenty of Dockerfiles include things like "apt-get some-tool" or its moral equivalent, which will pull down whatever is the latest version of that tool.
It's currently common and considered normal to use these "features". Until that changes, Docker mostly adds only the impression of reproducibility, but genuine weight and pain.
The advantage that Docker brings isn't perfect guaranteed reusability, it's complete independence (or as close as you can easily get while not wasting resources on VMs) from the system on which the build is running, plus some reusability in practice in a certain period of time, and given other good practices.
Sure, if I try to rebuild a docker image from 5 years ago, it may fail, or produce something different, because it was pulling in some packages that have changed significantly in apt, or perhaps pip has changed encryption and no longer accepts TLS1.1 or whatever. And if you use ':latest' or if your team has a habit of reusing build numbers or custom tags, you may break the assumptions even sooner.
But even so, especially if using always incremented build numbers as image tags, a docker image will work the same way on the Jenkins system, the GitHub Actions pipeline, and every coworker's local build, for a while.
1. Use a hash for the base images.
2. The meaning of “latest” is dependent on what base images you are using. Using UBI images for example means your versions are not going to change because redhat versions don’t really change.
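Point 1 is easy to check mechanically. Below is a simplified sketch that flags `FROM` lines not pinned by an `@sha256:` digest. It is a deliberately naive line-based scan: it ignores `ARG` substitution and line continuations, and the function name is my own.

```python
def unpinned_base_images(dockerfile_text):
    """Return the FROM references that are not pinned by an @sha256 digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        words = line.split()
        if len(words) < 2 or words[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        operands = [word for word in words[1:] if not word.startswith("--")]
        if not operands:
            continue
        image = operands[0]
        # Earlier build stages and the empty "scratch" image have nothing to pin.
        is_stage_reference = image in stage_names or image == "scratch"
        if not is_stage_reference and "@sha256:" not in image:
            unpinned.append(image)
        # Remember "FROM image AS name" so later "FROM name" lines are not flagged.
        if len(operands) >= 3 and operands[1].upper() == "AS":
            stage_names.add(operands[2])
    return unpinned
```

Run as a lint step, this catches tag-only base images like `fedora:latest` before they make a build drift; it says nothing about what `apt-get` pulls in later layers.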
But really containerizing the build environment is not related to deterministic builds as there’s a lot more work needed to guarantee that. Including possible changes in the application itself.
Lastly you don’t need to use docker or dockerfiles to build containers. You can use whatever tooling you want to create a rootfs and then create an image out of that. A nifty way of actually guaranteeing reproducibility is to use nix to build a container image.
They didn't say reproducibility, they said determinism.
If you use a container with the same apt-get commands and the same OS, you already separate out almost all the reproducibility issues. What you get will be a very similar environment.
But it's not deterministic as you point out, that takes a lot more effort.
You know, I'd love to run MSVC in Wine on a ubuntu container. I bet it would be quicker.
I've had the unfortunate pleasure of working with Windows Containers in the past. They are Containers in the sense that you can write a dockerfile for them, and they're somewhat isolated, but they're no better than a VM in my experience.
> And in any case, you can use VMs instead of containers.
They're not the same thing. If that's the case running linux AMI's on EC2 should be the same as containers.
It’s the same thing for the purposes of capturing the build environment.
It doesn’t really matter if you have to spin up a Linux instance and then run your build environment as a container there vs spinning up a Windows VM.
Only if your tech stack is bad (i.e. Python). My maven builds work anywhere with an even vaguely recent maven and JVM (and will fail-fast with a clear and simple error if you try to run them in something too old), no need to put an extra layer of wrapping around that.
It's trivial to control your Python stack with things like virtualenv (goes back to at least 2007) and has been for ages now (I don't really remember the time when it wasn't, and I've been using Python for 20+ years).
What in particular did you find "bad" with Python tech stack?
(I've got my gripes with Python and the tooling, but it's not this — I've got bigger gripes with containers ;-))
> What in particular did you find "bad" with Python tech stack?
Stateful virtualenvs with no way to check if they're clean (or undo mistakes), no locking of version resolution (much less deterministic resolution), only one-way pip freeze that only works for leaf projects (and poorly even then), no consistency/standards about how the project management works or even basic things like the directory layout, no structured unit tests, no way to manage any of this stuff because all the python tooling is written in python so it needs a python environment to run so even if you try to isolate pieces you always have bootstrap problems... and most frustrating of all, a community that's ok with all this and tries to gaslight you that the problems aren't actually problems.
Sounds a lot like nitpicking, and I'll demonstrate why.
With docker containers, you can shell into it, do a couple of changes and "docker commit" it afterwards: similarly stateful, right? You resolve both by recreating them from scratch (and you could easily chmod -w the entire virtualenv directory if you don't want it to change accidentally).
The pattern of using requirements.txt.in and pip-freeze generated requirements.txt has been around for a looong time, so it sounds like non-idiomatic way to use pip if you've got problems with locking of versions or non-leaf projects.
As for directory layout, it's pretty clear it's guided by Python import rules: those are tricky, but once you figure them out, you know what you can and should do.
Can you clarify what do you mean with "structured unit tests"? Python does not really limit you in how you organize them, so I am really curious.
Sure, a bootstrapping problem does exist, but rarely do you need exactly a particular version of Python and any of the dev tools to be able to get a virtualenv off the ground, after which you can easily control all the deps in them (again a requirements-dev.txt.in + requirements-dev.txt pattern will help you).
And there's a bunch of new dev tools springing up recently that are written in Rust for Python, so even that points at a community that constantly works to improve the situation.
I am sorry that you see this as "gaslighting" instead of an opportunity to learn why someone did not have the same negative experience.
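For what it's worth, the "no way to check if they're clean" complaint can be partly addressed with a small script. Here is a minimal sketch, assuming a requirements.txt that contains only simple `name==version` pins; the function names are made up for illustration, and extras, hashes and URL requirements are not handled.

```python
import re
from importlib import metadata


def normalize(name):
    # PEP 503 style normalization so "Foo_Bar" and "foo-bar" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(lines):
    """Return {name: version} for simple 'name==version' lines; skip everything else."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if "==" not in line or line.startswith("-"):
            continue
        name, version = line.split("==", 1)
        # Drop environment markers and trailing line continuations.
        pins[normalize(name)] = version.split(";", 1)[0].rstrip("\\ ").strip()
    return pins


def installed_versions():
    """Map normalized distribution names to versions for the active environment."""
    return {
        normalize(dist.metadata["Name"]): dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    }


def find_drift(pins, installed):
    """Compare pinned versions to installed ones; return sorted problem strings."""
    problems = []
    for name, version in pins.items():
        if name not in installed:
            problems.append(f"missing: {name}=={version}")
        elif installed[name] != version:
            problems.append(f"mismatch: {name} pinned {version}, installed {installed[name]}")
    for name in installed.keys() - pins.keys():
        problems.append(f"unmanaged: {name}=={installed[name]}")
    return sorted(problems)


def check(path="requirements.txt"):
    """Return the drift between the pins in `path` and the active environment."""
    with open(path) as fh:
        return find_drift(parse_pins(fh), installed_versions())
```

Calling `check()` inside an activated virtualenv returns an empty list when the environment matches the pins. Tools such as pip itself will show up as "unmanaged" unless they are pinned too, so a real version would probably need an allowlist.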
> With docker containers, you can shell into it, do a couple of changes and "docker commit" it afterwards: similarly stateful, right?
I guess theoretically you could, but I don't think that's part of anyone's normal workflow. Whereas it's extremely easy to run "pip install" from project A's directory with project B's virtualenv active (or vice versa). You might not even notice you've done it.
> You resolve both by recreating them from scratch
But with Docker you can wipe the container and start again from the image, which is fixed. You don't have to re-run the Dockerfile and potentially end up with different versions of everything, which is what you have to do with virtualenv (you run pip install and get something completely different from the virtualenv you deleted).
> you could easily chmod -w the entire virtualenv directory if you don't want it to change accidentally
But you have to undo it every time you want to add or update a dependency. In other ecosystems it's easy to keep my dependencies in line with what's in the equivalent of requirements.txt, but hard to install some random other unmanaged dependency. In the best ecosystems there's no need to "install" your dependencies at all, you just always have exactly the packages listed in the requirements.txt equivalent available at runtime when you run things.
> The pattern of using requirements.txt.in and pip-freeze generated requirements.txt has been around for a looong time, so it sounds like non-idiomatic way to use pip if you've got problems with locking of versions or non-leaf projects.
I've literally never seen a project that does that. And even if you do that, it's still harder to work with because you can't upgrade one dependency without unlocking all of your dependencies, right?
> As for directory layout, it's pretty clear it's guided by Python import rules
I don't mean within my actual code, I mean like: where does source code go, where does test code go, where do non-code assets go.
> Can you clarify what do you mean with "structured unit tests"?
I mean, like, if I'm looking at a particular module in the source code, where do I go to find the tests for that module? Where's the test-support code as distinct from the specific tests?
> rarely do you need exactly a particular version of Python and any of the dev tools to be able to get a virtualenv off the ground
Whether virtualenv is available is a relatively recent change, so you already have a fractal problem. Having an uncontrolled way of installing your build environment is another of those things that's fine until it isn't.
> And there's a bunch of new dev tools springing up recently that are written in Rust for Python
Yeah, that's the one thing that gives me some hope that there might be light at the end of the tunnel, since I hear they mostly ignore all this idiocy (and avoid e.g. having user-facing virtualenvs at all) and just do the right thing. Hopefully once they catch on we'll see Python start to be ok without containers too and maybe the container hype will die down. But it's certainly not the case that everything has been fine since 2007; quite the opposite.
except you need to install the correct Java version and maven (we really should be using gradle by now)
Also in many projects there’s things other than code that need to be “built” (assets, textures, translations, etc). Adding custom build targets to maven’s build.xml is truly not ideal then there’s people who actually try to write logic in there. That’s objectively worse than the YAML hell we were complaining about at the top of the thread.
> you need to install the correct Java version and maven
Like I said, any version from the last, like, 10+ years (Java and Maven are both serious about backward compatibility), and if you install an ancient version you at least get fail-fast with a reasonable error.
> we really should be using gradle by now
We really shouldn't.
> Adding custom build targets to maven’s build.xml is truly not ideal then there’s people who actually try to write logic in there.
Maven doesn't have a build.xml, are you thinking of ant? With maven you write your custom build steps as build plugins, and they're written in plain old Java (or Kotlin, or Scala, or...) code, as plain old Maven modules, with the same kind of ordinary unit testing as your regular code; all your usual code standards apply (e.g. if you want to check your test coverage, you do it the same way as for your regular code - indeed, probably the same configuration you set up for your regular code is already getting applied). That's a lot better than YAML.
The reason for this is that nobody took the time to write a proper background document on Github Actions. The kind of information that you or I might convey if asked to explain it at the whiteboard to junior hires, or senior management.
This syndrome is very common these days. Things are explained differentially: it's like Circle CI but in the GH repo. Well that's no use if the audience wasn't around when Circle CI was first new and readily explained (It's like Jenkins but in the cloud...).
> My policy is to never let pipeline DSLs contain any actual logic outside orchestration for the task,
I call this “isomorphic CI” — ie: as long as you set the correct env vars, it should run identically on GitHub actions, Jenkins, your local machine, a VM etc
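A rough sketch of what that can look like, assuming the whole build is driven by one entry point that reads its settings from environment variables. The variable names, defaults and `make` targets below are placeholders, not anything a particular CI system defines.

```python
import shlex
import subprocess

# Hypothetical settings and defaults; nothing here is specific to a CI vendor.
DEFAULTS = {"BUILD_TARGET": "debug", "TEST_REPORT_DIR": "reports"}


def load_config(environ):
    """Take settings from the given environment mapping, falling back to defaults."""
    return {key: environ.get(key, default) for key, default in DEFAULTS.items()}


def plan(config):
    """Return the commands to run as argument lists. The commands are placeholders."""
    return [
        ["make", "build", f"TARGET={config['BUILD_TARGET']}"],
        ["make", "test", f"REPORT_DIR={config['TEST_REPORT_DIR']}"],
    ]


def run(commands, dry_run=False):
    """Echo each command, then execute it unless dry_run is set."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Locally you would call `run(plan(load_config(os.environ)))`, and the CI job would do exactly the same after exporting the variables it cares about. `dry_run=True` is handy for checking what a given environment would execute without running it.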
And yet, you would be surprised at the number of people who react as if that's an ignorant statement ("not feasible in real world conditions"), a utopian goal ("too much time to implement"), an impossible feat ("automation makes human oversight difficult"), or, my favorite, the "this is beneath us" excuse ("see, we are special and this wouldn't work here").
Automation renders knowledge into a set of executable steps, which is much better than rendering knowledge into documentation, or leaving it to rot in people's minds. Compiling all rendered knowledge into a single step is the easiest way to ensure all elements around the build and deployment lifecycle work in unison and are guarded around failures.
Building a GitHub-specific CI pipeline similarly renders that knowledge into a set of executable steps.
The only difference is that you are now tied to a vendor for executing that logic, and the issue is really that this tooling is proprietary software (otherwise, you could just take their theoretical open source runner system and run it locally).
To me, this is mostly a question of using non-open-source development tools or not.
Yep. I remember at a previous company multiple teams had manually created steps in TeamCity (and it wasn't even being backed up in .xml files).
I just did my own thing and wrapped everything in deploy.sh and test.sh, and when the shift to another system came... well, it was still kind of annoying, but at least I wasn't recreating the whole thing.
That’s usually very hard or impossible for many things. The AzDo yaml consists of a lot of steps that are specific to the CI environment (fetching secrets, running tests on multiple nodes, storing artifacts of various kinds).
Even if the “meat” of the script is a single build.ps one-liner, I quickly end up with 200-line YAML scripts which have no chance of working locally.
Azure DevOps specifically has a very broken approach to YAML pipelines, because they effectively took their old graphical pipeline builder and just made a YAML representation of it.
The trick to working with this is that you don't need any of their custom Azure DevOps task types, and can use the shell type (which has a convenient shorthand) just as well as in any other CI environment. Even the installer tasks are redundant - in other CI systems, you either use a container image with what you need, or install stuff at the start, and Azure DevOps works with both of these strategies.
So no, it's neither hard nor impossible, but Microsoft's half-assed approach to maintaining Azure DevOps and overall overcomplicated legacy design makes it a bit hard to realize that doing what their documentation suggests is a bad idea, and that you can use it in a modern way just fine. At least their docs do not recommend that you use the dedicated NPM-type task for `npm install` anymore...
(I could rant for ages about Azure DevOps and how broken and unloved it is from Microsoft's side. From what I can tell, they're just putting in the minimum effort to keep old Enterprise customers that have been there through every rename since Team Foundation Server from jumping ship - maybe just until Github's enterprise side has matured enough? Azure DevOps doesn't even integrate well with Azure, despite its name!)
It has been on life support for a long time AFAIK. I designed Visual Studio Online (the first launch of AzDO) - and every engineer, PM, and executive I worked with is either in leadership at GitHub or retired.
It feels clear from an outside perspective that all the work on AzDO Pipelines has shifted to focus on GitHub Actions and Actions is now like 3 or 4 versions ahead. Especially because the public Issue trackers for some of AzDO Pipelines "Roadmap" are still up (on GitHub, naturally) and haven't been updated since ~2020.
I wish Microsoft would just announce AzDO's time of death and save companies increasingly crazy AzDO blinders and/or weird mixes of GitHub and AzDO as GitHub-only is clearly the present/future.
Yeah feels like they should be able to converge actions and pipelines.
Keeping some separation between AzDo itself and GH also requires some balancing. But so far I’m pretty sure I could never sell our enterprise on a shift to GH. Simply not enough jira-esque features in GH with complex workflows, time reporting etc so I can’t see them doing the bigger GH/AzDo merger.
This month's rollout of sub-issues and issue types would be most of what my organization thinks it needs to shift to GH Issues, I believe, barring however long it would take to rewrite some sync up automation with ServiceNow based on those issue types. Of course it will take another 6 months to a year before those kinds of features make it to GitHub Enterprise, so it is still not happening any time soon. (Though that gets back to my "weird" mixes. I don't entirely know why my company is using AzDO SaaS for Issue Tracking but decided GHE over "normal" cloud GH for Repos. But that's not the weirdest mix I've seen.)
I definitely get the backwards compatibility thing and "don't break someone's automation", but at the same time, Microsoft could at least mark AzDO's official Roadmap as "Maintenance Only" and send the message that feels obvious as a user that GitHub is getting far more attention than AzDO can, but is hard to convince management and infosec that a move to GitHub is not just "the future" but "the present" (and also maybe "the past", now, given AzDO seems to have been frozen ~2020).
This doesn’t seem to address the parent comment’s point at all, which was about required non-shell configuration such as for secrets, build parallelism, etc.
I'm increasingly designing CI stuff around rake tasks. Then I run rake in the workflow.
But that caters only for each individual command... as you mention the orchestration is still coded in, and duplicated from what rake knows and would do.
So I'm currently trying stuff that has a pluggable output: one output (the default) is that it runs stuff, but with just a rake var, instead of generating then running commands it generates workflow content that ultimately gets merged in an ERB workflow template.
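Not the rake setup itself, but the same idea can be sketched in Python: one task graph, two backends. The task names and commands are invented for the example, and the generated jobs are a simplified approximation of a GitHub Actions `jobs:` mapping (a real one would at least need a checkout step).

```python
import json
import subprocess
from graphlib import TopologicalSorter

# Hypothetical task graph: name -> (shell command, names it depends on).
TASKS = {
    "lint": ("echo lint", []),
    "build": ("echo build", []),
    "test": ("echo test", ["build"]),
    "package": ("echo package", ["lint", "test"]),
}


def ordered(tasks):
    """Return task names in an order that respects dependencies."""
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


def run_locally(tasks):
    """Default backend: execute every task in dependency order."""
    for name in ordered(tasks):
        subprocess.run(tasks[name][0], shell=True, check=True)


def as_workflow_jobs(tasks):
    """Alternative backend: describe the same graph as workflow jobs."""
    jobs = {}
    for name, (command, deps) in tasks.items():
        job = {"runs-on": "ubuntu-latest", "steps": [{"run": command}]}
        if deps:
            job["needs"] = sorted(deps)
        jobs[name] = job
    return jobs


def render(tasks):
    """Serialize the jobs as JSON, which could then be merged into a template."""
    return json.dumps({"jobs": as_workflow_jobs(tasks)}, indent=2)
```

The point of the design is that the dependency information lives in one place (`TASKS`), so the local run and the generated workflow cannot drift apart.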
The model I like the most though is Nix-style distributed builds: it doesn't matter if you do `nix build foo#bar` (local) or `nix build -j0 foo#bar` (zero local jobs => use a remote builder†), the `foo#bar` "task" and its dependents gets "built" (a.k.a run).
† builders get picked matching target platform and label-like "features" constraints.
Ever since there has been gitlab-runner, I've wondered why the hell can't I just submit some job to a (list of) runner(s) - some of which could be local - without the whole push-to-repo+CI orchestrator? I mean I don't think it would be out of this world to write a CLI command that locally parses whatever-ci.yml, creates jobs out of it, and submit them to a local runner.
The actual subtle issue here is that sometimes you actually need CI features around caching and the like, so you are forced to engage with the format a bit.
You can, of course, chew it down to a bare minimum. But I really wish more CI systems would just show up with "you configure us with scripts" instead of the "declarative" nonsense.
CI that isn't running on your servers wants very deep understanding of how your process works so they can minimize their costs (this is true whether or not you pay for using CI)
Totally! It's a legitimate thing! I just wish that I had more tools for dynamically providing this information to CI so that it could work better but I could also write relatively general tooling with a general purpose language.
The ideal for me is (this is very silly and glib and a total category error) LSP but for CI. Tooling that is relatively normalized, letting me (for example) have a pytest plugin that "does sharding" cleanly across multiple CI operators.
There's some stuff and conventions already of course, but in particular caching and spinning up jobs dynamically are still not there.
I agree with wrapping things like build scripts to test locally.
Still, some actions or CI steps are also not meant to be run locally. Like when it publishes to a repo or needs any credentials that are used by more than one person.
Btw, Github actions and corresponding YAML are derived from Azure DevOps and are just as cursed.
The whole concept of Github CI is just pure misuse of containers when you need huge VM images - container is technically correct, but a far fetched word for this - that have all kinds of preinstalled garbage to run typescript-wrapped code to call shell scripts.
This is the right way to use CI/CD systems, as dumb orchestrators without inherent knowledge of your software stack. But the problem is, everything from their documentation, templates, marketplace encourage you to do exactly the opposite and couple your build tightly with their system. It's poor product design imo, clearly optimising for vendor lock-in over usability.
Oh, yeah, I remember looking at that a while back. I don't recall how much it had implemented at the time but it seems that firecow took a vastly different approach than nektos/act did, going so far as to spend what must have been an enormous amount of time/energy to cook up https://github.com/firecow/gitlab-ci-local/blob/4.56.2/src/s... (and, of course, choosing a dynamically typed language versus golang)
>Lack of local development. It's a known thing that there is no way of running GitHub Actions locally.
This is one thing I really love about Buildkite[0] -- being able to run the agent locally. (Full disclosure: I also work for Buildkite.) The Buildkite agent runs as a normal process too (rather than as a Docker container), which makes the process of workflow development way simpler, IMO. I also keep a handful of agents running locally on my laptop for personal projects, which is nice. (Why run those processes on someone else's infra if I don't have to?)
>Reusability and YAML
This is another one that I believe is unique to Buildkite and that I also find super useful. You can write your workflows in YAML of course -- but you can also write them in your language of choice, and then serialize to YAML or JSON when you're ready to start the run (or even add onto the run as it's running if you need to). This lets you encapsulate and reuse (and test, etc.) workflow logic as you need. We have many large customers that do some amazing things with this capability.
Are you two talking about the same thing? I believe the grandparent is talking about running it locally on development machines, often for testing purposes.
Asking because GitHub Actions also supports self-hosted runners [1].
Same thing, yeah, IIUC (i.e., running the agent/worker locally for testing). It's conceptually similar to self-hosted runners, yes, but also different in a few practical ways that may matter to you, depending on how you plan to run in production.
For one, with GitHub Actions, hosted and self-hosted runners are fundamentally different applications; hosted runners are fully configured container images, (with base OS, tools, etc., on board), whereas self-hosted runners are essentially thin, unconfigured shell scripts. This means that unless you're planning on using self-hosted runners in production (which some do of course, but most don't), it wouldn't make sense to dev/test with them locally, given how different they are. With Buildkite, there's only one "way" -- `buildkite-agent`, the single binary I linked to above.
The connection models are also different. While both GHA self-hosted runners and the Buildkite agent connect to a remote service to claim and run jobs, GHA runners must first be registered with a GitHub org or repository before you can use them, and then workflows must also be configured to use them (e.g., with `runs-on` params). With Buildkite, any `buildkite-agent` with a proper token can connect to a queue to run a job.
There are others, but hopefully that gives you an idea.
I think it's mostly obvious how you could implement CI with "local-first" programs (scripts?), but systems like Github provide value on top by making some of the artifacts or steps first-class objects.
Not sure how much of that GitHub does, but it could parse test output (I know we used that feature in GitLab back when I was using it on a project), track flaky tests for you, or give you a link to the code for a failing test directly.
Or it could highlight releases on their "releases" page, with release notes prominently featured.
And they allow you to group pipelines by type and kind, filter by target environment and such.
On top of that, they provide a library of re-usable workflows like AWS login, or code checkout or similar.
With all that, the equation is not as clear cut: you need to figure out how to best leverage some of those (or switch to external tools providing them), and suddenly, with time pressure, just going with the flow is a more obvious choice.
100% agree and that has been my experience too. It also makes testing the logic locally much easier, just run the script in the appropriate container.
The pipeline DSLs, because they are not full programming languages have to include lots of specific features and options and if you want something slightly outside of what they are designed for, you are out of luck. In a way it feels like how graphics were in the age of fixed-pipeline, when there had to be a complex API to cover all use cases yet it was not flexible enough.
This is my preferred way of doing things as well. Not being able to run the exact same thing that's running in CI easily locally is a bit of a red flag in my opinion. I think the only exception I've ever encountered to this is when working on client software for HSMs, which had some end-to-end tests that couldn't be run without actually connecting to the specific hardware that took some setup to be able to access when running tests locally.
That’s my policy too. I see way too many Jenkins/Actions scripts with big logic blocks jammed into YAML. If the entire build and test process is just a single script call, we can run it locally, in a GitHub workflow, or anywhere else. Makes it less painful to switch CI systems, and devs can debug easily without pushing blind commits. It’s surprising how many teams don’t realize local testing alone saves huge amounts of time.
When I automate my GitHub Actions I keep everything task-oriented, and if anything pushes code or builds, it has a user step to verify the automation's work. Once you approve and merge, it kicks off promotion pipelines, not necessarily for deployment but to promote a build as stable through tagging.
So for example, my action builds on 4 different platforms (win-64, linux-amd64, mac-intel, mac-arm); it does this in parallel, then gets the artifacts for all four and bundles them into a single package.
How would you suggest I do this following your advice?
While you're correct, environmental considerations are another advantage that testing locally SHOULD be able to provide (i.e. you can test your scripts or Make targets or whatever in the same runner that runs in the actual build system).
Of course you can, just specify a container image of your choice and run the same container for testing locally.
However, replicating environmental details is only relevant where the details are known to matter. A lot of effort has been wasted and workflows crippled by the idea that everything must be 100% identical irrespective of actual dependencies and real effects.
Although there are definitely merits in moving the complex logic outside of the CI/CD JSON/YAML DSL, especially when using monorepo setups that can become rather complex in their logic (that they made Google create Bazel, I can think of some interesting Borg/K8s analogies btw), I also believe that modern CI/CD platforms have made several sensible steps in the right direction to handle these more complicated use cases.
(Disclaimer: I work at CircleCI)
At CircleCI for example, we have added valuable features like a VSCode extension[0] to validate and "dry-run" config from within your IDE, we have local runners[1] that you can use to test and run pipelines on your local machine and your own infra, we have dynamic config[2], a Javascript/Typescript SDK[3], a CLI that can validate and run workflows locally[4], and QoL additions like a no-op job type[5] and flexible requires, along with flexible when statements and expression based job filters[6].
And finally, it's of course also possible to combine different approaches into a "best of both worlds" approach, f.e. combining Dagger with CircleCI[7].
I suspect the author of the article could greatly simplify matters if they used a task running tool to orchestrate running tasks, for example. Pick whatever manner of decoupling you want really, most of the time this is the path to simplified CI actions. CI is best when it's thought of as a way to stand up fresh copies of an environment to run things inside of.
I have never had the struggles that so many have had with CI as a result. Frankly, I'm consistently surprised at how overly complex people make their CI configurations. There are better tools for orchestration and dependency-driven builds, which is not CI's purpose to begin with.
I generally agree with you, but I'd be interested to hear your take on what the purpose of CI _actually is_.
It seems to me that a big part of the problem here (which I have also seen/experienced) is that there's no one specific thing that something like GitHub Actions is uniquely suited for. Instead, people want "a bunch of stuff to happen" when somebody pushes a commit, and they imagine that the best way to trigger all of that is to have an incredibly complex - and also bespoke - system on the other end that does all of it.
It's like we learned the importance of modularity in the realm of software design, but never applied what we learned to the tools that we work with.
Standing up fresh images for validation and post validation tasks.
CI shines for running tests against a clean environment for example.
Really any task that benefits from a clean image being stood up before running a task.
The key though is to decouple the tasks from the CI. Complexity like pre-packaging artifacts is not a great fit for CI configuration, that is best pushed to a tool that doesn’t require waterfall logic to make it work.
There is a reason task runners are very popular still
In our environment (our product is a Windows desktop application) we use Packer to build a custom Windows Server 2022 image with all the required tools installed. Build agents run on an Azure VM scale set that uses said image for the instance OS.
Oh boy, there's a special kind of hell I enter into everytime I set up new github actions. I wrote a blog post a few months ago about my pain[0] but one of the main things I've found over the years is you can massively reduce how horrible writing github actions is by avoiding prebuilt actions, and just using it as a handy shell runner.
If you write behaviour in python/ruby/bash/hell-rust-if-you-really-want and leave your github action at `run: python some/script.py` then you'll have something that's much easier to test locally, and save yourself a lot of pain, even if you wind up with slightly more boilerplate.
At this point, just pause with GitHub Actions and compare it to how GitLab handles CI.
Much more intuitive, taking shell scripts and other script commands natively and not devolving into a mess of obfuscated typescript wrapped actions that need a shit ton of dependencies.
The problem with Gitlab CI is that now you need to use Gitlab.
I’m not even sure when I started feeling like that was a bad thing. Probably when they started glueing a bunch of badly executed security crud onto the main product.
The earliest warning sign I had for GitLab was when they eliminated any pricing tier below their equivalent of GitHub's Enterprise tier.
That day, they very effectively communicated that they had decided they were only interested in serving Enterprises, and everything about their product has predictably degraded ever since, to the point where they're now branding themselves "the most comprehensive AI-powered DevSecOps Platform" with a straight face.
GitLab can't even show you more than a few lines of context without requiring you to manually click a bunch of times. Forget the CI functionality, for pull requests it's absolutely awful.
I decided it was a bad thing when they sent password reset emails to addresses given by unauthenticated users. Not that I ever used them. But now it is a hard no, permanently.
They have since had other also severe CVEs. That has made me feel pretty confident in my decision.
There was a pretty bad bug (though I think it was a Rails footgun) that allowed you to append an arbitrary email to the reset request.
The only difficult part for the attacker was finding an email address that was used by the target, though that's usually the same one you use for git commits, and GitLab “handily” has an email address assigned to each user ID, incrementing from 1.
Usually low numbers are admins, so, a pretty big attack vector when combined.
But you can do the same with GitHub, right? Although most docs and articles focus on 3rd party actions, nothing stops you from just running everything in your own shell script.
Yes, you can, and we do at my current job. Much of the time it's not even really the harder approach compared to using someone else's action, it's just that the existence of third party actions makes people feel obliged to use them because they wouldn't want to be accused of Not Invented Here Syndrome.
Theoretically we could also use https://just.systems/ or https://mise.jdx.dev/ instead of directly calling GH Actions, but I haven't tried GH Actions personally yet. If it's really the nightmare you say it is, then that's sad.
I had this idea the other day when dealing with CI and thought it must be dumb because everyone's not already doing it. It would make your CI portable to other runners in future, too.
A lot of folks in this thread are focusing on the monorepo aspect of things. The "Pull request and required checks" problem exists regardless of monorepo or not.
GitHub Actions allows you to only run checks if certain conditions are met, like "only lint markdown if the PR contains *.md files". The moment you decide to use such rules, you have the "Pull request and required checks" problem. No "monorepo" required.
GitHub required checks at this time allow you to use them with external services where GitHub has no idea what might run. For this reason, required checks HAVE to pass. There's no "if it runs" step. A required check on an external service might never run, or it might be delayed. Therefore, if GH doesn't have an affirmation that it passed, you can't merge.
It would be wonderful if, for jobs that run on GH where GH can know whether the action is supposed to run, required checks could be "require all these checks if they will be triggered".
I have encountered this problem on every non-trivial project I use with GitHub actions; monorepo or not.
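One way around this, along the lines suggested at the top of the thread, is to let a single always-running workflow call a script that decides which checks apply. A minimal sketch follows, assuming a made-up mapping of path patterns to validation scripts and `origin/main` as the comparison base.

```python
import fnmatch
import subprocess

# Hypothetical mapping of path patterns to validation commands.
# fnmatch's "*" also matches "/", so "*.md" covers nested files too.
CHECKS = {
    "*.md": ["./ci/lint-markdown.sh"],
    "docs/*": ["./ci/build-docs.sh"],
    "src/*": ["./ci/test.sh"],
}


def changed_files(base="origin/main"):
    """List files changed on this branch relative to the merge base with `base`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def select_checks(files, checks=CHECKS):
    """Return the commands whose pattern matches at least one changed file."""
    selected = []
    for pattern, command in checks.items():
        if any(fnmatch.fnmatchcase(path, pattern) for path in files):
            selected.append(command)
    return selected


def run_selected(files):
    """Run every selected check; an empty selection simply succeeds."""
    for command in select_checks(files):
        subprocess.run(command, check=True)
```

Since the workflow itself always runs and reports one status, that single job can be the required check: when nothing relevant changed, `run_selected(changed_files())` does nothing and the job passes.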
This isn't really the problem, though. This is an easy problem to solve; the real problem is that it costs money to do so.
Also: I'm not asserting that the below is good, just that it works.
First, don't make every check a required check. You probably don't need to require that linting of your markdown files passes (maybe you do! it's an example).
Second, consider not using the `on:<event>:paths`, but instead something like `dorny/paths-filter`. Your workflow now runs every time; a no-op takes substantially less than 1 minute unless you have a gargantuan repo.
Third, make all of your workflows have a 'success' job that just runs and succeeds. Again, this will take less than 1 minute.
At this point, a no-op is still likely taking less than 1 minute, so it will bill at 1 minute, which is going to be $.008 if you're paying.
Fourth, you can use `needs` and `if` now to control when your 'success' job runs. Yes, managing the `if` can be tricky, but it does work.
We are in the middle of a very large migration into GitHub Actions from a self-hosted GitLab. It was something we chose, but also due to some corporate choices our options were essentially GitHub Actions or a massive rethink of CI for several dozen projects. We have already moved into code generation for some aspects of GitHub Actions code, and that's the fifth and perhaps final frontier for addressing this situation. Figure out how to describe a graph and associated completion requirements for your workflow(s), and write something to translate that into the `if` statements for your 'success' jobs.
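A very small sketch of that last step, assuming the only completion requirement is "none of the listed jobs failed or was cancelled" (skipped jobs count as fine). The step names are illustrative, and this is not the generator described above, just the general shape such a tool might emit.

```python
import json


def failure_condition(bad_results=("failure", "cancelled")):
    """Build an expression that is true if any needed job ended badly."""
    return " || ".join(f"contains(needs.*.result, '{r}')" for r in bad_results)


def success_job(required_jobs):
    """Describe a 'success' job that gates on the given job names."""
    return {
        "runs-on": "ubuntu-latest",
        "needs": sorted(required_jobs),
        # always() keeps this job from being skipped when a needed job
        # fails or is skipped, so the required check always reports.
        "if": "always()",
        "steps": [
            {
                "name": "Fail if any required job failed",
                "if": failure_condition(),
                "run": "exit 1",
            },
            {"name": "Report success", "run": "echo ok"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps({"success": success_job({"lint", "test"})}, indent=2))
```

Only the 'success' job then needs to be marked as required in the branch protection settings. Richer completion rules would mean generating a more elaborate `if` per job, which is where describing the graph explicitly starts to pay off.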
There's a workaround for the 'pull request and required check' issue.
You create an alternative 'no op' version of each required check workflow that just does nothing and exits with code 0 with the inverse of the trigger for the "real" one.
The required check configuration on github is just based off of job name, so either the trigger condition is true, and the real one has to succeed or the trigger condition is false and the no op one satisfies the PR completion rules instead.
It seems crazy to me that such basic functionality needs such a hacky workaround, but there it is.
Posts like this make me miss Travis. Travis CI was incredible, especially for testing CI locally. (I agree with the author that act is a well done hack. I've stopped using it because of how often I'd have something pass in act and fail in GHA.)
> GitHub doesn't care
My take: GitHub only built Actions to compete against GitLab CI, as built-in CI was taking large chunks of market share from them in the enterprise.
Woodpecker supports running jobs on your own machine (and conveniently provides a command to do that for failed jobs), uses the same sane approach of passing your snippets to the shell directly (without using weird typescript wrappers), and is pluggable into all major forges, GitHub included.
How so? I don’t recall this, and I used Travis, and then migrated to GitHub actions.
As far as I can tell, they are identical as far as testing locally. If you want to test locally, then put as much logic in shell scripts as possible, decoupled from the CI.
My man/woman - you gotta try buildkite. It’s a bit more extra setup since you have to interface with another company, more API keys, etc. But when you outgrow GH actions, this is the way. Have used buildkite in my last two jobs (big US tech companies) and it has been the only pleasant part of CI.
I've used Jenkins, Travis, Circle, Cirrus, GitHub Actions, and Buildkite. Buildkite is leagues ahead of all of the others. It's the only enjoyable CI system I've used.
One really interesting omission to this post is how the architecture of GitHub actions encourages (or at the very least makes deceivingly easy) making bad security decisions.
Common examples are secrets. Organization or repository secrets are very convenient, but they are also massive security holes just waiting for unsuspecting victims to fall into.
Repository environments have the ability to have distinct secrets, but you have to ensure that the right workflows can only access the right environments. It's a real pain to manage at scale.
Being able to `inherit` secrets also is a massive footgun, just waiting to leak credentials to a shared action. Search for and leak `AWS_ACCESS_KEY_ID` anyone?
Cross-repository workflow triggering is also a disaster, and in some circumstances you can abuse the differences in configuration to do things the source repository didn't intend.
Other misc. things about GHA are also cool in theory, but fall down in practice. One example is the wait-timer concept of environments. If you have a multi-job workflow using the same environment, wait-timer applies to EACH JOB in the environment. So if you have a build-and-test workflow with 2 jobs, one for build and one for test, each job will wait `wait-timer` before it executes. This makes the feature impossible to use for things like multi-environment deployment pipelines, unless you refactor your workflows.
Overall, I'd recommend against using GHA and looking elsewhere.
Well that's just someone being a dumbass, since AssumeRoleWithWebIdentity (and its Azure and GCP equivalent) have existed for quite a while. It works flawlessly and if someone does do something stupid like `export HURP_DURP=$AWS_ACCESS_KEY_ID; printenv` in a log, that key is only live for about 15 minutes so the attacker better hurry
Further, at least in AWS and GCP (I haven't tried such a thing in Azure) one can also guard the cred with "if the organization and repo are not ..." then the AssumeRole 403s, to ensure that my-awesome-org/junior-dev-test-repo doesn't up and start doing fun prod stuff in GHA
I hate GHA probably more than most, but one can footgun themselves in any setup
If you, or others, are interested I have found that those role-session-name variables make for a great traceability signal when trying to figure out what GHA run is responsible for AWS actions. So instead of
  role-session-name: GitHubActionSession
one can consider
  role-session-name: gha-${{ github.run_id }} # or your favorite
I don't recall offhand what the upper limit is on that session name, so you may be able to fit quite a bit of stuff in there
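If you build the session name in a script instead, a small Python sketch could look like this. The 64-character cap and the allowed character set are my understanding of the STS `RoleSessionName` constraints, so treat both as assumptions to check against the AWS docs; the naming scheme itself is just an example.

```python
import re

MAX_SESSION_NAME = 64  # assumed STS RoleSessionName limit; verify in the AWS docs


def session_name(repo, run_id, attempt="1"):
    """Build a traceable session name, run id first so truncation keeps it."""
    raw = f"gha-{run_id}-{attempt}-{repo}"
    # Replace anything outside a conservative safe set (e.g. '/') with '-'.
    safe = re.sub(r"[^A-Za-z0-9_=,.@-]", "-", raw)
    return safe[:MAX_SESSION_NAME]


print(session_name("my-org/my-repo", "123", "2"))
```

Putting the run id before the repo name means a long repo name gets cut off rather than the part you actually search CloudTrail for.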
Great points. I totally agree, don't use hard-coded static creds, especially here. But in reality, many services and/or API keys don't support OIDC or short-lived credentials, and the design of secrets in GitHub promotes using them, in my opinion.
Whilst I do detest much of Azure DevOps, one thing I do like about their pipelines is that we can securely use service connections and key vaults in Azure to secure pipeline tasks that require credentials to be managed securely.
While I do agree with you regarding encouraging bad secret management practices, one fairly nice solution I’ve landed on is using terraform to manage such things. I guess you could even take it a step further to have a custom lint step (running on GHA, naturally) that disallows secrets configured in a certain manner and blocks a deploy (again, on GHA) on failure.
I guess what I’m saying is, it’s GHA all the way down.
In the end, this is the age-old "I built my thing on top of a 3rd party platform, it doesn't quite match my use case (anymore) and now I'm stuck".
Would GitLab have been better? Maybe. But chances are that there is another edge case that is not handled well there. You're in a PaaS world, don't expect the platform to adjust to your workflow; adjust your workflow to the platform.
You could of course choose to "step down" (PaaS to IaaS) by just having a "ci" script in your repo that is called by GA/other CI tooling. That gives you immense flexibility but also you lose specific features (e.g. pipeline display).
I'm not sure if there's a monorepo vs polyrepo difference; just that anything complex is pretty painful in gitlab. YAML "programming" just doesn't scale.
The problem is that your "ci" script often needs some information from the host system, like what is the target git commit? Is this triggered by a pull request, or a push to a branch? Is it triggered by a release? And if so, what is the version of the release?
IME, much of the complexity in using Github Actions (or Gitlab CI, or Travis) is around communicating that information to scripts or build tools.
That and running different tasks in parallel, and making sure everything you want passes.
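For GitHub Actions specifically, most of that information arrives as default environment variables (`GITHUB_SHA`, `GITHUB_EVENT_NAME`, `GITHUB_REF_TYPE`, `GITHUB_REF_NAME`). A minimal Python sketch of isolating that in one place, so the rest of the "ci" script never touches CI-specific names. The `CiContext` type and the "tags look like v1.2.3" rule are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CiContext:
    commit: str
    event: str
    is_pull_request: bool
    release_version: Optional[str]


def read_context(env: Mapping[str, str] = os.environ) -> CiContext:
    """Translate GitHub Actions env vars into a CI-agnostic context."""
    event = env.get("GITHUB_EVENT_NAME", "local")
    ref_type = env.get("GITHUB_REF_TYPE", "")
    ref_name = env.get("GITHUB_REF_NAME", "")
    # Assumption: releases are tagged like "v1.2.3".
    is_release_tag = ref_type == "tag" and ref_name.startswith("v")
    return CiContext(
        commit=env.get("GITHUB_SHA", "unknown"),
        event=event,
        is_pull_request=event == "pull_request",
        release_version=ref_name[1:] if is_release_tag else None,
    )


print(read_context({"GITHUB_SHA": "abc123", "GITHUB_EVENT_NAME": "pull_request"}))
```

Supporting another CI system then means writing a second `read_context`, not touching the build logic.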
Doesn't everything in GitLab go into a single pipeline? GitHub at least makes splitting massive CI/CD setups easier by allowing you to write them as separate workflows that are separate files.
> GitHub at least makes splitting massive CI/CD setups easier by allowing you to write them as separate workflows that are separate files.
this makes me feel like you’re really asking “can i split up my gitlab CICD yaml file or does everything need to be in one file”.
if that’s the case:
yes it does eventually all end up in a single pipeline (ignoring child pipelines).
but you can split everything up and then use the `include` statement to pull it all together in one main pipeline file which makes dealing with massive amounts of yaml much easier.
this doesn’t run anything for `job_b_from_template` … you just end up defining the things you want to run for each case, plus any variables you need to provide / override.
you can also override stuff like rules on when it should run if you want to. which is handy.
gitlab CICD can be really modular when you get into it.
if that wasn’t the case: on me.
edit: switched to some yaml instead of text which may or may not be wrong. dunno. i have yet to drink coffee.
addendum you can also do something like this, which means you don’t have to redefine every job in your main ci file, just define the ones you don’t want to run
You can have pipelines trigger child pipelines in gitlab, but usability of them is pretty bad, viewing logs/results of those always needs extra clicking.
> In GitHub you can specify a "required check", the name of the step in your pipeline that always has to be green before a pull request is merged. As an example, I can say that web-app1 - Unit tests are required to pass. The problem is that this step will only run when I change something in the web-app1 folder. So if my pull request only made changes in api1 I will never be able to merge my pull request!
Continuous Integration is not continuous integration if we don’t test that a change has no deleterious side effects on the rest of the system. That’s what integration is. So if you aren’t running all of the tests because they’re slow, then you’re engaging in false economy. Make your tests run faster. Modern hardware with reasonable test runners should be able to whack out 10k unit tests in under a minute. The time to run the tests goes up by a factor of ~7-10 depending on framework as you climb each step in the testing pyramid. And while it takes more tests to cover the same ground, with a little care you can still almost halve the run time replacing one test with a handful of tests that check the same requirement one layer down, or about 70% moving down two layers.
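One way to read those numbers, as a quick Python calculation. The assumptions are mine: each layer is 7x slower than the one below it, and a "handful" means 4 replacement tests per layer.

```python
def relative_cost(tests_needed, slowdown, layers_down):
    """Cost of the replacement tests relative to the one test they replace."""
    return tests_needed / (slowdown ** layers_down)


# Assumed: 4 tests replace 1 per layer, and each layer up is 7x slower.
one_down = relative_cost(4, 7, 1)      # 4 fast tests vs 1 test that costs 7
two_down = relative_cost(4 * 4, 7, 2)  # 16 fast tests vs 1 test that costs 49

print(f"one layer down saves {1 - one_down:.0%}")
print(f"two layers down saves {1 - two_down:.0%}")
```

With these inputs the savings come out a bit under one half and a bit under 70%, which lines up with "almost halve" and "about 70%". A 10x slowdown per layer would make the savings larger.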
One thing that’s been missing from most of the recent CI pipelines I’ve used is being able to see that a build is going to fail before the tests finish. The earlier the reporting of the failure the better the ergonomics for the person who triggered the build. That’s why the testing pyramid even exists.
If the unit tests are slow enough to want to skip them, they likely are not unit tests but some kind of service-level tests or tests that are hitting external APIs or some other source of a bad smell. If the slow thing is the build, then cache the artifact keyed off the directory contents so the step is fast if code is unchanged. If the unit tests only run for a package when the code changes, there is a lack of e2e/integration testing. So, what is OP's testing strategy? Caching? It seems like following good testing practices would make this problem disappear.
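A minimal sketch of "keyed off the directory contents" in Python: hash every file path and file body under a directory and use the digest as the cache key. The function name is made up, and a real setup would probably exclude build output and VCS directories.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_key(root):
    """Return a hex digest that changes whenever any file under root changes."""
    digest = hashlib.sha256()
    base = Path(root)
    # Sort so the key does not depend on filesystem listing order.
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


# Demo: same contents give the same key, an edit gives a new one.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "main.c"
    source.write_text("int main(void) { return 0; }\n")
    before = directory_key(tmp)
    unchanged = directory_key(tmp)
    source.write_text("int main(void) { return 1; }\n")
    after = directory_key(tmp)

print(before == unchanged, before != after)
```

The build step then becomes "if an artifact exists under this key, download it, otherwise build and upload".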
That is true for most cases, which nowadays is web and backend software. As you get into embedded or anything involving hardware things get slower and you need to optimize.
For example, tests involving real hardware can only run at 1x speed, so you will want to avoid running those if you can. If you are building a custom compiler toolchain, that is slow and you will want to skip it if the changes cannot possibly affect the toolchain.
I agree hardware should be that quick, but CI and cloud hardware is woefully underpowered unless you actively seek it out. I’ve also never seen a test framework spew out even close to that in practice. I’m not even sure most frameworks would do that with noop tests, which is sad.
10 years ago my very testing-competent coworker had us running 4200 tests in 37 seconds. In NodeJS. We should be doing as well as that today without a gifted maintainer.
I've got an i9 and an NVMe drive. running npm test with 10k no-op tests takes 30 seconds, which is much quicker than I expected it to be (given how slow everything else in the node world is).
Running dotnet test on the other hand with 10k empty tests took 95 seconds.
Honestly, 10k no-op tests should be limited by disk IO, and in an ideal world would be 10 seconds.
Agreed, most of the CI tools don't help in getting feedback early to the developers. I shouldn't have to wait hours for my CI job to complete. Harness is a tool that can reduce build times by caching build artifacts and docker layers, and by only running the subset of tests that were impacted by the code change.
Not just me then? I was trying to fix a GitHub action just today but I have no clue how I'm supposed to test it, so I just keep making tiny changes and pushing.... Not a good system but I'm still within the free tier so I'm willing to put up with it I guess.
I think it’s everyone, debugging GH actions is absolute hell, and it gets terrifying when the action interacts with the world (e.g. creating and deploying packages to a registry).
> it gets terrifying when the action interacts with the world (e.g. creating and deploying packages to a registry).
To be fair, testing actions with side effects on the wider world is terrifying even if you’re running it locally, maybe more so because your nonstandard local environment may have surprises (e.g. an env var you set then forgot) while the remote environment mostly only has stuff you set/installed explicitly, and you can be sloppier (e.g. accidentally running ./deploy when you wanted to run ./test). That part isn’t a GH Actions problem.
Locally it is much easier to set up and validate test environments, or neuter some of the pipeline to test things out and ensure the rest produces expected results (in fact I usually dry-run by default and require a file or envvar to “real run”). Especially as some jobs (rightfully) refuse to run in PRs.
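The dry-run-by-default idea, sketched in Python. `REAL_RUN` is a made-up variable name and `deploy` is a stand-in for whatever has side effects.

```python
import os


def should_really_run(env=os.environ):
    """Only do the real thing when explicitly asked to."""
    return env.get("REAL_RUN") == "1"  # REAL_RUN is a hypothetical opt-in switch


def deploy(package, env=os.environ):
    if not should_really_run(env):
        # Default path: describe the action without performing it.
        return f"[dry-run] would publish {package}"
    # The real side effect (e.g. an upload to a registry) would go here.
    return f"published {package}"


print(deploy("pkg-1.0", env={}))
```

The CI job that is allowed to publish sets the switch; everything else, including your laptop, gets the harmless path.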
If this is to troubleshoot non-code related failures (perm issues, connection timed out, whatever influences success that doesn't require a code change) then surely the repo's history would benefit from one just clicking "Re-run Job", or its equivalent $(gh ...) invocation, right?
not necessarily, rerun job will most likely use the fully resolved dependency graph of your actions (or equivalent), a fresh run will re-resolve them (e.g. you pinned to @1 vs the specific version like @1.2.3 of a dep).
the history problem goes away if you always enforce squash merge...
GitHub (Actions) is simply not built to support monorepos. Square peg in a round hole and all that. We've opted for using `meta` to simulate monorepos, while being able to use GitHub Actions without too much downsides.
I do wonder if this really solves the author's problem, because by the looks of it you just have to run the meta command and it would run over each of the subdirectories. At the same time, I think I like it, because this is what I think people refer to as a "modular monolith".
Combine this with nats https://nats.io/ (hey, if you don't want it to be over the network, you could use nats within the memory model of your application itself to reduce any overhead) and you essentially get yourself a really modular monolith, which you can then separate selectively (ahem, microservices) afterwards rather easily.
1. We apparently don’t even have a name for it. We just call it “CI” because that’s the adjacent practice. “Oh no the CI failed”
2. It’s conceptually a program that reports failure if whatever it is running fails and... that’s it
3. The long-standing principle of running “the CI” after merging is so backwards that that-other Hoare disparagingly called the correct way (guard “main” with a bot) for The Not Rocket Science Principle or something. And that smug blog title is still used to this day (or “what bors does”)
4. It’s supposed to be configured declaratively but in the most gross way that “declarative” has ever seen
5. In the true spirit of centralization “value add”: the local option of (2) (report failure if failed) has to be hard or at the very least inconvenient to set up
The general philosophy of these CI systems is flawed. Instead of CI running your code, your code should run the CI. In other words, the CI should present an API such that one can have arbitrary code which informs the system of what is going on. E.g. "I'm starting jobs A,B,C", "Job A done successfully", "This file is an artifact for job B".
Information should only flow from the user scripts to the CI, and communication should be done by creating files in a specific format and location. This way the system can run and produce the same results anywhere provided it has the right environment/container.
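A sketch of what "communicate by writing files" might look like from the script's side, in Python. The JSON-lines format, the file name and the event names are all invented here; no existing CI system is being described.

```python
import json
import tempfile
from pathlib import Path


def report(events_file, event, job, **details):
    """Append one JSON line telling the CI system what just happened."""
    record = {"event": event, "job": job, **details}
    with events_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


# Demo: a build script narrating its own progress.
with tempfile.TemporaryDirectory() as tmp:
    events_file = Path(tmp) / "ci-events.jsonl"  # hypothetical agreed-upon location
    report(events_file, "started", "A")
    report(events_file, "finished", "A", status="success")
    report(events_file, "artifact", "B", path="dist/app.tar.gz")
    events = [json.loads(line) for line in events_file.read_text().splitlines()]

print(events)
```

Because the only interface is a file, the same script behaves identically on a laptop; nothing reads the file there, and nothing breaks.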
One thing that sounds very nice about GitHub is merge queues: once your PR is ready, rather than merging, you submit it to the merge queue, which will rebase it on the last PR also on the merge queue. It then runs the CI on each PR, and finally merges them automatically once successful. If CI fails, your PR doesn't get merged, and the next PR skips yours in the chain.
Still a lot of computation & some wait time, but you can just click & forget. You can also parallelize it; since branches are rebased on each other, you can run CI in advance and, assuming your predecessor is also successful, reuse the result from yours.
That sounds roughly like what happens for Rust. I write a Rust PR, somebody reviews it, they have feedback, I modify the PR, they're happy, and it passes to bors (originally: https://github.com/graydon/bors) which then tries to see whether this can be merged with Rust and if so does so.
It is nice to know that if humans thought your change is OK, you're done. I've only committed small changes (compiler diagnostics, documentation) nothing huge, so perhaps if you really get up in it that's more work, but it was definitely a pleasant experience.
... and sure enough it turns out that work on one of the bors successors was in fact discontinued because you should just use this GitHub feature. TIL.
GHA/Gitlab CI/Buildkite/whatever else floats your boat then just builds a bunch of Bazel targets, naively, in-order etc. Just lean on Bazel fine-grained caching until that isn't enough anymore and stir in remote build execution for more parallelism when you need it.
This works up until ~10M+ lines of code or ~10ish reasonably large services. After that you need to do a bit more work to only build graph of targets that have been changed by the diff. That will get you far enough that you will have a whole team that works on these problems.
Allowing the CI tools to do any orchestration or dictate how your projects are built is insanity. Expressing dependencies etc in YAML is the path to darkness and is only really justifiable for very small projects.
> The problem is that this step will only run when I change something in the web-app1 folder. So if my pull request only made changes in api1 I will never be able to merge my pull request!
This just seems like a bad implementation to me?
There are definitely ways to set up your actions so that they run all of the unit tests without changes if you'd like, or so that api1's unit tests are not required for a web-app1 related PR to be merged.
Absolutely correct. When creating a new workflow, I always disable push/pull_request triggered builds and instead use the manually triggered `workflow_dispatch` method. This makes testing a new workflow much easier.
Additionally, you can use conditionals based on inputs in the `workflow_dispatch` meaning that you could easily setup a "skip api tests" or "include web tests" option.
It sounds like they have the logic to skip certain things if nothing has changed. The problem is around pull request gates and the lack of dynamic "these tests must be passing before merging is allowed". There are settings on a repository in the ruleset / status checks area that are configured outside of the dynamic yaml of the GHA workflow.
GH hosted runners use shared hardware so the performance is never good. There are quite a few options available. Harness CI offers hyper optimized build infrastructure, paired with software intelligence (caching, running subset of tests based on the code change) can reduce build times up-to 4X compared to GH Actions.
> Our code sits in a monorepo which is further divided into folders. Every folder is independent of each other and can be tested, built, and deployed separately.
If this is true, and you still have problems running specific Actions, why not break this into separate repositories?
There is a mono vs poly repo tradeoff. Pros & cons to each approach. If you are doing monorepo, it would be antithetical to break it up into the poly paradigm. You really don't want both
So the way I've solved the multiple folders with independent checks is like this:
  all-done:
    name: All done
    # this is the job that should be marked as required on GitHub. It's the only one that'll reliably trigger
    # when any upstream fails: failure
    # when all upstream skips: pass
    # when all upstream success: success
    # combination of upstream skip and success: success
    runs-on: ubuntu-latest
    needs:
      - calculate-version
      - cargo-build
      - cargo-fmt
      - cargo-clippy-and-report
      - cargo-test-and-report
      - docker-build
      - docker-publish
    if: |
      always()
    steps:
      - name: Fail!
        shell: bash
        if: |
          contains(needs.*.result, 'failure') ||
          contains(needs.*.result, 'cancelled')
        run: |
          echo "One / more upstream failed or was cancelled. Failing job..."
          exit 1
      - name: Success!
        shell: bash
        run: |
          echo "Great success!"
That way it is resilient against checks not running because they're not needed, but it still fails when any upstream actually fails.
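The decision that job makes can be written down as a tiny Python function, which is handy for checking the truth table in the comments. This is a model of the two `contains(...)` checks, not something GitHub runs.

```python
def all_done(results):
    """Mirror the 'All done' job: fail only on failure or cancellation."""
    if "failure" in results or "cancelled" in results:
        return "failure"
    # Skipped upstream jobs are treated the same as successful ones.
    return "success"


for case in (["success", "skipped"], ["skipped", "skipped"], ["success", "failure"]):
    print(case, "->", all_done(case))
```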
Now, I did end up running the tests of the front-end and back-end because they upload coverage, and if my coverage tool doesn't get both, it'll consider it as a drop in coverage and fail its check.
But in general, I agree with the writer of the post that it all feels like it's not getting enough love.
For example, there is no support for yaml anchors, which really hampers reusability on things that cannot be extracted to separate flows (not to mention separate flows can only be nested 4 deep).
There is also the issue that any commit made by GitHub actions doesn't trigger another build. This is understandable, as you want to avoid endless builds, but sometimes it's needed, and then you need to do the ugly workaround with a PAT (and I believe it can't even be a fine-grained one). Combine that with policies that set a maximum time limit on tokens, your build becomes brittle, as now you need to chase down the person with admin access.
Then there is the issue of Docker actions. They tell you to pin the action to an sha to prevent replacements. Except the action itself points to a replaceable tag.
Lastly, there is a bug where when you create a report for your action, you cannot specify the parent it belongs to. So your ESLint report could be made a child of your coverage report.
I have never used a CI system more flaky and slow than GitHub Actions. The one and only positive thing about it is that you get some Actions usage for free.
The Azure machines GitHub uses for the runners by default have terrible performance in almost every regard (network, disk, CPU). Presumably it would be more reliable when using your own runners, but even the Actions control plane is flaky and doesn't always schedule jobs correctly.
We switched to Buildkite at $DAYJOB and haven't looked back.
I’ve seen many teams get stuck when they rely too heavily on GitHub Actions’ magic. The key issue is how tightly your build logic and config become tied to one CI tool. If the declarative YAML gets too big and tries to handle complex branching or monorepos, it devolves into a maintenance headache—especially when you can’t test it locally and must push blind changes just to see what happens.
A healthier workflow is to keep all the logic (build, test, deploy) in portable scripts and let the CI only orchestrate each script as a single step. It’s easier to troubleshoot, possible to run everything on a dev machine, and simpler if you ever migrate away from GitHub.
For monorepos, required checks are maddening. This should be a first-class feature where CI can dynamically mark which checks apply on a PR, then require only those. Otherwise, you do hacky “no-op” jobs or you force your entire pipeline to run every time.
In short, GitHub Actions can be powerful for smaller codebases or straightforward pipelines, but if your repo is big and you want advanced control, it starts to feel like you’re fighting the tool. If there’s no sign that GitHub wants to address these issues, it’s totally reasonable to look elsewhere or build your own thin orchestration on top of more flexible CI runners.
I tried to use GitHub Actions on Forgejo and... It's so much worse than using an actual CI pipeline.
With Woodpecker/Jenkins you know exactly what your pipeline is doing. With GitHub actions, not even the developers of the actions themselves know what the runner does.
What does this even mean? Are you talking about Forgejo Actions, or are you somehow hosting your code on a Forgejo instance but running CI through GitHub?
> With Woodpecker/Jenkins you know exactly what your pipeline is doing.
If you wrote it from the ground up, sure. On the other hand, I've inherited Jenkins pipelines that were written years before I got there and involved three to four different plugins, and they're way worse to work with than the GitHub Actions that I inherited.
> What does this even mean? Are you talking about Forgejo Actions, or are you somehow hosting your code on a Forgejo instance but running CI through GitHub?
yes, Forgejo Actions which is supposed to be a drop in replacement for GH Actions. You can say they're different things but the general idea and level of complexity is the same.
You basically achieve the same result on github actions if you just ignore all of the github action yaml “magic” settings in the syntax and let your makefile/script do the logic which also makes it trivial to debug locally. But upvote because I do love sourcehut, it’s just so clean!
You can't run AWS Lambda or DynamoDB locally either (well you can but it's a hassle). So by that logic, we shouldn't use them at all. I don't like working with CI either but I'll take GitHub Actions over Jenkins/CircleCI/TravisCI any day.
> You can't run AWS Lambda or DynamoDB locally either (well you can but it's a hassle). So by that logic, we shouldn't use them at all.
No, applying the logic to something like Lambda would mean implementing handlers like:
  function handle(lambdaRequest, lambdaContext) {
    return myFunction(
      stuffExtractedFromLambdaRequest(lambdaRequest),
      stuffExtractedFromLambdaContext(lambdaContext)
    );
  }
Then there's no need to go through the hassle of running Lambda functions locally; since we can just run `myFunction` locally instead.
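The same shape in Python, as a concrete sketch. `greet` and the `name` field are invented for the example; `aws_request_id` is the attribute the Python Lambda context object exposes, and the fallback is only there so the handler can also be called without a real context.

```python
def greet(name, request_id):
    """Plain function holding the real logic; no Lambda types involved."""
    return {"message": f"hello {name}", "request_id": request_id}


def handler(event, context):
    """Thin AWS Lambda entry point: unpack the inputs, then delegate."""
    name = event.get("name", "world")
    request_id = getattr(context, "aws_request_id", "local")
    return greet(name, request_id)


# Local run: call the plain function directly, no Lambda runtime needed.
print(greet("ci", "test-1"))
```

CI tasks work the same way: the workflow step is the thin handler, and the script it calls is `greet`.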
Dynamo isn't the same, since it's just a service/API that we call; we don't implement its logic, like we do for CI tasks, Lambda functions, etc.
Whilst you're right that it's a hassle to run DynamoDB locally (although not too bad, in my experience); that's also not necessary. It's fine to run code locally which talks to a remote DynamoDB; just set up the credentials appropriately.
Yeah, and that doesn't stop us from using either of them. What I tried to convey is that GHA isn't ideal and it has a few warts but it's still better than most of the options available out there.
Monorepos come with a different set of tradeoffs from polyrepos. Both have their pains. We have a similar setup with Jenkins, and have used CUE to tame a number of these issues. We did so by creating (1) a software catalog (2) per-branch config for versions and CI switches
Similarly, we are adopting Dagger, more as part of a larger "containerize all of our CI steps" which works great for bringing parity to CI & local dev work. There are a number of secondary benefits and the TUI / WUI logs are awesome.
Between the two, I have removed much of the yaml engineering in my work
I hate GitHub Actions, and I hate Azure Pipelines, which are basically the same. I especially hate that GitHub Actions has the worst documentation.
However, I’ve come full circle on this topic. My current position is that you must go all-in with a given CI platform, or forego the benefits it offers. So all my pipelines use all features, to offer a great experience for devs relying on them: Fast, reproducible, steps that are easy to reason about, useful parameters for runs, ...
I am biased because I built the rust SDK for dagger. But I think it is a real step forward for CI. Is it perfect? Nope. But it allows fixing a lot of the shortcomings the author has.
Pros:
- pipeline as code, write it in golang, python, typescript or a mix of the above.
- Really fast once cached
- Use your languages library for code sharing, versioning and testing
- Runs everywhere local, ci etc. Easy to change from github actions to something else.
Cons:
- Slow on the first run. Lots of pulling of docker images
- The DSL and modules can feel foreign initially.
- Modules are definitely a framework, I prefer just having a binary I can ship (which is why the rust SDK doesn't support modules yet).
- Doesn't handle large mono repos well, it relies heavily on caching and currently runs on a single node. It can work if you don't have hundreds of services, especially if the builder is a large machine.
The fact that you can actually write CI pipelines that can be tested, packaged, versioned, etc. allows us to ship our pipelines as products, which is quite nice and something we've come to rely on heavily
I'm genuinely intrigued by Dagger, but also super confused. For example, this feels like extra complexity around a simple shell command, and I'm trying to grok why the complexity is worth it:
https://docs.dagger.io/quickstart/test/#inspect-the-dagger-f...
I'm a fanboy of Rust, Containerization, and everything-as-code, so on paper Dagger and your Rust SDK seems like it's made for me. But when I read the examples... I dunno, I just don't get it.
It is a perfectly valid criticism. Dagger is not a full build system that dictates what your artifacts look like, unlike maybe something like bazel or nix. I think of dagger as a sort of interface that now allows me to test and package my build and CI into smaller logical bits and rely on the community for parts of it as well.
In the end you do end up slinging apt install commands for example, but you can test those parts in isolation. Does my CI actually scan for this kind of vulnerability? Does it install the postgres driver? When I build a rust binary, is it musl and working on scratch images?
In some sense dagger feels a little bit like a programmatic wrapper on top of docker, because that is actually quite close to what it is.
You can also use it for other things because in my mind it is the easiest way of orchestrating containers. For example running renovate over a list of repositories, spawning adhoc llm containers (pre ollama), etc. Lots of nice uses outside of ci as well even if it is the major selling point
That conceptually allows one to have two different "libraries" in your CI: one in literal golang, as a function which takes in a source and sets up common step configurations, and the other as Dagger Functions via Modules (<https://docs.dagger.io/features/modules> or <https://docs.dagger.io/api/custom-functions#initialize-a-dag...>), which work much closer to GHA `uses:` blocks from an organization/repo@tag style setup, with the grave difference that they can run locally or in CI, which for damn sure is not true of GHA `uses:` blocks
The closest analogy I have is from GitLab CI (since AFAIK GHA does not allow yaml anchors nor "logical extension"):
  .common: &common  # <-- use whichever style your team prefers
    image: node:21-slim
    cache: {}  # ...
    environment: []  # ...
  my-job1:
    stage: test
    <<: *common
    script: [ npm, run, test:unit, run ]
  my-job2:
    stage: something-else
    extends: .common
    script: echo "do something else"
1: I'm aware that I am using golang multiple times in this comment, and for sure am a static typing fanboi but as the docs show they allow other languages, too
Exactly. When comparing dagger with a lot of the other formats for CI, it may seem more logical. I've spent so much time debugging github actions, waiting 10 minutes to test a full pipeline after a change, etc. Over and over again. Dagger has a weird DSL in the programming languages as well, but at least it is actual code that I can write for loops around or give parameters and reuse. That instead of a groovy file for Jenkins ;)
Shameless plug but I built GitGuard (https://gitguard.dev) to solve the "Pull request and required checks" problem mentioned here (and other problems).
Basically: you set GitGuard as your required check and then write a simple GitGuard workflow like this:
  if anymatch(pull_files,"src/backend/.*") {
    assert(checkpassed("backend-tests"))
  }
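To make the idea concrete, here is how a rule like that could be modeled in plain Python. This is not how GitGuard is implemented; the rule table, the function names and the assumption that `anymatch` is a regex match against changed file paths are all mine.

```python
import re

RULES = {  # hypothetical: path pattern -> check that must have passed
    r"src/backend/.*": "backend-tests",
    r"src/frontend/.*": "frontend-tests",
}


def required_checks(changed_files):
    """Checks that apply to this PR, based on which files it touches."""
    return {
        check
        for pattern, check in RULES.items()
        if any(re.match(pattern, name) for name in changed_files)
    }


def may_merge(changed_files, passed_checks):
    """A PR may merge when every applicable check has passed."""
    return required_checks(changed_files) <= set(passed_checks)


print(may_merge(["src/backend/api.py"], ["backend-tests"]))
```

A PR that touches neither directory has no required checks at all, which is exactly the case the static "required check" setting cannot express.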
I do not use GitHub Actions for these purposes, and if I did, I would want to ensure that it is a file that can run locally or whatever else just as well. I don't use GitHub Actions to prevent pull requests from being merged (I will always manage them manually), and do not use GitHub Actions to manage writing the program, for testing the program (it would be possible to do this, but I would insist on doing it in a way that is not vendor-locked to GitHub, and by putting most of the stuff outside of the GitHub Actions file itself), etc.
I do have a GitHub Actions file for a purpose which is not related to the program itself; specifically, for auto-assignment of issues. In this case, it is clearly not intended to run locally (although in this case you could do so anyways if you could install the "gh" program on your computer and run the command mentioned there locally, but it is not necessary since GitHub will do it automatically on their computer).
Why is this team sticking multiple directories that are “independent of each other” into a single repository? This sounds like a clear case of doing version control wrong. Monorepos come with their own set of challenges, and I don’t think there are many situations where they’re actually warranted. They certainly don’t help for completely independent projects.
No project within a single organization is completely independent. In general they all serve to meet a unified business objective and developers often need the global context on occasion.
I used to be a multirepo proponent, but have fallen in love with Bazel and “everything bagel” repo.
> No project within a single organization is completely independent. In general they all serve to meet a unified business objective and developers often need the global context on occasion.
Of course; I was only quoting the article. I am a firm believer in making things as simple as possible until you need something complicated, and monorepos/Bazel definitely get a “complicated” label from me.
Yeah, sounds like the problems are more due to monorepos than to GitHub Actions. Seems like the pendulum always swings too far. Overdoing microservices results in redundant code and interservice spaghetti. Monorepos have their own set of issues. The only solution is to think carefully about what size chunk of functionality you want to build, test, and deploy as a unit.
For the first point, some monorepo orchestrators (I'm thinking of at least pnpm) have a way to run, for example, the tests for all the packages that have changed relative to the master branch, plus all packages that depend transitively on those packages.
It's very convenient and avoids having to mess with the CI limitations on the matter.
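To make the "changed packages plus transitive dependents" idea concrete, here is a minimal sketch of the selection step. It is not how pnpm implements it; the package names and the shape of the dependency map are made up for illustration.

```python
from collections import deque


def affected_packages(changed, depends_on):
    """Return the changed packages plus everything that depends on them,
    directly or transitively.

    depends_on maps a package name to the packages it depends on directly.
    """
    # Invert the graph: for each package, which packages depend on it?
    dependents = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk from the changed packages along "is depended on by".
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical workspace: web -> ui -> utils, api -> utils, docs stands alone.
GRAPH = {
    "utils": [],
    "ui": ["utils"],
    "web": ["ui"],
    "api": ["utils"],
    "docs": [],
}

print(sorted(affected_packages({"ui"}, GRAPH)))
```

A change to "ui" selects only "ui" and "web", while a change to "utils" would pull in everything except "docs". The CI job then just runs the test command for whatever comes back.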
Just have GitHub Actions run a monorepo tool like turborepo. You're just trying to do too much in a yaml file... The solution for all build pipeline tools is always to do most of your build logic in a bash-script/makefile/monorepo-tool.
I use Github Actions as a fertile testing playground to work out how to do things locally.
For example, if you've ever had to wade into the codesigning/notarization quagmire, observing the methods projects use with Github Actions to do it, can teach you a lot about how to do things, locally.
Many if not all of the mentioned issues derive from the fact that nowadays pipelines are most of the time YAML based, which is a terrible choice for programming. You might want to take a look at Sparky, which is a 100% Raku CI/CD system that does not have many of the mentioned pitfalls and is super flexible …
In my newest hobby project, I decided to bite the bullet and use the flake.nix as single source of truth. And it's surprisingly fast! I used cargo-crane to cache rust deps. This also works locally just running "nix flake check". Much better than dealing with github actions, caches, and whatnot.
Apart from the nix gh action that just runs "nix flake check", the only other actions are for making a github release on certain tags, and uploading release artifacts - which is something that should be built-in IMO.
I once used Team City and Octopus Deploy in a company. And ever since then, dealing with Gitlab Pipelines and Github Actions, I find them so much poorer as a toolkit.
We are very much in the part of the platform cycle where best-in-breed is losing out to all-in-one. Hopefully we see things swing in the other direction in the next few years where composable best-in-breed solutions recapture the hearts and minds of the community.
Every CI system has its flaws but GitHub Actions in my opinion is pretty nice especially in terms of productivity; easy to setup, tons of prebuild actions, lots of examples, etc.
I've used Tekton, Jenkins, Travis, Hudson, StarTeam, Rational Jazz, Continuum and a host of other CI systems over the years but GitHub Actions ain't bad.
Google made Release-Please to make monorepo development easier, there is a GitHub Action for it in the marketplace. Would probably make things a lot cleaner for this situation.
Blindly using automation, or implementing it without validation, will always bite a person in the butt. Been there, done that. Automation is good, but it should always be event driven with a point of user validation.
Act has limitations because GitHub Actions run via virtualization, while Act runs via containerization. This means that actions behave differently across the two platforms.
My recent experience with Github Actions is that it will randomly fail running a pipeline that hasn't changed with an incomprehensible error message. I re-run the action a few hours later and it works perfectly.
GitHub Action has supported your own locally hosted runners for years, so I presume "there is no way of running GitHub Actions locally" is referring to something else.
GitHub cares. GitHub cares about active users on their platform. Whether it's managing PRs, doing code reviews, or checking the logs of another failed action.
They don’t care about things that I care about, including everything the author talked about, and also things like allowing whitespace-ignore on diffs to be set on by default in a repo or per user - an issue that’s been open for half a decade now.
(Whitespace is just noise in a typescript repo with automatic formatting)
GitHub often actively doesn't act in situations where acting would be prudent, which portrays from an outside perspective a disinterest in those who give their time to document shortcomings. Would you care to guess when the last time that the GitHub API was updated? It's probably much longer than you'd think (2+ years at this point).
Ideally we should handle it as any other code, that is: do tests, handle multiple environments including the local environment, lint/build time error detection etc
Yeah-yeah, but it's not like they allow you to run your build definitions locally, nor do they address some other concerns. With GHA you may use nix-quick-install in a declarative manner, nixify your builds and then easily run them locally and under GHA. In the case of jenkins/tc you would have to jump through many more hoops.
That GH Actions and Azure Pipelines both settled for this cursed Yaml is hard to understand. Just make a real programming language do it! And ffs make a local test env so I can run the thing.
> no way of running actions locally
My policy is to never let pipeline DSLs contain any actual logic outside orchestration for the task, relying solely on one-liner build or test commands. If the task is more complicated than a one-liner, make a script for it in the repo to make it a one-liner. Doesn't matter if it's GitHub Actions, Jenkins, Azure DevOps (which has super cursed yaml), etc.
This in turn means that you can do what the pipeline does with a one-liner too, whether manually, from a vscode launch command, a git hook, etc.
This same approach can fix the mess of path-specific validation too - write a regular script (shell, python, JS, whatever you fancy) that checks what has changed and calls the appropriate validation script. The GitHub action is only used to run the script on PR and to prepare the CI container for whatever the script needs, and the same pipeline will always run.
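A rough sketch of what such a dispatcher could look like, in Python. The path prefixes, the script names and the `origin/main` base are all placeholders I made up; the point is only that the "what changed, so what do we validate" decision lives in a regular script that runs the same way locally and in CI.

```python
import subprocess

# Hypothetical mapping from a path prefix to the validation command for it.
VALIDATORS = {
    "src/backend/": ["./scripts/validate-backend.sh"],
    "src/frontend/": ["./scripts/validate-frontend.sh"],
}


def changed_files(base="origin/main"):
    """List files changed on this branch relative to the merge base with base."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def select_validators(files, validators=VALIDATORS):
    """Pick every validation command whose prefix matches a changed file."""
    return [
        command
        for prefix, command in validators.items()
        if any(path.startswith(prefix) for path in files)
    ]


def run_validators(files):
    """Run the selected validators; a failing one raises CalledProcessError."""
    for command in select_validators(files):
        subprocess.run(command, check=True)


# Pure demonstration with a hard-coded file list, so nothing is executed here.
print(select_validators(["src/backend/api.py", "README.md"]))
```

In the pipeline the only step would be something like `run_validators(changed_files())`, and the same call works from a git hook or a terminal.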
The reason why many CI configs devolve into such a mess isn't typically that they don't extract complicated logic into scripts, it's about all the interactions with the CI system itself. This includes caching, sharing of artifacts, generating reports, configuring permissions, ordering of jobs, deciding when which jobs will run, deciding what to do when jobs fail, etc. All of this can get quite messy in a large enough project.
It never becomes unbearably messy this way though.
The reason it gets unbearably messy is because most people google "how to do x in github actions" (e.g. send a slack message) and there is a way and it's almost always worse than scripting it yourself.
The reason it gets unbearably messy is that GitHub has constructed an ecosystem that encourages developers to write Turing complete imperative behavior into YAML without providing the same language constructs/tooling that a proper adult language provides to encourage code reuse and debugging.
Without tooling like this any sufficiently complex system is guaranteed to evolve into a spaghetti mess, because no sane way exists to maintain such a system at scale without proper tooling, which one would need to hand roll themselves against a giant, ever changing mostly undocumented black box proprietary system (GitHub Actions). Someone tried to do this, the project is called “act”. The results are described by the author in the article as “subpar”.
The only sane way to use GitHub Actions at scale is to take the subset of its features that you leverage to perform the execution (event triggers, runs-on, etc) and only use those features, and farm out all the rest of the work in something that is actually maintainable eg Buildkit, Bazel, Gradle etc
Which I feel is a recurring lesson in CI in general. CI systems have to be scriptable to get their job done because they can't anticipate every build system. With even terrible scriptability comes Turing completeness, because it is hard to avoid, a natural law. Eventually someone figures out how to make it do wild things. Eventually those wild things become someone's requirements in their CI pipeline. Eventually someone is blog posting about how terrible that entire CI system is because of how baroque their pipeline has become and how many crazy scripts it has that are hard to test and harder to fix.
It's like the circle of life for CI systems.
It's not just Github, Gitlab uses the same mess of YAML-programming with custom extensions. So do, I believe, many other systems.
Ironically, Jenkins does this one correctly, they just give you a regular programming language. It's a shame there are so many other pitfalls, though.
Jenkins has almost the opposite problem: they give you the freedom to do everything, but point the gun right at your foot in the process of doing so
Caching and sharing artifacts is usually the main culprit. My company has been using https://nx.dev/ for that. It works locally as well and CI and it just works.
Our NX is pointed to store artifacts in GHA, but our GHA scripts don't do any caching directly, it is all handled by NX. It works so well I would even consider pulling a nodejs environment to run it in non-nodejs projects (although I haven't tried, probably would run into some problems).
It is somewhat heavy on configuration, but it just moves the complexity from CI configuration to NX configuration (which is nicer IMO). Our CI pipelines are super fast if you don't hit one of the slow-compiling parts of our codebase.
With NX your local dev environment can pull cached items that were built by previous CI runs or by other devs. We have some native C++ dependencies that are kind of a pain to build locally, and our dev machines can pull the binaries built by other devs (since all devs and CI share the same cache-artifacts storage). So it makes developing locally a lot easier as well. I don't even remember the last time I had to build the native C++ stuff myself, since I don't work on it.
Do you know the criteria used to pick the nx.dev? That is, do you pay for their Cloud, or do you do some plumbing yourselves to make it work on GitHub and other things?
Looks interesting. We’ve picked tools based on time saved without too much extra knowledge or overhead required, so this may prove promising.
To be honest, I wasn't the one who added it and have only occasionally made some small changes to the NX configuration. I don't think we pay for their Cloud; I think all our artifacts are stored in the GHA caching system and we pull them using our GitHub SSH keys, although I don't know exactly how that was set up. The fact that someone set it up and I just started using it and it just works is a testament to how well it works.
NX is good because it does the caching part of CI in a way that works both locally and on CI. But of course it doesn't really help at all with the other points raised by the OP.
One interesting thing about NX as well is that it helps with you managing your own local build chain, like in the example I mentioned above, when I run a project that requires the C++ native dependency, that project gets built automatically (or rather my computer pulls the built binaries from the remote cache).
But for all of this to work you need to set up these dependency chains explicitly in your NX configuration, but that is formalizing an actual requirement instead of leaving it implicit (or in Makefiles or in scripts that only run in CI).
I do have to say that our NX configuration is quite long though, but I feel that once you start using NX it is just too tempting to split your project up in individual cacheable steps even if said steps are very fast to run and produce no artifacts. Although you don't have to.
For example we have separate steps for linting, typescript type-checking, code formatting, unit testing for each unique project in our mono-repo. In practice they could be all the same step because they all get invalidated at the same time (basically on any file change).
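For anyone wondering what makes a step "cacheable" in the first place, the core idea is a key derived from the step's inputs. The following is a generic sketch of a content-addressed cache key, not NX's actual hashing scheme; the task name and file paths are invented.

```python
import hashlib


def cache_key(task_name, inputs):
    """Derive a cache key from a task name and its input files.

    inputs maps a relative path to that file's bytes. The same task with the
    same file contents always yields the same key, on a laptop or in CI.
    """
    digest = hashlib.sha256()
    digest.update(task_name.encode("utf-8"))
    # Sort paths so the key does not depend on dictionary ordering.
    for path in sorted(inputs):
        digest.update(b"\0" + path.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(inputs[path]).digest())
    return digest.hexdigest()


files = {"src/index.ts": b"export const x = 1;\n", "package.json": b"{}\n"}
print(cache_key("lint", files)[:12])
```

If lint, type-check and unit tests all take the whole source tree as input, their keys change together on any file edit, which is the "they all get invalidated at the same time" situation described above.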
You should generate your report with regular scripts. You need ci config to deploy them but that is the only part that should be different.
This doesn’t really work when you start sharding tests.
It works just fine - you have CI scripts for test-group-1, test-group-2, and so on. Your data collection will need to aggregate data from them all, but that is something most data collection systems have (and at least the ones I know of also allow individuals to upload their local test data, meaning you can test that upload locally). If you break those test groups up right, most developers will know which they should run as a result of their changes (if your tests are not so long that developers wouldn't run them all before pushing, then you shouldn't shard anyway, though it may be reasonable to say your CI shards are still longer than what a developer would run locally).
I have had many tests which manipulate the global environment (integration tests should do this, though I'm not convinced the distinction between integration and unit tests is valuable), so the ability to run the same tests as CI is very helpful in finding these and verifying you fixed them.
I’ll go so far as to say the massive add on/plugin list and featuritis of CI/CD tools is actively harmful to the sanity of your team.
The only functionality a CI tool should be providing is:
- starting and running an environment to build shit in
- accurately tracking success or failure
- accurate association of builds with artifacts
- telemetry (either their own or integration) and audit trails
- correlation with project planning software
- scheduled builds
- build chaining
That’s a lot, but it’s a lot less than any CI tool made in the last 15 years does, and that’s enough.
There’s a big difference for instance between having a tool that understands Maven information enough to present a build summary, and one with a Maven fetch/push task. The latter is a black box you can’t test locally, and your lead devs can’t either, so when it breaks, it triggers helplessness.
If the only answer to a build failure is to stare at config and wait for enlightenment, you fucked up.
100%. The ci/cd job should be nothing more than a wrapper around the actual logic which is code in your repo.
I write a script called `deploy.sh` which is my wrapper for my ci/cd jobs. It takes options and uses those options to find the piece of code to run.
The ci/cd job can be parameterized or matrixed. The eventually-run individual jobs have arguments, and those are passed to deploy.sh. Secrets/environment variables are set from the ci/cd system, also parameterized/matrixed (or alternately, a self-hosted runner can provide deploy.sh access to a vault).
End result: from my laptop I can run `deploy.sh deploy --env test --modules webserver` to deploy the webserver to test, and the CI/CD job also runs the same job the same way. The only thing I maintain that's CI/CD-specific is the GitHub Action-specific logic of how to get ready to run `deploy.sh`, which I write once and never change. Thus I could use 20 different CI/CD systems, but never have to refactor my actual deployment code, which also always works on my laptop. Vendor lock-in is impossible, thanks to a little abstraction.
(If you have ever worked with a team with 1,000 Jenkins jobs and the team has basically decided they can never move off of Jenkins because it would take too much work to rewrite all the jobs, you'll understand why I do it this way)
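The commenter's `deploy.sh` is a shell script and its contents aren't shown, so the following is only a Python sketch of the same shape: one entry point that takes options and maps them to the code to run. The `modules/<name>/deploy.sh` layout and the list of environments are assumptions for illustration.

```python
import argparse


def build_parser():
    """Command line mirroring `deploy deploy --env test --modules webserver`."""
    parser = argparse.ArgumentParser(prog="deploy")
    sub = parser.add_subparsers(dest="command", required=True)
    deploy = sub.add_parser("deploy", help="deploy modules to an environment")
    # The environment names here are hypothetical.
    deploy.add_argument("--env", required=True, choices=["test", "staging", "prod"])
    deploy.add_argument("--modules", nargs="+", required=True)
    return parser


def plan(args):
    """Map parsed options to per-module commands (hypothetical repo layout)."""
    return [[f"./modules/{module}/deploy.sh", args.env] for module in args.modules]


def main(argv):
    args = build_parser().parse_args(argv)
    for command in plan(args):
        # A real wrapper would execute the command; this sketch only prints it.
        print("would run:", " ".join(command))
    return 0


main(["deploy", "--env", "test", "--modules", "webserver"])
```

The CI job passes its matrix values as the same arguments, so the only CI-specific part left is installing whatever the entry point needs.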
Hey if you’ve never heard of it consider using just[0], it’s a better makefile and supports shell scripting explicitly (so at least equivalent in power, though so is Make)
[0]: https://github.com/casey/just
The shell also supports shell scripting! You don't need Just or Make
Especially for Github Actions, which is stateless. If you want to reuse computation within their VMs (i.e. not do a fresh build / test / whatever), you can't rely on Just or Make
A problem with Make is that it literally shells out, and the syntax collides. For example, the PID in Make is $$$$, because it's $$ in shell, and then you have to escape $ as $$ with Make.
I believe Just has similar syntax collisions. It's fine for simple things, but when it gets complex, now you have {{ just vars }} as well as $shell_vars.
It's simpler to "just" use shell vars, and to "just" use shell.
Shell already has a lot of footguns, and both Just and Make only add to that, because they add their own syntax on top, while also depending on shell.
Thank you, I have seen it, but I prefer Make.
I bet all your targets are .PHONY?
But I can install it on any Linux system from the base repository
If all your targets are .PHONY, you might as well just use bash (or your favourite shell) directly.
Make targets look suspiciously like functions (or procedures), but they actually aren't.
I don't typically use .PHONY as my targets aren't the same name as files and performance isn't an issue.
Here is an example of a "complex" Makefile I use to help manage Helm deployments (https://github.com/peterwwillis/devops-infrastructure/blob/m...). It uses canned recipes, functions (for loops), default targets, it includes targets and variables from other Makefiles, conditionally crafts argument lists, and more. (It's intended to be a "default" Makefile that is overridden by an additional Makefile.inc file)
I could absolutely rewrite that in a shell script, but I would need to add a ton of additional code to match the existing functionality. Lines of code (and complexity) correlate with bugs, so fewer lines of code = fewer bugs, which makes it easier to maintain, even considering the Make-specific knowledge required.
They say "use the best tool for the job". As far as I've found, for a job like that, Make fits the best. If some day somebody completely re-writes all the functionality of Make in a less-obnoxious way, I'll use that.
> I don't typically use .PHONY as my targets aren't the same name as files and performance isn't an issue.
They are still phony at heart in this case, even if you don't declare them .PHONY.
Make really wants to produce files; if your targets don't produce the files they are named for you are going to run into trouble (or have to be rather careful to avoid the sharp edges).
> They say "use the best tool for the job". As far as I've found, for a job like that, Make fits the best. If some day somebody completely re-writes all the functionality of Make in a less-obnoxious way, I'll use that.
You could try eg Shake (https://shakebuild.com/), but it requires some Haskell.
One of the benefits of just -- it isn't a build system, it's a command runner.
Good facilities for arguments, builtins like absolute locations -- there are a ton of benefits (see the README).
I discovered Just with a similar comment in Hacker News and I want to add my +1.
It is so much better to run scripts with Just than it is doing it with Make. And although I frankly tend to prefer using a bash script directly (much as described by the parent commenter), Just is much less terrible than Make.
Now the only problem is convincing teams to stop following the Make dogma, because it is so massively ingrained and it has so many problems and weirdnesses that just don't add anything if you just want a command executor.
The PHONY stuff, the variable escaping, the every-line-is-a-separate-shell, and just a lot of stuff that doesn't help at all.
Make has a lot of features that you don't use at first, but do end up using eventually, that Just does not support (because it's trying to be simpler). If you learn it formally (go through the whole HTML manual) it's not hard to use, and you can always refer back to the manual for a forgotten detail.
I don't understand why this is not the evident approach for everyone writing GitHub Actions/GitLab CI/CD yaml etc....
I've struggled in some teams to explain why it's better to extract your commands into scripts (you can run ShellCheck on them, scripts are simple to run locally, etc.) instead of writing a Frankenstein of YAML and shell commands. I hope someday to find authoritative guidelines on writing pipelines that promote this approach, so at least I can point to a link instead of defending myself against being called a dinosaur!
In a previous job we had a team tasked with designing these "modern" CI/CD pipeline solutions, mostly meant for Kubernetes, but it was supposed to work for everything. They had such a hard on for tools that would run each step as a separate isolated task and did not want pipelines to "devolve" into shell scripts.
Getting anything done in such environments is just a pain. You spend more time fighting the systems than you do actually solving problems. It is my opinion that a CI/CD system needs just the following features: triggers (source code repo, http endpoints or manually triggered), secret management and shell script execution. That's it, you can build anything using that.
I think what they really wanted was something like bazel. The only real benefit I can think right now for not "devolving" into shell scripts is distributed caching with hermetic builds. It has very real benefits but it also requires real effort to work correctly.
I just joined as the enterprise architect for a company that has never had one. There is an existing devops team that is making everyone pull their hair out, and I haven't had a single spare minute to dig in on their mess, but this sounds eerily familiar.
Is this really the job of an enterprise architect? To dig into the details of a devops team's mess?
The job of senior people should mostly be to make sure the organisation runs smoothly.
If no one else is doing anything about the mess, then it falls to the senior person to sort it out.
As a rule of thumb:
- Ideally your people do the Right Thing by themselves by the magic of 'leadership'.
- Second best: you chase the people to do the Right Thing.
- Third: you as the senior person do the Right Thing.
- Least ideal: no one fixes the mess nor implements the proper way.
I guess some people can achieve the ideal outcome with pure charisma (or fear?) alone, but I find that occasionally getting your hands dirty (option 3) helps earn the respect to make the 'leadership' work. It can also help ground you in the reality of the day to day work.
However, you are right that a senior person shouldn't get bogged down with such work. You need to cut your losses at some point.
Where I work, which granted is a very large company, the enterprise architects focus on ERP processes, logistic flows, how prices flow from the system where they are managed to the places that need them, and so on. They are several levels removed from devops teams. DevOps concerns are maybe handled by tech leads, system architects or technical product managers.
Makes sense. From datavirtue's comment it sounded like they joined a much smaller outfit without much in terms of established _working_ procedures here.
It's more about giving the team and the overall strategy a thumbs up or down, so yes.
Mostly agreed, but (maybe orthogonal) IME, popular CI/CD vendors like TeamCity* can make even basic things like shell script execution problematic.
* TC offers sh, full stop. If you want to script something that depends on bash, it's a PITA and you end up with a kludge to run bash in sh in docker in docker.
> If you need a specific interpreter to be used, specify shebang (for example, #!/bin/bash) as the first line of the script.
https://www.jetbrains.com/help/teamcity/command-line.html#:~...
Your "docker in docker" comment makes me wonder if you're conflating the image that you are using, that just happens to be run by TeamCity, versus some inherent limitation of TC. I believe a boatload of the Hashicorp images ship with dash, or busybox or worse, and practically anything named "slim" or "alpine" similarly
My "favorite" is when I see people go all in, writing thousands of lines of Jenkins-flavor Groovy that parses JSON build specifications of arbitrary complexity to sort out how to build that particular project.
"But then we can reuse the same pipeline for all our projects!"
I think that is pitfall of software devs.
For me it was an epiphany as software dev - not to write reusable extensible scripts - I am so much more productive after that.
> "But then we can reuse the same pipeline for all our projects!"
oh god just reading that gave me PTSD flash backs.
At $priorGig there was the "omni-chart". It was a helm chart that was so complex it needed to be wrapped in terraform and used composable terraform modules w/ user var overrides as needed.
Debugging anything about it meant clearing your calendar for the day and probably the following day, too.
I can rarely reuse the same pipeline for the same project 6 months down the road, much less reuse for anything else.
The few bits that end up getting reused are the externalized bash scripts.
I think I can summarize it in a rough, general way.
To make the thing actually fast at scale, a lot of the logic ends up being specific to the provider; requiring tokens, artifacts etc that aren't available locally. You end up with something that tries to detect if you're running locally or in CI, and then you end up in exactly the same situation.
You are right, and this is where a little bit of engineering comes in. Push as much of the logic to scripts (either shell or python or whatever) that you can run locally. Perhaps in docker, whatever. All the token, variables, artifacts etc should act as inputs or parameters to your scripts. You have several mechanisms at your disposal, command line arguments, environment variables, config files, etc. Those are all well understood, universal, language and environment agnostic, to an extent.
The trick is to NOT have your script depend on the specifics of the environment, but reverse the dependency. So replace all `If CI then Run X else if Local Run Y` with the ability to configure the script to run X or Y, then let the CI configure X and local configure Y. For example.
I'm not saying it is always easy and obvious. For bigger builds, you often really want caching and have shitloads of secrets and configurations going on. You want to only build what is needed, so you need something like a DAG. It can get complex fast. The trick is making it only as complex as it needs be, and only as reusable as and when it is actually re-used.
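A small sketch of the "reverse the dependency" point, under my own assumptions: the `BUILD_*` variable names and the step descriptions are invented. The script never asks "am I in CI?"; it only reads explicit settings, and each caller decides what to set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildConfig:
    artifact_dir: str
    cache_url: str  # empty string means "no remote cache"
    notify: bool


def config_from_env(env):
    """Build the config from a mapping such as os.environ.

    The BUILD_* variable names are hypothetical.
    """
    return BuildConfig(
        artifact_dir=env.get("BUILD_ARTIFACT_DIR", "./dist"),
        cache_url=env.get("BUILD_CACHE_URL", ""),
        notify=env.get("BUILD_NOTIFY", "0") == "1",
    )


def build_steps(config):
    """Decide what to do purely from the config, never from 'is this CI?'."""
    steps = []
    if config.cache_url:
        steps.append(f"restore cache from {config.cache_url}")
    steps.append(f"build into {config.artifact_dir}")
    if config.notify:
        steps.append("send notification")
    return steps


local = config_from_env({})
ci = config_from_env({"BUILD_CACHE_URL": "https://cache.example.invalid", "BUILD_NOTIFY": "1"})
print(build_steps(local))
print(build_steps(ci))
```

The CI config sets the variables it cares about, and anyone can reproduce the CI behaviour locally by exporting the same ones.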
> The trick is to NOT have your script depend on the specifics of the environment, but reverse the dependency. So replace all `If CI then Run X else if Local Run Y` with the ability to configure the script to run X or Y, then let the CI configure X and local configure Y. For example.
> I'm not saying it is always easy and obvious. For bigger builds, you often really want caching and have shitloads of secrets and configurations going on.
Here's the thing. When you don't want secrets, or caching, or messaging, or conditional deploys, none of this matters. Your build can be make/mvn/go/cargo and it just works, and is easy. It only gets messy when you want to do "detect changes since last run and run tests on those components", or "don't build the moon and the stars, pull that dependency in CI, as local users have it built already and it won't change." And the way to handle those situations involves running different targets/scripts/whatever and not what is actually in your CI environment.
I've lost count of how many deploys have been marked as failed in my career because the shell script for posting updates to slack has an error, and that's not used in the local runs of CI.
What you _actually_ need is a staging branch of your CI and a dry-run flag on all of your state modifying commands. Then, none of this matters.
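The dry-run half of that is cheap to sketch. Assume every state-modifying command goes through one helper (the name `run_mutating` is mine), and that helper either executes the command or only logs it:

```python
import shlex
import subprocess


def run_mutating(command, dry_run, log=print):
    """Run a state-changing command, or only log it when dry_run is set.

    Returns True if the command was executed, False if it was skipped.
    """
    if dry_run:
        # shlex.join quotes arguments so the logged line can be pasted into a shell.
        log("[dry-run] " + shlex.join(command))
        return False
    subprocess.run(command, check=True)
    return True


run_mutating(["kubectl", "apply", "-f", "my manifest.yaml"], dry_run=True)
```

With that in place the Slack-notification script and friends can be exercised on a staging branch with the flag on, instead of being discovered broken during a real deploy.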
A shell script has many extremely sharp edges like dealing with stdin, stderr, stdout, subprocesses, exit codes, environmental variables, etc.
Most programmers have never written a shell script and writing CI files is already frustrating because sometimes you have to deploy, run, fix, deploy, run, fix, which means nobody is going to stop in the middle of that and try to learn shell scripting.
Instead, they copy commands from their terminal into the file and the CI runner takes care of all the rough edges.
I ALWAYS advise writing a shell script but I know it's because I actually know how to write them. But I guess that's why some people are paid more big bux.
GitHub's CI yaml also accepts eg Python. (Or anything else, actually.)
That's generally a bit less convenient, ie it takes a few more lines, but it has significantly fewer sharp edges than your typical shell script. And more people have written Python scripts, I guess?
I find that Python scripts that deal with calling other programs have even more sharp edges because now you have to deal with stdin, stderr and stdout much more explicitly. Now you need to both know shell scripting AND Python.
Python’s subprocess has communicate(), check_output() and other helpers which takes care of a lot but (a) you need to know what method you should actually call and (b) if you need to do something outside of that, you have to use Popen directly and it’s much more work than just writing a shell script. All doable if you understand pipes and all that but if you don’t, you’ll be just throwing stuff at the wall until something sticks.
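For reference, a minimal sketch of the two cases mentioned: the helper path (here `subprocess.run`, which covers what `check_output()` does) and the manual `Popen` plus `communicate()` path. The helper function names are mine, and the child processes are just the Python interpreter so the example doesn't depend on any particular shell tool.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run a command, raise on non-zero exit, return (stdout, stderr) as text."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout, result.stderr


def pipe_through(command, data):
    """Feed data to the command's stdin and return its stdout."""
    with subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as proc:
        # communicate() writes stdin, reads stdout and waits, avoiding deadlocks.
        out, _ = proc.communicate(data)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command, output=out)
    return out


upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(pipe_through(upper, "hello"))
print(run_and_capture([sys.executable, "-c", "print('ok')"]))
```

Anything beyond this (streaming output while the process runs, chaining several processes) is where it starts to be more work than the equivalent shell pipeline.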
1) If possible, don't run shell scripts from Python. Evaluate why you are trying to do that, and don't.
2) Python has a bunch of infrastructure that shell scripts don't; use it.
3) Apply the same approach you used for the script to what it calls: CI calls the control script for the job, and the script calls tools/libraries for the heavy lifting.
Often the shell script just calls a python/Ruby/rust exec anyways...
Shell scripts are for scripting...the shell. Nothing else.
Sure but I already know all this at a level deeper than most people.
Your average person is going to be blindsided.
Your average person will be blindsided either way; at least one way they have a much better set of tools to help them out once they are blindsided.
Yes, but at least that's all fairly obvious---you might not know how to solve the problem, but at least you know you have a problem that needs solving. Compare that to the hidden pitfalls of eg dealing with whitespace in filenames in shell scripts. Or misspelled variable names that accidentally refer to non-existent variables but get treated as if they are set to be "".
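To illustrate the two pitfalls named here from the Python side: in shell, an unquoted `$file` containing a space is split into two words, and an unset variable expands to an empty string unless `set -u` is in effect. A sketch of the loud equivalents, with helper names of my own invention:

```python
import os
import subprocess
import sys


def child_argument_count(arguments):
    """Ask a child Python process how many arguments it received."""
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)", *arguments],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return int(out)


def required_env(name, env=os.environ):
    """Return a variable's value, failing loudly instead of defaulting to ''."""
    try:
        return env[name]
    except KeyError:
        raise SystemExit(f"missing required environment variable: {name}") from None


# An argument list keeps a filename with spaces as one argument.
print(child_argument_count(["my report.txt"]))
# Splitting on whitespace, as unquoted shell expansion does, yields two.
print(child_argument_count("my report.txt".split()))
```

A misspelled variable name then fails at the lookup instead of quietly turning a path like `"$PREFIX/data"` into `/data`.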
This all reminds me of the systemd ini-like syntax vs shell scripts debate. Shell scripts are superior, of course, but they do require deeper knowledge of unix-like systems.
yeah if you author CI jobs, you should know linux, otherwise a person should not even touch the CI system with a 10ft pole
I've been working with Linux since I was 10 (I'm much older now), and I still don't think I "know Linux". The upper bound on understanding it is incredibly high. Where do you draw the line?
just basics enough to understand typical bash commands, scripts, how env variables work, etc
> [...] instead of writing a Frankenstein of YAML and shell commands.
The 'Frankenstein' approach isn't what makes it necessarily worse. Eg Makefiles work like that, too, and while I have my reservations about Make, it's not really because they embed shell scripts.
it can be quite hard to write proper scripts that work consistently... different shells have different behaviours, availability of local tools, paths, etc
and it feels like fighting against the flow when you're trying to make it reusable across many repos
Containerize the build environment so everything is captured (dependencies, build tools, etc)
If you don't value your own and your developers' time, certainly, containerize everything.
I've rarely seen a feedback loop with containers that's not longer than 10s only due to containerization itself, and that breaks the "golden" 10s rule (see https://www.nngroup.com/articles/response-times-3-important-...).
If you aim for quicker turn-around (eg. just running a single test in <1s), you'll have to either aggressively optimize containers (which is pretty non-idiomatic with Docker containers in particular), or do away with them.
> I've rarely seen a feedback loop with containers that's not longer than 10s only due to containerization itself
Sounds like a skill issue tbh.
`time podman run --rm -it fedora:latest echo hello` will return in a few milliseconds, whatever delay you are complaining about would be from the application running in the container.
Lastly containers != docker.
Are you talking about CI or local development? Why would you run a single test in CI? And why would a container add 10+ seconds to a local task?
I am talking about either, because the GP post was about "containerizing a build environment": you need your project built to either run it in CI or locally.
Why would it be slow?
It needs to be rebuilt? (on a fast moving project with mid-sized or large team, you'll get dependency or Dockerfile changes frequently)
It needs to restart a bunch of dependent services?
Container itself is slow to initialize?
Caching of Docker layers is tricky, silly (you re-arrange a single command line and poof, it's invalidated, including all the layers after) and hard to make the most of.
If you can't get a single test running in <1s, you are never going to get a full test suite running in a couple of seconds, and never be able to do an urgent deploy in <30s.
If you're not containerizing your CI/CD, you're really lost.
That might be the case if Docker did in fact guarantee (or at least make it easy to guarantee) deterministic builds -- but it doesn't really even try:
1. Image tags ("latest", etc.) can change over time. Does any layer in your Dockerfile -- including inside transitive deps -- build on an existing layer identified by tag? If so, you never had reproducibility.
2. Plenty of Dockerfiles include things like "apt-get some-tool" or its moral equivalent, which will pull down whatever is the latest version of that tool.
It's currently common and considered normal to use these "features". Until that changes, Docker mostly adds only the impression of reproducibility, but genuine weight and pain.
The advantage that Docker brings isn't perfect guaranteed reusability, it's complete independence (or as close as you can easily get while not wasting resources on VMs) from the system on which the build is running, plus some reusability in practice in a certain period of time, and given other good practices.
Sure, if I try to rebuild a docker image from 5 years ago, it may fail, or produce something different, because it was pulling in some packages that have changed significantly in apt, or perhaps pip has changed encryption and no longer accepts TLS1.1 or whatever. And if you use ':latest' or if your team has a habit of reusing build numbers or custom tags, you may break the assumptions even sooner.
But even so, especially if using always incremented build numbers as image tags, a docker image will work the same way on the Jenkins system, the GitHub Actions pipeline, and every coworker's local build, for a while.
I agree that Docker doesn't do the best job here, you can still get a lot better reproducibility than without though.
You can use specific image hashes to work around image tag changes.
For problem 2 ideally use something like NixOS as a base or at least nix package manager.
But IMO Nix is quite complicated, even though the working model is really good. So this is mostly if you really need deterministic builds.
Docker already gets you 80% of the way with 20% of the effort.
1. Use a hash for the base images. 2. The meaning of “latest” is dependent on what base images you are using. Using UBI images for example means your versions are not going to change because redhat versions don’t really change.
But really containerizing the build environment is not related to deterministic builds as there’s a lot more work needed to guarantee that. Including possible changes in the application itself.
Lastly you don’t need to use docker or dockerfiles to build containers. You can use whatever tooling you want to create a rootfs and then create an image out of that. A nifty way of actually guaranteeing reproducibility is to use nix to build a container image.
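The "use a hash for the base images" advice can be checked mechanically. Below is a rough sketch, not a full Dockerfile parser: the function name `unpinned_base_images` is made up, and it assumes one FROM instruction per line with no line continuations. It flags every base image that is referenced by tag only rather than by an `@sha256:` digest.

```python
import re

# Matches "FROM [--platform=...] image ..." at the start of a line.
FROM_LINE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)


def unpinned_base_images(dockerfile_text):
    """Return the base images in FROM lines that are not pinned by digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        image = match.group(1)
        # "scratch" and earlier build stages are not pulled from a registry.
        if image != "scratch" and image.lower() not in stage_names:
            if "@sha256:" not in image:
                unpinned.append(image)
        # Remember the stage alias from "FROM image AS name", if present.
        parts = line.split()
        if len(parts) >= 4 and parts[-2].lower() == "as":
            stage_names.add(parts[-1].lower())
    return unpinned
```

A check like this only covers the base image; it says nothing about unpinned package installs inside the image, which is the second problem raised earlier in the thread.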
>But really containerizing the build environment is not related to deterministic builds
What would you say the goal of containerising builds is, if not reproducibility?
They didn't say reproducibility, they said determinism.
If you use a container with the same apt-get commands and the same OS, you already separate out almost all the reproducibility issues. What you get will be a very similar environment.
But it's not deterministic as you point out, that takes a lot more effort.
How do I containerize building desktop apps for windows with MSVC?
Wine?
Less snarky: https://learn.microsoft.com/en-us/virtualization/windowscont... seems to be a thing?
And in any case, you can use VMs instead of containers.
You know, I'd love to run MSVC in Wine on a ubuntu container. I bet it would be quicker.
I've had the unfortunate pleasure of working with Windows Containers in the past. They are Containers in the sense that you can write a dockerfile for them, and they're somewhat isolated, but they're no better than a VM in my experience.
> And in any case, you can use VMs instead of containers.
They're not the same thing. If that's the case running linux AMI's on EC2 should be the same as containers.
It’s the same thing for the purposes of capturing the build environment.
It doesn’t really matter if you have to spin up a Linux instance and then run your build environment as a container there vs spinning up a windows VM.
Only if your tech stack is bad (i.e. Python). My maven builds work anywhere with an even vaguely recent maven and JVM (and will fail-fast with a clear and simple error if you try to run them in something too old), no need to put an extra layer of wrapping around that.
It's trivial to control your Python stack with things like virtualenv (goes back to at least 2007) and has been for ages now (I don't really remember the time when it wasn't, and I've been using Python for 20+ years).
What in particular did you find "bad" with Python tech stack?
(I've got my gripes with Python and the tooling, but it's not this — I've got bigger gripes with containers ;-))
> What in particular did you find "bad" with Python tech stack?
Stateful virtualenvs with no way to check if they're clean (or undo mistakes), no locking of version resolution (much less deterministic resolution), only one-way pip freeze that only works for leaf projects (and poorly even then), no consistency/standards about how the project management works or even basic things like the directory layout, no structured unit tests, no way to manage any of this stuff because all the python tooling is written in python so it needs a python environment to run so even if you try to isolate pieces you always have bootstrap problems... and most frustrating of all, a community that's ok with all this and tries to gaslight you that the problems aren't actually problems.
Sounds a lot like nitpicking, and I'll demonstrate why.
With docker containers, you can shell into it, do a couple of changes and "docker commit" it afterwards: similarly stateful, right? You resolve both by recreating them from scratch (and you could easily chmod -w the entire virtualenv directory if you don't want it to change accidentally).
The pattern of using requirements.txt.in and pip-freeze generated requirements.txt has been around for a looong time, so it sounds like non-idiomatic way to use pip if you've got problems with locking of versions or non-leaf projects.
As for directory layout, it's pretty clear it's guided by Python import rules: those are tricky, but once you figure them out, you know what you can and should do.
Can you clarify what do you mean with "structured unit tests"? Python does not really limit you in how you organize them, so I am really curious.
Sure, a bootstrapping problem does exist, but rarely do you need exactly a particular version of Python and any of the dev tools to be able to get a virtualenv off the ground, after which you can easily control all the deps in them (again a requirements-dev.txt.in + requirements-dev.txt pattern will help you).
And there's a bunch of new dev tools springing up recently that are written in Rust for Python, so even that points at a community that constantly works to improve the situation.
I am sorry that you see this as "gaslighting" instead of an opportunity to learn why someone did not have the same negative experience.
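For the locking pattern mentioned above (a loose input file plus a frozen requirements.txt), one cheap guard is to verify that the frozen file really is fully pinned. This is a simplified sketch with a hypothetical function name; it treats anything after `#` as a comment and skips option lines such as `-r` or `--hash`, which is cruder than pip's real parsing.

```python
def unpinned_requirements(lock_text):
    """Return requirement lines that are not pinned with an exact `==` version."""
    loose = []
    for raw in lock_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: strip comments
        if not line or line.startswith("-"):
            continue  # blank lines and pip options are not requirements
        if "==" not in line:
            loose.append(line)
    return loose
```

Running this against the frozen file in CI would fail the build as soon as someone hand-edits it with a loose specifier.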
> With docker containers, you can shell into it, do a couple of changes and "docker commit" it afterwards: similarly stateful, right?
I guess theoretically you could, but I don't think that's part of anyone's normal workflow. Whereas it's extremely easy to run "pip install" from project A's directory with project B's virtualenv active (or vice versa). You might not even notice you've done it.
> You resolve both by recreating them from scratch
But with Docker you can wipe the container and start again from the image, which is fixed. You don't have to re-run the Dockerfile and potentially end up with different versions of everything, which is what you have to do with virtualenv (you run pip install and get something completely different from the virtualenv you deleted).
> you could easily chmod -w the entire virtualenv directory if you don't want it to change accidentally
But you have to undo it every time you want to add or update a dependency. In other ecosystems it's easy to keep my dependencies in line with what's in the equivalent of requirements.txt, but hard to install some random other unmanaged dependency. In the best ecosystems there's no need to "install" your dependencies at all, you just always have exactly the packages listed in the requirements.txt equivalent available at runtime when you run things.
> The pattern of using requirements.txt.in and pip-freeze generated requirements.txt has been around for a looong time, so it sounds like non-idiomatic way to use pip if you've got problems with locking of versions or non-leaf projects.
I've literally never seen a project that does that. And even if you do that, it's still harder to work with because you can't upgrade one dependency without unlocking all of your dependencies, right?
> As for directory layout, it's pretty clear it's guided by Python import rules
I don't mean within my actual code, I mean like: where does source code go, where does test code go, where do non-code assets go.
> Can you clarify what do you mean with "structured unit tests"?
I mean, like, if I'm looking at a particular module in the source code, where do I go to find the tests for that module? Where's the test-support code as distinct from the specific tests?
> rarely do you need exactly a particular version of Python and any of the dev tools to be able to get a virtualenv off the ground
Whether virtualenv is available is a relatively recent change, so you already have a fractal problem. Having an uncontrolled way of installing your build environment is another of those things that's fine until it isn't.
> And there's a bunch of new dev tools springing up recently that are written in Rust for Python
Yeah, that's the one thing that gives me some hope that there might be light at the end of the tunnel, since I hear they mostly ignore all this idiocy (and avoid e.g. having user-facing virtualenvs at all) and just do the right thing. Hopefully once they catch on we'll see Python start to be ok without containers too and maybe the container hype will die down. But it's certainly not the case that everything has been fine since 2007; quite the opposite.
except you need to install the correct Java version and maven (we really should be using gradle by now)
Also in many projects there’s things other than code that need to be “built” (assets, textures, translations, etc). Adding custom build targets to maven’s build.xml is truly not ideal then there’s people who actually try to write logic in there. That’s objectively worse than the YAML hell we were complaining about at the top of the thread.
> you need to install the correct Java version and maven
Like I said, any version from the last, like, 10+ years (Java and Maven are both serious about backward compatibility), and if you install an ancient version you at least get fail-fast with a reasonable error.
> we really should be using gradle by now
We really shouldn't.
> Adding custom build targets to maven’s build.xml is truly not ideal then there’s people who actually try to write logic in there.
Maven doesn't have a build.xml, are you thinking of ant? With maven you write your custom build steps as build plugins, and they're written in plain old Java (or Kotlin, or Scala, or...) code, as plain old Maven modules, with the same kind of ordinary unit testing as your regular code; all your usual code standards apply (e.g. if you want to check your test coverage, you do it the same way as for your regular code - indeed, probably the same configuration you set up for your regular code is already getting applied). That's a lot better than YAML.
I'm not sold on using containers for macOS desktop apps...
Pick a single shell and treat it like a programming language.
Or write your stuff in eg Python in the first place. GitHub's CI yaml supports scripts in arbitrary languages, not just shell.
The reason for this is that nobody took the time to write a proper background document on Github Actions. The kind of information that you or I might convey if asked to explain it at the whiteboard to junior hires, or senior management.
This syndrome is very common these days. Things are explained differentially: it's like Circle CI but in the GH repo. Well that's no use if the audience wasn't around when Circle CI was first new and readily explained (It's like Jenkins but in the cloud...).
> My policy is to never let pipeline DSLs contain any actual logic outside orchestration for the task,
I call this “isomorphic CI” — ie: as long as you set the correct env vars, it should run identically on GitHub actions, Jenkins, your local machine, a VM etc
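A minimal sketch of what such an entry point might look like, assuming the build is configured purely through environment variables. The variable names `BUILD_TARGET` and `RUN_TESTS` and the placeholder commands are invented for illustration; a real project would call its own build and test tools here.

```python
import os
import subprocess
import sys


def load_config(env):
    """Read everything the build needs from environment variables, with local defaults."""
    return {
        "target": env.get("BUILD_TARGET", "debug"),
        "run_tests": env.get("RUN_TESTS", "1") == "1",
    }


def plan(config):
    """Return the commands to run as argv lists; nothing here knows about any CI vendor."""
    steps = [
        [sys.executable, "-c", "import sys; print('building', sys.argv[1])", config["target"]],
    ]
    if config["run_tests"]:
        steps.append([sys.executable, "-c", "print('running tests')"])
    return steps


def main(env=None):
    config = load_config(os.environ if env is None else env)
    for argv in plan(config):
        subprocess.run(argv, check=True)  # stop at the first failing step
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The CI config then shrinks to "set the variables, run the script", and the same invocation works on a laptop or in a VM.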
This is the only DevOps way. Abstract the build into a single step.
And yet, you would be surprised at the number of people who react like that's an ignorant statement ("not feasible in real world conditions"), a utopian goal ("too much time to implement"), an impossible feat ("automation makes human oversight difficult"), or, my favorite, the "this is beneath us" excuse ("see, we are special and this wouldn't work here").
Automation renders knowledge into a set of executable steps, which is much better than rendering knowledge into documentation, or leaving it to rot in people's minds. Compiling all rendered knowledge into a single step is the easiest way to ensure all elements around the build and deployment lifecycle work in unison and are guarded around failures.
Building a Github-specific CI pipeline similarly transfers that knowledge into a set of executable steps.
The only difference is that you are now tied to a vendor for executing that logic, and the issue is really that this tooling is proprietary software (otherwise, you could just take their theoretical open source runner system and run it locally).
To me, this is mostly a question of using non-open-source development tools or not.
Yep. I remember at a previous company multiple teams had manually created steps in TeamCity (and it wasn't even being backed up in .xml files).
I just did my own thing and wrapped everything in deploy.sh and test.sh and when the shift to another system came... well it was still kind of annoying, but at least I wasn't recreating the whole thing.
i like this term
That’s usually very hard or impossible for many things. The AzDo yaml consists of a lot of steps that are specific to the CI environment (fetching secrets, running tests on multiple nodes, storing artifacts of various kinds).
Even if the ”meat” of the script is a single build.ps oneliner, I quickly end up with 200 line yaml scripts which have no chance of working locally.
Azure DevOps specifically has a very broken approach to YAML pipelines, because they effectively took their old graphical pipeline builder and just made a YAML representation of it.
The trick to working with this is that you don't need any of their custom Azure DevOps task types, and can use the shell type (which has a convenient shorthand) just as well as in any other CI environment. Even the installer tasks are redundant - in other CI systems, you either use a container image with what you need, or install stuff at the start, and Azure DevOps works with both of these strategies.
So no, it's neither hard nor impossible, but Microsoft's half-assed approach to maintaining Azure DevOps and overall overcomplicated legacy design makes it a bit hard to realize that doing what their documentation suggests is a bad idea, and that you can use it in a modern way just fine. At least their docs do not recommend that you use the dedicated NPM-type task for `npm install` anymore...
(I could rant for ages about Azure DevOps and how broken and unloved it is from Microsoft's side. From what I can tell, they're just putting in the minimum effort to keep old Enterprise customers that have been there through every rename since Team Foundation Server from jumping ship - maybe just until Github's enterprise side has matured enough? Azure DevOps doesn't even integrate well with Azure, despite its name!)
It has been on life support for a long time AFAIK. I designed Visual Studio Online (the first launch of AzDO) - and every engineer, PM, and executive I worked with is either in leadership at GitHub or retired.
It feels clear from an outside perspective that all the work on AzDO Pipelines has shifted to focus on GitHub Actions and Actions is now like 3 or 4 versions ahead. Especially because the public Issue trackers for some of AzDO Pipelines "Roadmap" are still up (on GitHub, naturally) and haven't been updated since ~2020.
I wish Microsoft would just announce AzDO's time of death and save companies increasingly crazy AzDO blinders and/or weird mixes of GitHub and AzDO as GitHub-only is clearly the present/future.
Yeah feels like they should be able to converge actions and pipelines.
Keeping some separation between AzDo itself and GH also requires some balancing. But so far I’m pretty sure I could never sell our enterprise on a shift to GH. Simply not enough jira-esque features in GH with complex workflows, time reporting etc so I can’t see them doing the bigger GH/AzDo merger.
This month's rollout of sub-issues and issue types would be most of what my organization thinks it needs to shift to GH Issues, I believe, barring however long it would take to rewrite some sync up automation with ServiceNow based on those issue types. Of course it will take another 6 months to a year before those kinds of features make it to GitHub Enterprise, so it is still not happening any time soon. (Though that gets back to my "weird" mixes. I don't entirely know why my company is using AzDO SaaS for Issue Tracking but decided GHE over "normal" cloud GH for Repos. But that's not the weirdest mix I've seen.)
I definitely get the backwards compatibility thing and "don't break someone's automation", but at the same time, Microsoft could at least mark AzDO's official Roadmap as "Maintenance Only" and send the message that feels obvious as a user that GitHub is getting far more attention than AzDO can, but is hard to convince management and infosec that a move to GitHub is not just "the future" but "the present" (and also maybe "the past", now, given AzDO seems to have been frozen ~2020).
This doesn’t seem to address the parent comment’s point at all, which was about required non-shell configuration such as for secrets, build parallelism, etc.
I'm increasingly designing CI stuff around rake tasks. Then I run rake in the workflow.
But that caters only for each individual command... as you mention the orchestration is still coded in, and duplicated from what rake knows and would do.
So I'm currently trying stuff that has a pluggable output: one output (the default) is that it runs stuff, but with just a rake var, instead of generating then running commands it generates workflow content that ultimately gets merged in an ERB workflow template.
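The comment describes this with rake and an ERB template; the sketch below shows the same "pluggable output" idea in Python instead, so it is an analogy rather than the commenter's actual setup. The task list and the `--emit` flag are hypothetical. One backend runs the tasks, the other emits them as step entries with `name` and `run` keys that could be merged into a workflow template.

```python
import json
import shlex
import subprocess
import sys

# Single source of truth for what the project's tasks are (placeholder commands).
TASKS = [
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ("test", [sys.executable, "-c", "print('tests ok')"]),
]


def run_tasks(tasks):
    """Default backend: execute every task locally, stopping at the first failure."""
    for _name, argv in tasks:
        subprocess.run(argv, check=True)
    return len(tasks)


def emit_steps(tasks):
    """Alternative backend: describe the same tasks as workflow steps instead of running them."""
    return [{"name": name, "run": shlex.join(argv)} for name, argv in tasks]


def main(argv, tasks=TASKS):
    if "--emit" in argv:
        print(json.dumps({"steps": emit_steps(tasks)}, indent=2))
    else:
        run_tasks(tasks)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Because both backends read the same task list, the orchestration is no longer duplicated between the task runner and the workflow file.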
The model I like the most though is Nix-style distributed builds: it doesn't matter if you do `nix build foo#bar` (local) or `nix build -j0 foo#bar` (zero local jobs => use a remote builder†), the `foo#bar` "task" and its dependents gets "built" (a.k.a run).
† builders get picked matching target platform and label-like "features" constraints.
Ever since there has been gitlab-runner, I've wondered why the hell can't I just submit some job to a (list of) runner(s) - some of which could be local - without the whole push-to-repo+CI orchestrator? I mean I don't think it would be out of this world to write a CLI command that locally parses whatever-ci.yml, creates jobs out of it, and submit them to a local runner.
The actual subtle issue here is that sometimes you actually need CI features around caching and the like, so you are forced to engage with the format a bit.
You can, of course, chew it down to a bare minimum. But I really wish more CI systems would just show up with "you configure us with scripts" instead of the "declarative" nonsense.
CI that isn't running on your servers wants very deep understanding of how your process works so they can minimize their costs (this is true whether or not you pay for using CI)
Totally! It's a legitimate thing! I just wish that I had more tools for dynamically providing this information to CI so that it could work better but I could also write relatively general tooling with a general purpose language.
The ideal for me is (this is very silly and glib and a total category error) LSP but for CI. Tooling that is relatively normalized, letting me (for example) have a pytest plugin that "does sharding" cleanly across multiple CI operators.
There's some stuff and conventions already of course, but in particular caching and spinning up jobs dynamically are still not there.
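The sharding part of that wish can at least be sketched independently of any CI vendor. The following assumes each CI job is given a shard index and a shard count through two environment variables; the names `SHARD_INDEX` and `SHARD_TOTAL` are hypothetical, and this is a plain function rather than an actual pytest plugin. Hashing the test ID keeps the assignment stable across machines and runs.

```python
import hashlib


def shard_of(test_id, total):
    """Map a test ID to a shard number in [0, total) using a stable hash."""
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total


def select_tests(test_ids, index, total):
    """Return the tests that belong to shard `index` out of `total` shards."""
    if total < 1 or not 0 <= index < total:
        raise ValueError("index must satisfy 0 <= index < total")
    return [test_id for test_id in test_ids if shard_of(test_id, total) == index]


def shard_from_env(env):
    """Read the (hypothetical) SHARD_INDEX / SHARD_TOTAL variables, defaulting to one shard."""
    return int(env.get("SHARD_INDEX", "0")), int(env.get("SHARD_TOTAL", "1"))
```

Every shard sees the full test list and picks its own slice, so the CI system only needs to start N identical jobs with different variables.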
I agree with wrapping things like build scripts to test locally.
Still, some actions or CI steps are also not meant to be run locally. Like when it publishes to a repo or needs any credentials that are used by more than one person.
Btw, Github actions and corresponding YAML are derived from Azure DevOps and are just as cursed.
The whole concept of Github CI is just pure misuse of containers when you need huge VM images - container is technically correct, but a far fetched word for this - that have all kinds of preinstalled garbage to run typescript-wrapped code to call shell scripts.
This is the right way to use CI/CD systems, as dumb orchestrators without inherent knowledge of your software stack. But the problem is, everything from their documentation to their templates and marketplace encourages you to do exactly the opposite and couple your build tightly with their system. It's poor product design imo, clearly optimising for vendor lock-in over usability.
I use gitlab-ci-local to run Gitlab pipelines locally - does such a thing not exist for GitHub actions?
Oh, yeah, I remember looking at that a while back. I don't recall how much it had implemented at the time but it seems that firecow took a vastly different approach than nektos/act did, going so far as to spend what must have been an enormous amount of time/energy to cook up https://github.com/firecow/gitlab-ci-local/blob/4.56.2/src/s... (and, of course, choosing a dynamically typed language versus golang)
>Lack of local development. It's a known thing that there is no way of running GitHub Actions locally.
This is one thing I really love about Buildkite[0] -- being able to run the agent locally. (Full disclosure: I also work for Buildkite.) The Buildkite agent runs as a normal process too (rather than as a Docker container), which makes the process of workflow development way simpler, IMO. I also keep a handful of agents running locally on my laptop for personal projects, which is nice. (Why run those processes on someone else's infra if I don't have to?)
>Reusability and YAML
This is another one that I believe is unique to Buildkite and that I also find super useful. You can write your workflows in YAML of course -- but you can also write them in your language of choice, and then serialize to YAML or JSON when you're ready to start the run (or even add onto the run as it's running if you need to). This lets you encapsulate and reuse (and test, etc.) workflow logic as you need. We have many large customers that do some amazing things with this capability.
[0]: https://buildkite.com/docs/agent/v3
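As a rough illustration of generating a pipeline from code, here is a sketch that builds a step list in Python and serializes it to JSON. The service names and the `./scripts/...` paths are hypothetical; the step attributes used (`label`, `command`, `key`, `depends_on`) are meant to follow Buildkite's pipeline format, but check them against the current documentation before relying on this.

```python
import json


def build_pipeline(services):
    """Build one test step per service plus a packaging step that waits for all of them."""
    steps = []
    for name in services:
        steps.append({
            "label": f"test {name}",
            "command": f"./scripts/test.sh {name}",  # hypothetical script
            "key": f"test-{name}",
        })
    steps.append({
        "label": "package",
        "command": "./scripts/package.sh",  # hypothetical script
        "depends_on": [step["key"] for step in steps],
    })
    return {"steps": steps}


if __name__ == "__main__":
    # The JSON on stdout can be piped into the agent's pipeline upload command.
    print(json.dumps(build_pipeline(["api", "web"]), indent=2))
```

Since `build_pipeline` is an ordinary function, the workflow logic can be unit tested like any other code.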
Are you two talking about the same thing? I believe the grandparent is talking about running it locally on development machines, often for testing purposes.
Asking because Github Action also supports Self-Hosted runners [1].
[1] https://docs.github.com/en/actions/hosting-your-own-runners/...
Same thing, yeah, IIUC (i.e., running the agent/worker locally for testing). It's conceptually similar to self-hosted runners, yes, but also different in a few practical ways that may matter to you, depending on how you plan to run in production.
For one, with GitHub Actions, hosted and self-hosted runners are fundamentally different applications; hosted runners are fully configured container images, (with base OS, tools, etc., on board), whereas self-hosted runners are essentially thin, unconfigured shell scripts. This means that unless you're planning on using self-hosted runners in production (which some do of course, but most don't), it wouldn't make sense to dev/test with them locally, given how different they are. With Buildkite, there's only one "way" -- `buildkite-agent`, the single binary I linked to above.
The connection models are also different. While both GHA self-hosted runners and the Buildkite agent connect to a remote service to claim and run jobs, GHA runners must first be registered with a GitHub org or repository before you can use them, and then workflows must also be configured to use them (e.g., with `runs-on` params). With Buildkite, any `buildkite-agent` with a proper token can connect to a queue to run a job.
There are others, but hopefully that gives you an idea.
I think it's mostly obvious how you could implement CI with "local-first" programs (scripts?), but systems like Github provide value on top by making some of the artifacts or steps first-class objects.
Not sure how much of that Github does, but it could parse test output (I know we used that feature in Gitlab back when I was using that on a project), track flaky tests for you or give you a link to code for a failing test directly.
Or it could highlight releases on their "releases" page, with release notes prominently featured.
And they allow you to group pipelines by type and kind, filter by target environment and such.
On top of that, they provide a library of re-usable workflows like AWS login, or code checkout or similar.
With all that, the equation is not as clear cut: you need to figure out how to best leverage some of those (or switch to external tools providing them), and suddenly, with time pressure, just going with the flow is a more obvious choice.
Indeed. Anything else is just asking for trouble.
CI must only be an application of tools available to developers locally.
This, other than for practical reasons, is a tax on complexity.
100% agree and that has been my experience too. It also makes testing the logic locally much easier, just run the script in the appropriate container.
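"Run the script in the appropriate container" can itself be a tiny helper, so that local runs and CI use the same invocation. This sketch only builds the argument list; the image name and script path in the usage are placeholders, and it assumes a docker-compatible CLI that supports `--rm`, `-v` and `-w`.

```python
import os
import subprocess


def container_command(image, script, project_dir=None, workdir="/src", runtime="docker"):
    """Build the argv that runs `script` inside `image` with the project mounted at `workdir`."""
    project_dir = os.getcwd() if project_dir is None else project_dir
    return [
        runtime, "run", "--rm",
        "-v", f"{project_dir}:{workdir}",  # mount the checkout into the container
        "-w", workdir,                      # run from the mounted directory
        image, script,
    ]


def run_in_container(image, script, **kwargs):
    """Execute the command; raises CalledProcessError if the script fails."""
    subprocess.run(container_command(image, script, **kwargs), check=True)
```

For example, `run_in_container("python:3.12", "./ci/test.sh")` would run a (hypothetical) test script in the same image the CI job declares.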
The pipeline DSLs, because they are not full programming languages, have to include lots of specific features and options, and if you want something slightly outside of what they are designed for, you are out of luck. In a way it feels like how graphics were in the age of the fixed pipeline, when there had to be a complex API to cover all use cases yet it was not flexible enough.
This is my preferred way of doing things as well. Not being able to run the exact same thing that's running in CI easily locally is a bit of a red flag in my opinion. I think the only exception I've ever encountered to this is when working on client software for HSMs, which had some end-to-end tests that couldn't be run without actually connecting to the specific hardware that took some setup to be able to access when running tests locally.
That’s my policy too. I see way too many Jenkins/Actions scripts with big logic blocks jammed into YAML. If the entire build and test process is just a single script call, we can run it locally, in a GitHub workflow, or anywhere else. Makes it less painful to switch CI systems, and devs can debug easily without pushing blind commits. It’s surprising how many teams don’t realize local testing alone saves huge amounts of time.
When I automate my github actions I keep everything task-oriented, and if anything is pushing code or builds, it has a user step to verify the work done by the automation. Once you approve and merge, it kicks off promotion pipelines, not necessarily for deployment but to promote a build as stable through tagging.
https://nix-ci.com does this by construction.
So for example, my action builds on 4 different platforms (win-64, linux-amd64, mac-intel, mac-arm), it does this in parallel then gets the artifacts for all four and bundles them into a single package.
How would you suggest I do this following your advice?
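One possible split, sketched under assumptions: the CI system keeps the part only it can do (fanning out to four runners and collecting their artifacts into one directory), while the bundling is an ordinary script that works on any machine that has those directories. The directory layout (one folder per platform under a common root) and the function name are invented for the example.

```python
import os
import zipfile

PLATFORMS = ["win-64", "linux-amd64", "mac-intel", "mac-arm"]


def bundle(artifact_root, output_zip, platforms=PLATFORMS):
    """Zip the per-platform artifact folders under `artifact_root` into one package."""
    missing = [p for p in platforms if not os.path.isdir(os.path.join(artifact_root, p))]
    if missing:
        raise FileNotFoundError("missing artifacts for: " + ", ".join(missing))
    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for platform in platforms:
            base = os.path.join(artifact_root, platform)
            for folder, _dirs, files in os.walk(base):
                for name in sorted(files):
                    full = os.path.join(folder, name)
                    # Store paths relative to the root, e.g. "win-64/app.bin".
                    archive.write(full, os.path.relpath(full, artifact_root))
    return output_zip
```

The per-platform build would likewise be a single script call in each matrix job, so only the fan-out and artifact transfer remain in YAML.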
While you're correct, environmental considerations are another advantage that testing locally SHOULD be able to provide (i.e. you can test your scripts or Make targets or whatever in the same runner that runs in the actual build system.)
This is not possible with GHA.
Of course you can, just specify a container image of your choice and run the same container for testing locally.
However, replicating environmental details is only relevant where the details are known to matter. A lot of effort has been wasted and workflows crippled by the idea that everything must be 100% identical irrespective of actual dependencies and real effects.
there's:
https://github.com/Pernosco/gha-runner
but I'm not sure how complete it is, and probably doesn't satisfy the author's use cases
Although there are definitely merits in moving the complex logic outside of the CI/CD JSON/YAML DSL, especially when using monorepo setups that can become rather complex in their logic (complex enough that they made Google create Bazel; I can think of some interesting Borg/K8s analogies btw), I also believe that modern CI/CD platforms have made several sensible steps in the right direction to handle these more complicated use cases.
(Disclaimer: I work at CircleCI)
At CircleCI for example, we have added valuable features like a VSCode extension[0] to validate and "dry-run" config from within your IDE, we have local runners[1] that you can use to test and run pipelines on your local machine and your own infra, we have dynamic config[2], a Javascript/Typescript SDK[3], a CLI that can validate and run workflows locally[4], and QoL additions like a no-op job type[5] and flexible requires, along with flexible when statements and expression based job filters[6].
And finally, it's of course also possible to combine different approaches into a "best of both worlds" approach, f.e. combining Dagger with CircleCI[7].
[0]https://circleci.com/docs/vs-code-extension-overview/
[1]https://circleci.com/blog/using-runner-for-local-testing/
[2]https://circleci.com/docs/dynamic-config/
[3]https://circleci.com/docs/circleci-config-sdk/
[4]https://circleci.com/docs/how-to-use-the-circleci-local-cli/
[5]https://circleci.com/changelog/new-job-type-no-op-job-can-ma...
[6]https://circleci.com/changelog/more-flexible-job-required-ca...
[7]https://docs.dagger.io/integrations/circleci/
Folks pick the wrong tool for the job at hand.
I suspect the author of the article could greatly simplify matters if they used a task running tool to orchestrate running tasks, for example. Pick whatever manner of decoupling you want really, most of the time this is the path to simplified CI actions. CI is best when it's thought of as a way to stand up fresh copies of an environment to run things inside of.
I have never had the struggles that so many have had with CI as a result. Frankly, I'm consistently surprised at how overly complex people make their CI configurations. There are better tools for orchestration and dependency-driven builds, which are not CI's purpose to begin with.
I generally agree with you, but I'd be interested to hear your take on what the purpose of CI _actually is_.
It seems to me that a big part of the problem here (which I have also seen/experienced) is that there's no one specific thing that something like GitHub Actions is uniquely suited for. Instead, people want "a bunch of stuff to happen" when somebody pushes a commit, and they imagine that the best way to trigger all of that is to have an incredibly complex - and also bespoke - system on the other end that does all of it.
It's like we learned the importance of modularity in the realm of software design, but never applied what we learned to the tools that we work with.
Standing up fresh images for validation and post validation tasks.
CI shines for running tests against a clean environment for example.
Really any task that benefits from a clean image being stood up before running a task.
The key though is to decouple the tasks from the CI. Complexity like pre-packaging artifacts is not a great fit for CI configuration, that is best pushed to a tool that doesn’t require waterfall logic to make it work.
There is a reason task runners are very popular still
How would you set up tool installations? Inside the CI or inside the script?
In our environment (our product is a Windows desktop application) we use Packer to build a custom Windows Server 2022 image with all the required tools installed. Build agents run on an Azure VM scale set that uses said image for the instance OS.
This. I'd go more relaxed on the one-liner requirement, a few lines are fine, but the approach is correct IMHO.
Oh boy, there's a special kind of hell I enter into every time I set up new GitHub Actions. I wrote a blog post a few months ago about my pain[0] but one of the main things I've found over the years is you can massively reduce how horrible writing GitHub Actions is by avoiding prebuilt actions, and just using it as a handy shell runner.
If you write behaviour in python/ruby/bash/hell-rust-if-you-really-want and leave your GitHub action at `run: python some/script.py` then you'll have something that's much easier to test locally, and save yourself a lot of pain, even if you wind up with slightly more boilerplate.
[0] https://benrutter.github.io/posts/github-actions/
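To make the `run: python some/script.py` idea concrete, here is a minimal sketch of what such an entry point could look like. The task names and the commands in `TASKS` are made up for illustration, not taken from any real project; the point is only that the workflow file and a developer's terminal call the same function.

```python
import subprocess
import sys

# Hypothetical task table: swap in your project's real commands.
TASKS = {
    "lint": [[sys.executable, "-m", "compileall", "-q", "."]],
    "test": [[sys.executable, "-m", "unittest", "discover"]],
}


def run_task(name, tasks=TASKS, runner=subprocess.call):
    """Run each command of a task in order; return the first non-zero exit code."""
    if name not in tasks:
        print(f"unknown task: {name}", file=sys.stderr)
        return 2
    for command in tasks[name]:
        code = runner(command)
        if code != 0:
            return code
    return 0


def main(argv):
    """Entry point: the first argument picks the task, defaulting to 'test'."""
    return run_task(argv[0] if argv else "test")
```

A real script would end with `sys.exit(main(sys.argv[1:]))`, so the CI step stays a one-liner and `python some/script.py lint` behaves the same on a laptop.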
At this point, just pause with GitHub Actions and compare it to how GitLab handles CI.
Much more intuitive, taking shell scripts and other script commands natively and not devolving into a mess of obfuscated typescript wrapped actions that need a shit ton of dependencies.
The problem with Gitlab CI is that now you need to use Gitlab.
I’m not even sure when I started feeling like that was a bad thing. Probably when they started glueing a bunch of badly executed security crud onto the main product.
The earliest warning sign I had for GitLab was when they eliminated any pricing tier below their equivalent of GitHub's Enterprise tier.
That day, they very effectively communicated that they had decided they were only interested in serving Enterprises, and everything about their product has predictably degraded ever since, to the point where they're now branding themselves "the most comprehensive AI-powered DevSecOps Platform" with a straight face.
GitLab can't even show you more than a few lines of context without requiring you to manually click a bunch of times. Forget the CI functionality, for pull requests it's absolutely awful.
I decided it was a bad thing when they sent password reset emails to addresses given by unauthenticated users. Not that I ever used them. But now it is a hard no, permanently.
They have since had other also severe CVEs. That has made me feel pretty confident in my decision.
If password reset emails shouldn’t be sent to unauthenticated users, how would users reset their passwords?
There was a pretty bad bug (though I think it was a Rails footgun) that allowed you to append an arbitrary email to the reset request.
The only difficult part for the attacker was finding an email address that was used by the target, though that's usually the same one you use for git commits, and GitLab “handily” has an email address assigned to each user ID, incrementing from 1.
Usually low numbers are admins, so, a pretty big attack vector when combined.
But you can do the same with GitHub, right? Although most docs and articles focus on 3rd party actions, nothing stops you from just running everything in your own shell script.
Yes, you can, and we do at my current job. Much of the time it's not even really the harder approach compared to using someone else's action, it's just that the existence of third party actions makes people feel obliged to use them because they wouldn't want to be accused of Not Invented Here Syndrome.
if anything, gitlab's ci seems even worse...
Theoretically we could also use https://just.systems/ or https://mise.jdx.dev/ instead of directly calling GH Actions, but I haven't tried GH Actions personally yet. If it's really the nightmare you are saying, then that's sad.
I had this idea the other day when dealing with CI and thought it must be dumb because everyone's not already doing it. It would make your CI portable to other runners in future, too.
A lot of folks in this thread are focusing on the monorepo aspect of things. The "Pull request and required checks" problem exists regardless of monorepo or not.
GitHub Actions allows you to only run checks if certain conditions are met, like "only lint markdown if the PR contains *.md files". The moment you decide to use such rules, you have the "Pull request and required checks" problem. No "monorepo" required.
GitHub required checks at this time allow you to use external services where GitHub has no idea what might run. For this reason, required checks HAVE to pass. There's no "if it runs" step. A required check on an external service might never run, or it might be delayed. Therefore, if GH doesn't have an affirmation that it passed, you can't merge.
It would be wonderful if for jobs that run on GH where GH can know if the action is supposed to run, if required checks could be "require all these checks if they will be triggered".
I have encountered this problem on every non-trivial project I use with GitHub actions; monorepo or not.
This isn't really the problem, though. This is an easy problem to solve; the real problem is that it costs money to do so.
Also: I'm not asserting that the below is good, just that it works.
First, don't make every check a required check. You probably don't need to require that linting of your markdown files passes (maybe you do! it's an example).
Second, consider not using the `on:<event>:paths`, but instead something like `dorny/paths-filter`. Your workflow now runs every time; a no-op takes substantially less than 1 minute unless you have a gargantuan repo.
Third, make all of your workflows have a 'success' job that just runs and succeeds. Again, this will take less than 1 minute.
At this point, a no-op is still likely taking less than 1 minute, so it will bill at 1 minute, which is going to be $.008 if you're paying.
Fourth, you can use `needs` and `if` now to control when your 'success' job runs. Yes, managing the `if` can be tricky, but it does work.
We are in the middle of a very large migration into GitHub Actions from a self-hosted GitLab. It was something we chose, but also due to some corporate choices our options were essentially GitHub Actions or a massive rethink of CI for several dozen projects. We have already moved into code generation for some aspects of GitHub Actions code, and that's the fifth and perhaps final frontier for addressing this situation. Figure out how to describe a graph and associated completion requirements for your workflow(s), and write something to translate that into the `if` statements for your 'success' jobs.
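A rough sketch of the 'success' job idea and of generating its `if` from a job list. The rule encoded here (every needed job must end as `success` or `skipped`) is my reading of the steps above, and the function names are made up. `always()` and `needs.<job>.result` are real GitHub Actions expression features, but check the generated string against the Actions docs before relying on it.

```python
def gate_passes(results, required):
    """Return True when every required job succeeded or was skipped.

    `results` maps job name -> result string, mirroring the values GitHub
    Actions reports (success, failure, cancelled, skipped). A required job
    that is missing from `results` counts as not passed.
    """
    allowed = {"success", "skipped"}
    return all(results.get(job) in allowed for job in required)


def success_job_condition(needed_jobs):
    """Build an `if` expression for a hypothetical 'success' job."""
    clauses = [
        f"(needs.{job}.result == 'success' || needs.{job}.result == 'skipped')"
        for job in needed_jobs
    ]
    return " && ".join(["always()"] + clauses)
```

With this shape, only the single 'success' job needs to be marked as a required check, and the per-job rules live in code that can be unit tested.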
There's a workaround for the 'pull request and required check' issue. You create an alternative 'no op' version of each required check workflow that just does nothing and exits with code 0 with the inverse of the trigger for the "real" one.
The required check configuration on github is just based off of job name, so either the trigger condition is true, and the real one has to succeed or the trigger condition is false and the no op one satisfies the PR completion rules instead.
It seems crazy to me that such basic functionality needs such a hacky workaround, but there it is.
Or you can just check if the step was skipped. I don't get the point of the article.
Managing a monorepo with acyclic dependencies is super easy: dornys path filter in one job and the other jobs check
1. whether their respective or any dependency's path got changed 2. and all dependency jobs were either successful or skipped.
Done. No need to write an article.
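For what it's worth, the "own path or any dependency's path changed" rule can be sketched in a few lines outside of YAML. This is an illustration only: the folder layout and dependency map are hypothetical, and prefix matching stands in for whatever a path filter action actually does.

```python
def affected_jobs(changed_files, job_paths, deps):
    """Return the jobs that must run for a set of changed files.

    job_paths: job name -> path prefix it owns (hypothetical layout).
    deps: job name -> list of jobs it depends on (must be acyclic).
    A job is affected if its own path changed or any dependency is affected.
    """
    directly = {
        job
        for job, prefix in job_paths.items()
        if any(name.startswith(prefix) for name in changed_files)
    }
    memo = {}

    def is_affected(job):
        # Memoised walk over the dependency graph.
        if job not in memo:
            memo[job] = job in directly or any(
                is_affected(dep) for dep in deps.get(job, [])
            )
        return memo[job]

    return sorted(job for job in job_paths if is_affected(job))
```

For example, with `api1` depending on `lib` and `web-app1` depending on `api1`, a change under `lib/` selects all three jobs, while a change under `web-app1/` selects only `web-app1`.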
Posts like this make me miss Travis. Travis CI was incredible, especially for testing CI locally. (I agree with the author that act is a well done hack. I've stopped using it because of how often I'd have something pass in act and fail in GHA.)
> GitHub doesn't care
My take: GitHub only built Actions to compete against GitLab CI, as built-in CI was taking large chunks of market share from them in the enterprise.
Woodpecker supports running jobs on your own machine (and conveniently provides a command to do that for failed jobs), uses the same sane approach of passing your snippets to the shell directly (without using weird typescript wrappers), and is pluggable into all major forges, GitHub included.
To be fair, GitHub also charges for Actions minutes and storage, so it's one of the few pieces that do generate revenue.
How so? I don’t recall this, and I used Travis, and then migrated to GitHub actions.
As far as I can tell, they are identical as far as testing locally. If you want to test locally, then put as much logic in shell scripts as possible, decoupled from the CI.
My man/woman - you gotta try buildkite. It’s a bit more extra setup since you have to interface with another company, more API keys, etc. But when you outgrow GH actions, this is the way. Have used buildkite in my last two jobs (big US tech companies) and it has been the only pleasant part of CI.
+1
I've used Jenkins, Travis, Circle, Cirrus, GitHub Actions, and Buildkite. Buildkite is leagues ahead of all of the others. It's the only enjoyable CI system I've used.
This is indeed the way.
One really interesting omission to this post is how the architecture of GitHub actions encourages (or at the very least makes deceivingly easy) making bad security decisions.
Common examples are secrets. Organization or repository secrets are very convenient, but they are also massive security holes just waiting for unsuspecting victims to fall into.
Repository environments have the ability to have distinct secrets, but you have to ensure that the right workflows can only access the right environments. It's a real pain to manage at scale.
Being able to `inherit` secrets also is a massive footgun, just waiting to leak credentials to a shared action. Search for and leak `AWS_ACCESS_KEY_ID` anyone?
Cross-repository workflow triggering is also a disaster, and in some circumstances you can abuse the differences in configuration to do things the source repository didn't intend.
Other misc. things about GHA also are cool in theory, but fall down in practice. One example is the wait-timer concept of environments. If you have a multi-job workflow using the same environment, wait-timer applies to EACH JOB in the environment. So if you have a build-and-test workflow with 2 jobs, one for build, and one for test, each job will wait `wait-timer` before it executes. This makes the feature impossible to use for things like multi-environment deployment pipelines, unless you refactor your workflows.
Overall, I'd recommend against using GHA and looking elsewhere.
> Search for and leak `AWS_ACCESS_KEY_ID` anyone?
Well that's just someone being a dumbass, since AssumeRoleWithWebIdentity (and its Azure and GCP equivalent) have existed for quite a while. It works flawlessly and if someone does do something stupid like `export HURP_DURP=$AWS_ACCESS_KEY_ID; printenv` in a log, that key is only live for about 15 minutes so the attacker better hurry
Further, at least in AWS and GCP (I haven't tried such a thing in Azure) one can also guard the cred with "if the organization and repo are not ..." then the AssumeRole 403s to ensure that my-awesome-org/junior-dev-test-repo doesn't up and start doing fun prod stuff in GHA
I hate GHA probably more than most, but one can footgun themselves in any setup
I never knew how easy it was to setup role assuming for AWS/GHA. It’s much easier than managing the access/secret.
I wrote a little about it in this blog post: https://joshstrange.com/2024/04/26/nightly-postgres-backups-...
If you, or others, are interested I have found that those role-session-name variables make for a great traceability signal when trying to figure out what GHA run is responsible for AWS actions. So instead of
one can consider
I don't recall this second what the upper limit is on that session name, so you may be able to fit quite a bit of stuff in there.
Great points. I totally agree, don't use hard-coded static creds, especially here. But in reality, many services and/or API keys don't support OIDC or short-lived credentials, and the design of secrets in GitHub promotes using them, in my opinion.
Whilst I do detest much of Azure DevOps, one thing I do like about their pipelines is that we can securely use service connections and key vaults in Azure to secure pipeline tasks that require credentials to be managed securely.
While I do agree with you regarding encouraging bad secret management practices, one fairly nice solution I’ve landed on is using terraform to manage such things. I guess you could even take it a step further to have a custom lint step (running on GHA, naturally) that disallows secrets configured in a certain manner and blocks a deploy (again, on GHA) on failure.
I guess what I’m saying is, it’s GHA all the way down.
What’s your suggestion for not-GHA?
In the end, this is the age old "I built my thing on top of a 3rd party platform, it doesn't quite match my use case (anymore) and now I'm stuck".
Would GitLab have been better? Maybe. But chances are that there is another edge case that is not handled well there. You're in a PaaS world, don't expect the platform to adjust to your workflow; adjust your workflow to the platform.
You could of course choose to "step down" (PaaS to IaaS) by just having a "ci" script in your repo that is called by GA/other CI tooling. That gives you immense flexibility but also you lose specific features (e.g. pipeline display).
> Would GitLab have been better?
My impression of gitlab CI is that it's also not built for monorepos.
(I'm a casual gitlab CI user).
I'm not sure if there's a monorepo vs polyrepo difference; just that anything complex is pretty painful in gitlab. YAML "programming" just doesn't scale.
The problem is that your "ci" script often needs some information from the host system, like what is the target git commit? Is this triggered by a pull request, or a push to a branch? Is it triggered by a release? And if so, what is the version of the release?
IME, much of the complexity in using Github Actions (or Gitlab CI, or Travis) is around communicating that information to scripts or build tools.
That and running different tasks in parallel, and making sure everything you want passes.
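One way to keep that host information from leaking all over the scripts is to read it once at the edge and pass a plain object around. A sketch, assuming GitHub Actions: `GITHUB_SHA`, `GITHUB_EVENT_NAME` and `GITHUB_REF` are default environment variables there, while the `BuildContext` class and the "local" fallback are my own invention.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildContext:
    commit: str
    event: str  # e.g. "push", "pull_request", "release", or "local"
    ref: str    # e.g. "refs/heads/main" or "refs/tags/v1.2.3"

    @property
    def release_version(self):
        """Tag name when the ref points at a tag, otherwise None."""
        prefix = "refs/tags/"
        return self.ref[len(prefix):] if self.ref.startswith(prefix) else None


def context_from_env(env=None):
    """Build the context from CI variables, with fallbacks for local runs."""
    env = os.environ if env is None else env
    return BuildContext(
        commit=env.get("GITHUB_SHA", "unknown"),
        event=env.get("GITHUB_EVENT_NAME", "local"),
        ref=env.get("GITHUB_REF", ""),
    )
```

Another CI system would only need its own small `context_from_env` variant; the rest of the scripts keep taking a `BuildContext`.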
Doesn't everything in GitLab go into a single pipeline? GitHub at least makes splitting massive CI/CD setups easier by allowing you to write them as separate workflows that are separate files.
> GitHub at least makes splitting massive CI/CD setups easier by allowing you to write them as separate workflows that are separate files.
this makes me feel like you’re really asking “can i split up my gitlab CICD yaml file or does everything need to be in one file”.
if that’s the case:
yes it does eventually all end up in a single pipeline (ignoring child pipelines).
but you can split everything up and then use the `include` statement to pull it all together in one main pipeline file which makes dealing with massive amounts of yaml much easier.
https://docs.gitlab.com/ee/ci/yaml/includes.html
you can also use `include` to pull in a yaml config from another project to add things like SAST on the fly.
previous workplace i had like 4 CICD template repos and constructed all 30 odd actual build repos from those four templates.
used `include` to pull in some yaml template jobs, which i made run by doing something like (it’s been a while, might get this wrong)
this doesn’t run anything for `job_b_from_template` … you just end up defining the things you want to run for each case, plus any variables you need to provide / override.
you can also override stuff like rules on when it should run if you want to. which is handy.
gitlab CICD can be really modular when you get into it.
if that wasn’t the case: on me.
edit: switched to some yaml instead of text which may or may not be wrong. dunno. i have yet to drink coffee.
addendum you can also do something like this, which means you don’t have to redefine every job in your main ci file, just define the ones you don’t want to run
where the template you import has a job_a and job_b definition. both get pulled in, but job_b gets overwritten so it never runs.
less useful when just splitting things into multiple files to make life simpler.
super useful when using the same templates across multiple independent repositories to make everything build in as close to the same way as possible.
You can have pipelines trigger child pipelines in gitlab, but usability of them is pretty bad, viewing logs/results of those always needs extra clicking.
Re: monorepo
> In GitHub you can specify a "required check", the name of the step in your pipeline that always has to be green before a pull request is merged. As an example, I can say that web-app1 - Unit tests are required to pass. The problem is that this step will only run when I change something in the web-app1 folder. So if my pull request only made changes in api1 I will never be able to merge my pull request!
Continuous Integration is not continuous integration if we don’t test that a change has no deleterious side effects on the rest of the system. That’s what integration is. So if you aren’t running all of the tests because they’re slow, then you’re engaging in false economy. Make your tests run faster. Modern hardware with reasonable test runners should be able to whack out 10k unit tests in under a minute. The time to run the tests goes up by a factor of ~7-10 depending on framework as you climb each step in the testing pyramid. And while it takes more tests to cover the same ground, with a little care you can still almost halve the run time replacing one test with a handful of tests that check the same requirement one layer down, or about 70% moving down two layers.
One thing that’s been missing from most of the recent CI pipelines I’ve used is being able to see that a build is going to fail before the tests finish. The earlier the reporting of the failure the better the ergonomics for the person who triggered the build. That’s why the testing pyramid even exists.
This comment is way too far down the page.
If the unit tests are slow enough to want to skip them, they likely are not unit tests but some kind of service-level tests or tests that are hitting external APIs or some other source of a bad smell. If the slow thing is the build, then cache the artifact keyed off the directory contents so the step is fast if code is unchanged. If the unit tests only run for a package when the code changes, there is a lack of e2e/integration testing. So, what is OP's testing strategy? Caching? It seems like following good testing practices would make this problem disappear.
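"Cache the artifact keyed off the directory contents" boils down to hashing the tree. A minimal sketch, not tied to any particular CI cache; the function name is made up, and a real setup would probably exclude build output and VCS directories.

```python
import hashlib
from pathlib import Path


def directory_cache_key(root):
    """Return a hex digest that changes when any file path or content under root changes."""
    root = Path(root)
    digest = hashlib.sha256()
    # Sort so the key does not depend on filesystem iteration order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

If the key matches one already in the cache, the build step can restore the stored artifact instead of rebuilding.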
That is true for most cases, which nowadays is web and backend software. As you get into embedded or anything involving hardware things get slower and you need to optimize.
For example, tests involving real hardware can only run at 1x speed, so you will want to avoid running those if you can. If you are building a custom compiler toolchain, that is slow and you will want to skip it if the changes cannot possibly affect the toolchain.
I agree hardware should be that quick, but CI and cloud hardware is woefully underpowered unless you actively seek it out. I’ve also never seen a test framework spew out even close to that in practice. I’m not even sure most frameworks would do that with noop tests, which is sad.
10 years ago my very testing-competent coworker had us running 4200 tests in 37 seconds. In NodeJS. We should be doing as well as that today without a gifted maintainer.
I've got an i9 and an NVMe drive. running npm test with 10k no-op tests takes 30 seconds, which is much quicker than I expected it to be (given how slow everything else in the node world is).
Running dotnet test on the other hand with 10k empty tests took 95 seconds.
Honestly, 10k no-op tests should be limited by disk IO, and in an ideal world would be 10 seconds.
> disk IO
How many tests did you put in each file?
2500!
Agreed, most of the CI tools don't help in getting feedback early to the developers. I shouldn't have to wait hours for my CI job to complete. Harness is a tool that can reduce build times by caching build artifacts, docker layers and only running a subset of tests that were impacted by the code change.
I call writing GitHub Actions "Search and Deploy", constantly pushing to a branch to get an action to run is a terrible pattern...
You'd think, especially with the deep VS Code integration, they'd have at least a basic sanity-check locally, even if not running the full pipeline.
Not just me then? I was trying to fix a GitHub action just today but I have no clue how I'm supposed to test it, so I just keep making tiny changes and pushing.... Not a good system but I'm still within the free tier so I'm willing to put up with it I guess.
I think it’s everyone, debugging GH actions is absolute hell, and it gets terrifying when the action interacts with the world (e.g. creating and deploying packages to a registry).
> it gets terrifying when the action interacts with the world (e.g. creating and deploying packages to a registry).
To be fair, testing actions with side effects on the wider world is terrifying even if you’re running it locally, maybe more so because your nonstandard local environment may have surprises (e.g. an env var you set then forgot) while the remote environment mostly only has stuff you set/installed explicitly, and you can be sloppier (e.g. accidentally running ./deploy when you wanted to run ./test). That part isn’t a GH Actions problem.
Locally it is much easier to set up and validate test environments, or neuter some of the pipeline to test things out and ensure the rest produces expected results (in fact I usually dry-run by default and require a file or envvar to “real run”). Especially as some jobs (rightfully) refuse to run in PRs.
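A sketch of the dry-run-by-default pattern mentioned above. The `REAL_RUN` variable name is an assumption of mine; any explicit opt-in (a file, a flag) works the same way.

```python
import os
import subprocess


def is_real_run(env=None):
    """Only an explicit opt-in (hypothetical REAL_RUN=1) enables side effects."""
    env = os.environ if env is None else env
    return env.get("REAL_RUN") == "1"


def run_step(command, env=None, runner=subprocess.check_call):
    """Execute the command on a real run; otherwise only print what would run."""
    if is_real_run(env):
        runner(command)
        return "executed"
    print("dry-run:", " ".join(command))
    return "skipped"
```

The pipeline sets `REAL_RUN=1` only on the jobs that are allowed to publish, so an accidental `./deploy` on a laptop just prints the command.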
If this is to troubleshoot non-code related failures (perm issues, connection timed out, whatever influences success that doesn't require a code change) then surely the repo's history would benefit from one just clicking "Re-run Job", or its equivalent $(gh ...) invocation, right?
not necessarily, rerun job will most likely use the fully resolved dependency graph of your actions (or equivalent), a fresh run will re-resolve them (e.g. you pinned to @1 vs the specific version like @1.2.3 of a dep).
the history problem goes away if you always enforce squash merge...
I use Mercurial + hg-git like a weirdo. Not sure if Mercurial supports empty commits, I don't think it does.
I was curious and it certainly appears that you are right
Turns out it does, actually.
From https://stackoverflow.com/a/71428853. I tried it to confirm.
Ah yes, I have a git alias created specifically for the "we don't know what it does until we push it" world of CI:
> yolo = "!git commit --all --amend --no-edit && git push --force #"
Biggest pet peeve of GHA by a country mile.
GitHub (Actions) is simply not built to support monorepos. Square peg in a round hole and all that. We've opted for using `meta` to simulate monorepos, while being able to use GitHub Actions without too many downsides.
Which makes me wonder if there is a way to simulate multiple repos while maintaining a mono repo. Or mirror a portion of a monorepo as a single repo.
Obviously this would be a real pain to implement just to fix the underlying problem, but it's an interesting (awful) solution
hey could you please share the `meta` tool you mentioned, sounds interesting! couldn't find it on the internet [skill issue]
Guessing it's https://github.com/mateodelnorte/meta googlefu "meta github repo"
hey thanks!
definitely interesting!
I do wonder if this really solves the author's problem because, by the looks of it, you just have to run the meta command and it would run over each of the subdirectories. At the same time, I think I like it because this is what I think people refer to as a "modular monolith".
Combining this with NATS https://nats.io/ (hey, if you don't want it to be over the network, you could use NATS with the memory model of your application itself to reduce any overhead), you essentially get yourself a really modular monolith in which you can then separate things selectively (ahem, microservices) afterwards rather easily.
Modular monolith refers to the architecture of your application[1]. It's a different concept from "monorepo", although they can be used together.
I'm not sure what NATS has to do with anything in this post or discussion. Also, a modular monolith is almost the antithesis of microservices.
[1]: https://www.thoughtworks.com/en-us/insights/blog/microservic...
Article title: "[Common thing] doesn't work very well!"
Article body: "So we use a monorepo and-"
Tale as old as time
I‘m also struggling with gh actions. And none of my repos is a monorepo.
Why is this so difficult?
1. We apparently don’t even have a name for it. We just call it “CI” because that’s the adjacent practice. “Oh no the CI failed”
2. It’s conceptually a program that reports failure if whatever it is running fails and... that’s it
3. The long-standing principle of running “the CI” after merging is so backwards that that-other Hoare disparagingly called the correct way (guard “main” with a bot) for The Not Rocket Science Principle or something. And that smug blog title is still used to this day (or “what bors does”)
4. It’s supposed to be configured declaratively but in the most gross way that “declarative” has ever seen
5. In the true spirit of centralization “value add”: the local option of (2) (report failure if failed) has to be hard or at the very least inconvenient to set up
I’m not outraged when someone doesn’t “run CI”.
The general philosophy of these CI systems is flawed. Instead of CI running your code, your code should run the CI. In other words, the CI should present an API such that one can have arbitrary code which informs the system of what is going on. E.g. "I'm starting jobs A,B,C", "Job A done successfully", "This file is an artifact for job B".
Information should only flow from the user scripts to the CI, and communication should be done by creating files in a specific format and location. This way the system can run and produce the same results anywhere provided it has the right environment/container.
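To illustrate the "code runs the CI" direction, here is a toy sketch where scripts append events to a file and the CI side only reads it. The JSON-lines format, the event kinds and both function names are invented for the example; no existing CI system is implied to work this way.

```python
import json
import time
from pathlib import Path


def report(event_file, kind, job, **details):
    """Append one JSON event per line; the CI only ever reads this file."""
    record = {"time": time.time(), "kind": kind, "job": job, **details}
    with Path(event_file).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def summarize(event_file):
    """Replay the event file into the latest reported state of each job."""
    state = {}
    for line in Path(event_file).read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        state[event["job"]] = event["kind"]
    return state
```

A build script would call something like `report(path, "started", "A")`, then `report(path, "succeeded", "A")` or `report(path, "artifact", "B", file="out.zip")`, and the same file can be replayed on a laptop or on a runner.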
One thing that sounds very nice about GitHub is merge queues: once your PR is ready, rather than merging, you submit it to the merge queue, which will rebase it on the last PR also on the merge queue. It then runs the CI on each PR, and finally merges them automatically once successful. If CI fails, it doesn't get merged, and the next PR skips yours in the chain.
Still a lot of computation & some wait time, but you can just click & forget. You can also parallelize it; since branches are rebased on each other, you can run CI in advance and, assuming your predecessor is also successful, reuse the result from yours.
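The queue behaviour described above can be modelled in a few lines. This is a simplified, sequential toy (no parallel speculation), and `process_merge_queue` plus the `ci_passes` callback are made-up names, not GitHub's implementation.

```python
def process_merge_queue(base, queue, ci_passes):
    """Simulate a merge queue.

    base: list of already merged change ids.
    queue: change ids in queue order.
    ci_passes: callable taking the candidate history (a list) and returning bool.
    Each change is tested on top of everything merged before it; a failing
    change is dropped and the next one is tested without it.
    """
    merged = list(base)
    rejected = []
    for change in queue:
        candidate = merged + [change]
        if ci_passes(candidate):
            merged = candidate
        else:
            rejected.append(change)
    return merged, rejected
```

The parallel version runs CI for several candidates at once and throws away the speculative results that were built on a change that ended up rejected.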
Only available for enterprise orgs though.
There is room for improvement: https://matklad.github.io/2023/06/18/GitHub-merge-queue.html
I remember when OpenStack had this a decade ago in open source software. How much the dream of OS has faded. :'(
https://opensource.com/article/20/2/zuul
That sounds roughly like what happens for Rust. I write a Rust PR, somebody reviews it, they have feedback, I modify the PR, they're happy, and it passes to bors (originally: https://github.com/graydon/bors) which then tries to see whether this can be merged with Rust and if so does so.
It is nice to know that if humans thought your change is OK, you're done. I've only committed small changes (compiler diagnostics, documentation) nothing huge, so perhaps if you really get up in it that's more work, but it was definitely a pleasant experience.
... and sure enough it turns out that work on one of the bors successors was in fact discontinued because you should just use this GitHub feature. TIL.
Use Bazel.
GHA/Gitlab CI/Buildkite/whatever else floats your boat then just builds a bunch of Bazel targets, naively, in-order etc. Just lean on Bazel fine-grained caching until that isn't enough anymore and stir in remote build execution for more parallelism when you need it.
This works up until ~10M+ lines of code or ~10ish reasonably large services. After that you need to do a bit more work to only build graph of targets that have been changed by the diff. That will get you far enough that you will have a whole team that works on these problems.
Allowing the CI tools to do any orchestration or dictate how your projects are built is insanity. Expressing dependencies etc in YAML is the path to darkness and is only really justifiable for very small projects.
I have moved to this.
CI works _fantastic_. Absolute top-notch.
Then I have to sort out deployment...
> The problem is that this step will only run when I change something in the web-app1 folder. So if my pull request only made changes in api1 I will never be able to merge my pull request!
This just seems like a bad implementation to me?
There are definitely ways to set up your actions so that they run all of the unit tests without changes if you'd like, or so that api1's unit tests are not required for a web-app1 related PR to be merged.
Absolutely correct. When creating a new workflow, I always disable push/pull_request triggered builds and instead use the manually triggered `workflow_dispatch` method. This makes testing a new workflow much easier.
Additionally, you can use conditionals based on inputs in the `workflow_dispatch` meaning that you could easily setup a "skip api tests" or "include web tests" option.
It sounds like they have the logic to skip certain things if nothing has changed. The problem is around pull request gates and the lack of dynamic "these tests must be passing before merging is allowed". There are settings on a repository in the ruleset / status checks area that are configured outside of the dynamic yaml of the GHA workflow.
IMHO the main problem with GH Actions is that the runners are so slow. Feels like running your build on a frigging C64 sometimes ;)
GH hosted runners use shared hardware so the performance is never good. There are quite a few options available. Harness CI offers hyper optimized build infrastructure which, paired with software intelligence (caching, running a subset of tests based on the code change), can reduce build times up to 4X compared to GH Actions.
Are you hosting your own runners or relying on GitHub's?
Blacksmith is your buddy. It's free and just has better images for single-core operations. Unless you're Google, I can guarantee it's faster.
> Our code sits in a monorepo which is further divided into folders. Every folder is independent of each other and can be tested, built, and deployed separately.
If this is true, and you still have problems running specific Actions, why not break this into separate repositories?
There is a mono vs poly repo tradeoff. Pros & cons to each approach. If you are doing monorepo, it would be antithetical to break it up into the poly paradigm. You really don't want both
My immediate response as well.
So the way I've solved the multiple folders with independent checks is like this:
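A minimal sketch of the gate logic in Python, assuming a final job that is handed the result of every upstream job (`success`, `failure`, `cancelled` or `skipped`, the values GitHub Actions exposes as `needs.<job>.result`). The job names in the usage are hypothetical, and treating `cancelled` as blocking is an assumption:

```python
# Results that should block a merge. "skipped" is deliberately absent,
# so jobs that did not need to run do not fail the gate.
BLOCKING_RESULTS = {"failure", "cancelled"}  # assumption: cancelled also blocks


def blocking_jobs(results):
    """Return the sorted names of upstream jobs whose result blocks the merge.

    `results` maps a job name to its result string.
    """
    return sorted(name for name, result in results.items()
                  if result in BLOCKING_RESULTS)


def gate_passes(results):
    """True when no upstream job failed or was cancelled."""
    return not blocking_jobs(results)
```

For example, `gate_passes({"frontend": "success", "backend": "skipped"})` is true, while a single `"failure"` makes it false. Only the gate job would be marked as required, and it would need to run even when upstream jobs fail (for instance with `if: always()`).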
That way it is resilient against checks not running because they're not needed, but it still fails when any upstream actually fails.
Now, I did end up running the tests of the front-end and back-end because they upload coverage, and if my coverage tool doesn't get both, it'll consider it as a drop in coverage and fail its check.
But in general, I agree with the writer of the post that it all feels like it's not getting enough love.
For example, there is no support for yaml anchors, which really hampers reusability on things that cannot be extracted to separate flows (not to mention separate flows can only be nested 4 deep).
There is also the issue that any commit made by GitHub actions doesn't trigger another build. This is understandable, as you want to avoid endless builds, but sometimes it's needed, and then you need to do the ugly workaround with a PAT (and I believe it can't even be a fine-grained one). Combine that with policies that set a maximum time limit on tokens, your build becomes brittle, as now you need to chase down the person with admin access.
Then there is the issue of Docker actions. They tell you to pin the action to an sha to prevent replacements. Except the action itself points to a replaceable tag.
Lastly, there is a bug where when you create a report for your action, you cannot specify the parent it belongs to. So your ESLint report could be made a child of your coverage report.
I have never used a CI system more flaky and slow than GitHub Actions. The one and only positive thing about it is that you get some Actions usage for free.
The Azure machines GitHub uses for the runners by default have terrible performance in almost every regard (network, disk, CPU). Presumably it would be more reliable when using your own runners, but even the Actions control plane is flaky and doesn't always schedule jobs correctly.
We switched to Buildkite at $DAYJOB and haven't looked back.
I’ve seen many teams get stuck when they rely too heavily on GitHub Actions’ magic. The key issue is how tightly your build logic and config become tied to one CI tool. If the declarative YAML gets too big and tries to handle complex branching or monorepos, it devolves into a maintenance headache—especially when you can’t test it locally and must push blind changes just to see what happens.
A healthier workflow is to keep all the logic (build, test, deploy) in portable scripts and let the CI only orchestrate each script as a single step. It’s easier to troubleshoot, possible to run everything on a dev machine, and simpler if you ever migrate away from GitHub.
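A minimal sketch of such a script in Python, assuming a hypothetical `ci.py` at the repo root with hypothetical `build` and `test` tasks (the `src` and `tests` directories are assumptions too). The CI step and a developer would both run the same `python ci.py test`:

```python
import subprocess
import sys

# Hypothetical tasks: each name maps to the commands it runs, in order.
TASKS = {
    "build": [[sys.executable, "-m", "compileall", "-q", "src"]],
    "test": [[sys.executable, "-m", "unittest", "discover", "-s", "tests"]],
}


def run_task(name, runner=subprocess.run):
    """Run every command of a task, stopping at the first non-zero exit code."""
    for command in TASKS[name]:
        returncode = runner(command).returncode
        if returncode != 0:
            return returncode
    return 0


def main(argv):
    """Entry point for `python ci.py <task>`; returns the process exit code."""
    if len(argv) != 2 or argv[1] not in TASKS:
        print(f"usage: {argv[0]} [{'|'.join(TASKS)}]", file=sys.stderr)
        return 2
    return run_task(argv[1])
```

Ending the file with `sys.exit(main(sys.argv))` turns it into the one-liner the pipeline calls. The `runner` parameter exists only so the dispatch logic can be tested without spawning processes.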
For monorepos, required checks are maddening. This should be a first-class feature where CI can dynamically mark which checks apply on a PR, then require only those. Otherwise, you do hacky “no-op” jobs or you force your entire pipeline to run every time.
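To make the idea concrete, here is a rough sketch in Python of what "require only the checks that apply" could look like. The folder-to-check mapping and the check names are hypothetical, and this is not an existing GitHub feature:

```python
from pathlib import PurePosixPath

# Hypothetical mapping from top-level monorepo folder to its check name.
CHECKS_BY_FOLDER = {
    "web-app1": "web-app1-tests",
    "api1": "api1-tests",
}


def required_checks(changed_files):
    """Return the set of checks that apply to the given changed file paths."""
    required = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in CHECKS_BY_FOLDER:
            required.add(CHECKS_BY_FOLDER[parts[0]])
    return required


def unmet_checks(changed_files, statuses):
    """Return the sorted required checks that have not reported "success".

    `statuses` maps a check name to its reported status string.
    """
    return sorted(check for check in required_checks(changed_files)
                  if statuses.get(check) != "success")
```

A PR touching only `api1/` would then need `api1-tests` alone, and a merge would be allowed once `unmet_checks` returns an empty list.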
In short, GitHub Actions can be powerful for smaller codebases or straightforward pipelines, but if your repo is big and you want advanced control, it starts to feel like you’re fighting the tool. If there’s no sign that GitHub wants to address these issues, it’s totally reasonable to look elsewhere or build your own thin orchestration on top of more flexible CI runners.
There are a lot of subtle pitfalls as well. Like no default timeouts, excess permissions etc.
I wrote about it in detail https://ashishb.net/tech/common-pitfalls-of-github-actions/ And even created a tool to generate good configs http://github.com/ashishb/gabo
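As a rough illustration of those two pitfalls (a sketch only, not how gabo works), the check below walks an already-parsed workflow, represented as a plain dict, and flags jobs that set no `timeout-minutes` or that have no explicit `permissions` at either job or workflow level. Parsing the YAML is left out, and the function name is hypothetical:

```python
def lint_workflow(workflow):
    """Return warnings for jobs missing a timeout or explicit permissions.

    `workflow` is the workflow file already parsed into nested dicts.
    """
    warnings = []
    has_top_level_permissions = "permissions" in workflow
    for job_id, job in workflow.get("jobs", {}).items():
        if "timeout-minutes" not in job:
            warnings.append(f"{job_id}: no timeout-minutes set")
        if not has_top_level_permissions and "permissions" not in job:
            warnings.append(f"{job_id}: no explicit permissions")
    return warnings
```

A job such as `{"runs-on": "ubuntu-latest", "steps": []}` in a workflow without a top-level `permissions` key would produce both warnings.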
I tried to use GitHub Actions on Forgejo and... It's so much worse than using an actual CI pipeline.
With Woodpecker/Jenkins you know exactly what your pipeline is doing. With GitHub actions, not even the developers of the actions themselves know what the runner does.
> use GitHub Actions on Forgejo
What does this even mean? Are you talking about Forgejo Actions, or are you somehow hosting your code on a Forgejo instance but running CI through GitHub?
> With Woodpecker/Jenkins you know exactly what your pipeline is doing.
If you wrote it from the ground up, sure. On the other hand, I've inherited Jenkins pipelines that were written years before I got there and involved three to four different plugins, and they're way worse to work with than the GitHub Actions that I inherited.
> What does this even mean? Are you talking about Forgejo Actions, or are you somehow hosting your code on a Forgejo instance but running CI through GitHub?
yes, Forgejo Actions which is supposed to be a drop in replacement for GH Actions. You can say they're different things but the general idea and level of complexity is the same.
Best one I've used is the CI of sourcehut. So simple and so damn easy to set up.
You basically achieve the same result on github actions if you just ignore all of the github action yaml “magic” settings in the syntax and let your makefile/script do the logic which also makes it trivial to debug locally. But upvote because I do love sourcehut, it’s just so clean!
Yes, but with Sourcehut you can also SSH into the machine, making it trivial to debug.
You can't run AWS lambda or DynamoDB locally too (well you can but it's a hassle). So by that logic, we shouldn't use them at all. I don't like working with CI too but I'll take GitHub Actions over Jenkins/CircleCI/TravisCI any day.
> You can't run AWS lambda or DynamoDB locally too (well you can but it's a hassle). So by that logic, we shouldn't use them at all.
No, applying the logic to something like Lambda would mean implementing handlers like:
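A minimal sketch of that shape in Python, assuming a hypothetical `my_function` (playing the role of `myFunction`) that holds the real logic, and an event that carries a JSON string under `body`, which is an assumption about the trigger:

```python
import json


def my_function(name):
    """Plain business logic; hypothetical, and unaware of Lambda."""
    return f"Hello, {name}!"


def handler(event, context):
    """Thin Lambda entry point: unpack the event, delegate, wrap the result."""
    body = json.loads(event.get("body") or "{}")
    message = my_function(body.get("name", "world"))
    return {"statusCode": 200, "body": json.dumps({"message": message})}
```

Everything worth testing lives in `my_function`, which runs anywhere Python does.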
Then there's no need to go through the hassle of running Lambda functions locally, since we can just run `myFunction` locally instead.
Dynamo isn't the same, since it's just a service/API that we call; we don't implement its logic, like we do for CI tasks, Lambda functions, etc.
Whilst you're right that it's a hassle to run DynamoDB locally (although not too bad, in my experience); that's also not necessary. It's fine to run code locally which talks to a remote DynamoDB; just set up the credentials appropriately.
Yeah, and that doesn't stop us from using either of them. What I tried to convey is that GHA isn't ideal and it has a few warts but it's still better than most of the options available out there.
lambda: https://docs.aws.amazon.com/serverless-application-model/lat...
dynamodb: https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
doesn't seem any harder than running any other db
The problem with the analogy is that GHA's interface is quite thick.
GitHub Actions supporting yaml anchors would resolve one of the gripes, which I share.
https://github.com/actions/runner/issues/1182
Monorepos come with a different set of tradeoffs from polyrepos. Both have their pains. We have a similar setup with Jenkins, and have used CUE to tame a number of these issues. We did so by creating (1) a software catalog (2) per-branch config for versions and CI switches
Similarly, we are adopting Dagger, more as part of a larger "containerize all of our CI steps" effort, which works great for bringing parity to CI & local dev work. There are a number of secondary benefits and the TUI / WUI logs are awesome.
Between the two, I have removed much of the yaml engineering in my work
I hate GitHub Actions, and I hate Azure Pipelines, which are basically the same. I especially hate that GitHub Actions has the worst documentation.
However, I’ve come full circle on this topic. My current position is that you must go all-in with a given CI platform, or forego the benefits it offers. So all my pipelines use all features, to offer a great experience for devs relying on them: Fast, reproducible, steps that are easy to reason about, useful parameters for runs, ...
I am biased because I built the rust SDK for dagger. But I think it is a real step forward for CI. Is it perfect? Nope. But it allows fixing a lot of the shortcomings the author has.
Pros:
- pipeline as code, write it as golang, python, typescript or a mix of the above.
- Really fast once cached
- Use your languages library for code sharing, versioning and testing
- Runs everywhere: local, CI, etc. Easy to change from github actions to something else.
Cons:
- Slow on the first run. Lots of pulling of docker images
- The DSL and modules can feel foreign initially.
- Modules are definitely a framework. I prefer just having a binary I can ship (which is why the rust SDK doesn't support modules yet).
- Doesn't handle large mono repos well; it relies heavily on caching and currently runs on a single node. It can work if you don't have hundreds of services, especially if the builder is a large machine.
Just the fact that you can actually write ci pipelines that can be tested, packaged, versioned etc. allows us to ship our pipelines as products, which is quite nice and something we've come to rely on heavily.
I'm genuinely intrigued by Dagger, but also super confused. For example, this feels like extra complexity around a simple shell command, and I'm trying to grok why the complexity is worth it: https://docs.dagger.io/quickstart/test/#inspect-the-dagger-f...
I'm a fanboy of Rust, Containerization, and everything-as-code, so on paper Dagger and your Rust SDK seems like it's made for me. But when I read the examples... I dunno, I just don't get it.
It is a perfectly valid criticism. Dagger is not a full build system that dictates what your artifacts look like, unlike maybe something like bazel or nix. I think of dagger as a sort of interface that now allows me to test and package my build and ci into smaller logical bits and rely on the community for parts of it as well.
In the end you do end up slinging apt install commands for example, but you can test those parts in isolation. Does my ci actually scan this kind of vulnerability, install postgres driver, when I build a rust binary is it musl and working on scratch images.
In some sense dagger feels a little bit like a programmatic wrapper on top of docker, because that is actually quite close to what it is.
You can also use it for other things because in my mind it is the easiest way of orchestrating containers. For example running renovate over a list of repositories, spawning adhoc llm containers (pre ollama), etc. Lots of nice uses outside of ci as well even if it is the major selling point
I'm merely an outsider to Dagger, but I believe the page you linked to would give one the impression "but why golang[1] around some shell literals?!" because to grok its value one must understand that m.BuildEnv(source) <https://docs.dagger.io/quickstart/env#inspect-the-dagger-fun...> is programmatically doing what https://docs.github.com/en/actions/writing-workflows/workflo... would do: define the docker image (if any), the env vars (if any), and other common step parameters
That conceptually allows one to have two different "libraries" in your CI: one in literal golang, as a function which takes in a source and sets up common step configurations, and the other as Dagger Functions via Modules (<https://docs.dagger.io/features/modules> or <https://docs.dagger.io/api/custom-functions#initialize-a-dag...>) which work much closer to GHA uses: blocks from an organization/repo@tag style setup with the grave difference that they can run locally or in CI, which for damn sure is not true of GHA uses: blocks
The closest analogy I have is from GitLab CI (since AFAIK GHA does not allow yaml anchors nor "logical extension"):
1: I'm aware that I am using golang multiple times in this comment, and for sure am a static typing fanboi but as the docs show they allow other languages, too
Exactly. When comparing dagger with a lot of the other formats for ci, it may seem more logical. I've spent so much time debugging github actions, waiting 10 minutes for testing a full pipeline after a change etc. Over and over again. Dagger has a weird DSL in the programming languages as well but at least it is actual code that I can write for loops around or give parameters and reuse. That, instead of a groovy file for Jenkins ;)
Shameless plug but I built GitGuard (https://gitguard.dev) to solve the "Pull request and required checks" problem mentioned here (and other problems).
Basically: you set GitGuard as your required check and then write a simple GitGuard workflow like this:
Email in my bio for anyone interested.
I do not use GitHub Actions for these purposes, and if I did, I would want to ensure that it is a file that can run locally or whatever else just as well. I don't use GitHub Actions to prevent pull requests from being merged (I will always manage them manually), and do not use GitHub Actions to manage writing the program, for testing the program (it would be possible to do this, but I would insist on doing it in a way that is not vendor-locked to GitHub, and by putting most of the stuff outside of the GitHub Actions file itself), etc.
I do have a GitHub Actions file for a purpose which is not related to the program itself; specifically, for auto-assignment of issues. In this case, it is clearly not intended to run locally (although in this case you could do so anyways if you could install the "gh" program on your computer and run the command mentioned there locally, but it is not necessary since GitHub will do it automatically on their computer).
Why is this team sticking multiple directories that are “independent of each other” into a single repository? This sounds like a clear case of doing version control wrong. Monorepos come with their own set of challenges, and I don’t think there are many situations where they’re actually warranted. They certainly don’t help for completely independent projects.
No project within a single organization is completely independent. In general they all serve to meet a unified business objective and developers often need the global context on occasion.
I used to be a multirepo proponent, but have fallen in love with Bazel and “everything bagel” repo.
> No project within a single organization is completely independent. In general they all serve to meet a unified business objective and developers often need the global context on occasion.
Of course; I was only quoting the article. I am a firm believer in making things as simple as possible until you need something complicated, and monorepos/Bazel definitely get a “complicated” label from me.
Yeah, sounds like the problems are more due to monorepos rather than with GitHub actions. Seems like the pendulum always swings too far. Overdoing microservices results in redundant code and interservice spaghetti. Monorepos have their own set of issues. The only solution is to think carefully about what size chunk of functionality you want to build, test, and deploy as a unit.
For the first point, some monorepo orchestrators (I'm thinking of at least pnpm) have a way to run, for example, the tests for all the packages that have changed since the master branch plus all packages that transitively depend on those packages.
It's very convenient and avoids having to mess with the CI limitations on the matter.
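The selection itself is a small graph walk. A minimal sketch in Python, assuming the dependency graph is already available as a dict (package names are hypothetical, and this is not how pnpm implements it):

```python
from collections import deque


def affected_packages(changed, deps):
    """Return the changed packages plus everything that depends on them.

    `deps` maps each package to the set of packages it depends on.
    """
    # Invert the graph: for each package, who depends on it directly?
    dependents = {}
    for package, needs in deps.items():
        for dep in needs:
            dependents.setdefault(dep, set()).add(package)

    # Breadth-first walk from the changed packages through their dependents.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected
```

With `deps = {"web": {"ui", "api-client"}, "api-client": {"schema"}, "ui": set(), "schema": set(), "docs": set()}`, a change to `schema` selects `schema`, `api-client` and `web`, and leaves `ui` and `docs` alone.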
Just have Github Actions run a monorepo tool like turborepo. You're just trying to do too much in a yaml file... The solution for all build pipeline tools is always to do most of your build logic in a bash-script/makefile/monorepo-tool.
I use Github Actions as a fertile testing playground to work out how to do things locally.
For example, if you've ever had to wade into the codesigning/notarization quagmire, observing the methods projects use with Github Actions to do it, can teach you a lot about how to do things, locally.
Many if not all of the mentioned issues derive from the fact that nowadays pipelines are, most of the time, YAML based, which is a terrible choice for programming. You might want to take a look at Sparky, which is a 100% Raku CI/CD system that does not have many of the mentioned pitfalls and is super flexible…
Disclaimer: I am the tool author - https://github.com/melezhik/sparky
> It's a known thing that there is no way of running GitHub Actions locally. There is a tool called act but in my experience it's subpar.
I really hope there will be a nice, official tool to run gh actions locally in the future. That would be incredible.
In my newest hobby project, I decided to bite the bullet and use the flake.nix as single source of truth. And it's surprisingly fast! I used cargo-crane to cache rust deps. This also works locally just running "nix flake check". Much better than dealing with github actions, caches, and whatnot.
Apart from the nix gh action that just runs "nix flake check", the only other actions are for making a github release on certain tags, and uploading release artifacts - which is something that should be built-in IMO.
I once used Team City and Octopus Deploy in a company. And ever since then, dealing with Gitlab Pipelines and Github Actions, I find them so much poorer as a toolkit.
We are very much in the part of the platform cycle where best-in-breed is losing out to all-in-one. Hopefully we see things swing in the other direction in the next few years where composable best-in-breed solutions recapture the hearts and minds of the community.
It's simple. Don't believe that the company purchased by Microsoft wants anything other than for you to use more compute.
All OSS on Github gets free compute. I doubt they want you to waste it.
Every CI system has its flaws but GitHub Actions in my opinion is pretty nice, especially in terms of productivity: easy to set up, tons of prebuilt actions, lots of examples, etc.
I've used Tekton, Jenkins, Travis, Hudson, StarTeam, Rational Jazz, Continuum and a host of other CI systems over the years but GitHub Actions ain't bad.
I recommend trying Earthfiles: https://earthly.dev/earthfile
This basically brings docker layer caching to CI. Only things that changed are rebuilt and tested.
One option is to create your own CI; I think the other tools all have pros/cons.
This month I started creating our own CI tool for my team. I'm using Go, with a webhook that calls my API and applies whatever is needed.
I'm saying this because that way you can build the CI with exactly the features you need.
Google made Release-Please to make monorepo development easier, there is a GitHub Action for it in the marketplace. Would probably make things a lot cleaner for this situation.
Blindly using automation or implementing it without validation will always bite a person in the butt. Been there, done that. It is good, but it should always be event driven with a point of user validation.
Not sure if I am missing something but you can definitely run (some?) GH actions locally with act: https://github.com/nektos/act
Seen a couple posts on here say otherwise.
Act has limitations because GitHub Actions run via virtualization, while Act runs via containerization. This means that actions behave differently across the two platforms.
He mentioned act in the article.
My recent experience with Github Actions is that it will randomly fail running a pipeline that hasn't changed with an incomprehensible error message. I re-run the action a few hours later and it works perfectly.
This is great.
I also enjoy the "randomly and undebuggably hang until timeout" (25mins?) which is annoying and incomprehensible and costs money.
Rerunning the same jobs always passes.
GitHub Actions has supported your own locally hosted runners for years, so I presume "there is no way of running GitHub Actions locally" is referring to something else.
Where did the terrible idea of pipelines as config come from anyway?
> GitHub doesn't care
GitHub cares. GitHub cares about active users on their platform. Whether it's managing PRs, doing code reviews, or checking the logs of another failed action.
They don’t care about things that I care about, including everything the author talked about, and also things like allowing whitespace-ignore on diffs to be set on by default in a repo or per user - an issue that’s been open for half a decade now.
(Whitespace is just noise in a typescript repo with automatic formatting)
https://github.com/orgs/community/discussions/5486
GitHub often actively doesn't act in situations where acting would be prudent, which from an outside perspective portrays a disinterest in those who give their time to document shortcomings. Would you care to guess when the GitHub API was last updated? It's probably much longer ago than you'd think (2+ years at this point).
Welcome to the jungle.
https://medium.com/@bitliner/why-gitlab-can-be-a-pain-ae1aa6...
I think it’s not only GitHub.
Ideally we should handle it as any other code, that is: do tests, handle multiple environments including the local environment, lint/build time error detection etc
Unhappy to confirm that for any poor souls using Azure DevOps, it's even worse.
Agree. CI should be handled as code.
> My team consists of about 15 engineers
If it's not open source, I have no idea why you'd use GitHub at all. (And even then.)
Keep your eggs in your own nest.
What does this even mean?
I think devbox.sh would solve some of the issues, especially local development. You can also run devbox in CI
assuming that every folder is independent sounds like bad design.
if they're really independent put them in separate repos.
> Jenkins, TeamCity
Yeah-yeah, but it's not like they allow you to run your build definitions locally, nor do they address some other concerns. With GHA you may use nix-quick-install in a declarative manner, nixify your builds and then easily run them locally and under GHA. In the case of jenkins/tc you would have to jump through many more hoops.
That GH Actions and Azure Pipelines both settled for this cursed Yaml is hard to understand. Just make a real programming language do it! And ffs make a local test env so I can run the thing.
Op is lead for this product.