Working with GitHub Actions (jeffrafter.com)
173 points by craigkerstiens on Sept 19, 2019 | hide | past | favorite | 30 comments


My experience with GitHub Actions has dramatically shifted as of last week. My first attempt at using them was catastrophic, and I decided it would never work for us. Fast forward about 2 months and everything seems to be working perfectly now. Setting up a test build for PRs on a .NET Core solution took ~10 lines of YAML, and we were in business. The ability to monitor the builds in real-time as well as link to specific line #s in each respective action log is an incredible feature. We will definitely be using this to link issues to build failures. Also, the ability to see how a build script will perform scoped to its PR means that you can review and approve not only the code aspect, but also any required changes to the build/deploy pipeline. Atomic commits across source & CI/CD is a very compelling idea.
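For a sense of scale, a PR-triggered .NET Core build really can fit in about that much YAML. A minimal sketch (file name, versions, and steps are illustrative, not the poster's actual config):

```yaml
# .github/workflows/pr-build.yml — illustrative sketch only
name: PR build
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - uses: actions/setup-dotnet@v1
        with:
          dotnet-version: '2.2'
      - run: dotnet build
      - run: dotnet test
```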

One final aspect we started investigating was how we could completely eliminate Jenkins from our stack. We have some deeply-customized jobs that run on that box (for handling bundling of build artifacts, managing metadata for final public releases, etc). What we will likely do is write a post-build agent in .NET Core, include it within the same project repository it will be responsible for working with, and then as part of the solution build it would just get taken care of automatically. Then, we simply need to invoke one more command after our initial dotnet build to trigger the package/deployment process, all from GitHub's machines.

If anyone from the GitHub team is browsing here, I was wondering if there were any plans for caching Action state between builds. That is to say, instead of downloading & installing the .NET Core 2.2 runtime every time an action executes, it could load from some cached instance (if available). This is the only downside I see right now - Jenkins can usually turn our check builds around in about 50% the time of the GitHub Actions, but for our purposes right now we honestly don't mind it. PRs don't need to merge every 60 seconds. Once we get to release builds in GitHub Actions, I think that tune might change a little bit.


Thanks for the feedback. We're working on caching now, which will help you with your built artifacts between runs. We're also working to tune the list of pre-installed software so that the more common packages (like .NET Core 2.2) are already on the runners and you don't need to download (or cache) them.


You should probably set up a custom Docker container to act as your "cached instance". If you're doing the same setup steps over and over for each build, and these setup steps are immutable, just throw them into a Dockerfile and create your own custom action environment.
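For instance (purely illustrative; the base image and install steps are placeholders for whatever your setup actually requires):

```dockerfile
# Illustrative sketch: bake the repeated, immutable setup into an image
# so each run starts from a warm environment instead of reinstalling.
FROM ubuntu:18.04
# e.g. pre-install the .NET Core SDK here (exact steps are placeholders)
RUN apt-get update && apt-get install -y --no-install-recommends \
    wget ca-certificates \
 && rm -rf /var/lib/apt/lists/*
```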


GitHub provides an action which might be the one you're looking for: https://github.com/actions/setup-dotnet

If you set a specific version, it can use a cached image and is a lot faster than a complete install.

Also have a look at their other `setup-*` actions for similar installations of Node, Python, Go...


Yep - This is exactly what we are using right now. It is still a bit slower compared to having it pre-installed on something like a persistent Jenkins instance, but it's such a marginal difference that we will probably overlook it assuming that new caching features roll out eventually.


I don't have a link handy but I remember someone asking for cached builds on Twitter and Nat Friedman responding that they were working on it!

EDIT: Found it! https://twitter.com/natfriedman/status/1164210683979812869 Looks like it is landing on November 13.


Always assume big enhancements and fixes will land around Satellite or Universe. Two major releases a year is a safe bet.


If you are a programming beginner or want a deep dive which teaches a bunch of unrelated topics, this is a great article!

But it took me longer to skim this article than to set up my first action. This makes things seem way more complicated than they actually are, IMO. All you need is 3 small files :)


Sorry in advance for the shameless plug.

If anyone is looking for such a get-started-right-now post, I've written one with a sample build-and-upload-to-s3 workflow [0].

It is indeed simple to get started, and I'm personally quite optimistic for Actions in the future!

[0] https://tpaschalis.github.io/gh-actions-go-s3/


The only downside to GH Actions rn is the inability to cache artifacts, but that'll change by November 13th at the latest.

https://twitter.com/natfriedman/status/1172387801054113793


How do people find GitHub Actions compared to GitLab CI?

I've had a first look through the documentation and saw a couple of interesting differences:

- Actions allow a job to be broken up into a series of (re-usable) steps

- Workflows can be triggered on a broader set of events
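That broader trigger set shows up in the `on:` block; a single workflow can react to several event types at once (the events below are illustrative):

```yaml
# Illustrative: one workflow reacting to pushes, PRs, issues, and a schedule
on:
  push:
    branches: [master]
  pull_request:
  issues:
    types: [opened, labeled]
  schedule:
    - cron: '0 6 * * *'
```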


GitHub Actions are pretty easy to get started with, even for CI beginners. Only a few things feel lacking:

1. Environment variables can only be defined on steps; they can't be defined at a job level, so the same code has to be repeated in each step.

2. There's no way to manually trigger a job.


> 1. Environment variables can only be defined on steps, they can't be defined at a job level

Actually you can define global environment variables with an extra step... here is how I do it. I was quite frustrated until I worked this out!

      - name: Set environment variables
        shell: bash
        run: |
          echo '::set-env name=GOPATH::${{ runner.workspace }}'
          echo '::add-path::${{ runner.workspace }}/bin'
          echo '::set-env name=GO111MODULE::${{ matrix.modules }}'
          echo '::set-env name=GOTAGS::${{ matrix.gotags }}'
          echo '::set-env name=BUILD_FLAGS::${{ matrix.build_flags }}'


That actually works! Thank you.


> Environment variables can only be defined on steps, they can't be defined at a job level. So the same code has to be repeated in each step.

I haven't been able to play with Actions to confirm this, but if they use a fully compliant YAML parser, you can use YAML anchors to do "copy-paste" without the copy-paste; I think GitLab actually recommended such an approach before they supported `extends:` in their pipeline document.

I'm on my phone or I would link to the YAML spec and GitLab examples in their docs for the gory details
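For reference, the anchor trick looks like this in plain YAML — whether Actions' parser actually honors it is exactly the open question (keys and values below are made up):

```yaml
# Plain-YAML sketch of anchors/aliases; NOT verified against Actions' parser
.shared-env: &shared-env
  GOPATH: /home/runner/work
  GO111MODULE: "on"

jobs:
  test:
    env: *shared-env   # alias expands to the anchored mapping
  build:
    env: *shared-env
```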


Looks like YAML anchors and aliases are not currently supported. https://github.community/t5/GitHub-Actions/Support-for-YAML-...


Regarding point #2, have you checked external triggers [0] in the documentation? When starting off, it was easier for me to just push an empty/pointless commit to get the whole process running, but setting up the webhook should also be easy.

[0] https://help.github.com/en/articles/events-that-trigger-work...
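A `repository_dispatch`-triggered workflow looks roughly like this (the event name is made up, and the preview `Accept` header was required for this API at the time):

```yaml
# Sketch of a manually-triggerable workflow via repository_dispatch.
# Fire it with something like:
#   curl -H "Authorization: token $GITHUB_TOKEN" \
#        -H "Accept: application/vnd.github.everest-preview+json" \
#        -d '{"event_type": "manual-build"}' \
#        https://api.github.com/repos/OWNER/REPO/dispatches
on: repository_dispatch
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "triggered manually"
```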


I tested our most complex pipeline on GitHub Actions (the newer version of the beta, which has the YAML specification, though I tried both). It's probably not very complex in absolute terms: maybe 8 distinct linters, unit tests, a build step with a TypeScript compile + docker build, and an image push.

While they're definitely trying out some novel ideas in the CI world, it was startlingly slow. Using as much parallelism as possible on every platform: CircleCI could do this pipeline in ~5 minutes, BuildKite (what we currently use), hosted on GCP n1-standard-1 instances, could do it in about 6, and GitHub Actions took closer to 10. I believe the main reason Circle was faster is that I took some extra time to set up their native caching for things like npm dependencies and build outputs between steps; we don't have this set up on BuildKite, and it might shave off a few seconds there (though if we're forced to egress these caches from a BuildKite DC, say in AWS, to our agents in GCP, maybe it wouldn't save any time; the tight integration of Circle is a big advantage here).

My theory is that the big slowdown is the downloading of docker images; not necessarily the beefiness of each build container.

On Circle, we avoided the use of docker and went straight native, except in the final steps where we're actually building an image of course.

On BuildKite, we're actively encouraged to use Docker to isolate builds. Given a single agent will be shared between multiple repositories, this makes total sense. But the agents are still durable; if a tslint step starts with a pull of the node:carbon image, that image will be cached on each machine that has ever run the build, practically indefinitely (until the machines are recycled). So the P99 of each pipeline can be higher than that 6-minute mark, but the vast majority of builds go much quicker.

On GHA, every step needs to be in a docker image (which I like!), but there doesn't appear to be any local image caching in the build containers. If they can add this in some capacity, I'd bet builds would be much faster. Arguably, they know at any time which images may be needed for a given repository, because the pipeline steps are defined statically in the repository, so they could feasibly keep those cached in a pool of workers that are then grabbed for that repo. Plus, I'd bet over 50% of builds start with one of the top 50 most popular images on Docker Hub (alpine, ubuntu, node, python, etc). There's no reason every agent couldn't have those pre-downloaded and ready to go, beyond cost.

Excited to see the product improve; I could easily see us switching and paying for it one day.


I've recently dumped a lot of time into making our CircleCI pipeline faster. I'd love to hear why you switched to BuildKite because we're looking at doing the same thing. My understanding is BuildKite has the following advantages:

- You can have persistent workers since the build machines run in your cluster.

- Much better Docker layer cache hit rates. We use custom Docker images and I've never seen a Docker image cache hit on our CircleCI machines.

- Better download throughput. CircleCI appears to be bandwidth limited to about 15MB per second.

- It's much more straightforward to programmatically generate workflows instead of hard-coding them in a multi-thousand-line YAML file.

Some of the low-hanging optimizations I've used to reduce build time on CircleCI are:

- Using a shallow git checkout. Reduced checkout time of our 500MB repo from 30 seconds to 2 seconds.

- save_cache into /dev/shm. This cut our 1GB node_modules (yea, it's huge, I know) save_cache and restore_cache from 30 seconds to 10 seconds. The HUGE downside is that /dev/shm is mounted noexec, so you can't run anything in node_modules/.bin without some hacks.
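A rough CircleCI sketch of both tricks (the `CIRCLE_*` env vars are real, but the cache keys and symlink handling here are simplified and untested):

```yaml
# Sketch only: shallow checkout + tmpfs-backed node_modules cache.
# Replace the built-in `checkout` step with a shallow clone:
- run:
    name: Shallow checkout
    command: |
      git clone --depth 1 --branch "$CIRCLE_BRANCH" \
        "$CIRCLE_REPOSITORY_URL" .

# Keep node_modules in /dev/shm so save/restore tars through RAM.
# Caveat: /dev/shm is noexec, so node_modules/.bin needs workarounds.
- restore_cache:
    keys:
      - deps-v1-{{ checksum "package-lock.json" }}
- run: |
    mkdir -p /dev/shm/node_modules
    ln -sfn /dev/shm/node_modules node_modules
    npm install
- save_cache:
    key: deps-v1-{{ checksum "package-lock.json" }}
    paths:
      - /dev/shm/node_modules
```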


We never really "switched" to BuildKite; it's what we started with, and it's been great for us. But I'm always experimenting with other options, which has recently included GHA and Circle.

Buildkite has pros and cons, like any CI platform. Managing your own agents is nice in the sense that you get really fine-grained control over their performance and cost. But, that being said, it would take some engineering to create an agent pool which can match the cost of something like Circle.

If you're running an agent 24/7 on, say, an EC2 instance, but you're not actually using it 24/7, there's serious cost overhead, and you'd see some nice savings on Circle. The great thing about BK is the flexibility; there's infinite potential to do some really smart things, like build an autoscaling agent pool. In fact, they have an AWS CloudFormation stack which does exactly that (we don't run it, but I've looked into it and it's really cool: a lambda cron calls the BK API to determine the number of outstanding jobs, dumps that into a CloudWatch metric, then an ASG scales to zero based on that metric). As far as I can tell, this is a "one-click deploy", so management should be easy.

We run our agents inside Kubernetes on GKE. We've had some minor issues with this; Docker-in-Docker is never fun. So we'll probably look at switching to that stack (and maybe set it up so that, instead of the automated ASG, we just have a timeboxed ASG between 9-5 M-F and scale to 1 instance outside of those hours, as we're pretty timezone-centralized). The advantage of GCP/GKE is that we're using preemptible instances; they're ridiculously easy to set up on GKE and you get that 70% cost savings. The only downside is that, every once in a while, a build just fails because the underlying instance restarts. At our scale, not a huge deal; just re-run it. Maybe happens once a week.

Again, it's trade-offs. You can go with more persistent workers and get amazing docker layer image caching, or you can go the auto-scaling route and save a ton on costs. It shouldn't be impossible to find a balance in there which works for your company, and I like that about BuildKite.

I don't think we'd ever switch to Circle, just because I don't see a huge benefit in doing so. We may, one day, switch to GHA. We already use Github, so that integration is nice. If the performance is looking good and the cost is acceptable, then we'd do it; CI is so fungible that it really comes down to minutia like this.


I've been working on a hosted tool that uses some kernel trickery to speed up big CI jobs like this - would you be willing to email me more details about your use case? It would really help me prioritize features for my beta!

colin@layerci.com


They're working on it. In a tweet Nat replied "Dependency caching is coming this year! Biggest gap we know we have to fix."[1]

[1] - https://twitter.com/natfriedman/status/1166757981637115904


That's awesome, this would fix one of my largest pain points as well!


Totally agree. Layer caching would also be a big win when your docker images change. In the post I focus on the JavaScript workflow but it is totally feasible to skip docker, skip JavaScript and pull pre-compiled binaries (especially given that you know the target architecture). Ultimately the real benefit will show on large repositories where checkout time is minimal because of lower transfer latency.


We changed the architecture quite a bit during the beta period to address feedback like this. With what's available today you don't need to run all your steps in a container. You _can_ take advantage of containers, but you don't _need_ to. We host Linux, Windows, and macOS virtual machines for you to use, and you can just define the commands to run on those machines.

We're working on artifact caching between runs now, which I think will also help some of the speed issues that you're referring to.

We really appreciate the feedback; if you do get around to trying it again, I'd love to know more about what we could improve.


> On GHA, every step needs to be in a docker image (which I like!)

The new actions (syntax) runs on Azure and has native VMs you can use to run commands, but you can still mix and match docker images and native commands if you'd like.


How do they host the Mac workers? Azure doesn't seem to offer Mac VMs?!


> GitHub uses MacStadium to host the macOS virtual environments.

https://help.github.com/en/articles/virtual-environments-for...


Has anyone switched from Google Cloud Build to GitHub Actions CI? What has the experience been?


tl;dr: The author motivates the solution given in the article but does not explain the problem GitHub Actions solves. I would have liked to know what to expect before dumping an article into my brain.



