APIs have taken a foundational role in our society. They power communications, trade and even logistics at the heart of our economy. The API economy is continuing to grow at a steady clip, and recent research found that developer reliance on APIs accelerated during the pandemic and will continue to increase in 2021.
In parallel, the pace of software delivery has increased. Software delivery enables change, and change enables businesses to experiment, adapt, and thrive. If 2020 taught us anything, it is that the world and the market are always evolving. The ability to adapt quickly to the changing realities of the market will separate successful businesses from failing ones. As such, software delivery performance is critical to businesses across almost all industries.
Automate, or Lose
Embracing DevOps and the methods described in books like Accelerate is required for high performance in tech organizations. In other words: automate, or lose. Automate your builds, automate your tests, automate your deploys, automate your rollbacks. With the right automation, you’ll improve the four key indicators of software delivery performance, which in turn drive organizational performance:
Lead time: How quickly are commits deployed in production? The faster they are deployed, the faster the feedback loop is. Automating your builds, tests, and deployments reduces the lead time.
Deployment frequency: The more frequent the deployment, the smaller the batch size. Reducing batch size was one of the keys to the success of the Toyota Production System. If your deployments are automated, you can consider on-demand deployments after each change. That’s Continuous Delivery.
Mean Time to Restore: How quickly can you bring the service back up after a failure? Deployment automation — including rollback automation — combined with observability helps you restore your service faster.
Change Fail Percentage: The percentage of deployments that fail. Automating tests increases the success rate. Frequent deployments increase it too, since each deployment contains fewer changes.
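To make the four indicators above concrete, here is a minimal sketch of how they could be computed from a deployment log. The `Deployment` record and its field names are hypothetical, chosen for illustration; your CI/CD system will expose this data in its own shape.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a failure in production?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute the four key indicators over a reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        (d.restored_at - d.deployed_at).total_seconds() / 60
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "lead_time_hours": mean(lead_times),
        "deployments_per_day": len(deployments) / period_days,
        "mean_time_to_restore_minutes": mean(restore_times) if restore_times else 0.0,
        "change_fail_percentage": 100 * len(failures) / len(deployments),
    }


# Sample data for illustration only.
start = datetime(2021, 1, 4, 9, 0)
history = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4), failed=True,
               restored_at=start + timedelta(hours=4, minutes=30)),
]
print(delivery_metrics(history, period_days=1))
```

Tracking these numbers over time, rather than as a one-off snapshot, is what tells you whether a new piece of automation is actually paying off.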
The convergence of these two trends, the growth of APIs and the increased pace of software delivery, requires our industry to build better automation tools around API development, testing, and operations. The SmartBear State of the API 2020 report states: “For the second year in a row, Performance is rated by customers as the highest measure of API success at 72%. Second is the ability to ensure API Uptime/availability, cited by 52% of the customers surveyed.”
Automation can help with both of those.
The Benefits of Automation
Automation in your software delivery pipelines is like having checklists to prevent avoidable failures. The power of checklists cannot be overstated: A simple surgical checklist from the World Health Organization has been adopted in more than twenty countries as a standard for care and has been heralded as “the biggest clinical invention in thirty years”. Other industries like aviation, where reliability is paramount, have also adopted checklists. Automating the checks and steps to make a code change and safely ship it to production allows you to encode and remember expert knowledge. When a new mistake happens, automation can be updated to prevent it in the future.
The first benefit of automation is improved reliability. According to Gartner, the average cost of software downtime is $5,600 per minute. Because businesses operate so differently, the figure varies widely: downtime can cost about $140,000 per hour at the low end, $300,000 per hour on average, and as much as $540,000 per hour at the high end.
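As a quick sanity check on those figures, the per-minute average can be converted into the cost of an outage of any length. This is a back-of-the-envelope sketch; the function name is made up for illustration, and your own cost per minute will differ from the quoted average.

```python
COST_PER_MINUTE_USD = 5_600  # Gartner average quoted above


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimated cost in USD of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


print(downtime_cost(60))  # one hour at the average rate: 336000
print(downtime_cost(15))  # a 15-minute outage: 84000
```

At the average rate, one hour of downtime comes to $336,000, which is in line with the roughly $300,000 per hour average quoted above.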
The second benefit of automation is developer productivity. Automation frees talented and innovative engineers from the crushing toil of repetitive tasks and from responding to repeated and avoidable mistakes. It enables them to focus on delivering more value to your customers. The most common software engineer level at Google — L4 — pays an average of $264,000 in total compensation according to levels.fyi. With that kind of compensation, you want to make sure that your team is not wasting time on tasks that could be done better using automation. Having said that, it’s important that automated tools do not increase lead time, since that might cancel out productivity improvements.
Tools for Testing Automation
Using testing and security as an example, here are some tools that can help you improve automation:
Software Composition Analysis (SCA): Have you neglected to keep your dependencies up to date? Did someone recently find a security vulnerability in them? If so, your application might be exposed to a publicly disclosed vulnerability. Once vulnerabilities are public, attackers use that public information to write exploits. This practice is so commonplace that “Patch Tuesday”, the day Microsoft releases security patches, led to “Exploit Wednesday”. By analyzing the patches released on Tuesday, attackers can identify the vulnerabilities and leverage them to create attacks that still work against unpatched systems. This process happens so quickly that there is usually a rise in the number of attacks against unpatched systems shortly after patches are released.
Tools like Snyk or Dependabot help you manage your dependencies. Dependabot in particular automatically keeps your dependencies up to date by creating Pull Requests. Once those Pull Requests are merged, deployment automation can ship the changes to production. No developers need to be involved, as long as your deployment guard rails are automated. Security patches can be applied and deployed automatically, minimizing the time during which your production APIs are vulnerable because of a known flaw in your dependencies.
Example of an automatically-created Pull Request to update a dependency
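The core idea behind these tools can be sketched in a few lines: compare the versions you have pinned against the versions in which known vulnerabilities were fixed. This is a toy illustration, not how Snyk or Dependabot work internally. The `Advisory` record, the package names, and the version numbers are all hypothetical, and real version strings (pre-releases, build tags) need a proper version parser.

```python
from __future__ import annotations

from typing import NamedTuple


class Advisory(NamedTuple):
    """A known vulnerability (hypothetical record shape)."""
    package: str
    fixed_in: str  # first version that contains the patch


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def vulnerable_dependencies(pinned: dict[str, str], advisories: list[Advisory]) -> list[str]:
    """Return an upgrade suggestion for each pinned package below its patched version."""
    findings = []
    for advisory in advisories:
        installed = pinned.get(advisory.package)
        if installed is not None and parse_version(installed) < parse_version(advisory.fixed_in):
            findings.append(f"{advisory.package} {installed} -> upgrade to {advisory.fixed_in}")
    return findings


# Hypothetical data for illustration only.
pinned = {"examplelib": "1.2.9", "otherlib": "3.0.0"}
advisories = [Advisory("examplelib", "1.2.10"), Advisory("otherlib", "2.5.1")]
print(vulnerable_dependencies(pinned, advisories))
```

Note that the versions are compared as tuples of integers rather than as strings, since a plain string comparison would wrongly rank "1.2.9" above "1.2.10".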
Static analysis (SAST): A great example of useful static analysis is type checking, which is increasingly available for dynamic languages (Sorbet for Ruby, Mypy for Python, TypeScript for JavaScript). Those static checks prevent common mistakes, like forgetting to check for “null” or misspelling a variable name, which may impact your service’s availability if they reach production. For instance, Stripe built the Sorbet type checker after noticing that the most common failure cases for their services were “NoMethodError” (due to calling a method on a null object) and “NameError” (due to misspelling). For dynamic languages, the best type checkers allow for gradual addition of types, so that you can start getting benefits without changing your entire code base at once.
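Here is a small Python sketch of the kind of mistake a type checker catches. The functions and data are hypothetical; the point is that once `find_user` is annotated as possibly returning `None`, a checker such as Mypy can flag any caller that forgets to handle that case, before the code ever runs.

```python
from typing import Optional

USERS = {1: "ada", 2: "grace"}  # hypothetical in-memory data


def find_user(user_id: int) -> Optional[str]:
    """Return the user's name, or None if the id is unknown."""
    return USERS.get(user_id)


def greeting(user_id: int) -> str:
    name = find_user(user_id)
    # Without this check, a type checker such as Mypy reports that
    # `name` may be None, so `name.title()` could fail at runtime.
    if name is None:
        return "Hello, guest"
    return f"Hello, {name.title()}"


print(greeting(1))   # Hello, Ada
print(greeting(42))  # Hello, guest
```

Because only these two functions carry annotations, this also shows gradual typing in action: the rest of a code base can stay untyped while the riskiest paths get checked first.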
Security-oriented static analysis tools like Coverity, Semgrep, or LGTM.com may also be helpful as long as they do not produce too many false positives. Those tools help you detect performance and security issues in your APIs before those issues are merged and deployed to production, decreasing the Change Fail Percentage.
Example of an automatically-added Pull Request comment identifying a security bug before it’s deployed to production
Dynamic analysis (DAST): Fuzz testing has found hundreds of thousands of bugs in a variety of codebases; for example, the Linux kernel, Go, Python, SQLite, and gRPC. Fuzzing provides unexpected inputs that might not be included in the tests developers wrote, automatically increasing test coverage. While fuzzing made its name finding bugs in C/C++, it now applies to other languages and to other kinds of applications, such as APIs. Open source tools like ClusterFuzz and OneFuzz help you run fuzzing asynchronously to automatically generate a test suite that can then be used in your Pull Requests to detect regressions. Fuzzing web APIs in particular makes sure that your endpoints can handle not only the happy path but unexpected inputs as well. With it, you can find unhandled exceptions, security issues, and, because it exercises running code, performance regressions as well.
Example CI Integration for fuzzing
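To illustrate the principle, here is a deliberately naive random fuzzer written with only the Python standard library. It is not coverage-guided like ClusterFuzz or OneFuzz, and `handle_request` is a hypothetical stand-in for one of your API handlers. The harness throws odd and malformed request bodies at the handler and records any input that raises an unhandled exception or returns an unexpected status.

```python
from __future__ import annotations

import json
import random
import string


def handle_request(body: str) -> int:
    """Hypothetical API handler: returns an HTTP status code for a JSON body."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return 400
    if not isinstance(payload, dict):
        return 400
    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        return 422
    return 200


def random_body(rng: random.Random) -> str:
    """Generate either well-formed JSON with odd values or raw junk."""
    odd_values = [None, True, -1, 0, 1, 2**63, 1.5, "", "1", [], {}]
    if rng.random() < 0.5:
        return json.dumps({"quantity": rng.choice(odd_values)})
    length = rng.randint(0, 40)
    return "".join(rng.choice(string.printable) for _ in range(length))


def fuzz(iterations: int = 1000, seed: int = 0) -> list[str]:
    """Return the inputs that crashed the handler or produced an unexpected status."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        body = random_body(rng)
        try:
            status = handle_request(body)
        except Exception:  # an unhandled exception would be a 500 in production
            failures.append(body)
            continue
        if status not in (200, 400, 422):
            failures.append(body)
    return failures


print(fuzz())  # an empty list means no crashes were found in this run
```

The fixed seed makes each run reproducible, which matters in CI: any failing input the fuzzer finds can be saved and replayed as a regression test on later Pull Requests.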
Developers need reliability, performance, and security data before code gets deployed, all automatically generated. The tools mentioned above help you get there, enabling faster and higher quality software delivery.