Automating Cobalt Strike with Python


I have expanded the payload_automation Python libraries with a Beacon class that lets you control actions in a Cobalt Strike Beacon synchronously. This lets you script Cobalt Strike actions purely in Python and avoid writing any Sleep at all (at least for the things I’ve already implemented).

One important thing to note is that the actions happen synchronously. Those who have worked in Sleep/Aggressor know that it’s a fire-and-forget language in most cases, so waiting until an action is completed or adding logic based on the results of an action is notoriously difficult. With this library, we can synchronize the actions and, in most cases, easily capture the output of a specific action in Python and act on that output. This is a big step in simplifying the automation of Cobalt Strike Beacons and opens the door to many different applications.

As an example of how this can be leveraged, I wrote a Threat Runner Python script which can take a SCYTHE Threat in JSON format and execute the actions through a Cobalt Strike Beacon. 

A Quick Recap of the Payload Automation Libraries

Back in June of 2021, I wrote a blog post to accompany the release of the Payload Automation libraries. As always, I owe a lot to @Mcgigglez16, who helped me develop and helps me maintain these libraries. I am always learning new things from him and he helps make sure that the code doesn't suck. At the time, those libraries lived up to their name and helped me automate all sorts of payloads for testing and for my team. Since then, we have worked to expand these libraries to go a bit beyond the scope defined by the name, and started to provide functions to help automate other things in Cobalt Strike, such as reporting and Beacon actions.

To quickly recap the approach taken to automate most of this, we leveraged Python’s pexpect library to control the headless Cobalt Strike Aggressor process, sending data to Aggressor and capturing its output. This allows us to use pexpect like a Python wrapper for Cobalt Strike. With this, we can execute nearly any Sleep function from Python and get the output back in Python for processing. The general idea of this approach came from the code of Verizon’s RedShell. We were also not the first to integrate Python and Sleep, as PyCobalt took tremendous steps in this direction. We took a different approach, though: we call Sleep from Python, while PyCobalt goes the other direction, calling Python from Sleep. Both have their advantages, and PyCobalt is definitely worth checking out and leveraging where it works best for you.
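The wrapper idea can be sketched roughly as follows. This is a minimal illustration, not the actual library code: the `ConsoleSession` class is hypothetical, and the `aggressor> ` prompt pattern and the `x <expression>` console command are assumptions about how the headless console behaves. The `child` argument is anything that offers pexpect's `sendline()`, `expect()`, and `before`, such as a `pexpect.spawn` object created with `encoding="utf-8"`.

```python
import re


class ConsoleSession:
    """Send one line to an interactive console and return what it printed.

    `child` must behave like a pexpect.spawn object opened in text mode:
    it needs sendline(), expect(), and a `before` attribute.
    """

    def __init__(self, child, prompt=r"aggressor.*> "):
        # Assumed prompt pattern for the headless console; adjust as needed.
        self.child = child
        self.prompt = re.compile(prompt)

    def wait_ready(self, timeout=60):
        # Block until the console shows its first prompt.
        self.child.expect(self.prompt, timeout=timeout)

    def evaluate(self, expression, timeout=60):
        # "x <expr>" is assumed to evaluate a Sleep expression and print it.
        line = "x " + expression
        self.child.sendline(line)
        # Block until the next prompt; everything printed before it
        # is then available in child.before.
        self.child.expect(self.prompt, timeout=timeout)
        output = self.child.before.replace("\r\n", "\n").strip()
        # A pseudo-terminal echoes the command back, so drop that echo.
        if output.startswith(line):
            output = output[len(line):].lstrip("\n")
        return output


# Hypothetical usage (requires pexpect and a reachable team server;
# the connection values below are placeholders):
#
#   import pexpect
#   child = pexpect.spawn("./agscript", ["HOST", "PORT", "USER", "PASSWORD"],
#                         encoding="utf-8")
#   session = ConsoleSession(child)
#   session.wait_ready()
#   print(session.evaluate("beacons()"))
```

Because `expect()` blocks until the prompt returns, each call to `evaluate()` comes back only once the console has finished printing, which is what makes the Python side feel synchronous.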

You can always grab the latest version of the libraries here:

Red Team Automation Tooling

In the past 5-10 years, Red Team Automation tooling has been on the rise, and more and more organizations are leveraging this type of tooling for Red Teaming and Purple Teaming.

I’m personally a huge fan of automation and look to automate everything that I can. “If it’s worth doing once, it’s worth automating,” that’s what I always say. 

I understand the push back against automation in the Red Teaming space. Red Teaming is very much an approach or a mindset to bring to a situation and isn’t always just a technical thing to be performed. The human element in Red Teaming is very important and finding a novel approach to a problem or defense isn’t something that can be automated (yet…).

But Red Teaming also can involve threat emulation and sometimes those emulations are based on well known details. Those details can come from personal experiences of the operator, Threat Intelligence sources, previous findings, insider knowledge, or any other source, but the details and approaches can often be mapped out decently in advance. This presents an opportunity for partial or full automation of an attack or attack path. 

I believe that value comes from automating some of these threats to make the attacks repeatable. This gives an organization the opportunity to compare various controls against an attack, providing a baseline against which to compare potential or actual changes. If you are hit by the same or nearly the same attack 6 months later, do your controls hold up the same, or do they get better or worse? This is a hard question to answer when the attacks are not automated, because a lot of variables come into play that can skew the answer.

I don’t believe that Red Teaming should ever be “alert validation,” but I do think there is value in being able to regularly repeat attacks and validate that alerts are firing as intended. This sounds contradictory, but what I mean is that Red/Purple Teams should not aim to trigger alerts just to validate them. They should instead be able to repeat an attack in the same way and help confirm that an alert which was expected to catch that attack did fire appropriately. To many, this is a subtle difference, but it is a very important one.

Automation also gives defenders the ability to replay attacks against the network for training opportunities or the opportunity for another analyst to work the same situation. If Analyst A passed on an alert, how do we know if the issue is the technology, the people, or the process, until we have given Analyst B the opportunity to work a similar alert? By being able to replay attacks, we can give multiple analysts an opportunity to work the same types of events and gain data points about the entire make-up of a SOC, people, processes, and technology, to determine areas of common weakness, and come up with focused points of improvement.

Certain tools have come up in the industry to meet such a need. Below are some examples:

Each of these tools has its benefits and drawbacks. Some are free, but somewhat complex to set up and maintain. Others are simple to set up, but light up EDRs like a Christmas tree or require allow-listing with EDR tools.

Sleep and Aggressor Challenges

One tool that does not show up on this list is Cobalt Strike. Although Cobalt Strike does have a built-in scripting engine called Aggressor, which is far better than most other C2 tools, it was intended to be a manual tool for Red Team operators. Aggressor is built on a language called Sleep, which is a Java-based scripting language written by Raphael Mudge. Aggressor handles nearly everything asynchronously, meaning that if I run a function that performs an action, my next function will run before that action completes. This really makes automating a C2 tool hard. 

Imagine that you are emulating a fairly sophisticated attacker in a production environment and your beacon first checks in at 2pm on a Friday. The attacker you are simulating operates with 1 hour callbacks. You could use Sleep to execute actions every hour and, between each action, well… sleep your automation script. This works great for 1-3 hours while the person is still signed in: one action per hour, great OPSEC. Then the person closes down their computer for the weekend and the beacon doesn’t check back in until 8am on Monday. What happened to your automation and OPSEC? All of a sudden, that beacon has 40 commands queued up to execute all at once. If you are up against any sort of behavioral monitoring solution, or if you trip an alert and the defender is looking at the surrounding activity, everything is blown and you just emulated a much weaker adversary. This is a big challenge with working in an asynchronous language. If we could synchronize the actions, then each action would not be queued and executed in the Beacon until the previous action was completed. This allows you to maintain the pace of the actual adversary you are attempting to emulate while still automating the actions.
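The weekend scenario can be illustrated with a toy simulation. Nothing here touches Cobalt Strike: the function names, the hour-based timeline, and the simplification that a synchronous runner hands the Beacon at most one task per check-in are all assumptions made purely to show the difference in pacing.

```python
def fire_and_forget(checkins, interval, count):
    """The script queues one task every `interval` hours, whether or not
    the Beacon is checking in. Returns {check-in hour: tasks received}."""
    queue_times = [i * interval for i in range(count)]
    delivered = {}
    pending = 0
    idx = 0
    for t in checkins:
        # Everything queued up to this check-in is delivered at once.
        while idx < count and queue_times[idx] <= t:
            idx += 1
            pending += 1
        if pending:
            delivered[t] = pending
            pending = 0
    return delivered


def synchronous(checkins, count):
    """The runner queues the next task only after the previous one has
    completed, so each check-in picks up at most one task (simplified)."""
    delivered = {}
    remaining = count
    for t in checkins:
        if remaining == 0:
            break
        delivered[t] = 1
        remaining -= 1
    return delivered


if __name__ == "__main__":
    # Hour 0 is Friday 2pm; the host goes offline after 5pm and the
    # Beacon returns Monday 8am, which is hour 66.
    checkins = [0, 1, 2, 3, 66, 67, 68]
    burst = fire_and_forget(checkins, interval=1, count=44)
    paced = synchronous(checkins, count=44)
    print(max(burst.values()))  # 40
    print(max(paced.values()))  # 1
```

In the fire-and-forget model, the Monday check-in receives every task queued over the weekend in one burst, while the synchronous model keeps to one task per callback no matter how long the host was offline.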

Another challenge working in Sleep is command output. Only two current Beacon API calls have callbacks, which is the ability to provide a function to execute at the end of the action. The two which have callbacks are bls() and bps(), and these have callbacks to support the UI in Cobalt Strike. Every other Beacon API call just tells the Cobalt Strike server to tell the Beacon to do something. It doesn’t provide you a way to get the results of the action you just told it to execute. No callback, no return value, nothing. Just fire and forget.

If you want the output of the command you just executed, you need to subscribe to the expected event, hope that you get the right event you are looking for, then parse the text output. One example would be running bexecute_assembly() to run Seatbelt on a host, then using `on beacon_output {}` to capture the output. But what happens if the output spans multiple callbacks, or a different beacon has some output from a different command, or Seatbelt fails because you messed up the path and it goes to beacon_error instead of beacon_output? What if you need to do something on the first beacon_output, but need to do something completely different on the next beacon_output? Aggressor does not support clearing or overriding the code set for events; it only appends your code to whatever was previously registered for that event. Any of these would quickly break nearly any Sleep automation script unless it was painstakingly built. Trust me, I’ve had to handle all of these situations with automation in Sleep and they are not fun.

Adding the Beacon Class

To solve many of these issues, I sat down and tried to tackle the problem with Python, by adding a Beacon class to the Payload Automation libraries. This class tracks data about the Beacon, command, result, and output history, and metadata. It gives every task a unique identifier as well, to make reporting easier. 
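To make the kind of bookkeeping described above concrete, here is a minimal sketch of a history-tracking structure. It is not the library's actual Beacon class: `TaskRecord`, `BeaconRecord`, and every field and method name are hypothetical, chosen only to show per-task unique identifiers alongside command, output, and result history.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TaskRecord:
    """One command sent to a Beacon, plus what came back (hypothetical)."""
    command: str
    # Unique identifier per task, handy for lining up report entries.
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    output: str = ""
    success: Optional[bool] = None  # None until the task completes


@dataclass
class BeaconRecord:
    """Metadata and task history for a single Beacon (hypothetical)."""
    beacon_id: str
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def start_task(self, command):
        # Record the task before it runs so failures are tracked too.
        task = TaskRecord(command)
        self.history.append(task)
        return task

    def finish_task(self, task, output, success):
        task.output = output
        task.success = success

    def report(self):
        # Flat view of the history for reporting purposes.
        return [(t.task_id, t.command, t.success) for t in self.history]
```

Recording the task before it executes means a command that never returns still appears in the history, with `success` left as `None`.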

With this class, I was able to start to synchronize actions in a Beacon and solve most of the issues that I ran into with Sleep. The next couple of sections will cover my approach to solving those issues.

Secret Ingredients: bclear() and regex, so much regex

My approach to synchronize the actions in a Beacon was two-fold. First, I would clear the activity in a Beacon with the bclear() API call. This helped to ensure that only my intended action would be executed, so that I get the output that I’m expecting, and not some other output. This is fairly critical for the next step.

Secondly, I used regex with pexpect and an event listener that targeted output specifically from my beacon. I know what you must be thinking, “You just said event listeners are bad because you can’t override previous event listeners, only append to them.” This is true, but only true per script instance. If I disconnect the Aggressor client, my event listeners are cleared. This is terrible when you have to work in the UI, but not an issue when in pure Python. This just means I have to kill my current Java Aggressor client process with the first event listener and spawn a new one and connect again. The old event listener dies with the old process, and the new one is safe to run and catch the next output. This makes it easy to do whatever we want with each output.
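A rough sketch of the matching step might look like this. The `<<BEGIN id>>` / `<<END id>>` markers are invented for illustration; they stand in for whatever an event listener would print around the output of one specific Beacon, and the function names are hypothetical.

```python
import re


def output_pattern(beacon_id):
    """Build a regex that only matches output wrapped in markers for
    the given Beacon ID (marker format is an assumption)."""
    bid = re.escape(beacon_id)
    return re.compile(
        r"<<BEGIN " + bid + r">>\r?\n"
        r"(?P<body>.*?)"             # lazy: stop at the first END marker
        r"\r?\n<<END " + bid + r">>",
        re.DOTALL,                   # let the body span multiple lines
    )


def extract_output(console_text, beacon_id):
    """Return the output body for `beacon_id`, or None if absent."""
    match = output_pattern(beacon_id).search(console_text)
    return match.group("body") if match else None


if __name__ == "__main__":
    text = (
        "<<BEGIN 222>>\nnoise from another beacon\n<<END 222>>\n"
        "<<BEGIN 111>>\nline one\nline two\n<<END 111>>\n"
    )
    print(extract_output(text, "111"))
```

pexpect's `expect()` accepts a compiled pattern like this one and exposes the resulting match object as `child.match`, so the same regex can both block until the output arrives and pull out the body.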

There is at least one downside to this approach: RIP my Event Log…:

Since I need to connect so often, the Event Log is just crushed. I highly recommend a Team Server dedicated to automation if you are using these libraries.

Once I am able to get the output for a specific command that I executed, I can then start to define success/failure criteria. To do this, I allow each function to provide some regex values which can help determine if a command succeeded or failed. If nothing is provided and there are no defaults for that function, then it is assumed successful.
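The success/failure logic described above could be sketched like this. The `judge` function and its parameters are hypothetical, and checking failure patterns before success patterns is an assumption; the only behavior taken from the description is that a command with no supplied patterns and no defaults is treated as successful.

```python
import re


def judge(output, success=None, failure=None,
          default_success=(), default_failure=()):
    """Decide whether a command succeeded based on regex criteria.

    Patterns supplied by the caller override the per-function defaults.
    """
    success = list(success) if success is not None else list(default_success)
    failure = list(failure) if failure is not None else list(default_failure)

    # Nothing provided and no defaults: assume the command succeeded.
    if not success and not failure:
        return True

    # Assumption: any failure match wins over a success match.
    if any(re.search(p, output) for p in failure):
        return False

    # If success patterns exist, at least one of them must match.
    if success:
        return any(re.search(p, output) for p in success)

    # Only failure patterns were given and none matched.
    return True


if __name__ == "__main__":
    print(judge("Access is denied.", failure=[r"(?i)access is denied"]))
    print(judge("anything at all"))
```

With this shape, each wrapped command can ship sensible default patterns while still letting the caller tighten or replace them for a specific run.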

With the output from commands available directly in Python, you can start to process that data in Python as well and make logic decisions based on the results of a previous command. In this screenshot, I was able to use PowerShell to enumerate the AppLocker policy and create a list of all the valid AppLocker bypasses to select one for my persistence option:

To anyone who has worked with Sleep before, it becomes very clear how difficult this would be to implement as part of a larger automation script, but in Python it can be accomplished in just a few fairly simple lines of code.
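As a rough idea of what that processing could look like, the sketch below parses AppLocker policy XML (the format produced by `Get-AppLockerPolicy -Effective -Xml`) and checks which candidate directories fall under an allowed Exe path rule. It is not the code from the screenshot: the function names, the candidate directory list, and the path-variable expansion are all illustrative assumptions, and it deliberately ignores rule exceptions, Deny rules, user SIDs, and whether a directory is actually writable.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Illustrative candidates only; writability must be verified on the target.
CANDIDATE_DIRS = [r"C:\Windows\Tasks", r"C:\Windows\Temp", r"C:\Users\Public"]

# Minimal expansion of AppLocker path variables (assumes a default install).
PATH_VARS = {"%WINDIR%": r"C:\Windows", "%PROGRAMFILES%": r"C:\Program Files"}


def allowed_exe_paths(policy_xml):
    """Collect the Path of every Allow file-path rule for executables."""
    root = ET.fromstring(policy_xml)
    paths = []
    for collection in root.iter("RuleCollection"):
        if collection.get("Type") != "Exe":
            continue
        for rule in collection.iter("FilePathRule"):
            if rule.get("Action") != "Allow":
                continue
            for condition in rule.iter("FilePathCondition"):
                paths.append(condition.get("Path"))
    return paths


def expand(path):
    """Replace the known path variables with concrete directories."""
    for var, value in PATH_VARS.items():
        path = path.replace(var, value)
    return path


def candidate_bypasses(policy_xml, candidates=CANDIDATE_DIRS):
    """Return the candidate directories covered by an allowed path rule."""
    patterns = [expand(p).lower() for p in allowed_exe_paths(policy_xml)]
    hits = []
    for directory in candidates:
        # Probe with a hypothetical file name inside the directory.
        probe = (directory + "\\payload.exe").lower()
        if any(fnmatch.fnmatchcase(probe, pat) for pat in patterns):
            hits.append(directory)
    return hits
```

The returned list is plain Python data, so choosing a persistence location becomes an ordinary conditional rather than a chain of event handlers.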

Leveraging the Payload Automation Beacon Class to Build a Threat Runner

Although I have my use cases for these libraries that I use at work, I can’t really share those, so I wanted to come up with a different use-case that would hopefully be helpful to a large audience. This is where I came up with the idea to write a PoC script which can use Cobalt Strike to execute publicly available threat playbooks. Since Cobalt Strike is used by a large number of Threat Actors, I thought it could be helpful to all the Threat Intelligence and Purple Teams out there to be able to get some quick and easy wins from emulating threats with Cobalt Strike. This would likely give the most realistic emulation of any of the tools out there when the actor you are emulating uses the same C2 tools in their attacks, especially if you can get details on their profile configurations.

I have released a demo in which I use the Threat Runner Python script to execute the Conti playbook released by SCYTHE as part of their Threat Thursdays. You can check out the demo here:

And you can play with the code here:

It is worth noting that this is truly a PoC runner and isn’t thoroughly tested with all of SCYTHE’s available Threats. SCYTHE also has native capabilities that Cobalt Strike doesn’t include, such as SCYTHE’s file module to create files on disk with random or specific data, or the crypt module for encrypting/decrypting files on the target host. It wouldn’t be overly difficult to duplicate the functionality of these modules with BOFs so that the code and logic for these modules would be client-side and not server-side. I may do that in the future to make this a little less hacky, but currently it is somewhat like I’m emulating their emulation.

