Threat Hunting With Pandas

Description: This is the introductory post in a threat hunting series on my blog, where I will dive into various threat samples (malware, pcaps, etc.) and discuss methods for discovering the threats they contain. In this particular post, I will use Pandas to dig into a pcap of the Foudre C2 backdoor, which can be found in this GitHub repo.

Getting Our Data Ready

The C2 data that we currently have is in pcap format, and Pandas can’t read a pcap directly, so we first need to convert it into something it can load and analyze efficiently.

To solve this, we can run Zeek against the pcap and use the -e argument to pass a redef that switches the generated logs to JSON, which we can then read with Pandas: /opt/zeek/bin/zeek -Cr C2_Foudre_Backdoor_DGA.pcapng -e 'redef LogAscii::use_json=T;'

This will output 4 log files:

http.log
packet_filter.log
files.log
conn.log

files.log seems pretty interesting so let’s transition to Pandas to analyze these files!
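The commands in the rest of this post refer to a dataframe named jsonPD. Here is a minimal sketch of how such a dataframe might be loaded, assuming Zeek wrote one JSON object per line. The helper name load_zeek_log is hypothetical, and the two sample records are a stand-in built from values shown later in this post rather than the real files.log:

```python
import io

import pandas as pd


def load_zeek_log(source):
    """Read a Zeek JSON log (one JSON object per line) into a DataFrame.

    `source` can be a file path such as "files.log" or a file-like object.
    """
    # lines=True makes pandas parse every line as its own JSON record.
    return pd.read_json(source, lines=True)


# Stand-in for files.log so the sketch runs without the pcap.
sample = io.StringIO(
    '{"id.orig_h": "10.0.2.15", "id.resp_h": "185.56.137.138", '
    '"id.resp_p": 80, "source": "HTTP", "mime_type": "text/html"}\n'
    '{"id.orig_h": "10.0.2.15", "id.resp_h": "185.56.137.138", '
    '"id.resp_p": 80, "source": "HTTP", "mime_type": "text/plain"}\n'
)

jsonPD = load_zeek_log(sample)
print(jsonPD.columns.tolist())
```

With the real data, the call would be load_zeek_log("files.log"), or load_zeek_log("http.log") for the HTTP log.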

Traffic Analysis

The files.log file doesn’t contain many entries, so we can perform some manual analysis.

I’ll start off by looking at the columns that are available, which can be done by reading the columns attribute of the dataframe:

print(jsonPD.columns)

ts
fuid
uid
id.orig_h
id.orig_p
id.resp_h
id.resp_p
source
depth
analyzers
duration
local_orig
is_orig
seen_bytes
missing_bytes
overflow_bytes
timedout
mime_type

Let’s take a look at the originating host, the responding host and port, the protocol each file was seen over (source), and the MIME type of each transfer: print(jsonPD[["id.orig_h", "id.resp_h", "id.resp_p", "source", "mime_type"]])

   id.orig_h       id.resp_h  id.resp_p source   mime_type
0  10.0.2.15  185.56.137.138         80   HTTP         NaN
1  10.0.2.15  185.56.137.138         80   HTTP   text/html
2  10.0.2.15  185.56.137.138         80   HTTP  text/plain
3  10.0.2.15  185.56.137.138         80   HTTP  text/plain

From this, we can see that 10.0.2.15 is reaching out to 185.56.137.138 on port 80 (HTTP). The first transfer has no identified MIME type, the second is an HTML file, and the last two are plaintext files. This looks like a possible GET request checking in with a C2 server for commands, followed by two requests that may send any requested data back to the C2 server.
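With only four rows this is easy to read by eye, but the same question can be answered with a groupby once a log gets larger. A small sketch, rebuilding the four rows above by hand (the names files, per_pair, and mime_counts are my own and not part of the original analysis):

```python
import pandas as pd

# The four files.log rows printed above, rebuilt by hand.
files = pd.DataFrame({
    "id.orig_h": ["10.0.2.15"] * 4,
    "id.resp_h": ["185.56.137.138"] * 4,
    "id.resp_p": [80] * 4,
    "source": ["HTTP"] * 4,
    "mime_type": [None, "text/html", "text/plain", "text/plain"],
})

# Count file transfers per (originator, responder, port) combination.
per_pair = (
    files.groupby(["id.orig_h", "id.resp_h", "id.resp_p"])
    .size()
    .reset_index(name="transfers")
)
print(per_pair)

# Count MIME types, keeping the row where Zeek could not identify one.
mime_counts = files["mime_type"].value_counts(dropna=False)
print(mime_counts)
```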

A quick search for 185.56.137.138 reveals that this IP address is possibly linked to an Iranian APT group.

Recall that we also have http.log to investigate, so let’s look at that file next. The columns in http.log include:

ts
uid
id.orig_h
id.orig_p
id.resp_h
id.resp_p
trans_depth
method
host
uri
version
user_agent
request_body_len
response_body_len
status_code
status_msg
tags
resp_fuids
resp_mime_types
orig_fuids
orig_mime_types

Let’s take a look at the hosts that are involved in this communication: print(jsonPD[["host"]])

           host
0  db54a845.top
1  db54a845.top
2  db54a845.top
3  db54a845.top

Cross-referencing this domain with VirusTotal, we see that it is flagged as being associated with spyware.

Let’s see which URIs on this domain the victim host is requesting: print(jsonPD[["method", "uri"]])

  method                                                                                                                            uri
0    GET                                                                                   /de/?d=2020298&v=00021&t=2020-10-24--7-28-20
1    GET                                                                                                    /de/db54a845.top2020298.sig
2    GET  /2014/?c=MSEDGEWIN10&u=IEUser&v=00021&s=TehN002&f=datadir1&mi=747f3d96-68a7-43f1-8cbe-e8d6dadd0358&b=64&t=2020-10-24--7-28-20
3   POST                                                                                                       /en/?2020-10-24--7-28-20
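The third URI packs several key/value pairs into its query string, and splitting them out makes them easier to read. The sketch below uses Python's standard urllib.parse. The helper name query_params is hypothetical, and reading c and u as a computer name and a username is my own guess based on their values:

```python
from urllib.parse import parse_qs, urlsplit

# The third URI from the http.log output above.
uri = (
    "/2014/?c=MSEDGEWIN10&u=IEUser&v=00021&s=TehN002&f=datadir1"
    "&mi=747f3d96-68a7-43f1-8cbe-e8d6dadd0358&b=64&t=2020-10-24--7-28-20"
)


def query_params(uri):
    """Return the query-string parameters of a URI as a flat dict."""
    query = urlsplit(uri).query
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = query_params(uri)
for key, value in params.items():
    print(f"{key} = {value}")
```

Note that the POST to /en/?2020-10-24--7-28-20 has no key=value pairs, so this helper returns an empty dict for it.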

We see 3 GET requests and 1 POST request. Let’s view the status message returned for each of these requests: print(jsonPD[["status_msg", "uri"]])

0      Found                                                                                   /de/?d=2020298&v=00021&t=2020-10-24--7-28-20
1         OK                                                                                                    /de/db54a845.top2020298.sig
2  Not Found  /2014/?c=MSEDGEWIN10&u=IEUser&v=00021&s=TehN002&f=datadir1&mi=747f3d96-68a7-43f1-8cbe-e8d6dadd0358&b=64&t=2020-10-24--7-28-20
3         OK                                                                                                       /en/?2020-10-24--7-28-20

So the only request that resulted in an error was the 3rd one, which was sent to /2014/?c=MSEDGEWIN10&u=IEUser&v=00021&s=TehN002&f=datadir1&mi=747f3d96-68a7-43f1-8cbe-e8d6dadd0358&b=64&t=2020-10-24--7-28-20 and came back as Not Found.
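On a larger log, failed requests can be pulled out with a filter instead of reading every row. This sketch rebuilds the four http.log rows by hand. The status_msg values come from the output above, while the status_code values (302, 200, 404, 200) are my assumption based on the standard codes for those messages, since the post does not print that column:

```python
import pandas as pd

# The four http.log rows from above, rebuilt by hand.
http = pd.DataFrame({
    "method": ["GET", "GET", "GET", "POST"],
    "uri": [
        "/de/?d=2020298&v=00021&t=2020-10-24--7-28-20",
        "/de/db54a845.top2020298.sig",
        "/2014/?c=MSEDGEWIN10&u=IEUser&v=00021&s=TehN002&f=datadir1"
        "&mi=747f3d96-68a7-43f1-8cbe-e8d6dadd0358&b=64&t=2020-10-24--7-28-20",
        "/en/?2020-10-24--7-28-20",
    ],
    # Assumed values: inferred from status_msg, not shown in the post.
    "status_code": [302, 200, 404, 200],
    "status_msg": ["Found", "OK", "Not Found", "OK"],
})

# Keep only client and server errors (4xx and 5xx).
errors = http[http["status_code"] >= 400]
print(errors[["method", "status_code", "status_msg", "uri"]])
```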

Next, let’s look at how close in succession these requests are being made by looking at the timestamp associated with each request. Each timestamp is in a Unix epoch format, but we can quickly convert each one and replace it within the dataframe by using Pandas' to_datetime function:

jsonPD["ts"] = pd.to_datetime(jsonPD["ts"], unit="s")
print(jsonPD["ts"])

0   2020-10-24 14:28:20.219521024
1   2020-10-24 14:28:20.348509952
2   2020-10-24 14:28:20.468709888
3   2020-10-24 14:28:20.662238976

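We can see that all four requests land within the same second, roughly 120 to 195 milliseconds apart. Printing the microsecond component of each timestamp makes this easier to see: print(jsonPD["ts"].dt.microsecond)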

0    219521
1    348509
2    468709
3    662238
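Rather than comparing the microsecond values by eye, the gap between consecutive requests can be computed directly with diff(). A short sketch, using the four timestamps above truncated to microseconds (the names ts and gaps_ms are my own):

```python
import pandas as pd

# The four request timestamps from above, truncated to microseconds.
ts = pd.Series(pd.to_datetime([
    "2020-10-24 14:28:20.219521",
    "2020-10-24 14:28:20.348509",
    "2020-10-24 14:28:20.468709",
    "2020-10-24 14:28:20.662238",
]))

# diff() gives the time since the previous row; the first row has no
# predecessor, so its gap is NaN.
gaps_ms = ts.diff().dt.total_seconds() * 1000
print(gaps_ms.round(1))

# Total time spanned by all four requests, in milliseconds.
span_ms = (ts.iloc[-1] - ts.iloc[0]).total_seconds() * 1000
print(round(span_ms, 1))
```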

If we wanted to go through the data frame and replace each timestamp instance with its human-readable format, we can use this:

for column in jsonPD:
    if column == "ts":
        jsonPD[column] = pd.to_datetime(jsonPD[column], unit="s")
        print(jsonPD[column].dt.microsecond)

With all four requests landing within about half a second, we can see that our victim host is making a rapid burst of GET and POST requests to a suspicious server over HTTP. Cross-referencing the destination host, we’re able to see that it’s a C2 server that’s potentially associated with an APT group.