Threat Hunting With Pandas
Description: This is the introductory post in a threat hunting series on my blog, where I will dig into various threat samples (malware, pcaps, etc.) to discuss methods for discovering threats. In this post, I use Pandas to analyze a pcap of the Foudre C2 backdoor, which can be found in this GitHub repo.
Getting Our Data Ready
The C2 data that we currently have is in pcap format, but Pandas can’t analyze a file in that format efficiently.
To solve this, we can run zeek against the capture and use the -e argument to redefine LogAscii::use_json, so that the log files it generates are written as JSON, which we can then read with Pandas:
/opt/zeek/bin/zeek -Cr C2_Foudre_Backdoor_DGA.pcapng -e 'redef LogAscii::use_json=T;'
This will output 4 log files:
http.log
packet_filter.log
files.log
conn.log
files.log seems pretty interesting, so let’s transition to Pandas to analyze these files!
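The loading step is not shown in this post, so here is a minimal sketch of one way to get a Zeek JSON log into a dataframe. It assumes the log is newline-delimited JSON (one object per line), which pd.read_json can read with lines=True. The two sample records are illustrative stand-ins built from values that appear later in this post, with only a few fields included, and load_zeek_json is a hypothetical helper name.

```python
import io

import pandas as pd


def load_zeek_json(source):
    """Read a Zeek JSON log (one JSON object per line) into a dataframe.

    `source` may be a file path such as "files.log" or a file-like object.
    """
    return pd.read_json(source, lines=True)


# Illustrative sample standing in for files.log (assumption: trimmed fields).
sample_log = io.StringIO(
    '{"ts": 1603549700.219521, "id.orig_h": "10.0.2.15", '
    '"id.resp_h": "185.56.137.138", "id.resp_p": 80, "source": "HTTP"}\n'
    '{"ts": 1603549700.34851, "id.orig_h": "10.0.2.15", '
    '"id.resp_h": "185.56.137.138", "id.resp_p": 80, "source": "HTTP", '
    '"mime_type": "text/html"}\n'
)

jsonPD = load_zeek_json(sample_log)
print(jsonPD.columns.tolist())
```

With the real logs, the same helper would be called with a path, for example load_zeek_json("files.log"). Fields that are missing from a record, such as mime_type in the first sample line, show up as NaN.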
Traffic Analysis
The files.log file doesn’t contain many entries, so we can perform some manual analysis. I’ll start off by looking at the columns that are available, which can be done through the columns attribute of the dataframe (jsonPD here is the dataframe that holds the loaded log):
print(jsonPD.columns)
ts
fuid
uid
id.orig_h
id.orig_p
id.resp_h
id.resp_p
source
depth
analyzers
duration
local_orig
is_orig
seen_bytes
missing_bytes
overflow_bytes
timedout
mime_type
Let’s take a look at the originating host, destination host, destination port, source protocol, and MIME type of these entries:
print(jsonPD[["id.orig_h", "id.resp_h", "id.resp_p", "source", "mime_type"]])
   id.orig_h       id.resp_h  id.resp_p source   mime_type
0  10.0.2.15  185.56.137.138         80   HTTP         NaN
1  10.0.2.15  185.56.137.138         80   HTTP   text/html
2  10.0.2.15  185.56.137.138         80   HTTP  text/plain
3  10.0.2.15  185.56.137.138         80   HTTP  text/plain
From this, we can see that 10.0.2.15 is reaching out to 185.56.137.138 on port 80 (HTTP). The first entry has no detected MIME type, the second is an HTML file, and the last two are plaintext files. This looks like a possible GET request checking in with a C2 server for commands, followed by two requests that may deal with sending any requested data to the C2 server.
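With only four rows this is easy to read by eye, but on a larger files.log the same question can be answered with a quick aggregation. The sketch below rebuilds the four rows shown above in a dataframe (files_df is a name chosen for this example) and counts transfers per host pair and per MIME type.

```python
import pandas as pd

# Values copied from the files.log output shown above.
files_df = pd.DataFrame(
    {
        "id.orig_h": ["10.0.2.15"] * 4,
        "id.resp_h": ["185.56.137.138"] * 4,
        "id.resp_p": [80] * 4,
        "source": ["HTTP"] * 4,
        "mime_type": [None, "text/html", "text/plain", "text/plain"],
    }
)

# Count file transfers per (origin, destination, port) combination.
pair_counts = (
    files_df.groupby(["id.orig_h", "id.resp_h", "id.resp_p"])
    .size()
    .reset_index(name="transfers")
)
print(pair_counts)

# Count MIME types, keeping the entry where none was recorded.
mime_counts = files_df["mime_type"].value_counts(dropna=False)
print(mime_counts)
```

A single host pair accounting for every transfer, as it does here, is one simple signal worth checking against threat intelligence.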
A quick search for the 185.56.137.138 IP address reveals that it is possibly linked to an Iranian APT group.
Recall that we also have http.log to investigate, so let’s look at that file next. The columns associated with http.log include:
ts
uid
id.orig_h
id.orig_p
id.resp_h
id.resp_p
trans_depth
method
host
uri
version
user_agent
request_body_len
response_body_len
status_code
status_msg
tags
resp_fuids
resp_mime_types
orig_fuids
orig_mime_types
Let’s take a look at the hosts that are involved in this communication:
print(jsonPD[["host"]])
host
0 db54a845.top
1 db54a845.top
2 db54a845.top
3 db54a845.top
Cross-referencing this domain with VirusTotal, we see that it is reported as being associated with spyware.
Let’s see which URIs on this domain the victim host is requesting:
print(jsonPD[["method", "uri"]])
method uri
0 GET /de/?d=2020298&v=00021&t=2020-10-24--7-28-20
1 GET /de/db54a845.top2020298.sig
2 GET /2014/?c=MSEDGEWIN10&u=IEUser&v=00021&s=TehN002&f=datadir1&mi=747f3d96-68a7-43f1-8cbe-e8d6dadd0358&b=64&t=2020-10-24--7-28-20
3 POST /en/?2020-10-24--7-28-20
We see 3 GET requests and 1 POST request. Let’s view the status message returned for each of these requests:
print(jsonPD[["status_msg", "uri"]])
status_msg uri
0 Found /de/?d=2020298&v=00021&t=2020-10-24--7-28-20
1 OK /de/db54a845.top2020298.sig
2 Not Found /2014/?c=MSEDGEWIN10&u=IEUser&v=00021&s=TehN002&f=datadir1&mi=747f3d96-68a7-43f1-8cbe-e8d6dadd0358&b=64&t=2020-10-24--7-28-20
3 OK /en/?2020-10-24--7-28-20
So the only request that resulted in an error was the third one, which returned Not Found for /2014/?c=MSEDGEWIN10&u=IEUser&v=00021&s=TehN002&f=datadir1&mi=747f3d96-68a7-43f1-8cbe-e8d6dadd0358&b=64&t=2020-10-24--7-28-20.
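That third URI packs a lot into its query string, so it can help to split it into individual parameters. Here is a small sketch using Python’s standard urllib.parse module on the URI copied from the output above. The c and u values look like a computer name and a user name, but that reading is an assumption and not something the capture confirms.

```python
from urllib.parse import parse_qsl, urlsplit

# URI copied from the http.log output shown above.
uri = (
    "/2014/?c=MSEDGEWIN10&u=IEUser&v=00021&s=TehN002&f=datadir1"
    "&mi=747f3d96-68a7-43f1-8cbe-e8d6dadd0358&b=64&t=2020-10-24--7-28-20"
)

# Separate the path from the query string, then split the query into pairs.
parts = urlsplit(uri)
params = dict(parse_qsl(parts.query))

print(parts.path)
for key, value in params.items():
    print(f"{key} = {value}")
```

The same function could be applied to the whole uri column with jsonPD["uri"].apply(...) to compare parameters across requests.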
Next, let’s look at how close in succession these requests are being made by looking at the timestamp associated with each request. Each timestamp is in Unix epoch format, but we can quickly convert the ts column and replace it within the dataframe by using the Pandas to_datetime function:
jsonPD["ts"] = pd.to_datetime(jsonPD["ts"], unit="s")
print(jsonPD["ts"])
0 2020-10-24 14:28:20.219521024
1 2020-10-24 14:28:20.348509952
2 2020-10-24 14:28:20.468709888
3 2020-10-24 14:28:20.662238976
We can see that they appear in close succession, all within the same second and roughly 120 to 195 milliseconds apart. The microsecond component of each timestamp shows this:
print(jsonPD["ts"].dt.microsecond)
0 219521
1 348509
2 468709
3 662238
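The microsecond field only shows the sub-second part of each timestamp. To measure the spacing directly, one option is to take the difference between consecutive rows with Series.diff. The sketch below uses the four timestamps copied from the converted output above; ts and gaps_ms are names chosen for this example.

```python
import pandas as pd

# Timestamps copied from the to_datetime output shown above.
ts = pd.Series(
    pd.to_datetime(
        [
            "2020-10-24 14:28:20.219521024",
            "2020-10-24 14:28:20.348509952",
            "2020-10-24 14:28:20.468709888",
            "2020-10-24 14:28:20.662238976",
        ]
    )
)

# Gap between each request and the previous one, in milliseconds.
# The first row has no predecessor, so its gap is NaN.
gaps_ms = ts.diff().dt.total_seconds() * 1000
print(gaps_ms.round(1))

# Total time covered by all four requests, in seconds.
total_span_s = (ts.iloc[-1] - ts.iloc[0]).total_seconds()
print(round(total_span_s, 3))
```

On a longer capture, unusually regular gaps between requests to the same host can be a hint of beaconing, although four requests are too few to say much about regularity.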
If we wanted to go through the dataframe and replace each timestamp column with its human-readable format, we can use this:
for column in jsonPD:
    if column == "ts":
        jsonPD[column] = pd.to_datetime(jsonPD[column], unit="s")
        print(jsonPD[column].dt.microsecond)
Because these requests occur in such close succession, all four within half a second, they are most likely automated: our victim host is making GET and POST requests over HTTP to a suspicious server. Cross-referencing the destination host, we’re able to see that it’s a C2 server that’s potentially associated with an APT group.