Splunk Syntax Cheat Sheet



This is part six of the 'Hunting with Splunk: The Basics' series.

Splunk Syntax Cheat Sheet


If you have spent any time searching in Splunk, you have likely done at least one search using the stats command. I won't belabor the point, but it's such a crucial capability in the context of threat hunting that it would be a crime not to talk about it in this series.

When focusing on data sets of interest, it's very easy to use the stats command to perform calculations on any of the returned field values to derive additional information. When I say stats, I am not just referring to the stats command; there are two additional commands that are worth mentioning—eventstats and streamstats. Like many Splunk commands, all three are transformational commands, meaning they take a result set and perform functions on the data.

Let’s dive into stats.

Stats

The stats command is a fundamental Splunk command. It will perform any number of statistical functions on a field, which could be as simple as a count or average, or something more advanced like a percentile or standard deviation. Using the keyword by within the stats command can group the statistical calculation based on the field or fields listed.

Here is a good basic example of how to apply the stats command during hunting. I might hypothesize that the source/destination pairs with the largest number of connections originating in a specific netblock are worth digging into.


The search is looking at the firewall data originating from the 192.168.225.0/24 netblock and going to destinations that are not internal or DNS. The stats command is generating a count, grouped by source and destination address. Once the count is generated, that output can be filtered to drop pairs seen only once and then sorted from largest to smallest.

Another use for stats is to sum values together. A hypothesis might be to look at firewall traffic to understand who my top talkers to external hosts are, not from a connection perspective, but from a byte perspective. Using the stats command, multiple fields can be calculated, renamed and grouped.


In this example, the same data sets are used but this time, the stats command is used to sum the bytes_in and bytes_out fields. By changing the sort, I can easily pivot to look at the top inbound byte volumes or even the low talkers based on lowest byte count (which might be its own hypothesis). As a side note, if I saw the result set above I might ask why I am seeing many hosts from the same subnet all communicating to the same destination IP, with identical byte counts, both in and out. The point is there are numerous ways to leverage stats.
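The two stats searches described above, written out in SPL as an added example (the original post showed them only as screenshots, so these are not the author's exact searches). The index, sourcetype and field names `index=firewall`, `sourcetype=fw:traffic`, `src_ip`, `dest_ip` and `dest_port` are assumptions; `bytes_in` and `bytes_out` are the field names the post uses. "Not internal" is expressed as the three RFC 1918 ranges and "not DNS" as `dest_port=53`; Splunk accepts CIDR notation in a search term for IP-valued fields.

```spl
index=firewall sourcetype=fw:traffic src_ip=192.168.225.0/24
    NOT dest_ip=10.0.0.0/8 NOT dest_ip=172.16.0.0/12 NOT dest_ip=192.168.0.0/16
    NOT dest_port=53
| stats count by src_ip dest_ip
| where count > 1
| sort 0 - count

index=firewall sourcetype=fw:traffic src_ip=192.168.225.0/24
    NOT dest_ip=10.0.0.0/8 NOT dest_ip=172.16.0.0/12 NOT dest_ip=192.168.0.0/16
    NOT dest_port=53
| stats sum(bytes_in) as bytes_in sum(bytes_out) as bytes_out by src_ip dest_ip
| sort 0 - bytes_out
```

The first search is the connection-count hunt: `where count > 1` removes pairs seen only once and `sort 0 - count` orders the rest from largest to smallest (the `0` lifts the default 10,000-row sort limit). The second is the byte-volume hunt; the `as` clauses rename the sums back to the original field names so the table stays readable. Changing the final line to `sort 0 - bytes_in` pivots to top inbound volume, and `sort 0 bytes_out` surfaces the low talkers the post mentions.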

Eventstats

With these fundamentals in place, let’s apply these concepts to eventstats. I like to think of eventstats as a method to calculate “grand totals” within a result set that can then be used to further manipulate these totals to introspect the data set further.

Another hypothesis I might want to pursue is identifying and investigating the systems with the largest byte counts leaving the network; but to effectively hunt, I want to know all of the external hosts that my system is connecting to and how much data is going to each host.

Using the same basic search criteria as the earlier search, we augmented it slightly to exclude events whose bytes_out is zero, which keeps the result set cleaner. Eventstats calculates the sum of bytes_out, renames it total_bytes_out and groups it by source IP address. That output is then available as a field value on every event and can be output with additional Splunk commands.


The bands highlighted in red show the source IP address with the bytes_out summed to equal the total_bytes_out.
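The eventstats search described above, as an added example rather than the post's own screenshot. The same assumed index, sourcetype and field names as before (`index=firewall`, `sourcetype=fw:traffic`, `src_ip`, `dest_ip`, `dest_port`) are used; `bytes_out` and `total_bytes_out` come from the post.

```spl
index=firewall sourcetype=fw:traffic src_ip=192.168.225.0/24
    NOT dest_ip=10.0.0.0/8 NOT dest_ip=172.16.0.0/12 NOT dest_ip=192.168.0.0/16
    NOT dest_port=53 bytes_out!=0
| eventstats sum(bytes_out) as total_bytes_out by src_ip
| table src_ip dest_ip bytes_out total_bytes_out
| sort 0 - total_bytes_out src_ip
```

The distinction from stats is the point of the example: stats would collapse the events into one row per source, whereas eventstats leaves every event in place and stamps the per-source grand total onto each of them, so each row still shows the individual destination and its bytes_out next to the source's total. Every row that shares a src_ip carries the same total_bytes_out, which is what the red bands in the post's screenshot were highlighting.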

Another hypothesis that I could pursue using eventstats would be to look for systems that have more than 60 percent of their traffic going to a single destination. If a system is talking nearly exclusively to a single external host, that might be cause for concern or at least an opportunity to investigate further.

Common Splunk Queries

Going back to the earlier example that looked for large volumes of bytes_out by source and destination IP addresses, we could evolve this and use eventstats to look at the bytes_out by source as a percentage of the total byte volume going to a specific destination.


Building on the previous search criteria, I calculate the eventstats by summing the bytes_out grouped by source IP address to get that “grand total.” Now I can start transforming that data using stats like I did earlier and grouping by source and destination IP. If I stopped there, I would have the sum of the bytes_in, bytes_out, the total_bytes_out and the source and destination IP. That’s great, but I need to filter down on the outliers that I'm hypothesizing about.

Using the eval command, the bytes_out and total_bytes_out can be used to calculate a percentage of the overall traffic. At that point, I'm formatting the data using the table command and then filtering down on the percentages that are greater than 60 and sorting the output.

I now have a set of source IP addresses that I can continue to interrogate with the knowledge that a high percentage of the data is going to a single destination. In fact, when I look at my output, I find an interesting outcome which is that my top 14 source addresses are all communicating to the same external IP address. That alone might be something interesting to dig further on, or it might be a destination that should be whitelisted using a lookup. This approach though allows me to further refine my search and reinforce or disprove my hypothesis.
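The full percentage-of-total search built up over the preceding paragraphs, as an added example that is not the post's own code. Assumed names as in the earlier examples: `index=firewall`, `sourcetype=fw:traffic`, `src_ip`, `dest_ip`, `dest_port`; `bytes_in`, `bytes_out` and `total_bytes_out` are from the post, and `percent_of_total` is a name chosen here for the eval result.

```spl
index=firewall sourcetype=fw:traffic src_ip=192.168.225.0/24
    NOT dest_ip=10.0.0.0/8 NOT dest_ip=172.16.0.0/12 NOT dest_ip=192.168.0.0/16
    NOT dest_port=53 bytes_out!=0
| eventstats sum(bytes_out) as total_bytes_out by src_ip
| stats sum(bytes_in) as bytes_in sum(bytes_out) as bytes_out
        max(total_bytes_out) as total_bytes_out by src_ip dest_ip
| eval percent_of_total = round((bytes_out / total_bytes_out) * 100, 2)
| table src_ip dest_ip bytes_in bytes_out total_bytes_out percent_of_total
| where percent_of_total > 60
| sort 0 - percent_of_total
```

Because eventstats already made total_bytes_out identical on every event for a given src_ip, carrying it through the stats step with `max(total_bytes_out)` (or `values`) preserves it without double-counting. The eval then divides the per-pair bytes_out by the per-source grand total, and `where percent_of_total > 60` keeps only the sources that send more than 60 percent of their outbound volume to a single destination. If a destination turns out to be legitimate, the post's suggestion of allowlisting it with a lookup would be one more line before the `where`, for example `| lookup approved_destinations dest_ip OUTPUT approved | where isnull(approved)`, with `approved_destinations` being a hypothetical lookup you would have to define.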

Streamstats

On to streamstats. Streamstats builds upon the basics of the stats command but it provides a way for statistics to be generated as each event is seen. This can be very useful for things like running totals or looking for averages as data is coming into the result set.

If I were to take the results from our earlier hunt, I could further hypothesize that communications outbound from my host occur in bursts. I could then use streamstats to visualize and confirm that hypothesis.


Building off the previous example, the source IP address 192.168.225.80 generated 77% of its traffic to a specific destination. We could investigate further and look at the data volume over time originating from that address.

The search I start with is the same basic search as the other examples with one exception—the source is no longer a range but a specific address. Because I would like the information to aggregate on a daily basis, I'm sorting by date. Streamstats is then used to get the sum of the bytes_out, renamed as total_bytes_out and grouped by source IP address. Finally, we table the output, specifically date, bytes_out and the total_bytes_out.

The output can be viewed in a tabular format or visualized, preferably as a line chart or area chart. As you can see from the output, the daily bytes_out added to the previous day’s total_bytes_out will equal today’s total_bytes_out.
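The streamstats running-total search for the single host, as an added example and not the post's own screenshot. The post sorts by a `date` field whose derivation is not shown; here the events are first bucketed per day with `bin` and summed with stats, so that each row is one day's bytes_out and streamstats then accumulates those daily values in time order. Index, sourcetype and `src_ip`/`dest_ip`/`dest_port` names are assumptions as in the earlier examples; `bytes_out`, `total_bytes_out` and `date` are from the post.

```spl
index=firewall sourcetype=fw:traffic src_ip=192.168.225.80
    NOT dest_ip=10.0.0.0/8 NOT dest_ip=172.16.0.0/12 NOT dest_ip=192.168.0.0/16
    NOT dest_port=53 bytes_out!=0
| bin _time span=1d
| stats sum(bytes_out) as bytes_out by _time src_ip
| sort 0 _time
| streamstats sum(bytes_out) as total_bytes_out by src_ip
| eval date = strftime(_time, "%Y-%m-%d")
| table date bytes_out total_bytes_out
```

The `sort 0 _time` before streamstats matters: streamstats computes its aggregate in the order events arrive, so an unsorted result set would produce a running total that is not chronological. With the rows in date order, each day's total_bytes_out equals the previous day's total_bytes_out plus that day's bytes_out, which is exactly the relationship the post describes, and a line or area chart of total_bytes_out over date shows the bursts the hypothesis is about.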

Stats, eventstats and streamstats are all very powerful tools to further refine the result set to identify outliers within the environment. While this blog focused on network traffic and used sums and counts, there is no reason not to use it for host-based analysis as well as leveraging statistics like standard deviations, medians and percentiles.

Happy hunting!

When investigating compromised systems, the logging that incident responders and investigators need is often not enabled or configured. To help get system logging properly enabled and configured, below are some cheat sheets to help you do logging well, so that the data we all need is there when we go looking for it.

Cheat Sheets to help you in configuring your systems:

  • The Windows Logging Cheat Sheet Updated Feb 2019

  • The Windows Advanced Logging Cheat Sheet Updated Feb 2019

  • The Windows HUMIO Logging Cheat Sheet Released June 2018

  • The Windows Splunk Logging Cheat Sheet Updated Sept 2019

  • The Windows File Auditing Logging Cheat Sheet Updated Nov 2017

  • The Windows Registry Auditing Logging Cheat Sheet Updated Aug 2019

  • The Windows PowerShell Logging Cheat Sheet Updated Sept 2018

  • The Windows Sysmon Logging Cheat Sheet Updated Jan 2020

MITRE ATT&CK Cheat Sheets

  • The Windows ATT&CK Logging Cheat Sheet Released Sept 2018

  • The Windows LOG-MD ATT&CK Cheat Sheet Released Sept 2018

The MITRE ATT&CK Logging Cheat Sheets are available in Excel spreadsheet form on the following Github:


Update Log:

SysmonLCS:Jan 2020 ver 1.1

  • Fixed GB to Kb on log size

WSplunkLCS:Sept 2019 ver 2.22

  • Minor code tweaks, conversion


WSysmonLCS:Aug 2019 ver 1.0

  • Initial release

WRACS:Aug 2019 ver 2.5

  • Added a few more items


WSLCS:Feb 2019 ver 2.21

  • Fixed shifted box, cleanup only

WLCS:Feb 2018 ver 2.3

  • Added a couple items from Advanced

  • Adjust a couple settings

  • General Clean up

  • Referenced the Windows Advanced Logging Cheat Sheet


WALCS: Feb 2019 ver 1.2

  • Updated and added several items

WHLCS:June 2018 ver 1.0

  • Initial release

WFACS: Oct 2016 ver 1.2

  • Added a few new locations

WRACS: Oct 2016 ver 1.2

  • Added many autorun keys

  • Sorted the keys better

WSLCS:Mar 2018 ver 2.1.1

  • Fixed shifted box, cleanup only

WLCS:Jan 2016 ver 2.0


  • Added Event code 4720 - New user account created

  • Changed references to File and Registry auditing to point to the new File and Registry auditing Cheat Sheets

  • Expanded info on Command Line Logging


WRACS: Jan 2016 ver 1.1

  • Sort HKLM Keys

  • Added keys to monitor PowerShell and Command Line log settings

  • Updated HKCU and USERs.DEFAULT info

  • Added info about HKCU unable to be set in Security Templates

  • Added PowerShell script to set HKCU Registry Auditing