Forensic timeline Splunking
Fast and powerful searching of timeline data
Computer forensic timeline analysis has come a long way in recent years. I was first introduced to “super timelines” by Rob Lee in the SANS 508 course in 2010 and have been a big fan of the tools and methodology since then.
By using fls and log2timeline to extract file system and other temporal data, a computer forensic investigator can effectively reconstruct many system and user activities on a computer system. However, one challenge this creates is efficiently analysing the mass of data that’s extracted.
I’ve been looking at various methods to solve this and have found Splunk to be one of the most powerful. I'm pleased to present the results of this work today at Ruxcon 2011.
This blog provides a summary of the process of creating a super timeline and analysing it in Splunk, plus the files you'll need to customise Splunk to analyse timeline data.
The slides from my presentation are also available here. Slides from SANS 508 course are provided with permission from SANS.
Time travel anyone?
My only minor credit in this process is the application of Splunk to timeline analysis. All other tools and methods are thanks to the outstanding work of others, including:
- Kristinn Gudjonsson for developing log2timeline
- Rob Lee for his guidance and support of timeline analysis and his work through SANS
- Brian Carrier for developing the Sleuthkit
- Earlier work by Dan Farmer and Wietse Venema in The Coroner’s Toolkit
- And all the team at Splunk
There are lots of reasons to like timeline analysis.
Context :: Computer forensics is largely about context. Timeline analysis reverses the traditional paradigm where investigators focus on finding a specific artefact to prove a suspicion, such as blindly running keyword searches. Timelines let the data speak for itself, thereby not only finding the key evidence, but often also uncovering other unexpected artefacts to corroborate and clarify our key findings.
Speed :: Secondly, timeline analysis can quickly identify key artefacts and focus further analysis. Temporal data can typically be extracted from a computer and loaded into an analysis tool like Splunk within an hour, or in some cases less than ten minutes. Splunk will comfortably work even on basic hardware. I've personally used it in court on my MacBook Air to instantly validate statements being made about system activity while witnesses were giving evidence. However you should be prepared to spend some quality time analysing your search results in detail.
Knowledge :: Finally, don’t let the clean interface of Splunk fool you; this is not point-and-click forensics. Timeline analysis requires creativity and knowledge of the underlying evidence to use effectively. It often creates (initially, at least) as many further questions as answers. Why is that timestamp so? What’s going on between these two files? What’s this strange system activity in the background? Why isn't that timestamp the way I expected? Analysing the apparent anomalies that timelines present demands further research and testing from the examiner, which promotes constant learning and development. It’s hard work, but makes us better examiners for it.
Super timeline creation
These instructions describe my common steps to extract timeline data. Your needs may vary, so refer to the man pages as required.
The tools can be obtained separately from these places:
A great alternative for getting Sleuthkit and log2timeline – and many more forensic tools – is to use the SIFT Workstation, a prebuilt Linux ISO lovingly constructed, patched and configured by Rob Lee of the SANS Institute. Simply download and run in VMware to get an instant forensic analysis environment with minimal setup. It doesn't get much easier!
Splunk needs some basic customisation to work cleanly with timeline output. The settings are provided in two files named props.conf and transforms.conf, which can be downloaded from these links and copied to the /etc/system/local directory of your Splunk installation (not the /etc folder of the OS itself).
That’s it! Splunk is now ready to ingest your timeline data.
Choosing mactime or log2timeline output format
There are several options for outputting timeline data, the main ones being the older mactime format and the newer log2timeline CSV format, which provides more fields and therefore more granular searching. I've based these instructions on the mactime format, but have created custom field identifiers for both in the Splunk customisation files described above. The choice is yours.
Step 1: Extract file system timestamps
Start by extracting file system metadata using fls from Sleuthkit, which should be run against a disk image or block device (e.g. /dev/sda or /dev/sda1):
fls -m "" -o offset -r image.dd > fls.body
- -m "" = output in mactime body format; the quoted string is a mount point prefix prepended to each file path, which we'll leave blank
- -o offset = sector offset to the partition
- -r = recurse through the file system
- image.dd = disk image name (or could use a block device)
- fls.body = output file name; "body" denotes a mactime body file
The output looks something like this:
Not particularly useful for most humans, so convert into a more readable CSV format with mactime like so:
mactime -b fls.body -d > fls.csv
- -b fls.body = the body file to parse
- -d = output in CSV format
- fls.csv = output file name
Now the output should look something like:
Thu Nov 17 2011 11:19:59,282,m.c.,r/rr-xr-xr-x,0,0,190064-128-1,"/Users/Scott/Desktop/desktop.ini"
Ahh, that's better.
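For a quick look at a single column before the data reaches Splunk, a row like the one above can be split with awk. This is a minimal sketch, assuming the standard mactime -d column order (date, size, type, mode, UID, GID, meta, file name) and that only the last column can contain commas. The helper name mactime_field is hypothetical and not part of Sleuthkit.

```shell
# Print column N (1-8) of each mactime CSV row read from stdin.
# Assumption: columns 1-7 never contain commas; column 8 (file name) may.
mactime_field() {
    awk -v want="$1" '
    {
        line = $0
        # Peel off the first seven comma-free columns one at a time.
        for (i = 1; i <= 7; i++) {
            pos = index(line, ",")
            if (pos == 0) next          # skip rows with too few columns
            field[i] = substr(line, 1, pos - 1)
            line = substr(line, pos + 1)
        }
        # Whatever remains is the file name; strip the surrounding quotes.
        gsub(/^"|"$/, "", line)
        field[8] = line
        print field[want]
    }'
}
```

For example, `mactime_field 3 < fls.csv` would list the type flags of every row, and `mactime_field 8 < fls.csv` the file names.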
I should point out that an alternative to using fls is to point log2timeline at your $MFT with the mft input module selected.
Step 2: Extract other temporal metadata
Now extract temporal data from other file types using log2timeline. Unlike fls, log2timeline is run against a mounted filesystem:
log2timeline -f exif,pdf,mft -o mactime -r -w log2timeline.body /mnt/volume
- -f exif,pdf,mft = specifies the file types to parse; refer to the man page for more details
- -o mactime = output data in mactime format
- -r = recurse through directories
- -w log2timeline.body = write the output to this file, again denoting mactime format
- /mnt/volume = mount point of the file system to examine
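Since fls takes its partition offset in sectors while a loop mount takes its offset in bytes, a small conversion can save a slip when preparing the mount point for log2timeline. The sketch below assumes 512-byte sectors unless told otherwise; sector_to_byte_offset is a hypothetical helper name.

```shell
# Convert a partition offset in sectors to bytes.
# Usage: sector_to_byte_offset SECTORS [SECTOR_SIZE]
# Assumption: sector size defaults to 512 bytes, which is common but not universal.
sector_to_byte_offset() {
    sectors=$1
    sector_size=${2:-512}
    echo $((sectors * sector_size))
}
```

The result could then be passed to something like `mount -o ro,loop,offset=$(sector_to_byte_offset 63) image.dd /mnt/volume`, where 63 is only an example value; confirm the real offset and sector size for your image first.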
Now also convert the output to CSV as before:
mactime -b log2timeline.body -d > log2timeline.csv
A note about timezones
The tools fls, log2timeline and mactime all allow you to specify the timezone of the source data using a common switch -z timezone. If omitted, they default to the local timezone of your OS, which can also be specified with -z local.
I’ve seen some log2timeline modules occasionally present timestamps incorrectly. To avoid this, I recommend:
- Using a clean installation, or a prebuilt one like SIFT Workstation
- Testing the output of your tools with known data before using on a live case
Step 3: Final tweaks
Now we have two CSV files for Splunk to ingest: fls.csv and log2timeline.csv
Before we import these files into Splunk, we need to make them conform to the Splunk customisations we applied earlier, by changing the headers of these files (using your favourite text editor) from this:
The reason is simply that single-word lowercase field names make it easier to run searches later.
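If you would rather script the header change than edit it by hand, something like the following may do. It is only a sketch: it lowercases each column name in the first line and keeps its first word, so a header such as Date,Size,Type,Mode,UID,GID,Meta,File Name would become date,size,type,mode,uid,gid,meta,file. Both that input header and the resulting names are assumptions here; check them against the field names in your props.conf and transforms.conf. The function name normalise_header is hypothetical.

```shell
# Rewrite the first line of a CSV so every column name is a single
# lowercase word; all other lines pass through unchanged.
normalise_header() {
    awk '
    NR == 1 {
        n = split($0, col, ",")
        out = ""
        for (i = 1; i <= n; i++) {
            name = tolower(col[i])
            sub(/^ +/, "", name)      # drop leading spaces
            sub(/ .*$/, "", name)     # keep only the first word
            out = out (i > 1 ? "," : "") name
        }
        print out
        next
    }
    { print }'
}
```

A typical use would be `normalise_header < fls.csv > fls-splunk.csv`, keeping the original file untouched.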
Step 4: Import your timeline data into Splunk
Splunk stores the data it ingests in indexes, the default being main. I recommend creating a new index for each case, to keep each case's data segregated.
To create a new index:
- Select the Manager menu, located on the top right of the Splunk home screen
- Select Indexes under the Data group of options
- Select the green New button
- Enter a suitable name for the index; e.g. we use unique case numbers and include multiple hosts from each case in the same index, with different host fields (see below) as searching across multiple hosts (e.g. computer and USB drive) can be quite insightful
- You should be able to leave the other options blank
- Click Save
Now to import data into your new index:
- From the Splunk home screen, select Add Data
- Select A file or directory of files
- Choose Next under Consume any file on this Splunk server
- On the Get data from files and directories screen, choose Upload and index a file, then select the Choose File button
- Select the first CSV file to upload (e.g. fls.csv)
- Select More settings
- I like to set the Host field value to a short name representing the device from which the timeline was extracted
- In the Source type drop-down menu, select mactime (or log2timeline if you exported using the log2timeline CSV format)
- Under Index, select the index you just created
- Select Save
- Repeat with any other CSV files to be analysed
Splunk will now upload your data and process it into the index you selected.
How does Splunk treat timeline data?
Splunk allows developers to create customised apps for a wide range of data analysis needs. The main app is the Search app, which can be accessed from the Splunk home page; this is where we’ll be working from here.
Let’s start by simply showing all timeline entries in your case. In the Search app, enter the following, where case is the name of the index you created earlier:
index=case
You’ll notice Splunk will dynamically graph the results on a visual timeline as search results are retrieved.
Splunk will extract the timestamp from every event in the timeline CSV files and treat all other fields as text. The timestamp used by Splunk for each event is provided in the _time field, which is the first column of all search results.
By default, results are displayed in the Events List view, but I prefer the Events Table, which can be selected with the grid button under the timeline.
Expanding the >> button to the left of the Events Table button expands the Field discovery pane, which shows the fields that Splunk has recognised in your data. You can select to display whichever fields are required. Note in particular the various date fields that Splunk automatically stores (e.g. date_year, date_mday, date_hour, etc.) which can be very useful for searching. Note that you don't have to display a particular field to use it in your search.
I prefer to start with a simple set of display fields, in this order too:
So, what kind of searches can be performed in Splunk?
It's really up to your creativity and your understanding of the underlying data.
Here are some ideas to get you started. It’s worth mentioning however that while the searches below will help you both get a broad understanding of activities that have occurred and quickly focus in on specific events, the forensic examiner should still review the results line by line and analyse all the relevant entries around events in question.
Timelines usually contain date outliers, such as epoch dates (1 Jan 1970) and future dates (e.g. 25 Feb 2354).
Let's use one of those built-in date fields to restrict the results to this year:
index=case date_year=2011
Or down to a certain month:
index=case date_year=2011 date_month=november
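The same outliers can be spotted before indexing by counting rows per year straight from the CSV. This is a rough sketch, assuming the mactime CSV date column looks like "Thu Nov 17 2011 11:19:59" with the year as its fourth word; year_histogram is a hypothetical helper name.

```shell
# Count timeline rows per year, reading mactime CSV from files or stdin.
# Rows whose first column does not carry a numeric year in the fourth
# word (such as the header line) are ignored.
year_histogram() {
    awk -F, '
    {
        n = split($1, part, " ")
        if (n >= 4 && part[4] ~ /^[0-9]+$/) count[part[4]]++
    }
    END {
        for (year in count) print year, count[year]
    }' "$@" | sort -n
}
```

Running `year_histogram fls.csv` prints one "year count" pair per line, which makes epoch and far-future dates easy to see.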
Note that search terms are not case sensitive but are treated as text, so anything more than a simple term should be quoted.
Maybe we want to browse the creation times of all directories, sorted by name?
index=case mode="r*" meta="*b" | sort file
Or reverse the sort order:
index=case mode="r*" meta="*b" | sort file desc
You’ll notice Splunk limits sorted results to 10,000 entries. This can be overcome like so:
index=case mode="r*" meta="*b" | sort 25000 file desc
We can use the text of log2timeline module outputs (in the file field) to select certain records only. For example, to select all ntuser.dat entries showing program execution, sorted by name, try:
index=case file="*time of launch*" | sort file
Or maybe we want to analyse a user’s access to a certain document type, sorted by descending time:
index=case file="*scott*" file="*.pdf*" | sort _time desc
Splunk assumes a logical AND operator between search terms. We can add OR operators explicitly as required, such as to extend our search to other file types:
index=case file="*scott*" (file="*.pdf*" OR file="*.doc*" OR file="*.xls*") | sort _time
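When the list of file types grows, the bracketed OR clause gets tedious to type. The sketch below builds it from a list of extensions so it can be pasted into the search bar; it only assembles text and does not talk to Splunk. The function name file_type_clause is hypothetical.

```shell
# Build a Splunk OR clause over the file field from a list of extensions.
# With no arguments it prints "()", which is not a useful search term.
file_type_clause() {
    clause=""
    for ext in "$@"; do
        term="file=\"*.$ext*\""
        if [ -z "$clause" ]; then
            clause=$term
        else
            clause="$clause OR $term"
        fi
    done
    printf '(%s)\n' "$clause"
}
```

For example, `file_type_clause pdf doc xls` produces the same bracketed clause used in the search above.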
How about looking at all files accessed by a user from index.dat entries:
index=case file="*scott*" file="*file://*"
Seeing accesses from a strange drive letter? Let’s see where else that drive appears:
Interested in Recycle Bin activity? Try:
index=case file="*recycle*" | sort asc
Want to know what activity is associated with a given metadata address (e.g. MFT entry)?
Know a user has a Hotmail account but can’t find it? Try:
index=case file="*scott*" file="*@hotmail.com*"
Wondering if a user attached a file to a webmail message? Try:
index=case file="*scott*" file="[Internet*" file="*attach*"
Want to export the results of a search as CSV?
index=case file="*scott*" | outputcsv scott-search.csv
Things to remember
Here are some important questions to be asking when analysing timelines:
- Where does this timestamp originate?
- How is it affected by the relevant operating system, file system and application involved?
- Is it stored in GMT or local time and how is it affected by timezone?
- Did this timestamp come from this computer or another one?
- What's the meaning of this file system timestamp activity; e.g. what could an entry with a type of "m.c." mean?
- What's the timestamp granularity? e.g. no time in FAT last access timestamps, no seconds in internal Office metadata
- What's the accuracy of the relevant underlying system clocks?
And of course remember that timelines are not a complete history: a timestamp records only its most recent update, so earlier values are lost.
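On the question of what a type such as "m.c." means: mactime uses four positions for modified, accessed, changed and born, with a dot where a timestamp does not apply to that row. The sketch below spells a flag string out in words. How each of the four maps onto a particular file system (for example, what "changed" means on NTFS versus ext) is something to confirm for your case; describe_macb is a hypothetical helper name.

```shell
# Expand a four-character mactime type string (e.g. "m.c.") into words.
describe_macb() {
    flags=$1
    out=""
    case $flags in m???) out="$out modified" ;; esac
    case $flags in ?a??) out="$out accessed" ;; esac
    case $flags in ??c?) out="$out changed" ;; esac
    case $flags in ???b) out="$out born" ;; esac
    # Trim the leading space left by the first append.
    printf '%s\n' "${out# }"
}
```

So `describe_macb "m.c."` reports that the row's time matches both the modified and changed timestamps of that file.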
Some other useful tips
Here are some other general tips that may save you time:
- Try log2timeline -u to have it self update
- To display dates in Australian format rather than US, reconnect to your Splunk server with the GB locale specified: http://localhost:8000/en-GB/
- To start or stop Splunk, run this from the bin directory of your Splunk server: sudo ./splunk start (or sudo ./splunk stop)
- To clean the events within an index, use: sudo ./splunk clean eventdata -index case
- To clear all indexes, use: sudo ./splunk clean eventdata
Splunk provides various licensing options. At the time of writing, it will install an enterprise license with full functionality, then revert to a license with a daily indexing limit of 500MB. It's your responsibility to stay compliant, which you can check through the Licensing options in the Manager screen.
This is only the tip of the timeline Splunking iceberg, so more details may follow if the community finds this useful.
19 Nov 2011
- Released for Ruxcon 2011
22 Nov 2011
- Corrected mistake with timezone switch -z
- Removed mft input module from log2timeline command to avoid confusion
2 Aug 2012
- Fixed links to props.conf and transforms.conf