Harvest Linux forensic data for operational triage of an event.
This tool will produce a considerable amount of JSON logs.
If you just want to run it, download the "lin_fh" binary.
This tool's output is meant to be used by forensic practitioners to investigate suspicious events on live Linux systems.
GNOME Autostart Locations:
~/.config/autostart
KDE Autostart Locations:
$HOME/.kde/Autostart
$HOME/.config/autostart
$HOME/.config/plasma-workspace/env
$HOME/.config/plasma-workspace/shutdown
Misc. Autostart Locations:
/etc/xdg/autostart
/var/spool/cron
Services:
/etc/init.d
/etc/systemd
$HOME/.config/systemd/user
Udev rules:
/usr/lib/udev/rules.d
/usr/local/lib/udev/rules.d
User cron jobs:
/var/spool/cron/crontabs
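The persistence locations above can be swept with a short directory listing. This is a minimal sketch, not the tool's implementation: it expands "~" and "$HOME" only against the current user's HOME (an assumption; a real sweep would cover every user's home directory), lists each directory one level deep, and `expand` is a hypothetical helper name.

```rust
use std::collections::BTreeSet;
use std::env;
use std::fs;
use std::path::{Path, PathBuf};

// Persistence locations copied from the lists above.
const LOCATIONS: &[&str] = &[
    "~/.config/autostart",
    "$HOME/.kde/Autostart",
    "$HOME/.config/autostart",
    "$HOME/.config/plasma-workspace/env",
    "$HOME/.config/plasma-workspace/shutdown",
    "/etc/xdg/autostart",
    "/var/spool/cron",
    "/etc/init.d",
    "/etc/systemd",
    "$HOME/.config/systemd/user",
    "/usr/lib/udev/rules.d",
    "/usr/local/lib/udev/rules.d",
    "/var/spool/cron/crontabs",
];

/// Hypothetical helper: expand a leading "~/" or "$HOME/" against `home`.
fn expand(loc: &str, home: &Path) -> PathBuf {
    match loc.strip_prefix("~/").or_else(|| loc.strip_prefix("$HOME/")) {
        Some(rest) => home.join(rest),
        None => PathBuf::from(loc),
    }
}

fn main() {
    // Assumption: only the current user's HOME is expanded.
    let home = PathBuf::from(env::var("HOME").unwrap_or_else(|_| "/root".to_string()));
    // A set removes duplicates such as "~/.config/autostart" vs "$HOME/.config/autostart".
    let dirs: BTreeSet<PathBuf> = LOCATIONS.iter().map(|l| expand(l, &home)).collect();
    for dir in &dirs {
        // One level only; missing or unreadable directories are skipped silently.
        if let Ok(entries) = fs::read_dir(dir) {
            for entry in entries.flatten() {
                println!("{}", entry.path().display());
            }
        }
    }
}
```

Directories such as /etc/systemd contain subdirectories, so a fuller sweep would recurse to the configured --depth.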
Linux Forensic Harvester
Author: Brian Kellogg
License: MIT
Disclaimer:
This tool comes with no warranty or support.
If anyone chooses to use it, you accept all responsibility and liability.
Must be run as root.
Usage:
lin_fh [options]
lin_fh -fkcl
lin_fh [--ip <ip> --port <port>] [--depth <depth>]
lin_fh [--ip <ip> --port <port>] [--limit]
lin_fh [-i <ip> -p <port>] [--suidsgid] [--limit]
lin_fh (-s, --suidsgid) [--limit]
lin_fh (-r <regex> | --regex <regex>) [-ls] [-d <depth>]
lin_fh --max <bytes> [--limit] [-d <depth>]
lin_fh (-l | --limit)
lin_fh --start <start_time> [-d <depth>]
lin_fh --end <end_time> [-d <depth>] [-ls]
lin_fh --start <start_time> --end <end_time>
lin_fh -s [-d <depth>]
lin_fh [-sl] (-x <hex> | --hex <hex>)
lin_fh (-h | --help)
Options:
-d, --depth <depth> Max directory depth to traverse [default: 5]
-f, --forensics Gather general forensic info
-h, --help Print help
-l, --limit Limit CPU use
-k, --rootkit Run rootkit hunts
-m, --max <bytes> Max size of a text file in bytes to inspect the content
of for interesting strings [default: 100000]
- Text files will always be searched for references
to other files.
Remote logging:
-i, --ip <ip> IP address to send output to [default: NONE]
-p, --port <port> Destination port to send output to [default: 80]
Time window:
This option will compare the specified date window to the file's
ctime, atime, or mtime and only output logs where one of the dates falls
within that window. Window start is inclusive, window end is exclusive.
--start <UTC_start_time> Start of time window: [default: 0000-01-01T00:00:00]
- format: YYYY-MM-DDTHH:MM:SS
--end <UTC_end_time> End of time window: [default: 9999-12-31T23:59:59]
- format: YYYY-MM-DDTHH:MM:SS
Custom hunts:
-r, --regex <regex> Custom regex [default: $^]
- Search file content using custom regex
- Does not support look aheads/behinds/...
- Uses Rust regex crate (case insensitive and multiline)
- Tag: RegexHunt
-s, --suidsgid Search for suid and sgid files
- This will search the entire '/' including subdirectories
- Can take a long time
- /dev/, /mnt/, /proc/, /sys/ directories are ignored
-x, --hex <hex> Hex search string [default: FF]
- Hex string length must be a multiple of two
- format: 0a1b2c3d4e5f
- Tag: HexHunt
Cgroup harvesting:
-c, --cgroup Harvest cgroup information
- Process-level: TxCgroup entries
- Reads /proc/<pid>/cgroup for each process
- Parses cgroup paths to extract:
- Container runtime and ID (Docker, Podman, runc, Kubernetes)
- Systemd unit/slice names
- User session IDs
- Kubernetes pod IDs
- Cgroup metadata-level: TxCgroupMeta entries
- Reads /sys/fs/cgroup resource state
- Memory, CPU, PIDs limits and usage
- data_type: Cgroup (process-level)
- data_type: CgroupMeta (cgroup metadata)
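The time window rule in the options above (start inclusive, end exclusive, a match if any of ctime, atime, or mtime falls inside) can be sketched with only the standard library. This is an illustration under assumptions, not the tool's code: `parse_utc`, `in_window`, and `file_in_window` are hypothetical names, timestamps are treated as UTC seconds since the Unix epoch, and dates are converted with Howard Hinnant's days_from_civil algorithm so that the 0000-01-01 default still parses.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parse "YYYY-MM-DDTHH:MM:SS" (UTC) into seconds since the Unix epoch.
fn parse_utc(s: &str) -> Option<i64> {
    let (date, time) = s.split_once('T')?;
    let d = date.split('-').map(|p| p.parse::<i64>().ok()).collect::<Option<Vec<_>>>()?;
    let t = time.split(':').map(|p| p.parse::<i64>().ok()).collect::<Option<Vec<_>>>()?;
    if d.len() != 3 || t.len() != 3 || !(1..=12).contains(&d[1]) {
        return None;
    }
    Some(days_from_civil(d[0], d[1], d[2]) * 86_400 + t[0] * 3_600 + t[1] * 60 + t[2])
}

/// True if any timestamp falls in [start, end): start inclusive, end exclusive.
fn in_window(times: [i64; 3], start: i64, end: i64) -> bool {
    times.iter().any(|&t| t >= start && t < end)
}

/// Apply the window to a file's ctime, atime, and mtime.
fn file_in_window(path: &Path, start: i64, end: i64) -> io::Result<bool> {
    let m = fs::metadata(path)?;
    Ok(in_window([m.ctime(), m.atime(), m.mtime()], start, end))
}

fn main() {
    let start = parse_utc("0000-01-01T00:00:00").expect("valid start");
    let end = parse_utc("9999-12-31T23:59:59").expect("valid end");
    println!("{:?}", file_in_window(Path::new("/etc/passwd"), start, end));
}
```

Because the end is exclusive, a file touched exactly at the end timestamp is not reported, which lets adjacent windows be chained without double-counting.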
Note:
Must be run as root.
A log with data_type of 'Rootkit' will be generated if the size of a file read into
memory is less than its size on disk. This is a simple method of identifying a possible
rootkit.
- See: https://github.com/sandflysecurity/sandfly-file-decloak
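The size comparison described in the note can be sketched as follows. This is a simplified, hypothetical check that compares the length of a normal read with the size the filesystem reports; the linked sandfly-file-decloak project instead compares a standard read against a memory-mapped read. Pseudo-filesystems such as /proc and /sys report sizes that do not match their content, so the sketch skips them.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical check: true if a normal read returns fewer bytes than the
/// size the filesystem reports for a regular file.
fn size_mismatch(path: &Path) -> io::Result<bool> {
    // Pseudo-filesystems report sizes unrelated to content (e.g. 0 in /proc,
    // 4096 in /sys), so they would only produce noise.
    if path.starts_with("/proc") || path.starts_with("/sys") {
        return Ok(false);
    }
    let meta = fs::metadata(path)?;
    if !meta.is_file() {
        return Ok(false);
    }
    // A file that shrinks between these two calls also triggers a hit,
    // so positives should be re-checked before being reported.
    let data = fs::read(path)?;
    Ok((data.len() as u64) < meta.len())
}

fn main() {
    for p in ["/etc/passwd", "/etc/hostname"] {
        match size_mismatch(Path::new(p)) {
            Ok(true) => println!("{p}: read returned less than the on-disk size"),
            Ok(false) => println!("{p}: sizes agree"),
            Err(e) => println!("{p}: {e}"),
        }
    }
}
```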
To capture network output, start a netcat listener on your port of choice.
Use the -k option with netcat to prevent netcat from closing after a TCP connection is closed.
Files larger than 256MB will not be hashed.
Files larger than '--max' will not be inspected for interesting strings.
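Bounded content inspection combined with the --hex option could look like the sketch below. It assumes, which the text above does not state, that the hex hunt honors the same --max cap and skips larger files outright; `decode_hex` and `file_contains` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Decode a hex string such as "0a1b2c"; the length must be a multiple of two.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.is_empty() || s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// Search a file for `needle`. Returns Ok(None) when the file is larger than
/// `max_bytes` and was skipped (assumed to mirror the --max rule).
fn file_contains(path: &Path, needle: &[u8], max_bytes: u64) -> io::Result<Option<bool>> {
    if fs::metadata(path)?.len() > max_bytes {
        return Ok(None);
    }
    let data = fs::read(path)?;
    // windows(0) would panic, so an empty needle never matches.
    Ok(Some(!needle.is_empty() && data.windows(needle.len()).any(|w| w == needle)))
}

fn main() {
    // "726f6f74" is ASCII "root"; 100000 mirrors the documented --max default.
    let needle = decode_hex("726f6f74").expect("valid hex");
    println!("{:?}", file_contains(Path::new("/etc/passwd"), &needle, 100_000));
}
```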
sudo apt install musl-tools
rustup target add x86_64-unknown-linux-musl
cargo build --release
- Further procfs parsing
- Expand on interesting strings to capture in "FileContent" data_type
- Add static examination of binaries, including interesting strings
- Add other persistence mechanisms
- Report on local users (/etc/passwd) and group (/etc/group) membership
- Identification of "interesting" log entries
- Output via network comms
- Web shell detection
- Shell histories
- Setuid / setgid
- Traps
- Document parent and child data type relation
- Add more interesting strings / commands to search for in file contents specific to Linux
- ...
Output is in JSON for import into ELK or any other JSON indexer. I may add other log formats.
No configuration files are currently included. Everything is compiled in to make remote use of the tool easier: just copy the file to the host and run it. Pipe / redirect the output with standard Linux tools. At some point I will probably add a network send option.
- parent_data_type: if a log was generated due to something found in another log, this field holds the data_type of the parent log that caused it (e.g. a file path was found in a file's content, so the tool gathered metadata on the file referenced in the first file's content)
- data_type: the source of telemetry the log is reporting on
- tags: tags are added to this array field when a built-in hunt finds something interesting. Anything of interest (a hunt, e.g. for rootkits or interesting strings/content) will be noted in the tags field.
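To illustrate how parent_data_type chains logs together, here is a hypothetical, trimmed-down record with hand-written JSON output (standard library only, so no serde). The struct, the `child_of` helper, and the `path` field are assumptions for illustration; the tool's real field definitions live in its data_def source file.

```rust
/// Hypothetical, trimmed-down log record: the three documented fields
/// plus a `path` field added for illustration.
struct LogRecord {
    parent_data_type: String,
    data_type: String,
    path: String,
    tags: Vec<String>,
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl LogRecord {
    /// Serialize as a single-line JSON object.
    fn to_json(&self) -> String {
        let tags: Vec<String> = self.tags.iter().map(|t| format!("\"{}\"", json_escape(t))).collect();
        format!(
            "{{\"parent_data_type\":\"{}\",\"data_type\":\"{}\",\"path\":\"{}\",\"tags\":[{}]}}",
            json_escape(&self.parent_data_type),
            json_escape(&self.data_type),
            json_escape(&self.path),
            tags.join(",")
        )
    }
}

/// Build a log triggered by `parent`, recording the parent's data_type.
fn child_of(parent: &LogRecord, data_type: &str, path: &str, tags: &[&str]) -> LogRecord {
    LogRecord {
        parent_data_type: parent.data_type.clone(),
        data_type: data_type.to_string(),
        path: path.to_string(),
        tags: tags.iter().map(|t| t.to_string()).collect(),
    }
}

fn main() {
    // A FileContent hit whose content mentions another path...
    let parent = LogRecord {
        parent_data_type: String::new(),
        data_type: "FileContent".to_string(),
        path: "/etc/rc.local".to_string(),
        tags: Vec::new(),
    };
    // ...causes metadata to be gathered on that path, tagged FilePath.
    let child = child_of(&parent, "File", "/tmp/.x", &["FilePath"]);
    println!("{}", parent.to_json());
    println!("{}", child.to_json());
}
```

One JSON object per line keeps the output easy to pipe into netcat or an indexer.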
Information gathered on:
- Cgroup
  - Data Type: Cgroup (per-process cgroup membership, read from /proc/<pid>/cgroup)
  - Data Type: CgroupMeta (cgroup resource state from /sys/fs/cgroup)
  - What is a cgroup?
    - Control groups (cgroups) are a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk, network, etc.) of a set of processes. Within a hierarchy, every process belongs to exactly one cgroup. Cgroups form a tree, so a process is also subject to the limits and accounting of every ancestor cgroup on the path from its leaf node up to the root.
    - On cgroup v2 (modern systems) there is a single unified hierarchy, mounted at /sys/fs/cgroup (or at /sys/fs/cgroup/unified on hybrid v1/v2 systems). On cgroup v1 each controller (memory, cpu, pids, etc.) has its own hierarchy. This tool reads both cgroup.procs (v2) and tasks (v1).
  - Cgroup fields (per-process log), as field: forensic significance
    - pid: The process ID belonging to this cgroup
    - comm / command_line: What the process is running; useful for confirming identity
    - cgroup_path: The relative cgroup path from /proc/<pid>/cgroup
    - cgroup_content: Raw /proc/<pid>/cgroup lines for correlation
    - container_runtime: Extracted runtime: docker, podman, runc, or kubernetes
    - container_id: Container ID (64-hex for runc, prefixed for docker/libpod, pod-qualified for k8s)
    - systemd_unit / systemd_slice: Systemd unit and slice name (e.g. system.slice:sshd.service)
    - user_session_id: Logind session for interactive user processes
    - kubernetes_pod_id / kubernetes_class: K8s pod UID and resource class (besteffort, burstable, guaranteed)
  - CgroupMeta fields (cgroup-level resource log), as field: forensic significance
    - cgroup_path: Hierarchical path under /sys/fs/cgroup
    - cgroup_pids: All PIDs in this cgroup and its descendants (read from cgroup.procs / tasks)
    - pids_current: Kernel-reported count of tasks in the cgroup
    - pids_max: Hard limit on number of tasks; "unlimited" means no limit
    - pids_peak: Highest number of tasks that have existed simultaneously
    - memory.current / memory.max: Current memory (RSS + cache) usage and hard limit
    - memory.stat: Breakdown: anon, file, kernel stack, slab, pgfault, pgmajfault, …
    - memory.events: Counters that increment when thresholds are hit (oom, low, high)
    - cpu.max: CPU bandwidth limit as "quota period" (e.g. "100000 100000" = 100 %)
    - cpu.stat: CPU time consumed in usecs by user, system, irq
    - cgroup_controllers: Controllers enabled in this cgroup
    - cgroup_subtree_control: Which controllers can be delegated to child cgroups
    - io.stat / io.max / io.events: Block-device I/O accounting and limits
    - cgroup.progeny: PIDs of child cgroups (not processes inside them)
    - cgroup.freeze: Whether the cgroup and its descendants are frozen (stop scheduling)
  - Spotting evil: cross-field comparisons
    - pids_current vs cgroup_pids length: if the kernel counter says 3 but the cgroup.procs file lists 12 processes, the kernel may be lying (rootkit hiding processes) or a process has escaped its cgroup.
    - pids_current vs pids_peak: if pids_current is 0 but pids_peak is very high, a burst of processes (fork bomb, crypto miner, mass compromise) existed and may have left artifacts.
    - memory.current near memory.max: the process is memory-starved; this could indicate a memory-based DoS or a misconfigured container that can consume all host memory.
    - cpu.max at "max 1" with elevated cpu.stat usage: CPU is being throttled; compare with pids_current to see if a small number of processes are hogging bandwidth.
    - container_id on CgroupMeta but no matching Cgroup log: a cgroup exists on disk that no running process claims, which can mean a container was torn down improperly or a rogue cgroup was created for persistence.
    - systemd_unit that is not a known service: unexpected units (e.g. .service or .scope names that do not map to installed packages) are a strong indicator of dropped binaries masquerading as systemd units.
  - Threads and clone()
    - Linux threads are not a separate kernel object; they are tasks created by clone() with the CLONE_VM and CLONE_THREAD flags, so they share an address space. All threads of the same pthread family share the same tgid (thread group ID, i.e. the original PID) and have unique pid values.
    - The cgroup tracks tasks (threads), not processes. pids_current counts every task in the cgroup, not just thread groups, so a single-threaded process contributes 1 and a 10-thread application contributes 10.
    - To get the number of distinct processes (thread groups), count unique Tgid values in /proc/[pid]/status (the Threads: field gives the per-process thread count). The total number of threads is the sum of all Threads: values, which should equal pids_current; on cgroup v1, where cgroup_pids is read from tasks, it should also equal the cgroup_pids length.
    - Forensic tip: a seemingly benign process with a very high thread count compared to its siblings may indicate a malicious loader, C2 agent, or a process that has been repurposed.
- Cron jobs
  - Data type: Cron
- Drive mounts
  - Data type: MountPoint
- Groups
  - Data type: LocalGroup
- Interesting File Content
  - Encoded strings
    - Tag: Encoding, Base64, Obfuscation
  - File referenced in a file's content
    - Tag: FilePath
    - If a file's forensic data was harvested due to it being referenced in another file, this tag is added
  - IPs (v4 and v6)
    - Tag: IPv4, IPv6
  - Shell code
    - Tag: ShellCode
  - UNCs
    - Tag: Unc
  - URLs
    - Tag: Url
  - Web shells
    - Tag: WebShell
  - Custom hex search
    - Tag: Hex
  - Custom Regex
    - Tag: Regex
  - Right to left trickery
    - Tag: RightLeft
  - Shell references (sh, bash, zsh, ...)
    - Tag: Shell
  - Possible suspicious commands
    - Tag: Suspicious
- Link files
  - Data type: ShellLink
- Loaded Kernel Modules
  - Data type: KernelModule
- Network connections (via procfs)
  - Data type: NetConn
- Possible rootkit
  - Data type: Rootkit
- Processes (via procfs)
  - Data type: Process
  - Process file (file of the process on disk)
  - Process' open files
    - Data type: ProcessOpenFile
  - Process' loaded libraries
    - Data type: ProcessMap
  - Process' mem mapped files
    - Data type: ProcessMaps
- Users
  - Data type: LocalUser
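The per-process cgroup parsing described in the Cgroup entry above can be sketched for a single /proc/<pid>/cgroup line, whose format is hierarchy-ID:controller-list:cgroup-path (cgroup v2 lines look like 0::/path). This is a hedged approximation of the extraction, not the tool's parser: the prefixes it recognizes (docker-, libpod-, kubepods, session-) and the fallback of attributing a bare 64-hex ID to runc follow the field descriptions above, and `CgroupInfo` and `parse_cgroup_line` are hypothetical names.

```rust
use std::fs;

/// Hypothetical subset of the fields extracted from one cgroup line.
#[derive(Debug, Default, PartialEq)]
struct CgroupInfo {
    hierarchy_id: u32,
    controllers: Vec<String>,
    path: String,
    runtime: Option<&'static str>,
    container_id: Option<String>,
    systemd_unit: Option<String>,
    session_id: Option<String>,
}

fn is_container_id(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| b.is_ascii_hexdigit())
}

fn parse_cgroup_line(line: &str) -> Option<CgroupInfo> {
    // "hierarchy-ID:controller-list:cgroup-path"; the path may itself contain ':'.
    let mut parts = line.splitn(3, ':');
    let mut info = CgroupInfo {
        hierarchy_id: parts.next()?.parse().ok()?,
        controllers: parts.next()?.split(',').filter(|c| !c.is_empty()).map(String::from).collect(),
        path: parts.next()?.to_string(),
        ..Default::default()
    };
    let path = info.path.clone();
    let mut prev: Option<&str> = None;
    for comp in path.split('/').filter(|c| !c.is_empty()) {
        if comp.ends_with(".service") || comp.ends_with(".scope") {
            info.systemd_unit = Some(comp.to_string()); // deepest unit wins
        }
        if let Some(id) = comp.strip_prefix("session-").and_then(|s| s.strip_suffix(".scope")) {
            info.session_id = Some(id.to_string());
        }
        if comp.starts_with("kubepods") {
            info.runtime = Some("kubernetes");
        }
        // "docker-<id>.scope" / "libpod-<id>.scope" (v2), "/docker/<id>" (v1), bare id.
        let bare = comp.strip_suffix(".scope").unwrap_or(comp);
        let (rt, id) = if let Some(id) = bare.strip_prefix("docker-") {
            (Some("docker"), id)
        } else if let Some(id) = bare.strip_prefix("libpod-") {
            (Some("podman"), id)
        } else if prev == Some("docker") {
            (Some("docker"), bare)
        } else {
            (Some("runc"), bare)
        };
        if is_container_id(id) {
            info.container_id = Some(id.to_string());
            // Keep "kubernetes" if a kubepods ancestor was already seen.
            if info.runtime.is_none() {
                info.runtime = rt;
            }
        }
        prev = Some(comp);
    }
    Some(info)
}

fn main() {
    let text = fs::read_to_string("/proc/self/cgroup").unwrap_or_default();
    for line in text.lines() {
        if let Some(info) = parse_cgroup_line(line) {
            println!("{info:?}");
        }
    }
}
```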
NOTE: Live machine analysis for rootkits is not entirely reliable. Well-written rootkits probably cannot be reliably discovered with live machine forensics.
- Any logs generated due to a rootkit hunt will have Rootkit set as their parent_data_type
- File data found when a file is read via memory mapping but not found via a standard file read
  - Tag: DataHidden
- Directory with hidden contents
  - Tag: DirContentsHidden
- Tainted kernel module information
  - Tag: KernelTaint
- Hidden processes
  - Tag: ProcHidden
- World readable run lock files
  - Tag: ProcLockWorldRead
- Odd run lock files
  - Tag: ProcLockSus
- Legitimate process mimicry
  - Tag: ProcMimic
- Process thread mimicry
  - Tag: ThreadMimic
- Hidden sys modules
  - Tag: ModuleHidden
- Raw packet sniffing processes
  - Tag: PacketSniffer
- Process takeovers
  - Tag: ProcTakeover
- Processes run as root with a socket and no dependencies outside of libc
  - Tag: ProcRootSocketNoDeps
- Odd character devices
  - Tag: CharDeviceMimic
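One common way to hunt for hidden processes (the ProcHidden idea above) is to compare the /proc directory listing with direct lookups of every possible PID. The tool's actual method is not described here, so treat this as an assumed technique; `listed_pids`, `is_tgid_leader`, and `hidden_pids` are hypothetical names. Thread IDs can be opened as /proc/<tid> without being listed, so candidates are kept only if their Tgid equals their PID, and a second listing filters out processes that started mid-scan.

```rust
use std::collections::BTreeSet;
use std::fs;

/// PIDs that appear in a /proc directory listing.
fn listed_pids() -> BTreeSet<u32> {
    fs::read_dir("/proc")
        .map(|rd| {
            rd.flatten()
                .filter_map(|e| e.file_name().to_str().and_then(|s| s.parse::<u32>().ok()))
                .collect()
        })
        .unwrap_or_default()
}

/// True if /proc/<pid>/status exists and its Tgid equals `pid`
/// (i.e. `pid` is a thread-group leader, not just a thread ID).
fn is_tgid_leader(pid: u32) -> bool {
    let status = match fs::read_to_string(format!("/proc/{pid}/status")) {
        Ok(s) => s,
        Err(_) => return false,
    };
    status
        .lines()
        .find_map(|l| l.strip_prefix("Tgid:"))
        .and_then(|v| v.trim().parse::<u32>().ok())
        == Some(pid)
}

/// Thread-group leaders reachable by direct lookup but absent from the listing.
fn hidden_pids(max_pid: u32) -> Vec<u32> {
    let listed = listed_pids();
    let mut candidates: Vec<u32> = (1..=max_pid)
        .filter(|pid| !listed.contains(pid))
        .filter(|&pid| is_tgid_leader(pid))
        .collect();
    // Drop processes that simply started after the first listing.
    let relisted = listed_pids();
    candidates.retain(|pid| !relisted.contains(pid));
    candidates
}

fn main() {
    // Probing up to pid_max can take a while on hosts with a large pid_max.
    let max_pid = fs::read_to_string("/proc/sys/kernel/pid_max")
        .ok()
        .and_then(|s| s.trim().parse().ok())
        .unwrap_or(32_768);
    for pid in hidden_pids(max_pid) {
        println!("possible hidden pid: {pid}");
    }
}
```

Like the tool itself, this needs root: with /proc mounted using hidepid, other users' processes are invisible to both the listing and the lookups.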
- https://github.com/tstromberg/sunlight/tree/main
- https://sandflysecurity.com/blog/how-to-detect-and-decloak-linux-stealth-rootkit-data/
- https://www.linkedin.com/pulse/detecting-linux-kernel-process-masquerading-command-line-rowland/
Some file contents are examined for other interesting strings. For example, if another file is referenced within a file, that file's metadata will also be retrieved. Other strings of interest found in file contents are reported: IPs, file paths, URLs, shellcode, Base64 and misc encodings, and UNC paths.
Process information is retrieved via ProcFS parsing.
The "data_type" field reports the source the metadata in that log was pulled from, e.g. File, FileContent, Process, ... .
The "parent_data_type" field is used to report if that log was generated due to examining another data_type. e.g. the "FileContent" data_type may trigger a "File" data_type if a file path is found in a file's contents.
The network connection logs do not show originator or responder perspectives because procfs only reports the IPs as local and remote. You can make a good guess as to whether a network connection is incoming or outgoing based on which port is higher than the other, but this will not always yield the correct direction.
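A sketch of reading /proc/net/tcp and applying the port heuristic above. It covers IPv4 only (IPv6 lives in /proc/net/tcp6 and is not handled), uses hypothetical names, and guesses that the lower port belongs to the server, which, as noted, will not always be right (for example, services listening on high ports).

```rust
use std::net::{Ipv4Addr, SocketAddrV4};

/// Parse an "IIIIIIII:PPPP" hex field from /proc/net/tcp.
fn parse_addr(field: &str) -> Option<SocketAddrV4> {
    let (ip_hex, port_hex) = field.split_once(':')?;
    let raw = u32::from_str_radix(ip_hex, 16).ok()?;
    // The kernel prints the network-order address as a native u32, so
    // to_ne_bytes() recovers the original octet order on this host.
    let ip = Ipv4Addr::from(raw.to_ne_bytes());
    let port = u16::from_str_radix(port_hex, 16).ok()?;
    Some(SocketAddrV4::new(ip, port))
}

#[derive(Debug, PartialEq)]
enum Direction {
    Listening,
    ProbablyInbound,
    ProbablyOutbound,
}

/// Port-based guess: the side with the lower port is assumed to be the server.
fn guess_direction(state_hex: &str, local: SocketAddrV4, remote: SocketAddrV4) -> Direction {
    if state_hex == "0A" {
        Direction::Listening // 0A is TCP_LISTEN
    } else if local.port() < remote.port() {
        Direction::ProbablyInbound
    } else {
        Direction::ProbablyOutbound
    }
}

/// Parse one data line: slot, local address, remote address, state, ...
fn parse_line(line: &str) -> Option<(SocketAddrV4, SocketAddrV4, Direction)> {
    let mut f = line.split_whitespace();
    let _slot = f.next()?;
    let local = parse_addr(f.next()?)?;
    let remote = parse_addr(f.next()?)?;
    let state = f.next()?;
    Some((local, remote, guess_direction(state, local, remote)))
}

fn main() {
    let text = std::fs::read_to_string("/proc/net/tcp").unwrap_or_default();
    for line in text.lines().skip(1) {
        // The first line is a column header.
        if let Some((l, r, d)) = parse_line(line) {
            println!("{l} -> {r}: {d:?}");
        }
    }
}
```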
If you want to change the field name(s) of any fields please edit the struct field names in the data_def source file.
// file paths we want to watch all files in
const WATCH_PATHS: [&str; 14] = [
"/etc",
"/home",
"/lib/modules",
"/proc",
"/root",
"/srv",
"/tmp",
"/usr/lib/systemd/system",
"/usr/local/var/www/html",
"/usr/share/nginx/html",
"/usr/share/nginx/www",
"/var/log",
"/var/spool/cron",
"/var/www",
];
// file MIME types whose content we want to inspect for interesting strings
const WATCH_FILE_TYPES: [&str; 25] = [
"abiword",
"/pdf",
"/pkix-cert+pem",
"/rtf",
"/vnd.iccprofile",
"/x-desktop",
"/x-object",
"/x-pcapng",
"/x-perl",
"/x-sh",
"/x-tcl",
"/xml",
"bittorrent",
"excel",
"javascript",
"json",
"msword",
"officedocument",
"opendocument",
"powerpoint",
"presentation",
"stardivision",
"text/",
"wordperfect",
"yaml",
];
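A sketch of how these two lists might be applied. The matching rules are assumptions, since the code that consumes the constants is not shown: directories are compared component-wise with `Path::starts_with` (so "/etcfoo" does not match "/etc"), and each file-type entry is treated as a substring of a MIME string such as "text/x-shellscript", which would explain the leading slashes in entries like "/x-sh". `in_watch_path` and `is_watched_type` are hypothetical names.

```rust
use std::path::Path;

const WATCH_PATHS: [&str; 14] = [
    "/etc", "/home", "/lib/modules", "/proc", "/root", "/srv", "/tmp",
    "/usr/lib/systemd/system", "/usr/local/var/www/html", "/usr/share/nginx/html",
    "/usr/share/nginx/www", "/var/log", "/var/spool/cron", "/var/www",
];

const WATCH_FILE_TYPES: [&str; 25] = [
    "abiword", "/pdf", "/pkix-cert+pem", "/rtf", "/vnd.iccprofile", "/x-desktop",
    "/x-object", "/x-pcapng", "/x-perl", "/x-sh", "/x-tcl", "/xml",
    "bittorrent", "excel", "javascript", "json", "msword", "officedocument",
    "opendocument", "powerpoint", "presentation", "stardivision", "text/",
    "wordperfect", "yaml",
];

/// True if `path` lies inside a watched directory (component-wise match).
fn in_watch_path(path: &Path) -> bool {
    WATCH_PATHS.iter().any(|w| path.starts_with(w))
}

/// True if the MIME string contains any watched fragment.
fn is_watched_type(mime: &str) -> bool {
    WATCH_FILE_TYPES.iter().any(|t| mime.contains(t))
}

fn main() {
    let path = Path::new("/etc/profile.d/evil.sh");
    let mime = "text/x-shellscript"; // assumed output of a MIME sniffer
    println!("inspect content: {}", in_watch_path(path) && is_watched_type(mime));
}
```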