What’s grep?
I use grep often in my daily life, on either macOS or my various Linux machines (I tinker a lot 🐞).
Having done sysadmin work in the past, I often want to quickly look through command output and pick out some details. These days that is mostly on the filesystem, but in enterprises the main use case was searching through log files.
So what is the grep command? From the GNU docs:
Given one or more patterns, grep searches input files for matches to the patterns. When it finds a match in a line, it copies the line to standard output (by default), or produces whatever other sort of output you have requested with options.
grep stands for “global regular expression print” and is fundamentally a pattern matching tool. It uses regular expressions (regex) to search through text and is optimized for line-by-line searching.
For example, take the ps (process status) command, which we will cover in another post. With -e it lists all running processes on a Linux system, and with -f it displays them in full format. We can then pipe (|) that output into grep and search through the results, in this case looking for anything to do with docker.
Key grep features:
- Fast pattern matching - Optimized for searching large files quickly
- Regular expression support - Powerful pattern matching with regex
- Multiple file handling - Can search across multiple files simultaneously
- Context options - Show lines before/after matches with the -A, -B, and -C flags
ps -ef | grep docker
Docker has its own commands for this, and I would recommend using those, but in this example I just wanted to see from the OS side which running processes match docker.
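The context flags from the feature list are easier to see on a small file. This is a minimal sketch: the log lines are made up for the demo, and the variable names (log, before, after, around) are just placeholders.

```shell
# Hypothetical sample log, created only for this demo
log=$(mktemp)
printf '%s\n' \
  'starting app' \
  'loading config' \
  'ERROR: config missing' \
  'using defaults' \
  'app ready' > "$log"

# -B 1: also show one line before each match
before=$(grep -B 1 'ERROR' "$log")
# -A 1: also show one line after each match
after=$(grep -A 1 'ERROR' "$log")
# -C 1: show one line of context on both sides
around=$(grep -C 1 'ERROR' "$log")

printf '%s\n' "$around"
rm -f "$log"
```

With -C 1 the matching line is printed together with its neighbours on either side, which is usually what I want when reading logs.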
What’s awk or gawk?
The awk command, or GNU awk (gawk) specifically, provides a scripting language for text processing.
Or, from the manual page:
awk - pattern-directed scanning and processing language
Ah ok, so it’s not at all the same thing, right? It’s a scripting language, so in my mind, powerful 🚀
awk is named after its creators: Aho, Weinberger, and Kernighan. Unlike grep, which is primarily for pattern matching, awk is a full programming language designed for text processing and data extraction.
Key awk capabilities:
- Field processing - Automatically splits lines into fields (columns)
- Pattern-action programming - Execute actions when patterns match
- Built-in variables - NR (line number), NF (number of fields), FS (field separator)
- Mathematical operations - Can perform calculations on data
- Control structures - if/else, loops, functions
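To make the capabilities above a bit more concrete, here is a small sketch that touches FS, NF, NR, pattern-action rules, and arithmetic in one go. The colon-separated file (name:department:salary) and its contents are invented for illustration.

```shell
# Hypothetical colon-separated data: name:department:salary
data=$(mktemp)
printf '%s\n' \
  'ana:ops:100' \
  'bo:dev:250' \
  'broken line' \
  'cy:dev:150' > "$data"

# BEGIN sets FS so fields are split on ":" instead of whitespace.
# The NF rule flags any line without exactly 3 fields, using NR as the line number.
# The $2 rule only fires for dev rows and adds up their third field.
# END runs once after the last line and prints the totals.
report=$(awk '
  BEGIN { FS = ":" }
  NF != 3 { print "line " NR " looks malformed"; next }
  $2 == "dev" { total += $3; n++ }
  END { print n " dev rows, total " total }
' "$data")

printf '%s\n' "$report"
rm -f "$data"
```

Each rule is a pattern followed by an action in braces, and awk tries every rule against every input line.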
How do I use it?
awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ]
Common awk options:
- -F fs - Set field separator (default is whitespace)
- -v var=value - Set variables
- -f progfile - Read awk program from file
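Here is a rough sketch of the three options working together. The file names (people.csv, cities.awk), the data, and the min variable are all hypothetical, made up just to have something to run.

```shell
# Scratch directory with a hypothetical CSV: name,age,city
work=$(mktemp -d)
printf '%s\n' 'alice,30,berlin' 'bob,25,paris' > "$work/people.csv"

# -F sets the field separator; -v passes a value in as an awk variable
older=$(awk -F ',' -v min=28 '$2 >= min { print $1 }' "$work/people.csv")

# -f reads the awk program from a file instead of the command line
printf '%s\n' '{ print $3 }' > "$work/cities.awk"
cities=$(awk -F ',' -f "$work/cities.awk" "$work/people.csv")

printf '%s\n' "$older" "$cities"
rm -rf "$work"
```

Passing values with -v keeps shell variables out of the quoted awk program, which tends to avoid quoting headaches.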
Show me an example 🎢
➜ Projects cat jeanthink.txt
jean is thinking about thinking for the sake of think
➜ Projects awk '{print $1 $NF}' jeanthink.txt
jeanthink
Field variables explained:
- $0 represents the entire line
- $1 represents the first field (word)
- $2, $3, etc. represent the 2nd, 3rd fields respectively
- NF stands for “Number of Fields”, so $NF represents the last field
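A quick sketch of how these variables behave on the same sentence used above. The shell variable names (count, last, spaced, second_last) are just placeholders for the demo.

```shell
line='jean is thinking about thinking for the sake of think'

# NF on its own is the number of fields in the line
count=$(printf '%s\n' "$line" | awk '{ print NF }')
# $NF is the value of the last field
last=$(printf '%s\n' "$line" | awk '{ print $NF }')
# A comma between items makes print separate them with a space
spaced=$(printf '%s\n' "$line" | awk '{ print $1, $NF }')
# Arithmetic works inside the field reference
second_last=$(printf '%s\n' "$line" | awk '{ print $(NF-1) }')

printf '%s\n' "$count" "$last" "$spaced" "$second_last"
```

Without the comma, as in print $1 $NF, awk glues the two values together, which is why the earlier output was jeanthink with no space.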
More practical examples:
# Print only the 3rd column from a CSV file
awk -F',' '{print $3}' data.csv
# Sum values in the 2nd column
awk '{sum += $2} END {print sum}' numbers.txt
# Print lines longer than 50 characters
awk 'length($0) > 50' file.txt
Conclusion 🐒
Well, conclusion monkey says I need to learn a lot more to really tune the finger memory, but I can already see the use case and how powerful this way of working on the command line is.
When to use grep:
- Simple pattern searching in files or command output
- Quick filtering of log files
- Finding specific lines that match a pattern
- When you need speed for large file searches
When to use awk:
- Processing structured data (CSV, TSV, columnar data)
- Performing calculations on data fields
- Complex text transformations and formatting
- When you need programming logic (conditionals, loops)
- Generating reports from data
Real-world comparison:
# Using grep - find all ERROR lines in a log
grep "ERROR" application.log
# Using awk - find ERROR lines AND extract timestamp + message (fields 1, 2 and 5 in this log's layout)
awk '/ERROR/ {print $1, $2, $5}' application.log
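To try the comparison without a real log, here is a sketch against a made-up file. The layout (date, time, level, component, message word) is purely an assumption so that fields 1, 2 and 5 line up. Real logs will differ, so the field numbers need adjusting.

```shell
# Hypothetical log layout: date time level component message
log=$(mktemp)
printf '%s\n' \
  '2024-01-15 10:23:45 INFO web started' \
  '2024-01-15 10:24:02 ERROR db timeout' \
  '2024-01-15 10:25:10 ERROR cache unreachable' > "$log"

# grep: which lines match (here just counted with -c)
matches=$(grep -c 'ERROR' "$log")

# awk: same filter, but only selected fields are printed
summary=$(awk '/ERROR/ { print $1, $2, $5 }' "$log")

printf '%s\n' "$matches" "$summary"
rm -f "$log"
```

grep answers "which lines?", while awk answers "which parts of those lines?".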
I’ll be investigating another command, sed (stream editor), next. For now I am happy to know that for quick finds in ordinary day-to-day tasks I’ll likely use grep, but if I want to become proficient at text processing and data manipulation, awk will be in my toolbox.