In the previous post I told you a lot about regular expressions, but I did not tell you how to make use of them. Regular expressions make some already strong commands for searching your computer from the terminal even stronger. In this post I want to introduce three of them: grep, locate and find. While these commands serve a similar function, each has a different purpose, and unfortunately their handling is also quite different.
grep
The grep command is the most powerful search command, because it can look inside a file and search its content (assuming that it is a file that can be read). We have already used it in an earlier post to filter the output of the history command in a pipe. You can do the same with many commands, which is at times easier than reading the full output.
But let’s start with something simpler. I have a text file (named test.txt) that contains the words “Hello World” (remember that from the previous post?) and we want to search this file for the word “world”, so we can use
grep 'world' test.txt
This doesn’t give a match because, as you might remember, “world” isn’t “World”. But grep understands GNU basic regular expressions in its basic form, so you can simply use
grep '[wW]orld' test.txt
Instead of this, you can also use the -i switch to make the search case insensitive.
grep -i 'world' test.txt
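The three searches above can be tried side by side. This is only a sketch: it recreates test.txt in a scratch directory (the temporary directory and the variable names are my assumptions, not part of the original example) and stores what each command returns.

```shell
# Scratch copy of the test.txt described above (temp directory is an assumption)
tmpdir=$(mktemp -d)
printf 'Hello World\n' > "$tmpdir/test.txt"

# Case-sensitive search: no line matches, so grep exits with status 1
plain_status=0
grep 'world' "$tmpdir/test.txt" || plain_status=$?

# Bracket expression: accepts either "w" or "W" as the first letter
bracket_match=$(grep '[wW]orld' "$tmpdir/test.txt")

# -i: ignore case for the whole pattern
nocase_match=$(grep -i 'world' "$tmpdir/test.txt")

echo "status=$plain_status bracket=$bracket_match nocase=$nocase_match"
```

The exit status is handy in scripts: grep returns 0 when at least one line matched and 1 when nothing matched.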
As you can see in the output, grep returns the whole line, not just the matching text. This is useful when searching text files or command output. You can, for example, search the manpage of grep for the -i switch using
man grep | grep '\-i\b'
(I add a word boundary \b so that it does not match long-form switches starting with --i.)
To display multiple lines surrounding the match, you can use -B (before), -A (after) or -C (both) followed by a number. Use a command like this
man grep | grep -C2 '\-i\b'
to display two lines around each match in the example above. To get just the match, you can use the -o switch
grep -o 'is' test2.txt
This extracts all matches, even if there are several in one line. To count the lines containing a match, you can use the -c switch
grep -c 'is' test2.txt
As a little game, if you combine these two, you can count all occurrences
grep -o 'is' test2.txt | grep -c 'is'
(you can do it shorter with another command, but this works as well)
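To see the difference between counting lines and counting occurrences, here is a small sketch. The content of test2.txt is made up for illustration, and I am assuming that the shorter command hinted at above is wc -l.

```shell
# Hypothetical test2.txt: two lines contain "is", with three occurrences in total
tmpdir=$(mktemp -d)
printf 'This is a test\nIs this it?\nnothing here\n' > "$tmpdir/test2.txt"

# -c counts the lines that contain at least one match
line_count=$(grep -c 'is' "$tmpdir/test2.txt")

# -o prints every match on its own line, so counting those lines counts occurrences
occurrences=$(grep -o 'is' "$tmpdir/test2.txt" | grep -c 'is')

# The same count with wc -l instead of a second grep
occurrences_wc=$(grep -o 'is' "$tmpdir/test2.txt" | wc -l)

echo "lines=$line_count occurrences=$occurrences"
```

Note that “Is” at the start of the second line is not counted, because the search is case sensitive without -i.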
You can also search multiple files within a directory. To search all the files in the subdirectory “test/” for the word “is”, you can use
grep '\<is\>' test/*
This gives the file names and the matching lines. To get just the file names, you can use the -l switch. The -l switch is particularly useful when you want to write the file names to a file
grep -l '\<is\>' * >> result.txt
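A small sketch of the difference between the plain search and -l, using two made-up files in a scratch directory (file names and contents are my assumptions). It also assumes GNU grep for the \< and \> word anchors.

```shell
# Hypothetical directory with one file that contains the word "is" and one that does not
tmpdir=$(mktemp -d)
mkdir "$tmpdir/test"
printf 'this is one file\n' > "$tmpdir/test/a.txt"
printf 'this one has no match\n' > "$tmpdir/test/b.txt"

# Without -l: file name and matching line, separated by a colon
with_lines=$(grep '\<is\>' "$tmpdir"/test/*)

# With -l: only the names of files that contain a match
names_only=$(grep -l '\<is\>' "$tmpdir"/test/*)

# Write the list to a file outside the searched directory
grep -l '\<is\>' "$tmpdir"/test/* > "$tmpdir/result.txt"
saved=$(cat "$tmpdir/result.txt")

echo "$with_lines"
echo "$names_only"
```

I use > here so the file is overwritten on each run; >> as in the command above appends to an existing result file instead. Keeping the result file outside the searched directory avoids grep reading its own output.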
To use extended regular expressions with grep, you use either the command egrep or grep -E. Newer GNU grep releases mark egrep as obsolescent, so grep -E is the safer choice.
grep -E '(w|W)orld' test.txt
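To make the difference between basic and extended syntax a bit more tangible, here is a sketch that runs the same alternation in both. The three-line test file is made up, and the backslash form of the basic expression is a GNU extension, so treat it as an assumption about your grep.

```shell
# Hypothetical test file: two lines contain some spelling of "world"
tmpdir=$(mktemp -d)
printf 'Hello World\nhello world\nno match\n' > "$tmpdir/test.txt"

# Extended syntax: ( and | are operators
ere_count=$(grep -Ec '(w|W)orld' "$tmpdir/test.txt")

# Basic syntax (GNU): the same operators need backslashes
bre_count=$(grep -c '\(w\|W\)orld' "$tmpdir/test.txt")

# Basic syntax without backslashes: ( | ) are literal characters, so nothing matches
literal_count=$(grep -c '(w|W)orld' "$tmpdir/test.txt" || true)

echo "ere=$ere_count bre=$bre_count literal=$literal_count"
```

The || true is only there because grep -c still exits with status 1 when the count is zero.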
Overall, grep is a pretty straightforward command and, as you can see in these few examples, a really powerful one. As usual, I have presented just a few of the possibilities. By combining grep with other commands, you can make it even more powerful. One example is the strings command, which can extract strings from all kinds of files. That way you can, for example, search a JPG that contains metadata for the camera manufacturer
strings test.jpg | egrep -io '(nikon|canon)'
(This screenshot was sponsored by NIKON NIKON NIKON NIKON, unfortunately, not)
Some of you might ask whether there are GUI tools that provide such a function. Yes, you can for example use Searchmonkey or regexxer (both available through the official sources). I use these too; they are really great and pretty powerful tools, and there is nothing wrong with a GUI.
locate
locate is a cool command when it comes to finding files on your system quickly. There are a few implementations of this function available; the recent *buntus use the mlocate implementation. The previous locate was part of the GNU findutils. One drawback of mlocate is that it tries to be compatible with both GNU locate and another implementation called slocate, which also leads to somewhat weird documentation.
locate is not a live search. It uses a database that is updated daily by the updatedb command via a cron job (that is like the Task Scheduler you have on Windows); you can also update it manually using sudo updatedb. By default updatedb excludes a couple of directories. You can adjust which ones in /etc/updatedb.conf via PRUNEPATHS; in your *buntu that is by default “/tmp /var/spool /media /home/.ecryptfs”. I am using “/tmp /var/spool /media /home /mnt”, so it indexes just the system files, but that is just a personal preference.
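For reference, this is roughly what the relevant line in /etc/updatedb.conf looks like with my personal list. It is only a fragment, and note that the variable is spelled PRUNEPATHS in mlocate's configuration file; the other settings in your file stay as they are.

```shell
# /etc/updatedb.conf (fragment; other settings left out)
# Directories listed here are skipped when updatedb builds the database.
# This is my personal list from the text above, not the distribution default.
PRUNEPATHS="/tmp /var/spool /media /home /mnt"
```

The change only shows up in search results after the database has been rebuilt, either by the daily job or by running sudo updatedb.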
To find a file on your system you can for example use
locate libflashplayer.so
to find libflashplayer.so; the response is almost immediate. In this case the search pattern is not a regular expression (otherwise we would have to escape the period). The search pattern also does not have to be a file name; you can search for directories as well. You can even give multiple patterns. Unless you specify something else, each pattern is treated as if it were enclosed in asterisk wildcards (*PATTERN*). This changes as soon as you add wildcards or bracket expressions to your pattern (people talk about glob patterns or globbing in that case). In other words, the base usage of locate is designed to be pretty flexible rather than very exact.
To match only the file or folder name (files and folders make no difference here on Linux) instead of the entire path, you can use the -b switch
locate -b libflashplayer.so
You could also search for just flash, which leads to more results but would also match libflashplayer.so. You could also add player as a second search pattern, but this would lead to even more results, because patterns are connected via a logical OR. To make this an AND, you have to use -A (for all)
locate -bA flash player
So if you don’t know a name exactly, you can still narrow it down pretty well.
To use regex with locate you have to use the --regex switch, which supports GNU/Linux extended regular expressions. If you use -r or --regexp instead, the mlocate man page describes the pattern as a basic regular expression rather than an Emacs one, though the documentation is a bit unclear on this point.
find
Compared to locate, find is the stronger command, because it really searches your file system and not just a database that might be outdated. It also provides a lot of different search options, but that makes it a pretty heavy tool that is not that easy to use. You can even do things that maybe you should not do. But its base usage is easy to learn.
If you just type find in a directory, it lists all the entries of the directory including its subdirectories (in / that can take a long time). You can also specify the directory using absolute or relative paths, and you can even give more than one, like
find foo/ bar/
to search the subdirectories foo and bar. Good, that’s not really spectacular. To search for a name pattern, you have to specify it using -name (note that it is just one minus)
find -name "test*.txt"
-name finds files and folders (as already explained for locate) that match the pattern. To find just files you have to set -type f, for folders it’s -type d.
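Here is a small sketch of how -name and -type work together. The directory layout is made up (including a folder that deliberately looks like a text file), and I pass the start directory explicitly, which also works with find implementations that do not default to the current directory.

```shell
# Hypothetical layout: two matching files and one matching directory
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/foo/testdir.txt" "$tmpdir/bar"
touch "$tmpdir/foo/test1.txt" "$tmpdir/bar/test2.txt" "$tmpdir/bar/other.log"

# -name alone matches files and directories alike
all_names=$(find "$tmpdir" -name 'test*.txt' | sort)

# -type f keeps only regular files
files_only=$(find "$tmpdir" -name 'test*.txt' -type f | sort)

# -type d keeps only directories
dirs_only=$(find "$tmpdir" -name 'test*.txt' -type d)

echo "$all_names"
```

The pattern is quoted so that the shell hands it to find unchanged instead of expanding it against the current directory first.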
To get a more detailed output you can use the -ls switch
find -name "test*.txt" -ls
If you need to search for a path instead of a file or folder name, you have to use -path, like
find -path "*oo/te*"
To use regex with find you have to use -regex. By default this uses Emacs regular expressions; for POSIX extended regular expressions, you have to set -regextype posix-extended. The regex is matched against the entire path, so you could also write the above command like this
find -regex ".*oo/te.*"
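The following sketch runs the glob and the regex variant against the same made-up directory tree and adds a POSIX extended variant. It assumes GNU find, since -regextype is a GNU option, and the file names are only for illustration.

```shell
# Hypothetical tree: two entries under foo/ start with "te", one under bar/ does too
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/foo" "$tmpdir/bar"
touch "$tmpdir/foo/test.txt" "$tmpdir/foo/temp.log" "$tmpdir/bar/test.txt"
cd "$tmpdir" || exit 1

# Glob against the whole path
by_path=$(find . -path '*oo/te*' | sort)

# Regex against the whole path (default Emacs syntax)
by_regex=$(find . -regex '.*oo/te.*' | sort)

# POSIX extended syntax; -regextype has to come before the -regex it should affect
by_ere=$(find . -regextype posix-extended -regex '.*oo/te(st|mp)\..*' | sort)

echo "$by_path"
```

All three searches return ./foo/temp.log and ./foo/test.txt here, while bar/test.txt is left out because its path does not contain “oo/te”.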
This is really a very small part of what find can do for you. In addition, you can search based on time, file size, owner and permissions. You can combine the various search options; the default is a logical AND, but you can also use OR and NOT. You can even perform actions on the files found. But that really, really, really goes far beyond what I wanted to show in this article.
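Just to give an impression of how the logical operators look, here is a minimal sketch with made-up file names. It only touches AND, OR and NOT and leaves the other search options and the actions aside.

```shell
# Hypothetical files plus one directory whose name ends in .txt
tmpdir=$(mktemp -d)
touch "$tmpdir/a.txt" "$tmpdir/b.log" "$tmpdir/c.md"
mkdir "$tmpdir/d.txt"

# AND is implicit: name matches and it is a regular file
and_result=$(find "$tmpdir" -name '*.txt' -type f)

# OR uses -o; the parentheses are escaped so the shell passes them to find
or_result=$(find "$tmpdir" -type f \( -name '*.txt' -o -name '*.log' \) | sort)

# NOT uses !
not_result=$(find "$tmpdir" -type f ! -name '*.txt' | sort)

echo "$or_result"
```

Without the parentheses the implicit AND binds more tightly than -o, so -type f would only apply to the first -name test.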
TL;dr
With grep, locate and find you have a set of powerful tools at hand to search inside files and in your file system. If you want to find out more about these commands, use the man pages or the info pages. It’s really best to simply try them.