Using the Terminal – Part 5: grep, locate and find

In the previous post I told you a lot about regular expressions, but I did not tell you how to make use of them. Regular expressions make some already strong commands for searching your computer from the terminal even stronger. In this post I want to introduce three of them: grep, locate and find. While these commands serve a similar function, each has a different purpose and, unfortunately, their handling is also pretty different.

grep

The grep command is the most powerful search command, because it can look inside a file and search its content (assuming that it is a file that can be read). We have already used it in an earlier post to filter the output of the history command in a pipe. You can do the same with many commands, which is at times easier than reading the full output.

But let’s start with something simpler. I have a text file (named test.txt) that contains the words “Hello World” (remember that from the previous post?) and we want to search this file for the word “world”, so we can use
grep 'world' test.txt
This doesn’t give a match because, as you might remember, “world” isn’t “World”. But grep understands GNU basic regular expressions in its basic form, so you can simply use
grep '[wW]orld' test.txt
Instead of this, you can also use the -i switch to make the search case insensitive.
grep -i 'world' test.txt

grep returns the whole line, not just the matching text, which is useful when searching text files or command output. You can, for example, search the man page of grep for the -i switch using
man grep | grep '\-i\b'
(I added a word boundary \b so that it does not match long-form switches starting with --i.)
To display lines surrounding the match, you can use -B (before), -A (after) or -C (both) followed by a number. Use a command like this
man grep | grep -C2 '\-i\b'
to display two lines before and after each match in the above example. To get just the match, you can use the -o switch
grep -o 'is' test2.txt
This extracts all matches, even if there are multiple matches in one line. To count the lines containing a match, you can use the -c switch
grep -c 'is' test2.txt
As a little game, if you combine these two, you can count all occurrences
grep -o 'is' test2.txt | grep -c 'is'
(you can do it shorter with another command, but this works as well)
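The shorter variant is probably a pipe into wc -l, which counts the lines that grep -o prints (one per match). Here is a minimal sketch of the difference between -c and that pipe. The content of test2.txt below is made up for illustration, it is not the file from my screenshot.

```shell
# Throwaway directory, so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Hypothetical sample file: "is" occurs 5 times, spread over 2 of the 3 lines.
printf '%s\n' 'This is a list.' 'It is what it is.' 'No match here.' > "$workdir/test2.txt"

# -c counts the lines that contain at least one match.
lines=$(grep -c 'is' "$workdir/test2.txt")

# -o prints every match on its own line, wc -l counts those lines.
hits=$(grep -o 'is' "$workdir/test2.txt" | wc -l)

echo "matching lines: $lines, occurrences: $hits"

rm -rf "$workdir"
```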

You can also search multiple files within a directory. To search all the files in the subdirectory “test/” for the word “is”, you can use
grep '\<is\>' test/*
This gives the file names and the matching lines. To get just the file names, you can use the -l switch.
The -l switch is particularly useful when you want to write the file names to a file
grep -l '\<is\>' * >> result.txt
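If you want to try this without touching your own files, the following sketch builds a small test/ directory first. The file names and their content are invented for this example.

```shell
# Throwaway directory with a hypothetical test/ subdirectory.
workdir=$(mktemp -d)
mkdir "$workdir/test"
printf 'This is a file.\n'   > "$workdir/test/a.txt"
printf 'Nothing to see.\n'   > "$workdir/test/b.txt"
printf 'A list of things.\n' > "$workdir/test/c.txt"
printf 'It is here.\n'       > "$workdir/test/d.txt"

# \< and \> are word boundaries, so "is" only matches as a whole word,
# not as part of "This", "list" or "things".
# -l prints just the names of the files that contain a match.
matches=$(cd "$workdir" && grep -l '\<is\>' test/*)
echo "$matches"

rm -rf "$workdir"
```

Only a.txt and d.txt contain “is” as a separate word, so only these two names are printed.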

To use extended regular expressions with grep, you use either the command egrep or grep -E (recent versions of GNU grep mark egrep as obsolescent and recommend grep -E)
grep -E '(w|W)orld' test.txt

Overall, grep is a pretty straightforward command and, as you can see in these few examples, a really powerful one. As usual, I have presented just a few of the possibilities. By combining grep with other commands, you can make it even more powerful. One example is the strings command, which can extract strings from all kinds of files. That way you can, for example, search a jpg that contains metadata for the camera manufacturer
strings test.jpg | grep -Eio '(nikon|canon)'
(This screenshot was sponsored by NIKON NIKON NIKON NIKON. Unfortunately, not.)

Some of you might ask if there are GUI tools that provide such a function. Yes, you can for example use Searchmonkey or regexxer (both available through official sources). I am using these too. They are really great and pretty powerful tools, and there is nothing wrong with a GUI.

locate

locate is a cool command when it comes to finding files on your system quickly. There are a few implementations of this function available. The recent *buntus use the mlocate implementation; the previous locate was part of the GNU findutils. One drawback of mlocate is that it tries to be compatible with both GNU locate and another implementation called slocate. This also leads to somewhat weird documentation.

locate is not a live search. It uses a database that is updated on a daily basis by the updatedb command via a cron job (that is like the Task Scheduler you have on Windows). You can also update it manually using sudo updatedb. By default, updatedb excludes a couple of directories. You can adjust which ones in /etc/updatedb.conf via PRUNEPATHS, which in your *buntu is by default “/tmp /var/spool /media /home/.ecryptfs”. I am using “/tmp /var/spool /media /home /mnt”, so it indexes just the system files, but that is just a personal preference.
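As a rough sketch, the corresponding line in /etc/updatedb.conf would look like this with my paths. The file uses simple NAME="value" assignments, the directories are separated by spaces, and the change takes effect with the next run of updatedb.

```shell
# Excerpt from /etc/updatedb.conf (my personal preference, not the default):
# directories listed here are skipped when the database is built.
PRUNEPATHS="/tmp /var/spool /media /home /mnt"
```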

To find a file on your system you can, for example, use
locate libflashplayer.so
to find libflashplayer.so. The response is almost immediate. In this case the search pattern isn’t a regular expression (otherwise we would have to escape the period). The search pattern also does not have to be a file name, you can also search for directories. You can even add multiple patterns. Unless you specify something else, each pattern is treated as if it were enclosed in asterisk wildcards (*PATTERN*). This changes as soon as you add wildcards or bracket expressions to your pattern (people talk about glob patterns or globbing in that case). In other words, the basic usage of locate is designed to be pretty flexible rather than very exact.

To match only the file or folder name (files or folders make no difference on Linux) instead of the entire path you can use the -b switch
locate -b libflashplayer.so
You could also search for just flash, which leads to more results but would also match libflashplayer.so. You could also add player as a second search pattern, but this would lead to even more results because patterns are connected via a logical OR. To make this an AND, you have to use -A (for all)
locate -bA flash player
So if you don’t know a name exactly, you can still narrow it down pretty well.

To use regex with locate you have to use --regex, which treats the patterns as extended regular expressions. The -r or --regexp switch takes a basic regular expression in mlocate (GNU locate uses EMACS regular expressions for -r by default), however, the documentation is a bit unclear on that.

find

Compared to locate, find is the stronger command because it really searches your file system and not just a database that might be outdated. It also provides a lot of different search options, but that also makes it a pretty heavy tool that is not that easy to use. You can even do things that maybe you should not do. But its basic usage is easy to learn.

If you just type find in a directory, it lists all the entries of the directory including its subdirectories (in / that can take long). You can also specify the directory using absolute or relative paths. You can even add more than one, like
find foo/ bar/
to search the subdirectories foo and bar. Good, that’s not really spectacular. To search for a name pattern, you have to specify it using -name (note that it is just one minus)
find -name "test*.txt"
-name finds files and folders (as already explained with locate) that match the pattern. To find just files, you have to add -type f; for folders it’s -type d
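A small sketch of the effect of -type, using invented names in a throwaway directory. I write the starting directory explicitly as a dot here. GNU find assumes the current directory when you leave it out, but other implementations of find may insist on it.

```shell
# Throwaway directory with hypothetical entries.
workdir=$(mktemp -d)
mkdir "$workdir/test1.txt"      # a folder whose name matches the pattern
touch "$workdir/test2.txt"      # a regular file that matches
touch "$workdir/other.txt"      # a regular file that does not match

# Without -type, both the folder and the file are found.
all=$(cd "$workdir" && find . -name "test*.txt" | sort)

# -type f keeps only regular files, -type d only folders.
files=$(cd "$workdir" && find . -name "test*.txt" -type f)
dirs=$(cd "$workdir" && find . -name "test*.txt" -type d)

echo "all:   $all"
echo "files: $files"
echo "dirs:  $dirs"

rm -rf "$workdir"
```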

To get a more detailed output you can use the -ls switch
find -name "test*.txt" -ls

If you need to search for a path instead of a file/folder name, you have to use -path like
find -path "*oo/te*"
To use regex with find you have to use -regex. By default this uses EMACS regular expressions; for POSIX extended regular expressions, you have to set -regextype posix-extended before -regex. The regex has to match the entire path. So you could also write the above command like this
find -regex ".*oo/te.*"
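The regex type starts to matter as soon as you use groups or alternation, which EMACS syntax wants written with backslashes. The following sketch assumes GNU find and uses made-up folders foo and bar. Note that -regextype has to come before -regex on the command line, and that the expression has to match the whole path including the leading ./

```shell
# Throwaway directory with hypothetical folders foo and bar.
workdir=$(mktemp -d)
mkdir "$workdir/foo" "$workdir/bar"
touch "$workdir/foo/test.txt" "$workdir/bar/test.txt" "$workdir/foo/notes.md"

# posix-extended allows (foo|bar) without backslashes.
# The leading .* covers the "." that every path starts with here.
found=$(cd "$workdir" && find . -regextype posix-extended -regex '.*/(foo|bar)/te.*' | sort)
echo "$found"

rm -rf "$workdir"
```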

This is really a very small part of what find can do for you. In addition, you can search based on time, file size, owner or permissions. You can combine the various search options. The default is a logical AND, but you can also use OR and NOT. You can even perform actions on the files found. But that really, really, really goes far beyond what I wanted to show in this article.
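Just to give you an idea of how the logical operators look, here is a tiny sketch with invented file names. Tests written next to each other are combined with AND, -o stands for OR and ! for NOT. The parentheses group tests and have to be escaped so the shell leaves them alone.

```shell
# Throwaway directory with hypothetical files.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.md" "$workdir/c.log" "$workdir/test.txt"

# OR: regular files ending in .txt or in .md.
# The group makes sure that -type f applies to both -name tests.
either=$(cd "$workdir" && find . -type f \( -name '*.txt' -o -name '*.md' \) | sort)

# NOT: .txt files whose name does not start with "test".
nottest=$(cd "$workdir" && find . -type f -name '*.txt' ! -name 'test*')

echo "txt or md:    $either"
echo "txt, no test: $nottest"

rm -rf "$workdir"
```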

TL;DR

With grep, locate and find you have a set of powerful tools at hand to search inside files and in your file system. If you want to find out more about these commands, use the man pages or the info pages, but it’s really best to simply try them.