Using the Terminal – Part 7: Wildcards and Glob Patterns

I have mentioned and used them a few times already, so this time I want to explain the globbing concept of your terminal in a bit more detail.

A glob pattern is pretty similar to a regular expression, but it is simpler. Its great advantage is that it is part of the shell and therefore always available. The basic concepts are shared among many operating systems, although the exact implementation can differ. I don’t want to go into details on this, but since I have explained regex before, I will point out how glob patterns differ from them.

In general, most commands take a name literally, so if you want to delete a file, you have to know the file name exactly. rm file.txt will only remove file.txt and no other file whose name contains that string, e.g. file.txt~. However, there are certain characters that you can insert to make the pattern less specific. These characters are often called wildcards or jokers.

Note that bash knows a few options for glob patterns; in the following I will only address the standard *buntu settings. Your .bashrc already contains one such option, # shopt -s globstar, but it is commented out (so it is not active, and I won’t be using it here either).

The Asterisk

The asterisk (or star) * stands for any number of characters (including zero). That is similar to the asterisk quantifier in regex, but to get the same function in regex you need to combine it with the period: .*. Its behavior is essentially the same throughout the various implementations, so you might also know it from Windows. Probably the most popular pattern is *.*, although in bash it only matches names that contain a period.

Examples:

  • a*c – will match every pattern that starts with a and ends with c, that is ac as well as abc; however, you could also have whatever you want between a and c
  • *.txt – will match everything that ends with .txt, so in general all txt files
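Both patterns can be tried safely in a scratch directory. The following is only a small sketch: the file names are made up for illustration, and echo is used so that nothing gets modified.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory so no real files are involved.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical sample files.
touch ac abc abbc abd a.txt b.txt

# The shell replaces each pattern with the sorted list of matching names
# before echo even starts.
star_matches=$(echo a*c)
txt_matches=$(echo *.txt)

echo "a*c   -> $star_matches"   # ac, abc and abbc, but not abd
echo "*.txt -> $txt_matches"    # a.txt and b.txt
```

Note that echo never sees the asterisk; it only receives the names the shell has already filled in.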

As you can see in the example, the definition of the asterisk is pretty loose, so the number of matches can be fairly large, which makes it powerful but also dangerous, depending on the command you use. If you just specify the asterisk, your command will match everything in the current directory (apart from hidden files whose names start with a period). Fortunately, the damage is limited in some cases: the shell only expands *.txt in the current directory, so rm -rf *.txt will not delete all txt files in all subdirectories. But you should really not get any less specific. rm -rf * is a killer, even when added accidentally (like rm -rf * *.txt). At least rm won’t allow you to remove the root directory itself, but it can still do a lot of damage (try using -i with such commands).
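To get a feeling for how far an asterisk reaches, you can let echo show the expansion instead of handing it to rm. This is a sketch with a made-up directory layout, assuming bash's default settings (no dotglob, no globstar).

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical layout: one txt file here, one in a subdirectory, one hidden file.
mkdir sub
touch top.txt sub/deep.txt .hidden

# *.txt is expanded in the current directory only; sub/deep.txt is not listed.
txt_here=$(echo *.txt)

# A bare * matches every visible name, but not names starting with a period.
everything=$(echo *)

echo "*.txt -> $txt_here"
echo "*     -> $everything"
```

Keep in mind that rm -rf * would still remove the sub directory together with its contents, because sub itself is one of the matches.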

The Question Mark

In contrast to the asterisk, the ? is very restrictive: it stands for exactly one character. It is equivalent to the period in regex and not to the regex question mark, which is really confusing. In regex, the question mark is a quantifier for zero or one, so it has no function by itself, and when combined with the period (.?) it means something similar but still different (any character zero or one times).

To use the question mark you already have to know pretty well what you are searching for. For example:

  • a?c – will match every pattern that starts with a and ends with c and has exactly one character in between like aac, abc and acc but it would not match ac
  • 2015-03-?? – can be used to match all dates of March 2015
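The two examples can be reproduced like this. Again this is just a sketch in a scratch directory, and all file names are invented for the demonstration.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical files: some short names and a few date-stamped ones.
touch ac aac abc acc abbc
touch 2015-03-01 2015-03-31 2015-04-01

# Each ? must be filled by exactly one character.
one_between=$(echo a?c)          # aac abc acc (ac and abbc do not fit)
march=$(echo 2015-03-??)         # only the two March names

echo "a?c        -> $one_between"
echo "2015-03-?? -> $march"
```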

However, you cannot use the question mark in a pattern like *.jp?g to match all files that end in either jpg or jpeg, because it would never match jpg. That is the biggest difference to regex, where the equivalent expression (.*\.jpe?g) can be used for exactly that. Other implementations of glob patterns might handle this differently; there it could mean zero or one.
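The difference becomes visible when the glob and a regex are run against the same names. In this sketch the file names are hypothetical, and grep -E stands in for the regex side.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical files with both spellings of the extension.
touch photo.jpg scan.jpeg notes.txt

# Glob: the ? demands one character between p and g, so photo.jpg is skipped.
glob_matches=$(echo *.jp?g)

# Regex: e? means "an optional e", so both spellings are counted.
regex_count=$(printf '%s\n' * | grep -Ec '\.jpe?g$')

echo "glob  *.jp?g     -> $glob_matches"
echo "regex \\.jpe?g\$ -> $regex_count names"
```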

Square Brackets

The square brackets [] are pretty similar to those known from regex: they define a set or range of characters, of which exactly one has to match. The biggest difference is that regex uses the caret ^ to invert the set, while in glob patterns you use the exclamation mark ! (bash accepts the caret as well).

  • a[abc]c – will match every pattern that starts with a and ends with c and has either a, b or c in between: aac, abc, acc but not ac or adc
  • a[a-z]c – will match every pattern that starts with a and ends with c and has one of the characters a to z between them
  • a[!abc]c – will match all patterns that start with a and end with c, except those with a, b, c or nothing between them, so it will not match ac, aac, abc and acc.
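Here is a short sketch of the three bracket patterns. The file names are made up and deliberately all lowercase, so the result does not depend on how your system sorts capital letters.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical lowercase-only sample files.
touch ac aac abc acc adc

set_match=$(echo a[abc]c)      # one of a, b, c in the middle
range_match=$(echo a[a-z]c)    # any of the letters used here
negated=$(echo a[!abc]c)       # one character that is NOT a, b or c

echo "a[abc]c  -> $set_match"
echo "a[a-z]c  -> $range_match"
echo "a[!abc]c -> $negated"
```

In all three cases ac stays out, because the brackets always stand for exactly one character.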

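You need to be careful with character ranges like [a-z]. Depending on your locale, the range can follow the sort order aAbBcC…z, so it will match lowercase and capital letters for everything but the capital Z. In that case the range for all letters is [a-Z], but [a-zA-Z] should work as well. In other setups (for example the C locale), [a-z] matches lowercase letters only.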
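If you do not want to depend on the sort order at all, the POSIX character classes such as [[:lower:]] and [[:upper:]] are one way around it. They are not covered in the text above, so take this as a side note; the file names are again hypothetical.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical files that differ only in the middle letter.
touch aac aBc aZc

# Character classes say explicitly which kind of letter is meant.
lower=$(echo a[[:lower:]]c)
upper=$(echo a[[:upper:]]c)

echo "a[[:lower:]]c -> $lower"
echo "a[[:upper:]]c -> $upper"

# For comparison: what this prints depends on your locale and bash version.
echo "a[a-z]c       -> $(echo a[a-z]c)"
```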

Curly Brackets

Using curly brackets {} you can define alternatives in a comma separated list, which is pretty similar to (|) in regex. A pattern like {pattern1,pattern2} matches either pattern1 or pattern2. Examples:

  • {ab,bc,ac} – will match either ab, bc or ac
  • a{?,}c – will match any pattern that starts with a and ends with c and has exactly one character or none in between, so it will also match ac
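Both examples can be tried with echo. This is a sketch that assumes bash (plain sh may not know braces), and the file names are invented.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# The alternatives are simply written out one after the other.
alternatives=$(echo {ab,bc,ac})

# Hypothetical files for the second pattern.
touch ac abc abbc

# a{?,}c becomes the two words a?c and ac; a?c is then matched against files.
optional_char=$(echo a{?,}c)

echo "{ab,bc,ac} -> $alternatives"
echo "a{?,}c     -> $optional_char"   # abc and ac, but not abbc
```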

This last example gets us closer to the behavior of the ? in regex. To search for both jpg and jpeg files one can use *.jp{e,}g, so e? in regex corresponds to {e,} in a glob pattern. But not so fast, there is one drawback: *.jp{e,}g will search for *.jpeg and *.jpg separately. That is okay as long as both are found, but if, for example, nothing matches *.jpeg, a command like ls will produce an error for it.
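The drawback is easy to reproduce in a directory that only contains a jpg file. The sketch below also shows bash's nullglob option as one possible way to silence it; that option is not part of the default settings discussed here, so treat it as an optional extra. The file name is made up.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical directory with a .jpg but no .jpeg file.
touch photo.jpg

# The unmatched pattern *.jpeg is passed on unchanged ...
expanded=$(echo *.jp{e,}g)
echo "expanded: $expanded"

# ... so ls complains about a file literally called '*.jpeg'.
ls_status=0
ls *.jp{e,}g >/dev/null 2>&1 || ls_status=$?
echo "ls exit status: $ls_status (non-zero means an error)"

# With nullglob, patterns without a match simply vanish.
# The option is only set inside the $( ) subshell here.
with_nullglob=$(shopt -s nullglob; echo *.jp{e,}g)
echo "with nullglob: $with_nullglob"
```

Be careful with nullglob in general: if no pattern matches at all, the command ends up with no arguments, and a plain ls would then list the whole directory.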

You can also use the curly brackets to define character ranges, but their use is different. If you type echo {a..z}, the terminal will print all lowercase letters from a to z. That sounds good, but ls *.t{a..z}t is not the same as ls *.t[a-z]t. It’s the same problem as mentioned above: it will produce an error message for every unsuccessful search.

So, while the curly brackets are expected to behave like an OR, they act more like an AND: instead of searching for (pattern1 OR pattern2), the shell does a search for pattern1 and a search for pattern2. This is called brace expansion. The expression in the brackets gets expanded to all possible patterns before any file matching happens, each resulting pattern is searched for separately, and if one of these searches finds nothing, the command reports an error for it.
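You can watch brace expansion at work by comparing a brace range with a bracket range on the same file. This is a sketch under bash's default settings, with a shortened range (w to y instead of a to z) and a made-up file name to keep the output readable.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# One hypothetical file.
touch notes.txt

# Brackets: a single pattern, which either matches or does not.
bracket_words=$(echo *.t[w-y]t)

# Braces: three separate patterns (*.twt *.txt *.tyt); the two without a
# match are left as they are and would be handed to the command literally.
brace_words=$(echo *.t{w..y}t)

echo "brackets -> $bracket_words"
echo "braces   -> $brace_words"
```

Passing the second list to ls instead of echo gives one error message for *.twt and one for *.tyt.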

Still, this is a nice function. Try for example echo {-100..100..5} or echo {a..z..2} and see what happens.
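If you would rather see a shorter version first, here is the same idea with smaller ranges. It is a sketch that assumes bash 4 or newer, since older versions do not know the step value.

```bash
#!/usr/bin/env bash
# {start..end..step} counts from start to end in steps of the third value.
numbers=$(echo {-10..10..5})
letters=$(echo {a..k..2})

echo "$numbers"   # -10 -5 0 5 10
echo "$letters"   # a c e g i k
```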

Tl;dr

As you can see from the examples, glob patterns are much less flexible than regex patterns, which limits their use. However, they are far from useless, because they are a feature of the shell and are available for many commands that do not support regex.

But as always, you should be careful in case you are not sure what exactly your pattern will match, in particular with the very unspecific asterisk. If you use such unspecific patterns, always use ls to test first or make the command interactive.
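A preview-then-delete routine could look like the following sketch. It runs in a scratch directory with made-up file names; on real data you would look at the preview yourself before running the second step, or use rm -i to be asked for every file.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical files: two temporary ones and one that must survive.
touch keep.log old1.tmp old2.tmp

# Step 1: look at what the pattern matches without touching anything.
preview=$(echo *.tmp)
echo "would delete: $preview"

# Step 2: only then run the real command with the very same pattern.
# The -- keeps names that start with a dash from being read as options.
rm -- *.tmp

remaining=$(echo *)
echo "left over: $remaining"
```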
