This lab is intended for beginners who have experience with basic Linux commands (such as cp, cd, mv, ls, rm, rmdir, and mkdir). You might want to brush up on your basic commands if you are really rusty.
- A user account with a home directory on a Linux machine. Any standard, major Linux distribution (and most minor ones, too) should be fine.
- Either physical or remote access to the Linux machine and the ability to log into the machine. Physical access is best and easiest, but remote access works great too.
For this lab, you don’t even need sudo or root access!
Windows users: You must have an SSH-capable terminal program, such as Putty, installed on your machine so you can log into the Linux machine using SSH.
Upon completion of this lab, you will be able to:
- use grep to search for text either within a file or in the output of another command,
- use an inverse pattern to find lines of text that do not contain a given string
- use output redirection to string multiple commands together
- Courier font indicates screen output (command results, messages, and prompts that the system displays)
- Courier Bold font indicates commands that you enter
As in the previous lab, the bash prompt in this lab will be written as [email protected]:~$. Yours will be different, but don’t worry about it.
The exercises in this lab focus on manipulating text inside files. Since this is a Linux lab and not a 1980s-era typing tutorial, you might want to use a GUI text editor to cut and paste the following blocks of text and save them as files in your home directory. If you want to get some practice using your editor, use nano, pico, or vi to create the files.
File 1: fruits.txt
kiwi
orange
apple
plum
glitterberries
banana
purpleberries
berries
File 2: purplestuff.txt
crayons
balloons
anime hair
tinted glasses
plums
purpleberries
unicorns
bass guitar
When you are finished, check to be sure the files are in your home directory:
~$ ls *.txt
fruits.txt purplestuff.txt
Computers are good at storing and processing data. Much of the data you store on them is in the form of plain-text files, especially in Linux. Text files are not just things like recipes for peanut butter cookies; they may contain important configuration files, email, amateur political commentary, calendar entries, even grocery shopping lists.
When a computer processes data, it often provides some sort of plain-text feedback, hopefully the information you wanted. Sometimes though, the computer provides text about an error, crash output, even information you didn’t want. In any case, there’s usually a lot of text to sort through on a computer!
Being human, you tend to take in certain pieces of information and ignore others; if you didn’t filter information, you would be swamped with things you didn’t care about or need to know. Wouldn’t it be great if your computer, which is supposed to make your life easier, were able to help you in the same way by showing you only the things you want to see? Well, you’re about to learn how to make that happen!
In Linux, you can use the commands grep, sort, wc, and their associated flags to find and display text in ways that are meaningful and useful to you.
Let’s Start Greppin’ in the GNU World…
Note: The text processing commands in this lab display the contents of files in the way you request; these commands do not change the contents of the files on your disk unless otherwise stated.
One of the simplest text processing tools is the cat command, which stands for “concatenate”. You learned about cat in the Basic Linux Commands Lab, but since you’ll be using it a bit more today, the following is a quick review. When the cat command is given a list of one or more files to display, it displays the contents of the files on “standard output,” which is your monitor’s screen by default.
Use the cat command to display the contents of fruits.txt.
~$ cat fruits.txt
kiwi
orange
apple
plum
glitterberries
banana
purpleberries
berries
Notice that fruits.txt contains a list of fruits that are listed in no particular order. If you’d like to see the fruits listed in alphabetical order, use the sort command, as shown.
~$ sort fruits.txt
apple
banana
berries
glitterberries
kiwi
orange
plum
purpleberries
Note that the contents of fruits.txt did not change; the sort command simply presented the contents in the way we requested, that is, sorted alphabetically!
Although the example file does not contain duplicate lines, if it did, you could eliminate the duplicate lines by using the -u (unique) flag with the sort command.
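As a minimal sketch of the -u flag, the following builds a small scratch file that does contain a duplicate line and sorts it both ways. The file name dupes.txt and the scratch directory are made up for this illustration, and the > used to create the file is explained later in this lab.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)

# dupes.txt is a hypothetical file; "plum" appears in it twice.
printf 'plum\nkiwi\nplum\napple\n' > "$workdir/dupes.txt"

sorted_all=$(sort "$workdir/dupes.txt")        # keeps both copies of plum
sorted_unique=$(sort -u "$workdir/dupes.txt")  # keeps one copy of each line

echo "$sorted_all"
echo "---"
echo "$sorted_unique"
```

The first listing shows plum twice and the second shows it once; dupes.txt itself is unchanged in both cases.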
The wc (word count, not the loo!) command shows you information about the text file rather than showing the contents of the text file.
By default, wc provides the following output:
- the number of “newlines” (that is, the number of invisible end-of-line characters created by pressing Enter in an editor) in the file
- the number of words (groups of characters separated by spaces) in the file
- the number of bytes the file occupies on your storage media (hard drive, floppy disk, etc.)
Try wc on fruits.txt, as shown.
~$ wc fruits.txt
9 8 68 fruits.txt
The fruits.txt file:
- has 9 “newlines” (the last line of the file contains no text)
- contains 8 words (note that the file only contains one-word fruits)
- is 68 bytes in size
You can use flags with wc to specify that you want to see only certain information, such as:
- -w for word count
- -l (lower-case L) for newline count
- -c for byte count
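You can combine the flags with one another (and more!) to see only the counts you want. Note that wc always prints the counts in the same fixed order (newlines, then words, then bytes), no matter what order you give the flags in.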
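Here is a small sketch of combining wc flags. It recreates fruits.txt in a scratch directory (an assumption made so the example is self-contained; the variable names are made up) and asks for the word count and the newline count together. Even though -w is typed first, wc prints the newline count first.

```shell
# Recreate fruits.txt as used in this lab: eight fruits plus one empty last line.
workdir=$(mktemp -d)
printf 'kiwi\norange\napple\nplum\nglitterberries\nbanana\npurpleberries\nberries\n\n' > "$workdir/fruits.txt"

# Ask for the word count and the newline count, typed in that order.
counts=$(wc -w -l "$workdir/fruits.txt")
echo "$counts"    # newline count, then word count, then the file name

# The flags can also be written together; the result is the same.
counts_combined=$(wc -lw "$workdir/fruits.txt")
echo "$counts_combined"
```

Both commands report 9 newlines and 8 words, matching the first two numbers from the plain wc output shown earlier.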
Read the man page for wc to learn what else it can tell you about a given file, and then see if you can answer the following questions:
- What is the length of the longest line in the purplestuff.txt file?
- How many characters are in each file?
The grep command is a very powerful command that is often used as a verb in spoken and written “computer-geek” English. The term grep was derived from the description of what it does, that is, “Globally matching a Regular Expression and Printing the lines”. Of course, that explanation makes little sense unless you know what a regular expression is. While you can find entire very-long books on the subject, the “executive summary” is that a regular expression (often shortened to “regex”) is a symbolic way of describing a text pattern.
Regexes can be very complex, but you can start with a simple one: an exact match for a series of characters. For example, the pattern “ninja” will match “ninja”, “ninjas”, and “notninja”, since these examples all contain the pattern. However, it would not match “pirate” nor would it match “nin-ja” since the pattern is not matched exactly.
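You can try the “ninja” example without creating a file. The sketch below uses printf to produce the five sample words, one per line, and hands them to grep. The pipe (|) that connects the two commands is explained later in this lab, and the variable name matches is made up for this illustration.

```shell
# Feed five sample lines to grep on standard input; the pattern is "ninja".
matches=$(printf 'ninja\nninjas\nnotninja\npirate\nnin-ja\n' | grep ninja)

# Only the lines that contain the exact sequence n-i-n-j-a are kept.
echo "$matches"
```

This prints ninja, ninjas, and notninja; pirate and nin-ja are left out.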
Use grep to find a simple, exact pattern within the fruits.txt file:
~$ grep kiwi fruits.txt
kiwi
Notice that grep outputs those lines in the file which match the pattern. In this case, there was only one line that matched “kiwi”.
What if there aren’t any lines that match the pattern?
Now use grep to look for “kiwi” in purplestuff.txt:
~$ grep kiwi purplestuff.txt
No lines match, so grep exits with no output. (Fortunately, it does not scold us for looking for purple kiwis!)
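Although grep prints nothing when there is no match, it still reports the result through its exit status: 0 when at least one line matched and 1 when none did. The sketch below checks this with a made-up two-line file (sample.txt and the variable names are assumptions for this illustration).

```shell
workdir=$(mktemp -d)
printf 'kiwi\norange\n' > "$workdir/sample.txt"   # hypothetical two-line file

# "|| name=$?" records the exit status only when the command fails.
found_status=0
grep kiwi "$workdir/sample.txt" || found_status=$?

missing_status=0
grep mango "$workdir/sample.txt" || missing_status=$?

echo "found: $found_status, missing: $missing_status"
```

The first grep prints kiwi, the second prints nothing, and the last line reads "found: 0, missing: 1". Scripts often rely on this status to decide what to do next.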
Try using grep to output the lines in fruits.txt that match “berries” :
~$ grep berries fruits.txt
glitterberries
purpleberries
berries
This time grep outputs three lines that match “berries,” including two that contain “berries” within longer words. If you want to see only the lines in which your pattern appears as a whole word, add the -w flag. The -w flag tells grep to match whole words only.
~$ grep -w berries fruits.txt
berries
Sometimes, it’s more useful to search for things that don’t match a specific pattern. Suppose you wanted to find all the fruits in fruits.txt that are not berries. While it is possible to use a regular expression to describe a pattern that matches everything except “berries”, grep provides a much simpler way. The -v flag tells grep to find “inverse matches”. That is, grep will output every line that does not contain the pattern you pass to it; it’s like asking grep to behave as though it’s “Opposite Day”.
~$ grep -v berries fruits.txt
kiwi
orange
apple
plum
banana
You can combine these (and many other) flags to get even more specific results by adding them together (the same way you were able to use ls -lah to ask for all files, long form, in human-readable size — see man for more information).
Question: What would you enter if you needed to find everything in fruits.txt that did not match the exact word “berries”? Remember, this would mean you do want to find words like “glitterberries”, but you do not want to find the word “berries” all by itself.
Redirecting output with > and >> to a file
Earlier, you learned that cat technically prints its output to “standard output”, which by default is your monitor’s screen. (After all, it would be difficult to use a computer if it never showed you what it was doing.) You also learned that commands like grep and sort don’t actually change the contents of a file; these commands only display the information you requested in the way you requested (for example, alphabetically).
However, what if you wanted to take the output from any of these commands and put it into a file? Would you have to transcribe it letter by letter, once again turning this into a 1980s typing tutor extravaganza? Would you need to cut and paste it into an editor somehow? Sure, both of those methods would work. But what if you simply “told” the computer, “Computer, instead of showing the output to me on the screen, please write it into a file for me.”?
What you are asking for is for the computer to “redirect the output”, and the powerful bash shell (the default shell on most Linux distributions) has the ability to do that! You can append a “greater than” (>) symbol, followed by a filename, to any command that normally produces output and then exits (i.e. non-interactive programs). That “>” symbol tells the computer to send the output of your command to the file you specify.
Caution! Be aware that when using the > redirect, previously existing information in the file (if the file already exists) will be overwritten (erased). You will not be asked if you’re sure.
To see how this works, redirect the output of the following grep command, as shown.
~$ grep -v berries fruits.txt > not_berries.txt
Your command created a new file that contains all of the non-berry fruits that are listed in fruits.txt. No output was sent to the screen, since the command redirected the output to the new file. Use the cat command to view the contents of your not_berries.txt file:
~$ cat not_berries.txt
kiwi
orange
apple
plum
banana
The purplestuff.txt file also contains items that are not berries that you might want to add to not_berries.txt. However, if you use “>” to redirect the output to not_berries.txt, the new output will overwrite the items we sent there from fruits.txt.
In this case, instead of using a single “>” redirect, you will use a double “>>” redirect. The double redirect tells the computer to redirect the output as before, but to append (add) the output to the end of the file if the file already exists.
Use grep and the double “>>” redirect to add the non-berries from purplestuff.txt to your not_berries.txt file, as shown:
~$ grep -v berries purplestuff.txt >> not_berries.txt
Use the cat command to view the new contents of your not_berries.txt file:
~$ cat not_berries.txt
kiwi
orange
apple
plum
banana
crayons
balloons
anime hair
tinted glasses
plums
unicorns
bass guitar
If you saw a blank line in the not_berries.txt file (you might or might not have), where did that come from? It was not added by grep, > or >> because neither grep nor the redirects add anything that wasn’t in the original files. If you had a blank line somewhere in one of your example files, it doesn’t just vanish when we do things like sort or concatenate the files. And quite often, text files end up with a blank line or two at the end.
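To see the difference between > and >> without risking your lab files, here is a minimal sketch that writes three lines to a scratch file. The file name demo.txt and the scratch directory are made up for this illustration.

```shell
workdir=$(mktemp -d)
demo="$workdir/demo.txt"    # hypothetical scratch file

echo "first"  > "$demo"     # creates demo.txt containing "first"
echo "second" > "$demo"     # overwrites the file: "first" is gone
echo "third"  >> "$demo"    # appends to the file: "second" is kept

contents=$(cat "$demo")
echo "$contents"
```

The file ends up with two lines, second and third. The line written by the first command was silently replaced, which is exactly what the earlier caution warns about.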
Piping output to another command
The pipe character (|) is another type of redirect that allows you to chain commands together to do even more powerful text processing and manipulation. The pipe character tells the computer to take the output of one command and use it as input to another command.
Note: the pipe character (|) is not the letter L nor the letter I. On your keyboard, it usually looks like a vertical broken bar (and for U.S. keyboards, it usually shares the same key as the backslash (\) character).
For example, how would you find all the items in purplestuff.txt that are not berries AND that start with the letter “b” using a single line of commands? Sounds like you need to build two commands and put the pipe (|) character between them. Bash runs the command before the pipe (|) and sends its output as input to the command following the pipe (|).
Begin by writing the grep command that will match all items in purplestuff.txt that are not berries (grep -v berries purplestuff.txt). Next, add the pipe | character to signal a redirect. Finally, write the grep command that will match items that begin with the letter “b” (grep ^b).
Note: if you were to use “grep b” in the second half of the command, it would match lines that contain “b” anywhere in the line. Use the caret (^) symbol before the “b” to indicate that you only want to match items where “b” begins the line. Descriptive symbols like ^ are part of what makes regular expressions so powerful.
Here’s what your example output will look like:
~$ grep -v berries purplestuff.txt | grep ^b
balloons
bass guitar
By the way, you can chain as many commands together as you need to using multiple pipes! That will come in handy as you learn more about Linux.
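As a small sketch of a longer chain, the following adds a third command to the example above: it counts the words in the lines that survive both greps. It recreates purplestuff.txt in a scratch directory so it can run on its own (the directory and the variable name are assumptions). The pattern is written as '^b' in single quotes, which is a safe habit because it keeps the shell from interpreting any special characters in a regular expression.

```shell
workdir=$(mktemp -d)
printf 'crayons\nballoons\nanime hair\ntinted glasses\nplums\npurpleberries\nunicorns\nbass guitar\n' > "$workdir/purplestuff.txt"

# Step 1: drop the berries. Step 2: keep lines that start with "b".
# Step 3: count the words in what is left.
word_count=$(grep -v berries "$workdir/purplestuff.txt" | grep '^b' | wc -w)
echo "$word_count"
```

The two remaining lines are balloons and bass guitar, so the count is 3 words.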
Now that you know the basic operation of |, you can learn more about how to use it in combination with other commands.
One command you will find useful in combination with | is “tee”. The tee command writes the input it receives to a file you specify and also sends the text to standard output (usually your monitor). The tee command is really helpful when you want to save the results of a command such as grep to a file, as you would with > or >>, but you also need to see those results immediately.
Try making a copy of not_berries.txt and, at the same time, displaying the contents of the new file on your monitor using tee with cat, as shown:
~$ cat not_berries.txt | tee still_not_berries.txt
kiwi
orange
apple
plum
banana
crayons
balloons
anime hair
tinted glasses
plums
unicorns
bass guitar
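By default, tee overwrites its file in the same way > does. Its -a flag makes it append instead, like >>. The sketch below shows both on made-up scratch files (sample.txt, saved.txt, and the variable name are assumptions for this illustration).

```shell
workdir=$(mktemp -d)
printf 'kiwi\nplum\nberries\n' > "$workdir/sample.txt"   # hypothetical input file

# Show the non-berries on screen and save them to saved.txt at the same time.
grep -v berries "$workdir/sample.txt" | tee "$workdir/saved.txt"

# With -a, tee adds to the end of saved.txt instead of replacing it.
grep berries "$workdir/sample.txt" | tee -a "$workdir/saved.txt"

saved=$(cat "$workdir/saved.txt")
```

Each pipeline prints its matches right away, and saved.txt ends up containing kiwi, plum, and berries, in that order.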
Greppin’ multiple files at once
You can concatenate several files into one stream of output, and then use grep to find patterns within all of that output.
Try using cat on both fruits.txt and purplestuff.txt and redirecting the output to grep, as shown:
~$ cat fruits.txt purplestuff.txt | grep berries
glitterberries
purpleberries
berries
purpleberries
Since you can see the output on the screen, you can instantly tell if either of the files contains any entries for “berries”.
Note that the grep part of the command returned “purpleberries” twice, since “purpleberries” was in both files. Can you think of a way to change the commands to prevent the double entry?
The cat command can be given the wildcard character (*), which is shorthand for “all”. The shell replaces * with the names of all files in the current directory (except hidden files, whose names begin with a dot) before cat runs, so using cat with * says, “concatenate all files in the current directory”. You might find this useful when you want to combine the contents of all files in a directory into one stream of output. (Note for Windows users: unlike DOS, the wildcard in Linux matches any dots within the filename as well as other symbols, so you don’t have to specify *.* to get all files. Many Linux files don’t even have dots and extensions.)
You might have some other files in this directory, so let’s restrict ourselves to text files. To display lines that contain “berries” for each of the files in the directory, use cat with the wildcard, pipe and grep, as shown:
~$ cat *.txt | grep berries
glitterberries
purpleberries
berries
purpleberries
Because still_not_berries.txt and not_berries.txt do not contain the string “berries”, the output of this command is the same as the previous command.
Pop quiz: the 29 hit super-combo chain
Using everything you’ve learned so far, you should be able to do some pretty complicated super-Ninja grep moves. To test your grepability (not a real word!), see if you can solve the following problem without looking at the answer that follows.
Question: How many unique types of berries are there within fruits.txt and purplestuff.txt?
The rules: Write a single chain of commands. Note that you can use as many pipe characters (|) as you need.
Did you get it?
In case you didn’t, here’s one possible solution:
- Concatenate fruits.txt and purplestuff.txt into a single stream of output
- Do a grep for “berries”
- Use sort with the -u flag to remove any duplicates
- Use wc with the -l (lower-case L) flag to count the number of output lines.
One command line does it all in one fell swoop!
~$ cat fruits.txt purplestuff.txt | grep berries | sort -u | wc -l
3
By now, you should be able to:
- use grep to search for text within a file
- use grep to search for text within another command’s output
- use simple inverse patterns to find text that does not contain a specific string
- use output redirectors like > and >> to move information into other files
- use | to string together multiple commands* with cat, grep, sort, and wc
* You may be interested to know that pipes can be used with many other commands as well! Just imagine the possibilities!
Additional Examples and Exercises
Lookin’ for files in the GNU world
Use a pipe to combine the ls and grep commands into a single command that lists all the files in /usr/bin that have the word “make” anywhere in the filename.
Advanced work: How many p’s?
Still working with fruits.txt and purplestuff.txt, see if you can discover how to use the tools you have learned about to:
- Determine how many unique words in the files start with the letter “p”,
- List the words you find, sorted in reverse alphabetical order
- Return the filename where each string (including duplicates) was found, along with the returned string.
If you get stuck, read the man pages for the commands to see what flags and options are available for each.