What's the big deal about sed, anyways?

Seek and destroy. Or, you know, replace

2014-03-12
howto

There comes a time in any young Linux SysAdmin’s life when they have to learn the importance of shell scripting in Bash. One of the more useful commands is sed (the stream editor).

sed is designed to take an input (either from stdin or from a file), edit the text of the input, and throw the edited version back out at you on stdout (which can then be redirected to a file). Unlike an interactive text editor, it applies its edits in a single pass with no one at the keyboard (which means it can be scripted, which is super nifty).
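To make that concrete, here is a minimal sketch of both input styles. The text being edited and the scratch file are made up for illustration; the only thing taken from above is the idea that sed reads from stdin or a file and writes to stdout.

```shell
# Edit text arriving on stdin; the edited version goes to stdout.
from_stdin=$(printf 'hello world\n' | sed 's/world/sed/')
echo "$from_stdin"    # hello sed

# Edit text read from a file (a throwaway scratch file for this example).
scratch=$(mktemp)
printf 'hello world\n' > "$scratch"
from_file=$(sed 's/world/sed/' "$scratch")
original=$(cat "$scratch")
echo "$from_file"     # hello sed
echo "$original"      # hello world (the file itself was not modified)
rm -f "$scratch"
```

Note that the source file is left alone in both cases: the edited text only exists on stdout unless you redirect it somewhere.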

There’s far more about sed than I could explain in this blog post, but Bruce Barnett does a fantastic job of explaining what sed is and what it can do on this page of his website.

I find myself using sed all the time because of how powerful it is and all of the things it can do.

Recently I was faced with the task of taking a log file and converting it into CSV format. The logfile contained the output from ipmitool on an Ubuntu server and looked like this:

...data data data...
Mon Mar 10 02:50:06 EDT 2014
ESM Frt I/O Temp | 31h | ok  | 12.0 | 28 degrees C
ESM CPU 1 Temp   | 32h | ns  |  3.1 | No Reading
ESM CPU 2 Temp   | 33h | ns  |  3.2 | No Reading
ESM Riser Temp   | 34h | ok  | 16.0 | 35 degrees C
ESM CPU 1 Temp   | 35h | ok  |  3.1 | 34 degrees C
ESM CPU 2 Temp   | 36h | ok  |  3.2 | 33 degrees C
BP Bottom Temp   | 03h | ok  | 15.1 | 29 degrees C
BP Top Temp      | 02h | ok  | 15.1 | 29 degrees C
Mon Mar 10 02:55:06 EDT 2014
ESM Frt I/O Temp | 31h | ok  | 12.0 | 28 degrees C
ESM CPU 1 Temp   | 32h | ns  |  3.1 | No Reading
ESM CPU 2 Temp   | 33h | ns  |  3.2 | No Reading
ESM Riser Temp   | 34h | ok  | 16.0 | 35 degrees C
ESM CPU 1 Temp   | 35h | ok  |  3.1 | 38 degrees C
ESM CPU 2 Temp   | 36h | ok  |  3.2 | 39 degrees C
BP Bottom Temp   | 03h | ok  | 15.1 | 29 degrees C
BP Top Temp      | 02h | ok  | 15.1 | 29 degrees C
...data data data...

In particular we are interested in the dates and the temperatures of each CPU to graph temperatures over time. This log lives in a file called data.txt and we want it to output to a file called data.csv.

Let’s see what we can do with this.

First, we want to keep only the lines that start with ‘Mon’ (the date lines) or that contain the ID numbers of the sensors we care about. If you had a logfile with multiple days instead of just ‘monday’ you could match on the month or even the year number instead; it just has to be something that is unique to the date string. This can be accomplished using grep:

grep -E '^Mon*|35h|36h' data.txt

Note: the -E turns on extended regular expressions, where | means “or”. That is what allows us to search for multiple things at once with a single pattern. The ^ anchors the first alternative to the start of the line.

This turns the above logfile into:

...data data data...
Mon Mar 10 02:50:06 EDT 2014
ESM CPU 1 Temp   | 35h | ok  |  3.1 | 34 degrees C
ESM CPU 2 Temp   | 36h | ok  |  3.2 | 33 degrees C
Mon Mar 10 02:55:06 EDT 2014
ESM CPU 1 Temp   | 35h | ok  |  3.1 | 38 degrees C
ESM CPU 2 Temp   | 36h | ok  |  3.2 | 39 degrees C
...data data data...
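One thing worth knowing about the pattern above: in a regular expression, * is a quantifier ("zero or more of the previous character"), not a shell-style wildcard, so ^Mon* really means "Mo followed by any number of n". For this log that makes no difference. The sketch below shows where it would, using a made-up "Moose sighting" line that is not in the real log, and ^Mon as a tighter alternative.

```shell
# Sample input; the "Moose sighting" line is invented for illustration.
sample='Mon Mar 10 02:50:06 EDT 2014
Moose sighting
ESM CPU 1 Temp   | 35h | ok  |  3.1 | 34 degrees C
BP Top Temp      | 02h | ok  | 15.1 | 29 degrees C'

# ^Mon* matches any line starting with "Mo", so the Moose line gets through.
loose=$(printf '%s\n' "$sample" | grep -E '^Mon*|35h|36h')

# ^Mon requires the full "Mon" at the start of the line.
tight=$(printf '%s\n' "$sample" | grep -E '^Mon|35h|36h')

echo "$loose"
echo "---"
echo "$tight"
```

If your date lines start with something less distinctive than a day name, tightening the pattern like this is cheap insurance.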

Next we want to just take the temperature data from the above output. This is where sed comes in.

Basically, we want to delete everything up to (and including) the final pipe delimiter and the space immediately after it. The sed command to do this might look similar to the following:

sed -r 's/^.+\| //g'

The -r tells sed to use extended regular expressions. That is what lets + mean “one or more” and makes \| a literal pipe; without it, GNU sed would read + as a literal plus sign and \| as “or”, and the command would do something very different. Reading from a file needs no flag at all (you just add the filename as an argument), and here the input is being piped in from grep anyway. -r is the GNU spelling and -E does the same thing. It works for me, but YMMV.
Next comes the regular expression. This is arguably the most important part of the sed command because this is what is going to find the text that you want to replace. Without having this part correct, the sed command is basically rendered useless (or, worse, destructive to other parts of the file that you want to keep intact). General rule: if you’re experimenting with sed and regex, make a copy of the original file. There are some flags (such as -i, which edits the file in place) that will overwrite the source file.
Anyways, breaking this part down a bit:
s/tolookfor/toreplacewith/g says to search for “tolookfor” and to replace it with “toreplacewith.” The ‘g’ at the end means to do this globally, or to change all occurrences found on a line instead of just the first.
“^.+\| “ means to start at the beginning of each line (^), look for one or more occurrences of any character (.+) until you get to a pipe and a space (the backslash is there to escape the pipe character). Because .+ is greedy, it grabs as much as it can, which is why the match runs all the way to the final pipe on the line.
There is nothing in between the next two slashes, which is the equivalent of s/tolookfor//g. This basically means to replace anything that you find with nothing (ie, delete everything that you find).

The output of this command when fed the previous output looks like this:

...data data data...
Mon Mar 10 02:50:06 EDT 2014
34 degrees C
33 degrees C
Mon Mar 10 02:55:06 EDT 2014
38 degrees C
39 degrees C
...data data data...
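If you want to convince yourself of what that substitution does, here is a small sketch that runs it on one sensor line and one date line from the log. The second command is my own rewrite of the same pattern in basic regex syntax (no -r), where a bare | is already literal and "one or more of anything" can be spelled ..* instead of .+.

```shell
line='ESM CPU 1 Temp   | 35h | ok  |  3.1 | 34 degrees C'

# Extended regex: the greedy .+ runs up to the LAST "| " on the line.
extended=$(printf '%s\n' "$line" | sed -r 's/^.+\| //')
echo "$extended"    # 34 degrees C

# Same substitution in basic regex syntax, for comparison.
basic=$(printf '%s\n' "$line" | sed 's/^..*| //')
echo "$basic"       # 34 degrees C

# A date line contains no pipe, so nothing matches and it passes through as-is.
date_line=$(printf 'Mon Mar 10 02:50:06 EDT 2014\n' | sed -r 's/^.+\| //')
echo "$date_line"   # Mon Mar 10 02:50:06 EDT 2014
```

The last case is why the date lines survive this step untouched: sed only changes lines where the pattern actually matches.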

We’re starting to get somewhere.

In some cases, you would want to leave the “degrees C” part in this file, but I already know that this is going to be the unit used (and am going to put it on the graph), so I want to find and get rid of that part, which is what the following sed command does:

sed -r 's/ degrees C//g'

After this second sed command this is what we’re left with:

...data data data...
Mon Mar 10 02:50:06 EDT 2014
34
33
Mon Mar 10 02:55:06 EDT 2014
38
39
...data data data...
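As a side note, the two sed steps do not have to be separate processes. A single sed can take several expressions with -e and apply them in order to each line. This is a sketch of that variation on a few lines from the log, not a claim that the two-command version is wrong; both give the same result.

```shell
sample='Mon Mar 10 02:50:06 EDT 2014
ESM CPU 1 Temp   | 35h | ok  |  3.1 | 34 degrees C
ESM CPU 2 Temp   | 36h | ok  |  3.2 | 33 degrees C'

# Two sed processes chained with a pipe, as in the walkthrough.
two_calls=$(printf '%s\n' "$sample" | sed -r 's/^.+\| //g' | sed -r 's/ degrees C//g')

# One sed process with two expressions, applied in order to each line.
one_call=$(printf '%s\n' "$sample" | sed -r -e 's/^.+\| //' -e 's/ degrees C//')

echo "$one_call"
```

I left the g flag off in the combined version: the first pattern is anchored with ^ and the unit only appears once per line, so there is never more than one match to replace.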

Now all that is left to do is to turn this into a CSV. This is accomplished using the nifty paste command. It is normally used to join the corresponding lines of two or more files together using a user-specified delimiter, but it can be used to combine things from standard input as well, as follows:

paste -d, - - -

-d, means to use a comma as a delimiter, and each dash stands for standard input. With three dashes, paste fills three columns by taking one line from standard input for each column in turn, so it reads in three lines at a time and joins them with commas.

Finally, we have output that looks like this:

...data data data...
Mon Mar 10 02:50:06 EDT 2014,34,33
Mon Mar 10 02:55:06 EDT 2014,38,39
...data data data...
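Here is the paste step on its own, as a small sketch with placeholder letters instead of log data. The second case shows what happens when the number of input lines is not a multiple of three, which is what you would get if a sensor line were missing from one of the readings.

```shell
# Six lines in, two rows of three out.
joined=$(printf '%s\n' a b c d e f | paste -d, - - -)
echo "$joined"
# a,b,c
# d,e,f

# Seven lines in: the leftover line gets a row of its own, padded with empty fields.
ragged=$(printf '%s\n' a b c d e f g | paste -d, - - -)
echo "$ragged"
# a,b,c
# d,e,f
# g,,
```

paste has no idea which line is a date and which is a temperature; it only counts. If the log ever skips a sensor, every row after that point will be shifted, so it is worth eyeballing the CSV before graphing it.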

Combining all of these commands, we end up with the following one-liner:

grep -E '^Mon*|35h|36h' data.txt | sed -r 's/^.+\| //g' | sed -r 's/ degrees C//g' | paste -d, - - - > data.csv

(this text might wrap around, but it was originally written as a single line and does work this way)

This will parse an entire logfile into a CSV that can easily be imported into a program such as Excel for graphing.
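If you plan to run this more than once, the one-liner can be wrapped in a small function. The sketch below is one way to do that; the function name ipmi_log_to_csv and the header column names are my own inventions, and the two sed steps are folded into one call with -e. It runs against a trimmed copy of the sample log in a temporary file so you can try it without touching real data.

```shell
# ipmi_log_to_csv is a hypothetical helper name, not part of ipmitool or sed.
# $1 is the path to the log; the CSV is written to stdout.
ipmi_log_to_csv() {
    # Header row; these column names are assumptions chosen for this example.
    echo 'timestamp,cpu1_temp_c,cpu2_temp_c'
    grep -E '^Mon*|35h|36h' "$1" \
        | sed -r -e 's/^.+\| //' -e 's/ degrees C//' \
        | paste -d, - - -
}

# Trimmed copy of the sample log, written to a throwaway file.
log=$(mktemp)
cat > "$log" <<'EOF'
Mon Mar 10 02:50:06 EDT 2014
ESM Frt I/O Temp | 31h | ok  | 12.0 | 28 degrees C
ESM CPU 1 Temp   | 35h | ok  |  3.1 | 34 degrees C
ESM CPU 2 Temp   | 36h | ok  |  3.2 | 33 degrees C
Mon Mar 10 02:55:06 EDT 2014
ESM CPU 1 Temp   | 35h | ok  |  3.1 | 38 degrees C
ESM CPU 2 Temp   | 36h | ok  |  3.2 | 39 degrees C
EOF

csv=$(ipmi_log_to_csv "$log")
echo "$csv"
rm -f "$log"
```

Usage on the real file would then be ipmi_log_to_csv data.txt > data.csv. The header row is optional, but it saves a step when the file is opened in a spreadsheet.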

Sed is a powerful command, and it can do lots more than I’ve listed here. For more info on sed, check out the link a bit further up.

