Wednesday, March 25, 2009

SAR Visual Data Reports

One of the more important tools an Administrator has is sar, which builds a history of a system's performance. Sar ships natively with Solaris, but it needs to be enabled and configured before it can be used.

Problem: Sar data is usable, but not visually pleasing. We want a solution that works well in reports and lets us show upper management, quickly and effortlessly, how well (or poorly) our systems are responding. How do we create reports that are visual and informative?

Analysis: First, we need to look at our goal. Do we want something we can run on-the-fly, or something regularly scheduled? In this case, I want something that's regularly scheduled and archived; we'll create graphs that we can publish on our intranet to track trends over time. While doing this, we'll keep in mind that we may later want to run a report from within Webmin.

For our graphs, we can do one of two things: reformat the data for import into an Excel spreadsheet, where we can generate the graph and possibly use pivot tables (if we had enough data); or download and install gnuplot. We want automation at its best, and because the Domain Administrator controls whether or not we can run macros, it's best to stay native to the OS we love: Solaris. Unfortunately, Solaris doesn't ship with gnuplot, but we can easily get a stable version through our Blastwave resources.

# /opt/csw/bin/pkgutil -i gnuplot

Now that we have our graphing program installed, we need to configure sar:


# svcadm enable sar
# svcs -xv sar
svc:/system/sar:default (system activity reporting package)
State: online since Mon Mar 23 10:03:06 2009
See: man -M /usr/share/man -s 1M sar
See: /var/svc/log/system-sar:default.log
Impact: None.


Running "sar" should give us data, but we haven't populated it yet. User "sys" already has templates in its crontab for sar, but we'll schedule our own for every 10 minutes:


# EDITOR=vi;export EDITOR; crontab -e sys
#ident "@(#)sys 1.5 92/07/14 SMI" /* SVr4.0 1.2 */
#
# The sys crontab should be used to do performance collection. See cron
# and performance manual pages for details on startup.
#
1,11,21,31,41,51 * * * 1-6 /usr/lib/sa/sa1
# 0 * * * 0-6 /usr/lib/sa/sa1
# 20,40 8-17 * * 1-5 /usr/lib/sa/sa1
# 5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A


Now, we'll let that populate at its regular intervals. We don't collect data on Sunday because no batch processes run on that day. This will make our scripting a little more difficult, but we have ways of getting around it. When writing our script, we'll want to report on the previous [sar] day's data--not today's. Now that we've got everything in place, let's work on the solution.

Solution: If we didn't skip a day, we could just use GNU's date command with yesterday's date. Since we do, let's create a timestamp script that takes care of it:

#!/bin/ksh
#########################################################
# sarstamp.ksh #
# A program to create a timestamp based on the last #
# (previous day) sar file generated #
#########################################################

YEAR=`date +%Y`
# Grab the next-to-last entry in /var/adm/sa (yesterday's sar file),
# convert its month name to a number, and print MMDD.
MODAY=`ls -ltr /var/adm/sa | tail -2 | head -1 | \
    sed -e 's/Jan/01/g;s/Feb/02/g;s/Mar/03/g;s/Apr/04/g;s/May/05/g;s/Jun/06/g;s/Jul/07/g;s/Aug/08/g;s/Sep/09/g;s/Oct/10/g;s/Nov/11/g;s/Dec/12/g' | \
    awk '{print $6$7}'`

echo ${YEAR}${MODAY}

The timestamp script ensures that each graph's filename reflects the creation date of the sar file it came from, rather than the date the report ran. Next, we'll create a gnuplot script called "sar.gpl" and save our output as PNG files. We'll also want a grid, titles, and times on our x-axis. In this example, we'll use the sar data to plot %usr, %idle, freemem, and freeswap as 800x600 PNG files.

#!/opt/csw/bin/gnuplot -persist
set terminal png size 800,600

set grid

# Set time formats
set xdata time
set timefmt "%H:%M:%S"
set xrange ["00:00:00":"24:00:00"]
set xlabel "Time"

set key left box


set yrange [0:20]
set ylabel "% Usr"
set output "/tmp/sar-`hostname`-`/root/bin/sarstamp.ksh`-usr.png"
plot "/tmp/sar-cpu.dat" using 1:2 title 'Usr' with lines

set yrange [50:100]
set ylabel "% Idle"
set output "/tmp/sar-`hostname`-`/root/bin/sarstamp.ksh`-idle.png"
plot "/tmp/sar-cpu.dat" using 1:5 title 'Idle' with lines

# Return to autoscaling for the memory and swap plots
set yrange [*:*]

set ylabel "FreeMem"
set output "/tmp/sar-`hostname`-`/root/bin/sarstamp.ksh`-mem.png"
plot "/tmp/sar-mem.dat" using 1:2 title 'Freemem' with lines

set ylabel "Swap"
set output "/tmp/sar-`hostname`-`/root/bin/sarstamp.ksh`-swap.png"
plot "/tmp/sar-mem.dat" using 1:3 title 'Freeswap' with lines

Now that we've got our gnuplot script created, we'll create a nice wrapper around the whole thing to reformat the sar output into usable input for gnuplot (columns of data with the header text stripped out).

#!/bin/ksh
########################################################################
# #
# sar_graphs.ksh #
# #
# Author: Alan T. Landucci-Ruiz #
# #
# Program to generate graphical representation of the previous #
# day's sar data #
# #
########################################################################
HOST=`/bin/hostname`
MAILTO="myname@host.com"

# The next-to-last file in /var/adm/sa holds the previous sar day's data
SARFILE=`ls -1tr /var/adm/sa | tail -2 | head -1`

# Strip out headers, averages, and blank lines (any line containing
# lowercase letters), leaving only the timestamped data columns
sar -f /var/adm/sa/${SARFILE} | egrep -v '[a-z]|^$' > /tmp/sar-cpu.dat
sar -r -f /var/adm/sa/${SARFILE} | egrep -v '[a-z]|^$' > /tmp/sar-mem.dat

/opt/csw/bin/gnuplot sar.gpl

rm /tmp/sar-cpu.dat
rm /tmp/sar-mem.dat

Easy enough. Now we can copy our graphs from /tmp to any location on our web server. Alternatively, we can NFS-mount the web server's directory and save directly to it. Later on, we'll use this same tool to plot drive usage.
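
For example, a couple of lines at the end of sar_graphs.ksh could handle publishing. This is just a sketch; the WEBDIR path is a hypothetical NFS-mounted docroot, not one from our setup:

# Publish the day's graphs, then clean up /tmp
# WEBDIR is an assumed, NFS-mounted web directory
WEBDIR=/export/web/sar
cp /tmp/sar-`/bin/hostname`-*.png ${WEBDIR}/ && rm /tmp/sar-`/bin/hostname`-*.png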

Friday, March 13, 2009

Patch Reporting Tool for Solaris

Problem: Our recent security assessment found that no patch monitoring or patch notification system was in place. The Security Team is pressuring us to come up with a solution. The solution must include the following: (1) patch notification, (2) patch monitoring, and, if possible, (3) patch deployment. Sun's xVM Ops Center would be perfect for this, so I asked about our budget. Our manager said it would be best to keep the cost just below $0.*

Analysis: We already have a Sun support contract, which is good because it gives us access to patches from sunsolve.sun.com. The Linux support contract was not renewed, which is bad because it costs us access to the up2date repositories. We'll concentrate on the Sun side for now. Let's look at our current software implementations and see if anything available can help us.

We already implement Webmin, but there don't appear to be any patch monitoring or notification modules available. Checking Google doesn't help; I just find patches for Webmin itself. We also have Altiris, but it's bulky and has a very steep learning curve. As a last resort, we'll try Daddy Google.

The first page that comes up is from Sun. Under the section "Intelligent Patch Management," we see some reference to scripting.


Three options for patches:
1. Individual download from SunSolve - login required
2. Automatic system updates - Activate the "Patch Update Manager" feature in Solaris 10 with a valid Solaris Subscription
3. Life Cycle Management - Sun xVM Ops Center has an intelligent patch management tool for Solaris and Linux.

Sun Update Tools
Knowledge-based software update services for Solaris and Linux
Free Scripted Patch Tools

Auxiliary Files

Great! It says there are three options. Option 1 is decent for downloading patches, but it doesn't really help me know which patches I need. Option 2 isn't viable for our servers, as we don't want to automatically update our production [Oracle] servers, but it may be a good solution for our workstations. Option 3 is ideal because it also covers Linux, but it doesn't fit our budget.

Looking on... "Enterprise" usually implies $$$, so we're not going to look at that. We do register our inventory, so I'll spend a little time looking at that. The GUI is Java-based, which seems slow and buggy. I can run individual reports, but I can't find a way to automate them or to have it tell me which patches I need.

"Free Scripted Patch Tools" sounds like the perfect place for me. The link sends me to a page that only demonstrates how to use wget to download patches, not what patches I need. Oddly, it tells you how to use blastwave's wget, but if you have Solaris 10, you already have wget in /usr/sfw/bin.

Patchdiag.xref lists the available patches, but I still need a tool that will cross-reference it with my running systems and tell me which patches are missing. Since patchdiag.xref is the cross-reference file for patchdiag, let's see if Sun's Patchdiag tool is still available. Patchdiag is still available and still free, but its report is ugly. Despite its flaws, this looks like the place to start.

Testing: To automate our process, we need to check that patchdiag.xref is publicly available and reachable from our systems. Using what we learned about wget, we use it to download the patchdiag.xref file.
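
Something like the following works; the download URL is an assumption from memory, so substitute whatever Sunsolve currently publishes:

# Solaris 10 ships wget in /usr/sfw/bin; the URL here is an assumption
/usr/sfw/bin/wget -q -O /tmp/patchdiag.xref "http://patches.sun.com/reports/patchdiag.xref"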

Also, we want to keep a cached copy of patchdiag.xref. I created an /opt/SUNWpatchdiag directory with bin/, doc/, man/, and etc/ subdirectories, the last of which holds the patchdiag.xref. We install our Patchdiag files accordingly.
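
Setting up that layout and caching the downloaded file is straightforward:

# Create the install layout and cache the cross-reference file
mkdir -p /opt/SUNWpatchdiag/bin /opt/SUNWpatchdiag/doc \
    /opt/SUNWpatchdiag/man /opt/SUNWpatchdiag/etc
cp /tmp/patchdiag.xref /opt/SUNWpatchdiag/etc/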

A simple run of 'patchdiag -h' shows that we can point at a different copy of the patchdiag.xref file with the -x option, and get a more detailed (long) listing with -l; together, these give me the level of detail I want. We want the -x option so that we can download the latest patchdiag.xref from Sunsolve and use it right away. We can also run against a remote host, but we have to supply a file containing its "showrev -p" output; the drawback there is that you can't get the long output. We'll avoid that option for now, but keep it in mind for later.
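
Putting those pieces together, a run against our cached cross-reference file looks like this (the patchdiag install path follows the layout we created above):

# Long listing, using the freshly downloaded cross-reference file
/opt/SUNWpatchdiag/bin/patchdiag -l -x /opt/SUNWpatchdiag/etc/patchdiag.xref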

Conclusion: Using this and what I know of awk, I can hack together a simple script that reformats the output to CSV, text, or HTML. With that script in cron, the server reports get mailed to me periodically so I know whether the systems are up to date. The next step is to add it to Webmin so that the Security Team can run a report on-the-fly. Webmin uses Perl, and I suddenly realize that I should have written the script in Perl to begin with! I have a new project.
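
As a rough illustration (not the actual script), a CSV pass over the patchdiag output might look like this; the field positions are assumptions about the report layout, so adjust them to match the real columns:

#!/bin/ksh
# patch_report.ksh -- hypothetical sketch; assumes report lines start
# with a six-digit patch ID and that the first three fields are the
# patch ID, installed revision, and current revision.
XREF=/opt/SUNWpatchdiag/etc/patchdiag.xref
/opt/SUNWpatchdiag/bin/patchdiag -l -x ${XREF} | \
    nawk 'BEGIN { OFS=","; print "patch,installed,current" }
          /^[0-9][0-9][0-9][0-9][0-9][0-9]-/ { print $1, $2, $3 }'

A root crontab entry (script path and address are placeholders) then mails the report weekly:

0 6 * * 1 /opt/SUNWpatchdiag/bin/patch_report.ksh | mailx -s "patch report" myname@host.com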

*Story of my life.

Wednesday, March 4, 2009

Oracle I/O Tuning

Problem: Oracle is complaining of poor I/O performance and recommends tuning the vol_maxio parameter.


Analysis: BigAdmin has a good article on tuning your I/O parameters. One that I'm concerned with is the vol_maxio. According to the article, "VERITAS recommends that this tunable not exceed 20 percent of kernel memory or physical memory (whichever is smaller), and that you match this tunable to the size of your widest stripe." We are currently using concatenated filesystems instead of stripes (I'm unsure why; Symantec helped us set it up that way).

This forum post from sunmanagers.org also references the stripe width, but says that Oracle release 8i caps the maximum I/O size at 1M.

If I check the current values with adb:

# adb -k /dev/ksyms /dev/mem
physmem 1fb99f
maxphys/D
maxphys:
maxphys:        131072
vol_maxio/D
vol_maxio:
vol_maxio:      2048

We find that it's already set to 1M: vol_maxio is expressed in 512-byte sectors, and 2048 sectors x 512 bytes = 1M.

Given the limits that Veritas and Oracle place on maxio, it doesn't look like playing with that number will buy us much if the volume is already optimized. So, let's take a look at the volume.
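
We can confirm the volume layout with VxVM's vxprint; the disk group name below is a placeholder for ours:

# Show the record hierarchy for the disk group; a concatenated volume
# reports a CONCAT layout in its plex records, a striped one STRIPE
vxprint -htg oradg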

Almost all the articles I've researched cite benefits from using direct I/O, but Veritas uses Quick I/O by default (see the mount_vxfs man page). Miracle Benelux has a good article describing other limitations that come with specifying direct I/O, but there's no clarity on the differences between direct I/O, Quick I/O (QIO), and Concurrent I/O (CIO). Blog O' Matty's article notes that direct I/O was enabled on his system by the "mincache=direct" and "convosync=direct" parameters, but makes no reference to CIO or QIO.

Let's recap: I started out looking at maxio because Oracle said we're not getting the I/O they expected. There, I was limited by Oracle and Veritas, so I checked the volume; many have found benefits with direct I/O, and we're using Quick I/O. In addition to changing this, I also need to check the Oracle buffer cache.


Testing:
Now, let's check with the DBAs. According to them, Oracle allows I/O sizes larger than 1M, and the 32M maximum should be sufficient. Since vol_maxio is counted in 512-byte sectors, 65535 sectors comes to just under 32M, so we set this in /etc/system on one of our test machines and rebooted:

* Increase the maximum value of IO for Oracle tuning
set vxio:vol_maxio=65535

After the reboot, we moved a service group over to the node, and the DBAs' testing showed positive results.

I tried to remove QIO by adding "noqio" to the mount options, but Oracle didn't like it. He did, however, enjoy it when I added the "mincache=direct" and "convosync=direct" options.
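
For reference, the remount on the test node looked roughly like this; the device path and mount point are placeholders for our actual Oracle volume:

# Remount the Oracle data filesystem with the direct I/O options
mount -F vxfs -o remount,mincache=direct,convosync=direct \
    /dev/vx/dsk/oradg/oravol /oracle/data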


Conclusion: Oracle I/O on VxFS was optimized by setting vxio:vol_maxio to 65535 in /etc/system; it was further tuned by adding the "mincache=direct,convosync=direct" options to the mounts.