Friday, November 21, 2014

Songs about Deduplication

(or: A Study of Pompeii)


         One of my favorite songs out on the radio right now is by Bastille, called Pompeii.  I love this song because it's the first song of its kind!  It's a song about deduplication and how important it is in today's technological world.  The song even starts out with the background singers singing, "we dedupe! We dedupe!"

Let's take a look at the first couple of stanzas:
I was left to my own devices.
Many days fell away with nothing to show.
And the walls kept tumbling down
In the city that we love
Great clouds roll over the hills
Bringing darkness from above

          It's obvious he's talking about the storage where he keeps his data (several devices), and that he lost many days of data to corruption ("walls [...] tumbling down").  The impending clouds represent the upper-level management about to rain down on him, telling him he needs to restore from backup.  And, as if by magic, he beautifully illustrates the recovery from a deduplication server--possibly an appliance:
But if you close your eyes,
Does it almost feel like
Nothing changed at all?
And if you close your eyes,
Does it almost feel like
You've been here before?
How am I gonna be an optimist about this?
How am I gonna be an optimist about this?

         With this, we can see how deduplication benefits us: when we compare the differences between the backups, the change is so minimal that it feels like nothing's changed at all (except for those last little things that did change, of course).  His question of optimism is poetically answered in its repetition: we're subtly told to back up our stuff on a reliable deduplication system.
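For the uninitiated: deduplication splits data into chunks, stores each unique chunk once, and points every repeat back at the stored copy. Here's a toy sketch in Perl; the fixed-size chunking and the dedupe routine are purely illustrative (real products are far cleverer about chunk boundaries):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Split the data into fixed-size chunks, store each unique chunk once,
# and keep only a list of hashes pointing back at the stored copies.
sub dedupe
{
    my ($data, $chunk_size) = @_;
    my (%store, @refs);
    for (my $i = 0; $i < length($data); $i += $chunk_size)
    {
        my $chunk = substr($data, $i, $chunk_size);
        my $key   = md5_hex($chunk);
        $store{$key} = $chunk;   # a repeated chunk just overwrites itself: stored once
        push @refs, $key;
    }
    return (\%store, \@refs);
}

# Two "backups" that are mostly the same share almost all their chunks.
my ($store, $refs) = dedupe(("A" x 32) . ("B" x 32) . ("A" x 32), 32);
# Three chunks referenced, but only two unique chunks actually stored.
```

Does it almost feel like nothing changed at all? That's the point.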
We were caught up and lost in all of our vices
In your pose as the dust settled around us

         Stunned in silence, we see that we were worried for nothing.  Then an important truth dawns on us: we still have to search among the wreckage and figure out what caused the corruption in the first place. 
Oh where do we begin?
The rubble or our sins?
Oh oh where do we begin?
The rubble or our sins?

         Do we learn from our experience, or do we just continue on and hope we don't lose everything again?  We're okay, because we have a deduplication system that works.





Editor's note:  Some of you may argue that this is incorrect, that this is not what the song is about, but I make the counterpoint: what's protecting your data?
Lyrics thanks to azlyrics: http://www.azlyrics.com/lyrics/bastille/pompeii.html

Monday, July 30, 2012

LDAP on Solaris

One of the things I've been working on for the past few years, off and on, is getting LDAP working on Solaris.  I can get an LDAP server to run, and I can get an LDAP client to run on Solaris, but when it comes to PAM authentication, it just doesn't seem to want to cooperate.

So here's my suggestion if you're working on this: don't.  It's a pain with a huge learning curve.  If your company doesn't have a hard requirement for LDAP, then I recommend keeping whatever system you have in place (NIS), or, if you want something more secure, NIS+.
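If you do press ahead anyway, the usual stumbling block is the PAM stack. For reference, a Solaris 10-style /etc/pam.conf auth stack with pam_ldap looks roughly like this (written from memory as an illustration; verify against Oracle's documentation before trusting it):

```
# Hypothetical /etc/pam.conf fragment for LDAP authentication on Solaris 10.
# 'binding' plus server_policy lets local (files) accounts short-circuit the
# stack, while LDAP accounts fall through to pam_ldap.
login   auth requisite          pam_authtok_get.so.1
login   auth required           pam_dhkeys.so.1
login   auth required           pam_unix_cred.so.1
login   auth binding            pam_unix_auth.so.1 server_policy
login   auth required           pam_ldap.so.1
```

Get one line of this wrong and you'll be staring at "Permission denied" with nothing useful in the logs, which is roughly where I kept ending up.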

If you're Linux only, you may be well ahead of the curve.

Wednesday, October 19, 2011

NetBackup Scheduling Timelines - Part 2

Last time, we looked at how to chart our drive usage using dummy data. Now we need to get some real data. To start with, I've installed OpsCenter and created a custom report based on the Tabular Backup Report:



I customized the report by clicking on "Edit Report" and changing the filters so I only have the policies and clients that I know use drives. I set the range to be the past 60 days.

Once done, I saved the report to a custom report in my Public Reports folder, so I can access it easily whenever I want to generate the data. To export the data, I click on the export button and export it to a CSV. I can then open the CSV and remove the first and last lines (headers and footer). We're now ready to process it.

Now that we have the real data, there's an obvious problem: our time isn't in military time. There's also a not-so-obvious problem: our dates have commas in them, and we've made commas our separator. If we just split on the commas, we'll get some extra fields. First, let's take care of the comma problem. All the commas are within double quotes. If we split the line by double quotes, the even-numbered fields will be the values between double quotes. We can operate on each of those to remove the commas, then recreate the line without the commas inside the quotes. The code looks something like this:

sub remove_commas_from_fields
{
my $LINE=$_[0];
my $RETURNLINE="";
my $LINDEX;
my @SLINE=split ("\"",$LINE);
# Quoted values land at the odd indices of the split.
for ($LINDEX=1;$LINDEX<=$#SLINE;$LINDEX+=2)
{
$SLINE[$LINDEX] =~ s/,//g;
}

foreach my $PIECE (@SLINE)
{
$RETURNLINE .= $PIECE;
}
return $RETURNLINE;
}


Note that this also removes all commas from the numeric values (730,520,210 becomes 730520210) and removes all double quotes. We then add this preprocessor function to our main program, just after grabbing the current line:

# We grab the first line of the file and extract the values.
$CURR_LINE=<FD>;
$CURR_LINE=remove_commas_from_fields($CURR_LINE);
...
$CURR_LINE=<FD>;
$CURR_LINE=remove_commas_from_fields($CURR_LINE);
($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);

Now that takes care of the first problem. Let's take a look at the second: time is in the wrong format. I have two options. I can either rewrite my round_to_incr program to adjust for the change or I can make the change before calling round_to_incr. The first is fairly simple, but causes me to have to rewrite a subroutine I already know is working well. Within this subroutine, I'll also lose any date data, as the subroutine only accounts for time. The second will allow me to preserve the integrity of the round_to_incr subroutine, and I can save the full data by moving it to another field. Thus, I write another subroutine to rewrite the date/time field:

sub convert_time
{
my $HHMM=$_[0];
my @HHMMARRAY = split('[ :]',$HHMM);
my $AMPM = $HHMMARRAY[$#HHMMARRAY];
my $MIN = $HHMMARRAY[$#HHMMARRAY - 1];
my $HOUR = $HHMMARRAY[$#HHMMARRAY - 2];

# 12 AM is hour 00 and 12 PM is already hour 12.
if ( $AMPM eq "AM" && $HOUR == 12 ) { $HOUR = 0; }
if ( $AMPM eq "PM" && $HOUR != 12 ) { $HOUR += 12; }

return "$HOUR" . "$MIN";
}

sub convert_date
{
my $LINE=$_[0];
my $RETURNLINE="";
my $LITEM;
my @SDATE;

my @SLINE=split(",",$LINE);

@SDATE = split(" ",$SLINE[0]);
push(@SLINE,"$SDATE[0] $SDATE[1] $SDATE[2]"); # Preserve the first date; append it to the list.
@SDATE = split(" ",$SLINE[1]);
push(@SLINE,"$SDATE[0] $SDATE[1] $SDATE[2]"); # Preserve the second date; append it to the list.

$SLINE[0] = convert_time($SLINE[0]);
$SLINE[1] = convert_time($SLINE[1]);

foreach $LITEM (@SLINE)
{
$RETURNLINE .= "$LITEM,";
}
return $RETURNLINE;
}

Note that I also save the date field in "Month Day Year" format. This will allow me to add the date to the output line:

[...]
# We grab the first line of the file and extract the values.
$CURR_LINE=<FD>;
$CURR_LINE=remove_commas_from_fields($CURR_LINE);
$CURR_LINE=convert_date($CURR_LINE);
($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);
$DATA_START = round_to_incr($DATA_START); # added 2010-07-02
$DATA_END = round_to_incr($DATA_END); # added 2010-07-02
$DATA_STARTDATE = (split(',',$CURR_LINE))[18];
[...]
# Second, we need to check if our data line's start time matches
# the current time.
# If it is, we claim a drive, and go to the next line.
# If it isn't, we print the output and increment the time.
if ( $TIME == $DATA_START )
{
$NEXT=nextempty;
$DRIVE[$NEXT] = $DATA_CLIENT;
$DRIVE_START[$NEXT] = $DATA_START;
$DRIVE_END[$NEXT] = $DATA_END;
$CURR_LINE=<FD>;
$CURR_LINE=remove_commas_from_fields($CURR_LINE);
$CURR_LINE=convert_date($CURR_LINE);
($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);
$DATA_START = round_to_incr($DATA_START); # added 2010-07-02
$DATA_END = round_to_incr($DATA_END); # added 2010-07-02
$DATA_STARTDATE = (split(',',$CURR_LINE))[18];
[...]


Making the above changes gives us our final program (listed in full at the end of this post). But we can't just run it against our export. Remember: we still have some parent policies and policies that didn't back up. We want to skip these. And looking at the data, it looks like we have some "Image Cleanup" jobs running that didn't use a drive. I can write a quick awk command to only grab the lines that backed up data and put them into a new CSV file:

$ head -10 OpsCenter_Use_With_Generate_Drive_Usage_17_10_2011_03_13_PM.csv
"Aug 18, 2011 3:22 PM","Aug 18, 2011 3:30 PM",helga,Linux_OS,0,0,48432,-,Differential Incremental,snert,snert,Backup,1, ,0,Successful,0,
"Aug 18, 2011 3:22 PM","Aug 18, 2011 3:30 PM",helga,Linux_OS,"73,661",281,48433,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,90.094,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:04 PM",hamlet,Linux_OS,0,0,48434,-,Differential Incremental,snert,snert,Backup,1, ,0,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:09 PM",helga,Linux_OS,0,0,48440,-,Differential Incremental,snert,snert,Backup,1, ,0,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:04 PM",hagar,Linux_OS,0,0,48436,-,Differential Incremental,snert,snert,Backup,1, ,0,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:05 PM",honi,Linux_OS,0,0,48439,-,Differential Incremental,snert,snert,Backup,1, ,0,Partially Successful,1,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:04 PM",luckyeddie,Linux_OS,0,0,48435,-,Differential Incremental,snert,snert,Backup,1, ,0,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:04 PM",hamlet,Linux_OS,"19,613","1,916",48441,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,153.062,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:04 PM",luckyeddie,Linux_OS,"22,919","4,950",48442,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,530,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:04 PM",hagar,Linux_OS,"17,261","1,293",48443,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,45.156,Successful,0,
/tmp $ awk -F\, '/Backup|Restore|Duplicate/{ if ($7 != 0) {print;} }' OpsCenter_Use_With_Generate_Drive_Usage_17_10_2011_03_13_PM.csv > NetBackup_Export.csv
/tmp $ head -10 NetBackup_Export.csv
"Aug 18, 2011 3:22 PM","Aug 18, 2011 3:30 PM",helga,Linux_OS,"73,661",281,48433,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,90.094,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:04 PM",hamlet,Linux_OS,"19,613","1,916",48441,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,153.062,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:04 PM",luckyeddie,Linux_OS,"22,919","4,950",48442,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,530,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:04 PM",hagar,Linux_OS,"17,261","1,293",48443,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,45.156,Successful,0,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:05 PM",honi,Linux_OS,"75,343","1,047",48445,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,130.062,Partially Successful,1,
"Aug 18, 2011 6:00 PM","Aug 18, 2011 6:09 PM",helga,Linux_OS,"73,519",266,48446,-,Differential Incremental,snert,snert,Backup,1,Daily_Incremental,86.781,Successful,0,
"Aug 18, 2011 9:00 PM","Aug 18, 2011 9:27 PM",zook,Exchange_t-z,"31,900","2,106",48449,-,Differential Incremental,snert,zook,Backup,1,Exchange-Mailbox,"3,103.531",Partially Successful,1,
"Aug 18, 2011 9:19 PM","Aug 18, 2011 9:21 PM",hernia-1,Windows_OS-File-RootDrives,"7,031","2,054",48452,-,Differential Incremental,snert,hernia-1,Backup,1,Daily_Differential,106.635,Successful,0,
"Aug 18, 2011 9:19 PM","Aug 18, 2011 9:24 PM",hernia-1,Windows_OS-File-RootDrives,"8,662","1,565",48453,-,Differential Incremental,snert,hernia-1,Backup,1,Daily_Differential,272.293,Successful,0,
"Aug 18, 2011 9:19 PM","Aug 18, 2011 9:24 PM",hernia-2,Windows_OS-File-RootDrives,"7,108","1,600",48455,-,Differential Incremental,snert,hernia-1,Backup,1,Daily_Differential,187.156,Successful,0,
/tmp $ ./generate_drive_usage.pl | head -100
Aug 18 2011,0,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,30,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,100,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,130,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,200,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,230,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,300,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,330,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,400,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,430,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,500,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,530,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,600,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,630,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,700,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,730,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,800,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,830,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,900,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,930,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1000,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1030,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1100,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1130,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1200,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1230,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1300,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1330,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1400,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1430,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1500,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1530,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1600,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1630,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1700,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1730,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1800,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1830,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1900,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,1930,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,2000,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,2030,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,2100,zook,,,,,,,,,,,,,,,,,,,
Aug 18 2011,2130,,,,,,,,,,,,,,,,,,,,
Aug 18 2011,2200,giggles04,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 18 2011,2230,,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,2300,hernia,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,2330,hernia,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,0,hernia,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,30,hernia,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,100,hernia,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,130,,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,200,,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,230,,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,300,,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,330,,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,400,,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,430,,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,500,,giggles04,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,530,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,600,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,630,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,700,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,730,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,800,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,830,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,900,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,930,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,1000,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,1030,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,1100,,,giggles05,,,,,,,,,,,,,,,,,
Aug 19 2011,1130,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1200,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1230,prod2p,prod1p,prod1p,prod2p,,,,,,,,,,,,,,,,
Aug 19 2011,1300,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1330,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1400,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1430,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1500,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1530,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1600,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1630,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1700,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1730,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1800,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1830,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1900,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,1930,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,2000,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,2030,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,2100,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,2130,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,2200,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,2230,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,2300,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,2330,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,0,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,30,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,100,,,,,,,,,,,,,,,,,,,,
Aug 19 2011,130,fearless,lutep,,,,,,,,,,,,,,,,,,
/tmp $


So there it is. I hope this helps, especially if you have tape drives in your environment. At the very least, it should give you a nice spreadsheet you can give to your boss and say, "Here's what our tape drives are doing all day." You should be able to find the new script and sample data in the Downloads section of Symantec Connect. The script has one bug: it doesn't end cleanly (it keeps trying to parse), so you will have to CTRL-C out of it when it's reached the end. Here's the full script if you'd rather copy/paste it:

#!/bin/perl
#####################################################################################
# #
# generate_drive_usage.pl() #
# #
# Written By: Alan T. Landucci-Ruiz #
# http://solarisdeveloper.blogspot.com #
# #
# Abstract: This program generates CSV output of Tape drive usage, based on CSV #
# input. It is designed to help facilitate scheduling of tape drive #
# allocations when creating and moving backup schedules. #
# #
# #
# Args: none #
# #
# Variables: #
# $TIME - The time component of our output CSV. #
# $TINC - The increment component of our CSV. #
# $N_DRIVES - The number of drives we have. #
# @DRIVE - Our "DRIVE" array: holds the string of allocation. #
# @DRIVE_START - Time that the drive is allocated. #
# @DRIVE_END - Time that the drive is unallocated. #
# $INFILE - The input file csv. #
# $CURR_LINE - The line being processed from the input CSV. #
# $NEXT - Index of our next empty drive. #
# $IDX - Index counter. #
# $DATA_CLIENT - Client that is allocating the drive. #
# $DATA_START - Start time for the client. #
# $DATA_END - End time for the client. #
# $DATA_POLICY - Policy of the client that is allocating the drive. #
# $LAST_END_TIME - The latest end time of all drives. #
# $DATA_STARTDATE- The date/time pair of the start of the backup #
# #
# Known Issues: #
# Currently, if a last end time is the next day's time, but earlier than the #
# currently known last end time, then it will use the currently known last end #
# time instead of the earlier one the next day. #
# e.g., 0200 tomorrow will be considered earlier than 1400 today. #
# #
#####################################################################################
use strict;

# Global variables
my $TIME=0000;
my $TINC=30;
my $N_DRIVES=20;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;
my $IDX;
my $DATA_CLIENT;
my $DATA_START;
my $DATA_END;
my $DATA_POLICY;
my $LAST_END_TIME=-1;
my $DATA_STARTDATE;

#####################################################################################
# init_drives() #
# A routine to initialize (blank out) the drive arrays #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, blank out the indexed drive (unallocating it), and set start #
# and end times to '-1'. #
#####################################################################################
sub init_drives()
{
foreach $IDX (0..$N_DRIVES-1)
{
$DRIVE[$IDX] = "";
$DRIVE_START[$IDX] = -1;
$DRIVE_END[$IDX] = -1;
}
}

#####################################################################################
# nextempty() #
# A routine to find the next empty drive #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, if the indexed drive is blank (i.e., empty), return that #
# that number. #
#####################################################################################
sub nextempty
{
foreach $IDX (0..$N_DRIVES-1)
{
if ( $DRIVE[$IDX] eq "" ) { return $IDX; }
}
}

#####################################################################################
# round_to_incr() #
# Rounds the argument to the next '$TINC' minute mark off the hour (up or down) #
# #
# Args: #
# $time - This is the time that needs to be rounded in HHMM format. #
# #
# Pseudocode: #
# - Grab the minutes by getting the modulus of $time and 100 #
# e.g. $time = 1422 --> 1400 = (1422 % 100) = 22 #
# - Grab the hour by subtracting the minutes from the time. #
# 1400 = 1422 - (22) #
# - Find out how close we are to the $TINC minute mark by creating our $rem #
# variable: our minutes modulus $TINC (e.g., above, 22 % 30 = 22). #
# - If our remainder is less than 15, we round down, by subtracting the remainder #
# of our modulus from the actual minutes. #
# e.g. $minutes = 44 --> 44 - (44 % $TINC) = 44 - 14 = $TINC. #
# - Otherwise, we round up by adding the difference of the remainder and $TINC. #
# $minutes = 22 --> 22 + 30 - (22 % 30) = 22 + 30 - 22 = 30 #
# $minutes = 48 --> 48 + 30 - (48 % 30) = 48 + 30 - 18 = 60 #
# - If our minutes component is less than 60 (less than 1 hour), we add that to #
# our hour component. Otherwise, we just add 100 to the hour component to get #
# the rounded time. #
# - If our time has exceeded or is at 2400, we subtract 2400. #
# #
#####################################################################################
sub round_to_incr
{
my ( $time ) = @_;

my $minutes = $time % 100;
my $hour = $time - $minutes;

my $rem = $minutes % $TINC;

if ( $rem < 15 ) { $minutes = $minutes - $rem; }
else { $minutes = $minutes + $TINC - $rem; }

if ( $minutes < 60 ) { $time = $hour + $minutes; }
else { $time = $hour + 100; }

if ($time >= 2400) { $time = $time - 2400 ;}

return $time;
}


#####################################################################################
# print_row() #
# The main printing function of our program; prints the time and drive allocations#
# for each drive. #
# #
# Args: none #
# #
# Pseudocode: #
# - Print the time in HHMM format #
# - Starting with the lowest index (0), #
# For each index, print a comma, then the indexed drive allocation. #
# - Print carriage return. #
#####################################################################################
sub print_row()
{
print "$DATA_STARTDATE,$TIME";
foreach $IDX (0..$N_DRIVES-1)
{
print ",$DRIVE[$IDX]";
}
print "\n";
}

#####################################################################################
# get_last_end_time() #
# Finds the latest end time for the current drive allocations. At this point, #
# does not factor if the end time is the next day. #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# For each index, if the drive end time is greater than the last one we checked, #
# store that value to return. #
# - Return value stored. #
#####################################################################################
sub get_last_end_time()
{
my $RETURNVAL=0;

foreach $IDX (0..$N_DRIVES-1)
{
if ( $DRIVE_END[$IDX] > $RETURNVAL ) { $RETURNVAL=$DRIVE_END[$IDX]; }
}
return $RETURNVAL;
}

#####################################################################################
# print_debug() #
# A simple subroutine to print out the current variables. #
# #
# Args: none #
# #
#####################################################################################
sub print_debug()
{
print "TIME: $TIME\n";
print "DATA_START: $DATA_START\n";
print "DATA_END: $DATA_END\n";
print "DATA_CLIENT: $DATA_CLIENT\n";
print "LAST END TIME: $LAST_END_TIME\n";
print "CURR_LINE: $CURR_LINE\n";
}

#####################################################################################
# remove_commas_from_fields() #
# This subroutine removes commas and double-quotes from fields that have commas #
# within double quotes. #
# #
# Args: Comma-delimited line (data) #
# #
#####################################################################################
sub remove_commas_from_fields
{
my $LINE=$_[0];
my $RETURNLINE="";
my $LINDEX;
my @SLINE=split ("\"",$LINE);
# Quoted values land at the odd indices of the split.
for ($LINDEX=1;$LINDEX<=$#SLINE;$LINDEX+=2)
{
$SLINE[$LINDEX] =~ s/,//g;
}

foreach my $PIECE (@SLINE)
{
$RETURNLINE .= $PIECE;
}
return $RETURNLINE;
}


#####################################################################################
# convert_time() #
# This subroutine converts standard time format into military time. #
# #
# Args: "HH:MM AM/PM" time string #
# Returns: "HHMM" military time string #
# #
#####################################################################################
sub convert_time
{
my $HHMM=$_[0];
my @HHMMARRAY = split('[ :]',$HHMM);
my $AMPM = $HHMMARRAY[$#HHMMARRAY];
my $MIN = $HHMMARRAY[$#HHMMARRAY - 1];
my $HOUR = $HHMMARRAY[$#HHMMARRAY - 2];

# 12 AM is hour 00 and 12 PM is already hour 12.
if ( $AMPM eq "AM" && $HOUR == 12 ) { $HOUR = 0; }
if ( $AMPM eq "PM" && $HOUR != 12 ) { $HOUR += 12; }

return "$HOUR" . "$MIN";
}

#####################################################################################
# convert_date() #
# This subroutine converts the first two date/time strings of a comma-delimited #
# string into military time, and appends the dates to the end. #
# #
# Args: Comma-delimited string with the first and second field having a string #
# date/time format of "MONTH DAY YEAR HH:MM AM/PM" #
# Returns: Comma-delimited string with the first and second fields converted to #
# military "HHMM" time, with the dates in the format "MONTH DAY YEAR" #
# appended to the end. #
# #
#####################################################################################
sub convert_date
{
my $LINE=$_[0];
my $RETURNLINE="";
my $LITEM;
my @SDATE;

my @SLINE=split(",",$LINE);

@SDATE = split(" ",$SLINE[0]);
push(@SLINE,"$SDATE[0] $SDATE[1] $SDATE[2]"); # Preserve the first date; append it to the list.
@SDATE = split(" ",$SLINE[1]);
push(@SLINE,"$SDATE[0] $SDATE[1] $SDATE[2]"); # Preserve the second date; append it to the list.

$SLINE[0] = convert_time($SLINE[0]);
$SLINE[1] = convert_time($SLINE[1]);

foreach $LITEM (@SLINE)
{
$RETURNLINE .= "$LITEM,";
}
return $RETURNLINE;
}


#####################################################################################
# main() #
# The main body of the program. #
# #
# Args: none #
# #
# Pseudocode: #
# - Initialize the drive variables (call sub init_drives). #
# - Open the $INFILE for reading. #
# - Read in the first line. #
# - Split that line into our fields: #
# $DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY #
# - Start processing within a do-while-loop: #
# - For each index of drives, if the drive is claimed and the drive end time #
# is at the current time, then we release the drive (reset indexed variables).#
# - If the current time is the same as the start time from the line we just got,#
# then claim the drive, by storing the client name, start, and end times. #
# We also grab the next line from our input file $INFILE. #
# Otherwise, we print out our row and increment the time by $TINC minutes. #
# - We find out when our latest end time for allocation is and we store it. #
# - processing ends when (1) we've reached the end of the file, and (2) the time #
# has passed the latest end time. #
# - We close our input file. #
# #
#####################################################################################
sub main()
{

init_drives;

# Things we need before we can start processing:
# 1. Current Time - got it (above)
# 2. Current Line - need to start on the first line

open(FD,"<$INFILE") or die "Cannot open: $!";

# We grab the first line of the file and extract the values.
$CURR_LINE=<FD>;
$CURR_LINE=remove_commas_from_fields($CURR_LINE);
$CURR_LINE=convert_date($CURR_LINE);
($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);
$DATA_START = round_to_incr($DATA_START); # added 2010-07-02
$DATA_END = round_to_incr($DATA_END); # added 2010-07-02
$DATA_STARTDATE = (split(',',$CURR_LINE))[18];

# Now we can start our processing. We do this until the end of the file.
do
{
# First, we'll check to see if each drive is claimed and if it is, we check if
# it's reached its end time. If so, we release that drive.
foreach $IDX (0..$N_DRIVES-1)
{
if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] == $TIME )
{
# To release the drive, we reset all values.
$DRIVE[$IDX] = "";
$DRIVE_END[$IDX] = -1;
$DRIVE_START[$IDX] = -1;
}
}

# Second, we need to check if our data line's start time matches
# the current time.
# If it is, we claim a drive, and go to the next line.
# If it isn't, we print the output and increment the time.
if ( $TIME == $DATA_START )
{
$NEXT=nextempty;
$DRIVE[$NEXT] = $DATA_CLIENT;
$DRIVE_START[$NEXT] = $DATA_START;
$DRIVE_END[$NEXT] = $DATA_END;
$CURR_LINE=<FD>;
$CURR_LINE=remove_commas_from_fields($CURR_LINE);
$CURR_LINE = convert_date($CURR_LINE);
($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);
$DATA_START = round_to_incr($DATA_START); # added 2010-07-02
$DATA_END = round_to_incr($DATA_END); # added 2010-07-02
$DATA_STARTDATE = (split(',',$CURR_LINE))[18];
} else
{
print_row();
$TIME = round_to_incr($TIME + $TINC);
}

$LAST_END_TIME = round_to_incr(get_last_end_time());

} until ( eof(FD) && $TIME > $LAST_END_TIME && $CURR_LINE eq "" );
# We want to stop processing after (a) we've reached the end of the file, and
# (b) we've gone past the last end time.

close(FD);
}

main;

Friday, September 10, 2010

More SnOracle

I'll be brief, but here are the bullet points:

1. Oracle is dropping OpenSolaris
  Granted, it will be replaced with Solaris 11 Express (including a free RTU developer license).

2. Oracle is suing Google for using Java-related technology (Seriously?)

3. Oracle is going to be charging for ZFS.

  So far, this is only hearsay; I haven't seen any articles detailing it.  I'd venture that they will most likely charge for ZFS on any operating system except Solaris, where it will be provided free of charge.

Thursday, July 1, 2010

Charting Drive Usage from Backups


PROBLEM


One of the biggest problems with the free reporting tools that are available for NetBackup is that they don't have visual tools for schedule planning. I've done charts manually by taking the start times and end times and made a spreadsheet, but there should be a way to do this automatically from a given set of data. That's what I'm going to attempt to do here.



ANALYSIS


First, I'll start with some sample data. I know I want my start time and end time and I'll want to know either the name of the policy or the name of the client. I'll include both in my sample data, but I'll just need one in my report.
I'll also exclude backups that have 0 bytes associated with them. This is often the parent job or a duplicate. Alternatively, I can check whether the job has a parent. These exclusions/inclusions will be done on the data-aggregation side and will not be covered here. To create my sample data, I'll start by creating what I want my chart to look like, then create the data from it.



Table 1: Target drive-usage chart

Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6
0000      |         |         |         |         |         |
0030      |         |         |         |         |         |
0100      | hagar   |         |         |         |         |
0130      | hagar   | honi    | snert   |         |         |
0200      | hagar   | helga   | snert   |         |         |
0230      |         | helga   | snert   |         |         |
0300      | hamlet  |         | snert   |         |         |
0330      | hamlet  |         | snert   |         |         |
0400      | hamlet  |         | snert   |         |         |
0430      | hamlet  |         |         |         |         |
0500      | hamlet  | kvack   |         |         |         |
0530      | hamlet  | kvack   |         |         |         |
0600      |         |         |         |         |         |
0630      | hernia  |         |         |         |         |
0700      |         |         |         |         |         |

From this table, I can see that hagar's backup started at 1:00 and ended by 2:30. Within that window, honi's backup started at 1:30 and ended at 2:00, at which point helga's took over and ran until 3:00. snert's backup started at the same time as honi's, but ended much later, at 4:30. hamlet had a backup start at 3:00 and end at 6:00. kvack ran from 5:00 until 6:00. After 30 minutes of no backups, hernia's backup started at 6:30. Using the Start Time as my sort key for the data, the following CSV would be an appropriate set of data:


0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux


So the format I have for my CSV is “Start Time”, “End Time”, “client”, “policy”.
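In Perl, pulling those four fields out of a line is a single split. A minimal sketch (note that `\s*,\s*` also trims the padding around the commas, which a plain `split(',', ...)` would not):

```perl
#!/usr/bin/perl
# Minimal sketch: split one CSV line into its four fields.
use strict;
use warnings;

my $line = "0100, 0230, hagar, hagar-windows-full";

# Split on commas, trimming the surrounding whitespace as we go.
my ($start, $end, $client, $policy) = split /\s*,\s*/, $line;

print "start=$start end=$end client=$client policy=$policy\n";
```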


From Table 1, I can see that I want Time to be the major axis in gathering data. Time will increment as I sample the data. Notice, too, that two clients start their backup at the same time, so I will have to factor that in, as well.

Now, let's break down Table 1, comparing to my data set and see what's really going on. I'm starting out with a time set at midnight (0000), and I increment that by 30 minutes for each table row. I'm also going to say that my “current data line” is the first line of my data set. Since I sorted my data by the start time, I compare the start time of the current data line. If I haven't yet reached the start time, I increment my table row time. I do this until I find I've matched. When I've reached the start time of my current line, I “claim” the first available drive.

At the same time, I'm also looking for end times of “claimed” drives. But I realize that I should do this first, so that if a drive is “released” at 0300, that same drive can be reused at 0300. This is only a minor preference and can sometimes be erroneous, as I am rounding to the nearest half-hour. Because of this rounding, I will have to be careful not to omit small backups that ran for less than 15 minutes (0200-0210 would be listed as 0200-0200 and would cancel out), but I'll get to that later. In summary, my pseudo-code would look like this:


Time starts at 0000

Grab current data line

for each drive, if drive is claimed, if end time for claimed drive is current time, release drive.

If current data line's start time is current time, claim next unused drive; mark start time for the unused drive.

If current time has not exceeded current data line's start time yet, increment the time

Otherwise, I'll go to the next data line.


Now, let's see how this pseudo-code stacks up against my sample data:


Time starts at 0000.

Grab current data line (0100, 0230, hagar, hagar-windows-full)

For each drive, if drive is claimed, ...

is drive 1 claimed? No

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0000 = 0100? No)...

If current time has not exceeded current data line's start time, increment the time:

is 0000 < 0100? Yes, so I increment the time to 0030.


Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000

Current data line: 0100, 0230, hagar, hagar-windows-full

Current Time: 0030

For each drive, if drive is claimed, ...

is drive 1 claimed? No

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0030 = 0100? No)...

If current time has not exceeded current data line's start time, increment the time:

is 0030 < 0100? Yes, so I increment the time to 0100.


Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000
0030



Current data line: 0100, 0230, hagar, hagar-windows-full

Current Time: 0100

For each drive, if drive is claimed, ...

is drive 1 claimed? No

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0100 = 0100? YES!), Claim drive #1.

If current time has not exceeded current data line's start time, increment the time:

is 0100 < 0100? No, so I grab the next data line.


Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000
0030


Current data line: 0130, 0200, honi, honi-data

Current Time: 0100

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0100 = 0130? No)...

If current time has not exceeded current data line's start time, increment the time:

is 0100 < 0130? Yes, so I increment the time to 0130.


Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000
0030
0100       hagar


Current data line: 0130, 0200, honi, honi-data

Current Time: 0130

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0130 = 0130? YES!), claim next available drive (#2).

If current time has not exceeded current data line's start time, increment the time:

is 0130 < 0130? No, so I grab the next data line.


Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000
0030
0100       hagar


Current data line: 0130, 0430, snert, snert-database-only

Current Time: 0130

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0200) yet? No.

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0130 = 0130? YES!), claim next available drive (#3), with end time of 0430.

If current time has not exceeded current data line's start time, increment the time:

is 0130 < 0130? No, so I grab the next data line.



Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000
0030
0100       hagar


Current data line: 0200, 0300, helga, helga-system-files

Current Time: 0130

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0200) yet? No.

is drive 3 claimed? YES!

Have I reached my end time (0430) yet? No.

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0130 = 0200? No)...

If current time has not exceeded current data line's start time, increment the time:

is 0130 < 0200? Yes, so I increment the time to 0200.


Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000
0030
0100       hagar
0130       hagar    honi     snert


Current data line: 0200, 0300, helga, helga-system-files

Current Time: 0200

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0200) yet? Yes! Release this drive

is drive 3 claimed? YES!

Have I reached my end time (0430) yet? No.

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0200 = 0200? YES!), claim next unclaimed drive (#2), with new end time of 0300

If current time has not exceeded current data line's start time, increment the time:

is 0200 < 0200? No, so I grab the next data line.


Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000
0030
0100       hagar
0130       hagar    honi     snert


Current data line: 0300, 0600, hamlet, hamlet-windows-full

Current Time: 0200

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0300) yet? No.

is drive 3 claimed? YES!

Have I reached my end time (0430) yet? No.

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0200 = 0300? No)...

If current time has not exceeded current data line's start time, increment the time:

is 0200 < 0300? Yes, so I increment the time to 0230.


Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000
0030
0100       hagar
0130       hagar    honi     snert
0200       hagar    helga    snert


Note that I run into a problem when I go to the next day: my clock only goes up to 2330, and then the next day starts over at 0000. What happens if my time is 2200 and the current data line's start time is 0100? Because I'm testing with a less-than comparison (is 2200 < 0100? No), I would stop incrementing the time and treat that 0100 start as if it had already passed, even though it really happens three hours later, on the next day. I'll have to keep this day-rollover limitation in mind.
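One way to sidestep that comparison problem is to stop comparing raw HHMM values and instead convert every timestamp to absolute minutes, bumping a day counter whenever a sorted start time goes backwards. This is my own sketch of the idea, not part of the program below, and it only works on the sorted start-time column:

```perl
#!/usr/bin/perl
# Sketch: convert HHMM times to absolute minutes across midnight.
# Works on the *sorted start times* only: a start time smaller than
# the previous one is assumed to belong to the next day.
use strict;
use warnings;

my $day  = 0;    # how many midnights we've crossed
my $prev = -1;   # previous start time seen (HHMM)

sub start_to_abs_minutes
{
    my ($hhmm) = @_;
    $day++ if $hhmm < $prev;          # wrapped past midnight
    $prev = $hhmm;
    my $m = $hhmm % 100;
    my $h = int($hhmm / 100);
    return $day * 1440 + $h * 60 + $m;
}

# With absolute minutes, "2200 today" really is before "0100 tomorrow":
my $a = start_to_abs_minutes(2200);   # 1320
my $b = start_to_abs_minutes(100);    # 1500 (next day)
print "ok\n" if $a < $b;
```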




SOLUTION


The above output is fine, as it was generated logically in my head and typed in by hand, but when does the program create the output? Well, because each line is a function of the time, I output a row every time the time is incremented. So, from my pseudo-code, let's write a simple perl program to do what I've been doing by hand:


#!/bin/perl
use strict;


# Global variables
my $TIME=0000;
my $TINC=0030;
my $N_DRIVES=8;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;


# First, I want to initialize all my drive usages:
# client, start time, and end time.
sub init_drives()
{

local $IDX;

foreach $IDX (0..$N_DRIVES-1)

{

$DRIVE[$IDX] = "";

$DRIVE_START[$IDX] = -1;

$DRIVE_END[$IDX] = -1;

}
}


sub nextempty
{

local $IDX;

foreach $IDX (0..$N_DRIVES-1)

{

if ( $DRIVE[$IDX] eq "" ) { return $IDX; }

}
}


sub print_row()
{

print "$TIME";

foreach $IDX (0..$N_DRIVES-1)

{

print ",$DRIVE[$IDX]";

}

print "\n";

}


sub main()
{

my $IDX;


init_drives;

# Things I need before I can start processing:

# 1. Current Time – got it (above)

# 2. Current Line – Need to start on first line

my $FD=open(FD,"<$INFILE") or die "Cannot open: $!";


# I grab the first line of the file and extract the values.

$CURR_LINE=<FD>;

my ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) =

split(',',$CURR_LINE);

# Now I can start my processing. I do this until the end of the file.

do

{

# First, I'll check to see if each drive is claimed and if it is, I check if

# it's reached its end time. If so, I release that drive.

foreach $IDX (0..$N_DRIVES-1)

{

if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] eq $TIME )

{

# To release the drive, I reset all values.

$DRIVE[$IDX] = "";

$DRIVE_END[$IDX] = -1;

$DRIVE_START[$IDX] = -1;

}

}

# Second, I need to check if my data line's start time matches

# the current time. If it is, I claim a drive. If it isn't, I increment

# the time.

if ( $TIME == $DATA_START )

{

$NEXT=nextdrive;

$DRIVE[$NEXT] = $DATA_CLIENT;

$DRIVE_START[$NEXT] = $DATA_START;

$DRIVE_END[$NEXT] = $DATA_END;

} else

{

print_row;

$TIME = $TIME + 30;

}


} until (eof(FD));


close(FD);
}



Of course, if you try to run the code, you'll find that it has a few issues, so I'll add a few subroutines and clean it up a bit. Let's break down the subroutines first, though. I initialize the drives with my init_drives subroutine:


#####################################################################################
# init_drives() #
# A routine to initialize the drive allocation variables #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, blank out the indexed drive (unallocating it), and set start #
# and end times to '-1'. #
#####################################################################################
sub init_drives()
{

foreach $IDX (0..$N_DRIVES-1)

{

$DRIVE[$IDX] = "";

$DRIVE_START[$IDX] = -1;

$DRIVE_END[$IDX] = -1;

}
}



Second, I need a subroutine that finds my next empty drive:


#####################################################################################
# nextempty() #
# A routine to find the next empty drive #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, if the indexed drive is blank (i.e., empty), return that #
# number. #
#####################################################################################
sub nextempty
{

foreach $IDX (0..$N_DRIVES-1)

{

if ( $DRIVE[$IDX] eq "" ) { return $IDX; }

}
}
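One thing to watch: if every drive is already claimed, this loop falls off the end and returns whatever the last evaluated expression was. A defensive variant (my own addition, not part of the original program) returns -1 so the caller can detect that there are more concurrent jobs than drives:

```perl
#!/usr/bin/perl
# Sketch: nextempty with an explicit "no free drive" result.
use strict;
use warnings;

my $N_DRIVES = 6;
my @DRIVE    = ("hagar", "honi", "", "", "", "");

sub nextempty_safe
{
    foreach my $idx (0 .. $N_DRIVES - 1)
    {
        return $idx if $DRIVE[$idx] eq "";
    }
    return -1;   # every drive is claimed
}

print nextempty_safe(), "\n";   # 2: first unclaimed slot
```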



Third, I'm incrementing by 30 minutes each time, but I'm using basic math to do it, so 0030 + 0030 = 0060, which isn't a valid time (I want 0100). Also, I want to round everything to the closest increment. In this case, I create one subroutine to do both the rounding and the fixing:


#####################################################################################
# round_to_incr() #
# Rounds the argument to the next '$TINC' minute mark off the hour (up or down) #
# #
# Args: #
# $time - This is the time that needs to be rounded in HHMM format. #
# #
# Pseudocode: #
# - Grab the minutes by getting the modulus of $time and 100 #
# e.g. $time = 1422 --> 22 = (1422 % 100) #
# - Grab the hour by subtracting the minutes from the time. #
# 1400 = 1422 - (22) #
# - Find out how close I am to the $TINC minute mark by creating my $rem #
# variable: my minutes modulus $TINC (e.g., above, 22 % 30 = 22). #
# - If my remainder is less than 15, I round down, by subtracting the remainder #
# of my modulus from the actual minutes. #
# e.g. $minutes = 44 --> 44 - (44 % $TINC) = 44 - 14 = 30. #
# - Otherwise, I round up by adding the difference of the remainder and $TINC. #
# $minutes = 22 --> 22 + 30 - (22 % 30) = 22 + 30 - 22 = 30 #
# $minutes = 48 --> 48 + 30 - (48 % 30) = 48 + 30 - 18 = 60 #
# - If my minutes component is less than 60 (less than 1 hour), I add that to #
# my hour component. Otherwise, I just add 100 to the hour component to get #
# the rounded time. #
# - If my time has exceeded or is at 2400, I subtract 2400. #
# #
#####################################################################################
sub round_to_incr
{

my ( $time ) = @_;


my $minutes = $time % 100;

my $hour = $time - $minutes;


my $rem = $minutes % $TINC;


if ( $rem < 15 ) { $minutes = $minutes - $rem; }

else { $minutes = $minutes + $TINC - $rem; }


if ( $minutes < 60 ) { $time = $hour + $minutes; }

else { $time = $hour + 100; }


if ($time >= 2400) { $time = $time - 2400 ;}


return $time;
}
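As noted back in the analysis, a job shorter than 15 minutes can round to a zero-length interval (0200-0210 becomes 0200-0200) and vanish from the chart. One possible guard, sketched here as my own addition rather than something in the program, is to stretch any zero-length rounded job by one increment:

```perl
#!/usr/bin/perl
# Sketch: keep sub-15-minute jobs visible after rounding.
# If start and end rounded to the same slot, push the end out by one
# $TINC increment so the job still claims a row.
use strict;
use warnings;

my $TINC = 30;

sub pad_short_job
{
    my ($start, $end) = @_;            # already rounded, HHMM format
    if ($start == $end)
    {
        my $m = $end % 100;
        if ($m + $TINC < 60) { $end = $end + $TINC; }
        else                 { $end = ($end - $m) + 100; }
        $end -= 2400 if $end >= 2400;  # wrap past midnight
    }
    return ($start, $end);
}

my (undef, $padded) = pad_short_job(200, 200);
print "$padded\n";   # 230
```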



We also need a subroutine that outputs everything I have in my basic CSV format. Let's call this “print_row”, since I'm printing one row of CSV every time.



#####################################################################################
# print_row() #
# The main printing function of my program; prints the time and drive allocations #
# for each drive. #
# #
# Args: none #
# #
# Pseudocode: #
# - Print the time in HHMM format #
# - Starting with the lowest index (0), #
# For each index, print a comma, then the indexed drive allocation. #
# - Print carriage return. #
#####################################################################################
sub print_row()
{

print "$TIME";

foreach $IDX (0..$N_DRIVES-1)

{

print ",$DRIVE[$IDX]";

}

print "\n";
}



Finally, I'll need to find out what the latest end time is of each allocated drive. That is, what's the latest time that all drives will be released?


#####################################################################################
# get_last_end_time() #
# Finds the latest end time for the current drive allocations. At this point, #
# does not factor if the end time is the next day. #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# For each index, if the drive end time is greater than the last one I checked, #
# store that value to return. #
# - Return value stored. #
#####################################################################################
sub get_last_end_time()
{

my $RETURNVAL=0;


foreach $IDX (0..$N_DRIVES-1)

{

if ( $DRIVE_END[$IDX] > $RETURNVAL ) { $RETURNVAL=$DRIVE_END[$IDX]; }

}

return $RETURNVAL;
}



Now, I've added a debug function to help when things get really hairy. This isn't necessary, but it helps me figure out if my variables are getting updated when they're supposed to, or not getting updated when they're not supposed to.


#####################################################################################
# print_debug() #
# A simple subroutine to print out the current variables. #
# #
# Args: none #
# #
#####################################################################################
sub print_debug()
{

print "TIME: $TIME\n";

print "DATA_START: $DATA_START\n";

print "DATA_END: $DATA_END\n";

print "DATA_CLIENT: $DATA_CLIENT\n";

print "LAST END TIME: $LAST_END_TIME\n";

print "CURR_LINE: $CURR_LINE\n";
}


And finally, my main program, as seen above, but tweaked with the added functions. Note that I've had to add checks to the do..until loop to see whether I'm at the end of the file and whether my current line is blank. Otherwise, reaching the end of the file would skip over that last line, which is something I definitely don't want.


#####################################################################################
# main() #
# The main body of the program. #
# #
# Args: none #
# #
# Pseudocode: #
# - Initialize the drive variables (call sub init_drives). #
# - Open the $INFILE for reading. #
# - Read in the first line. #
# - Split that line into my fields: #
# $DATA_START $DATA_END, $DATA_CLIENT, $DATA_POLICY #
# - Start processing within a do-while-loop: #
# - For each index of drives, if the drive is claimed and the drive end time #
# is at the current time, then I release the drive (reset indexed variables). #
# - If the current time is the same as the start time from the line I just got, #
# then claim the drive, by storing the client name, start, and end times. #
# I also grab the next line from my input file $INFILE. #
# Otherwise, I print out my row and increment the time by $TINC minutes. #
# - I find out when my latest end time for allocation is and I store it. #
# - processing ends when (1) I've reached the end of the file, and (2) the time #
# has passed the latest end time. #
# - I close my input file. #
# #
#####################################################################################
sub main()
{


init_drives;


# Things I need before I can start processing:

# 1. Current Time - got it (above)

# 2. Current Line - Need to start on first line


my $FD=open(FD,"<$INFILE") or die "Cannot open: $!";


# I grab the first line of the file and extract the values.

$CURR_LINE=<FD>;

($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);


# Now I can start my processing. I do this until the end of the file.

do

{

# First, I'll check to see if each drive is claimed and if it is, I check if

# it's reached its end time. If so, I release that drive.

foreach $IDX (0..$N_DRIVES-1)

{

if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] == $TIME )

{

# To release the drive, I reset all values.

$DRIVE[$IDX] = "";

$DRIVE_END[$IDX] = -1;

$DRIVE_START[$IDX] = -1;

}

}


# Second, I need to check if my data line's start time matches

# the current time.

# If it is, I claim a drive, and go to the next line.

# If it isn't, I print the output and increment the time.

if ( $TIME == $DATA_START )

{

$NEXT=nextempty;

$DRIVE[$NEXT] = $DATA_CLIENT;

$DRIVE_START[$NEXT] = $DATA_START;

$DRIVE_END[$NEXT] = $DATA_END;

$CURR_LINE=<FD>;

($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);

} else

{

print_row();

$TIME = round_to_incr($TIME + $TINC);

}


$LAST_END_TIME = round_to_incr(get_last_end_time());


} until ( eof(FD) && $TIME == $LAST_END_TIME && $CURR_LINE eq "" );

# I want to stop processing after (a) I've reached the end of the file, and

# (b) I've gone past the last end time.

print_row();


close(FD);
}



And finally, I need my variables:


# Global variables
my $TIME=0000;
my $TINC=30;
my $N_DRIVES=6;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;
my $IDX;
my $DATA_CLIENT;
my $DATA_START;
my $DATA_END;
my $DATA_POLICY;
my $LAST_END_TIME=-1;



Piecing it all together, I have the following code:


#!/bin/perl
#####################################################################################
# #
# generate_drive_usage.pl() #
# #
# Written By: Alan T. Landucci-Ruiz #
# http://solarisdeveloper.blogspot.com #
# #
# Abstract: This program generates CSV output of Tape drive usage, based on CSV #
# input. It is designed to help facilitate scheduling of tape drive #
# allocations when creating and moving backup schedules. #
# #
# #
# Args: none #
# #
# Variables: #
# $TIME - The time component of my output CSV. #
# $TINC - The increment component of my CSV. #
# $N_DRIVES - The number of drives I have. #
# @DRIVE - My "DRIVE" array: holds the string of allocation. #
# @DRIVE_START - Time that the drive is allocated. #
# @DRIVE_END - Time that the drive is unallocated. #
# $INFILE - The input file csv. #
# $CURR_LINE - The line being processed from the input CSV. #
# $NEXT - Index of my next empty drive. #
# $IDX - Index counter. #
# $DATA_CLIENT - Client that is allocating the drive. #
# $DATA_START - Start time for the client. #
# $DATA_END - End time for the client. #
# $DATA_POLICY - Policy of the client that is allocating the drive. #
# $LAST_END_TIME - The latest end time of all drives. #
# #
# Known Issues: #
# Currently, if a last end time is the next day's time, but earlier than the #
# currently known last end time, then it will use the currently known last end #
# time instead of the earlier one the next day. #
# e.g., 0200 tomorrow will be considered earlier than 1400 today. #
# #
#####################################################################################
use strict;


# Global variables
my $TIME=0000;
my $TINC=30;
my $N_DRIVES=6;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;
my $IDX;
my $DATA_CLIENT;
my $DATA_START;
my $DATA_END;
my $DATA_POLICY;
my $LAST_END_TIME=-1;


#####################################################################################
# init_drives() #
# A routine to initialize the drive allocation variables #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, blank out the indexed drive (unallocating it), and set start #
# and end times to '-1'. #
#####################################################################################
sub init_drives()
{

foreach $IDX (0..$N_DRIVES-1)

{

$DRIVE[$IDX] = "";

$DRIVE_START[$IDX] = -1;

$DRIVE_END[$IDX] = -1;

}
}


#####################################################################################
# nextempty() #
# A routine to find the next empty drive #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, if the indexed drive is blank (i.e., empty), return that #
# number. #
#####################################################################################
sub nextempty
{

foreach $IDX (0..$N_DRIVES-1)

{

if ( $DRIVE[$IDX] eq "" ) { return $IDX; }

}
}


#####################################################################################
# round_to_incr() #
# Rounds the argument to the next '$TINC' minute mark off the hour (up or down) #
# #
# Args: #
# $time - This is the time that needs to be rounded in HHMM format. #
# #
# Pseudocode: #
# - Grab the minutes by getting the modulus of $time and 100 #
# e.g. $time = 1422 --> 22 = (1422 % 100) #
# - Grab the hour by subtracting the minutes from the time. #
# 1400 = 1422 - (22) #
# - Find out how close I am to the $TINC minute mark by creating my $rem #
# variable: my minutes modulus $TINC (e.g., above, 22 % 30 = 22). #
# - If my remainder is less than 15, I round down, by subtracting the remainder #
# of my modulus from the actual minutes. #
# e.g. $minutes = 44 --> 44 - (44 % $TINC) = 44 - 14 = 30. #
# - Otherwise, I round up by adding the difference of the remainder and $TINC. #
# $minutes = 22 --> 22 + 30 - (22 % 30) = 22 + 30 - 22 = 30 #
# $minutes = 48 --> 48 + 30 - (48 % 30) = 48 + 30 - 18 = 60 #
# - If my minutes component is less than 60 (less than 1 hour), I add that to #
# my hour component. Otherwise, I just add 100 to the hour component to get #
# the rounded time. #
# - If my time has exceeded or is at 2400, I subtract 2400. #
# #
#####################################################################################
sub round_to_incr
{

my ( $time ) = @_;


my $minutes = $time % 100;

my $hour = $time - $minutes;


my $rem = $minutes % $TINC;


if ( $rem < 15 ) { $minutes = $minutes - $rem; }

else { $minutes = $minutes + $TINC - $rem; }


if ( $minutes < 60 ) { $time = $hour + $minutes; }

else { $time = $hour + 100; }


if ($time >= 2400) { $time = $time - 2400 ;}


return $time;
}



#####################################################################################
# print_row() #
# The main printing function of my program; prints the time and drive allocations #
# for each drive. #
# #
# Args: none #
# #
# Pseudocode: #
# - Print the time in HHMM format #
# - Starting with the lowest index (0), #
# For each index, print a comma, then the indexed drive allocation. #
# - Print carriage return. #
#####################################################################################
sub print_row()
{

print "$TIME";

foreach $IDX (0..$N_DRIVES-1)

{

print ",$DRIVE[$IDX]";

}

print "\n";
}


#####################################################################################
# get_last_end_time() #
# Finds the latest end time for the current drive allocations. At this point, #
# does not factor if the end time is the next day. #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# For each index, if the drive end time is greater than the last one I checked, #
# store that value to return. #
# - Return value stored. #
#####################################################################################
sub get_last_end_time()
{

my $RETURNVAL=0;


foreach $IDX (0..$N_DRIVES-1)

{

if ( $DRIVE_END[$IDX] > $RETURNVAL ) { $RETURNVAL=$DRIVE_END[$IDX]; }

}

return $RETURNVAL;
}


#####################################################################################
# print_debug() #
# A simple subroutine to print out the current variables. #
# #
# Args: none #
# #
#####################################################################################
sub print_debug()
{

print "TIME: $TIME\n";

print "DATA_START: $DATA_START\n";

print "DATA_END: $DATA_END\n";

print "DATA_CLIENT: $DATA_CLIENT\n";

print "LAST END TIME: $LAST_END_TIME\n";

print "CURR_LINE: $CURR_LINE\n";
}


#####################################################################################
# main() #
# The main body of the program. #
# #
# Args: none #
# #
# Pseudocode: #
# - Initialize the drive variables (call sub init_drives). #
# - Open the $INFILE for reading. #
# - Read in the first line. #
# - Split that line into my fields: #
# $DATA_START $DATA_END, $DATA_CLIENT, $DATA_POLICY #
# - Start processing within a do-while-loop: #
# - For each index of drives, if the drive is claimed and the drive end time #
# is at the current time, then I release the drive (reset indexed variables). #
# - If the current time is the same as the start time from the line I just got, #
# then claim the drive, by storing the client name, start, and end times. #
# I also grab the next line from my input file $INFILE. #
# Otherwise, I print out my row and increment the time by $TINC minutes. #
# - I find out when my latest end time for allocation is and I store it. #
# - processing ends when (1) I've reached the end of the file, and (2) the time #
# has passed the latest end time. #
# - I close my input file. #
# #
#####################################################################################
sub main()
{


init_drives;


# Things I need before I can start processing:

# 1. Current Time - got it (above)

# 2. Current Line - Need to start on first line


my $FD=open(FD,"<$INFILE") or die "Cannot open: $!";


# I grab the first line of the file and extract the values.

$CURR_LINE=<FD>;

($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);


# Now I can start my processing. I do this until the end of the file.

do

{

# First, I'll check to see if each drive is claimed and if it is, I check if

# it's reached its end time. If so, I release that drive.

foreach $IDX (0..$N_DRIVES-1)

{

if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] == $TIME )

{

# To release the drive, I reset all values.

$DRIVE[$IDX] = "";

$DRIVE_END[$IDX] = -1;

$DRIVE_START[$IDX] = -1;

}

}


# Second, I need to check if my data line's start time matches

# the current time.

# If it is, I claim a drive, and go to the next line.

# If it isn't, I print the output and increment the time.

if ( $TIME == $DATA_START )

{

$NEXT=nextempty;

$DRIVE[$NEXT] = $DATA_CLIENT;

$DRIVE_START[$NEXT] = $DATA_START;

$DRIVE_END[$NEXT] = $DATA_END;

$CURR_LINE=<FD>;

($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);

} else

{

print_row();

$TIME = round_to_incr($TIME + $TINC);

}


$LAST_END_TIME = round_to_incr(get_last_end_time());


} until ( eof(FD) && $TIME == $LAST_END_TIME && $CURR_LINE eq "" );

# I want to stop processing after (a) I've reached the end of the file, and

# (b) I've gone past the last end time.

print_row();


close(FD);
}


main;




So, let's see how this program stacks up on our sample data:


~ $ cat NetBackup_Export.csv
0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux
~ $ ./generate_drive_usage2.pl
0,,,,,,
30,,,,,,
100, hagar,,,,,
130, hagar, honi, snert,,,
200, hagar, helga, snert,,,
230,, helga, snert,,,
300, hamlet,, snert,,,
330, hamlet,, snert,,,
400, hamlet,, snert,,,
430, hamlet,,,,,
500, hamlet, kvack,,,,
530, hamlet, kvack,,,,
600,,,,,,
630, hernia,,,,,
~ $



Well, that looks pretty good so far. Let's double the data (i.e., add the same data for the next day):


~ $ cat NetBackup_Export.csv
0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux
0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux
~ $ ./generate_drive_usage2.pl
0,,,,,,
30,,,,,,
100, hagar,,,,,
130, hagar, honi, snert,,,
200, hagar, helga, snert,,,
230,, helga, snert,,,
300, hamlet,, snert,,,
330, hamlet,, snert,,,
400, hamlet,, snert,,,
430, hamlet,,,,,
500, hamlet, kvack,,,,
530, hamlet, kvack,,,,
600,,,,,,
630, hernia,,,,,
700,,,,,,
730,,,,,,
800,,,,,,
830,,,,,,
900,,,,,,
930,,,,,,
1000,,,,,,
1030,,,,,,
1100,,,,,,
1130,,,,,,
1200,,,,,,
1230,,,,,,
1300,,,,,,
1330,,,,,,
1400,,,,,,
1430,,,,,,
1500,,,,,,
1530,,,,,,
1600,,,,,,
1630,,,,,,
1700,,,,,,
1730,,,,,,
1800,,,,,,
1830,,,,,,
1900,,,,,,
1930,,,,,,
2000,,,,,,
2030,,,,,,
2100,,,,,,
2130,,,,,,
2200,,,,,,
2230,,,,,,
2300,,,,,,
2330,,,,,,
0,,,,,,
30,,,,,,
100, hagar,,,,,
130, hagar, honi, snert,,,
200, hagar, helga, snert,,,
230,, helga, snert,,,
300, hamlet,, snert,,,
330, hamlet,, snert,,,
400, hamlet,, snert,,,
430, hamlet,,,,,
500, hamlet, kvack,,,,
530, hamlet, kvack,,,,
600,,,,,,
630, hernia,,,,,
~ $




SUMMARY


To conclude, given the start time, end time, and client for each job, we can plot our drive usage per client (and, with some modification, probably even per policy). This data should be pretty easy to get from any reporting software, such as NetBackup 7 OpsCenter, or from an export of a NetBackup Administration Console report. Because those tools output different time formats (e.g., hh:mm, non-24-hour), you'll need some additional scripting to convert their output to this format, either externally (in another program) or internally (added to this program).
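As a starting point for that conversion, here's a sketch that turns an "hh:mm" field (optionally with AM/PM) into the HHMM integers used above. The exact export format varies by tool, so treat the regular expression as an assumption to adjust:

```perl
#!/usr/bin/perl
# Sketch: normalize "hh:mm" (optionally with AM/PM) into HHMM integers.
use strict;
use warnings;

sub to_hhmm
{
    my ($str) = @_;                    # e.g. "2:30 PM" or "14:30"
    my ($h, $m, $ampm) = $str =~ /^\s*(\d{1,2}):(\d{2})\s*([AaPp][Mm])?/
        or return undef;
    $ampm = defined $ampm ? uc $ampm : "";
    $h = 0   if $h == 12 && $ampm eq "AM";   # 12:xx AM is 00xx
    $h += 12 if $h != 12 && $ampm eq "PM";   # 1:00 PM is 1300
    return $h * 100 + $m;
}

print to_hhmm("2:30 PM"), "\n";   # 1430
```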