Thursday, July 1, 2010

Charting Drive Usage from Backups


PROBLEM


One of the biggest problems with the free reporting tools available for NetBackup is that they don't have visual tools for schedule planning. I've made these charts manually by taking the start and end times and building a spreadsheet, but there should be a way to do this automatically from a given set of data. That's what I'm going to attempt to do here.



ANALYSIS


First, I'll start with some sample data. I know I want my start time and end time and I'll want to know either the name of the policy or the name of the client. I'll include both in my sample data, but I'll just need one in my report.
I'll also exclude backups that have 0 bytes associated with them. These are often parent jobs or duplicates. Alternatively, I can check whether the job has a parent. These exclusions and inclusions are handled on the data aggregation side and won't be covered here. To create my sample data, I'll start by laying out what I want my chart to look like, then create the data from it.



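Table 1

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|-----------|---------|---------|---------|---------|---------|---------|
| 0000      |         |         |         |         |         |         |
| 0030      |         |         |         |         |         |         |
| 0100      | hagar   |         |         |         |         |         |
| 0130      | hagar   | honi    | snert   |         |         |         |
| 0200      | hagar   | helga   | snert   |         |         |         |
| 0230      |         | helga   | snert   |         |         |         |
| 0300      | hamlet  |         | snert   |         |         |         |
| 0330      | hamlet  |         | snert   |         |         |         |
| 0400      | hamlet  |         | snert   |         |         |         |
| 0430      | hamlet  |         |         |         |         |         |
| 0500      | hamlet  | kvack   |         |         |         |         |
| 0530      | hamlet  | kvack   |         |         |         |         |
| 0600      |         |         |         |         |         |         |
| 0630      | hernia  |         |         |         |         |         |
| 0700      |         |         |         |         |         |         |
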
From this table, I can see that hagar's backup started at 1:00 and ended by 2:30. During that window, honi's backup started at 1:30 and ended at 2:00, and helga's backup then ran from 2:00 until 3:00. Snert's backup started at the same time as honi's but ended much later, at 4:30. Hamlet's backup started at 3:00 and ended at 6:00. Kvack ran from 5:00 until 6:00. After 30 minutes of no backups, hernia's backup ran from 6:30 to 7:00. Using the start time as my sort key, the following CSV would be an appropriate set of data:


0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux


So the format I have for my CSV is “Start Time”, “End Time”, “client”, “policy”.
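The program later in this post splits each line on a bare ',', so every field after the first keeps the blank that follows its comma. That is why the sample run prints ' hagar' instead of 'hagar'. Here is a small sketch, assuming the four-field layout above, that splits on /\s*,\s*/ and chomps the newline to get clean fields. `parse_line` is a hypothetical helper and is not part of the original script.

```perl
use strict;
use warnings;

# Split one "Start Time, End Time, client, policy" line into its four fields.
# The /\s*,\s*/ pattern swallows the blanks around each comma, and chomp
# drops the trailing newline that would otherwise stick to the policy name.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($start, $end, $client, $policy) = split /\s*,\s*/, $line;
    return ($start, $end, $client, $policy);
}

print join('|', parse_line("0100, 0230, hagar, hagar-windows-full\n")), "\n";
```

The example call prints `0100|0230|hagar|hagar-windows-full`.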


From Table 1, I can see that I want Time to be the major axis in gathering data. Time will increment as I sample the data. Notice, too, that two clients start their backup at the same time, so I will have to factor that in, as well.

Now, let's break down Table 1 and compare it to my data set to see what's really going on. I start with the time set to midnight (0000) and increment it by 30 minutes for each table row. My “current data line” starts as the first line of my data set. Since I sorted my data by start time, I compare the current data line's start time with the current time. If I haven't yet reached the start time, I increment the time (moving to the next table row), and I keep doing this until they match. When I reach the start time of my current line, I “claim” the first available drive.

At the same time, I'm also looking for the end times of “claimed” drives. I should actually do this check first, so that a drive “released” at 0300 can be reused at 0300. This is only a minor preference, and it can sometimes be inaccurate because I am rounding to the nearest half-hour. Because of this rounding, I have to be careful not to omit short backups that ran for less than 15 minutes (0200-0210 would be listed as 0200-0200 and would cancel out), but I'll get to that later. In summary, my pseudo-code would look like this:


Time starts at 0000

Grab current data line

For each drive, if the drive is claimed and its end time is the current time, release the drive.

If the current data line's start time is the current time, claim the next unused drive and mark its start and end times.

If the current time is still earlier than the current data line's start time, increment the time.

Otherwise, go to the next data line.

Then repeat, starting again with the drive check.
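Before moving to the file-based program, here is a minimal sketch of the pseudo-code running on an in-memory copy of the sample data. That makes it easy to compare against Table 1. It assumes every time already sits on a half-hour mark and that the lines are sorted by start time. Otherwise a start time may never match and the loop would not finish. The helper names `chart` and `next_slot` are hypothetical and don't appear elsewhere in this post.

```perl
use strict;
use warnings;

my $N_DRIVES = 6;

# Sample data from Table 1 as [start, end, client], sorted by start time.
my @jobs = (
    [100, 230, 'hagar'], [130, 200, 'honi'],   [130, 430, 'snert'],
    [200, 300, 'helga'], [300, 600, 'hamlet'], [500, 600, 'kvack'],
    [630, 700, 'hernia'],
);

# Next half-hour mark in HHMM: xx00 -> xx30, xx30 -> next hour, 2330 -> 0000.
sub next_slot {
    my ($t) = @_;
    return ($t % 100 ? $t + 70 : $t + 30) % 2400;
}

# Follow the pseudo-code and return one CSV row per half hour.
sub chart {
    my @lines = @_;
    my @drive = ('') x $N_DRIVES;    # client holding each drive ('' = free)
    my @end   = (-1) x $N_DRIVES;    # end time of each claimed drive
    my $time  = 0;
    my $i     = 0;                   # index of the current data line
    my @rows;
    while ($i < @lines || grep { $_ ne '' } @drive) {
        # Release every drive whose end time is now.
        for my $d (0 .. $N_DRIVES - 1) {
            if ($drive[$d] ne '' && $end[$d] == $time) {
                ($drive[$d], $end[$d]) = ('', -1);
            }
        }
        # Claim the first free drive for each data line starting now,
        # moving on to the next data line each time.
        while ($i < @lines && $lines[$i][0] == $time) {
            my ($free) = grep { $drive[$_] eq '' } 0 .. $N_DRIVES - 1;
            die "No free drive at $time\n" unless defined $free;
            ($drive[$free], $end[$free]) = ($lines[$i][2], $lines[$i][1]);
            $i++;
        }
        # Current time is now earlier than the next start: emit the row
        # and increment the time.
        push @rows, join(',', $time, @drive);
        $time = next_slot($time);
    }
    return @rows;
}

print "$_\n" for chart(@jobs);
```

Running it gives the same rows and drive columns as Table 1 (0000 through 0700). The times are not zero-padded.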


Now, let's see how this pseudo-code stacks up against my sample data:


Time starts at 0000.

Grab current data line (0100, 0230, hagar, hagar-windows-full)

For each drive, if drive is claimed, ...

is drive 1 claimed? No

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0000 = 0100? No)...

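If current time is still earlier than current data line's start time, increment the time:

Is 0000 < 0100? Yes, so the 0000 row is done and the time moves to 0030.

Output so far:

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|---|---|---|---|---|---|---|
| 0000 | | | | | | |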


Current data line: 0100, 0230, hagar, hagar-windows-full

Current Time: 0030

For each drive, if drive is claimed, ...

is drive 1 claimed? No

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0030 = 0100? No)...

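If current time is still earlier than current data line's start time, increment the time:

Is 0030 < 0100? Yes, so the 0030 row is done and the time moves to 0100.

Output so far:

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|---|---|---|---|---|---|---|
| 0000 | | | | | | |
| 0030 | | | | | | |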


Current data line: 0100, 0230, hagar, hagar-windows-full

Current Time: 0100

For each drive, if drive is claimed, ...

is drive 1 claimed? No

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

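If current data line's start time is current time (Is 0100 = 0100? YES!), claim drive #1.

If current time is still earlier than current data line's start time, increment the time:

Is 0100 < 0100? No, so the time stays at 0100 and I go to the next data line.

Output so far:

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|---|---|---|---|---|---|---|
| 0000 | | | | | | |
| 0030 | | | | | | |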


Current data line: 0130, 0200, honi, honi-data

Current Time: 0100

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0100 = 0130? No)...

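If current time is still earlier than current data line's start time, increment the time:

Is 0100 < 0130? Yes, so the 0100 row is done and the time moves to 0130.

Output so far:

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|---|---|---|---|---|---|---|
| 0000 | | | | | | |
| 0030 | | | | | | |
| 0100 | hagar | | | | | |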

Current data line: 0130, 0200, honi, honi-data

Current Time: 0130

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0130 = 0130? YES!), claim next available drive (#2).

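If current time is still earlier than current data line's start time, increment the time:

Is 0130 < 0130? No, so the time stays at 0130 and I go to the next data line.

Output so far:

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|---|---|---|---|---|---|---|
| 0000 | | | | | | |
| 0030 | | | | | | |
| 0100 | hagar | | | | | |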

Current data line: 0130, 0430, snert, snert-database-only

Current Time: 0130

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0200) yet? No.

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0130 = 0130? YES!), claim next available drive (#3), with end time of 0430.

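If current time is still earlier than current data line's start time, increment the time:

Is 0130 < 0130? No, so the time stays at 0130 and I go to the next data line.

Output so far:

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|---|---|---|---|---|---|---|
| 0000 | | | | | | |
| 0030 | | | | | | |
| 0100 | hagar | | | | | |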

Current data line: 0200, 0300, helga, helga-system-files

Current Time: 0130

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0200) yet? No.

is drive 3 claimed? YES!

Have I reached my end time (0430) yet? No.

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0130 = 0200? No)...

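If current time is still earlier than current data line's start time, increment the time:

Is 0130 < 0200? Yes, so the 0130 row is done and the time moves to 0200.

Output so far:

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|---|---|---|---|---|---|---|
| 0000 | | | | | | |
| 0030 | | | | | | |
| 0100 | hagar | | | | | |
| 0130 | hagar | honi | snert | | | |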

Current data line: 0200, 0300, helga, helga-system-files

Current Time: 0200

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0200) yet? Yes! Release this drive.

is drive 3 claimed? YES!

Have I reached my end time (0430) yet? No.

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

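If current data line's start time is current time (Is 0200 = 0200? YES!), claim the next unclaimed drive (#2), with a new end time of 0300.

If current time is still earlier than current data line's start time, increment the time:

Is 0200 < 0200? No, so the time stays at 0200 and I go to the next data line.

Output so far:

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|---|---|---|---|---|---|---|
| 0000 | | | | | | |
| 0030 | | | | | | |
| 0100 | hagar | | | | | |
| 0130 | hagar | honi | snert | | | |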

Current data line: 0300, 0600, hamlet, hamlet-windows-full

Current Time: 0200

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0300) yet? No.

is drive 3 claimed? YES!

Have I reached my end time (0430) yet? No.

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0200 = 0300? No)...

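If current time is still earlier than current data line's start time, increment the time:

Is 0200 < 0300? Yes, so the 0200 row is done and the time moves to 0230.

Output so far:

| Date/Time | Drive 1 | Drive 2 | Drive 3 | Drive 4 | Drive 5 | Drive 6 |
|---|---|---|---|---|---|---|
| 0000 | | | | | | |
| 0030 | | | | | | |
| 0100 | hagar | | | | | |
| 0130 | hagar | honi | snert | | | |
| 0200 | hagar | helga | snert | | | |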

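Note that I run into a problem when I go to the next day. My clock only goes up to 2330, and the next day starts again at 0000. What happens if my time is 2200 and the current data line's start time is 0100? Because I'm testing with a less-than comparison (2200 < 0100 is false), the 0100 start looks as though it has already passed, even though it belongs to the next day.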
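One way around the wrap-around is to stop comparing HHMM values directly. Instead, convert every time to absolute minutes since the first midnight. This is a sketch that assumes the input is sorted by start time and covers consecutive days. A start time that goes backwards means a new day has begun. An end time earlier than its own start means the job finished after midnight. `hhmm_to_min` and `to_absolute` are hypothetical helpers and are not part of this post's script.

```perl
use strict;
use warnings;

# HHMM (e.g. 2330) -> minutes after midnight (1410).
sub hhmm_to_min {
    my ($t) = @_;
    return int($t / 100) * 60 + $t % 100;
}

# Turn [start, end] HHMM pairs, sorted by start time over consecutive days,
# into [start, end] minutes counted from the first midnight.
sub to_absolute {
    my @jobs = @_;
    my ($day, $prev) = (0, -1);
    my @out;
    for my $job (@jobs) {
        my ($s, $e) = map { hhmm_to_min($_) } @$job;
        $day++ if $s < $prev;              # start went backwards: a new day began
        $prev = $s;
        my $start = $day * 1440 + $s;
        my $end   = $day * 1440 + $e;
        $end += 1440 if $end < $start;     # job finished after midnight
        push @out, [$start, $end];
    }
    return @out;
}

# 0630-0700, then 0100-0230 and 2330-0030 on the following day.
for my $pair (to_absolute([630, 700], [100, 230], [2330, 30])) {
    printf "%d-%d\n", @$pair;
}
```

With absolute minutes, "is the current time still earlier than the start time?" becomes a plain numeric comparison again. Taking `% 1440` of a value turns it back into a time of day for display.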




SOLUTION


The above output is fine, since I worked it out in my head and just typed it in here. But when am I actually creating the output? Well, because each row is a function of the time, I output a row every time the time is incremented. So, from my pseudo-code, let's write a simple Perl program to do what I've been doing by hand:


#!/bin/perl
use strict;


# Global variables
my $TIME=0000;
my $TINC=0030;
my $N_DRIVES=8;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;


# First, I want to initialize all my drive usages:
# client, start time, and end time.
sub init_drives()
{
    local $IDX;
    foreach $IDX (0..$N_DRIVES-1)
    {
        $DRIVE[$IDX] = "";
        $DRIVE_START[$IDX] = -1;
        $DRIVE_END[$IDX] = -1;
    }
}


sub nextempty
{
    local $IDX;
    foreach $IDX (0..$N_DRIVES-1)
    {
        if ( $DRIVE[$IDX] eq "" ) { return $IDX; }
    }
}


sub print_row()
{
    print "$TIME";
    foreach $IDX (0..$N_DRIVES-1)
    {
        print ",$DRIVE[$IDX]";
    }
    print "\n";
}


sub main()
{
    my $IDX;

    init_drives;

    # Things I need before I can start processing:
    # 1. Current Time - got it (above)
    # 2. Current Line - Need to start on first line
    my $FD=open(FD,"<$INFILE") or die "Cannot open: $!";

    # I grab the first line of the file and extract the values.
    $CURR_LINE=<FD>;
    my ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) =
        split(',', $CURR_LINE);

    # Now I can start my processing. I do this until the end of the file.
    do
    {
        # First, I'll check to see if each drive is claimed and if it is, I check if
        # it's reached its end time. If so, I release that drive.
        foreach $IDX (0..$N_DRIVES-1)
        {
            if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] eq $TIME )
            {
                # To release the drive, I reset all values.
                $DRIVE[$IDX] = "";
                $DRIVE_END[$IDX] = -1;
                $DRIVE_START[$IDX] = -1;
            }
        }

        # Second, I need to check if my data line's start time matches
        # the current time. If it does, I claim a drive. If it doesn't, I increment
        # the time.
        if ( $TIME == $DATA_START )
        {
            $NEXT=nextdrive;
            $DRIVE[$NEXT] = $DATA_CLIENT;
            $DRIVE_START[$NEXT] = $DATA_START;
            $DRIVE_END[$NEXT] = $DATA_END;
        } else
        {
            print_row;
            $TIME = $TIME + 30;
        }

    } until (eof(FD));

    close(FD);
}



Of course, if you try to run this code, you'll find that it has a few issues. I'll add a few subroutines and clean it up a bit, but let's break down the subroutines first. I initialize the drives with my init_drives subroutine:


#####################################################################################
# init_drives()
# A routine to initialize (unallocate) every drive.
#
# Args: none
#
# Pseudocode:
#  - Starting with the lowest index (0),
#  - For each index, blank out the indexed drive (unallocating it), and set start
#    and end times to '-1'.
#####################################################################################
sub init_drives()
{
    foreach $IDX (0..$N_DRIVES-1)
    {
        $DRIVE[$IDX] = "";
        $DRIVE_START[$IDX] = -1;
        $DRIVE_END[$IDX] = -1;
    }
}



Second, I need a subroutine that finds my next empty drive:


#####################################################################################
# nextempty()
# A routine to find the next empty drive
#
# Args: none
#
# Pseudocode:
#  - Starting with the lowest index (0),
#  - For each index, if the indexed drive is blank (i.e., empty), return that
#    number.
#####################################################################################
sub nextempty
{
    foreach $IDX (0..$N_DRIVES-1)
    {
        if ( $DRIVE[$IDX] eq "" ) { return $IDX; }
    }
}
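If all six drives are already claimed, nextempty falls off the end of its foreach loop. perlsub says the return value is unspecified when a subroutine's last statement is a loop like that. So main could end up with a meaningless index and overwrite a drive that is still in use. Below is a minimal sketch of a variant that returns undef explicitly, so the caller can detect an over-subscribed time slot. `next_free_drive` is a hypothetical name, and it reads a local `@DRIVE` just as nextempty reads the global one.

```perl
use strict;
use warnings;

my $N_DRIVES = 6;
my @DRIVE    = ('') x $N_DRIVES;    # '' means the drive is free

# Return the index of the first free drive, or undef when every drive is busy.
sub next_free_drive {
    foreach my $idx (0 .. $N_DRIVES - 1) {
        return $idx if $DRIVE[$idx] eq '';
    }
    return undef;    # explicit: no drive available
}

# Example caller: fill every drive, then refuse to claim one that isn't there.
$DRIVE[$_] = "client$_" for 0 .. $N_DRIVES - 1;
my $next = next_free_drive();
print defined $next ? "drive $next\n" : "all $N_DRIVES drives busy\n";
```

With every drive filled, the example prints `all 6 drives busy` instead of silently reusing a drive.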



Third, I'm incrementing by 30 minutes each time, but I'm using basic math to do it, so 0030 + 30 = 60 instead of 0100, which isn't what I want. I also want to round everything to the closest increment. So I create one subroutine that handles both the rounding and the fixing:


#####################################################################################
# round_to_incr()
# Rounds the argument to the nearest '$TINC' minute mark (up or down).
#
# Args:
#   $time - This is the time that needs to be rounded in HHMM format.
#
# Pseudocode:
#  - Grab the minutes by getting the modulus of $time and 100
#      e.g. $time = 1422 --> $minutes = (1422 % 100) = 22
#  - Grab the hour by subtracting the minutes from the time.
#      1400 = 1422 - (22)
#  - Find out how close I am to the $TINC minute mark by creating my $rem
#    variable: my minutes modulus $TINC (e.g., above, 22 % 30 = 22).
#  - If my remainder is less than 15, I round down by subtracting the remainder
#    from the actual minutes.
#      e.g. $minutes = 44 --> 44 - (44 % 30) = 44 - 14 = 30
#  - Otherwise, I round up by adding the difference of $TINC and the remainder.
#      $minutes = 22 --> 22 + 30 - (22 % 30) = 22 + 30 - 22 = 30
#      $minutes = 48 --> 48 + 30 - (48 % 30) = 48 + 30 - 18 = 60
#  - If my minutes component is less than 60 (less than 1 hour), I add that to
#    my hour component. Otherwise, I just add 100 to the hour component to get
#    the rounded time.
#  - If my time has reached or exceeded 2400, I subtract 2400.
#####################################################################################
sub round_to_incr
{
    my ( $time ) = @_;

    my $minutes = $time % 100;
    my $hour = $time - $minutes;

    my $rem = $minutes % $TINC;

    if ( $rem < 15 ) { $minutes = $minutes - $rem; }
    else { $minutes = $minutes + $TINC - $rem; }

    if ( $minutes < 60 ) { $time = $hour + $minutes; }
    else { $time = $hour + 100; }

    if ($time >= 2400) { $time = $time - 2400; }

    return $time;
}
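To check the rounding rule against the examples in the header comment, here is a standalone sketch of the same rule. It is called `round_hhmm`, with `$TINC` fixed at 30 and the 15-minute threshold from the comment. The script never rounds the start and end times it reads from the CSV, so as written they already need to sit on `$TINC` marks. The sketch therefore also shows one possible fix for the short-backup problem mentioned earlier. If a job's start and end round to the same mark, the hypothetical `round_interval` pushes the end one increment later so the job still fills a row. Both names are illustrative only.

```perl
use strict;
use warnings;

my $TINC = 30;    # chart increment in minutes

# Round an HHMM time to the nearest $TINC-minute mark: a remainder under 15
# rounds down, otherwise up; a full 60 minutes carries into the hour, and
# 2400 wraps back to 0000.
sub round_hhmm {
    my ($time)  = @_;
    my $minutes = $time % 100;
    my $hour    = $time - $minutes;
    my $rem     = $minutes % $TINC;
    $minutes = $rem < 15 ? $minutes - $rem : $minutes + $TINC - $rem;
    $time    = $minutes < 60 ? $hour + $minutes : $hour + 100;
    $time   -= 2400 if $time >= 2400;
    return $time;
}

# Round a job's start and end; if both land on the same mark (e.g. 0200-0210
# becomes 0200-0200), push the end one increment later so the job is not lost.
sub round_interval {
    my ($start, $end) = @_;
    my ($s, $e) = (round_hhmm($start), round_hhmm($end));
    $e = round_hhmm($s + $TINC) if $e == $s;
    return ($s, $e);
}

printf "%04d\n", round_hhmm($_) for 1422, 1444, 1448, 2330 + 30;
printf "%04d-%04d\n", round_interval(200, 210);
```

The first loop prints 1430, 1430, 1500 and 0000, and the last line prints 0200-0230.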



I also need a subroutine that outputs everything I have in my basic CSV format. Let's call it “print_row”, since it prints one row of CSV each time it's called.



#####################################################################################
# print_row()
# The main printing function of my program; prints the time and drive allocations
# for each drive.
#
# Args: none
#
# Pseudocode:
#  - Print the time in HHMM format
#  - Starting with the lowest index (0),
#    For each index, print a comma, then the indexed drive allocation.
#  - Print a newline.
#####################################################################################
sub print_row()
{
    print "$TIME";
    foreach $IDX (0..$N_DRIVES-1)
    {
        print ",$DRIVE[$IDX]";
    }
    print "\n";
}
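print_row prints `$TIME` as a plain number. That is why the sample runs below show 0, 30, 100 rather than 0000, 0030, 0100. Each client name also keeps the blank that followed its comma in the input. If zero-padded times are preferred, a sketch along these lines builds the row with sprintf. `format_row` is a hypothetical helper that takes the values as arguments instead of reading the globals.

```perl
use strict;
use warnings;

# Build one CSV row: the time zero-padded to HHMM, then one field per drive
# with any leading blanks (left over from splitting on a bare ',') removed.
sub format_row {
    my ($time, @drives) = @_;
    s/^\s+// for @drives;
    return join(',', sprintf('%04d', $time), @drives);
}

print format_row(130, ' hagar', ' honi', ' snert', '', '', ''), "\n";
```

The example prints `0130,hagar,honi,snert,,,`.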



Finally, I need to find the latest end time across all allocated drives. That is, what's the latest time at which all drives will be released?


#####################################################################################
# get_last_end_time()
# Finds the latest end time for the current drive allocations. At this point,
# does not factor if the end time is the next day.
#
# Args: none
#
# Pseudocode:
#  - Starting with the lowest index (0),
#    For each index, if the drive end time is greater than the last one I checked,
#    store that value to return.
#  - Return value stored.
#####################################################################################
sub get_last_end_time()
{
    my $RETURNVAL=0;

    foreach $IDX (0..$N_DRIVES-1)
    {
        if ( $DRIVE_END[$IDX] > $RETURNVAL ) { $RETURNVAL=$DRIVE_END[$IDX]; }
    }

    return $RETURNVAL;
}
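As an aside, List::Util (a core Perl module) provides `max`, which turns this search into a one-liner. This sketch takes the end times as arguments. Seeding `max` with 0 keeps the original behavior of returning 0 when every drive is free (-1). Like the original, it doesn't account for end times on the next day. `last_end_time` is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Latest end time among the drives; free drives hold -1, so seeding max()
# with 0 returns 0 when nothing is allocated.
sub last_end_time {
    my @drive_end = @_;
    return max(0, @drive_end);
}

print last_end_time(-1, 230, 430, -1, -1, -1), "\n";
```

The example prints 430.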



Now, I've added a debug function to help when things get really hairy. This isn't necessary, but it helps me figure out if my variables are getting updated when they're supposed to, or not getting updated when they're not supposed to.


#####################################################################################
# print_debug()
# A simple subroutine to print out the current variables.
#
# Args: none
#####################################################################################
sub print_debug()
{
    print "TIME: $TIME\n";
    print "DATA_START: $DATA_START\n";
    print "DATA_END: $DATA_END\n";
    print "DATA_CLIENT: $DATA_CLIENT\n";
    print "LAST END TIME: $LAST_END_TIME\n";
    print "CURR_LINE: $CURR_LINE\n";
}


And finally, my main program, as seen above, but tweaked to use the added functions. Note that I've had to add checks to the do..until loop to see whether I'm at the end of the file and whether my current line is blank. Without them, the loop would stop as soon as the last line was read, before that line was processed, which is something I definitely don't want.


#####################################################################################
# main()
# The main body of the program.
#
# Args: none
#
# Pseudocode:
#  - Initialize the drive variables (call sub init_drives).
#  - Open the $INFILE for reading.
#  - Read in the first line.
#  - Split that line into my fields:
#      $DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY
#  - Start processing within a do-until loop:
#    - For each index of drives, if the drive is claimed and the drive end time
#      is at the current time, then I release the drive (reset indexed variables).
#    - If the current time is the same as the start time from the line I just got,
#      then claim the drive, by storing the client name, start, and end times.
#      I also grab the next line from my input file $INFILE.
#      Otherwise, I print out my row and increment the time by $TINC minutes.
#    - I find out when my latest end time for allocation is and I store it.
#  - Processing ends when (1) I've reached the end of the file, (2) there is no
#    current line left, and (3) the time has reached the latest end time.
#  - I print the final row and close my input file.
#####################################################################################
sub main()
{
    init_drives;

    # Things I need before I can start processing:
    # 1. Current Time - got it (above)
    # 2. Current Line - Need to start on first line
    open(FD, '<', $INFILE) or die "Cannot open $INFILE: $!";

    # I grab the first line of the file and extract the values.
    $CURR_LINE = <FD>;
    ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',', $CURR_LINE);

    # Now I can start my processing. I do this until the end of the file.
    do
    {
        # First, I'll check to see if each drive is claimed and if it is, I check if
        # it's reached its end time. If so, I release that drive.
        foreach $IDX (0..$N_DRIVES-1)
        {
            if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] == $TIME )
            {
                # To release the drive, I reset all values.
                $DRIVE[$IDX] = "";
                $DRIVE_END[$IDX] = -1;
                $DRIVE_START[$IDX] = -1;
            }
        }

        # Second, I need to check if my data line's start time matches
        # the current time.
        # If it does, I claim a drive, and go to the next line.
        # If it doesn't, I print the output and increment the time.
        if ( $TIME == $DATA_START )
        {
            $NEXT = nextempty();
            $DRIVE[$NEXT] = $DATA_CLIENT;
            $DRIVE_START[$NEXT] = $DATA_START;
            $DRIVE_END[$NEXT] = $DATA_END;

            $CURR_LINE = <FD>;
            ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',', $CURR_LINE);
        }
        else
        {
            print_row();
            $TIME = round_to_incr($TIME + $TINC);
        }

        $LAST_END_TIME = round_to_incr(get_last_end_time());

    } until ( eof(FD) && $TIME == $LAST_END_TIME && $CURR_LINE eq "" );

    # I stop processing once (a) I've reached the end of the file, and
    # (b) I've reached the last end time.
    print_row();

    close(FD);
}



And finally, I need my variables:


# Global variables
my $TIME=0000;
my $TINC=30;
my $N_DRIVES=6;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;
my $IDX;
my $DATA_CLIENT;
my $DATA_START;
my $DATA_END;
my $DATA_POLICY;
my $LAST_END_TIME=-1;



Piecing it all together, I have the following code:


#!/bin/perl
#####################################################################################
#
# generate_drive_usage.pl
#
# Written By: Alan T. Landucci-Ruiz
#             http://solarisdeveloper.blogspot.com
#
# Abstract: This program generates CSV output of tape drive usage, based on CSV
#           input. It is designed to help facilitate scheduling of tape drive
#           allocations when creating and moving backup schedules.
#
# Args: none
#
# Variables:
#   $TIME          - The time component of my output CSV.
#   $TINC          - The time increment (in minutes) between CSV rows.
#   $N_DRIVES      - The number of drives I have.
#   @DRIVE         - My "DRIVE" array: holds the string of allocation.
#   @DRIVE_START   - Time that the drive is allocated.
#   @DRIVE_END     - Time that the drive is unallocated.
#   $INFILE        - The input file csv.
#   $CURR_LINE     - The line being processed from the input CSV.
#   $NEXT          - Index of my next empty drive.
#   $IDX           - Index counter.
#   $DATA_CLIENT   - Client that is allocating the drive.
#   $DATA_START    - Start time for the client.
#   $DATA_END      - End time for the client.
#   $DATA_POLICY   - Policy of the client that is allocating the drive.
#   $LAST_END_TIME - The latest end time of all drives.
#
# Known Issues:
#   Currently, if a last end time is the next day's time, but earlier than the
#   currently known last end time, then it will use the currently known last end
#   time instead of the earlier one the next day.
#   e.g., 0200 tomorrow will be considered earlier than 1400 today.
#
#####################################################################################
use strict;


# Global variables
my $TIME=0000;
my $TINC=30;
my $N_DRIVES=6;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;
my $IDX;
my $DATA_CLIENT;
my $DATA_START;
my $DATA_END;
my $DATA_POLICY;
my $LAST_END_TIME=-1;


#####################################################################################
# init_drives()
# A routine to initialize (unallocate) every drive.
#
# Args: none
#
# Pseudocode:
#  - Starting with the lowest index (0),
#  - For each index, blank out the indexed drive (unallocating it), and set start
#    and end times to '-1'.
#####################################################################################
sub init_drives()
{
    foreach $IDX (0..$N_DRIVES-1)
    {
        $DRIVE[$IDX] = "";
        $DRIVE_START[$IDX] = -1;
        $DRIVE_END[$IDX] = -1;
    }
}


#####################################################################################
# nextempty()
# A routine to find the next empty drive
#
# Args: none
#
# Pseudocode:
#  - Starting with the lowest index (0),
#  - For each index, if the indexed drive is blank (i.e., empty), return that
#    number.
#####################################################################################
sub nextempty
{
    foreach $IDX (0..$N_DRIVES-1)
    {
        if ( $DRIVE[$IDX] eq "" ) { return $IDX; }
    }
}


#####################################################################################
# round_to_incr()
# Rounds the argument to the nearest '$TINC' minute mark (up or down).
#
# Args:
#   $time - This is the time that needs to be rounded in HHMM format.
#
# Pseudocode:
#  - Grab the minutes by getting the modulus of $time and 100
#      e.g. $time = 1422 --> $minutes = (1422 % 100) = 22
#  - Grab the hour by subtracting the minutes from the time.
#      1400 = 1422 - (22)
#  - Find out how close I am to the $TINC minute mark by creating my $rem
#    variable: my minutes modulus $TINC (e.g., above, 22 % 30 = 22).
#  - If my remainder is less than 15, I round down by subtracting the remainder
#    from the actual minutes.
#      e.g. $minutes = 44 --> 44 - (44 % 30) = 44 - 14 = 30
#  - Otherwise, I round up by adding the difference of $TINC and the remainder.
#      $minutes = 22 --> 22 + 30 - (22 % 30) = 22 + 30 - 22 = 30
#      $minutes = 48 --> 48 + 30 - (48 % 30) = 48 + 30 - 18 = 60
#  - If my minutes component is less than 60 (less than 1 hour), I add that to
#    my hour component. Otherwise, I just add 100 to the hour component to get
#    the rounded time.
#  - If my time has reached or exceeded 2400, I subtract 2400.
#####################################################################################
sub round_to_incr
{
    my ( $time ) = @_;

    my $minutes = $time % 100;
    my $hour = $time - $minutes;

    my $rem = $minutes % $TINC;

    if ( $rem < 15 ) { $minutes = $minutes - $rem; }
    else { $minutes = $minutes + $TINC - $rem; }

    if ( $minutes < 60 ) { $time = $hour + $minutes; }
    else { $time = $hour + 100; }

    if ($time >= 2400) { $time = $time - 2400; }

    return $time;
}



#####################################################################################
# print_row()
# The main printing function of my program; prints the time and drive allocations
# for each drive.
#
# Args: none
#
# Pseudocode:
#  - Print the time in HHMM format
#  - Starting with the lowest index (0),
#    For each index, print a comma, then the indexed drive allocation.
#  - Print a newline.
#####################################################################################
sub print_row()
{
    print "$TIME";
    foreach $IDX (0..$N_DRIVES-1)
    {
        print ",$DRIVE[$IDX]";
    }
    print "\n";
}


#####################################################################################
# get_last_end_time()
# Finds the latest end time for the current drive allocations. At this point,
# does not factor if the end time is the next day.
#
# Args: none
#
# Pseudocode:
#  - Starting with the lowest index (0),
#    For each index, if the drive end time is greater than the last one I checked,
#    store that value to return.
#  - Return value stored.
#####################################################################################
sub get_last_end_time()
{
    my $RETURNVAL=0;

    foreach $IDX (0..$N_DRIVES-1)
    {
        if ( $DRIVE_END[$IDX] > $RETURNVAL ) { $RETURNVAL=$DRIVE_END[$IDX]; }
    }

    return $RETURNVAL;
}


#####################################################################################
# print_debug()
# A simple subroutine to print out the current variables.
#
# Args: none
#####################################################################################
sub print_debug()
{
    print "TIME: $TIME\n";
    print "DATA_START: $DATA_START\n";
    print "DATA_END: $DATA_END\n";
    print "DATA_CLIENT: $DATA_CLIENT\n";
    print "LAST END TIME: $LAST_END_TIME\n";
    print "CURR_LINE: $CURR_LINE\n";
}


#####################################################################################
# main()
# The main body of the program.
#
# Args: none
#
# Pseudocode:
#  - Initialize the drive variables (call sub init_drives).
#  - Open the $INFILE for reading.
#  - Read in the first line.
#  - Split that line into my fields:
#      $DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY
#  - Start processing within a do-until loop:
#    - For each index of drives, if the drive is claimed and the drive end time
#      is at the current time, then I release the drive (reset indexed variables).
#    - If the current time is the same as the start time from the line I just got,
#      then claim the drive, by storing the client name, start, and end times.
#      I also grab the next line from my input file $INFILE.
#      Otherwise, I print out my row and increment the time by $TINC minutes.
#    - I find out when my latest end time for allocation is and I store it.
#  - Processing ends when (1) I've reached the end of the file, (2) there is no
#    current line left, and (3) the time has reached the latest end time.
#  - I print the final row and close my input file.
#####################################################################################
sub main()
{
    init_drives;

    # Things I need before I can start processing:
    # 1. Current Time - got it (above)
    # 2. Current Line - Need to start on first line
    open(FD, '<', $INFILE) or die "Cannot open $INFILE: $!";

    # I grab the first line of the file and extract the values.
    $CURR_LINE = <FD>;
    ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',', $CURR_LINE);

    # Now I can start my processing. I do this until the end of the file.
    do
    {
        # First, I'll check to see if each drive is claimed and if it is, I check if
        # it's reached its end time. If so, I release that drive.
        foreach $IDX (0..$N_DRIVES-1)
        {
            if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] == $TIME )
            {
                # To release the drive, I reset all values.
                $DRIVE[$IDX] = "";
                $DRIVE_END[$IDX] = -1;
                $DRIVE_START[$IDX] = -1;
            }
        }

        # Second, I need to check if my data line's start time matches
        # the current time.
        # If it does, I claim a drive, and go to the next line.
        # If it doesn't, I print the output and increment the time.
        if ( $TIME == $DATA_START )
        {
            $NEXT = nextempty();
            $DRIVE[$NEXT] = $DATA_CLIENT;
            $DRIVE_START[$NEXT] = $DATA_START;
            $DRIVE_END[$NEXT] = $DATA_END;

            $CURR_LINE = <FD>;
            ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',', $CURR_LINE);
        }
        else
        {
            print_row();
            $TIME = round_to_incr($TIME + $TINC);
        }

        $LAST_END_TIME = round_to_incr(get_last_end_time());

    } until ( eof(FD) && $TIME == $LAST_END_TIME && $CURR_LINE eq "" );

    # I stop processing once (a) I've reached the end of the file, and
    # (b) I've reached the last end time.
    print_row();

    close(FD);
}


main;




So, let's see how this program stacks up against our sample data:


~ $ cat NetBackup_Export.csv
0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux
~ $ ./generate_drive_usage2.pl
0,,,,,,
30,,,,,,
100, hagar,,,,,
130, hagar, honi, snert,,,
200, hagar, helga, snert,,,
230,, helga, snert,,,
300, hamlet,, snert,,,
330, hamlet,, snert,,,
400, hamlet,, snert,,,
430, hamlet,,,,,
500, hamlet, kvack,,,,
530, hamlet, kvack,,,,
600,,,,,,
630, hernia,,,,,
~ $



Well, that looks pretty good so far. Let's double the data (i.e., add the same data for the next day):


~ $ cat NetBackup_Export.csv
0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux
0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux
~ $ ./generate_drive_usage2.pl
0,,,,,,
30,,,,,,
100, hagar,,,,,
130, hagar, honi, snert,,,
200, hagar, helga, snert,,,
230,, helga, snert,,,
300, hamlet,, snert,,,
330, hamlet,, snert,,,
400, hamlet,, snert,,,
430, hamlet,,,,,
500, hamlet, kvack,,,,
530, hamlet, kvack,,,,
600,,,,,,
630, hernia,,,,,
700,,,,,,
730,,,,,,
800,,,,,,
830,,,,,,
900,,,,,,
930,,,,,,
1000,,,,,,
1030,,,,,,
1100,,,,,,
1130,,,,,,
1200,,,,,,
1230,,,,,,
1300,,,,,,
1330,,,,,,
1400,,,,,,
1430,,,,,,
1500,,,,,,
1530,,,,,,
1600,,,,,,
1630,,,,,,
1700,,,,,,
1730,,,,,,
1800,,,,,,
1830,,,,,,
1900,,,,,,
1930,,,,,,
2000,,,,,,
2030,,,,,,
2100,,,,,,
2130,,,,,,
2200,,,,,,
2230,,,,,,
2300,,,,,,
2330,,,,,,
0,,,,,,
30,,,,,,
100, hagar,,,,,
130, hagar, honi, snert,,,
200, hagar, helga, snert,,,
230,, helga, snert,,,
300, hamlet,, snert,,,
330, hamlet,, snert,,,
400, hamlet,, snert,,,
430, hamlet,,,,,
500, hamlet, kvack,,,,
530, hamlet, kvack,,,,
600,,,,,,
630, hernia,,,,,
~ $




SUMMARY


To conclude, given a set of data, we can plot our drive usage per client (and, with some modification, probably per policy as well), as long as we have the start time, end time, and client. This should be easy to get from reporting software such as NetBackup 7 OpsCenter, or from an export of a NetBackup Administration Console report. These tools give their output in a different time format (12-hour hh:mm rather than 24-hour), so you will need some additional scripting to convert it to this format. That can be done either externally (in another program) or internally (added to this program).
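As a starting point for that conversion, here is a sketch that turns a 12-hour time into the HHMM number this program expects. It assumes the export's times look like "1:30 AM" or "11:05:42 PM". The real OpsCenter and Administration Console formats may differ, and any date portion would need to be stripped first. `to_hhmm` is a hypothetical helper.

```perl
use strict;
use warnings;

# Convert a 12-hour "h:mm AM/PM" string (seconds optional) to a 24-hour HHMM
# number. Assumption: this input shape; other export formats need other parsing.
sub to_hhmm {
    my ($str) = @_;
    my ($h, $m, $ampm) = $str =~ /^\s*(\d{1,2}):(\d{2})(?::\d{2})?\s*([AaPp][Mm])\s*$/
        or die "Unrecognized time: $str\n";
    $h = 0 if $h == 12;                # 12:xx AM is just after midnight
    $h += 12 if uc($ampm) eq 'PM';     # PM hours become 12..23
    return $h * 100 + $m;
}

printf "%04d\n", to_hhmm($_) for ('12:15 AM', '1:30 AM', '12:00 PM', '11:05:42 PM');
```

The example prints 0015, 0130, 1200 and 2305, one per line.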

