Run7 nanoDST to RCF operations
The procedure for transferring nanoDSTs to RCF during the Run7
min bias reconstruction project
Contact Person:
Charles F. Maguire
Creation Date: May 19, 2007
Last update May 29, 2007: Revisions for actual operations
- The main cautoRun7 script arranges for previously obtained reconstruction output to be
placed in a set of 14 subdirectories of the newstore directory,
/gfps3/RUN7PRDF/prod/run7/output/newstore. This move
is done when cautoRun7 starts running, after checks are made
that it is OK to submit a new set of reconstruction jobs. The script which does
these moves is move.sh. After the moves are finished, the cautoRun7 script
determines a new set of jobs to be run and submits those to PBS via the submit.pl
script. After submit.pl has completed all of its job submissions, the last step
of the cautoRun7 script is to initiate an fdtStartNanoServer.csh job on the vmps02 node.
- The fdtStartNanoServer.csh script running on the vmps02 node starts
the fdtStartNanoServer.pl script.
- The fdtStartNanoServer.pl script will start the FDT script
fdtNanoServer.csh on the vmps02 node provided there is no nanoDST
file transfer already in progress, and provided that there is disk space available on
eon0 or eon1 at firebird. Otherwise the fdtStartNanoServer.pl script
will exit without doing anything. A BACKUP SCRIPT NEEDS TO BE WRITTEN TO
TAKE OVER WHEN THIS OCCURS! Both of these situations would be very abnormal.
The cautoRun7 script itself should not initiate a new set of transfers
unless the previous set was confirmed to have arrived successfully at RCF. The buffer
disks at firebird should generally not be more than 50% full.
- The fdtNanoServer.csh script will invoke the fdtStartNanoClientEon0(1).csh script on
firebird, and will also create an fdtInProgress file to block any new
job launches by the cautoRun7 script. The fdtInProgress file
will be removed by the fdtNanoServer.csh script after the FDT process
has completed transferring all the files to firebird.
- The fdtStartNanoClientEon0(1).csh script will wait 30 seconds, and then execute the
fdtNanoClientEon0(1).csh script on firebird. The 30-second wait is the usual
precaution to make sure that the server process starts first.
- When the FDT is finished, the fdtNanoClientEon0(1).csh script will call the
gridFTPNanoEon0(1).pl script to do the gridFTP transfers to RCF.
- The gridFTPNanoEon0(1).pl script composes and executes the three grid transfer .csh
scripts which run in parallel. Originally the grid transfer scripts had to
be composed for each transfer since an older version of gridFTP required
that each file be named separately. However, the newest version of gridFTP
allows whole directories to be copied with one command. Nonetheless, I left
the same three grid transfer .csh scripts to be composed anew each time.
Each transfer script sends a completion (handshake) file
as its last command. This handshake file contains the names and
sizes of all the files which have been transferred. The handshake file is
used at RCF for checking that the file transfers had no errors.
A gridFtpInProgress file is sent to the ACCRE nanoDstTransfering
area before the grid transfers begin. This file will block any starts of the
cautoRun7 script for the production of new nanoDSTs.
- After starting the three grid transfer scripts, the gridFTPNanoEon0(1).pl script waits
in 5-minute cycles until an eon0(1)UploadSuccess.txt file is
found.
- The eon0(1)UploadSuccess.txt file is sent using a gridFTP transfer by the
confirmAndEraseData59(58,63) script, which runs every 30 minutes as a cron job in the maguire
account on the rftpexp01 node. This script checks that the files arrived at RCF
correctly. As the Data59(58,63) suffix in the script name indicates, this is one of
the current weak points of the project: the output areas at RCF have to be
adjusted manually instead of automatically.
- After the UploadSuccess.txt file appears, the nanoDST files in the firebird newstore
areas are removed, and the /home/phenix/nanoDstTransfering/moveNanoToStore.csh script
is executed on the vmps02 node.
- The moveNanoToStore.csh script will move the files from the newstore to the store
area. After this move the gridFtpInProgress file is removed from the
/home/phenix/nanoDstTransfering area which permits new job submissions by
the cautoRun7 master script.
- The gridFTP copy could hang, leaving some of the files untransferred. In
that case one has to issue manual commands to recover the missing files. This would
mean killing the hanging globus job, if there is one. Then one has to
manually find out which files are missing at RCF and arrange to have
those copied to RCF by another gridFTP script. Once this is done,
the automatic checking process should resume and verify
that the files are all present.
- The transfers should proceed at a minimum rate of 15 MBytes/second over
many hours. It is possible that the destination disk at RCF may become
overloaded by other users. In that case, one would have to find another
disk destination area at RCF, and restart the transfers from the beginning
to that new area.
- I have written new gridFTP monitoring scripts which will send an
e-mail if the average gridFTP transfer rate drops below 15 MBytes/second.
These scripts were in use during the May 28-29 gridFTP transfer to
data63, and no alarm message was sent during those 11 hours of transfer.
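
The rate alarm described in the last item can be sketched as below. This is
only an illustration in Python, not the monitoring scripts themselves, which
are not reproduced in this note. The function names, the idea of sampling a
cumulative byte count, and the choice of 1 MByte = 10^6 bytes are all
assumptions; only the 15 MBytes/second threshold comes from the text.

```python
# Hypothetical sketch of an average-rate alarm; not the actual monitoring script.
MIN_RATE_MBYTES_PER_SEC = 15.0  # threshold quoted in the text


def average_rate_mbytes_per_sec(samples):
    """Average rate from (seconds, cumulative_bytes) samples, oldest first."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    # 1 MByte is taken as 10^6 bytes here (an assumption).
    return (b1 - b0) / (t1 - t0) / 1.0e6


def needs_alarm(samples, threshold=MIN_RATE_MBYTES_PER_SEC):
    """True when the average rate over the samples is below the threshold."""
    return average_rate_mbytes_per_sec(samples) < threshold
```

A caller would collect the samples from whatever byte counter is available
(for example the total size of the files already present at the destination)
and send the e-mail when needs_alarm returns True.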
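
The wait for the eon0(1)UploadSuccess.txt file in 5-minute cycles, described
above for the gridFTPNanoEon0(1).pl script, amounts to a simple polling loop.
The following Python sketch shows the idea under stated assumptions: the
function name wait_for_file is hypothetical, and the optional max_cycles limit
is an addition for illustration; the text does not say that the actual script
ever gives up.

```python
import os
import time


def wait_for_file(path, cycle_seconds=300, max_cycles=None, sleep=time.sleep):
    """Poll until path exists.

    Returns True once the file is found, or False if max_cycles sleeps
    have elapsed without it appearing (max_cycles=None waits forever).
    """
    cycles = 0
    while not os.path.exists(path):
        if max_cycles is not None and cycles >= max_cycles:
            return False
        sleep(cycle_seconds)  # 300 seconds = the 5-minute cycle in the text
        cycles += 1
    return True
```

The sleep argument is injectable only so that the loop can be exercised
without actually waiting five minutes.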
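
The handshake check at RCF, in which the names and sizes listed in the
handshake file are compared with the files that actually arrived, could look
roughly like the Python sketch below. The handshake file format assumed here
(one "name size-in-bytes" pair per line) and the function names are
assumptions for illustration; the real format and the
confirmAndEraseData59(58,63) script are not shown in this note.

```python
import os


def read_handshake(path):
    """Parse lines of the assumed form '<file name> <size in bytes>'."""
    expected = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank lines or lines without exactly two fields
            expected[fields[0]] = int(fields[1])
    return expected


def find_bad_files(handshake_path, data_dir):
    """Return sorted names that are missing or whose size differs."""
    bad = []
    for name, size in sorted(read_handshake(handshake_path).items()):
        full = os.path.join(data_dir, name)
        if not os.path.isfile(full) or os.path.getsize(full) != size:
            bad.append(name)
    return bad
```

An empty list from find_bad_files would correspond to the case where the
UploadSuccess.txt file can be sent back; a non-empty list names the files
that would need to be copied again.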