As an HMS Core, we get storage on O2 for free. We are not the biggest user, but we are in the top 10. As such, it is smart for us to be good citizens and keep our footprint as small as possible. Ways to reduce our footprint are described below.
I recommend avoiding things like Dropbox, Google Drive or Box unless the data is small. They aren’t really built for this purpose.
We have access to standby storage on O2 (/n/standby/cores/bcbio/). The standby directory is only accessible from the transfer node. For projects that are either too small to bother returning to the researcher, or that we think we may want to access again, we can tar.gz them and store them here. Leave a symlink in the original directory to allow easy restoration of the project.
Once you have restored the project, delete the standby file.
Once you are finished with the project, rearchive it.
Please don’t keep an archived copy of the directory in two places plus the expanded folder. Duplicated data is wasted space and makes John cry.
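The archive, restore, and clean-up cycle above can be sketched in Python. This is only a minimal illustration, not an established lab script: the function names `archive_project` and `restore_project` are hypothetical, and it assumes the symlink is placed next to the original project directory under the archive's file name. It would need to run on the transfer node, since that is the only place the standby directory is visible.

```python
import shutil
import tarfile
from pathlib import Path

# Standby location from the text above; only mounted on the transfer node.
STANDBY_ROOT = Path("/n/standby/cores/bcbio")


def archive_project(project_dir, standby_root=STANDBY_ROOT):
    """Tar.gz a project into standby, delete the original, leave a symlink."""
    project_dir = Path(project_dir).resolve()
    archive = Path(standby_root) / f"{project_dir.name}.tar.gz"
    if archive.exists():
        raise FileExistsError(f"{archive} already exists; refusing to overwrite")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(project_dir, arcname=project_dir.name)
    # Re-open the archive to confirm it is readable before deleting anything.
    with tarfile.open(archive, "r:gz") as tar:
        if not tar.getnames():
            raise RuntimeError(f"{archive} is empty")
    shutil.rmtree(project_dir)
    # Assumption: the symlink sits beside the old project directory.
    link = project_dir.parent / archive.name
    link.symlink_to(archive)
    return archive


def restore_project(link):
    """Extract the archive behind a symlink, then delete the link and the standby file."""
    link = Path(link)
    archive = link.resolve(strict=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(link.parent)
    link.unlink()
    archive.unlink()  # avoid keeping a duplicate copy in standby
    return link.parent / archive.name[: -len(".tar.gz")]
```

Calling `archive_project("/path/to/project")` and later `restore_project("/path/to/project.tar.gz")` would cover one round trip; rearchiving when you are finished is another call to `archive_project`. The paths here are placeholders.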
With the caveat that every project is different, here are some general guidelines to help your decision-making process.
1) Is the data “large” (>500GB)? As much as possible, we’d prefer to get rid of these ASAP.
2) Will you need to access the data and derived files again? If yes, tidy up any unnecessary files and archive.
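To answer the first question for a given project, a small helper can total the file sizes under its directory. The sketch below is an illustration only: the names `dir_size_bytes` and `is_large` are hypothetical, and it assumes "500GB" means decimal gigabytes. Symlinks are skipped, so links to archived projects are not counted.

```python
import os

# Assumption: the ">500GB" guideline is read as 500 * 10**9 bytes.
LARGE_BYTES = 500 * 10**9


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, ignoring symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def is_large(root, threshold=LARGE_BYTES):
    """Return True when the directory exceeds the size threshold."""
    return dir_size_bytes(root) > threshold
```

On a shared filesystem, walking a very large project this way can be slow, so it is best treated as a rough check.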
The following points can inform your decision about how likely it is that we will need to access the data again.
For sending data to clients and for downloading data with Globus, see the Globus section in [Data Management](admin/data_management.md).
Sample script for older projects.
Subject: Delete/return data? re: (list project) PIname_c018_20K_cells_Samples_9-28-17
Hi xxx –
I hope this email finds you well. We have data from your lab’s project. Do you have the data you need from this? Due to storage constraints, we must remove the data from our HMS O2 storage. We will delete the data if you are all set. The project is from 2019 and is labeled:
PI1/Contact1 - Test RNASeq of human brain HBC12345
If you do not have the data and want to retain the raw and derived data in whole or in part, please let us know so we can facilitate transferring the data back to you.
Is two weeks enough before we delete the data?
We look forward to hearing from you.
Thanks and best,
If they say yes, they’d like the data, start by asking the PI/postdoc for their Globus ID, or ask them to get one. Once you have it, move on to explaining how to use it. See the Globus section for more scripts and the suggested process.