knowledgebase

recipes that save time

View the Project on GitHub hbc/knowledgebase

Data management on O2

As an HMS Core, we get storage on O2 for free. We are not the biggest user, but we are in the top 10. It is therefore smart for us to be good citizens and keep our footprint as low as possible. Ways to reduce our footprint include:

Reduce space used by active consults

Get the data out of our main storage area

Return data from completed analyses to the researcher

I recommend avoiding things like Dropbox, Google Drive or Box unless the data is small. They aren’t really built for this purpose.

Archive the data

We have access to standby storage on O2 (/n/standby/cores/bcbio/). The standby directory is only accessible from the transfer node. For projects that are either too small to bother returning to the researcher, or that we think we may want to access again, we can tar.gz them and store them here. Leave a symlink in the original directory to allow easy restoration of the project.
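The archiving step can be sketched in Python as below. This is a minimal sketch, not an established team script: the function name `archive_project`, the choice to name the symlink after the tarball, and the decision to leave the original files in place (so you can verify the archive before deleting anything by hand) are all assumptions. Only the standby path comes from this page, and the code has to run on the transfer node to see it.

```python
import tarfile
from pathlib import Path


def archive_project(project_dir, standby_root):
    """Write <project>.tar.gz into standby_root and symlink to it from project_dir.

    Hypothetical helper: it does NOT delete the expanded files, so the
    archive can be checked before the originals are removed manually.
    """
    project_dir = Path(project_dir).resolve()
    standby_root = Path(standby_root).resolve()
    archive_path = standby_root / f"{project_dir.name}.tar.gz"

    # Refuse to overwrite an existing archive (avoids silent duplicates).
    if archive_path.exists():
        raise FileExistsError(f"{archive_path} already exists")

    # Store paths relative to the project name so the tarball unpacks cleanly.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(project_dir, arcname=project_dir.name)

    # Leave a symlink in the original directory (link name is an assumption).
    link_path = project_dir / archive_path.name
    link_path.symlink_to(archive_path)
    return archive_path


# Example call (the project path is a placeholder):
# archive_project("/path/to/project", "/n/standby/cores/bcbio/")
```

The symlink is created only after the tarball is closed, so the link itself is never packed into the archive.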

Once you have restored the project, delete the standby file.

Once you are finished with the project, rearchive it.

Please don’t keep an archived copy of the directory in two places plus the expanded folder. Duplicated data is wasted space and makes John cry.
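Restoring a project and then removing the standby copy, so that only one copy of the data exists, might look like the following. This is a hedged sketch: `restore_project` is a hypothetical helper, and it assumes the symlink points straight at a `.tar.gz` in standby storage and that the code runs on the transfer node.

```python
import tarfile
from pathlib import Path


def restore_project(link_path, dest_dir):
    """Unpack the archive behind link_path into dest_dir, then delete
    the standby file and the symlink so no duplicate copy remains."""
    link_path = Path(link_path)
    if not link_path.is_symlink():
        raise ValueError(f"{link_path} is not a symlink to an archive")

    # strict=True raises FileNotFoundError if the archive is missing.
    archive_path = link_path.resolve(strict=True)
    dest_dir = Path(dest_dir)

    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)

    # Only reached if extraction succeeded: remove the standby file and link.
    archive_path.unlink()
    link_path.unlink()
    return dest_dir
```

Because the deletions come after extraction, a failed unpack leaves the standby archive untouched.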

How to decide what to do with the data

With the caveat that every project is different, here are some general guidelines to guide your decision making.

1) Is the data “large” (>500GB)? As much as possible, we’d prefer to get rid of these ASAP.

2) Will you need to access the data and derived files again? If yes, tidy up any unnecessary files and archive.
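The two questions above can be turned into a rough triage helper. This is an illustrative sketch only: the function names are hypothetical, treating 500GB as 500 × 10^9 bytes is an assumption, and so is checking size before reuse when a project is both large and likely to be needed again. Every project is different, so treat the output as a prompt rather than a rule.

```python
import os

# Guideline 1 threshold; decimal gigabytes is an assumption.
LARGE_BYTES = 500 * 10**9


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def suggest_action(size_bytes, need_again):
    """Map the two guideline questions to a suggested next step."""
    if size_bytes > LARGE_BYTES:
        return "large: get it off our storage ASAP"
    if need_again:
        return "tidy up unnecessary files and archive"
    return "no guideline applies: decide case by case"


# Example call (the project path is a placeholder):
# suggest_action(dir_size_bytes("/path/to/project"), need_again=True)
```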

The following points can inform your decision about how likely we are to need the data again:

Globus howto

For sending data to clients and for downloading data with Globus, see the Globus section in [Data Management](admin/data_management.md).

Sample script for older projects.

Subject: Delete/return data? re: (list project) PIname_c018_20K_cells_Samples_9-28-17

Hi xxx –

I hope this email finds you well. We have data from your lab’s project. Do you have the data you need from this? Due to storage constraints, we must remove the data from our HMS O2 storage. We will delete the data if you are all set. The project is from 2019 and is labeled:

PI1/Contact1 - Test RNASeq of human brain HBC12345

If you do not have the data and want to retain the raw and derived data in whole or in part, please let us know so we can facilitate transferring the data back to you.

Is two weeks enough before we delete the data?

We look forward to hearing from you.

Thanks and best,

If they say yes, they’d like the data, start by asking the PI/postdoc for their Globus ID, or asking them to get one. Once you have it, move on to explaining how to use it. See Globus for more scripts and a suggested process.