recipes that save time
(Also read the new checklist on Dropbox: HBC Team Folder (1)/Consults/_checklists/Data Management Checklist of Bulk RNA.docx)
Adapt the Trello project name to work on the server (i.e. replace spaces with underscores, remove special characters and make it lowercase) and use that as the GitHub repo name.
If the project is not on Trello, use something specific so we can tell which project it is:
Make sure to include the hbc_ prefix if you can find it:
hbc_$technology_of_$pilastname_$intervention_on_$tissue_in_$organism_$hbccode
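The renaming rule above can be sketched in Python as follows. This is only a minimal sketch, not an existing HBC tool: the function name slugify_project_name is hypothetical, and it assumes that "special characters" means anything other than letters, digits and underscores (so hyphens are dropped too).

```python
import re


def slugify_project_name(name: str) -> str:
    """Turn a Trello project name into a server- and GitHub-friendly name."""
    slug = name.strip().lower()              # make lowercase
    slug = re.sub(r"\s+", "_", slug)         # replace whitespace with underscores
    slug = re.sub(r"[^a-z0-9_]", "", slug)   # assumption: keep only a-z, 0-9 and _
    slug = re.sub(r"_+", "_", slug)          # collapse repeated underscores
    return slug.strip("_")


print(slugify_project_name("RNA-seq of Mouse Liver (pilot)!"))
```

If hyphens should survive in your repo names, add "-" to the allowed character set in the second substitution.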
$pifirstname_$pilastname
Go inside the repo directory, and set up subfolders called:
#### data (for raw data)
#### meta (for extra, unformatted, sample metadata)
#### templates (bcbio config files)
#### docs (other information that you might want to keep near the data)
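Creating the four subfolders can be scripted. The following is a minimal sketch using only the Python standard library; the function name make_project_subfolders is hypothetical and the folder names are taken from the list above.

```python
from pathlib import Path

# Folder names from the list above.
SUBFOLDERS = ("data", "meta", "templates", "docs")


def make_project_subfolders(repo_dir: str) -> list:
    """Create the standard subfolders inside repo_dir and return their paths."""
    created = []
    for name in SUBFOLDERS:
        folder = Path(repo_dir) / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    return created
```

Calling make_project_subfolders(".") from inside the repo directory sets up all four folders in one go.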
## Notes on folders
### meta
### templates
Run bcbio_download_template rnaseq (for example) to get the template for your particular technology. The available templates include:
freebayes-variant.yaml
illumina-chipseq.yaml
illumina-rnaseq.yaml
indrop-singlecell.yaml
tumor-paired.yaml
gatk-variant.yaml
illumina-fastrnaseq.yaml
illumina-srnaseq.yaml
noalign-variant.yaml
So for example, bcbio_download_template gatk will only pull down the gatk-variant.yaml template, but bcbio_download_template ill will pull down every template that has illumina in its name.
upload:dir variable
At the end of the run, you will have a directory structure that looks something like this:
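To preview which templates a given search term would select, the matching can be mimicked in a few lines of Python. This is a sketch under a stated assumption: the source does not say how bcbio_download_template matches names, so plain substring matching is assumed here, and matching_templates is a hypothetical helper, not part of bcbio.

```python
# Template names as listed above.
TEMPLATES = [
    "freebayes-variant.yaml",
    "illumina-chipseq.yaml",
    "illumina-rnaseq.yaml",
    "indrop-singlecell.yaml",
    "tumor-paired.yaml",
    "gatk-variant.yaml",
    "illumina-fastrnaseq.yaml",
    "illumina-srnaseq.yaml",
    "noalign-variant.yaml",
]


def matching_templates(query: str, templates=TEMPLATES) -> list:
    """Return template names containing query (assumed substring match)."""
    return [name for name in templates if query in name]


print(matching_templates("gatk"))
print(matching_templates("ill"))
```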
├── Homo_sapiens.GRCh38.92.gtf
├── Homo_sapiens.GRCh38.92-tx2gene.tsv
├── Homo_sapiens.GRCh38.cdna.all.fa
├── indrop-rnaseq.yaml
├── metadata
│   ├── lane1_NoIndex_L001
│   ├── lane2_NoIndex_L002
├── sc-human
│   ├── config
│   │   ├── sscc-human.csv
│   │   ├── sc-human-template.yaml
│   │   ├── sc-human.yaml
│   │   └── sc-human.yaml.bak2018-07-12-14-38-13
│   └── final
│       ├── 2018-07-19_sc-human
│       │   ├── bcbio-nextgen-commands.log
│       │   ├── bcbio-nextgen.log
│       │   ├── bcb.rds
│       │   ├── data_versions.csv
│       │   ├── metadata.csv
│       │   ├── programs.txt
│       │   ├── project-summary.yaml
│       │   ├── tagcounts-dupes.mtx
│       │   ├── tagcounts-dupes.mtx.colnames
│       │   ├── tagcounts-dupes.mtx.rownames
│       │   ├── tagcounts.mtx
│       │   ├── tagcounts.mtx.colnames
│       │   └── tagcounts.mtx.rownames
│       ├── lane1-AGCTTTCT
│       │   ├── lane1-AGCTTTCT-barcodes-filtered.tsv
│       │   ├── lane1-AGCTTTCT-barcodes.tsv
│       │   ├── lane1-AGCTTTCT.mtx
│       │   ├── lane1-AGCTTTCT.mtx.colnames
│       │   ├── lane1-AGCTTTCT.mtx.rownames
│       │   └── lane1-AGCTTTCT-transcriptome.bam
│       ├── lane2-AAGAGCGT
│       │   ├── lane2-AAGAGCGT-barcodes-filtered.tsv
│       │   ├── lane2-AAGAGCGT-barcodes.tsv
│       │   ├── lane2-AAGAGCGT.mtx
│       │   ├── lane2-AAGAGCGT.mtx.colnames
│       │   ├── lane2-AAGAGCGT.mtx.rownames
│       │   └── lane2-AAGAGCGT-transcriptome.bam
│       └── mtx.tar.gz
└── sc-human.csv
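In the structure above, the project-level output sits in the dated subfolder of final (here 2018-07-19_sc-human), next to the per-sample folders. A small sketch for picking out such dated folders is shown below; the function name dated_run_folders is hypothetical, and it assumes the folder name always starts with a YYYY-MM-DD date followed by an underscore.

```python
import re
from pathlib import Path

# Assumption: dated folders are named YYYY-MM-DD_projectname.
DATED = re.compile(r"^\d{4}-\d{2}-\d{2}_.+")


def dated_run_folders(final_dir: str) -> list:
    """Return the subfolders of final_dir whose names start with a date."""
    return sorted(
        p for p in Path(final_dir).iterdir()
        if p.is_dir() and DATED.match(p.name)
    )
```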
Edit both to reflect the properties of your consult, making sure to have the final upload dir point to an appropriate folder
details:
  - analysis: RNA-seq
    genome_build: BDGP6
    algorithm:
      aligner: star
      quality_format: Standard
      trim_reads: False
      adapters: [truseq, polya]
      strandedness: unstranded
upload:
  dir: /n/data1/cores/bcbio/PIs/mel_feany/RNAseq_of_different_genotypes_in_Drosophila_brain/bcbio/final
#!/bin/sh
#SBATCH -p short
#SBATCH -J feany
#SBATCH -o run.o
#SBATCH -e run.e
#SBATCH -t 0-12:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jnhutchinson@gmail.com
export PATH=/n/app/bcbio/tools/bin:$PATH
/n/app/bcbio/dev/anaconda/bin/bcbio_nextgen.py ../config/bcbio_ensembl.yaml -n 72 -t ipython -s slurm -q short -r t=0-6000:00 --tag feany
It’s a good idea to get bcbio to use a work directory that is on scratch. One way to do this is to run bcbio’s templating script and then replace the work folder with a symlink to a folder on scratch. This variant keeps your work directory on the scratch drive automatically, so that our storage space doesn’t blow up.
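Swapping the work folder for a symlink could be sketched as follows. This is a minimal sketch, not an HBC or bcbio utility: link_work_to_scratch is a hypothetical name, both paths are supplied by you, and it assumes the work folder is called work and sits directly inside the project's bcbio folder.

```python
import shutil
from pathlib import Path


def link_work_to_scratch(project_dir: str, scratch_dir: str) -> Path:
    """Replace project_dir/work with a symlink pointing at scratch_dir."""
    work = Path(project_dir) / "work"
    target = Path(scratch_dir)
    target.mkdir(parents=True, exist_ok=True)
    target = target.resolve()  # use an absolute path for the link target

    if work.is_symlink():
        work.unlink()  # drop an old link before re-creating it
    elif work.exists():
        # Move anything already written to work/ over to scratch first.
        for item in work.iterdir():
            shutil.move(str(item), str(target / item.name))
        work.rmdir()

    work.symlink_to(target, target_is_directory=True)
    return work
```

After this, bcbio can be launched from the work folder as usual while the files physically live on scratch.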
Another more convoluted route is just to run everything on scratch and copy over the final files:
date_projectname) (this folder is the dated subfolder of the final subfolder, which in turn sits inside the folder named after the bcbio metadata file you made above; for a clearer picture, look at the directory structure above)
# Template for mouse RNA-seq using Illumina prepared samples
---
details:
  - analysis: RNA-seq
    genome_build: mm10
    algorithm:
      aligner: star
      quality_format: standard
      strandedness: unstranded
      tools_on: bcbiornaseq
      bcbiornaseq:
        organism: mus musculus
        interesting_groups: [day, genotype]
upload:
  dir: ../bcbio_final
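For the route where everything runs on scratch and only the final files are copied over, the copy step might look like the sketch below. The function name copy_final_from_scratch and both paths are hypothetical placeholders; the sketch assumes Python 3.8 or newer because of the dirs_exist_ok argument.

```python
import shutil
from pathlib import Path


def copy_final_from_scratch(scratch_final: str, project_final: str) -> Path:
    """Copy the final folder from scratch into the project directory."""
    dest = Path(project_final)
    # dirs_exist_ok (Python 3.8+) lets the copy be repeated into an existing folder.
    shutil.copytree(scratch_final, dest, dirs_exist_ok=True)
    return dest
```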
$pifirstname_$pilastname)
You can share via Dropbox by either:
1) Using the r2dropSmart::sync function (install from GitHub: lpantano/r2dropSmart)
Do the next two steps once per project directory:
library(rdrop2)
drop_auth()
Once completed, close your browser window and return to R to complete authentication. The credentials are automatically cached for future use (you can prevent this if you’d like; see the rdrop2 docs).
If you wish to save the tokens, for local/remote use:
token <- drop_auth()
saveRDS(token, file = "token.rds")
To sync your results
library(r2dropSmart)
token <- readRDS("~/.droptoken.rds")
d = drop_acc(dtoken = token)
dropdir = "HBC Team Folder (1)/Consults/firstname_lastname/$technology_of_$intervention_on_$tissue_in_$organism_$hbccode"
sync(".", remote = dropdir, token = token, pattern = ".html")
sync(".", remote = dropdir, token = token, blacklist = c("Rproj", "rda", ...))
2) By hand, by zipping up results and copying them to the appropriate folder on Dropbox. It’s a good idea to always include the code you used for the results, as well as any linked results [FUTURE: NEED DISCUSSION]
We don’t use Dropbox to share bcbio results, BAM files, fastqs, or bcbio objects. If people need those, we point them to the server or have them come with a hard drive.
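When sharing by hand, the zipping step can be scripted so that large sequencing files are left out. The sketch below is an illustration only: zip_results is a hypothetical helper, and the list of excluded extensions is an assumption that covers BAM and fastq files by name but does not detect bcbio objects or other bcbio results.

```python
import zipfile
from pathlib import Path

# Assumption: files we do not share via Dropbox, recognised by extension only.
EXCLUDED_SUFFIXES = (".bam", ".fastq", ".fastq.gz", ".fq", ".fq.gz")


def zip_results(results_dir: str, zip_path: str) -> list:
    """Zip results_dir into zip_path, skipping excluded file types."""
    root = Path(results_dir)
    target = Path(zip_path).resolve()
    # Collect files first so the archive never tries to include itself.
    files = [
        p for p in sorted(root.rglob("*"))
        if p.is_file()
        and p.resolve() != target
        and not p.name.lower().endswith(EXCLUDED_SUFFIXES)
    ]
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname)
            added.append(arcname)
    return added
```

The returned list shows exactly what went into the archive, which makes it easy to confirm that the code and reports are included before copying the zip to Dropbox.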
Analysis
config
metadata
docs
templates
README
reports (code goes to GIT REPO, DROPBOX IF YOU WANT)
RMD (go to DROPBOX)
HTML (go to DROPBOX)
Data (R objects that you don’t want to sync to any place)
Results (go to DROPBOX) [FUTURE: NEED DISCUSSION]