Set up PGx server
Install Python 3.8 (incl. pandas), Docker, the AWS CLI, Java, and samtools
Clone this repository along with the additional repos
git clone git@bitbucket.org:quantgene/pgx-engine-wrapper.git
git clone git@bitbucket.org:quantgene/pgx-engine.git
git clone https://github.com/SBIMB/StellarPGx.git pgx-main
Prepare main directory
cp -r pgx-engine-wrapper/* pgx-main/
mkdir pgx-main/pgx_results
rm pgx-main/data/*
Update the Chr definitions in main.nf
Change every Chr<#> contig name in main.nf to plain <#> (e.g. Chr1 becomes 1). For cyp2d6 the important lines are 226-230 of main.nf; after the change they should read:
if (params.gene=='cyp2d6') {
chrom = "22"
region_a1 = "22:42126000-42137500"
region_a2 = "042126000-042137500"
region_b1 = "22:42126300-42132400"
region_b2 = "042126300-042132400"
transcript = "ENST00000645361"
}
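The renaming can be done mechanically with sed. This is a sketch, assuming GNU sed and that the contig names appear as Chr<#>/chr<#> literals; it is demonstrated on a scratch file first so the effect can be inspected before touching the real main.nf:

```shell
# Demonstrate the Chr<#> -> <#> substitution on a scratch snippet first
printf 'chrom = "chr22"\nregion_a1 = "chr22:42126000-42137500"\n' > /tmp/chr_demo.nf

# Strip a leading Chr/chr from contig names (digits, X, Y, M/MT);
# "chrom" is untouched because 'o' is not in the character class
sed -E -i.bak 's/[Cc]hr([0-9XYMT]+)/\1/g' /tmp/chr_demo.nf

cat /tmp/chr_demo.nf
# chrom = "22"
# region_a1 = "22:42126000-42137500"
```

If the output looks right, run the same sed command against main.nf (the -i.bak flag keeps a backup) and diff against the .bak file to confirm only the intended lines changed.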
Place the FASTA reference files (hg38.fa, hg38.fa.fai) into the pgx-main directory. For example:
scp root@139.162.190.87:~/ajit/resources/hg38\* pgx-main/
Install Nextflow
curl -fsSL get.nextflow.io | bash
Move the nextflow launcher (installed in your current directory) to a directory in your $PATH, e.g. /bin:
mv nextflow /bin
(The full Nextflow documentation can be found at https://www.nextflow.io/docs/latest/)
Docker: StellarPGx
docker pull twesigomwedavid/stellarpgx-dev:latest
Docker: PGx Engine
cd pgx-engine
docker build -t pgx .
docker run -d -p 5000:5000 --name pgx-api pgx
docker stop pgx-api
cd ..
Run Entire PGx Pipeline
Change directory: cd pgx-main
The main script to run is: get_pgx_result.sh.
The input can be either a text file or standard input; each line should contain an S3 path to a VCF file. The location of the corresponding BAM file is inferred.
All output folders are placed in pgx_results/. A diplotype/rsID overview is written to pgx_diplotypes_rsids.tsv.
The output JSON for each sample is stored in the same S3 location as the VCF, with the filename <sample>.json.
The entire PGx output for a sample is uploaded to S3 in the same location as the BAM file, with the name <sample>_pgx_result.tar.gz.
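A minimal example run might look like the following. The bucket and key names are placeholders, and the argument form assumes the script accepts the list file as its first argument (per the text-file/stdin note above):

```shell
# Hypothetical S3 paths -- substitute your own bucket and prefixes
cat > samples.txt <<'EOF'
s3://my-bucket/run01/sampleA/sampleA.vcf
s3://my-bucket/run01/sampleB/sampleB.vcf
EOF

# Either pass the list file as an argument...
#   ./get_pgx_result.sh samples.txt
# ...or pipe it via standard input:
#   ./get_pgx_result.sh < samples.txt
```

Afterwards, check pgx_results/ locally (and pgx_diplotypes_rsids.tsv for the diplotype/rsID overview), or fetch the per-sample <sample>_pgx_result.tar.gz archive from S3 with aws s3 cp.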