pg-main from prod server added

2025-08-18 12:03:55 +02:00
parent add456f0e7
commit f66cd01b21
956 changed files with 934400 additions and 0 deletions
--- a/pgx-main/README.md
+++ b/pgx-main/README.md
@@ -0,0 +1,75 @@
+## Set up PGx server
+
+### Install Python 3.8 (incl. pandas), Docker, the AWS CLI, Java and samtools
+
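A quick way to confirm the tools are installed is to look each one up on `$PATH`. This is a minimal sketch, not part of the repository: `check_prereqs` is a hypothetical helper, and the command names used with it below (`python3`, `docker`, `aws`, `java`, `samtools`, `nextflow`) are assumptions about how each tool is invoked on your server.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every command that is not found on $PATH
# and return non-zero if at least one of them is missing.
check_prereqs() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `check_prereqs python3 docker aws java samtools` now (and again with `nextflow` once it is installed), and `python3 -c 'import pandas'` to check that pandas can be imported.
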
+### Clone this repository and the additional ones
+```bash
+git clone git@bitbucket.org:quantgene/pgx-engine-wrapper.git
+git clone git@bitbucket.org:quantgene/pgx-engine.git
+git clone https://github.com/SBIMB/StellarPGx.git pgx-main
+```
+
+### Prepare main directory
+```bash
+cp -r pgx-engine-wrapper/* pgx-main/
+mkdir pgx-main/pgx_results
+rm pgx-main/data/*
+```
+
+### Update the Chr definitions in main.nf
+In `main.nf`, change every chromosome name of the form `Chr<#>` to just `<#>` (e.g. `Chr1` becomes `1`). For cyp2d6, the relevant settings are on lines 226-230 of `main.nf`; after the change they should look like this:
+
+```groovy
+    if (params.gene=='cyp2d6') {
+        chrom = "22"
+        region_a1 = "22:42126000-42137500"
+        region_a2 = "042126000-042137500"
+        region_b1 = "22:42126300-42132400"
+        region_b2 = "042126300-042132400"
+        transcript = "ENST00000645361"
+        // ... remaining cyp2d6 settings unchanged
+    }
+```
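If you prefer not to edit the names by hand, a `sed` substitution can do the renaming. This is a minimal sketch, not part of the repository: `strip_chr_prefix` is a hypothetical helper, it assumes GNU sed (for `-E`, `-i` and `\b`), and it matches both `Chr` and `chr` followed by a chromosome number or X/Y/M. Compare the result against the `.bak` copy before running the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip a "Chr"/"chr" prefix from chromosome names,
# e.g. Chr1 -> 1 and "chr22:42126000" -> "22:42126000".
# Assumes GNU sed; the original file is kept as <file>.bak.
strip_chr_prefix() {
  local file="$1"
  # \b keeps words such as "chrom" untouched: "chr" must be followed by
  # digits or X/Y/M and then a word boundary.
  sed -E -i.bak 's/\b[Cc]hr([0-9]+|X|Y|M)\b/\1/g' "$file"
}
```

Usage: `strip_chr_prefix main.nf`, then `diff main.nf.bak main.nf` to review the changes.
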
+
+### Place the FASTA files in the pgx-main directory
+Copy the hg38 reference (`hg38.fa`) and its index (`hg38.fa.fai`) into `pgx-main/`. For example:
+```bash
+scp root@139.162.190.87:~/ajit/resources/hg38\* pgx-main/
+```
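If only `hg38.fa` is available, the `.fai` index can be rebuilt with `samtools faidx`. A minimal sketch, assuming `samtools` is on `$PATH`; `ensure_fasta_index` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if the reference FASTA is missing, and build
# its .fai index with samtools only when the index does not exist yet.
ensure_fasta_index() {
  local fasta="$1"
  if [ ! -s "$fasta" ]; then
    echo "missing reference: $fasta" >&2
    return 1
  fi
  if [ ! -s "$fasta.fai" ]; then
    samtools faidx "$fasta"
  fi
}
```

Usage: `ensure_fasta_index pgx-main/hg38.fa`.
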
+
+### Install Nextflow
+
+```bash
+curl -fsSL get.nextflow.io | bash
+```
+
+Move the `nextflow` launcher (installed in your current directory) to a directory in your `$PATH`, e.g. `/bin`:
+
+```bash
+mv nextflow /bin
+```
+
+The full Nextflow documentation is available at [nextflow.io](https://www.nextflow.io).
+
+
+### Docker: StellarPGx
+```bash
+docker pull twesigomwedavid/stellarpgx-dev:latest
+```
+
+### Docker: PGx Engine
+```bash
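+cd pgx-engine
+# Build the PGx Engine image and tag it "pgx"
+docker build -t pgx .
+# Create the pgx-api container (API published on port 5000), then stop it
+docker run -d -p 5000:5000 --name pgx-api pgx
+docker stop pgx-api
+cd ..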
+```
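The commands above leave the `pgx-api` container created but stopped. Should you need the PGx Engine API running (for example, to try it on port 5000), the following minimal sketch starts the existing container instead of creating a new one. `start_pgx_api` is a hypothetical helper; it relies only on the container name `pgx-api` used above and the standard `docker inspect` and `docker start` commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start the existing pgx-api container if it is stopped.
start_pgx_api() {
  local running
  # Ask Docker whether the container is running; fails if it does not exist.
  running=$(docker inspect -f '{{.State.Running}}' pgx-api 2>/dev/null) || {
    echo "container pgx-api not found; create it with docker run first" >&2
    return 1
  }
  if [ "$running" != "true" ]; then
    docker start pgx-api >/dev/null
  fi
}
```

Starting the stopped container keeps its name and port mapping, so there is no need to run `docker run` again.
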
+
+## Run the entire PGx pipeline
+
+Change directory: `cd pgx-main`
+
+The main script to run is `get_pgx_result.sh`.
+Its input can be either a text file or standard input; each line should contain the S3 path to a VCF file. The location of the corresponding BAM file is inferred.
+All output folders are written to `pgx_results/`. A diplotype/rsID overview is given in `pgx_diplotypes_rsids.tsv`.
+The `output.json` for each sample is stored in the same S3 location as the VCF, with the filename `<sample>.json`.
+The entire PGx output for a sample is uploaded to S3, in the same location as the BAM file, as `<sample>_pgx_result.tar.gz`.
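Before a run, it can help to check the input list and see where each sample's JSON is expected to appear. The sketch below is hypothetical (`validate_vcf_list` and `expected_json_path` are not part of the repository) and assumes that the sample name is the VCF file name without its `.vcf` or `.vcf.gz` extension; adjust that rule if `get_pgx_result.sh` derives it differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expected S3 location of <sample>.json for a VCF path.
# Assumption: sample name = VCF file name minus .vcf / .vcf.gz.
expected_json_path() {
  local vcf="$1" dir base
  dir="${vcf%/*}"
  base="${vcf##*/}"
  base="${base%.gz}"
  base="${base%.vcf}"
  printf '%s/%s.json\n' "$dir" "$base"
}

# Hypothetical helper: read VCF paths (one per line) from stdin, print the
# expected JSON path for each valid one, and return non-zero if any line
# is not an S3 VCF path. Blank lines are skipped.
validate_vcf_list() {
  local line bad=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    case "$line" in
      s3://*.vcf|s3://*.vcf.gz) expected_json_path "$line" ;;
      *) echo "not an S3 VCF path: $line" >&2; bad=1 ;;
    esac
  done
  return "$bad"
}
```

For example, with a hypothetical list file `vcfs.txt`: `validate_vcf_list < vcfs.txt && bash get_pgx_result.sh vcfs.txt`, assuming the script takes the list file as its first argument, or `bash get_pgx_result.sh < vcfs.txt` to pass it on standard input.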