changeset 0:67c179acafdd draft default tip

"planemo upload for repository https://github.com/usegalaxy-au/galaxy-local-tools commit a510e97ebd604a5e30b1f16e5031f62074f23e86-dirty"
author galaxy-australia
date Thu, 03 Mar 2022 02:54:20 +0000
parents
children
files README.rst alphafold.html alphafold.xml gen_extra_outputs.py static/img/alphafold-visualization.png static/img/alphafold_runtime_graph.png test-data/test1.fasta validate_fasta.py
diffstat 8 files changed, 1405 insertions(+), 0 deletions(-) [+]
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst	Thu Mar 03 02:54:20 2022 +0000
@@ -0,0 +1,164 @@
+Alphafold compute setup
+=======================
+
+Overview
+--------
+
+Alphafold requires a customised compute environment to run. The machine
+needs a GPU and access to a 2.2 TB reference data store.
+
+This document is designed to provide details on the compute environment
+required for Alphafold operation, and the Galaxy job destination
+settings to run the wrapper.
+
+For full details on Alphafold requirements, see
+https://github.com/deepmind/alphafold.
+
+HARDWARE
+~~~~~~~~
+
+The recommended machine specification is:
+
+-  12 cores
+-  80 GB RAM
+-  2.5 TB storage
+-  A fast Nvidia GPU
+
+At minimum, the Nvidia GPU must have 8 GB of memory. It also requires
+**unified memory** to be switched on. Unified memory is usually enabled
+by default, but some HPC systems will turn it off so the GPU can be
+shared between multiple jobs concurrently.
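+
+One way to check the GPU memory requirement is to query ``nvidia-smi``.
+The Python sketch below is illustrative only and is not part of the
+wrapper. The function name and the threshold are assumptions: reported
+totals are usually a little below the nominal size, so the threshold is
+set slightly under 8 GiB.
+
+::
+
+   import subprocess
+
+   # Assumed threshold (MiB), slightly below a nominal 8 GB card
+   MIN_GPU_MEMORY_MIB = 7500
+
+   def gpu_memory_mib():
+       """Return the total memory (MiB) of each visible GPU."""
+       out = subprocess.run(
+           ["nvidia-smi", "--query-gpu=memory.total",
+            "--format=csv,noheader,nounits"],
+           check=True, capture_output=True, text=True,
+       ).stdout
+       return [int(line) for line in out.splitlines() if line.strip()]
+
+   if __name__ == "__main__":
+       for index, mib in enumerate(gpu_memory_mib()):
+           status = "ok" if mib >= MIN_GPU_MEMORY_MIB else "too small"
+           print(f"GPU {index}: {mib} MiB ({status})")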
+
+ENVIRONMENT
+~~~~~~~~~~~
+
+This wrapper runs Alphafold in a Singularity container. The following
+software is needed:
+
+-  `Singularity <https://sylabs.io/guides/3.0/user-guide/installation.html>`_
+-  `NVIDIA Container
+   Toolkit <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html>`_
+
+As Alphafold uses an Nvidia GPU, the NVIDIA Container Toolkit is needed.
+This makes the GPU available inside the running singularity container.
+
+To check that everything has been set up correctly, run the following:
+
+::
+
+   singularity run --nv docker://nvidia/cuda:11.0-base nvidia-smi
+
+If you can see something similar to this output (details depend on your
+GPU), it has been set up correctly.
+
+::
+
+   +-----------------------------------------------------------------------------+
+   | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
+   |-------------------------------+----------------------+----------------------+
+   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+   |                               |                      |               MIG M. |
+   |===============================+======================+======================|
+   |   0  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
+   | N/A   49C    P0    28W /  70W |      0MiB / 15109MiB |      0%      Default |
+   |                               |                      |                  N/A |
+   +-------------------------------+----------------------+----------------------+
+
+   +-----------------------------------------------------------------------------+
+   | Processes:                                                                  |
+   |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+   |        ID   ID                                                   Usage      |
+   |=============================================================================|
+   |  No running processes found                                                 |
+   +-----------------------------------------------------------------------------+
+
+REFERENCE DATA
+~~~~~~~~~~~~~~
+
+Alphafold needs reference data to run. The wrapper expects this data to
+be present at ``/data/alphafold_databases``. To download it, run the
+following commands from the tool directory.
+
+::
+
+   # make folders if needed
+   mkdir -p /data/alphafold_databases
+
+   # download ref data
+   bash scripts/download_all_data.sh /data/alphafold_databases
+
+This will install the reference data to ``/data/alphafold_databases``.
+To check this has worked, ensure the final folder structure is as
+follows:
+
+::
+
+   data/alphafold_databases
+   ├── bfd
+   │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata
+   │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex
+   │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata
+   │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex
+   │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata
+   │   └── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex
+   ├── mgnify
+   │   └── mgy_clusters_2018_12.fa
+   ├── params
+   │   ├── LICENSE
+   │   ├── params_model_1.npz
+   │   ├── params_model_1_ptm.npz
+   │   ├── params_model_2.npz
+   │   ├── params_model_2_ptm.npz
+   │   ├── params_model_3.npz
+   │   ├── params_model_3_ptm.npz
+   │   ├── params_model_4.npz
+   │   ├── params_model_4_ptm.npz
+   │   ├── params_model_5.npz
+   │   └── params_model_5_ptm.npz
+   ├── pdb70
+   │   ├── md5sum
+   │   ├── pdb70_a3m.ffdata
+   │   ├── pdb70_a3m.ffindex
+   │   ├── pdb70_clu.tsv
+   │   ├── pdb70_cs219.ffdata
+   │   ├── pdb70_cs219.ffindex
+   │   ├── pdb70_hhm.ffdata
+   │   ├── pdb70_hhm.ffindex
+   │   └── pdb_filter.dat
+   ├── pdb_mmcif
+   │   ├── mmcif_files
+   │   └── obsolete.dat
+   ├── uniclust30
+   │   └── uniclust30_2018_08
+   └── uniref90
+       └── uniref90.fasta
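+
+The layout can also be checked programmatically. The Python sketch
+below is not part of the wrapper: it only tests that the top-level
+entries from the tree above exist, and ``missing_entries`` is a
+hypothetical helper name.
+
+::
+
+   from pathlib import Path
+
+   # Relative paths taken from the tree above
+   EXPECTED = [
+       "bfd",
+       "mgnify/mgy_clusters_2018_12.fa",
+       "params",
+       "pdb70",
+       "pdb_mmcif/mmcif_files",
+       "pdb_mmcif/obsolete.dat",
+       "uniclust30/uniclust30_2018_08",
+       "uniref90/uniref90.fasta",
+   ]
+
+   def missing_entries(root="/data/alphafold_databases"):
+       """Return the expected paths that do not exist under root."""
+       base = Path(root)
+       return [p for p in EXPECTED if not (base / p).exists()]
+
+   if __name__ == "__main__":
+       missing = missing_entries()
+       print("all present" if not missing else f"missing: {missing}")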
+
+JOB DESTINATION
+~~~~~~~~~~~~~~~
+
+Alphafold needs a custom Galaxy job destination to run. The destination
+must have Singularity enabled, and the extra Singularity parameters
+shown below must be set.
+
+Specify the job runner, for example a local runner:
+
+::
+
+   <plugin id="alphafold_runner" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
+
+Customise the job destination with required singularity settings. The
+settings below are mandatory, but you may include other settings as
+needed.
+
+::
+
+   <destination id="alphafold" runner="alphafold_runner">
+       <param id="dependency_resolution">'none'</param>
+       <param id="singularity_enabled">true</param>
+       <param id="singularity_run_extra_arguments">--nv</param>
+       <param id="singularity_volumes">"$job_directory:ro,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw,/data/alphafold_databases:/data:ro"</param>
+   </destination>
+
+Closing
+~~~~~~~
+
+If you are experiencing technical issues, feel free to write to
+help@genome.edu.au. We may be able to provide advice on setting up
+Alphafold on your compute environment.
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/alphafold.html	Thu Mar 03 02:54:20 2022 +0000
@@ -0,0 +1,656 @@
+<!DOCTYPE html>
+<html lang="en" dir="ltr">
+
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+
+    <title> Alphafold structure prediction </title>
+
+    <link rel="preconnect" href="https://fonts.googleapis.com">
+    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+    <link href="https://fonts.googleapis.com/css2?family=Ubuntu:wght@300;400;500;700&display=swap" rel="stylesheet">
+    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
+    <script src="https://cdnjs.cloudflare.com/ajax/libs/chroma-js/2.1.0/chroma.min.js" integrity="sha512-yocoLferfPbcwpCMr8v/B0AB4SWpJlouBwgE0D3ZHaiP1nuu5djZclFEIj9znuqghaZ3tdCMRrreLoM8km+jIQ==" crossorigin="anonymous"></script>
+
+    <style type="text/css">
+      * {
+        margin: 0;
+        padding: 0;
+      }
+      html, body {
+        width: 100%;
+        font-size: 1rem;
+      }
+      body {
+        font-family: 'Ubuntu', sans-serif;
+      }
+      canvas {
+        background-color: white;
+      }
+      h1, h2, h3, h4, h5, h6 {
+        color: dodgerblue;
+        text-align: center;
+        font-weight: lighter;
+      }
+      h1 {
+        margin: 2rem;
+        font-size: 3rem;
+      }
+      h2 {
+        font-size: 2rem;
+        margin-top: 1rem;
+        margin-bottom: .5rem;
+      }
+      button.btn {
+        color: #ccc;
+        margin: 1rem;
+        padding: .5rem;
+        font-size: 1rem;
+        min-width: 4rem;
+        border: none;
+        border-radius: .5rem;
+        background-color: grey;
+        transition-duration: 0.25s;
+        cursor: pointer;
+      }
+      button.btn.selected {
+        color: #eee;
+        background-color: dodgerblue;
+      }
+      button.btn.green {
+        color: #eee;
+        background-color: #10941f;
+      }
+      button.btn:focus {
+        outline: none;
+        color: inherit;
+      }
+      button.btn:hover {
+        color: white;
+        box-shadow: 0 0 10px dodgerblue;
+      }
+      button.btn.green:hover {
+        color: white;
+        box-shadow: 0 0 10px limegreen;
+      }
+      .main {
+        min-height: 90vh;
+        position: relative;
+      }
+      .flex {
+        display: flex;
+        justify-content: center;
+        align-items: center;
+        padding: 1rem;
+      }
+      .col {
+        flex-direction: column;
+        flex-grow: 0;
+      }
+      .controls {
+        padding-bottom: 10vh;
+      }
+      .box {
+        padding: .5rem 1rem;
+        margin: .5rem auto;
+        width: fit-content;
+        border-radius: 1rem;
+      }
+      .mono {
+        font-family: monospace;
+        color: #555;
+        background-color: #ddd;
+        padding: .25rem;
+        border-radius: .25rem;
+      }
+      .space-1 {
+        line-height: 1.2;
+      }
+      .space-2 {
+        line-height: 1.5;
+      }
+      .relative {
+        position: relative;
+      }
+      .legend {
+        max-width: 350px;
+      }
+      .legend .scale {
+        display: flex;
+        flex-direction: column;
+        align-items: center;
+      }
+      .legend .color {
+        width: 150px;
+        height: 30px;
+        justify-content: space-between;
+        background: linear-gradient(
+          90deg,
+          rgba(255,55,0,1)   0%,
+          rgba(216,224,6,1)  33%,
+          rgba(34,213,238,1) 66%,
+          rgba(3,30,148,1)   100%
+          );
+      }
+      .legend .ticks {
+        margin-top: -1rem;
+        width: 180px;
+        justify-content: space-between;
+      }
+      #ngl-root-parent {
+        width: 40vw;
+        height: 30vw;
+        margin: auto;
+        position: relative;
+      }
+      #ngl-root {
+        width: 40vw;
+        height: 30vw;
+        border-radius: 15px;
+        border: 1px solid grey;
+      }
+      #ngl-nothing {
+        position: absolute;
+        top: 0;
+        left: 0;
+        display: none;
+        text-align: center;
+        width: 40vw;
+        height: 30vw;
+        padding-top: 12vw;
+      }
+      #ngl-loading {
+        position: absolute;
+        top: 0;
+        left: 0;
+        display: flex;
+        justify-content: center;
+        align-items: center;
+        width: 800px;
+        height: 600px;
+        width: 40vw;
+        height: 30vw;
+      }
+      #ngl-loading svg {
+        width: 30%;
+        height: 30%;
+        width: 10vw;
+        height: 10vw;
+      }
+
+      /* Responsive */
+      @media (max-width: 1400px) {
+        :root {
+          font-size: 10pt;
+        }
+        button.btn {
+          margin: .5rem;
+          padding: .25rem;
+        }
+        .box {
+          padding: .5rem;
+          margin: .5rem auto;
+        }
+        .legend {
+          max-width: 200px;
+        }
+        .help-text {
+          font-size: 0.8rem;
+        }
+        .mono {
+          padding: .25rem .5rem;
+        }
+      }
+      @media (max-width: 1000px) {
+        :root {
+          font-size: 8pt;
+        }
+      }
+      @media (max-width: 800px) {
+        :root {
+          font-size: 6pt;
+        }
+      }
+    </style>
+
+    <script src="https://cdn.rawgit.com/arose/ngl/v2.0.0-dev.37/dist/ngl.js"></script>
+  </head>
+
+
+  <body>
+    <h1> Alphafold structure prediction </h1>
+
+    <div class="main flex">
+      <div class="col relative">
+        <div id="ngl-root-parent">
+
+          <div id="ngl-root"></div>
+
+          <div id="ngl-nothing">
+            Select a representation to display
+          </div>
+
+          <div id="ngl-loading">
+            <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" style="margin: auto; background: none; display: block; shape-rendering: auto;" width="200px" height="200px" viewBox="0 0 100 100" preserveAspectRatio="xMidYMid">
+              <g transform="rotate(0 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.9166666666666666s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(30 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.8333333333333334s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(60 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.75s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(90 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.6666666666666666s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(120 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.5833333333333334s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(150 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.5s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(180 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.4166666666666667s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(210 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.3333333333333333s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(240 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.25s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(270 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.16666666666666666s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(300 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="-0.08333333333333333s" repeatCount="indefinite"></animate>
+                </rect>
+              </g><g transform="rotate(330 50 50)">
+                <rect x="47" y="24" rx="3" ry="6" width="6" height="12" fill="#88879e">
+                  <animate attributeName="opacity" values="1;0" keyTimes="0;1" dur="1s" begin="0s" repeatCount="indefinite"></animate>
+                </rect>
+              </g>
+            </svg>
+          </div>
+        </div>
+
+        <div class="flex">
+          <div class="box space-1">
+            <p>
+              <span class="mono">Scroll up/down</span>
+              to zoom in and out
+            </p>
+            <p>
+              <span class="mono">Click + drag</span>
+              to rotate the structure
+            </p>
+            <p>
+              <span class="mono">CTRL + click + drag</span>
+              to move the structure
+            </p>
+            <p>
+              <span class="mono">Click</span>
+              an atom to bring it into focus
+            </p>
+          </div>
+
+          <div class="box legend">
+            <div class="scale">
+              <div class="color"></div>
+              <div class="flex ticks">
+                <div>&lt;50</div>
+                <div>70</div>
+                <div>90+</div>
+              </div>
+            </div>
+
+            <div>
+              <p class="text-center">
+                <small>
+                Alphafold produces a
+                <a href="https://alphafold.ebi.ac.uk/faq#faq-5" target="_blank">
+                  per-residue confidence score (pLDDT)
+                </a>
+                between 0 and 100. Some regions below 50 pLDDT may be
+                unstructured in isolation.
+              </small>
+              </p>
+            </div>
+          </div>
+        </div>
+      </div>
+
+      <div class="flex col controls">
+        <div class="box text-center">
+          <h3> Select model </h3>
+          <p>The top five structures predicted by Alphafold</p>
+          <div>
+            <button class="btn selected" id="btn-ranked_0" onclick="setModel(0);">
+              Model 1
+            </button>
+
+            <button class="btn" id="btn-ranked_1" onclick="setModel(1);">
+              Model 2
+            </button>
+
+            <button class="btn" id="btn-ranked_2" onclick="setModel(2);">
+              Model 3
+            </button>
+
+            <button class="btn" id="btn-ranked_3" onclick="setModel(3);">
+              Model 4
+            </button>
+
+            <button class="btn" id="btn-ranked_4" onclick="setModel(4);">
+              Model 5
+            </button>
+          </div>
+        </div>
+
+        <div class="box text-center">
+          <h3> Toggle representations </h3>
+          <div>
+            <button class="btn selected" id="btn-cartoon" onclick="toggleModelRepresentation('cartoon');">
+              Cartoon
+            </button>
+
+            <button class="btn" id="btn-ball-stick" onclick="toggleModelRepresentation('ball+stick');">
+              Ball + stick
+            </button>
+
+            <button class="btn" id="btn-surface" onclick="toggleModelRepresentation('surface');">
+              Surface
+            </button>
+
+            <button class="btn" id="btn-backbone" onclick="toggleModelRepresentation('backbone');">
+              Backbone
+            </button>
+          </div>
+        </div>
+
+        <div class="box text-center">
+          <h3> Actions </h3>
+          <div>
+            <button class="btn selected" id="btn-toggle-spin" onclick="toggleSpin();">
+              Toggle spin
+            </button>
+
+            <button class="btn" id="btn-toggle-dark" onclick="toggleDark();">
+              Dark mode
+            </button>
+          </div>
+        </div>
+
+        <div class="box text-center">
+          <h3> Download </h3>
+          <div>
+            <button class="btn green" onclick="downloadPng();">
+              Snapshot
+            </button>
+
+            <button class="btn green" onclick="downloadPdb();">
+              PDB
+            </button>
+          </div>
+        </div>
+      </div>
+    </div>
+  </body>
+
+
+  <script type="text/javascript">
+
+    // Render NGLviewer for PDB files
+
+    // State management is implemented in vanilla JS rather than a framework
+    // such as Vue. It's a fairly simple use case, so a global 'state' object
+    // works fine without complicating things too much.
+
+
+    // Define a custom color scheme to represent model confidence consistently
+    // across different representations
+    // ------------------------------------------------------------------------
+    const colorScale = chroma.scale([
+      'red', 'yellow', 'green', 'cyan', 'blue'
+    ]).mode('lab').domain([0, 0.9]);
+
+    const confidenceScheme = NGL.ColormakerRegistry.addScheme(function (params) {
+      this.atomColor = function (atom) {
+        // The B-factor column holds the model confidence (pLDDT)
+        const c = atom.bfactor;
+        const BREAK_RED = 40;   // Below this is just plain red
+
+        if (c < BREAK_RED) {
+          return 0xFF0000;
+        }
+        const p = (c - BREAK_RED) / (100 - BREAK_RED);
+        // Convert '#rrggbb' to the integer color value without using eval
+        return parseInt(colorScale(p).hex().slice(1), 16);
+      };
+    });
+
+    // NGL color schemes https://nglviewer.org/ngl/api/manual/usage/coloring.html
+    const COLORSCHEME = confidenceScheme;  //'bfactor'
+
+    const MODELS = [
+      'ranked_0.pdb',
+      'ranked_1.pdb',
+      'ranked_2.pdb',
+      'ranked_3.pdb',
+      'ranked_4.pdb',
+    ]
+
+    const REPRESENTATIONS = [
+      'cartoon',
+      'ball+stick',
+      'surface',
+      'backbone',
+    ]
+
+    const DEFAULT_REPRESENTATION = REPRESENTATIONS[0];
+    const MAX_CLICK_INTERVAL_MS = 500;  // For debouncing model clicks
+
+    let stage;
+    let nonceSetModel;
+
+    let state = {
+      model: 0,
+      modelObject: null,
+      representations: {},
+      colorScheme: 'bfactor',
+      darkMode: false,
+      loading: 1,
+      spin: true,
+    }
+
+    const uri = (i) => MODELS[i];
+    // Switch to this function to return sample model URI (local dev)
+    // const uri = (i) => `https://raw.githubusercontent.com/neoformit/alphafold-galaxy/main/data/${MODELS[i]}`;
+
+    document.addEventListener("DOMContentLoaded", async function () {
+      // Can set debug for development if NGL is being... funny
+      // NGL.setDebug(true)
+
+      // Create NGL Stage object
+      stage = new NGL.Stage("ngl-root", { backgroundColor: 'white' });
+
+      // Handle window resizing
+      window.addEventListener("resize",  () => stage.handleResize());
+
+      loadModel();
+      while (true) {
+        if (!state.loading) {
+          // Reload page if NGL failed to display. Weird occasional bug.
+          const canvas = document.querySelector('#ngl-root canvas');
+          canvas.height < 50 && window.location.reload();
+          break
+        }
+        await new Promise(resolve => setTimeout(resolve, 500));
+      }
+    });
+
+    // Models ------------------------------------------------------------------
+
+    const setModel = (ix) => {
+      state.model = ix;
+      stage.removeComponent(state.modelObject);
+      setLoading(1);
+
+      // Debounce rapid model clicking with a nonce
+      nonceSetModel = new Object();
+      const localNonce = nonceSetModel;
+      setTimeout( () => {
+        if (localNonce === nonceSetModel) {
+          // The user has stopped clicking, hurray...
+          loadModel().then(updateButtons);
+        }
+      }, MAX_CLICK_INTERVAL_MS);
+    }
+
+    const loadModel = () => {
+      let reps = Object.keys(state.representations);
+      if (reps.length) {
+        state.representations = {};
+      } else {
+        reps = [DEFAULT_REPRESENTATION];
+      }
+
+      // Load PDB entry
+      return stage.loadFile(uri(state.model)).then( (o) => {
+        state.modelObject = o;
+        reps.forEach( (r) => addModelRepresentation(r) );
+        stage.setSpin(state.spin);
+        o.autoView();
+        setLoading(0);
+      })
+    }
+
+    // Representations ---------------------------------------------------------
+
+    const toggleModelRepresentation = (rep) => {
+      rep in state.representations ?
+        removeModelRepresentation(rep)
+        : addModelRepresentation(rep)
+    }
+
+    const addModelRepresentation = (rep) => {
+      state.representations[rep] =
+        state.modelObject.addRepresentation(rep, {colorScheme: COLORSCHEME});
+      updateButtons();
+    }
+
+    const removeModelRepresentation = (rep) => {
+      const o = state.representations[rep];
+      state.modelObject.removeRepresentation(o);
+      delete state.representations[rep];
+      updateButtons();
+    }
+
+    const clearModelRepresentations = () => {
+      state.modelObject && state.modelObject.removeAllRepresentations();
+      state.representations = {};
+    }
+
+    // Actions -----------------------------------------------------------------
+
+    const toggleDark = () => {
+      state.darkMode = !state.darkMode;
+      stage.setParameters({
+        backgroundColor: state.darkMode ? 'black' : 'white',
+      });
+      const btn = document.querySelector('#btn-toggle-dark');
+      btn && btn.classList.toggle('selected');
+    }
+
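+    // The parameter must not be named 'state', or it shadows the global
+    // state object and state.loading is never updated.
+    const setLoading = (isLoading) => {
+      document.getElementById('ngl-loading')
+        .style.display = isLoading ? 'flex' : 'none';
+      state.loading = isLoading;
+    }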
+
+    const toggleSpin = () => {
+      stage.toggleSpin();
+      const btn = document.querySelector('#btn-toggle-spin');
+      btn && btn.classList.toggle('selected');
+      state.spin = !state.spin;
+    }
+
+    const downloadPng = () => {
+      const params = {
+        factor: 3,
+        antialias: true,
+      }
+      stage.makeImage(params).then( (blob) => {
+        const name = MODELS[state.model].replace('.pdb', '.png');
+        const url = URL.createObjectURL(blob);
+        makeDownload(url, name);
+      })
+    }
+
+    const downloadPdb = () => {
+      const url = uri(state.model);
+      const name = `alphafold_${MODELS[state.model]}`;
+      makeDownload(url, name);
+    }
+
+    const makeDownload = (url, name) => {
+      // Will not work with cross-origin urls (i.e. during development)
+      console.log(`Creating file download for ${name}, href ${url}`);
+      const saveLink = document.createElement('a');
+      saveLink.href = url;
+      saveLink.download = name;
+      document.body.appendChild(saveLink);
+      saveLink.dispatchEvent(
+        new MouseEvent('click', {
+          bubbles: true,
+          cancelable: true,
+          view: window
+        })
+      );
+      document.body.removeChild(saveLink);
+    }
+
+    const updateButtons = () => {
+      MODELS.forEach( (name, i) => {
+        const id = `#btn-${name.replace('.pdb', '')}`;
+        const btn = document.querySelector(id);
+        if (!btn) return
+        i == state.model ?
+          btn.classList.add('selected')
+          : btn.classList.remove('selected');
+      })
+
+      REPRESENTATIONS.forEach( (name) => {
+        const id = `#btn-${name}`.replace('+', '-');
+        const btn = document.querySelector(id);
+        if (!btn) return
+        if (name in state.representations) {
+          btn.classList.add('selected')
+        } else {
+          btn.classList.remove('selected');
+        }
+      });
+
+      // Show the placeholder message if no representations are selected
+      document.querySelector('#ngl-nothing').style.display =
+        Object.keys(state.representations).length ?
+        'none'
+        : 'block';
+    }
+
+  </script>
+
+</html>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/alphafold.xml	Thu Mar 03 02:54:20 2022 +0000
@@ -0,0 +1,250 @@
+<tool id="alphafold" name="alphafold" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="20.01">
+    <description>Alphafold v2.0: AI-guided 3D structure prediction of proteins</description>
+    <macros>
+      <token name="@TOOL_VERSION@">2.0.0</token>
+      <token name="@VERSION_SUFFIX@">0</token>
+    </macros>
+    <edam_topics>
+      <edam_topic>topic_0082</edam_topic>
+    </edam_topics>
+    <edam_operations>
+      <edam_operation>operation_0474</edam_operation>
+    </edam_operations>
+    <xrefs>
+      <xref type="bio.tools">alphafold_2.0</xref>
+    </xrefs>
+    <requirements>
+        <container type="docker">neoformit/alphafold:latest</container>
+    </requirements>
+    <command detect_errors="exit_code"><![CDATA[
+
+## $ALPHAFOLD_DB variable should point to the location of the AlphaFold
+## databases - defaults to /data
+
+## fasta setup ----------------------------
+#if $fasta_or_text.input_mode == 'history':
+    cp '$fasta_or_text.fasta_file' input.fasta &&
+
+#elif $fasta_or_text.input_mode == 'textbox':
+    echo '$fasta_or_text.fasta_text' > input.fasta &&
+#end if
+
+python3 '$__tool_directory__/validate_fasta.py' input.fasta &&
+
+## env vars -------------------------------
+export TF_FORCE_UNIFIED_MEMORY=1 &&
+export XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 &&
+export DATE=`date +"%Y-%m-%d"` &&
+
+## run alphafold  -------------------------
+python /app/alphafold/run_alphafold.py
+--fasta_paths alphafold.fasta
+--output_dir output
+--data_dir \${ALPHAFOLD_DB:-/data}
+--uniref90_database_path \${ALPHAFOLD_DB:-/data}/uniref90/uniref90.fasta
+--mgnify_database_path \${ALPHAFOLD_DB:-/data}/mgnify/mgy_clusters_2018_12.fa
+--pdb70_database_path \${ALPHAFOLD_DB:-/data}/pdb70/pdb70
+--template_mmcif_dir \${ALPHAFOLD_DB:-/data}/pdb_mmcif/mmcif_files
+--obsolete_pdbs_path \${ALPHAFOLD_DB:-/data}/pdb_mmcif/obsolete.dat
+--max_template_date=\$DATE
+--bfd_database_path \${ALPHAFOLD_DB:-/data}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
+--uniclust30_database_path \${ALPHAFOLD_DB:-/data}/uniclust30/uniclust30_2018_08/uniclust30_2018_08
+--use_gpu_relax=True
+&&
+
+## Uncomment for "dummy run" - skip alphafold run and read output from test-data
+## cp -r '$__tool_directory__/output' . &&
+
+## Generate additional outputs ------------
+python3 '$__tool_directory__/gen_extra_outputs.py' output/alphafold $output_plddts &&
+
+## HTML output
+mkdir -p '${ html.files_path }' &&
+cp '$__tool_directory__/alphafold.html' '${html}' &&
+cp output/alphafold/ranked_*.pdb '${html.files_path}'
+
+    ]]></command>
+    <inputs>
+        <conditional name="fasta_or_text">
+            <param name="input_mode" type="select" label="Fasta Input" help="Single protein sequence to fold. Input can be fasta file from history, or text. Provide only 1 sequence per job.">
+                <option value="history">Use fasta from history</option>
+                <option value="textbox">Paste sequence into textbox</option>
+            </param>
+            <when value="history">
+                <param name="fasta_file" type="data" format="fasta" label="Fasta file from history" help="Select single fasta protein sequence from your history. If you wish to fold multiple proteins, submit an individual job for each protein." />
+            </when>
+            <when value="textbox">
+                <param name="fasta_text" type="text" area="true" value="" label="Paste sequence" help="Paste single protein sequence into the textbox. If you wish to fold multiple proteins, submit individual jobs for each protein." />
+            </when>
+        </conditional>
+        <param name="output_plddts" type="boolean" checked="false" label="Output per-residue confidence scores" truevalue="--plddts" falsevalue="" help="Alphafold produces a pLDDT score between 0-100 for each residue in the folded models. High scores represent high confidence in placement for the residue, while low scoring residues have lower confidence. Sections of low confidence often occur in disordered regions. " />
+    </inputs>
+    <outputs>
+        <data name="model5" format="pdb" from_work_dir="output/alphafold/ranked_4.pdb" label="${tool.name} on ${on_string}: Model 5"/>
+        <data name="model4" format="pdb" from_work_dir="output/alphafold/ranked_3.pdb" label="${tool.name} on ${on_string}: Model 4"/>
+        <data name="model3" format="pdb" from_work_dir="output/alphafold/ranked_2.pdb" label="${tool.name} on ${on_string}: Model 3"/>
+        <data name="model2" format="pdb" from_work_dir="output/alphafold/ranked_1.pdb" label="${tool.name} on ${on_string}: Model 2"/>
+        <data name="model1" format="pdb" from_work_dir="output/alphafold/ranked_0.pdb" label="${tool.name} on ${on_string}: Model 1"/>
+        <data name="confidence_scores" format="tsv" from_work_dir="output/alphafold/model_confidence_scores.tsv" label="${tool.name} on ${on_string}: Model confidence scores"/>
+        <data name="plddts" format="tsv" from_work_dir="output/alphafold/plddts.tsv" label="${tool.name} on ${on_string}: Per-residue confidence scores (plddts)">
+            <filter>(output_plddts)</filter>
+        </data>
+        <data name="html" format="html" label="${tool.name} on ${on_string}: Visualization" />
+    </outputs>
+    <tests>
+        <test expect_num_outputs="8">
+            <conditional name="fasta_or_text">
+                <param name="input_mode" value="history"/>
+                <param name="fasta_file" value="test1.fasta"/>
+            </conditional>
+            <param name="output_plddts" value="true"/>
+            <output name="plddts">
+                <assert_contents>
+                    <has_n_columns n="2"/>
+                    <has_n_lines n="6"/>
+                    <has_size value="2900" delta="300"/>
+                </assert_contents>
+            </output>
+            <output name="confidence_scores">
+                <assert_contents>
+                    <has_n_columns n="2"/>
+                    <has_n_lines n="6"/>
+                    <has_size value="70" delta="50"/>
+                </assert_contents>
+            </output>
+            <output name="model1">
+                <assert_contents>
+                    <has_n_columns n="12"/>
+                    <has_n_lines n="1517"/>
+                    <has_size value="123000" delta="10000"/>
+                </assert_contents>
+            </output>
+            <output name="model2">
+                <assert_contents>
+                    <has_n_columns n="12"/>
+                    <has_n_lines n="1517"/>
+                    <has_size value="123000" delta="10000"/>
+                </assert_contents>
+            </output>
+            <output name="model3">
+                <assert_contents>
+                    <has_n_columns n="12"/>
+                    <has_n_lines n="1517"/>
+                    <has_size value="123000" delta="10000"/>
+                </assert_contents>
+            </output>
+            <output name="model4">
+                <assert_contents>
+                    <has_n_columns n="12"/>
+                    <has_n_lines n="1517"/>
+                    <has_size value="123000" delta="10000"/>
+                </assert_contents>
+            </output>
+            <output name="model5">
+                <assert_contents>
+                    <has_n_columns n="12"/>
+                    <has_n_lines n="1517"/>
+                    <has_size value="123000" delta="10000"/>
+                </assert_contents>
+            </output>
+        </test>
+    </tests>
+    <help><![CDATA[
+
+    .. class:: infomark
+
+    **What it does**
+
+    | AlphaFold v2.0: AI-guided 3D structure prediction of proteins
+    |
+
+    *What is AlphaFold?*
+
+    | AlphaFold is a program which uses neural networks to predict the tertiary (3D) structure of proteins. AlphaFold accepts an amino acid sequence (in FASTA format), then 'folds' that sequence into a 3D model.
+    | NOTE: AlphaFold has a number of versions - this tool uses AlphaFold v2.0.
+    |
+
+    *What makes AlphaFold different?*
+
+    | The ability to use computers to predict 3D protein structures with high accuracy is desirable because it removes the time-consuming and costly process of determining structures experimentally.
+    | In-silico protein folding has been an active field of research for decades, but existing tools ran more slowly and with less reliability than AlphaFold.
+    | AlphaFold represents a leap forward by regularly predicting structures to atomic-level accuracy, even when no similar structures are known.
+    |
+
+    *Downstream analysis*
+
+    | Obtaining a protein fold is the first step in many analyses.
+    | The 3D models created by AlphaFold can be used in downstream analysis, including the following:
+    |
+
+    - Inspecting protein features
+        3D viewers (PyMOL, Chimera, NGL, Blender) can be used to inspect active sites, regulatory domains and binding sites.
+    - Molecular docking
+        3D structures can be used to predict the binding affinity of different compounds.
+        This is especially useful in screening drug candidates.
+    - Protein-protein interactions
+        Proteins associate in many biological processes, including intracellular signalling pathways and protein complex formation.
+        To predict these interactions, other programs may ingest 3D models predicted by AlphaFold. Proprietary software includes `GOLD <https://www.ccdc.cam.ac.uk/solutions/csd-discovery/components/gold/>`_ and `SeeSAR <https://www.biosolveit.de/SeeSAR>`_, but many `free and open-source options <https://en.wikipedia.org/wiki/List_of_protein-ligand_docking_software>`_ are available, such as `AutoDock <https://autodock.scripps.edu/>`_ and `SwissDock <http://www.swissdock.ch/>`_.
+
+    *Expected run times*
+
+    .. image:: https://github.com/usegalaxy-au/galaxy-local-tools/blob/1a8d3e8daa7ccc5a345ca377697735ab95ed0666/tools/alphafold/static/img/alphafold_runtime_graph.png?raw=true
+        :height: 520
+        :alt: Run time graph
+
+    |
+    | In general, we observe a quadratic relationship between sequence length and time to fold.
+    | Once your job begins, a sequence of 50aa will take approximately 1hr to complete, while a sequence of 2000aa will take about 18hrs.
+    |
+
+    **Input**
+
+    *Amino acid sequence*
+
+    | AlphaFold accepts a **single amino acid sequence** in FASTA format.
+    | You can choose to input either a file from your Galaxy history or paste a sequence into a text box.
+    | Please paste only a single sequence - we can only process a single sequence per job.
+    | Multiple sequences will return an error.
+    |
+
+    **Outputs**
+
+    *Visualization*
+
+    | An interactive 3D graphic of the best predicted molecular structures.
+    | This output can be opened in Galaxy to give a visual impression of the results, with different structural representations to choose from.
+    | Open the "Visualization" history output by clicking on the "view data" icon:
+    |
+
+    .. image:: https://github.com/usegalaxy-au/galaxy-local-tools/blob/1a8d3e8daa7ccc5a345ca377697735ab95ed0666/tools/alphafold/static/img/alphafold-visualization.png?raw=true
+        :height: 520
+        :alt: Result visualization
+
+    |
+
+    *PDB files*
+
+    | Five PDB (Protein Data Bank) files will be created, one for each of the five best-ranking models predicted by AlphaFold.
+    | These files describe the molecular structures and can be used for downstream analysis, e.g. *in silico* molecular docking.
+    |
+
+    *Model confidence scores*
+
+    | This output is a file which lists the confidence score for each model (based on `pLDDTs <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3799472/>`_), which may be useful for downstream analysis. If "Output per-residue confidence scores" is selected, an additional file is produced which lists the pLDDT score of every residue in each model.
+    | Per-residue pLDDT scores are also included in the B-factor column of the default PDB output.
+    |
+
+    **External Resources**
+
+    We recommend checking out the
+    `Alphafold Protein Structure Database <https://alphafold.ebi.ac.uk/>`_,
+    which contains predicted structures for thousands of human proteins. See also:
+
+    - `Google DeepMind's article on AlphaFold <https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology>`_
+    - `AlphaFold source code on GitHub <https://github.com/deepmind/alphafold>`_
+
+    ]]></help>
+    <citations>
+        <citation type="doi">10.1038/s41586-021-03819-2</citation>
+    </citations>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/gen_extra_outputs.py	Thu Mar 03 02:54:20 2022 +0000
@@ -0,0 +1,155 @@
+
+
+import json
+import pickle
+import argparse
+from typing import Any, Dict, List
+
+
+class Settings:
+    """parses then keeps track of program settings"""
+    def __init__(self):
+        self.workdir = None
+        self.output_confidence_scores = True
+        self.output_residue_scores = False
+
+    def parse_settings(self) -> None:
+        parser = argparse.ArgumentParser()
+        parser.add_argument(
+            "workdir",
+            help="alphafold output directory",
+            type=str
+        )
+        parser.add_argument(
+            "-p",
+            "--plddts",
+            help="output per-residue confidence scores (pLDDTs)",
+            action="store_true"
+        )
+        args = parser.parse_args()
+        self.workdir = args.workdir.rstrip('/')
+        self.output_residue_scores = args.plddts
+
+
+class ExecutionContext:
+    """uses program settings to get paths to files etc"""
+    def __init__(self, settings: Settings):
+        self.settings = settings
+
+    @property
+    def ranking_debug(self) -> str:
+        return f'{self.settings.workdir}/ranking_debug.json'
+
+    @property
+    def model_pkls(self) -> List[str]:
+        return [f'{self.settings.workdir}/result_model_{i}.pkl'
+                for i in range(1, 6)]
+
+    @property
+    def model_conf_score_output(self) -> str:
+        return f'{self.settings.workdir}/model_confidence_scores.tsv'
+
+    @property
+    def plddt_output(self) -> str:
+        return f'{self.settings.workdir}/plddts.tsv'
+
+
+class FileLoader:
+    """loads file data for use by other classes"""
+    def __init__(self, context: ExecutionContext):
+        self.context = context
+
+    def get_model_mapping(self) -> Dict[str, int]:
+        data = self.load_ranking_debug()
+        return {name: rank + 1
+                for (rank, name) in enumerate(data['order'])}
+
+    def get_conf_scores(self) -> Dict[str, float]:
+        data = self.load_ranking_debug()
+        return {name: float(f'{score:.2f}') 
+                for name, score in data['plddts'].items()}
+
+    def load_ranking_debug(self) -> Dict[str, Any]:
+        with open(self.context.ranking_debug, 'r') as fp:
+            return json.load(fp)
+
+    def get_model_plddts(self) -> Dict[str, List[float]]:
+        plddts: Dict[str, List[float]] = {}
+        model_pkls = self.context.model_pkls
+        for i in range(5):
+            pklfile = model_pkls[i]
+            with open(pklfile, 'rb') as fp:
+                data = pickle.load(fp)
+                plddts[f'model_{i+1}'] = [float(f'{x:.2f}') for x in data['plddt']]
+        return plddts
+
+
+class OutputGenerator:
+    """generates the output data we are interested in creating"""
+    def __init__(self, loader: FileLoader):
+        self.loader = loader
+
+    def gen_conf_scores(self) -> Dict[str, float]:
+        mapping = self.loader.get_model_mapping()
+        scores = self.loader.get_conf_scores()
+        ranked = list(scores.items())
+        ranked.sort(key=lambda x: x[1], reverse=True)
+        return {f'model_{mapping[name]}': score 
+                for name, score in ranked}
+
+    def gen_residue_scores(self) -> Dict[str, List[float]]:
+        mapping = self.loader.get_model_mapping()
+        model_plddts = self.loader.get_model_plddts()
+        return {f'model_{mapping[name]}': plddts 
+                for name, plddts in model_plddts.items()}
+
+
+class OutputWriter:
+    """writes generated data to files"""
+    def __init__(self, context: ExecutionContext):
+        self.context = context
+
+    def write_conf_scores(self, data: Dict[str, float]) -> None:
+        outfile = self.context.model_conf_score_output
+        with open(outfile, 'w') as fp:
+            for model, score in data.items():
+                fp.write(f'{model}\t{score}\n')
+    
+    def write_residue_scores(self, data: Dict[str, List[float]]) -> None:
+        outfile = self.context.plddt_output
+        model_plddts = list(data.items())
+        model_plddts.sort()
+
+        with open(outfile, 'w') as fp:
+            for model, plddts in model_plddts:
+                plddt_str_list = [str(x) for x in plddts]
+                plddt_str = ','.join(plddt_str_list)
+                fp.write(f'{model}\t{plddt_str}\n')
+
+
+def main():
+    # setup
+    settings = Settings()
+    settings.parse_settings()
+    context = ExecutionContext(settings)
+    loader = FileLoader(context)
+    
+    # generate & write outputs
+    generator = OutputGenerator(loader)
+    writer = OutputWriter(context)
+    
+    # confidence scores
+    conf_scores = generator.gen_conf_scores()
+    writer.write_conf_scores(conf_scores)
+    
+    # per-residue plddts
+    if settings.output_residue_scores:
+        residue_scores = generator.gen_residue_scores()
+        writer.write_residue_scores(residue_scores)
+
+    
+if __name__ == '__main__':
+    main()
+
+
+
Binary file static/img/alphafold-visualization.png has changed
Binary file static/img/alphafold_runtime_graph.png has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/test1.fasta	Thu Mar 03 02:54:20 2022 +0000
@@ -0,0 +1,3 @@
+>UPI0015CE2E61 status=active
+DGKILADKVSDKLEQTATLTGLDYGRFTRSMLLSQGQFAAFLNAKPSDRAELLEELTGTE
+IYGQISAMVYEQHKAARHALEKFEAQAAGIVLLTEAQQ
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/validate_fasta.py	Thu Mar 03 02:54:20 2022 +0000
@@ -0,0 +1,177 @@
+"""Validate input FASTA sequence."""
+
+import re
+import argparse
+from typing import List
+
+
+class Fasta:
+    def __init__(self, header_str: str, seq_str: str):
+        self.header = header_str
+        self.aa_seq = seq_str
+
+
+class FastaLoader:
+    def __init__(self, fasta_path: str):
+        """Initialize from FASTA file."""
+        self.fastas = []
+        self.load(fasta_path)
+        print("Loaded FASTA sequences:")
+        for f in self.fastas:
+            print(f.header)
+            print(f.aa_seq)
+
+    def load(self, fasta_path: str):
+        """Load bare or FASTA formatted sequence."""
+        with open(fasta_path, 'r') as f:
+            self.content = f.read()
+
+        if "__cn__" in self.content:
+            # Pasted content with escaped characters
+            self.newline = '__cn__'
+            self.caret = '__gt__'
+        else:
+            # Uploaded file with normal content
+            self.newline = '\n'
+            self.caret = '>'
+
+        self.lines = self.content.split(self.newline)
+        header, sequence = self.interpret_first_line()
+
+        i = 1  # line 0 was already consumed by interpret_first_line()
+        while i < len(self.lines):
+            line = self.lines[i]
+            if line.startswith(self.caret):
+                self.update_fastas(header, sequence)
+                header = '>' + self.strip_header(line)
+                sequence = ''
+            else:
+                sequence += line.strip('\n ')
+            i += 1
+
+        # after reading whole file, header & sequence buffers might be full
+        self.update_fastas(header, sequence)
+
+    def interpret_first_line(self):
+        line = self.lines[0]
+        if line.startswith(self.caret):
+            header = '>' + self.strip_header(line)
+            return header, ''
+        else:
+            return '', line.strip('\n ')
+
+    def strip_header(self, line):
+        """Strip characters escaped with underscores from pasted text."""
+        return re.sub(r'\_\_.{2}\_\_', '', line).strip('>')
+
+    def update_fastas(self, header: str, sequence: str):
+        # if we have a sequence
+        if sequence:
+            # create generic header if not exists
+            if not header:
+                fasta_count = len(self.fastas)
+                header = f'>sequence_{fasta_count}'
+
+            # Create new Fasta
+            self.fastas.append(Fasta(header, sequence))
+
+
+class FastaValidator:
+    def __init__(self, fasta_list: List[Fasta]):
+        self.fasta_list = fasta_list
+        self.min_length = 30
+        self.max_length = 2000
+        self.iupac_characters = {
+            'A', 'B', 'C', 'D', 'E', 'F', 'G',
+            'H', 'I', 'K', 'L', 'M', 'N', 'P',
+            'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
+            'Y', 'Z', '-'
+        }
+
+    def validate(self):
+        """performs fasta validation"""
+        self.validate_num_seqs()
+        self.validate_length()
+        self.validate_alphabet()
+        # not checking for 'X' residues at the moment.
+        # alphafold can throw an error if it doesn't like it.
+        # self.validate_x()
+
+    def validate_num_seqs(self) -> None:
+        if len(self.fasta_list) > 1:
+            raise Exception(f'Error encountered validating fasta: More than 1 sequence detected ({len(self.fasta_list)}). Please use single fasta sequence as input')
+        elif len(self.fasta_list) == 0:
+            raise Exception('Error encountered validating fasta: input file has no fasta sequences')
+
+    def validate_length(self):
+        """Confirms whether sequence length is valid. """
+        fasta = self.fasta_list[0]
+        if len(fasta.aa_seq) < self.min_length:
+            raise Exception(f'Error encountered validating fasta: Sequence too short ({len(fasta.aa_seq)}aa). Must be >= {self.min_length}aa')
+        if len(fasta.aa_seq) > self.max_length:
+            raise Exception(f'Error encountered validating fasta: Sequence too long ({len(fasta.aa_seq)}aa). Must be <= {self.max_length}aa')
+
+    def validate_alphabet(self):
+        """
+        Confirms whether the sequence conforms to IUPAC codes.
+        If not, reports the offending character and its position.
+        """
+        fasta = self.fasta_list[0]
+        for i, char in enumerate(fasta.aa_seq.upper()):
+            if char not in self.iupac_characters:
+                raise Exception(f'Error encountered validating fasta: Invalid amino acid found at pos {i}: "{char}"')
+
+    def validate_x(self):
+        """Checks if any residues are X. TODO check whether alphafold accepts X residues."""
+        fasta = self.fasta_list[0]
+        for i, char in enumerate(fasta.aa_seq.upper()):
+            if char == 'X':
+                raise Exception(f'Error encountered validating fasta: Unsupported aa code "X" found at pos {i}')
+
+
+class FastaWriter:
+    def __init__(self) -> None:
+        self.outfile = 'alphafold.fasta'
+        self.formatted_line_len = 60
+
+    def write(self, fasta: Fasta):
+        with open(self.outfile, 'w') as fp:
+            header = fasta.header
+            seq = self.format_sequence(fasta.aa_seq)
+            fp.write(header + '\n')
+            fp.write(seq + '\n')
+
+    def format_sequence(self, aa_seq: str):
+        formatted_seq = ''
+        for i in range(0, len(aa_seq), self.formatted_line_len):
+            formatted_seq += aa_seq[i: i + self.formatted_line_len] + '\n'
+        return formatted_seq.rstrip('\n')  # write() appends the final newline
+
+
+def main():
+    # load fasta file
+    args = parse_args()
+    fas = FastaLoader(args.input_fasta)
+
+    # validate
+    fv = FastaValidator(fas.fastas)
+    fv.validate()
+
+    # write cleaned version
+    fw = FastaWriter()
+    fw.write(fas.fastas[0])
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "input_fasta",
+        help="input fasta file",
+        type=str
+    )
+    return parser.parse_args()
+
+
+
+if __name__ == '__main__':
+    main()