diff --git a/.gitignore b/.gitignore
index e5bf75993..a6b25eba1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -21,4 +21,6 @@ benchmark-*.json
datasets/
abnormal.txt
gpu-rank-map.txt
-.specstory
\ No newline at end of file
+.specstory
+!datasets/
+!datasets/**
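Since `datasets/` is ignored a few lines above, the directory itself must be re-included (`!datasets/`) before `!datasets/**` can take effect on the files inside it. A minimal, illustrative sanity check after applying this change (the path is just an example of a tracked sample file):

```bash
# git check-ignore exits non-zero and prints nothing when the path is NOT ignored,
# so the echo confirms the re-include rules behave as intended
git check-ignore -v datasets/gpt/vocab.json || echo "not ignored, will be tracked"
```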
diff --git a/CODEOWNERS b/CODEOWNERS
index dc8df7ee0..56c10a827 100644
--- a/CODEOWNERS
+++ b/CODEOWNERS
@@ -47,6 +47,6 @@ megatron/core/inference/
.gitlab/
.github/
.gitlab-ci.yml
-Dockerfile.ci.lts
-Dockerfile.ci.dev
+docker/Dockerfile.ci.lts
+docker/Dockerfile.ci.dev
tests/
diff --git a/README.md b/README.md
index 21286d3bd..9de1e56a5 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,18 @@
+

+
+MegatronApp: Toolchain Built around Megatron-LM for Distributed Training
+
+
+ An extension for performance tuning, slow-node detection, and training-process visualization.
+
-**MegatronApp: Toolchain built around Megatron-LM for Distributed Training**
-
+
+ 🍳 Cookbook |
+ 📄 Technical Report
+
-An extension for performance tuning, slow-node detection, and training-process visualization.
+
### 📌 Pinned
-* [2025.10.17] 🔥🔥🔥 We provide user-friendly [docker guidance](./DockerUsage.md) for all four features of MegatronApp. Please try it out!
+* [2025.10.17] 🔥🔥🔥 We provide user-friendly [docker guidance](./docker/DockerUsage.md) for all four features of MegatronApp. Please try it out!
* [2025.07.27] 📢📢📢 The MegatronApp technical report has been released! See [here](https://arxiv.org/pdf/2507.19845).
* [2025.07.04] 🔥🔥🔥 MegatronApp is officially launched at WAIC 2025! Our code is available [here](https://github.com/OpenSQZ/MegatronApp). Come and try it out!
@@ -167,7 +176,7 @@ docker run --runtime --nvidia --gpus all -it --rm \
To install additional required packages, run
```bash
-pip install -r requirements.txt
+pip install -r requirements/requirements.txt
```
## MegaScan
@@ -197,7 +206,7 @@ Alternatively, you can use elastic training. See [torchrun](https://docs.pytorch
2. After training, you will find separated trace files in the current directory. The trace files are named as `benchmark-data-{}-pipeline-{}-tensor-{}.json`, where `{}` is the rank number. Now we should aggregate the trace files into a single trace file:
```bash
-python scripts/aggregate.py --b trace_output --output benchmark.json
+python tools/aggregate.py --b trace_output --output benchmark.json
```
3. You can visualize the trace file using Chrome Tracing (or Perfetto UI). Open the trace file in Chrome Tracing by navigating to `chrome://tracing` in your browser (or https://ui.perfetto.dev/). Now you can explore the trace data, zoom in on specific events, and analyze the performance characteristics of your distributed training run.
@@ -218,7 +227,7 @@ python scripts/aggregate.py --b trace_output --output benchmark.json
2. Run the training script. Then aggregate the trace files as described above, but with an additional command line argument to enable the detection algorithm:
```bash
- python scripts/aggregate.py \
+ python tools/aggregate.py \
-b . \ # Equivalent to --bench-dir
-d # Enable the detection algorithm, Equivalent to --detect
```
@@ -245,12 +254,12 @@ bash a_pretrain_script.sh $RANK
```
For example
```bash
-bash pretrain_gpt.sh 0
+bash scripts/pretrain_gpt.sh 0
```
**Frontend (Vue)**: Navigate to the frontend directory and start the development server.
```bash
-cd transformer-visualize
+cd tools/visualization/transformer-visualize
npm run dev
```
After launching both, open your browser to the specified address (usually http://localhost:5173). You will see the main interface.
@@ -307,13 +316,13 @@ UseIB: true
- The Python environment in the image automatically includes almost all of the required packages. To install additional required packages, run
```bash
-pip install -r requirements.txt
+pip install -r requirements/requirements.txt
```
- Install infiniband prerequisites
```bash
-bash prerequisite.sh
+bash scripts/prerequisite.sh
```
- Build the `shm_tensor_new_rdma` (for multinode) and `shm_tensor_new_rdma_pre_alloc` modules.
@@ -341,7 +350,7 @@ First, prepare your dataset in the following `.json` format with one sample per
{"src": "bloomberg", "text": "Var Energi agrees to buy Exxonmobil's Norway assets for $4.5 bln. MILAN, Sept 26 (Reuters) - Var Energi AS, the Norwegian oil and gas group 69.6% owned by Italian major Eni, has agreed to buy the Norwegian upstream assets of ExxonMobil for $4.5 billion. The deal is expected to be completed in the final quarter of this year, Var Energi said on Thursday. Reporting by Stephen Jewkes; editing by Francesca Landini MILAN, Sept 26 (Reuters) - Var Energi AS, the Norwegian oil and gas group 69.6% owned by Italian major Eni, has agreed to buy the Norwegian upstream assets of ExxonMobil for $4.5 billion. The deal is expected to be completed in the final quarter of this year, Var Energi said on Thursday. Reporting by Stephen Jewkes; editing by Francesca Landini", "type": "Eng", "id": "1", "title": "Var Energi agrees to buy Exxonmobil's Norway assets for $4.5 bln. "}
{"src": "bloomberg", "text": "Trump says 'incorrect' he is willing to meet Iran with 'no conditions'. WASHINGTON (Reuters) - U.S. President Donald Trump on Sunday appeared to play down the chances that he might be willing to meet with Iranian officials, saying reports that he would do so without conditions were not accurate. \u201cThe Fake News is saying that I am willing to meet with Iran, \u2018No Conditions.\u2019 That is an incorrect statement (as usual!),\u201d Trump said on Twitter. In fact, as recently as on Sept. 10, U.S. Secretary of State Mike Pompeo said \u201cHe (Trump) is prepared to meet with no preconditions.\u201d Reporting By Arshad Mohammed; Editing by Shri Navaratnam WASHINGTON (Reuters) - U.S. President Donald Trump on Sunday appeared to play down the chances that he might be willing to meet with Iranian officials, saying reports that he would do so without conditions were not accurate. \u201cThe Fake News is saying that I am willing to meet with Iran, \u2018No Conditions.\u2019 That is an incorrect statement (as usual!),\u201d Trump said on Twitter. In fact, as recently as on Sept. 10, U.S. Secretary of State Mike Pompeo said \u201cHe (Trump) is prepared to meet with no preconditions.\u201d Reporting By Arshad Mohammed; Editing by Shri Navaratnam", "type": "Eng", "id": "2", "title": "Trump says 'incorrect' he is willing to meet Iran with 'no conditions'. "}
```
-note that we have provided a sample dataset under `datasets_gpt/` and `datasets_bert/`.
+Note that we have provided sample datasets under `datasets/gpt/` and `datasets/bert/`.
Then, prepare the vocab file (gpt and bert) and the merges file (gpt-only). We have provided it in the respective directories.
@@ -349,9 +358,9 @@ For bert, run the following
```bash
cd datasets
python ../tools/preprocess_data.py \
- --input ../datasets_bert/dataset.json \
+ --input ../datasets/bert/dataset.json \
--output-prefix bert \
- --vocab-file ../datasets_bert/vocab.txt \
+ --vocab-file ../datasets/bert/vocab.txt \
--tokenizer-type BertWordPieceLowerCase \
--split-sentences \
--workers $(nproc)
@@ -362,11 +371,11 @@ For GPT, run the following
```bash
cd datasets
python ../tools/preprocess_data.py \
- --input ../datasets_gpt/dataset.json \
+ --input ../datasets/gpt/dataset.json \
--output-prefix gpt \
- --vocab-file ../datasets_gpt/vocab.json \
+ --vocab-file ../datasets/gpt/vocab.json \
--tokenizer-type GPT2BPETokenizer \
- --merge-file ../datasets_gpt/merges.txt \
+ --merge-file ../datasets/gpt/merges.txt \
--append-eod \
--workers $(nproc)
```
@@ -377,18 +386,18 @@ For other models, please refer to `nvidia/megatron` for the corresponding datase
To run distributed training on a single node, go to the project root directory and run
```bash
-bash run_single_gpt.sh
+bash scripts/run_single_gpt.sh
```
for GPT and
```bash
-bash run_single_bert.sh
+bash scripts/run_single_bert.sh
```
for bert.
-The `run_single_.sh` files have the following structure:
+The `scripts/run_single_.sh` files have the following structure:
- Parameters include `pipeline_parallel`, `model_chunks` and `tensor_parallel`
- The `virtual_stage_layer` parameter specifies how many layers there are in a single virtual pipeline stage. It is calculated as
@@ -405,35 +414,35 @@ There are also several critical parameters in `examples/gpt3/train_gpt3_175b_dis
- `--workload` specifies the workload of each single thread, and hence determines the number of threads used in P2P communication
- `--num-gpus` specifies the number of GPUs on the current node (single node training)
- Other critical parameters include the number of layers of the model, the global batch size and the sequence length
-- Note that currently the global batch size value is 16 and is static in `run_single_.sh`. It needs to simultaneously modify `run_single_.sh` if adjusting the layers.
+- Note that the global batch size is currently fixed at 16 in `scripts/run_single_.sh`; if you adjust the number of layers, you also need to modify `scripts/run_single_.sh` accordingly.
For the remaining models, you can either directly run
```bash
bash examples//.sh
```
-or write a file similar to `run_{single,master,worker}_.sh` that sets up configurations and runs the shell under `examples/`
+or write a file similar to `scripts/run_{single,master,worker}_.sh` that sets up configurations and runs the shell under `examples/`
#### Multinode Distributed Training
To run distributed training on multiple nodes, go to the root directory. First run
```bash
-bash run_master_.sh
+bash scripts/run_master_.sh
```
and then start another pod and run
```bash
-bash run_worker_.sh
+bash scripts/run_worker_.sh
```
-The `run_master_.sh` has the following parameters
+The `scripts/run_master_.sh` has the following parameters
-- Similar to `run_single_.sh`, we have `pipeline_parallel`, `model_chunks` and `tensor_parallel`
+- Similar to `scripts/run_single_.sh`, we have `pipeline_parallel`, `model_chunks` and `tensor_parallel`
- It writes the master pod IP to `examples/gpt3/train_gpt3_175b_distributed_master.sh` and to `train_gpt3_175b_distributed_worker.sh` (bert in the corresponding directory)
- Set the number of nodes to be 2 and master node has rank 0
- Starts the shell under `examples`
-and `run_worker_.sh` does the following
+and `scripts/run_worker_.sh` does the following
- Set the number of nodes to be 2 and the worker node has rank 1
- Starts the shell under `examples`
@@ -441,10 +450,10 @@ The `examples/gpt3/train_gpt3_175b_distributed_master.sh` and `examples/gpt3/tra
### Profiling
-Each run will generate a trace dir in `benchmark`. Go to the `profiling` directory and run
+Each run will generate a trace dir in `benchmark`. Run
```
-python aggregate.py --benchmark_dir benchmark/your-benchmark-dir
+python tools/aggregate.py --benchmark_dir benchmark/your-benchmark-dir
```
in the root dir to produce an aggregated trace file.
@@ -464,10 +473,10 @@ Just follow above installation instructions.
$\quad$ To run distributed training on a single node, go to the project root directory and run
```bash
-bash pretrain_gpt.sh $RANK
+bash scripts/pretrain_gpt.sh $RANK
```
-Here `pretrain_gpt.sh` is an example pretraining `Bash` script.
+Here `scripts/pretrain_gpt.sh` is an example Bash pretraining script.
There are two extra options: `--forward-backward-disaggregating` and `--ignore-forward-tensor-parallel` in `TRAINING_ARGS`.
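For orientation, a minimal sketch (not the repository's actual script) of how these two options might be appended to a `TRAINING_ARGS` array in a pretraining launch script; the other values shown are placeholders, not taken from the repo:

```bash
# illustrative only — adapt to the real settings in scripts/pretrain_gpt.sh
TRAINING_ARGS=(
    --micro-batch-size 1             # placeholder value
    --global-batch-size 16           # placeholder value
    --forward-backward-disaggregating
    --ignore-forward-tensor-parallel
)
```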
diff --git a/README_Megatron.md b/README_Megatron.md
index d1038b223..c885c9aa9 100644
--- a/README_Megatron.md
+++ b/README_Megatron.md
@@ -32,7 +32,7 @@ An integration of a dynamic pipeline-parallel algorithm to the Megatron distribu
# 📥 Supported Data Sources & Language Models
-We provide demo examples for the following models. See the files `run_{single,master,worker}_.sh`
+We provide demo examples for the following models. See the files `scripts/run_{single,master,worker}_.sh`
| Data Sources You Can Add | Supported Language Models |
| ------------------------ | ------------------------- |
| Sample dataset provided & self-chosen | GPT |
@@ -64,13 +64,13 @@ UseIB: true
- The python environment in the image automatically includes almost all of the required packages, to install additional required packages, run
```bash
-pip install -r requirements.txt
+pip install -r requirements/requirements.txt
```
- Install infiniband prerequisites
```bash
-bash prerequisite.sh
+bash scripts/prerequisite.sh
```
- Build the `shm_tensor_new_rdma` (for multinode) and `shm_tensor_new_rdma_pre_alloc` module.
@@ -98,7 +98,7 @@ First, prepare your dataset in the following `.json` format with one sample per
{"src": "bloomberg", "text": "Var Energi agrees to buy Exxonmobil's Norway assets for $4.5 bln. MILAN, Sept 26 (Reuters) - Var Energi AS, the Norwegian oil and gas group 69.6% owned by Italian major Eni, has agreed to buy the Norwegian upstream assets of ExxonMobil for $4.5 billion. The deal is expected to be completed in the final quarter of this year, Var Energi said on Thursday. Reporting by Stephen Jewkes; editing by Francesca Landini MILAN, Sept 26 (Reuters) - Var Energi AS, the Norwegian oil and gas group 69.6% owned by Italian major Eni, has agreed to buy the Norwegian upstream assets of ExxonMobil for $4.5 billion. The deal is expected to be completed in the final quarter of this year, Var Energi said on Thursday. Reporting by Stephen Jewkes; editing by Francesca Landini", "type": "Eng", "id": "1", "title": "Var Energi agrees to buy Exxonmobil's Norway assets for $4.5 bln. "}
{"src": "bloomberg", "text": "Trump says 'incorrect' he is willing to meet Iran with 'no conditions'. WASHINGTON (Reuters) - U.S. President Donald Trump on Sunday appeared to play down the chances that he might be willing to meet with Iranian officials, saying reports that he would do so without conditions were not accurate. \u201cThe Fake News is saying that I am willing to meet with Iran, \u2018No Conditions.\u2019 That is an incorrect statement (as usual!),\u201d Trump said on Twitter. In fact, as recently as on Sept. 10, U.S. Secretary of State Mike Pompeo said \u201cHe (Trump) is prepared to meet with no preconditions.\u201d Reporting By Arshad Mohammed; Editing by Shri Navaratnam WASHINGTON (Reuters) - U.S. President Donald Trump on Sunday appeared to play down the chances that he might be willing to meet with Iranian officials, saying reports that he would do so without conditions were not accurate. \u201cThe Fake News is saying that I am willing to meet with Iran, \u2018No Conditions.\u2019 That is an incorrect statement (as usual!),\u201d Trump said on Twitter. In fact, as recently as on Sept. 10, U.S. Secretary of State Mike Pompeo said \u201cHe (Trump) is prepared to meet with no preconditions.\u201d Reporting By Arshad Mohammed; Editing by Shri Navaratnam", "type": "Eng", "id": "2", "title": "Trump says 'incorrect' he is willing to meet Iran with 'no conditions'. "}
```
-note that we have provided a sample dataset under `datasets_gpt/` and `datasets_bert/`.
+Note that we have provided sample datasets under `datasets/gpt/` and `datasets/bert/`.
Then, prepare the vocab file (gpt and bert) and the merges file (gpt-only). We have provided it in the respective directories.
@@ -106,9 +106,9 @@ For bert, run the following
```bash
cd datasets
python ../tools/preprocess_data.py \
- --input ../datasets_bert/dataset.json \
+ --input ../datasets/bert/dataset.json \
--output-prefix bert \
- --vocab-file ../datasets_bert/vocab.txt \
+ --vocab-file ../datasets/bert/vocab.txt \
--tokenizer-type BertWordPieceLowerCase \
--split-sentences
--workers $(nproc)
@@ -119,11 +119,11 @@ For GPT, run the following
```bash
cd datasets
python ../tools/preprocess_data.py \
- --input ../datasets_gpt/dataset.json \
+ --input ../datasets/gpt/dataset.json \
--output-prefix gpt \
- --vocab-file ../datasets_gpt/vocab.json \
+ --vocab-file ../datasets/gpt/vocab.json \
--tokenizer-type GPT2BPETokenizer \
- --merge-file ../datasets_gpt/merges.txt \
+ --merge-file ../datasets/gpt/merges.txt \
--append-eod
--workers $(nproc)
```
@@ -134,18 +134,18 @@ For other models, please refer to `nvidia/megatron` for the corresponding datase
To run distributed training on a single node, go to the project root directory and run
```bash
-bash run_single_gpt.sh
+bash scripts/run_single_gpt.sh
```
for GPT and
```bash
-bash run_single_bert.sh
+bash scripts/run_single_bert.sh
```
for bert.
-The `run_single_.sh` files have the following structure:
+The `scripts/run_single_.sh` files have the following structure:
- Parameters include `pipeline_parallel`, `model_chunks` and `tensor_parallel`
- The `virtual_stage_layer` parameter sets how many layers are there in a single virtual pipeline stage. It is calculated as
@@ -161,35 +161,35 @@ There are also several critical parameters in `examples/gpt3/train_gpt3_175b_dis
- `--use-dpp` switches to DPP algorithm
- `--workload` specifies the workload of each single thread, and hence determines the number of threads used in P2P communication
- `--num-gpus` specify the number of GPUs on the current node (single node training)
-- Other critical parameters include the number of layers of the model (note that currently the value is 16 and is static in `run_single_.sh`, needs to simultaneously modify `run_single_.sh` if adjusting the layers), the global batch size and the sequence length
+- Other critical parameters include the number of layers of the model (note that the value is currently fixed at 16 in `scripts/run_single_.sh`; modify `scripts/run_single_.sh` accordingly if adjusting the layers), the global batch size and the sequence length
For the remaining models, you can either directly run
```bash
bash examples//.sh
```
-or write a file similar to `run_{single,master,worker}_.sh` that sets up configurations and runs the shell under `examples/`
+or write a file similar to `scripts/run_{single,master,worker}_.sh` that sets up configurations and runs the shell under `examples/`
### Multinode Distributed Training
To run distributed training on multiple nodes, go to the root directory. First run
```bash
-bash run_master_.sh
+bash scripts/run_master_.sh
```
and then start another pod and run
```bash
-bash run_worker_.sh
+bash scripts/run_worker_.sh
```
-The `run_master_.sh` has the following parameters
+The `scripts/run_master_.sh` has the following parameters
-- Similar to `run_single_.sh`, we have `pipeline_parallel`, `model_chunks` and `tensor_parallel`
+- Similar to `scripts/run_single_.sh`, we have `pipeline_parallel`, `model_chunks` and `tensor_parallel`
- It writes the master pod IP to `examples/gpt3/train_gpt3_175b_distributed_master.sh` and to `train_gpt3_175b_distributed_worker.sh` (bert in the corresponding directory)
- Set the number of nodes to be 2 and master node has rank 0
- Starts the shell under `examples`
-and `run_worker_.sh` does the following
+and `scripts/run_worker_.sh` does the following
- Set the number of nodes to be 2 and the worker node has rank 1
- Starts the shell under `examples`
@@ -197,10 +197,10 @@ The `examples/gpt3/train_gpt3_175b_distributed_master.sh` and `examples/gpt3/tra
### Profiling
-Each run will generate a trace dir in `benchmark`. Go to the `profiling` directory and run
+Each run will generate a trace dir in `benchmark`. Run
```python
-python aggregate.py --benchmark_dir benchmark/your-benchmark-dir
+python tools/aggregate.py --benchmark_dir benchmark/your-benchmark-dir
```
in the root dir to produce an aggregated trace file.
diff --git a/datasets_bert/dataset.json b/datasets/bert/dataset.json
similarity index 100%
rename from datasets_bert/dataset.json
rename to datasets/bert/dataset.json
diff --git a/datasets_bert/dataset_ss.json b/datasets/bert/dataset_ss.json
similarity index 100%
rename from datasets_bert/dataset_ss.json
rename to datasets/bert/dataset_ss.json
diff --git a/datasets_bert/vocab.txt b/datasets/bert/vocab.txt
similarity index 100%
rename from datasets_bert/vocab.txt
rename to datasets/bert/vocab.txt
diff --git a/datasets_gpt/dataset.json b/datasets/gpt/dataset.json
similarity index 100%
rename from datasets_gpt/dataset.json
rename to datasets/gpt/dataset.json
diff --git a/datasets_gpt/merges.txt b/datasets/gpt/merges.txt
similarity index 100%
rename from datasets_gpt/merges.txt
rename to datasets/gpt/merges.txt
diff --git a/datasets_gpt/vocab.json b/datasets/gpt/vocab.json
similarity index 100%
rename from datasets_gpt/vocab.json
rename to datasets/gpt/vocab.json
diff --git a/DockerUsage.md b/docker/DockerUsage.md
similarity index 90%
rename from DockerUsage.md
rename to docker/DockerUsage.md
index 93fca075c..e17592970 100644
--- a/DockerUsage.md
+++ b/docker/DockerUsage.md
@@ -20,7 +20,7 @@ docker run --runtime --nvidia --gpus all -it --rm \
Install any additional Python packages:
```
-pip install -r requirements.txt
+pip install -r requirements/requirements.txt
```
For `MegaFBD` and `MegaDPP`, the RDMA C++ extentions `shm_tensor_new_rdma` and `shm_tensor_new_rdma_pre_alloc` must be installed:
@@ -63,14 +63,14 @@ mkdir -p /workspace/shared/datasets /workspace/shared/outputs /workspace/shared/
# Preprocessed binaries from Megatron’s scripts will be produced here
mkdir -p datasets
-# Example: preprocess GPT sample data (datasets_gpt/ and datasets_bert/ provided)
+# Example: preprocess GPT sample data (datasets/gpt/ and datasets/bert/ provided)
cd /workspace/megatronapp/datasets
python ../tools/preprocess_data.py \
- --input ../datasets_gpt/dataset.json \
+ --input ../datasets/gpt/dataset.json \
--output-prefix gpt \
- --vocab-file ../datasets_gpt/vocab.json \
+ --vocab-file ../datasets/gpt/vocab.json \
--tokenizer-type GPT2BPETokenizer \
- --merge-file ../datasets_gpt/merges.txt \
+ --merge-file ../datasets/gpt/merges.txt \
--append-eod \
--workers "$(nproc)"
```
@@ -95,13 +95,13 @@ TRACE_FLAGS="\
--trace-granularity full \
--transformer-impl local"
-bash ./DockerUsage_MegaScan.sh
+bash docker/DockerUsage_MegaScan.sh
```
Note:
- **Single machine, multi-GPU**: If your node has multiple A40s, the script will detect GPU count automatically. To force a value, set `--num-gpus` inside the script to your machine’s GPU count.
-- **Multi-node**: Use `run_master_.sh` / `run_worker_.sh` and set `--multi-node` and `--node-ips` (in InfiniBand order) in `examples/.../train_*_master/worker.sh`.
+- **Multi-node**: Use `scripts/run_master_.sh` / `scripts/run_worker_.sh` and set `--multi-node` and `--node-ips` (in InfiniBand order) in `examples/.../train_*_master/worker.sh`.
You can also consider **elastic training** (see `torchrun` documentation).
@@ -114,7 +114,7 @@ benchmark-data-{}-pipeline-{}-tensor-{}.json
Aggregate them into one file:
```bash
-python scripts/aggregate.py --b trace_output --output benchmark.json
+python tools/aggregate.py --b trace_output --output benchmark.json
```
To visualize, open the JSON trace with Chrome Tracing (chrome://tracing) or [Perfetto UI](https://ui.perfetto.dev/). You can zoom, filter, and inspect timelines token-by-token to analyze distributed performance.
@@ -136,7 +136,7 @@ bash scripts/gpu_control.sh limit 0 900
Re-run training, then aggregate with detection enabled:
```bash
-python scripts/aggregate.py \
+python tools/aggregate.py \
-b . \ # Equivalent to --bench-dir
-d # Enable detection (equivalent to --detect)
```
@@ -148,14 +148,14 @@ You should see output indicating a potential anomaly on GPU 0:
## MegaScope
-First, we use existed data to launch this example. You need to move `/workspace/megatronapp/datasets` 下的 `gpt_text_document.bin` and `gpt_text_document.idx` to `/workspace/megatronapp/datasets_gpt`.
+First, we use existing data to launch this example. You need to move `gpt_text_document.bin` and `gpt_text_document.idx` from `/workspace/megatronapp/datasets` to `/workspace/megatronapp/datasets/gpt`.
MegaScope requires a backend (Megatron) and a frontend (Vue) service.
### Backend(Megatron) Training Mode
```bash
-TP=1 PP=2 NNODES=1 NCCL_DEBUG=INFO MASTER_ADDR=127.0.0.1 MASTER_PORT=29500 bash DockerUsage_MegaScope.sh
+TP=1 PP=2 NNODES=1 NCCL_DEBUG=INFO MASTER_ADDR=127.0.0.1 MASTER_PORT=29500 bash docker/DockerUsage_MegaScope.sh
```
Important: The tutorial defaults to 1 node Ă— 4 GPUs. On your server, set a consistent combination of `TP` (tensor parallel size), `PP` (pipeline parallel size), and `world size`.
@@ -230,7 +230,7 @@ When the terminal shows **“MegatronServer started”** and a listening **PORT*
### Frontend (Vue): Navigate to the frontend directory and start the development server.
```bash
-cd transformer-visualize
+cd tools/visualization/transformer-visualize
npm run dev
```
After launching both, open your browser to the specified address (usually http://localhost:5173). You will see the main interface.
@@ -268,20 +268,20 @@ The similar support for visualization during training process are provided as we
#### Single Node Distributed Training
```
-bash run_single_gpt.sh
+bash scripts/run_single_gpt.sh
```
-This script (see `run_single_gpt.sh`) automatically rewrites the parallel configuration and `MASTER_ADDR` inside `examples/gpt3/train_gpt3_175b_distributed.sh` and keeps `--use-dpp` enabled so `MegaDPP` stays active.
+This script (see `scripts/run_single_gpt.sh`) automatically rewrites the parallel configuration and `MASTER_ADDR` inside `examples/gpt3/train_gpt3_175b_distributed.sh` and keeps `--use-dpp` enabled so `MegaDPP` stays active.
If your GPU count or InfiniBand IPs differ from the defaults, edit `examples/gpt3/train_gpt3_175b_distributed.sh` (lines 12–34) and adjust `GPUS_PER_NODE`, `--node-ips`, and related fields. On a single node, repeat the IP returned by `hostname -i` in `--node-ips`, matching the number of GPUs.
Training logs and any generated benchmark directories are written to the mounted repository path. Aggregate profiling traces when needed by running
```python
-python aggregate.py --benchmark_dir benchmark/.
+python tools/aggregate.py --benchmark_dir benchmark/.
```
-Once the single-node run succeeds, consider (1) experimenting with different parallel settings in `examples/gpt3/train_gpt3_175b_distributed.sh`, and (2) validating the multi-node workflow described in the README using `run_master_gpt.sh` and `run_worker_gpt.sh`.
+Once the single-node run succeeds, consider (1) experimenting with different parallel settings in `examples/gpt3/train_gpt3_175b_distributed.sh`, and (2) validating the multi-node workflow described in the README using `scripts/run_master_gpt.sh` and `scripts/run_worker_gpt.sh`.
#### Multinode Distributed Training
@@ -292,10 +292,10 @@ Please refer to [./README.md](https://github.com/OpenSQZ/MegatronApp?tab=readme-
$\quad$ To run distributed training on a single node, go to the project root directory and run
```bash
-bash DockerUsage_MegaFBD.sh $RANK
+bash docker/DockerUsage_MegaFBD.sh $RANK
```
-Here `DockerUsage_MegaFBD.sh` is an example bash script of pretrain, designed for a single node:
+Here `docker/DockerUsage_MegaFBD.sh` is an example Bash pretraining script, designed for a single node:
- `GPUS_PER_NODE`= (no longer incremented);
diff --git a/DockerUsage_MegaFBD.sh b/docker/DockerUsage_MegaFBD.sh
similarity index 100%
rename from DockerUsage_MegaFBD.sh
rename to docker/DockerUsage_MegaFBD.sh
diff --git a/DockerUsage_MegaScan.sh b/docker/DockerUsage_MegaScan.sh
similarity index 94%
rename from DockerUsage_MegaScan.sh
rename to docker/DockerUsage_MegaScan.sh
index f44a2f283..137455fc6 100644
--- a/DockerUsage_MegaScan.sh
+++ b/docker/DockerUsage_MegaScan.sh
@@ -30,8 +30,8 @@ torchrun --standalone --nproc_per_node=4 pretrain_gpt.py \
--untie-embeddings-and-output-weights \
--no-ckpt-fully-parallel-save \
--tokenizer-type GPT2BPETokenizer \
- --vocab-file datasets_gpt/vocab.json \
- --merge-file datasets_gpt/merges.txt \
+ --vocab-file datasets/gpt/vocab.json \
+ --merge-file datasets/gpt/merges.txt \
--data-path datasets/gpt_text_document \
--split 949,50,1 \
--fp16 \
diff --git a/DockerUsage_MegaScope.sh b/docker/DockerUsage_MegaScope.sh
similarity index 95%
rename from DockerUsage_MegaScope.sh
rename to docker/DockerUsage_MegaScope.sh
index 77ab2bdce..077e5e0f3 100644
--- a/DockerUsage_MegaScope.sh
+++ b/docker/DockerUsage_MegaScope.sh
@@ -16,9 +16,9 @@ PIPELINE_MP_SIZE="${PP:-2}" # PP=2
# ---------- Path ----------
CHECKPOINT_PATH="ngc_models/release_gpt_base"
-VOCAB_FILE="datasets_gpt/vocab.json"
-MERGE_FILE="datasets_gpt/merges.txt"
-DATA_PATH="datasets_gpt/gpt_text_document"
+VOCAB_FILE="datasets/gpt/vocab.json"
+MERGE_FILE="datasets/gpt/merges.txt"
+DATA_PATH="datasets/gpt/gpt_text_document"
# ---------- Hyperparameters ----------
NUM_LAYERS=16
diff --git a/Dockerfile b/docker/Dockerfile
similarity index 94%
rename from Dockerfile
rename to docker/Dockerfile
index 7c51da348..805d30a8d 100644
--- a/Dockerfile
+++ b/docker/Dockerfile
@@ -20,14 +20,14 @@ WORKDIR /workspace/MegatronApp
COPY . .
# Project pre-dependencies and two RDMA C++ extensions compilation
-RUN bash prerequisite.sh
+RUN bash scripts/prerequisite.sh
WORKDIR /workspace/MegatronApp/megatron/shm_tensor_new_rdma
RUN pip install -e .
WORKDIR /workspace/MegatronApp/megatron/shm_tensor_new_rdma_pre_alloc
RUN pip install -e .
WORKDIR /workspace/MegatronApp
-RUN pip install -r requirements.txt
+RUN pip install -r requirements/requirements.txt
RUN python - <<'PY'
import importlib
for m in ['shm_tensor_new_rdma_pre_alloc','shm_tensor_new_rdma']:
diff --git a/Dockerfile.ci.dev b/docker/Dockerfile.ci.dev
similarity index 96%
rename from Dockerfile.ci.dev
rename to docker/Dockerfile.ci.dev
index 0f17f4a62..881ab8158 100644
--- a/Dockerfile.ci.dev
+++ b/docker/Dockerfile.ci.dev
@@ -67,8 +67,8 @@ RUN \
--mount=type=bind,source=megatron/core/package_info.py,target=megatron/core/package_info.py \
--mount=type=bind,source=megatron/core/README.md,target=megatron/core/README.md \
--mount=type=bind,source=megatron/core/requirements.txt,target=megatron/core/requirements.txt \
- --mount=type=bind,source=requirements_mlm.txt,target=requirements_mlm.txt \
- --mount=type=bind,source=requirements_ci.txt,target=requirements_ci.txt \
+ --mount=type=bind,source=requirements/requirements_mlm.txt,target=requirements_mlm.txt \
+ --mount=type=bind,source=requirements/requirements_ci.txt,target=requirements_ci.txt \
--mount=type=bind,source=megatron/core/__init__.py,target=megatron/core/__init__.py <<"EOF" bash -ex
pip install -U pip
pip install --no-cache-dir causal_conv1d-*.whl mamba_ssm-*.whl grouped_gemm-*.whl transformer_engine*.whl
diff --git a/Dockerfile.ci.lts b/docker/Dockerfile.ci.lts
similarity index 100%
rename from Dockerfile.ci.lts
rename to docker/Dockerfile.ci.lts
diff --git a/Dockerfile.linting b/docker/Dockerfile.linting
similarity index 100%
rename from Dockerfile.linting
rename to docker/Dockerfile.linting
diff --git a/examples/bert/train_bert_340m_distributed.sh b/examples/bert/train_bert_340m_distributed.sh
index 10887bcc2..def838a49 100644
--- a/examples/bert/train_bert_340m_distributed.sh
+++ b/examples/bert/train_bert_340m_distributed.sh
@@ -17,7 +17,7 @@ TENSOR_PARALLEL=1
CHECKPOINT_PATH=ngc_models_bert #
TENSORBOARD_LOGS_PATH=tensor_board_bert #
-VOCAB_FILE=datasets_bert/vocab.txt #/bert-vocab.json
+VOCAB_FILE=datasets/bert/vocab.txt #/bert-vocab.json
DATA_PATH=datasets/bert_text_sentence #_text_document
DISTRIBUTED_ARGS=(
diff --git a/examples/bert/train_bert_340m_distributed_master.sh b/examples/bert/train_bert_340m_distributed_master.sh
index 0c44f513a..96fdfb1ac 100644
--- a/examples/bert/train_bert_340m_distributed_master.sh
+++ b/examples/bert/train_bert_340m_distributed_master.sh
@@ -17,7 +17,7 @@ TENSOR_PARALLEL=4
CHECKPOINT_PATH=ngc_models_bert #
TENSORBOARD_LOGS_PATH=tensor_board_bert #
-VOCAB_FILE=datasets_bert/vocab.txt #/bert-vocab.json
+VOCAB_FILE=datasets/bert/vocab.txt #/bert-vocab.json
DATA_PATH=datasets/bert_text_sentence #_text_document
DISTRIBUTED_ARGS=(
diff --git a/examples/bert/train_bert_340m_distributed_worker.sh b/examples/bert/train_bert_340m_distributed_worker.sh
index b3395f02b..23f7adea4 100644
--- a/examples/bert/train_bert_340m_distributed_worker.sh
+++ b/examples/bert/train_bert_340m_distributed_worker.sh
@@ -17,7 +17,7 @@ TENSOR_PARALLEL=4
CHECKPOINT_PATH=ngc_models_bert #
TENSORBOARD_LOGS_PATH=tensor_board_bert #
-VOCAB_FILE=datasets_bert/vocab.txt #/bert-vocab.json
+VOCAB_FILE=datasets/bert/vocab.txt #/bert-vocab.json
DATA_PATH=datasets/bert_text_sentence #_text_document
DISTRIBUTED_ARGS=(
diff --git a/examples/gpt3/train_gpt3_175b_distributed.sh b/examples/gpt3/train_gpt3_175b_distributed.sh
index 3eaaf0fae..324b80abc 100644
--- a/examples/gpt3/train_gpt3_175b_distributed.sh
+++ b/examples/gpt3/train_gpt3_175b_distributed.sh
@@ -17,8 +17,8 @@ TENSOR_PARALLEL=2
CHECKPOINT_PATH=ngc_models_gpt #
TENSORBOARD_LOGS_PATH=tensor_board_gpt #
-VOCAB_FILE=datasets_gpt/vocab.json #/gpt2-vocab.json
-MERGE_FILE=datasets_gpt/merges.txt #/gpt2-merges.txt
+VOCAB_FILE=datasets/gpt/vocab.json #/gpt2-vocab.json
+MERGE_FILE=datasets/gpt/merges.txt #/gpt2-merges.txt
DATA_PATH=datasets/gpt_text_document #_text_document
DISTRIBUTED_ARGS=(
diff --git a/examples/gpt3/train_gpt3_175b_distributed_master.sh b/examples/gpt3/train_gpt3_175b_distributed_master.sh
index 218e1d925..2c28bf744 100644
--- a/examples/gpt3/train_gpt3_175b_distributed_master.sh
+++ b/examples/gpt3/train_gpt3_175b_distributed_master.sh
@@ -17,8 +17,8 @@ TENSOR_PARALLEL=4
CHECKPOINT_PATH=ngc_models_gpt #
TENSORBOARD_LOGS_PATH=tensor_board_gpt #
-VOCAB_FILE=datasets_gpt/vocab.json #/gpt2-vocab.json
-MERGE_FILE=datasets_gpt/merges.txt #/gpt2-merges.txt
+VOCAB_FILE=datasets/gpt/vocab.json #/gpt2-vocab.json
+MERGE_FILE=datasets/gpt/merges.txt #/gpt2-merges.txt
DATA_PATH=datasets/gpt_text_document #_text_document
DISTRIBUTED_ARGS=(
diff --git a/examples/gpt3/train_gpt3_175b_distributed_worker.sh b/examples/gpt3/train_gpt3_175b_distributed_worker.sh
index 92f7038ad..05b9f0ddf 100644
--- a/examples/gpt3/train_gpt3_175b_distributed_worker.sh
+++ b/examples/gpt3/train_gpt3_175b_distributed_worker.sh
@@ -17,8 +17,8 @@ TENSOR_PARALLEL=4
CHECKPOINT_PATH=ngc_models_gpt #
TENSORBOARD_LOGS_PATH=tensor_board_gpt #
-VOCAB_FILE=datasets_gpt/vocab.json #/gpt2-vocab.json
-MERGE_FILE=datasets_gpt/merges.txt #/gpt2-merges.txt
+VOCAB_FILE=datasets/gpt/vocab.json #/gpt2-vocab.json
+MERGE_FILE=datasets/gpt/merges.txt #/gpt2-merges.txt
DATA_PATH=datasets/gpt_text_document #_text_document
DISTRIBUTED_ARGS=(
diff --git a/examples/gpt3/train_gpt3_345m_distributed.sh b/examples/gpt3/train_gpt3_345m_distributed.sh
index 13ef4da34..e78c4f40d 100644
--- a/examples/gpt3/train_gpt3_345m_distributed.sh
+++ b/examples/gpt3/train_gpt3_345m_distributed.sh
@@ -14,9 +14,9 @@ WORLD_SIZE=$(($GPUS_PER_NODE*$NUM_NODES))
CHECKPOINT_PATH=checkpoints/gpt3_345m_distributed
TENSORBOARD_LOGS_PATH=checkpoints/tb_logs/gpt3_345m_distributed
-VOCAB_FILE=datasets_gpt/vocab.json
-MERGE_FILE=datasets_gpt/merges.txt
-DATA_PATH=datasets_gpt/gpt_text_document # modify this to your dataset path
+VOCAB_FILE=datasets/gpt/vocab.json
+MERGE_FILE=datasets/gpt/merges.txt
+DATA_PATH=datasets/gpt/gpt_text_document # modify this to your dataset path
DISTRIBUTED_ARGS=(
--nproc_per_node $GPUS_PER_NODE
diff --git a/images/megatronapp.png b/images/megatronapp.png
new file mode 100644
index 000000000..38b29d3d3
Binary files /dev/null and b/images/megatronapp.png differ
diff --git a/requirements.txt b/requirements/requirements.txt
similarity index 100%
rename from requirements.txt
rename to requirements/requirements.txt
diff --git a/requirements_ci.txt b/requirements/requirements_ci.txt
similarity index 100%
rename from requirements_ci.txt
rename to requirements/requirements_ci.txt
diff --git a/requirements_mlm.txt b/requirements/requirements_mlm.txt
similarity index 100%
rename from requirements_mlm.txt
rename to requirements/requirements_mlm.txt
diff --git a/prerequisite.sh b/scripts/prerequisite.sh
similarity index 100%
rename from prerequisite.sh
rename to scripts/prerequisite.sh
diff --git a/pretrain_gpt.sh b/scripts/pretrain_gpt.sh
similarity index 100%
rename from pretrain_gpt.sh
rename to scripts/pretrain_gpt.sh
diff --git a/run_master_bert.sh b/scripts/run_master_bert.sh
similarity index 100%
rename from run_master_bert.sh
rename to scripts/run_master_bert.sh
diff --git a/run_master_gpt.sh b/scripts/run_master_gpt.sh
similarity index 100%
rename from run_master_gpt.sh
rename to scripts/run_master_gpt.sh
diff --git a/run_single_bert.sh b/scripts/run_single_bert.sh
similarity index 100%
rename from run_single_bert.sh
rename to scripts/run_single_bert.sh
diff --git a/run_single_gpt.sh b/scripts/run_single_gpt.sh
similarity index 100%
rename from run_single_gpt.sh
rename to scripts/run_single_gpt.sh
diff --git a/run_worker_bert.sh b/scripts/run_worker_bert.sh
similarity index 100%
rename from run_worker_bert.sh
rename to scripts/run_worker_bert.sh
diff --git a/run_worker_gpt.sh b/scripts/run_worker_gpt.sh
similarity index 100%
rename from run_worker_gpt.sh
rename to scripts/run_worker_gpt.sh
diff --git a/test_scripts/Dockerfile b/test_scripts/Dockerfile
index 58eb5754e..df9841b77 100644
--- a/test_scripts/Dockerfile
+++ b/test_scripts/Dockerfile
@@ -59,8 +59,8 @@ RUN \
RUN \
git clone https://github.com/OpenSQZ/MegatronApp.git && \
- cd MegatronApp && \
- bash prerequisite.sh && \
+ cd MegatronApp && \
+ bash scripts/prerequisite.sh && \
cd megatron/shm_tensor_new_rdma && \
pip install -e . && \
cd ../.. && \
diff --git a/test_scripts/aggregate_trace.sh b/test_scripts/aggregate_trace.sh
index d66888aca..6a162e1b9 100644
--- a/test_scripts/aggregate_trace.sh
+++ b/test_scripts/aggregate_trace.sh
@@ -1,3 +1,3 @@
-python scripts/aggregate.py \
+python tools/aggregate.py \
--b trace_output \
--output /path/to/trace_output.json
\ No newline at end of file
diff --git a/test_scripts/process_dataset_gpt.sh b/test_scripts/process_dataset_gpt.sh
index 641c08c11..dd842b8c8 100644
--- a/test_scripts/process_dataset_gpt.sh
+++ b/test_scripts/process_dataset_gpt.sh
@@ -1,12 +1,12 @@
-output_dir="/path/to/datasets_gpt/"
+output_dir="/path/to/datasets/gpt/"
mkdir -p $output_dir
python tools/preprocess_data.py \
- --input datasets_gpt/dataset.json \
+ --input datasets/gpt/dataset.json \
--output-prefix $output_dir/bloomberg \
- --vocab-file datasets_gpt/vocab.json \
+ --vocab-file datasets/gpt/vocab.json \
--tokenizer-type GPT2BPETokenizer \
- --merge-file datasets_gpt/merges.txt \
+ --merge-file datasets/gpt/merges.txt \
--append-eod \
--json-keys text \
--workers $(nproc)
diff --git a/test_scripts/readme.md b/test_scripts/readme.md
index 1df3a4940..2d19d0491 100644
--- a/test_scripts/readme.md
+++ b/test_scripts/readme.md
@@ -54,7 +54,7 @@ bash test_scripts/test_text_generation_server_gpt2.sh
### Launch MegaScope frontend
```bash
-cd transformer-visualize
+cd tools/visualization/transformer-visualize
# Optional: if your Node.js environment changed
# rm -rf ./node_modules
@@ -95,7 +95,7 @@ MegatronApp/trace_output/benchmark-data-0-pipeline-0-tensor-0.json
### Aggregate trace
```bash
-python scripts/aggregate.py --b trace_output --output test_scripts/trace_output.json
+python tools/aggregate.py --b trace_output --output test_scripts/trace_output.json
# or
bash test_scripts/aggregate_trace.sh
```
diff --git a/test_scripts/test_train_gpt_distributed_fbd.sh b/test_scripts/test_train_gpt_distributed_fbd.sh
index 99393a113..3148e7f03 100644
--- a/test_scripts/test_train_gpt_distributed_fbd.sh
+++ b/test_scripts/test_train_gpt_distributed_fbd.sh
@@ -21,9 +21,9 @@ PIPELINE_MP_SIZE=2
VIRTUAL_STAGE_LAYER=1
CHECKPOINT_PATH=/cephfs/shared/liangdahao/megatronapp_ckpt/mcore/gpt_distributed_fbd_ckpt
-VOCAB_FILE=datasets_gpt/vocab.json
-MERGE_FILE=datasets_gpt/merges.txt
-DATA_PATH=/cephfs/shared/liangdahao/megatronapp_ckpt/mcore/datasets_gpt/bloomberg_text_document
+VOCAB_FILE=datasets/gpt/vocab.json
+MERGE_FILE=datasets/gpt/merges.txt
+DATA_PATH=/cephfs/shared/liangdahao/megatronapp_ckpt/mcore/datasets/gpt/bloomberg_text_document
DISTRIBUTED_ARGS="
--nproc_per_node $GPUS_PER_NODE \
diff --git a/test_scripts/test_train_gpt_single_dpp.sh b/test_scripts/test_train_gpt_single_dpp.sh
index b2812713a..84a1984f1 100644
--- a/test_scripts/test_train_gpt_single_dpp.sh
+++ b/test_scripts/test_train_gpt_single_dpp.sh
@@ -15,9 +15,9 @@ TENSOR_PARALLEL=2
CHECKPOINT_PATH=/path/to/mcore_ckpt/gpt_single_dpp_ckpt #
TENSORBOARD_LOGS_PATH=/path/to/mcore_ckpt/tb_logs #
-VOCAB_FILE=datasets_gpt/vocab.json #/gpt2-vocab.json
-MERGE_FILE=datasets_gpt/merges.txt #/gpt2-merges.txt
-DATA_PATH=/path/to/datasets_gpt/bloomberg_text_document #_text_document
+VOCAB_FILE=datasets/gpt/vocab.json #/gpt2-vocab.json
+MERGE_FILE=datasets/gpt/merges.txt #/gpt2-merges.txt
+DATA_PATH=/path/to/datasets/gpt/bloomberg_text_document #_text_document
DISTRIBUTED_ARGS=(
--nproc_per_node $GPUS_PER_NODE
diff --git a/test_scripts/test_train_gpt_single_trace.sh b/test_scripts/test_train_gpt_single_trace.sh
index 494422233..cc2ceb13d 100644
--- a/test_scripts/test_train_gpt_single_trace.sh
+++ b/test_scripts/test_train_gpt_single_trace.sh
@@ -12,9 +12,9 @@ WORLD_SIZE=$(($GPUS_PER_NODE*$NUM_NODES))
CHECKPOINT_PATH=/path/to/mcore_ckpt/gpt_single_trace_ckpt
TENSORBOARD_LOGS_PATH=/path/to/mcore_ckpt/tb_logs
-VOCAB_FILE=datasets_gpt/vocab.json
-MERGE_FILE=datasets_gpt/merges.txt
-DATA_PATH=/path/to/datasets_gpt/bloomberg_text_document # modify this to your dataset path
+VOCAB_FILE=datasets/gpt/vocab.json
+MERGE_FILE=datasets/gpt/merges.txt
+DATA_PATH=/path/to/datasets/gpt/bloomberg_text_document # modify this to your dataset path
DISTRIBUTED_ARGS=(
--nproc_per_node $GPUS_PER_NODE
diff --git a/aggregate.py b/tools/aggregate.py
similarity index 100%
rename from aggregate.py
rename to tools/aggregate.py
diff --git a/profiling/baseline_benchmark.py b/tools/profiling/baseline_benchmark.py
similarity index 100%
rename from profiling/baseline_benchmark.py
rename to tools/profiling/baseline_benchmark.py
diff --git a/profiling/data-1-pipeline-4-tensor-1-dpp-aggregated-iter10.json b/tools/profiling/data-1-pipeline-4-tensor-1-dpp-aggregated-iter10.json
similarity index 100%
rename from profiling/data-1-pipeline-4-tensor-1-dpp-aggregated-iter10.json
rename to tools/profiling/data-1-pipeline-4-tensor-1-dpp-aggregated-iter10.json
diff --git a/profiling/data-1-pipeline-4-tensor-1-dpp-aggregated.json b/tools/profiling/data-1-pipeline-4-tensor-1-dpp-aggregated.json
similarity index 100%
rename from profiling/data-1-pipeline-4-tensor-1-dpp-aggregated.json
rename to tools/profiling/data-1-pipeline-4-tensor-1-dpp-aggregated.json
diff --git a/profiling/data-1-pipeline-4-tensor-1-pp-aggregated-iter10.json b/tools/profiling/data-1-pipeline-4-tensor-1-pp-aggregated-iter10.json
similarity index 100%
rename from profiling/data-1-pipeline-4-tensor-1-pp-aggregated-iter10.json
rename to tools/profiling/data-1-pipeline-4-tensor-1-pp-aggregated-iter10.json
diff --git a/profiling/data-1-pipeline-4-tensor-1-pp-aggregated.json b/tools/profiling/data-1-pipeline-4-tensor-1-pp-aggregated.json
similarity index 100%
rename from profiling/data-1-pipeline-4-tensor-1-pp-aggregated.json
rename to tools/profiling/data-1-pipeline-4-tensor-1-pp-aggregated.json
diff --git a/profiling/merge_trace.py b/tools/profiling/merge_trace.py
similarity index 100%
rename from profiling/merge_trace.py
rename to tools/profiling/merge_trace.py
diff --git a/profiling/merged_trace.json b/tools/profiling/merged_trace.json
similarity index 100%
rename from profiling/merged_trace.json
rename to tools/profiling/merged_trace.json
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-compute-ratios-aggregated.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-compute-ratios-aggregated.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-compute-ratios-aggregated.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-compute-ratios-aggregated.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-compute-windows-aggregated.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-compute-windows-aggregated.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-compute-windows-aggregated.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-compute-windows-aggregated.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-computing-ratios.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-computing-ratios.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-computing-ratios.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-computing-ratios.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-computing-windows.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-computing-windows.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-computing-windows.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-computing-windows.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-sending-ratios.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-sending-ratios.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-sending-ratios.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-sending-ratios.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-sending-windows.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-sending-windows.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-sending-windows.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-dpp-aggregated-sending-windows.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-iter-time.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-iter-time.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-iter-time.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-iter-time.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-peak-memory.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-peak-memory.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-peak-memory.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-peak-memory.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-computing-ratios.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-computing-ratios.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-computing-ratios.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-computing-ratios.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-computing-windows.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-computing-windows.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-computing-windows.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-computing-windows.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-sending-ratios.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-sending-ratios.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-sending-ratios.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-sending-ratios.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-sending-windows.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-sending-windows.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-sending-windows.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-pp-aggregated-sending-windows.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-reduce-ratios-aggregated.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-reduce-ratios-aggregated.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-reduce-ratios-aggregated.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-reduce-ratios-aggregated.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-reduce-windows-aggregated.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-reduce-windows-aggregated.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-reduce-windows-aggregated.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-reduce-windows-aggregated.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-sending-ratios-aggregated.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-sending-ratios-aggregated.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-sending-ratios-aggregated.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-sending-ratios-aggregated.png
diff --git a/profiling/plots/data-1-pipeline-4-tensor-1-sending-windows-aggregated.png b/tools/profiling/plots/data-1-pipeline-4-tensor-1-sending-windows-aggregated.png
similarity index 100%
rename from profiling/plots/data-1-pipeline-4-tensor-1-sending-windows-aggregated.png
rename to tools/profiling/plots/data-1-pipeline-4-tensor-1-sending-windows-aggregated.png
diff --git a/profiling/process_computing_ratio.py b/tools/profiling/process_computing_ratio.py
similarity index 100%
rename from profiling/process_computing_ratio.py
rename to tools/profiling/process_computing_ratio.py
diff --git a/profiling/process_computing_ratio_aggregated.py b/tools/profiling/process_computing_ratio_aggregated.py
similarity index 100%
rename from profiling/process_computing_ratio_aggregated.py
rename to tools/profiling/process_computing_ratio_aggregated.py
diff --git a/profiling/process_computing_window.py b/tools/profiling/process_computing_window.py
similarity index 100%
rename from profiling/process_computing_window.py
rename to tools/profiling/process_computing_window.py
diff --git a/profiling/process_computing_window_aggregated.py b/tools/profiling/process_computing_window_aggregated.py
similarity index 100%
rename from profiling/process_computing_window_aggregated.py
rename to tools/profiling/process_computing_window_aggregated.py
diff --git a/profiling/process_iter_times.py b/tools/profiling/process_iter_times.py
similarity index 100%
rename from profiling/process_iter_times.py
rename to tools/profiling/process_iter_times.py
diff --git a/profiling/process_peak_memory.py b/tools/profiling/process_peak_memory.py
similarity index 100%
rename from profiling/process_peak_memory.py
rename to tools/profiling/process_peak_memory.py
diff --git a/profiling/process_sending_ratio.py b/tools/profiling/process_sending_ratio.py
similarity index 100%
rename from profiling/process_sending_ratio.py
rename to tools/profiling/process_sending_ratio.py
diff --git a/profiling/process_sending_ratio_aggregated.py b/tools/profiling/process_sending_ratio_aggregated.py
similarity index 100%
rename from profiling/process_sending_ratio_aggregated.py
rename to tools/profiling/process_sending_ratio_aggregated.py
diff --git a/profiling/process_sending_window.py b/tools/profiling/process_sending_window.py
similarity index 100%
rename from profiling/process_sending_window.py
rename to tools/profiling/process_sending_window.py
diff --git a/profiling/process_sending_window_aggregated.py b/tools/profiling/process_sending_window_aggregated.py
similarity index 100%
rename from profiling/process_sending_window_aggregated.py
rename to tools/profiling/process_sending_window_aggregated.py
diff --git a/profiling/process_trace.py b/tools/profiling/process_trace.py
similarity index 100%
rename from profiling/process_trace.py
rename to tools/profiling/process_trace.py
diff --git a/profiling/setup.py b/tools/profiling/setup.py
similarity index 100%
rename from profiling/setup.py
rename to tools/profiling/setup.py
diff --git a/profiling/shm_benchmark.cpp b/tools/profiling/shm_benchmark.cpp
similarity index 100%
rename from profiling/shm_benchmark.cpp
rename to tools/profiling/shm_benchmark.cpp
diff --git a/profiling/shm_benchmark_test.py b/tools/profiling/shm_benchmark_test.py
similarity index 100%
rename from profiling/shm_benchmark_test.py
rename to tools/profiling/shm_benchmark_test.py
diff --git a/transformer-visualize/.gitignore b/tools/visualization/transformer-visualize/.gitignore
similarity index 100%
rename from transformer-visualize/.gitignore
rename to tools/visualization/transformer-visualize/.gitignore
diff --git a/transformer-visualize/README.md b/tools/visualization/transformer-visualize/README.md
similarity index 100%
rename from transformer-visualize/README.md
rename to tools/visualization/transformer-visualize/README.md
diff --git a/transformer-visualize/components.d.ts b/tools/visualization/transformer-visualize/components.d.ts
similarity index 100%
rename from transformer-visualize/components.d.ts
rename to tools/visualization/transformer-visualize/components.d.ts
diff --git a/transformer-visualize/index.html b/tools/visualization/transformer-visualize/index.html
similarity index 100%
rename from transformer-visualize/index.html
rename to tools/visualization/transformer-visualize/index.html
diff --git a/transformer-visualize/package-lock.json b/tools/visualization/transformer-visualize/package-lock.json
similarity index 100%
rename from transformer-visualize/package-lock.json
rename to tools/visualization/transformer-visualize/package-lock.json
diff --git a/transformer-visualize/package.json b/tools/visualization/transformer-visualize/package.json
similarity index 100%
rename from transformer-visualize/package.json
rename to tools/visualization/transformer-visualize/package.json
diff --git a/transformer-visualize/public/vite.svg b/tools/visualization/transformer-visualize/public/vite.svg
similarity index 100%
rename from transformer-visualize/public/vite.svg
rename to tools/visualization/transformer-visualize/public/vite.svg
diff --git a/transformer-visualize/src/App.vue b/tools/visualization/transformer-visualize/src/App.vue
similarity index 100%
rename from transformer-visualize/src/App.vue
rename to tools/visualization/transformer-visualize/src/App.vue
diff --git a/transformer-visualize/src/AppContent.vue b/tools/visualization/transformer-visualize/src/AppContent.vue
similarity index 100%
rename from transformer-visualize/src/AppContent.vue
rename to tools/visualization/transformer-visualize/src/AppContent.vue
diff --git a/transformer-visualize/src/assets/vue.svg b/tools/visualization/transformer-visualize/src/assets/vue.svg
similarity index 100%
rename from transformer-visualize/src/assets/vue.svg
rename to tools/visualization/transformer-visualize/src/assets/vue.svg
diff --git a/transformer-visualize/src/components/AttentionMatrix.vue b/tools/visualization/transformer-visualize/src/components/AttentionMatrix.vue
similarity index 100%
rename from transformer-visualize/src/components/AttentionMatrix.vue
rename to tools/visualization/transformer-visualize/src/components/AttentionMatrix.vue
diff --git a/transformer-visualize/src/components/ColoredVector.vue b/tools/visualization/transformer-visualize/src/components/ColoredVector.vue
similarity index 100%
rename from transformer-visualize/src/components/ColoredVector.vue
rename to tools/visualization/transformer-visualize/src/components/ColoredVector.vue
diff --git a/transformer-visualize/src/components/HelloWorld.vue b/tools/visualization/transformer-visualize/src/components/HelloWorld.vue
similarity index 100%
rename from transformer-visualize/src/components/HelloWorld.vue
rename to tools/visualization/transformer-visualize/src/components/HelloWorld.vue
diff --git a/transformer-visualize/src/components/MLPVector.vue b/tools/visualization/transformer-visualize/src/components/MLPVector.vue
similarity index 100%
rename from transformer-visualize/src/components/MLPVector.vue
rename to tools/visualization/transformer-visualize/src/components/MLPVector.vue
diff --git a/transformer-visualize/src/components/MLPVectors.vue b/tools/visualization/transformer-visualize/src/components/MLPVectors.vue
similarity index 100%
rename from transformer-visualize/src/components/MLPVectors.vue
rename to tools/visualization/transformer-visualize/src/components/MLPVectors.vue
diff --git a/transformer-visualize/src/components/OutputProbs.vue b/tools/visualization/transformer-visualize/src/components/OutputProbs.vue
similarity index 100%
rename from transformer-visualize/src/components/OutputProbs.vue
rename to tools/visualization/transformer-visualize/src/components/OutputProbs.vue
diff --git a/transformer-visualize/src/components/PCAPlot.vue b/tools/visualization/transformer-visualize/src/components/PCAPlot.vue
similarity index 100%
rename from transformer-visualize/src/components/PCAPlot.vue
rename to tools/visualization/transformer-visualize/src/components/PCAPlot.vue
diff --git a/transformer-visualize/src/components/QKVMatrix.vue b/tools/visualization/transformer-visualize/src/components/QKVMatrix.vue
similarity index 100%
rename from transformer-visualize/src/components/QKVMatrix.vue
rename to tools/visualization/transformer-visualize/src/components/QKVMatrix.vue
diff --git a/transformer-visualize/src/components/QKVVector.vue b/tools/visualization/transformer-visualize/src/components/QKVVector.vue
similarity index 100%
rename from transformer-visualize/src/components/QKVVector.vue
rename to tools/visualization/transformer-visualize/src/components/QKVVector.vue
diff --git a/transformer-visualize/src/components/QKVVectors.vue b/tools/visualization/transformer-visualize/src/components/QKVVectors.vue
similarity index 100%
rename from transformer-visualize/src/components/QKVVectors.vue
rename to tools/visualization/transformer-visualize/src/components/QKVVectors.vue
diff --git a/transformer-visualize/src/main.ts b/tools/visualization/transformer-visualize/src/main.ts
similarity index 100%
rename from transformer-visualize/src/main.ts
rename to tools/visualization/transformer-visualize/src/main.ts
diff --git a/transformer-visualize/src/style.css b/tools/visualization/transformer-visualize/src/style.css
similarity index 100%
rename from transformer-visualize/src/style.css
rename to tools/visualization/transformer-visualize/src/style.css
diff --git a/transformer-visualize/src/vite-env.d.ts b/tools/visualization/transformer-visualize/src/vite-env.d.ts
similarity index 100%
rename from transformer-visualize/src/vite-env.d.ts
rename to tools/visualization/transformer-visualize/src/vite-env.d.ts
diff --git a/transformer-visualize/tsconfig.app.json b/tools/visualization/transformer-visualize/tsconfig.app.json
similarity index 100%
rename from transformer-visualize/tsconfig.app.json
rename to tools/visualization/transformer-visualize/tsconfig.app.json
diff --git a/transformer-visualize/tsconfig.json b/tools/visualization/transformer-visualize/tsconfig.json
similarity index 100%
rename from transformer-visualize/tsconfig.json
rename to tools/visualization/transformer-visualize/tsconfig.json
diff --git a/transformer-visualize/tsconfig.node.json b/tools/visualization/transformer-visualize/tsconfig.node.json
similarity index 100%
rename from transformer-visualize/tsconfig.node.json
rename to tools/visualization/transformer-visualize/tsconfig.node.json
diff --git a/transformer-visualize/vite.config.ts b/tools/visualization/transformer-visualize/vite.config.ts
similarity index 100%
rename from transformer-visualize/vite.config.ts
rename to tools/visualization/transformer-visualize/vite.config.ts
diff --git a/transformer-visualize/yarn.lock b/tools/visualization/transformer-visualize/yarn.lock
similarity index 100%
rename from transformer-visualize/yarn.lock
rename to tools/visualization/transformer-visualize/yarn.lock