# CA-MTL

Conditionally Adaptive Multi-Task Learning: a Hypernetwork for NLU


The source code uses the [Hugging Face implementation](https://github.com/huggingface/transformers) of Transformers, adapted for multi-task training. Our paper was accepted at ICLR 2021 (https://openreview.net/pdf?id=de11dbHzAMF).

## Requirements

- Python 3.7
- PyTorch 1.6
- Hugging Face Transformers 2.11.0

Note: Newer versions of the requirements should work, but were not tested.

### Using a virtual environment

```bash
# Create a virtual environment
python3.7 -m venv ca_mtl_env
source ca_mtl_env/bin/activate

# Install the requirements
pip install -r requirements-with-torch.txt
# If you are using an environment that already has torch installed, use "requirements.txt"
```

### Using Docker

We created a public Docker image, hosted on Docker Hub, that contains all the requirements and the source code.

```bash
export DOCKER_IMG=DOCKER_FILE:latest

# Pull the image
docker pull $DOCKER_IMG
```

## Data

Download all datasets using the [official GLUE data download script](https://github.com/nyu-mll/jiant/blob/master/scripts/download_glue_data.py):

```bash
export DATA_DIR=/my/data/dir/path
# Make sure the data folder exists
python download_glue_data.py --data_dir $DATA_DIR --tasks all
```

## Run training

```bash
# Set the output dir
export OUTPUT_DIR=/path/to/output/dir
```

### Using the created virtual environment

```bash
python run.py \
  --model_name_or_path CA-MTL-base \
  --data_dir $DATA_DIR \
  --output_dir $OUTPUT_DIR \
  --do_train \
  --do_eval \
  --num_train_epochs 5 \
  --learning_rate 2e-5 \
  --seed 12 \
  --overwrite_cache
```

### Using the pulled Docker image

```bash
docker run -v /data:$DATA_DIR $DOCKER_IMG \
  --model_name_or_path CA-MTL-base \
  --data_dir $DATA_DIR \
  --output_dir $OUTPUT_DIR \
  --do_train \
  --do_eval \
  --num_train_epochs 5 \
  --learning_rate 2e-5 \
  --seed 12 \
  --overwrite_cache
```

Add the parameter `--use_mt_uncertainty` to use the uncertainty sampling technique described in the paper. To use uniform sampling, add `--uniform_mt_sampling`. Otherwise, tasks are sampled sequentially until the data runs out. To freeze encoder layers, use `--freeze_encoder_layers 0-N`; the results in the paper use `N=4` for base models and `N=11` for large models. Note that you may remove `--overwrite_cache` to make data loading faster.
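For example, a CA-MTL-base run that combines the paper's uncertainty sampling with frozen lower encoder layers could be launched as sketched below. The hyperparameter values are illustrative, carried over from the example above rather than the exact settings behind the published results.

```bash
# Illustrative sketch: uncertainty sampling plus frozen lower layers.
# --use_mt_uncertainty enables the MT-Uncertainty sampling method from the paper;
# --freeze_encoder_layers 0-4 freezes the first encoder layers (N=4 for base models).
# --overwrite_cache is omitted here so cached features are reused and data loading is faster.
python run.py \
  --model_name_or_path CA-MTL-base \
  --data_dir $DATA_DIR \
  --output_dir $OUTPUT_DIR \
  --do_train \
  --do_eval \
  --num_train_epochs 5 \
  --learning_rate 2e-5 \
  --seed 12 \
  --use_mt_uncertainty \
  --freeze_encoder_layers 0-4
```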
### Usage

```
usage: run.py [-h] --model_name_or_path MODEL_NAME_OR_PATH --data_dir DATA_DIR
              [--tasks TASKS [TASKS ...]] [--overwrite_cache]
              [--max_seq_length MAX_SEQ_LENGTH] --output_dir OUTPUT_DIR
              [--overwrite_output_dir] [--do_train] [--do_eval] [--do_predict]
              [--evaluate_during_training]
              [--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE]
              [--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE]
              [--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE]
              [--per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE]
              [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
              [--learning_rate LEARNING_RATE] [--weight_decay WEIGHT_DECAY]
              [--adam_epsilon ADAM_EPSILON] [--max_grad_norm MAX_GRAD_NORM]
              [--num_train_epochs NUM_TRAIN_EPOCHS] [--max_steps MAX_STEPS]
              [--warmup_steps WARMUP_STEPS] [--logging_dir LOGGING_DIR]
              [--logging_first_step] [--logging_steps LOGGING_STEPS]
              [--save_steps SAVE_STEPS] [--save_total_limit SAVE_TOTAL_LIMIT]
              [--no_cuda] [--seed SEED] [--fp16]
              [--fp16_opt_level FP16_OPT_LEVEL] [--local_rank LOCAL_RANK]
              [--tpu_num_cores TPU_NUM_CORES] [--tpu_metrics_debug]
              [--use_mt_uncertainty]

optional arguments:
  -h, --help            show this help message and exit
  --model_name_or_path MODEL_NAME_OR_PATH
                        Path to pretrained model or model identifier from:
                        CA-MTL-base, CA-MTL-large, bert-base-cased,
                        bert-base-uncased, bert-large-cased, bert-large-uncased
  --data_dir DATA_DIR   The input data dir. Should contain the .tsv files (or
                        other data files) for the task.
  --tasks TASKS [TASKS ...]
                        The task file that contains the tasks to train on. If
                        None, all tasks will be used.
  --overwrite_cache     Overwrite the cached training and evaluation sets
  --max_seq_length MAX_SEQ_LENGTH
                        The maximum total input sequence length after
                        tokenization. Sequences longer than this will be
                        truncated, sequences shorter will be padded.
  --output_dir OUTPUT_DIR
                        The output directory where the model predictions and
                        checkpoints will be written.
  --overwrite_output_dir
                        Overwrite the content of the output directory. Use
                        this to continue training if output_dir points to a
                        checkpoint directory.
  --do_train            Whether to run training.
  --do_eval             Whether to run eval on the dev set.
  --do_predict          Whether to run predictions on the test set.
  --evaluate_during_training
                        Run evaluation during training at each logging step.
  --per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE
                        Batch size per GPU/TPU core/CPU for training.
  --per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE
                        Batch size per GPU/TPU core/CPU for evaluation.
  --per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE
                        Deprecated, the use of `--per_device_train_batch_size`
                        is preferred. Batch size per GPU/TPU core/CPU for
                        training.
  --per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE
                        Deprecated, the use of `--per_device_eval_batch_size`
                        is preferred. Batch size per GPU/TPU core/CPU for
                        evaluation.
  --gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS
                        Number of update steps to accumulate before performing
                        a backward/update pass.
  --learning_rate LEARNING_RATE
                        The initial learning rate for Adam.
  --weight_decay WEIGHT_DECAY
                        Weight decay if we apply some.
  --adam_epsilon ADAM_EPSILON
                        Epsilon for Adam optimizer.
  --max_grad_norm MAX_GRAD_NORM
                        Max gradient norm.
  --num_train_epochs NUM_TRAIN_EPOCHS
                        Total number of training epochs to perform.
  --max_steps MAX_STEPS
                        If > 0: set total number of training steps to perform.
                        Overrides num_train_epochs.
  --warmup_steps WARMUP_STEPS
                        Linear warmup over warmup_steps.
  --logging_dir LOGGING_DIR
                        Tensorboard log dir.
  --logging_first_step  Log and eval the first global_step
  --logging_steps LOGGING_STEPS
                        Log every X update steps.
  --save_steps SAVE_STEPS
                        Save checkpoint every X update steps.
  --save_total_limit SAVE_TOTAL_LIMIT
                        Limit the total amount of checkpoints. Deletes the
                        older checkpoints in the output_dir. Default is
                        unlimited checkpoints.
  --no_cuda             Do not use CUDA even when it is available
  --seed SEED           Random seed for initialization
  --fp16                Whether to use 16-bit (mixed) precision (through
                        NVIDIA apex) instead of 32-bit
  --fp16_opt_level FP16_OPT_LEVEL
                        For fp16: Apex AMP optimization level selected in
                        ['O0', 'O1', 'O2', 'O3']. See details at
                        https://nvidia.github.io/apex/amp.html
  --local_rank LOCAL_RANK
                        For distributed training: local_rank
  --tpu_num_cores TPU_NUM_CORES
                        TPU: Number of TPU cores (automatically passed by
                        launcher script)
  --tpu_metrics_debug   TPU: Whether to print debug metrics
  --use_mt_uncertainty  Use MT-Uncertainty sampling method
```

Since our code is based on the [Hugging Face implementation](https://github.com/huggingface/transformers), all parameters are described in their [documentation](https://huggingface.co/transformers/main_classes/trainer.html?highlight=trainer).

## License

MIT

## How do I cite CA-MTL?

```
@inproceedings{
pilault2021conditionally,
title={Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in {NLP} Using Fewer Parameters {\&} Less Data},
author={Jonathan Pilault and Amine El hattami and Christopher Pal},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=de11dbHzAMF}
}
```

## Contact and Contribution

For any question or request, please create a GitHub issue in this repository.