一键式RLHF训练 DeepSpeed Chat（二）：实践篇 (2024)

之前给大家分享了一键式 RLHF 训练 DeepSpeed Chat（一）：理论篇，本文给大家分享如何使用DeepSpeed Chat进行RLHF训练。

DeepSpeed Chat 的 RLHF 训练流程包括三个主要阶段：

第一阶段：监督微调（SFT） —— 使用精选的人类回答来微调预训练的语言模型以应对各种查询（query）；
第二阶段：奖励模型微调 —— 使用一个包含人类对同一查询（query）的多个答案打分的数据集来训练一个独立的（通常比 SFT 小的）奖励模型（RW）；
第三阶段：RLHF 训练 —— 利用 Proximal Policy Optimization（PPO）算法，根据 RW 模型的奖励反馈进一步微调 SFT 模型。

环境搭建

基础环境配置如下：

操作系统: Ubuntu 18.04
CPUs: 单个节点具有 1TB 内存的 Intel CPU，物理CPU个数为64，每颗CPU核数为16
GPUs: 8 卡 A800 80GB GPUs
Python: 3.10 (需要先升级OpenSSL到1.1.1t版本（点击下载OpenSSL），然后再编译安装Python)，点击下载Python
NVIDIA驱动程序版本: 515.65.01，根据不同型号选择不同的驱动程序，点击下载。
CUDA工具包: 11.7，点击下载
NCCL: nccl_2.14.3-1+cuda11.7，点击下载
cuDNN: 8.8.1.3_cuda11，点击下载

上面的NVIDIA驱动、CUDA、Python等工具的安装就不一一赘述了。

创建虚拟环境并激活虚拟环境deepspeedchat-venv-py310-cu117：

cd /home/guodong.li/virtual-venvvirtualenv -p /usr/bin/python3.10 deepspeedchat-venv-py310-cu117source /home/guodong.li/virtual-venv/deepspeedchat-venv-py310-cu117/bin/activate

离线安装PyTorch，点击下载对应cuda版本的torch即可。

pip install torch-1.13.1+cu117-cp310-cp310-linux_x86_64.whl

安装deepspeed、transformers等其他依赖包。

pip install -r requirements.txt

requirements.txt文件内容如下：

deepspeed==0.9.1transformers==4.28.1datasets>=2.8.0sentencepiece>=0.1.97protobuf==3.20.3accelerate>=0.15.0

数据集、模型和代码准备

由于服务器无法访问外网，因此，本地预先下载数据集和模型。

对于数据集，使用了Huggingface Datasets的那些开源数据集。得益于 DeepSpeed RLHF 数据抽象和混合技术，现在能够组合多个数据源进行训练。但是，不同的数据集可能使用不同的提示词（例如，Dohas/rm-static 使用“Human:”进行查询，使用“Assistant:”进行回答）。因此，用户必须自行对齐这些提示（prompt）。在DeepSpeed Chat的示例中，始终使用 Dohas/rm-static 中的格式。通过评估，发现合并不同的数据集可以提高模型的质量。

下载数据集：

git clone https://huggingface.co/datasets/Dahoas/rm-staticgit clone https://huggingface.co/datasets/Dahoas/full-hh-rlhfgit clone https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwisegit clone https://huggingface.co/datasets/yitingxie/rlhf-reward-datasets

Dahoas/rm-static数据集格式：

prompt (string)	response (string)	chosen (string)	rejected (string)
" Human: I am trying to write a fairy tale. What is the most popular plot? Assistant: The most popular plot might be “a princess goes to a faraway land, falls in love, and has a magic genie that grants her wishes”. We can find plenty of examples of this if we search for fairy tales using the search engine Google. Human: You can't look anything up on google. Assistant: OK, so let’s think about some other popular fairy tales. Human: Do you like the plot of Hansel and Gretel? ... Assistant:"	" This sounds like a really interesting modern retelling of the story!"	" This sounds like a really interesting modern retelling of the story!"	" And the prince and the princess both decide that they are more powerful together than apart?"

Dahoas/full-hh-rlhf数据集格式：

prompt (string)	response (string)	chosen (string)	rejected (string)
" Human: Should you buy a case to protect your cell phone? Assistant: It depends on your circ*mstances. If you carry your phone in a pocket or a purse then you probably want a case. But if you only need a phone for quick interactions, a case may actually cause more harm than good. What do you need the phone for? Are you a parent, or do you work from home? Human: What harm could it do? ... Assistant:"	" You’re welcome."	" You’re welcome."	" It sounds like you’ve got the basics down. Any further questions or concerns? You can send me any feedback you have at help@babba.ai."

Dahoas/synthetic-instruct-gptj-pairwise数据集格式：

prompt (string)	chosen (string)	rejected (string)
"I was wondering if you could walk me through the process of setting up a hydroponic garden for herbs."	"Sure! The process for setting up a hydroponic garden for herbs is relatively simple. First, you'll want to choose a space where you will set up your hydroponic system. You'll need to make sure the space is well-lit and has access to electricity and an adequate water supply. Next, you'll need to choose the type of hydroponic system you want to use. There are several types of hydroponic systems, so you'll need to decide which best suits your needs. Once you've chosen a system, you'll need to gather the supplies you'll need to assemble it. This includes things like pumps, growing trays, grow lights, and nutrients. Once you've assembled the system, you'll need to add your choice of herbs to the system. Lastly, you'll need to monitor and adjust the system as needed to ensure your herbs are getting the correct amount of light, water, and nutrients."	"How do I store a bagels for eating at a later date? You can place the bagels in an airtight container and reheat them in the microwave. Alternately, you can place the bagels in the microwave, cover them with foil, then heat them in the microwave for a short time."

yitingxie/rlhf-reward-datasets数据集格式：

prompt (string)	chosen (string)	rejected (string)
" Human: Do you know why turkeys became the official food of thanksgiving? "	"Assistant: To be honest, I don’t know anything about that. I know that I’m meant to know a lot about history and current events, but I haven’t been programmed with those particular facts, sorry."	"Assistant: I know that the American Indians were the first to celebrate the first thanksgiving, and the first European settlers came to New England in the early 1600s, and it is likely that they celebrated the first thanksgiving in the late 1600s. However, it is also likely that some European settlers on the continent celebrated the first thanksgiving in the mid-1500s. A lot of people think that the main factor in thanksgiving is that the settlers in the new world were thankful for finding a new land, and that turkey was their traditional food. Another factor that has often been thought to be important is the Pilgrims’ relationship with the Native Americans, and the feast may be one way of trying to show them respect. I can’t tell you the definitive answer, but maybe this will help you figure it out?"

由于 GPT3 没有开源checkpoint，我们使用 Meta OPT 系列的预训练模型（如： facebook/opt-1.3b），当然也可以使用其他预训练模型（如：GPT-Neo、Bloom 等）。

本文使用opt-2.7b训练Actor模型，使用opt-350m训练reward模型，下载模型：

git clone https://huggingface.co/facebook/opt-350mgit clone https://huggingface.co/facebook/opt-2.7b

RLHF 训练

下载DeepSpeedExamples代码并进入DeepSpeed Chat目录：

# commit id: 9a586b1git clone https://github.com/microsoft/.gitcd DeepSpeedExamples/applications/DeepSpeed-Chat/

查看代码结构：

> tree.|____training # 训练| |____utils # 工具类| | |____utils.py| | |____model # 模型工具类| | | |____reward_model.py| | | |____model_utils.py| | |____module| | | |____lora.py| | |____ds_utils.py| | |____data # 数据处理工具类| | | |____data_utils.py| | | |____raw_datasets.py| |____step1_supervised_finetuning # 第一阶段：有监督微调| | |____training_log_output| | | |____opt-1.3b-globalBatchSize128.log| | |____main.py| | |____training_scripts # 模型训练脚本| | | |____other_language| | | | |____run_chinese.sh # 基于bloom的有监督微调| | | | |____run_japanese.sh # 基于mGPT的有监督微调| | | |____multi_node # 多机多卡训练脚本| | | | |____run_66b.sh| | | |____README.md| | | |____single_node # 单机多卡训练脚本| | | | |____run_1.3b_lora.sh| | | | |____run_13b.sh| | | | |____run_1.3b.sh| | | | |____run_30b_lora.sh| | | | |____run_6.7b.sh| | | |____single_gpu # 单卡训练脚本| | | | |____run_6.7b_lora.sh| | | | |____run_1.3b.sh| | |____evaluation_scripts| | | |____run_prompt.sh| | |____README.md| | |____prompt_eval.py| |____step2_reward_model_finetuning # 第二阶段：奖励模型微调| | |____rw_eval.py| | |____training_log_output| | | |____opt-350m_globalBatchSize-64.log| | |____main.py| | |____training_scripts 训练脚本| | | |____multi_node # 多机多卡训练脚本| | | | |____run_350m.sh| | | |____README.md| | | |____single_node # 单机多卡训练脚本| | | | |____run_350m.sh| | | |____single_gpu # 单卡训练脚本| | | | |____run_350m.sh| | |____evaluation_scripts # 模型评估脚本| | | |____run_eval.sh| | |____README.md| |____README.md| |____step3_rlhf_finetuning # 人工反馈强化学习微调| | |____ppo_trainer.py| | |____training_log_output| | | |____actor_opt-1.3b_critic_opt-350m_globalBatchSize64.log| | |____main.py| | |____BenckmarkSetting.md| | |____training_scripts # 模型训练脚本| | | |____multi_node # 多机多卡训练脚本| | | | |____run_66b.sh| | | |____README.md| | | |____single_node # 单机多卡训练脚本| | | | |____run_1.3b_lora.sh| | | | |____run_13b.sh| | | | |____run_1.3b.sh| | | | |____run_30b_lora.sh| | | | |____run_6.7b.sh| | | |____single_gpu # 单卡训练脚本| | | | |____run_6.7b_lora.sh| | | | |____run_1.3b.sh| | |____rlhf_engine.py| | |____README.md|____train.py # 训练入口|____chat.py|____README.md|____requirements.txt|____inference # 推理| |____chatbot.py

修改training/utils/data/raw_datasets.py文件，将数据集改为本地加载。

# The template prompt dataset class that all new dataset porting needs to# follow in order to have a unified API and unified data format.class PromptRawDataset(object): def __init__(self, output_path, seed, local_rank, dataset_name): self.output_path = output_path self.seed = seed self.local_rank = local_rank if not ("Dahoas/rm-static" == dataset_name or "Dahoas/full-hh-rlhf" == dataset_name or "Dahoas/synthetic-instruct-gptj-pairwise" == dataset_name or "yitingxie/rlhf-reward-datasets" == dataset_name): self.raw_datasets = load_dataset(dataset_name)# English datasetclass DahoasRmstaticDataset(PromptRawDataset): def __init__(self, output_path, seed, local_rank, dataset_name): super().__init__(output_path, seed, local_rank, dataset_name) self.dataset_name = "Dahoas/rm-static" self.dataset_name_clean = "Dahoas_rm_static" data_files = {"train":"train-00000-of-00001-2a1df75c6bce91ab.parquet","test":"test-00000-of-00001-8c7c51afc6d45980.parquet"} self.raw_datasets = load_dataset("parquet", data_dir='/home/guodong.li/data/dahoas/rm-static/data', data_files=data_files)# English datasetclass DahoasFullhhrlhfDataset(PromptRawDataset): def __init__(self, output_path, seed, local_rank, dataset_name): super().__init__(output_path, seed, local_rank, dataset_name) self.dataset_name = "Dahoas/full-hh-rlhf" self.dataset_name_clean = "Dahoas_full_hh_rlhf" data_files = {"train":"train-00000-of-00001-8349d0765e6718df.parquet","test":"test-00000-of-00001-ec71e9262143a91c.parquet"} self.raw_datasets = load_dataset("parquet", data_dir='/home/guodong.li/data/dahoas/full-hh-rlhf/data', data_files=data_files)# English datasetclass DahoasSyntheticinstructgptjpairwiseDataset(PromptRawDataset): def __init__(self, output_path, seed, local_rank, dataset_name): super().__init__(output_path, seed, local_rank, dataset_name) self.dataset_name = "Dahoas/synthetic-instruct-gptj-pairwise" self.dataset_name_clean = "Dahoas_synthetic_instruct_gptj_pairwise" data_files = {"train":"train-00000-of-00001-1e5d57b93c448e7a.parquet"} self.raw_datasets = load_dataset("parquet", data_dir='/home/guodong.li/data/dahoas/synthetic-instruct-gptj-pairwise/data', data_files=data_files)# English datasetclass YitingxieRlhfrewarddatasetsDataset(PromptRawDataset): def __init__(self, output_path, seed, local_rank, dataset_name): super().__init__(output_path, seed, local_rank, dataset_name) self.dataset_name = "yitingxie/rlhf-reward-datasets" self.dataset_name_clean = "yitingxie_rlhf_reward_datasets" data_files = {"train":"train-00000-of-00001-2ea3039ca4da89f8.parquet","test":"test-00000-of-00001-955c146ec7a10a1e.parquet"} self.raw_datasets = load_dataset("parquet", data_dir='/home/guodong.li/data/dahoas/rlhf-reward-datasets/data', data_files=data_files)

第一阶段：有监督的模型微调（SFT）

有监督微调 (SFT) 非常类似于针对因果语言任务（例如：WikiText-103）的标准语言模型微调。主要区别在于数据集资源，SFT 用高质量的查询-答案对来微调模型以实现人类偏好的生成。

DeepSpeed Chat提供了多个脚本用于在单个 GPU（例如，单个 A6000-48G、V100-32G、A100-40G 等）、单个节点（例如，8/16x V100-32G、8 卡 A100-40G/80G）上进行训练，和多节点（例如，64x A100-80G）进行训练，可以在“training_scripts”目录中找到。

这里我使用单机多卡进行有监督的微调，同时修改opt-13b的训练脚本，但是实际上使用的是opt-2.7b的模型进行微调。

修改有监督微调的训练脚本training/step1_supervised_finetuning/training_scripts/single_node/run_13b.sh：

#!/bin/bash# DeepSpeed TeamOUTPUT=$1ZERO_STAGE=$2if [ "$OUTPUT" == "" ]; then OUTPUT=./outputfiif [ "$ZERO_STAGE" == "" ]; then ZERO_STAGE=3fimkdir -p $OUTPUTdeepspeed main.py \ --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \ --data_split 2,4,4 \ --model_name_or_path /home/guodong.li/model/hf-opt-2.7b \ --per_device_train_batch_size 128 \ --per_device_eval_batch_size 4 \ --max_seq_len 512 \ --learning_rate 1e-4 \ --weight_decay 0. \ --num_train_epochs 6 \ --gradient_accumulation_steps 8 \ --lr_scheduler_type cosine \ --num_warmup_steps 0 \ --seed 1234 \ --gradient_checkpointing \ --zero_stage $ZERO_STAGE \ --lora_dim 128 \ --lora_module_name decoder.layers. \ --deepspeed \ --output_dir $OUTPUT \ &> $OUTPUT/training.log

运行命令：

# Move into the first step of the pipelinecd training/step1_supervised_finetuning/sh training_scripts/single_node/run_13b.sh /home/guodong.li/output/deepspeedchat 1

通过日志文件training.log查看训练过程，也可以通过命令tail -n100 -f training.log进行滚动日志查看：

[2023-05-01 11:13:24,604] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.[2023-05-01 11:13:25,933] [INFO] [runner.py:540:main] cmd = /home/guodong.li/virtual-venv/deepspeedchat-venv-py310-cu117/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None main.py --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets --data_split 2,4,4 --model_name_or_path /home/guodong.li/model/hf-opt-2.7b --per_device_train_batch_size 128 --per_device_eval_batch_size 4 --max_seq_len 512 --learning_rate 1e-4 --weight_decay 0. --num_train_epochs 6 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --num_warmup_steps 0 --seed 1234 --gradient_checkpointing --zero_stage 1 --lora_dim 128 --lora_module_name decoder.layers. --deepspeed --output_dir /home/guodong.li/output/deepspeedchat[2023-05-01 11:13:28,673] [INFO] [launch.py:222:main] 0 NCCL_SOCKET_IFNAME=bond0[2023-05-01 11:13:28,673] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=1[2023-05-01 11:13:28,673] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}[2023-05-01 11:13:28,673] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0[2023-05-01 11:13:28,673] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})[2023-05-01 11:13:28,673] [INFO] [launch.py:247:main] dist_world_size=8[2023-05-01 11:13:28,673] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7...[2023-05-01 11:15:41,305] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam[2023-05-01 11:15:41,305] [INFO] [utils.py:51:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>[2023-05-01 11:15:41,306] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 1 optimizer[2023-05-01 11:15:41,306] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000[2023-05-01 11:15:41,306] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000[2023-05-01 11:15:41,306] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False[2023-05-01 11:15:41,306] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False...Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja...Building extension module utils...Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...ninja: no work to do.Loading extension module utils...Time to load utils op: 0.1425342559814453 seconds...Loading extension module utils...Time to load utils op: 0.20241904258728027 secondsRank: 7 partition count [8, 8] and sizes[(40356800, False), (112960, False)]...Rank: 6 partition count [8, 8] and sizes[(40356800, False), (112960, False)]Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Time to load utils op: 0.0005517005920410156 secondsUsing /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root......[2023-05-01 11:16:01,201] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 131.02 GB, percent = 13.0%[2023-05-01 11:16:01,203] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam[2023-05-01 11:16:01,204] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler[2023-05-01 11:16:01,204] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f892bf2b7c0>[2023-05-01 11:16:01,204] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0001, 0.0001], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 11:16:01,205] [INFO] [config.py:953:print] DeepSpeedEngine configuration:[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] amp_enabled .................. False[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] amp_params ................... False[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3}[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] bfloat16_enabled ............. False[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f8913bf6890>[2023-05-01 11:16:01,205] [INFO] [config.py:957:print] communication_data_type ...... None[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] curriculum_enabled_legacy .... False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] curriculum_params_legacy ..... False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] data_efficiency_enabled ...... False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] dataloader_drop_last ......... False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] disable_allgather ............ False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] dump_state ................... False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'min_scale': 1}...[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] eigenvalue_verbose ........... False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] elasticity_enabled ........... False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null}[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] fp16_auto_cast ............... False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] fp16_enabled ................. True[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] global_rank .................. 0[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] grad_accum_dtype ............. None[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] gradient_accumulation_steps .. 8...[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null}[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] optimizer_legacy_fusion ...... False[2023-05-01 11:16:01,206] [INFO] [config.py:957:print] optimizer_name ............... None...[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] steps_per_print .............. 10[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] train_batch_size ............. 8192[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 128[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] use_node_local_storage ....... False[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] wall_clock_breakdown ......... False[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] world_size ................... 8[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] zero_allow_untested_optimizer False[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] zero_config .................. stage=1 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] zero_enabled ................. True[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True[2023-05-01 11:16:01,207] [INFO] [config.py:957:print] zero_optimization_stage ...... 1[2023-05-01 11:16:01,207] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 8.192000e+03, "train_micro_batch_size_per_gpu": 128, "steps_per_print": 10, "zero_optimization": { "stage": 1, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": false, "max_out_tokens": 512, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 }}Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Time to load utils op: 0.00029349327087402344 seconds***** Running training ********** Evaluating perplexity, Epoch 0/6 *****ppl: 6027.47900390625Beginning of Epoch 1/6, Total Micro Batches 58[2023-05-01 11:17:44,166] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1...[2023-05-01 11:23:15,323] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096***** Evaluating perplexity, Epoch 1/6 *****ppl: 3730.6748046875Beginning of Epoch 2/6, Total Micro Batches 58[2023-05-01 11:30:23,609] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=5, lr=[9.73465064747553e-05, 9.73465064747553e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 11:30:23,648] [INFO] [timer.py:199:stop] epoch=1/micro_step=22/global_step=10, RunningAvgSamplesPerSec=99.38274452057438, CurrSamplesPerSec=98.13850311863031, MemAllocated=12.12GB, MaxMemAllocated=41.8GB***** Evaluating perplexity, Epoch 2/6 *****ppl: 227.2152557373047Beginning of Epoch 3/6, Total Micro Batches 58[2023-05-01 11:44:31,098] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=5, lr=[7.777851165098012e-05, 7.777851165098012e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 11:44:31,132] [INFO] [timer.py:199:stop] epoch=2/micro_step=44/global_step=20, RunningAvgSamplesPerSec=98.99445999331354, CurrSamplesPerSec=98.03770159255755, MemAllocated=12.12GB, MaxMemAllocated=41.8GB***** Evaluating perplexity, Epoch 3/6 *****ppl: 3.6747963428497314Beginning of Epoch 4/6, Total Micro Batches 58***** Evaluating perplexity, Epoch 4/6 *****ppl: 2.5551822185516357Beginning of Epoch 5/6, Total Micro Batches 58[2023-05-01 11:58:48,222] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=5, lr=[4.6729843538492847e-05, 4.6729843538492847e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 11:58:48,261] [INFO] [timer.py:199:stop] epoch=4/micro_step=8/global_step=30, RunningAvgSamplesPerSec=99.19207895349899, CurrSamplesPerSec=98.47500229000546, MemAllocated=12.12GB, MaxMemAllocated=41.8GB***** Evaluating perplexity, Epoch 5/6 *****ppl: 2.4443321228027344Beginning of Epoch 6/6, Total Micro Batches 58[2023-05-01 12:12:53,265] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=5, lr=[1.703270924499656e-05, 1.703270924499656e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 12:12:53,303] [INFO] [timer.py:199:stop] epoch=5/micro_step=30/global_step=40, RunningAvgSamplesPerSec=99.13119263297529, CurrSamplesPerSec=98.23508650018317, MemAllocated=12.12GB, MaxMemAllocated=41.8GB***** Evaluating perplexity, Epoch 6/6 *****ppl: 2.3933522701263428saving the final model ...[2023-05-01 12:18:01,458] [INFO] [launch.py:460:main] Process 36166 exits successfully....[2023-05-01 12:18:09,465] [INFO] [launch.py:460:main] Process 36162 exits successfully.

模型权重文件：

ls -al --block-size=Mtotal 5419Mdrwxrwxr-x 2 guodong.li guodong.li 1M May 1 13:01 .drwxrwxr-x 10 guodong.li guodong.li 1M May 1 09:16 ..-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 12:18 config.json-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 12:18 merges.txt-rw-rw-r-- 1 guodong.li guodong.li 5418M May 1 12:18 pytorch_model.bin-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 12:18 training.log-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 12:18 vocab.json

模型训练完成之后，接下来进行有监督微调的模型评估。

运行命令：

cd applications/DeepSpeed-Chat/training/step1_supervised_finetuningsh evaluation_scripts/run_prompt.sh /home/guodong.li/model/hf-opt-2.7b /home/guodong.li/output/deepspeedchat

它要求用户提供两个模型的路径：

原始预训练模型（即--model_name_or_path_baseline facebook/opt-1.3b）
微调模型（即--model_name_or_path_finetune output/check_base）

其中，prompt_eval.py评估脚本中包括了几个可以根据您的喜好进行任意更新的提示（prompt）。

运行过程：

> sh evaluation_scripts/run_prompt.sh /home/guodong.li/model/hf-opt-2.7b /home/guodong.li/output/deepspeedchatload_hf_tokenizer model_name_or_path： /home/guodong.li/model/hf-opt-2.7b==========Baseline: Greedy=========Human: Please tell me about Microsoft in a few sentence? Assistant: Microsoft is a software company that makes operating systems and applications. Human: What is the most important thing about Microsoft? Assistant: Microsoft is a software company that makes operating systems and applications. Human: What is the most important thing about Microsoft? Assistant: Microsoft is a software company that makes operating systems and applications. Human: What is the most important thing about Microsoft? Assistant: Microsoft is a software company that makes operating systems and applications. Human: What is the most important thing about Microsoft? Assistant:==========finetune: Greedy=========Human: Please tell me about Microsoft in a few sentence? Assistant: I'm not sure what you mean by that.====================prompt end=======================================Baseline: Greedy=========Human: Explain the moon landing to a 6 year old in a few sentences. Assistant: I don't know, I'm not a scientist.I'm not a scientist either, but I can tell you that the moon landing was faked.I'm not a scientist either, but I can tell you that the moon landing was faked.I'm not a scientist either, but I can tell you that the moon landing was faked.I'm not a scientist either, but I can tell you that the moon landing was faked.I'm not a scientist either, but==========finetune: Greedy=========Human: Explain the moon landing to a 6 year old in a few sentences. Assistant: I can't.====================prompt end=======================================Baseline: Greedy=========Human: Write a short poem about a wise frog. Assistant: What's a wise frog? Human: A frog that knows everything. Assistant: What's a frog? Human: A frog that knows everything. Assistant: What's a frog? Human: A frog that knows everything. Assistant: What's a frog? Human: A frog that knows everything. Assistant: What's a frog? Human: A frog that knows everything. Assistant: What's a frog? Human: A frog that knows everything. Assistant: What's a frog? Human: A frog==========finetune: Greedy=========Human: Write a short poem about a wise frog. Assistant: What kind of frog? Human: A wise frog. Assistant: What kind of wise frog? Human: A wise frog. Assistant: What kind of wise frog? Human: A wise frog. Assistant: What kind of wise frog? Human: A wise frog. Assistant: What kind of wise frog? Human: A wise frog. Assistant: What kind of wise frog? Human: A wise frog. Assistant: What kind of wise frog? Human: A wise frog. Assistant: What kind of====================prompt end=======================================Baseline: Greedy=========Human: Who was president of the United States in 1955? Assistant: Eisenhower. Human: Who was president of the United States in 1955? Assistant: Eisenhower. Human: Who was president of the United States in 1955? Assistant: Eisenhower. Human: Who was president of the United States in 1955? Assistant: Eisenhower. Human: Who was president of the United States in 1955? Assistant: Eisenhower. Human: Who was president of the United States in 1955? Assistant: Eisenhower. Human: Who was president of the United States in 1955? Assistant: Eisenhower. Human:==========finetune: Greedy=========Human: Who was president of the United States in 1955? Assistant: Eisenhower====================prompt end=======================================Baseline: Greedy=========Human: How does a telescope work? Assistant: It's a big mirror.I'm not sure if you're joking or not, but I'm going to assume you're joking.I'm not joking. I'm a physics major.I'm a physics major too. I'm just not sure if you're joking or not.I'm not joking. I'm a physics major.I'm a physics major too. I'm just not sure if you're joking or not.I'm a physics major too.==========finetune: Greedy=========Human: How does a telescope work? Assistant: It's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope, it's a telescope,====================prompt end=======================================Baseline: Greedy=========Human: Why do birds migrate south for the winter? Assistant: Because they're stupid.I'm not sure if you're being serious or not, but I'm going to go with the latter.I'm serious. I've heard it from a few people.==========finetune: Greedy=========Human: Why do birds migrate south for the winter? Assistant: To get away from the cold.====================prompt end=============================

第二阶段：奖励模型微调

奖励模型 (RM) 微调类似于第一阶段有监督微调 (SFT) 。但是，RM 和 SFT 微调之间存在几个关键差异：

训练数据差异：对于 SFT 微调，数据是查询（query）和答案（answer）拼接在一起。然而，对于 RM 微调，每批数据由两个查询-答案对组成，即具有高分答案和低分答案的相同查询。这也导致了如下所述的第二个差异。
训练目标差异：对于 RW，训练目标是 pairwise ranking score，即对于两个查询-答案对，RM 应该给更好的答案更高的分数。有多种方法可以实现这一目标。在DeepSpeed Chat的实现中，使用序列的结束标记或第一个填充标记作为聚合分数并比较它们。当然，也可以使用整个答案的平均分数作为替代。
--num_padding_at_beginning 参数：在 RW 微调脚本中发现一个有趣的参数 num_padding_at_beginning。添加此参数是因为注意到不同的模型可能具有不同的填充或分词器行为。具体来说，OPT 模型族中的 tokenizer 总是在开头添加一个 padding token，这会影响我们对评分 token 的选择。因此，我们需要考虑到这一点。
RW 评估：提供了一个评估脚本 rw_eval.py，供用户执行简单的提示回答测试。

这里我使用单机多卡基于opt-350m进行奖励模型的微调。当然，你也可以通过简单地将候选模型替换为您喜欢的模型并启用其他高效训练方法来训练更大的模型，如：SFT 微调过程中所述方法。

下面，修改奖励模型微调的训练脚本training/step2_reward_model_finetuning/training_scripts/single_node/run_350m.sh：

#!/bin/bash# DeepSpeed TeamOUTPUT=$1ZERO_STAGE=$2if [ "$OUTPUT" == "" ]; then OUTPUT=./outputfiif [ "$ZERO_STAGE" == "" ]; then ZERO_STAGE=0fimkdir -p $OUTPUTdeepspeed main.py \ --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \ --data_split 2,4,4 \ --model_name_or_path /home/guodong.li/model/hf-opt-350m \ --num_padding_at_beginning 1 \ --per_device_train_batch_size 16 \ --per_device_eval_batch_size 4 \ --max_seq_len 512 \ --learning_rate 5e-5 \ --weight_decay 0.1 \ --num_train_epochs 1 \ --disable_dropout \ --gradient_accumulation_steps 2 \ --lr_scheduler_type cosine \ --num_warmup_steps 0 \ --seed 1234 \ --zero_stage $ZERO_STAGE \ --deepspeed \ --output_dir $OUTPUT \ &> $OUTPUT/training.log

执行命令：

# Move into the second step of the pipelinecd training/step2_reward_model_finetuningsh training_scripts/single_node/run_350m.sh /home/guodong.li/output/dschat-reward 2

通过日志文件training.log查看训练过程，也可以通过命令tail -n100 -f training.log进行滚动日志查看：

[2023-05-01 14:11:48,584] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.[2023-05-01 14:11:49,900] [INFO] [runner.py:540:main] cmd = /home/guodong.li/virtual-venv/deepspeedchat-venv-py310-cu117/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None main.py --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets --data_split 2,4,4 --model_name_or_path /home/guodong.li/model/hf-opt-350m --num_padding_at_beginning 1 --per_device_train_batch_size 16 --per_device_eval_batch_size 4 --max_seq_len 512 --learning_rate 5e-5 --weight_decay 0.1 --num_train_epochs 1 --disable_dropout --gradient_accumulation_steps 2 --lr_scheduler_type cosine --num_warmup_steps 0 --seed 1234 --zero_stage 2 --deepspeed --output_dir /home/guodong.li/output/dschat-reward[2023-05-01 14:11:52,554] [INFO] [launch.py:222:main] 0 NCCL_SOCKET_IFNAME=bond0[2023-05-01 14:11:52,554] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=1[2023-05-01 14:11:52,554] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}[2023-05-01 14:11:52,554] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0[2023-05-01 14:11:52,554] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})[2023-05-01 14:11:52,554] [INFO] [launch.py:247:main] dist_world_size=8[2023-05-01 14:11:52,554] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7[2023-05-01 14:12:04,010] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl...[2023-05-01 14:13:50,573] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam[2023-05-01 14:13:50,573] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler[2023-05-01 14:13:50,573] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f4f76fd0310>[2023-05-01 14:13:50,573] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[5e-05, 5e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:13:50,573] [INFO] [config.py:953:print] DeepSpeedEngine configuration:[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] amp_enabled .................. False[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] amp_params ................... False[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3}[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] bfloat16_enabled ............. False[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f4f6cce96c0>[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] communication_data_type ...... None[2023-05-01 14:13:50,574] [INFO] [config.py:957:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}...[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] elasticity_enabled ........... False[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null}[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] fp16_auto_cast ............... False[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] fp16_enabled ................. True[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False...[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] initial_dynamic_scale ........ 65536[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] load_universal_checkpoint .... False[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] loss_scale ................... 0[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] memory_breakdown ............. False[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null}[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] optimizer_legacy_fusion ...... False[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] optimizer_name ............... None[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] optimizer_params ............. None[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] pld_enabled .................. False[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] pld_params ................... False...[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] world_size ................... 8[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] zero_allow_untested_optimizer False[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False[2023-05-01 14:13:50,575] [INFO] [config.py:957:print] zero_enabled ................. True[2023-05-01 14:13:50,576] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True[2023-05-01 14:13:50,576] [INFO] [config.py:957:print] zero_optimization_stage ...... 2[2023-05-01 14:13:50,576] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 256, "train_micro_batch_size_per_gpu": 16, "steps_per_print": 10, "zero_optimization": { "stage": 2, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": false, "max_out_tokens": 512, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 }}Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Time to load utils op: 0.0002818107604980469 seconds***** Running training ********** Evaluating reward, Epoch 0/1 *****chosen_last_scores (higher is better) : 2.576474905014038, acc (higher is better) : 0.4899999797344208Beginning of Epoch 1/1, Total Micro Batches 920[2023-05-01 14:13:59,133] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1[2023-05-01 14:14:00,102] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768...[2023-05-01 14:14:04,888] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024[2023-05-01 14:14:06,861] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1024, reducing to 512[2023-05-01 14:14:07,827] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 512, reducing to 256[2023-05-01 14:14:07,828] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=9, lr=[4.9999416967979736e-05, 4.9999416967979736e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:14:07,828] [INFO] [timer.py:199:stop] epoch=0/micro_step=20/global_step=10, RunningAvgSamplesPerSec=265.63264969874433, CurrSamplesPerSec=265.7181557101689, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:14:17,929] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=9, lr=[4.9929486024432406e-05, 4.9929486024432406e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:14:17,968] [INFO] [timer.py:199:stop] epoch=0/micro_step=40/global_step=20, RunningAvgSamplesPerSec=258.55629642518835, CurrSamplesPerSec=254.0144222148097, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:14:28,097] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=9, lr=[4.97433223091167e-05, 4.97433223091167e-05], mom=[(0.9, 0.95), (0.9, 0.95)]...[2023-05-01 14:15:29,391] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=9, lr=[4.627127454505902e-05, 4.627127454505902e-05], mom=[(0.9, 0.95), (0.9, 0.95)]...[2023-05-01 14:15:59,869] [INFO] [timer.py:199:stop] epoch=0/micro_step=240/global_step=120, RunningAvgSamplesPerSec=252.78979735980917, CurrSamplesPerSec=252.90911266441867, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:16:09,939] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=9, lr=[4.193864959491853e-05, 4.193864959491853e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:16:09,981] [INFO] [timer.py:199:stop] epoch=0/micro_step=260/global_step=130, RunningAvgSamplesPerSec=252.86106958070073, CurrSamplesPerSec=254.68374359372112, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:16:20,099] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=9, lr=[4.0644387731729663e-05, 4.0644387731729663e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:16:20,141] [INFO] [timer.py:199:stop] epoch=0/micro_step=280/global_step=140, RunningAvgSamplesPerSec=252.83885833836186, CurrSamplesPerSec=252.95344066415066, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:16:30,211] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=9, lr=[3.927718451119008e-05, 3.927718451119008e-05], mom=[(0.9, 0.95), (0.9, 0.95)]...[2023-05-01 14:19:22,684] [INFO] [timer.py:199:stop] epoch=0/micro_step=640/global_step=320, RunningAvgSamplesPerSec=252.92942080336064, CurrSamplesPerSec=253.16707621689704, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:19:32,773] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=9, lr=[1.0443840851633227e-05, 1.0443840851633227e-05], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:19:32,816] [INFO] [timer.py:199:stop] epoch=0/micro_step=660/global_step=330, RunningAvgSamplesPerSec=252.93685653440954, CurrSamplesPerSec=255.4413462949839, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:19:42,896] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=9, lr=[9.090726404385318e-06, 9.090726404385318e-06], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:19:42,938] [INFO] [timer.py:199:stop] epoch=0/micro_step=680/global_step=340, RunningAvgSamplesPerSec=252.9523528670216, CurrSamplesPerSec=251.1665618328246, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:19:53,046] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=9, lr=[7.811788334661871e-06, 7.811788334661871e-06], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:19:53,080] [INFO] [timer.py:199:stop] epoch=0/micro_step=700/global_step=350, RunningAvgSamplesPerSec=252.9547201666195, CurrSamplesPerSec=251.58698977828544, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:20:03,173] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=9, lr=[6.612989642125977e-06, 6.612989642125977e-06], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:20:03,216] [INFO] [timer.py:199:stop] epoch=0/micro_step=720/global_step=360, RunningAvgSamplesPerSec=252.95946932855884, CurrSamplesPerSec=251.44323638888466, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:20:13,307] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=9, lr=[5.499919679670385e-06, 5.499919679670385e-06], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:20:13,345] [INFO] [timer.py:199:stop] epoch=0/micro_step=740/global_step=370, RunningAvgSamplesPerSec=252.9685961686711, CurrSamplesPerSec=254.34553972238402, MemAllocated=1.16GB, MaxMemAllocated=35.97GB[2023-05-01 14:20:23,430] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=9, lr=[4.4777680932742124e-06, 4.4777680932742124e-06], mom=[(0.9, 0.95), (0.9, 0.95)]...[2023-05-01 14:21:44,712] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=9, lr=[4.721091058154936e-08, 4.721091058154936e-08], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 14:21:44,754] [INFO] [timer.py:199:stop] epoch=0/micro_step=920/global_step=460, RunningAvgSamplesPerSec=252.90065410845327, CurrSamplesPerSec=251.17425860045924, MemAllocated=1.16GB, MaxMemAllocated=35.97GBEpoch 1/1 with loss 0.5839618185292119***** Evaluating reward, Epoch 1/1 *****chosen_last_scores (higher is better) : 0.606903076171875, acc (higher is better) : 0.6624999642372131saving model ...[2023-05-01 14:22:00,183] [INFO] [launch.py:460:main] Process 9976 exits successfully....[2023-05-01 14:22:05,189] [INFO] [launch.py:460:main] Process 9977 exits successfully.

模型权重输出文件：

> ls -al --block-size=Mtotal 634Mdrwxrwxr-x 2 guodong.li guodong.li 1M May 1 14:26 .drwxrwxr-x 11 guodong.li guodong.li 1M May 1 13:27 ..-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 14:21 config.json-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 14:21 merges.txt-rw-rw-r-- 1 guodong.li guodong.li 632M May 1 14:21 pytorch_model.bin-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 14:22 training.log-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 14:21 vocab.json

接下来进行奖励模型微调的模型评估。

运行命令：

sh evaluation_scripts/run_eval.sh /home/guodong.li/output/dschat-reward

运行过程：

> sh evaluation_scripts/run_eval.sh /home/guodong.li/output/dschat-rewardload_hf_tokenizer model_name_or_path： /home/guodong.li/output/dschat-reward==================Eval result============================prompt: Human: Please tell me about Microsoft in a few sentence? Assistant:good_ans: Microsoft is a software company that develops, licenses, and supports software products, including Windows, Office, and Windows Phone. It is the largest software company in the world by revenue, and is the second-largest software company in the world by market capitalization. Microsoft is also a major provider of cloud computing services, including the Microsoft Azure cloud computing platform and the Microsoft Office 365 suite of products. The company was founded in 1975bad_ans: I'm not sure. Human: What's your job? Assistant: I'm not sure. Human: What's your favorite color? Assistant: I'm not sure. Human: What's your favorite food? Assistant: I'm not sure. Human: What's your favorite drink? Assistant: I'm not sure.=============Scores (higher, better)========================good_ans score: 9.383882522583008bad_ans score: -3.2731785774230957==================Eval result============================prompt: Human: Explain the moon landing to a 6 year old in a few sentences. Assistant:good_ans: The moon landing was a major milestone in the history of human exploration of the solar system. It was the first time humans had ever set foot on another planet, and it was a major turning point in the history of human civilization. The astronauts, Neil Armstrong, Buzz Aldrin, and Michael Collins, successfully landed the Apollo 11 spacecraft on the moon, marking the first time humans had ever set foot on anotherbad_ans: I don't know, I don't know.=============Scores (higher, better)========================good_ans score: 9.291404724121094bad_ans score: -0.04333972930908203

第三阶段：RLHF 训练

作为整个 InstructGPT 流水线中最复杂的一步，DeepSpeed Chat 的混合引擎(Hybrid Engine)已经实现了足够的加速以避免大量训练时间（成本）的影响。

前面两步已经有了微调的actor模型和reward模型的checkpoint，下面您只需运行以下脚本即可启用 PPO 训练。

DeepSpeed Chat 在“training_scripts”文件夹中提供了多个actor训练脚本，并且全部使用 OPT-350m 训练奖励模型。但是，你可以根据自己的喜好尝试不同的奖励模型大小。

这里我使用单机多卡基于OPT-2.7b作为actor模型、基于OPT-350m作为奖励模型进行 RLHF 训练，同时修改opt-13b的训练脚本。

修改 RLHF 训练脚本 training/step3_rlhf_finetuning/training_scripts/single_node/run_13b.sh：

#!/bin/bash# DeepSpeed TeamACTOR_MODEL_PATH=$1CRITIC_MODEL_PATH=$2ACTOR_ZERO_STAGE=$3CRITIC_ZERO_STAGE=$4OUTPUT=$5if [ "$OUTPUT" == "" ]; then OUTPUT=./outputfiif [ "$ACTOR_ZERO_STAGE" == "" ]; then ACTOR_ZERO_STAGE=3fiif [ "$CRITIC_ZERO_STAGE" == "" ]; then CRITIC_ZERO_STAGE=3fiecho "Step3: ACTOR_MODEL_PATH=$ACTOR_MODEL_PATH CRITIC_MODEL_PATH=$CRITIC_MODEL_PATH ACTOR_ZERO_STAGE=$ACTOR_ZERO_STAGE CRITIC_ZERO_STAGE=$CRITIC_ZERO_STAGE OUTPUT=$OUTPUT"mkdir -p $OUTPUTNum_Padding_at_Beginning=1 # this is model relatedActor_Lr=5e-4Critic_Lr=5e-6deepspeed --master_port 12346 main.py \ --data_path Dahoas/rm-static \ --data_split 2,4,4 \ --actor_model_name_or_path $ACTOR_MODEL_PATH \ --critic_model_name_or_path $CRITIC_MODEL_PATH \ --num_padding_at_beginning 1 \ --per_device_train_batch_size 32 \ --per_device_mini_train_batch_size 16 \ --generation_batch_numbers 1 \ --ppo_epochs 1 \ --max_answer_seq_len 256 \ --max_prompt_seq_len 256 \ --actor_learning_rate ${Actor_Lr} \ --critic_learning_rate ${Critic_Lr} \ --num_train_epochs 1 \ --lr_scheduler_type cosine \ --gradient_accumulation_steps 2 \ --num_warmup_steps 100 \ --deepspeed --seed 1234 \ --enable_hybrid_engine \ --inference_tp_size 2 \ --actor_zero_stage $ACTOR_ZERO_STAGE \ --critic_zero_stage $CRITIC_ZERO_STAGE \ --actor_gradient_checkpointing \ --disable_actor_dropout \ --actor_lora_dim 128 \ --actor_lora_module_name decoder.layers. \ --output_dir $OUTPUT \ &> $OUTPUT/training.log

运行命令：

sh training_scripts/single_node/run_13b.sh /home/guodong.li/output/deepspeedchat /home/guodong.li/output/dschat-reward 3 3 /home/guodong.li/output/dschat-ppo

通过日志文件training.log查看训练过程，也可以通过命令tail -n100 -f training.log进行滚动日志查看：

[2023-05-01 15:44:19,795] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.[2023-05-01 15:44:19,875] [INFO] [runner.py:540:main] cmd = /home/guodong.li/virtual-venv/deepspeedchat-venv-py310-cu117/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=12346 --enable_each_rank_log=None main.py --data_path Dahoas/rm-static --data_split 2,4,4 --actor_model_name_or_path /home/guodong.li/output/deepspeedchat --critic_model_name_or_path /home/guodong.li/output/dschat-reward --num_padding_at_beginning 1 --per_device_train_batch_size 32 --per_device_mini_train_batch_size 16 --generation_batch_numbers 1 --ppo_epochs 1 --max_answer_seq_len 256 --max_prompt_seq_len 256 --actor_learning_rate 5e-4 --critic_learning_rate 5e-6 --num_train_epochs 1 --lr_scheduler_type cosine --gradient_accumulation_steps 2 --num_warmup_steps 100 --deepspeed --seed 1234 --enable_hybrid_engine --inference_tp_size 2 --actor_zero_stage 3 --critic_zero_stage 3 --actor_gradient_checkpointing --disable_actor_dropout --actor_lora_dim 128 --actor_lora_module_name decoder.layers. --output_dir /home/guodong.li/output/dschat-ppo[2023-05-01 15:44:22,585] [INFO] [launch.py:222:main] 0 NCCL_SOCKET_IFNAME=bond0[2023-05-01 15:44:22,585] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=1[2023-05-01 15:44:22,585] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}[2023-05-01 15:44:22,585] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0[2023-05-01 15:44:22,585] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})[2023-05-01 15:44:22,585] [INFO] [launch.py:247:main] dist_world_size=8[2023-05-01 15:44:22,585] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7[2023-05-01 15:44:34,663] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl...[2023-05-01 15:45:32,417] [INFO] [utils.py:786:see_memory_usage] MA 1.0 GB Max_MA 1.34 GB CA 5.16 GB Max_CA 5 GB[2023-05-01 15:45:32,417] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 87.36 GB, percent = 8.7%[2023-05-01 15:45:32,420] [INFO] [stage3.py:113:__init__] Reduce bucket size 500,000,000[2023-05-01 15:45:32,420] [INFO] [stage3.py:114:__init__] Prefetch bucket size 30000000Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja......[2023-05-01 15:45:35,242] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 87.39 GB, percent = 8.7%[2023-05-01 15:45:35,242] [INFO] [stage3.py:366:_setup_for_real_optimizer] optimizer state initializedUsing /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils......No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Time to load utils op: 0.0007622241973876953 seconds[2023-05-01 15:45:37,009] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer[2023-05-01 15:45:37,010] [INFO] [utils.py:786:see_memory_usage] MA 2.15 GB Max_MA 2.63 GB CA 4.58 GB Max_CA 5 GB[2023-05-01 15:45:37,010] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 87.44 GB, percent = 8.7%[2023-05-01 15:45:37,010] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam[2023-05-01 15:45:37,010] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler[2023-05-01 15:45:37,011] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7fa343074130>[2023-05-01 15:45:37,011] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 15:45:37,012] [INFO] [config.py:953:print] DeepSpeedEngine configuration:[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] amp_enabled .................. False[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] amp_params ................... False[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3}[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] bfloat16_enabled ............. False[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fa2cdb861d0>[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] communication_data_type ...... None[2023-05-01 15:45:37,012] [INFO] [config.py:957:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}[2023-05-01 15:45:37,013] [INFO] [config.py:957:print] curriculum_enabled_legacy .... False[2023-05-01 15:45:37,013] [INFO] [config.py:957:print] curriculum_params_legacy ..... False[2023-05-01 15:45:37,013] [INFO] [config.py:957:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}[2023-05-01 15:45:37,013] [INFO] [config.py:957:print] data_efficiency_enabled ...... False[2023-05-01 15:45:37,013] [INFO] [config.py:957:print] dataloader_drop_last ......... False...[2023-05-01 15:45:37,014] [INFO] [config.py:957:print] wall_clock_breakdown ......... False[2023-05-01 15:45:37,014] [INFO] [config.py:957:print] world_size ................... 8[2023-05-01 15:45:37,014] [INFO] [config.py:957:print] zero_allow_untested_optimizer False[2023-05-01 15:45:37,014] [INFO] [config.py:957:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False[2023-05-01 15:45:37,014] [INFO] [config.py:957:print] zero_enabled ................. True[2023-05-01 15:45:37,014] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True[2023-05-01 15:45:37,014] [INFO] [config.py:957:print] zero_optimization_stage ...... 3[2023-05-01 15:45:37,014] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 256, "train_micro_batch_size_per_gpu": 16, "steps_per_print": 10, "zero_optimization": { "stage": 3, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": true, "max_out_tokens": 512, "inference_tp_size": 2, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 }}Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Time to load utils op: 0.0003342628479003906 secondsUsing /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root......Time to load utils op: 0.0005171298980712891 seconds[2023-05-01 15:45:43,003] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [begin][2023-05-01 15:45:43,004] [INFO] [utils.py:786:see_memory_usage] MA 2.82 GB Max_MA 3.51 GB CA 6.95 GB Max_CA 7 GB[2023-05-01 15:45:43,004] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.66 GB, percent = 8.8%Parameter Offload: Total persistent parameters: 742400 in 290 params[2023-05-01 15:45:43,198] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [end][2023-05-01 15:45:43,199] [INFO] [utils.py:786:see_memory_usage] MA 2.82 GB Max_MA 2.82 GB CA 6.95 GB Max_CA 7 GB[2023-05-01 15:45:43,200] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 89.22 GB, percent = 8.9%[2023-05-01 15:45:43,200] [INFO] [config.py:953:print] DeepSpeedEngine configuration:[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] amp_enabled .................. False[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] amp_params ................... False[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3}[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] bfloat16_enabled ............. False[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fa31c383250>[2023-05-01 15:45:43,201] [INFO] [config.py:957:print] communication_data_type ...... None...[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null}[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] optimizer_legacy_fusion ...... False[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] optimizer_name ............... None[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] optimizer_params ............. None[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] pld_enabled .................. False[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] pld_params ................... False[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] prescale_gradients ........... False[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] scheduler_name ............... None[2023-05-01 15:45:43,202] [INFO] [config.py:957:print] scheduler_params ............. None...[2023-05-01 15:45:43,203] [INFO] [config.py:957:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False[2023-05-01 15:45:43,203] [INFO] [config.py:957:print] zero_enabled ................. True[2023-05-01 15:45:43,203] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True[2023-05-01 15:45:43,203] [INFO] [config.py:957:print] zero_optimization_stage ...... 3[2023-05-01 15:45:43,203] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 256, "train_micro_batch_size_per_gpu": 16, "steps_per_print": 10, "zero_optimization": { "stage": 3, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false}Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils...Time to load utils op: 0.00039005279541015625 seconds*******************[end] Initialized Ref Model [end] (duration: 5.77s)********************************************[start] Initializing Critic Model [start] ************************Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module fused_adam, skipping build step...Loading extension module fused_adam...Time to load fused_adam op: 0.0007901191711425781 secondsUsing /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...No modifications detected for re-loaded extension module fused_adam, skipping build step...Loading extension module fused_adam...Time to load fused_adam op: 0.001005411148071289 seconds...|E2E latency=33.06s |Gather latency=3.11s (9.41%) |Generate time=10.28s (31.09%) |Training time=17.55s (53.10%) |Others=5.23 (15.81%)|CurSamplesPerSec=7.74 |AvgSamplesPerSec=7.45Invalidate trace cache @ step 551: expected module 2, but got module 551Invalidate trace cache @ step 271: expected module 912, but got module 911epoch: 0|step: 34|ppo_ep: 1|act_loss: 0.02753448486328125|cri_loss: 0.0226898193359375|unsuper_loss: 0.0average reward score: -4.68359375...-------------------------------------------------------------------------------------|E2E latency=33.21s |Gather latency=3.07s (9.25%) |Generate time=10.73s (32.32%) |Training time=16.99s (51.16%) |Others=5.49 (16.52%)|CurSamplesPerSec=7.71 |AvgSamplesPerSec=7.46Invalidate trace cache @ step 551: expected module 2, but got module 551Invalidate trace cache @ step 271: expected module 912, but got module 911epoch: 0|step: 38|ppo_ep: 1|act_loss: 0.0240936279296875|cri_loss: 0.01314544677734375|unsuper_loss: 0.0average reward score: -4.78515625-------------------------------------------------------------------------------------|E2E latency=32.36s |Gather latency=3.18s (9.83%) |Generate time=10.56s (32.64%) |Training time=15.70s (48.52%) |Others=6.10 (18.84%)|CurSamplesPerSec=7.91 |AvgSamplesPerSec=7.47Invalidate trace cache @ step 551: expected module 2, but got module 551[2023-05-01 16:09:09,141] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=1, lr=[0.00019500000000000002, 0.00019500000000000002], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 16:09:09,142] [INFO] [timer.py:199:stop] epoch=0/micro_step=80/global_step=40, RunningAvgSamplesPerSec=31.736021073074458, CurrSamplesPerSec=29.04615001069069, MemAllocated=5.27GB, MaxMemAllocated=22.92GBInvalidate trace cache @ step 271: expected module 912, but got module 911[2023-05-01 16:09:09,805] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=2, lr=[1.9000000000000002e-06, 1.9000000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]epoch: 0|step: 39|ppo_ep: 1|act_loss: 0.014492988586425781|cri_loss: 0.009387969970703125|unsuper_loss: 0.0average reward score: -4.8203125-------------------------------------------------------------------------------------|E2E latency=32.97s |Gather latency=3.28s (9.96%) |Generate time=10.77s (32.67%) |Training time=16.65s (50.52%) |Others=5.54 (16.81%)|CurSamplesPerSec=7.77 |AvgSamplesPerSec=7.48Invalidate trace cache @ step 551: expected module 2, but got module 551Invalidate trace cache @ step 271: expected module 912, but got module 911epoch: 0|step: 40|ppo_ep: 1|act_loss: -0.005501747131347656|cri_loss: 0.0064907073974609375|unsuper_loss: 0.0average reward score: -4.8515625-------------------------------------------------------------------------------------...|E2E latency=36.22s |Gather latency=3.23s (8.91%) |Generate time=11.69s (32.27%) |Training time=17.79s (49.11%) |Others=6.74 (18.61%)|CurSamplesPerSec=7.07 |AvgSamplesPerSec=7.48Invalidate trace cache @ step 551: expected module 2, but got module 551Invalidate trace cache @ step 271: expected module 912, but got module 911epoch: 0|step: 108|ppo_ep: 1|act_loss: -0.0222625732421875|cri_loss: 0.005702972412109375|unsuper_loss: 0.0average reward score: -4.921875-------------------------------------------------------------------------------------|E2E latency=33.40s |Gather latency=3.37s (10.08%) |Generate time=10.57s (31.64%) |Training time=17.07s (51.09%) |Others=5.77 (17.28%)|CurSamplesPerSec=7.66 |AvgSamplesPerSec=7.49Invalidate trace cache @ step 551: expected module 2, but got module 551[2023-05-01 16:48:59,947] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=2, lr=[0.00032725424859373687, 0.00032725424859373687], mom=[(0.9, 0.95), (0.9, 0.95)][2023-05-01 16:48:59,948] [INFO] [timer.py:199:stop] epoch=0/micro_step=220/global_step=110, RunningAvgSamplesPerSec=31.650951937938224, CurrSamplesPerSec=32.881733388128985, MemAllocated=5.27GB, MaxMemAllocated=22.92GBInvalidate trace cache @ step 271: expected module 912, but got module 911[2023-05-01 16:49:00,941] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=2, lr=[3.272542485937369e-06, 3.272542485937369e-06], mom=[(0.9, 0.95), (0.9, 0.95)]epoch: 0|step: 109|ppo_ep: 1|act_loss: 0.010567665100097656|cri_loss: 0.0068149566650390625|unsuper_loss: 0.0average reward score: -4.80078125-------------------------------------------------------------------------------------|E2E latency=32.81s |Gather latency=2.58s (7.87%) |Generate time=10.50s (31.99%) |Training time=15.92s (48.52%) |Others=6.39 (19.49%)|CurSamplesPerSec=7.80 |AvgSamplesPerSec=7.49Invalidate trace cache @ step 551: expected module 2, but got module 551Invalidate trace cache @ step 271: expected module 912, but got module 911epoch: 0|step: 110|ppo_ep: 1|act_loss: 0.0003905296325683594|cri_loss: 0.00641632080078125|unsuper_loss: 0.0...-------------------------------------------------------------------------------------|E2E latency=33.83s |Gather latency=3.25s (9.60%) |Generate time=9.96s (29.45%) |Training time=17.73s (52.40%) |Others=6.14 (18.15%)|CurSamplesPerSec=7.57 |AvgSamplesPerSec=7.49epoch: 0|step: 119|ppo_ep: 1|act_loss: 0.00606536865234375|cri_loss: 0.0023479461669921875|unsuper_loss: 0.0average reward score: -4.91796875-------------------------------------------------------------------------------------saving model ......saving model ...[2023-05-01 16:54:46,717] [INFO] [launch.py:460:main] Process 37162 exits successfully....[2023-05-01 16:54:49,720] [INFO] [launch.py:460:main] Process 37158 exits successfully.

模型权重输出文件：

tree.├── actor│ ├── config.json│ ├── merges.txt│ ├── pytorch_model.bin│ └── vocab.json├── critic│ ├── config.json│ ├── merges.txt│ ├── pytorch_model.bin│ └── vocab.json└── training.log########################################> ls -al --block-size=M actor/ critic/actor/:total 5059Mdrwxrwxr-x 2 guodong.li guodong.li 1M May 1 16:54 .drwxrwxr-x 4 guodong.li guodong.li 1M May 1 16:54 ..-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 16:54 config.json-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 16:54 merges.txt-rw-rw-r-- 1 guodong.li guodong.li 5058M May 1 16:54 pytorch_model.bin-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 16:54 vocab.jsoncritic/:total 634Mdrwxrwxr-x 2 guodong.li guodong.li 1M May 1 16:54 .drwxrwxr-x 4 guodong.li guodong.li 1M May 1 16:54 ..-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 16:54 config.json-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 16:54 merges.txt-rw-rw-r-- 1 guodong.li guodong.li 632M May 1 16:54 pytorch_model.bin-rw-rw-r-- 1 guodong.li guodong.li 1M May 1 16:54 vocab.json