ArrowInvalid: Column 4 named images expected length 360 but got length 352 #5

Open · DhruvaBansal00 opened this issue on Jan 16, 2025 · 3 comments
Assignee: HaoshengZou
Labels: enhancement (New feature or request)

DhruvaBansal00 commented on Jan 16, 2025

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1
  • Platform: Linux-5.15.0-1048-oracle-x86_64-with-glibc2.31
  • Python version: 3.11.9
  • PyTorch version: 2.3.1+cu121 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100-SXM4-80GB
  • DeepSpeed version: 0.14.4
  • Bitsandbytes version: 0.45.0

Reproduction

### model
model_name_or_path: Qwen/Qwen2-VL-72B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
freeze_vision_tower: true  # choices: [true, false]
train_mm_proj_only: false  # choices: [true, false]
deepspeed: examples/deepspeed/ds_z3_offload_config.json  # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
use_adam_mini: true

### dataset
dataset: mllm_demo,identity,alpaca_en_demo,slimorca
template: qwen2_vl
cutoff_len: 128000
max_samples: 25000
overwrite_cache: true
preprocessing_num_workers: 16
sequence_parallel_size: 4

### output
output_dir: saves/qwen2_vl-72b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 30.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
packing: true
enable_liger_kernel: false
flash_attn: fa2
use_unsloth_gc: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

### logging
report_to: wandb
run_name: qwen2vl-72b-full-sft-1

(The above is the config file used for training.)

Stack trace:

[rank7]: ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
[rank7]: │ /360-LLaMA-Factory/src/train.py:28 in <module>                               │
[rank7]: │                                                                              │
[rank7]: │   25                                                                         │
[rank7]: │   26                                                                         │
[rank7]: │   27 if __name__ == "__main__":                                              │
[rank7]: │ ❱ 28 │   main()                                                              │
[rank7]: │   29                                                                         │
[rank7]: │                                                                              │
[rank7]: │ /360-LLaMA-Factory/src/train.py:19 in main                                   │
[rank7]: │                                                                              │
[rank7]: │   16                                                                         │
[rank7]: │   17                                                                         │
[rank7]: │   18 def main():                                                             │
[rank7]: │ ❱ 19 │   run_exp()                                                           │
[rank7]: │   20                                                                         │
[rank7]: │   21                                                                         │
[rank7]: │   22 def _mp_fn(index):                                                      │
[rank7]: │                                                                              │
[rank7]: │ /360-LLaMA-Factory/src/llamafactory/train/tuner.py:50 in run_exp             │
[rank7]: │                                                                              │
[rank7]: │    47 │   if finetuning_args.stage == "pt":                                  │
[rank7]: │    48 │   │   run_pt(model_args, data_args, training_args, finetuning_args,  │
[rank7]: │    49 │   elif finetuning_args.stage == "sft":                               │
[rank7]: │ ❱  50 │   │   run_sft(model_args, data_args, training_args, finetuning_args, │
[rank7]: │    51 │   elif finetuning_args.stage == "rm":                                │
[rank7]: │    52 │   │   run_rm(model_args, data_args, training_args, finetuning_args,  │
[rank7]: │    53 │   elif finetuning_args.stage == "ppo":                               │
[rank7]: │                                                                              │
[rank7]: │ /360-LLaMA-Factory/src/llamafactory/train/sft/workflow.py:47 in run_sft      │
[rank7]: │                                                                              │
[rank7]: │    44 │   tokenizer_module = load_tokenizer(model_args)                      │
[rank7]: │    45 │   tokenizer = tokenizer_module["tokenizer"]                          │
[rank7]: │    46 │   template = get_template_and_fix_tokenizer(tokenizer, data_args)    │
[rank7]: │ ❱  47 │   dataset_module = get_dataset(template, model_args, data_args, trai │
[rank7]: │    48 │   model = load_model(tokenizer, model_args, finetuning_args, trainin │
[rank7]: │    49 │                                                                      │
[rank7]: │    50 │   if getattr(model, "is_quantized", False) and not training_args.do_ │
[rank7]: │                                                                              │
[rank7]: │ /360-LLaMA-Factory/src/llamafactory/data/loader.py:279 in                    │
[rank7]: │ sequence_parallel_processor                                                  │
[rank7]: │                                                                              │
[rank7]: │   276 │   │   │   │   if data_args.shuffle_for_sequence_parallel:            │
[rank7]: │   277 │   │   │   │   │   dataset = dataset.shuffle(seed=training_args.seed) │
[rank7]: │   278 │   │   │   │   padded_dataset = dataset.map(pad_sequence, batched=Tru │
[rank7]: │ ❱ 279 │   │   │   │   sp_dataset = padded_dataset.map(sp_split, batched=True │
[rank7]: │   280 │   │   │   │   dataset_module[k] = sp_dataset                         │
[rank7]: │   281 │   │                                                                  │
[rank7]: │   282 │   │   else:                                                          │
[rank7]: │                                                                              │
[rank7]: │ /usr/lib/python3/dist-packages/datasets/arrow_dataset.py:560 in wrapper      │
[rank7]: │                                                                              │
[rank7]: │    557 │   │   │   "output_all_columns": self._output_all_columns,           │
[rank7]: │    558 │   │   }                                                             │
[rank7]: │    559 │   │   # apply actual function                                       │
[rank7]: │ ❱  560 │   │   out: Union["Dataset", "DatasetDict"] = func(self, *args, **kw │
[rank7]: │    561 │   │   datasets: List["Dataset"] = list(out.values()) if isinstance( │
[rank7]: │    562 │   │   # re-apply format to the output                               │
[rank7]: │    563 │   │   for dataset in datasets:                                      │
[rank7]: │                                                                              │
[rank7]: │ /usr/lib/python3/dist-packages/datasets/arrow_dataset.py:3055 in map         │
[rank7]: │                                                                              │
[rank7]: │   3052 │   │   │   │   │   total=pbar_total,                                 │
[rank7]: │   3053 │   │   │   │   │   desc=desc or "Map",                               │
[rank7]: │   3054 │   │   │   │   ) as pbar:                                            │
[rank7]: │ ❱ 3055 │   │   │   │   │   for rank, done, content in Dataset._map_single(** │
[rank7]: │   3056 │   │   │   │   │   │   if done:                                      │
[rank7]: │   3057 │   │   │   │   │   │   │   shards_done += 1                          │
[rank7]: │   3058 │   │   │   │   │   │   │   logger.debug(f"Finished processing shard  │
[rank7]: │                                                                              │
[rank7]: │ /usr/lib/python3/dist-packages/datasets/arrow_dataset.py:3481 in _map_single │
[rank7]: │                                                                              │
[rank7]: │   3478 │   │   │   │   │   │   │   ):                                        │
[rank7]: │   3479 │   │   │   │   │   │   │   │   writer.write_table(batch.to_arrow())  │
[rank7]: │   3480 │   │   │   │   │   │   │   else:                                     │
[rank7]: │ ❱ 3481 │   │   │   │   │   │   │   │   writer.write_batch(batch)             │
[rank7]: │   3482 │   │   │   │   │   │   num_examples_progress_update += num_examples_ │
[rank7]: │   3483 │   │   │   │   │   │   if time.time() > _time + config.PBAR_REFRESH_ │
[rank7]: │   3484 │   │   │   │   │   │   │   _time = time.time()                       │
[rank7]: │                                                                              │
[rank7]: │ /usr/lib/python3/dist-packages/datasets/arrow_writer.py:608 in write_batch   │
[rank7]: │                                                                              │
[rank7]: │   605 │   │   │   │   arrays.append(pa.array(typed_sequence))                │
[rank7]: │   606 │   │   │   │   inferred_features[col] = typed_sequence.get_inferred_t │
[rank7]: │   607 │   │   schema = inferred_features.arrow_schema if self.pa_writer is N │
[rank7]: │ ❱ 608 │   │   pa_table = pa.Table.from_arrays(arrays, schema=schema)         │
[rank7]: │   609 │   │   self.write_table(pa_table, writer_batch_size)                  │
[rank7]: │   610 │                                                                      │
[rank7]: │   611 │   def write_table(self, pa_table: pa.Table, writer_batch_size: Optio │
[rank7]: │                                                                              │
[rank7]: │ in pyarrow.lib.Table.from_arrays:4868                                        │
[rank7]: │                                                                              │
[rank7]: │ in pyarrow.lib.Table.validate:4214                                           │
[rank7]: │                                                                              │
[rank7]: │ in pyarrow.lib.check_status:92                                               │
[rank7]: ╰──────────────────────────────────────────────────────────────────────────────╯
[rank7]: ArrowInvalid: Column 4 named images expected length 360 but got length 352

Expected behavior

Training should proceed normally for Qwen2-VL-72B.

Others

mllm_demo is a dataset with images. Has this repo been tested with multimodal datasets yet?
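
For context on the error itself: pyarrow raises this whenever a batched `datasets.map` function ends up producing columns of unequal length, which matches the traceback above (the `images` column has 352 rows where the other columns have 360). Below is a minimal, repo-independent sketch of the same failure mode; the column names and shapes are illustrative only and are not the actual `sp_split` output.

```python
from datasets import Dataset

# Toy batch: a text-like column and an image-like column.
ds = Dataset.from_dict({
    "input_ids": [[1, 2, 3, 4]] * 8,
    "images": [["img.png"]] * 8,
})

def bad_split(batch):
    # Splits every sequence into two chunks (doubling the row count)
    # but returns the images column unchanged, so the output columns
    # disagree on length: 16 input_ids rows vs. 8 images rows.
    return {
        "input_ids": [ids[:2] for ids in batch["input_ids"]]
                     + [ids[2:] for ids in batch["input_ids"]],
        "images": batch["images"],
    }

# Raises pyarrow.lib.ArrowInvalid: the "images" column has the wrong length.
ds.map(bad_split, batched=True)
```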

DhruvaBansal00 (Author) commented:

Ran the same config through without the mllm_demo dataset and training succeeded.

I am hoping to train on multimodal datasets with sequence parallelism, so I would love advice on how we could enable training on image datasets too; a rough sketch of the kind of handling I have in mind is below.
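
To make the question concrete: my understanding is that any batched split function has to return every column, including `images`, with the same number of rows. Here is a rough, hypothetical sketch of the idea; the function name, column names, and the choice to replicate images per chunk are all assumptions on my part, not the repo's actual `sp_split` API.

```python
def sp_split_with_images(batch, sp_size=4):
    # Hypothetical sketch: split each padded sequence into sp_size chunks
    # and emit the example's image list once per chunk, so every output
    # column ends up with the same number of rows.
    out = {"input_ids": [], "attention_mask": [], "labels": [], "images": []}
    for i in range(len(batch["input_ids"])):
        seq_len = len(batch["input_ids"][i])
        chunk = seq_len // sp_size
        for r in range(sp_size):
            sl = slice(r * chunk, (r + 1) * chunk)
            out["input_ids"].append(batch["input_ids"][i][sl])
            out["attention_mask"].append(batch["attention_mask"][i][sl])
            out["labels"].append(batch["labels"][i][sl])
            # Whether the images should be replicated to every chunk or
            # routed only to the rank that holds the image tokens is a
            # design decision for the maintainers; the key constraint is
            # that this column stays row-aligned with the text columns.
            out["images"].append(batch["images"][i])
    return out
```

With something like this, the `padded_dataset.map(..., batched=True)` call should at least produce length-consistent Arrow batches; whether replicating the vision inputs is the right SP semantics is a separate question.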

HaoshengZou (Collaborator) commented:

Thanks for your interest!
Multimodal SP is our next internal milestone. The current release has not been tested with multimodal data, but we are already experimenting with it. Stay tuned for tested multimodal SP support!

HaoshengZou self-assigned this on Jan 17, 2025
HaoshengZou added the enhancement (New feature or request) label on Jan 17, 2025
DhruvaBansal00 (Author) commented:

Thanks for the update!

What timeline are you tracking internally for releasing multimodal SP? And is there any way I could help add support for it? This is relatively high on my priority list at the moment!
