Modifying the data raises pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs
Thanks for your excellent work. I hit an error when trying to modify your data_v0.8_visual_toolbox_v2.parquet file: after reading the file with pandas and saving it again, loading the saved copy with datasets.load_dataset raises the error above.
After some investigation, I suspect the 8k bytes of data may be too large, because the same read-and-save operation on another dataset does not trigger this problem. Could you tell me how you originally saved the data, and whether the relevant code is available?
My code is as follows:
`import datasets
import pandas as pd
import pyarrow.parquet as pq
input_file = "data_v0.8_visual_toolbox_v2temtem.parquet"
# Load the input file with datasets
dataframe = datasets.load_dataset("parquet", data_files=input_file)["train"]
# Read the same file with pandas and write it back out unchanged
df = pd.read_parquet(input_file, engine="pyarrow")
output_file_path = input_file.replace(".parquet", "temtem.parquet")
df.to_parquet(output_file_path, engine="pyarrow", index=False)
# Loading the re-saved file is the step that raises ArrowNotImplementedError
dataframe = datasets.load_dataset("parquet", data_files=output_file_path)["train"]`