Add support for COPY data transfers #279

Open
wants to merge 2 commits into
base: master


@17dec 17dec commented Dec 20, 2023

No description provided.


17dec commented Dec 21, 2023

After a good night's sleep I realized a few things:

  • I initially forgot to consume @frame_size on IO.close, so closing after a partial read was bugged.
  • Mandating an IO.close when reading felt like fragile API design, so it now automatically consumes the final frames when the reader has been drained. IO.close is now optional.
  • The protocol allows us to allocate perfectly-sized buffers to read COPY data on a row-by-row basis, but the bare IO interface didn't expose that functionality. I've added a remaining_row_size method to support that use case. (I thought of implementing a more performant specialization of IO.gets on top, but that's only correct for TEXT formats.)
  • With the COPY-out infrastructure in place, adding COPY-in support is actually pretty trivial as well. I don't really have a use case for that yet, but I implemented it anyway.
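For readers unfamiliar with the wire format, here is a minimal sketch of why perfectly-sized row buffers are possible: during COPY out, each row arrives in its own CopyData message ('d') whose Int32 length prefix is known before the payload is read, and CopyDone ('c') ends the stream. The method name each_copy_row is hypothetical and is not the code in this PR; it reads from any IO rather than a real connection and ignores other message types such as errors.

```crystal
# Hypothetical sketch, not the PR's implementation: iterate over COPY-out
# rows in a raw PostgreSQL protocol stream.
def each_copy_row(io : IO, &)
  loop do
    type = io.read_byte || raise IO::EOFError.new
    # The Int32 length counts itself (4 bytes) but not the type byte.
    size = io.read_bytes(Int32, IO::ByteFormat::NetworkEndian) - 4
    case type.chr
    when 'd' # CopyData: the backend sends exactly one row per message
      row = Bytes.new(size) # sized from the frame header, no over-allocation
      io.read_fully(row)
      yield row
    when 'c' # CopyDone: end of the COPY data stream
      return
    else
      raise "unexpected message type #{type.chr.inspect}"
    end
  end
end

# Usage with a hand-built stream: one 6-byte row followed by CopyDone.
demo = IO::Memory.new
demo.write_byte 'd'.ord.to_u8
demo.write_bytes(4 + 6, IO::ByteFormat::NetworkEndian)
demo << "1\tfoo\n"
demo.write_byte 'c'.ord.to_u8
demo.write_bytes(4, IO::ByteFormat::NetworkEndian)
demo.rewind
each_copy_row(demo) { |row| puts row.size }
```

In this picture, remaining_row_size would correspond to the number of unread payload bytes in the current frame, which is an interpretation of the description above rather than something confirmed by the diff.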

That should be all for now, hopefully I got everything this time. :)

@17dec 17dec changed the title Add support for COPY out data transfers Add support for COPY data transfers Dec 21, 2023

jhf commented Jan 23, 2024

I've tried this in a Crystal project for fast import from a CSV file into PostgreSQL.
Copying worked as expected, but I think there should be an example of how to use it. I can make a PR after this is merged.
My usage looks like this:

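require "csv"
require "pg"

DB.connect("postgres://#{postgres_user}:#{postgres_password}@#{postgres_host}:#{postgres_port}/#{postgres_db}") do |db|
  io = db.exec_copy "COPY public.legal_unit_region_activity_category_stats_current(tax_reg_ident,name,employees,physical_region_code,primary_activity_category_code) FROM STDIN"
  # The block form closes the file once the import loop is done.
  File.open(import_filename) do |file|
    csv = CSV.new(file, headers: true, separator: ',', quote_char: '"')
    while csv.next
      sql_text = [csv["tax_reg_ident"],
                  csv["name"],
                  csv["employees"],
                  csv["physical_region_code"],
                  csv["primary_activity_category_code"],
      ].map do |v|
        # COPY text format spells NULL as \N. Mapping to nil would not work,
        # because join renders nil as an empty string.
        # Values containing tabs, newlines or backslashes would need escaping too.
        case v
        when ""  then "\\N"
        else          v
        end
      end.join("\t")
      puts "Uploading #{sql_text}" if verbose
      io.puts sql_text
    end
  end
  puts "Waiting for processing" if verbose
  io.close
  # No explicit db.close: DB.connect closes the connection when the block returns.
end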

I think there are possible improvements, such as having an IO#write_row that does the join and the puts, but I think those can come later.
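As a rough illustration of what such a helper might do, here is a sketch written as a free-standing method rather than an IO#write_row on the PR's COPY IO. The name write_copy_row and its signature are hypothetical; the escaping rules are those of the COPY text format (tab-separated fields, newline-terminated rows, NULL as \N, backslash escapes for the delimiter characters).

```crystal
# Hypothetical helper, not part of this PR: write one row in COPY text format.
# nil becomes \N (NULL); an empty string stays an empty string.
def write_copy_row(io : IO, values : Enumerable) : Nil
  values.each_with_index do |value, i|
    io << '\t' if i > 0
    if value.nil?
      io << "\\N"
    else
      value.to_s.each_char do |c|
        case c
        when '\\' then io << "\\\\"
        when '\t' then io << "\\t"
        when '\n' then io << "\\n"
        when '\r' then io << "\\r"
        else           io << c
        end
      end
    end
  end
  io << '\n'
end

# Usage against an in-memory IO; with this PR it would be the IO
# returned by exec_copy instead.
buf = IO::Memory.new
write_copy_row(buf, ["123", "Acme\tLtd", nil])
print buf.to_s
```

Because it only needs an IO, a helper like this could be tried out without a database and later moved onto the COPY IO class if the maintainers want it there.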
For efficiency there could also be a binary version, but that would require using a PostgreSQL binary encoding.
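To give an idea of what the binary variant involves, here is a sketch of the framing used by COPY ... WITH (FORMAT binary): an 11-byte signature plus two Int32 header fields, then per row an Int16 field count followed by length-prefixed fields (length -1 for NULL), and an Int16 -1 trailer. All method names are hypothetical and none of this is in the PR; only int4 and text values are shown, and every other column type needs its own binary encoding.

```crystal
# Hypothetical sketch of the PGCOPY binary framing; not part of this PR.
def write_binary_header(io : IO) : Nil
  io << "PGCOPY\n"
  io.write_byte 0xFF_u8
  io << "\r\n"
  io.write_byte 0_u8
  io.write_bytes(0_i32, IO::ByteFormat::NetworkEndian) # flags
  io.write_bytes(0_i32, IO::ByteFormat::NetworkEndian) # header extension length
end

# Each field is either nil (NULL) or its already-encoded bytes.
def write_binary_row(io : IO, fields : Array(Bytes?)) : Nil
  io.write_bytes(fields.size.to_i16, IO::ByteFormat::NetworkEndian)
  fields.each do |field|
    if field
      io.write_bytes(field.size, IO::ByteFormat::NetworkEndian) # Int32 byte length
      io.write field
    else
      io.write_bytes(-1_i32, IO::ByteFormat::NetworkEndian) # NULL marker
    end
  end
end

def write_binary_trailer(io : IO) : Nil
  io.write_bytes(-1_i16, IO::ByteFormat::NetworkEndian)
end

# int4 is sent as 4 bytes in network byte order.
def int4(value : Int32) : Bytes
  mem = IO::Memory.new(4)
  mem.write_bytes(value, IO::ByteFormat::NetworkEndian)
  mem.to_slice
end

# Usage: one row with an int4, a text value and a NULL.
binary = IO::Memory.new
write_binary_header(binary)
write_binary_row(binary, [int4(42), "Acme".to_slice, nil])
write_binary_trailer(binary)
puts binary.size
```

Text values go over the wire as raw bytes in the client encoding, so the saving over the text format comes mostly from skipping escaping and number parsing.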
