If you care about performance, you may want to avoid CSV files. But since our data sources are often like our family, we can’t make a choice, we’ll see in this blog post how to process a CSV file as fast as possible.

  • fartsparkles@sh.itjust.works
    cake
    link
    fedilink
    arrow-up
    27
    ·
    3 months ago

    Holy shit, switching to PyArrow is going to make me seem a mystical wizard when I merge in the morning. I’ve easily halved the execution time of a horrible but unavoidable job (yay crappy vendor “API” that returns a huge CSV).