Bytes IT Community

Pandas: Merging Sorted Dataframes


I have a large (Nx4, >10 GB) array that I need to sort on column 2.

I am reading my data in chunks and sorting each chunk with Pandas, but I am unable to combine the sorted chunks into a final Nx4 array that is sorted on column 2. Here is what I have tried so far:

    import pandas as pd

    # Read the file in 50,000-row chunks and sort each chunk on col-2
    chunks = pd.read_csv(ifile[0], chunksize=50000, skiprows=0,
                         names=['col-1', 'col-2', 'col-3', 'col-4'])
    for df in chunks:
        df = df.sort_values(by='col-2', kind='mergesort')  # sorted only within this chunk
        print(df)
Aug 12 '20 #1
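Sorting each chunk only orders the rows inside that chunk, so printing the chunks one after another does not give a globally sorted result. The missing step is a k-way merge of the already-sorted chunks. A minimal in-memory sketch using Python's heapq.merge; the helper name merge_sorted_chunks and the toy DataFrames are hypothetical, chosen only to mirror the question's column names:

```python
import heapq

import pandas as pd


def merge_sorted_chunks(chunks, key):
    """K-way merge DataFrames that are each already sorted on `key` (hypothetical helper)."""
    columns = list(chunks[0].columns)
    key_pos = columns.index(key)
    # itertuples(index=False, name=None) yields plain tuples in column order
    iterators = [c.itertuples(index=False, name=None) for c in chunks]
    # heapq.merge always yields the smallest remaining row across all chunks
    merged = heapq.merge(*iterators, key=lambda row: row[key_pos])
    return pd.DataFrame(list(merged), columns=columns)


# Toy chunks standing in for pieces of the real file
a = pd.DataFrame({'col-1': [1, 2], 'col-2': [5, 9], 'col-3': [0, 0], 'col-4': [0, 0]})
b = pd.DataFrame({'col-1': [3, 4], 'col-2': [1, 7], 'col-3': [0, 0], 'col-4': [0, 0]})
sorted_chunks = [c.sort_values(by='col-2', kind='mergesort') for c in (a, b)]

result = merge_sorted_chunks(sorted_chunks, 'col-2')
print(result['col-2'].tolist())  # [1, 5, 7, 9]
```

This version still builds the whole result in memory, so it only illustrates the merge logic; for a file larger than RAM the sorted chunks would have to be written to disk first and merged from there.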

If the whole dataset fits in memory, you can read the file in chunks, concatenate the chunks, and then sort the combined DataFrame once:
    import pandas as pd

    # Collect every chunk, concatenate once, then sort the combined DataFrame.
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    chunks = pd.read_csv(ifile[0], chunksize=50000,
                         names=['col-1', 'col-2', 'col-3', 'col-4'])
    df = pd.concat(chunks, ignore_index=True)
    df_s = df.sort_values(by='col-2', kind='mergesort')
    print(df_s)
Aug 18 '20 #2
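Concatenating the chunks still requires the entire dataset in memory at once, which may not be feasible for a >10 GB file. A common out-of-core alternative is an external merge sort: sort each chunk and write it to its own temporary file (a "run"), then stream all runs through heapq.merge into the output. A sketch under stated assumptions: the input is a headerless 4-column CSV, col-2 is numeric, and external_sort_csv, NAMES and the demo file contents are hypothetical names and data, not a pandas API:

```python
import csv
import heapq
import os
import tempfile

import pandas as pd

# Hypothetical column names, matching the ones used in the question
NAMES = ['col-1', 'col-2', 'col-3', 'col-4']


def external_sort_csv(in_path, out_path, key='col-2', chunksize=50000):
    """Sort a headerless CSV on a numeric column without loading it all at once."""
    key_pos = NAMES.index(key)
    with tempfile.TemporaryDirectory() as tmpdir:
        run_paths = []
        # Pass 1: sort each chunk in memory and write it to disk as a sorted run.
        reader = pd.read_csv(in_path, chunksize=chunksize, header=None, names=NAMES)
        for i, chunk in enumerate(reader):
            path = os.path.join(tmpdir, f'run_{i}.csv')
            chunk.sort_values(by=key, kind='mergesort').to_csv(path, header=False, index=False)
            run_paths.append(path)

        # Pass 2: k-way merge the runs; heapq.merge keeps one pending row per run.
        files = [open(p, newline='') for p in run_paths]
        try:
            rows = [csv.reader(f) for f in files]
            # csv.reader yields strings, so the key is converted for numeric ordering
            merged = heapq.merge(*rows, key=lambda row: float(row[key_pos]))
            with open(out_path, 'w', newline='') as out:
                csv.writer(out).writerows(merged)
        finally:
            for f in files:
                f.close()


# Usage on a tiny hypothetical file, with chunksize=2 to force several runs
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, 'in.csv')
    dst = os.path.join(d, 'out.csv')
    with open(src, 'w') as f:
        f.write('1,9,0,0\n2,3,0,0\n3,7,0,0\n4,1,0,0\n5,5,0,0\n')
    external_sort_csv(src, dst, chunksize=2)
    with open(dst, newline='') as f:
        sorted_keys = [int(row[1]) for row in csv.reader(f)]

print(sorted_keys)  # [1, 3, 5, 7, 9]
```

The chunk size bounds memory during the first pass, while the second pass opens one file per run, so a very large number of runs can hit the operating system's open-file limit; a larger chunksize keeps the run count down. The float() conversion assumes col-2 is numeric; for a text column the raw strings would be compared instead.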

Follow the standalone syntax for sort_values.

Aug 26 '20 #3
