
Python/NumPy: have I already written the swiftest code for a large array?

I would like to get my script's total execution time down from 4 minutes to less than 30 seconds. I have a large 1d array (3,000,000+ elements) of distances containing many duplicates. I am trying to write the swiftest function that returns all distances that appear n times in the array. I have written a function in numpy, but one line of it is a bottleneck. Performance matters because the calculation runs in a for loop over 2400 different large distance arrays.

import numpy as np

def repeated_distances(a, n=3):
    # Count how many times each distance value occurs, then keep the values seen exactly n times.
    b = np.bincount(a, minlength=np.size(a))
    return np.where(b == n)[0]  # SLOW STATEMENT/BOTTLENECK

for t in range(0, 2400):
    a = np.random.randint(1000000000, 5000000000, 3000000)
    c = repeated_distances(a)
Given a 1d array of distances [2000000000, 3005670000, 2000000000, 12345667, 4000789000, 12345687, 12345667, 2000000000, 12345667], I would expect back an array of [2000000000, 12345667] when queried for all distances that appear 3 times in the main array.
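For reference, here is a minimal sketch that reproduces this expected output on the sample array. It uses np.unique with return_counts=True (not the bincount approach above), so the count array it builds scales with the number of distinct distances rather than with the largest distance value:

import numpy as np

# Sample distance array from the question.
distances = np.array([2000000000, 3005670000, 2000000000, 12345667,
                      4000789000, 12345687, 12345667, 2000000000, 12345667],
                     dtype=np.int64)

# Count every distinct distance, then keep the ones that occur exactly 3 times.
values, counts = np.unique(distances, return_counts=True)
print(values[counts == 3])  # [  12345667 2000000000]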

What should I do?
Dec 12 '17 #1
