Both codes below read the same huge(~35MB) text file.
In the file 1000000 lines, the length of each line < 99 chars.
Stable result:
Python runs ~0.65s
C : ~0.70s
Any thoughts?
import time
t=time.time()
f=open('D:\\some.txt','r')
z=f.readlines()
f.close()
print len(z)
print time.time()-t
m=input()
print z[m]
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <ctime>
using namespace std;
char vs[1002000][99];
FILE *fp=fopen("D:\\some.txt","r");
int main() {
int i=0;
while (true) {
if (!fgets(vs[i],999,fp)) break;
++i;
}
fclose(fp);
cout << i << endl;
cout << clock()/CLOCKS_PER_SEC << endl;
int m;
cin >m;
cout << vs[m];
system("pause");
return 0;
} 28 2314
On Apr 26, 11:10 am, n00m <n...@narod.ruwrote:
Both codes below read the same huge(~35MB) text file.
In the file 1000000 lines, the length of each line < 99 chars.
Stable result:
Python runs ~0.65s
C : ~0.70s
Any thoughts?
Yes.
Most of the dirty work in the Python example is spent in tight loop
written in C. This is very likely to be faster on Python on Windows
than your "C" example for several reasons:
1. Python is compiled with Microsoft's C compiler, which produces more
optimal code than Mingw.
2. The Python readline() function has been in the library for a long
time and has had time for many developers to optimize it's
performance.
3. Your "pure C" code isn't even C, let alone pure C. It's C++. On
most systems, the C++ iostream libraries have a lot more overhead than
C's stdio.
And, finally, we must not fail to observe that you measured these
times without startup, which is obviousy much greater for Python. (Of
course, we only need to point this so it's not misunderstood that
you're claiming this Python process will terminate faster than the C++
one.)
So, I must regrettably opine that your example isn't very meaningful.
import time
t=time.time()
f=open('D:\\some.txt','r')
z=f.readlines()
f.close()
print len(z)
print time.time()-t
m=input()
print z[m]
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <ctime>
using namespace std;
char vs[1002000][99];
FILE *fp=fopen("D:\\some.txt","r");
int main() {
int i=0;
while (true) {
if (!fgets(vs[i],999,fp)) break;
++i;
}
fclose(fp);
cout << i << endl;
cout << clock()/CLOCKS_PER_SEC << endl;
int m;
cin >m;
cout << vs[m];
system("pause");
return 0;
}
fgets() from C++ iostream library???
I guess if I'd came up with "Python reads SLOWER than C"
I'd get another (not less) smart explanation "why it's so".
"n00m" <n0**@narod.ruschreef in bericht
news:6a**********************************@a23g2000 hsc.googlegroups.com...
import time
t=time.time()
f=open('D:\\some.txt','r')
z=f.readlines()
f.close()
print len(z)
print time.time()-t
m=input()
print z[m]
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <ctime>
using namespace std;
char vs[1002000][99];
FILE *fp=fopen("D:\\some.txt","r");
int main() {
int i=0;
while (true) {
if (!fgets(vs[i],999,fp)) break;
++i;
}
first of all I would rewrite the C loop to:
int main() {
int i=0;
while (fgets(vs[i],999,fp))
++i;
}
but I think that the difference comes from what you do in the beginning of
the C source:
char vs[1002000][99];
this reserves 99,198,000 bytes so expect a lot of cache trashing in the C
code!
Is there an implementation of f.readlines on the internet somewhere?
interested to see in how they implemented it. I'm pretty sure they did it
smarter than just reserve 100meg of data :)
fclose(fp);
cout << i << endl;
cout << clock()/CLOCKS_PER_SEC << endl;
int m;
cin >m;
cout << vs[m];
system("pause");
return 0;
}
"SL" <ni@hao.comschreef in bericht
news:be**************************@cache3.tilbu1.nb .home.nl...
>
"n00m" <n0**@narod.ruschreef in bericht
news:6a**********************************@a23g2000 hsc.googlegroups.com...
>using namespace std; char vs[1002000][99];
> if (!fgets(vs[i],999,fp)) break;
BTW why are you declaring the array as 99 and pass 999 to fgets to read a
line?
On Apr 26, 12:10*pm, n00m <n...@narod.ruwrote:
Both codes below read the same huge(~35MB) text file.
In the file 1000000 lines, the length of each line < 99 chars.
Stable result:
Python runs ~0.65s
C : ~0.70s
Any thoughts?
import time
t=time.time()
f=open('D:\\some.txt','r')
z=f.readlines()
f.close()
print len(z)
print time.time()-t
m=input()
print z[m]
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <ctime>
using namespace std;
char vs[1002000][99];
FILE *fp=fopen("D:\\some.txt","r");
int main() {
* * int i=0;
* * while (true) {
* * * * if (!fgets(vs[i],999,fp)) break;
* * * * ++i;
* * }
* * fclose(fp);
* * cout << i << endl;
* * cout << clock()/CLOCKS_PER_SEC << endl;
* * int m;
* * cin >m;
* * cout << vs[m];
* * system("pause");
return 0;
}
First try again with pure C code and compile with a C compiler, not
with C++ code and C++ compiler.
Then, tweak the code to use more buffering, to make it more similar
to readline code, like this (not tested):
#include <stdio.h>
#include <time.h>
char vs[1002000][100];
char buffer[65536];
int main(void) {
FILE *fp;
int i, m;
clock_t begin, end;
double t;
begin = clock();
fp = fopen("cvspython.txt", "r");
i = 0;
setvbuf(fp, buffer, _IOFBF, sizeof(buffer));
while(1) {
if(!fgets(vs[i], 100, fp)) break;
++i;
}
fclose(fp);
printf("%d\n", i);
end = clock();
t = (double)(end - begin)/CLOCKS_PER_SEC;
printf("%g\n", t);
scanf("%d", &m);
printf("%s\n", vs[m]);
getchar();
return 0;
}
Finally, repeat your statement again, if necessary.
On Apr 26, 1:15*pm, n00m <n...@narod.ruwrote:
fgets() from C++ iostream library???
fgets is part of the standard C++ library and it lives in the std
namespace.
I guess if I'd came up with "Python reads SLOWER than C"
I'd get another (not less) smart explanation "why it's so".
>char vs[1002000][99];
In the file 1001622(or so) records like phone number + f/l names.
So the reserving makes sense, i think. Populating of vector<string>
is by zillion times slower.
>Is there an implementation of f.readlines on the internet somewhere?
I was greatly surprised how fast it is. As a matter of fact, it's the
point
of my message here.
>BTW why are you declaring the array as 99 and pass 999 to fgets to read a
line?
It doesn't matter. All the same, reading of a current line stopped at
'\n'.
Big number 999(9999,99999,...) just ensures that each line is read up
to its end.
>fgets is part of the standard C++ library and it lives in the std
namespace.
I thought it's from <stdio.h>. Anyway, it does not matter. PS Thanx
for the code.
2All:
Origin of my "discovery" is from http://www.spoj.pl/ranks/SBANK/start=400
I never thought it can be done in Python (there are a couple of Py
solutions
even faster than mine) and without stuff like radix sort etc.
hdante:
I run your code quite a few times.
Its time = 0.734s.
Of mine = 0.703-0.718s.
PS All I have is an ancient Mingw compiler (~1.9.5v) in Dev-C++.
On Apr 26, 12:15 pm, n00m <n...@narod.ruwrote:
fgets() from C++ iostream library???
Sheesh. That'll teach me to read carefully. (Ok, it probably won't.)
Other two points still apply.
Carl Banks
On Apr 26, 5:54*pm, n00m <n...@narod.ruwrote:
hdante:
I run your code quite a few times.
Its time = 0.734s.
Of mine = 0.703-0.718s.
PS All I have is an ancient Mingw compiler (~1.9.5v) in Dev-C++.
Okay, now I believe in you. :-P
The next step would be to reimplement readline.
No so simple, guys.
E.g., I can't solve (in Python) this: http://www.spoj.pl/problems/INTEST/
Keep getting TLE (time limit exceeded). Any ideas? After all, it's
weekend.
450. Enormous Input Test
Problem code: INTEST
The purpose of this problem is to verify whether the method you are
using to read input data is sufficiently fast to handle problems
branded with the enormous Input/Output warning. You are expected to be
able to process at least 2.5MB of input data per second at runtime.
Input
The input begins with two positive integers n k (n, k<=107). The next
n lines of input contain one positive integer ti, not greater than
109, each.
Output
Write a single integer to output, denoting how many integers ti are
divisible by k.
Example
Input:
7 3
1
51
966369
7
9
999996
11
Output:
4
On Apr 26, 8:28*pm, n00m <n...@narod.ruwrote:
No so simple, guys.
E.g., I can't solve (in Python) this:http://www.spoj.pl/problems/INTEST/
Keep getting TLE (time limit exceeded). Any ideas? After all, it's
weekend.
450. Enormous Input Test
Problem code: INTEST
The purpose of this problem is to verify whether the method you are
using to read input data is sufficiently fast to handle problems
branded with the enormous Input/Output warning. You are expected to be
able to process at least 2.5MB of input data per second at runtime.
Input
The input begins with two positive integers n k (n, k<=107). The next
n lines of input contain one positive integer ti, not greater than
109, each.
Output
Write a single integer to output, denoting how many integers ti are
divisible by k.
Example
Input:
7 3
1
51
966369
7
9
999996
11
Output:
4
Maybe the problem is not in reading the input.
PS: you can throw a lot of time away in that site. :-)
I'm there since summer 2004 :) (with several time breaks)
Btw seems all accepted pyth solutions (for this prob) used Psyco.
SL schrieb:
Is there an implementation of f.readlines on the internet somewhere?
interested to see in how they implemented it. I'm pretty sure they did
it smarter than just reserve 100meg of data :)
Of course it is. Checkout the Python sources :)
Christian
Dennis Lee Bieber wrote:
(untested for both):
-=-=-=-=-=-=-
Many thanks but alas both your codes got "wrong answer" verdict.
I can't understand why; they seem Ok (but I'm a bit sleepy:)).
One more brick.
This time I compare list.sort() vs sort(vector<string>).
Incredible. Python does it by 8.3s / 2.75s = 3 times faster than C++.
import time
f=open('D:\\v.txt','r')
z=f.readlines()
f.close()
t=time.time()
z.sort()
print time.time()-t
m=int(raw_input())
print z[m]
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
#include <ctime>
using namespace std;
vector<stringvs;
FILE *fp=fopen("D:\\v.txt","r");
int main() {
int i=0;
while (true) {
char line[50];
if (!fgets(line,50,fp)) break;
vs.push_back(line);
++i;
}
fclose(fp);
double t;
t=clock()/CLOCKS_PER_SEC;
sort(vs.begin(),vs.end());
cout << clock()/CLOCKS_PER_SEC << endl;
int m;
cin >m;
cout << vs[m];
getchar();
return 0;
}
Oops... I spotted a slip in my C++ code. Forgot " - t" in
cout << clock()/CLOCKS_PER_SEC << endl;
The correct proportion is 7.5s / 2.75s = 2.7 times.
Have you tried this now?
First try again with pure C code and compile with a C compiler, not
with C++ code and C++ compiler.
Then, tweak the code to use more buffering, to make it more similar
to readline code, like this (not tested):
#include <stdio.h>
#include <time.h>
char vs[1002000][100];
char buffer[65536];
int main(void) {
FILE *fp;
int i, m;
clock_t begin, end;
double t;
begin = clock();
fp = fopen("cvspython.txt", "r");
i = 0;
setvbuf(fp, buffer, _IOFBF, sizeof(buffer));
while(1) {
if(!fgets(vs[i], 100, fp)) break;
++i;
}
fclose(fp);
printf("%d\n", i);
end = clock();
t = (double)(end - begin)/CLOCKS_PER_SEC;
printf("%g\n", t);
scanf("%d", &m);
printf("%s\n", vs[m]);
getchar();
return 0;
}
Finally, repeat your statement again, if necessary.
I did something near like that several days ago. Instead of programming in
C++ I did it with RM-Cobol. I used to know the times that cobol takes to
read the file and search for resutls, and I was surprised about the time
that Python took doing the same: really, really fast.
----- Original Message -----
From: "n00m" <n0**@narod.ru>
Newsgroups: comp.lang.python
To: <py*********@python.org>
Sent: Sunday, April 27, 2008 6:28 AM
Subject: Re: Python(2.5) reads an input file FASTER than pure C(Mingw)
One more brick.
This time I compare list.sort() vs sort(vector<string>).
Incredible. Python does it by 8.3s / 2.75s = 3 times faster than C++.
import time
f=open('D:\\v.txt','r')
z=f.readlines()
f.close()
t=time.time()
z.sort()
print time.time()-t
m=int(raw_input())
print z[m]
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
#include <ctime>
using namespace std;
vector<stringvs;
FILE *fp=fopen("D:\\v.txt","r");
int main() {
int i=0;
while (true) {
char line[50];
if (!fgets(line,50,fp)) break;
vs.push_back(line);
++i;
}
fclose(fp);
double t;
t=clock()/CLOCKS_PER_SEC;
sort(vs.begin(),vs.end());
cout << clock()/CLOCKS_PER_SEC << endl;
int m;
cin >m;
cout << vs[m];
getchar();
return 0;
}
-- http://mail.python.org/mailman/listinfo/python-list
Both codes by Dennis Lee Bieber are Ok. The 2nd one ("slurper") ,
seems , a bit faster.
I only corrected the author's typo: should be "% div" instead of "/
div".
And added this (don't know helped it or not):
if div == 1:
print lim
return
And of course:
import psyco
psyco.full()
On Apr 27, 6:28*am, n00m <n...@narod.ruwrote:
No so simple, guys.
E.g., I can't solve (in Python) this:http://www.spoj.pl/problems/INTEST/
Keep getting TLE (time limit exceeded). Any ideas? After all, it's
weekend.
450. Enormous Input Test
Problem code: INTEST
The purpose of this problem is to verify whether the method you are
using to read input data is sufficiently fast to handle problems
branded with the enormous Input/Output warning. You are expected to be
able to process at least 2.5MB of input data per second at runtime.
Input
The input begins with two positive integers n k (n, k<=107). The next
n lines of input contain one positive integer ti, not greater than
109, each.
Output
Write a single integer to output, denoting how many integers ti are
divisible by k.
Example
Input:
7 3
1
51
966369
7
9
999996
11
Output:
4
The problem is faulty.
First, the bottleneck on reading a huge input is the harddisk speed
and the string processing, the result of the competition doesn't prove
anything but how fast the string processor and the harddisk is.
Python's string processing is not a very fast one as it creates a copy
of the string every now and then.
Lie wrote:
On Apr 27, 6:28�am, n00m <n...@narod.ruwrote:
No so simple, guys.
E.g., I can't solve (in Python) this:http://www.spoj.pl/problems/INTEST/
Keep getting TLE (time limit exceeded). Any ideas? After all, it's
weekend.
450. Enormous Input Test
Problem code: INTEST
The purpose of this problem is to verify whether the method you are
using to read input data is sufficiently fast to handle problems
branded with the enormous Input/Output warning. You are expected to be
able to process at least 2.5MB of input data per second at runtime.
Input
The input begins with two positive integers n k (n, k<=107). The next
n lines of input contain one positive integer ti, not greater than
109, each.
Output
Write a single integer to output, denoting how many integers ti are
divisible by k.
Example
Input:
7 3
1
51
966369
7
9
999996
11
Output:
4
The problem is faulty.
First, the bottleneck on reading a huge input is the harddisk speed
and the string processing, the result of the competition doesn't prove
anything but how fast the string processor and the harddisk is.
Python's string processing is not a very fast one as it creates a copy
of the string every now and then.
In case you didn't pay attention:
Python and C++ were tested on the same PC.
Another PC, another OS (Linux) and another compiler C++ (g++ 4.0.0-8)
Compare 2 my latest submissions: http://www.spoj.pl/status/SBANK,zzz/
times: 1.32s and 0.60s
Submitted codes:
import sys
z=sys.stdin.readlines()
print z[5]
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <string>
using namespace std;
vector<stringvs;
int main() {
while (true) {
char line[50];
if (!fgets(line,50,stdin)) break;
vs.push_back(line);
}
return 0;
}
If it proves nothing then white is black and good is evil
On Apr 27, 4:54*pm, n00m <n...@narod.ruwrote:
Another PC, another OS (Linux) and another compiler C++ (g++ 4.0.0-8)
Compare 2 my latest submissions:http://www.spoj.pl/status/SBANK,zzz/
times: 1.32s and 0.60s
Submitted codes:
import sys
z=sys.stdin.readlines()
print z[5]
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <string>
using namespace std;
vector<stringvs;
int main() {
* * while (true) {
* * * * char line[50];
* * * * if (!fgets(line,50,stdin)) break;
* * * * vs.push_back(line);
* * }
return 0;
}
If it proves nothing then white is black and good is evil
It seems that the "push_back" line takes most of the time of the
code. Remove it and execution will drop to 0.25s.
Python readline uses fread instead of fgets: http://svn.python.org/view/python/ta...64&view=markup
(see the file_readlines function)
If you write a code that does an fread loop, execution will drop to
0.01s.
This C code takes 0.25s. Almost all time is spent with string
manipulation.
#include <stdio.h>
#include <string.h>
#define B 8192
char vs[100000][40];
char buffer[b];
int main(void) {
int count;
char *begin, *end;
int i;
i = 0;
while (1) {
count = fread(buffer, 1, B, stdin);
if (count == 0) break;
begin = buffer;
while(1) {
end = (char *)memchr(begin, '\n', buffer+B-begin);
if (end == NULL) {
memmove(buffer, begin, buffer+B-begin);
break;
}
memmove(vs[i], begin, end-begin);
i = (i+1)%100000;
begin = end + 1;
}
}
return 0;
}
The difference, 0.60s-0.25s = 0.35s is probably mostly python's
memory management (which seems to be much more efficient than
std::vector default).
Very interesting post. :-) I had no idea about how much optimized the
builtin library was.
On Apr 28, 2:14Â*am, n00m <n...@narod.ruwrote:
Lie wrote:
On Apr 27, 6:28�am, n00m <n...@narod.ruwrote:
No so simple, guys.
E.g., I can't solve (in Python) this:http://www.spoj.pl/problems/INTEST/
Keep getting TLE (time limit exceeded). Any ideas? After all, it's
weekend.
450. Enormous Input Test
Problem code: INTEST
The purpose of this problem is to verify whether the method you are
using to read input data is sufficiently fast to handle problems
branded with the enormous Input/Output warning. You are expected to be
able to process at least 2.5MB of input data per second at runtime.
Input
The input begins with two positive integers n k (n, k<=107). The next
n lines of input contain one positive integer ti, not greater than
109, each.
Output
Write a single integer to output, denoting how many integers ti are
divisible by k.
Example
Input:
7 3
1
51
966369
7
9
999996
11
Output:
4
The problem is faulty.
First, the bottleneck on reading a huge input is the harddisk speed
and the string processing, the result of the competition doesn't prove
anything but how fast the string processor and the harddisk is.
Python's string processing is not a very fast one as it creates a copy
of the string every now and then.
In case you didn't pay attention:
Python and C++ were tested on the same PC.
In case you didn't pay attention, what I'm mainly concerned about is
running the same algorithm in the same machine in the same programming
language but in different phase of the moon that would give two
significantly different timing.
Harddisk seek time is based on harddisk's fragmentation, MFT's (or
equivalent) size, what "other" program are currently doing with the
harddisk, and by harddisk age. You can't avoid fragmentation, except
if you ghosted the harddisk and reinstall the OS after every test. You
can't prevent the OS to switch to other tasks while the test is
running they might possibly requested for a huge harddisk query. And
I'm sure you can't prevent hardware degradation (although the impact
of this is probably not significant). Heck I can go on to other phases
of the moon like how full and fragmented the memory (RAM) is, how much
swap is being used.
My secondary concern, is how the string processing is done in the
language. In python, every string operation creates a new string, in
C, a string processing is just a mere address passing, for this test,
the security given by python is unnecessary since it is read only. We
know which one is the more superior in speed, but we can't do anything
about this since this is implementation detail of the language.
>>a = ['zzz', 'aaa']
>>id(a[0]), id(a[1])
(12258848, 12259296)
>>a.sort() id(a[0]), id(a[1])
(12259296, 12258848)
>>>
On Apr 30, 8:57*pm, n00m <n...@narod.ruwrote:
>a = ['zzz', 'aaa'] id(a[0]), id(a[1])
(12258848, 12259296)>>a.sort()
>id(a[0]), id(a[1])
(12259296, 12258848)
That proves you know nothing, that is a list operation, not a string
operation. This discussion thread is closed Replies have been disabled for this discussion. Similar topics
4 posts
views
Thread by Gert Van den Eynde |
last post: by
|
reply
views
Thread by tbrothers |
last post: by
|
5 posts
views
Thread by google |
last post: by
|
4 posts
views
Thread by Pacino |
last post: by
|
reply
views
Thread by Fredrik Lundh |
last post: by
| | | | | | | | | | | |