Hello, all,
I have an interesting problem about stl map and pthread on Linux
and g++. The source code is as follows.
//mt_map_test.cpp
#include <string>
#include <map>
#include <unistd.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <pthread.h>
using namespace std;
static void* thrd_work( void* data )
{
long count = reinterpret_cast<long>( data );
const int SIZE =32;
char buf[SIZE];
memset( buf, 'h', SIZE );
map<string, string> strMap;
for ( long i=0; i< count ; i++ ) {
char key[8];
sprintf( key, "%d", i );
strMap[ key ] = buf;
}
}
int main()
{
int JOB_NUM = 320000;
pthread_t tid[THRD_NUM];
for ( int i=0; i< THRD_NUM; i++ ){
pthread_create( tid+i, NULL, &thrd_work,
reinterpret_cast<void*> (JOB_NUM) );
}
for ( int i=0; i< THRD_NUM; i++ ){
pthread_join( tid[i], NULL );
}
}
And here is what I got from my workstation ( Dual AMD Opteron machine,
RHEL 3)
[nan@eudyptula test]$ for t in 1 2; do g++ -DTHRD_NUM=$t
mt_map_test.cpp -lpthread ; time ./a.out; done
real 0m1.390s
user 0m1.280s
sys 0m0.120s
real 0m3.450s
user 0m5.320s
sys 0m1.170s
I expected that the 2 times should be roughly equal. But clearly I
experienced significant slowdown with 2 threads.The same also happened
to a dual Intel Xeon machine. I suspect the internal stl map
implementation is improper.
I've spent hours googling without any answer. I really need advice from
a C++ expert. Thanks a lot.
Nan 9 7961 na******@gmail.com wrote:
[] I expected that the 2 times should be roughly equal. But clearly I experienced significant slowdown with 2 threads.The same also happened to a dual Intel Xeon machine. I suspect the internal stl map implementation is improper.
I've spent hours googling without any answer. I really need advice from a C++ expert. Thanks a lot.
The problem may well be in malloc(), since it uses a mutex to protect
its data structures, thus serializing your map<> inserts. Try http://www.hoard.org/ na******@gmail.com wrote: Hello, all, I have an interesting problem about stl map and pthread on Linux and g++. The source code is as follows.
//mt_map_test.cpp #include <string> #include <map> #include <unistd.h> #include <sys/types.h> #include <stdio.h> #include <string.h> #include <pthread.h>
using namespace std;
static void* thrd_work( void* data ) { long count = reinterpret_cast<long>( data );
const int SIZE =32; //char buf[SIZE];
// allow room for a nul-terminator so the temp std::string
// constructed from buf[] by 'strMap[ key ] = buf' will
// always be SIZE bytes in length.
char buf[SIZE + 1];
memset( buf, 'h', SIZE );
// nul-terminate buf[]. if we don't, then when a
// std::string is made from buf[] the nbr of bytes
// put into the std::string will be all bytes up to the
// next zero byte (an unknown length) - undefined behavior.
buf[SIZE] = '\0';
map<string, string> strMap;
for ( long i=0; i< count ; i++ ) { char key[8]; sprintf( key, "%d", i ); strMap[ key ] = buf; } }
int main() { int JOB_NUM = 320000; pthread_t tid[THRD_NUM];
for ( int i=0; i< THRD_NUM; i++ ){ pthread_create( tid+i, NULL, &thrd_work, reinterpret_cast<void*> (JOB_NUM) ); }
for ( int i=0; i< THRD_NUM; i++ ){ pthread_join( tid[i], NULL ); }
}
And here is what I got from my workstation ( Dual AMD Opteron machine, RHEL 3)
[nan@eudyptula test]$ for t in 1 2; do g++ -DTHRD_NUM=$t mt_map_test.cpp -lpthread ; time ./a.out; done
real 0m1.390s user 0m1.280s sys 0m0.120s
real 0m3.450s user 0m5.320s sys 0m1.170s
I expected that the 2 times should be roughly equal. But clearly I experienced significant slowdown with 2 threads.The same also happened to a dual Intel Xeon machine. I suspect the internal stl map implementation is improper.
I've spent hours googling without any answer. I really need advice from a C++ expert. Thanks a lot.
Nan
Just because the you have 2 cpu's doesn't mean that each thread
will run on its own cpu. I've seen discussions about this in
the gcc and g++ newsgroups. I don't remember the details, but
I seem to recall that special compile/link options were involved
to force usage of the multiple cpu's.
FYI, here's what I get on my old single-cpu PII-450 with
384MB of RAM:
real 0m9.407s
user 0m9.006s
sys 0m0.215s
real 0m18.671s
user 0m17.959s
sys 0m0.489s
Regards,
Larry
Thanks a lot for the replies. I can use sched_setaffinity for
processor binding on Linux. But that does not look like what the
problem is. Also, I made a multi-process program using fork(), and
the result there is satisfactory.
Below is the new code with processor binding, multi-process and
corrections on the std::string.
-----------------------------------------------------------
#include <string>
#include <map>
#include <unistd.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <sys/wait.h>
#include <sched.h>
using namespace std;
struct WorkData {
int cpu;
long count;
};
static void* work( void* data )
{
WorkData* workData = reinterpret_cast<WorkData*>( data );
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET( workData->cpu, &mask);
sched_setaffinity(0, &mask);
const int SIZE =32;
char buf[SIZE];
memset( buf, 'h', SIZE );
buf[SIZE-1] = '\0';
map<string, string> strMap;
for ( long i=0; i< workData->count ; i++ ) {
char key[8];
sprintf( key, "%d", i );
strMap[ key ] = buf;
}
}
int main()
{
const int JOB_NUM = 320000;
WorkData workData[ CONCURRENCY ];
#ifdef MT
pthread_t tid[ CONCURRENCY ];
for ( int i=0; i< CONCURRENCY; i++ ){
workData[i].count = JOB_NUM;
workData[i].cpu = i;
pthread_create( tid+i, NULL, &work, reinterpret_cast<void*>
(&(workData[i])) );
}
for ( int i=0; i< CONCURRENCY; i++ ){
pthread_join( tid[i], NULL );
}
#endif
#ifdef MP
pid_t pid[ CONCURRENCY ];
for ( int i=0; i< CONCURRENCY; i++ ){
workData[i].count = JOB_NUM;
workData[i].cpu = i;
if ( (pid[i] = fork()) == 0 ) {
work( reinterpret_cast<void*> ( &(workData[i])));
exit( 0 );
}
}
for ( int i=0; i< CONCURRENCY; i++ ){
waitpid( pid[i], NULL, 0 );
}
#endif
}
----------------------------------------------------------
Performace:
[nan@eudyptula test]$ for m in MT MP; do for t in 1 2; do printf "\n%s
%d" $m $t; g++ -D$m -DCONCURRENCY=$t mt_map_test.cpp -lpthread ;
time ./a.out; done; done
MT 1
real 0m1.410s
user 0m1.380s
sys 0m0.030s
MT 2
real 0m3.650s
user 0m5.230s
sys 0m1.640s
MP 1
real 0m1.380s
user 0m1.340s
sys 0m0.040s
MP 2
real 0m1.400s
user 0m2.650s
sys 0m0.130s
I also suspect there is some locking problem. I have not got my
program to work with hoard yet. But I did not find any slowdown when I
did a simple test on malloc in a multi-threaded program. For now, I
am going to dig into the map implementation on my machine.
I tried the same program again tonight on a Redhat FC3 dual processor
machine. The problem just went away. Notice the gcc on FC3 is
3.4.2 . Before I used 3.2.3. on RHEL3.
gcc -v
Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--disable-checking --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-java-awt=gtk
--host=i386-redhat-linux
Thread model: posix
gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)
$ for m in MT ; do for t in 1 2; do printf "\n%s %d" $m $t; g++
-D$m -DCONCURRENCY=$t mt_map_test.cpp -lpthread; time ./a.out; done;
done
MT 1
real 0m1.707s
user 0m1.632s
sys 0m0.075s
MT 2
real 0m1.805s
user 0m3.303s
sys 0m0.240s
BTW, the signature of sched_setaffinity changed on FC3,
need to use 'sched_setaffinity(0, sizeof(mask), &mask);'
I also got hoard working on FC3. But no further improvement. On RHEL3,
I got the so file with the following warning:
Compiling for Linux
In file included from libhoard.cpp:164:
usesimtls.cpp: In function `int pthread_create(pthread_t*, const
pthread_attr_t*, void*(*)(void*), void*)':
usesimtls.cpp:323: warning: `pthread_attr_setstackaddr' is deprecated
(declared
at /usr/include/nptl/pthread.h:299)
In file included from libhoard.cpp:164:
usesimtls.cpp: In function `void pthread_exit(void*)':
usesimtls.cpp:354: warning: `noreturn' function does ret
But I got segfault in running my program with MT2. MT1, MP1 and MP2
are all OK.
Clearly, something has changed between gcc 3.2 and gcc 3.4. na******@gmail.com wrote:
[] And here is what I got from my workstation ( Dual AMD Opteron machine, RHEL 3)
[nan@eudyptula test]$ for t in 1 2; do g++ -DTHRD_NUM=$t mt_map_test.cpp -lpthread ; time ./a.out; done
real 0m1.390s user 0m1.280s sys 0m0.120s
real 0m3.450s user 0m5.320s sys 0m1.170s
I expected that the 2 times should be roughly equal. But clearly I experienced significant slowdown with 2 threads.The same also happened to a dual Intel Xeon machine. I suspect the internal stl map implementation is improper.
I've spent hours googling without any answer. I really need advice from a C++ expert. Thanks a lot.
The problem may be in libstdc++ caching allocator. See http://gcc.gnu.org/onlinedocs/libstd...allocator.html
I ran your code on a Dual Xeon 2.8 box with caching disabled and
enabled. Here are my results:
my@devel:~/src/exp> cat /etc/issue
Welcome to SuSE Linux 9.2 (i586) - Kernel \r (\l).
my@devel:~/src/exp> uname -a
Linux devel 2.6.11.4-20a-smp #1 SMP Wed Mar 23 21:52:37 UTC 2005 i686
i686 i386 GNU/Linux
my@devel:~/src/exp> g++ --version
g++ (GCC) 3.3.4 (pre 3.3.5 20040809)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is
NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
my@devel:~/src/exp> for t in 1 2; do g++ -O3 -DTHRD_NUM=$t exp.cpp
-pthread ; GLIBCPP_FORCE_NEW=1 GLIBCXX_FORCE_NEW=1 time -p ./a.out;
done
real 1.16
user 1.10
sys 0.05
real 1.37
user 2.55
sys 0.14
my@devel:~/src/exp> for t in 1 2; do g++ -O3 -DTHRD_NUM=$t exp.cpp
-pthread ; time -p ./a.out; done
real 1.12
user 1.06
sys 0.05
real 3.48
user 5.28
sys 1.40
In the former case with caching disabled it's clear from the real time
numbers that the task is scaled well on the two processors.
A little boost is gained using hoard allocator:
my@devel:~/src/exp> for t in 1 2; do g++ -O3 -DTHRD_NUM=$t exp.cpp
-pthread ; LD_PRELOAD=/usr/local/lib/libhoard.so:/usr/lib/libdl.so
GLIBCPP_FORCE_NEW=1 GL\IBCXX_FORCE_NEW=1 time -p ./a.out; done
real 1.15
user 1.09
sys 0.05
real 1.31
user 2.46
sys 0.13
Just to fix up the report output, I ran the following command,
cat /proc/cpuinfo | egrep "(model name|cpu MHz)" ; \
uname -a | awk '{print "kernel name : " $1 " " $3 " " $11 " " $14
}' ; \
gcc -v 2>&1 | grep gcc | awk '{print "gcc version : " $3 $4 }'; \
for method in MT MP; do \
for concurrency in 1 2 3 4 5 ; do \
printf "%s / %s = " $method $concurrency ; \
g++ -D$method=1 -DCONCURRENCY=$concurrency mt_map_test.cpp
-lpthread ; \
for samples in 1 2 3 4 5 6 7 8 9 ; do \
/usr/bin/time ./a.out 2>&1 | \
grep -v swaps | \
sed -e "s/^\(.*\)user.*$/\1/g" ; \
done | sort | head -5 | tail -1 ; \
done ; \
done
The basic idea is to run multiple times and pull the median . . .
For convenience I also added some conditional definitions,
#ifndef CONCURRENCY
#define CONCURRENCY 1
#endif
#if !defined( MP ) && !defined( MT )
#define MT 1
#endif
My machine has only a single processor, so the results do not seem
surprising,
model name : AMD Athlon(tm) 64 Processor 2800+
cpu MHz : 1808.843
kernel name : Linux 2.6.12-1.1398_FC4 x86_64 GNU/Linux
gcc version : 4.0.120050727
MT / 1 = 1.30
MT / 2 = 2.65
MT / 3 = 4.01
MT / 4 = 5.35
MT / 5 = 6.47
MP / 1 = 1.34
MP / 2 = 2.69
MP / 3 = 4.01
MP / 4 = 5.35
MP / 5 = 6.69
Thanks a lot. Maxim. GLIBCPP_FORCE_NEW is exactly where the problem
is. I dropped the processor binding code and ran the following script
( adapted from Vic's report script ) on 2 machines.
$ cat /home/nan/measure.sh
#!/bin/bash
cat /proc/cpuinfo | egrep "(model name|cpu MHz)" ;
uname -a | awk '{print "kernel name : " $1 " " $3 " " $11 " " $14
}';
gcc -v 2>& 1 | grep '^gcc' | awk '{print "gcc version : " $3 $4 }';
printf "\n CD CE";
for m in MT MP; do
for t in $(seq 1 4); do
printf "\n%s / %d = " $m $t;
g++ -D$m -DCONCURRENCY=$t mt_map_test.cpp
-lpthread;
for cache in 'export' 'export -n' ; do
for samples in $(seq 1 5); do
$cache GLIBCPP_FORCE_NEW=1; $cache
GLIBCXX_FORCE_NEW=1; /usr/bin/time -f %e ./a.out 2>&1;
done | sort | head -3 | tail -1 | tr '\n' ' '
done
done
done
Machine 1
$ sh measure.sh
model name : Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz : 2791.459
model name : Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz : 2791.459
model name : Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz : 2791.459
model name : Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz : 2791.459
kernel name : Linux 2.6.9-1.667smp 2004 i386
gcc version : 3.4.220041017
CD CE
MT / 1 = 1.69 1.68
MT / 2 = 1.79 1.86
MT / 3 = 2.85 2.93
MT / 4 = 3.12 3.11
MP / 1 = 1.54 1.54
MP / 2 = 1.56 1.57
MP / 3 = 3.49 2.45
MP / 4 = 3.51 3.14
Machine 2
$ sh measure.sh
model name : Intel(R) Xeon(TM) CPU 2.40GHz
cpu MHz : 2387.948
model name : Intel(R) Xeon(TM) CPU 2.40GHz
cpu MHz : 2387.948
model name : Intel(R) Xeon(TM) CPU 2.40GHz
cpu MHz : 2387.948
model name : Intel(R) Xeon(TM) CPU 2.40GHz
cpu MHz : 2387.948
kernel name : Linux 2.4.21-4.ELsmp 2003 i386
gcc version : 3.2.320030502
CD CE
MT / 1 = 2.00 1.97
MT / 2 = 2.19 4.93
MT / 3 = 3.16 7.22
MT / 4 = 3.75 9.94
MP / 1 = 1.82 1.99
MP / 2 = 1.84 2.00
MP / 3 = 3.11 3.24
MP / 4 = 3.73 4.25
Note the above machines have 2 Hyperthreaded CPUs, so it only scales
up to 2.
For this particular program, it seems that MP is consistently faster
than MT.
And surprisingly, GLIBCPP_FORCE_NEW or GLIBCXX_FORCE_NEW has no effect
on gcc 3.4. na******@gmail.com wrote: Thanks a lot. Maxim. GLIBCPP_FORCE_NEW is exactly where the problem is. I dropped the processor binding code and ran the following script ( adapted from Vic's report script ) on 2 machines.
[]
Note the above machines have 2 Hyperthreaded CPUs, so it only scales up to 2. For this particular program, it seems that MP is consistently faster than MT. And surprisingly, GLIBCPP_FORCE_NEW or GLIBCXX_FORCE_NEW has no effect on gcc 3.4.
I've just checked out gcc 3.4.3 sources and GLIBCXX_FORCE_NEW check is
still there.
I ran a slightly modified script version to include results with Hoard
allocator. Here CE/D stand for c++ caching allocator enabled/disabled,
H stands for Hoard.
my@devel:~/src/exp> ./measure
model name : Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz : 2792.047
model name : Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz : 2792.047
model name : Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz : 2792.047
model name : Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz : 2792.047
kernel name : Linux 2.6.11.4-20a-smp 2005 i386
gcc version : 3.3.4
CE CD CEH CDH
MT / 1 = 1.63 1.66 1.63 1.65
MT / 2 = 4.29 1.79 4.24 1.77
MT / 3 = 6.16 2.53 6.23 2.44
MT / 4 = 9.33 3.05 9.03 2.24
MP / 1 = 1.62 1.49 1.63 1.51
MP / 2 = 1.63 1.51 1.64 1.53
MP / 3 = 2.29 2.14 2.30 2.17
MP / 4 = 2.70 2.59 2.74 2.66
It would be nice if you can try this again with gcc 3.4.3. This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Avin |
last post by:
Hi,
I am using the Red Hat Linux 9.0 & gcc 3.2
I am trying to make multithreaded library from my c/c++ source code.
Do I have to use -pthread gcc option during compile time, or this
option is...
|
by: lokb |
last post by:
Hi,
I am creating a detach thread as shown below and caling pthread_create in
a while loop where the file names in the directory are fetched and is
passed as a parmater to pthread create. The...
|
by: szehau |
last post by:
Hi all,
I have a program written in C with embeded SQL. Following are the
configuration:
DB2/LINUX 8.1.5
Thread model: posix
gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
My problems...
|
by: Mark Shelor |
last post by:
I've encountered a troublesome inconsistency in the C-language Perl
extension I've written for CPAN (Digest::SHA). The problem involves the
use of a static array within a performance-critical...
|
by: fran |
last post by:
Server: IBM XSERIES 225 (Intel Xeon CPU 2.40GHz 1GB RAM)
Operating System: Linux RedHat ES 2.1 kernel 2.4.9
Languaje: C++
Compiler: gcc 2.96
Libs: pthread
We are in need of your help in...
|
by: noleander |
last post by:
I've got an application that uses Pthread to do threading. Mostly Im using
Condition Variables and the associated function calls:
- pthread_cond_wait()
- pthread_cond_signal()
-...
|
by: noel.yim |
last post by:
Hi,
First question is:
I want to assign 8 threads to 8 processors (must one to one), when I
initialize the thread, does it guarantee that each CPU has one thread?
or there can be some case such...
|
by: tonno01 |
last post by:
Hi...
i am developing some multi-threaded application on linux (fedora) Intel SMP hardware.
i was wondering if it is possible to force a pthread to use always the same processor, in other words,...
|
by: Alexander Kotelnikov |
last post by:
Hello.
I faced the same problem which arose before
Message-ID: <1124162913.832263.262410@g14g2000cwa.googlegroups.com>...
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers,...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...
| |