If anybody has any insight into this problem I'm running into I would
really appreciate if you could write to me...
I'm running a simple C++ program on Solaris 8 that forks and execs a
bunch of processes. It's been running fine for years, but now that
I've moved to faster hardware, I'm running into a problem that
surfaces more frequently as the hardware I'm using gets better/faster
-- it seems like some sort of race condition issue.
Basically, at random, a handful of the processes immediately fail
(before actually doing anything) and return an exit code of 4294967295
(UINT_MAX). I imagine that this is just an umbrella status code for
all unexpected/unexplained errors, so I'm not sure if it means
anything?
One thing to note is that if I just trap this error and execute the
process again, it runs fine. It just seems like at the time the
fork/exec takes place, something in the system temporarily screws up
but I don't know what. Of course I do have a workaround (just re-run
the process) but I'd like to know what's going on.
If anybody has encountered anything like this, please let me know, and I
can provide you with more information if need be...thanks.
"Vineet" <vi*************@mantas.com> wrote in message news:4a**************************@posting.google.com...
: If anybody has any insight into this problem I'm running into I would
: really appreciate if you could write to me...
:
: I'm running a simple C++ program on Solaris 8 that forks and execs a
: bunch of processes. It's been running fine for years, but now that
: I've moved to faster hardware, I'm running into a problem that
: surfaces more frequently as the hardware I'm using gets better/faster
: -- it seems like some sort of race condition issue.
Sounds like fork is failing. If the program is not running as root,
you are probably exceeding maxuproc (see your kernel parameters
documentation). If it is running as root, you're exceeding nprocs.
Nprocs determines how many structures controlling processes are
created. Maxuproc determines how many processes a user can
have - it should always be less than half nprocs and is usually quite
a bit smaller. Default parameters for most machines are not set up
properly to handle heavy usage by a small number of programs or users.
One other problem: it is likely that you are not handling wait situations
properly and may have a large number of zombie processes. These
consume limited process resources. Even if your processes
are being reaped by init, it's possible to get into race conditions where
zombies are created faster than init can reap them. Without seeing any
actual code, such as your fork-exec code, it's impossible to say.
Post some code if you want more help.
Good luck,
Dan
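To illustrate the point about zombies: a parent that forks many children can reap the ones that have already exited without blocking. The following is only a minimal sketch, since the original poster's code was never shown; `reap_finished_children` is a hypothetical helper name, and calling it periodically from the parent's main loop is an assumption about how such a program might be structured.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the thread): collects every child that has
// already terminated, without blocking, and returns how many were reaped.
int reap_finished_children() {
    int reaped = 0;
    for (;;) {
        int status = 0;
        // -1 means "any child"; WNOHANG makes waitpid return 0 instead of
        // blocking when children exist but none has terminated yet.
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            ++reaped;   // one zombie removed from the process table
            continue;
        }
        if (pid == -1 && errno == EINTR) {
            continue;   // interrupted by a signal: retry
        }
        break;          // 0: nothing to reap yet; -1 with ECHILD: no children
    }
    return reaped;
}
```

Calling this after each batch of forks (or from a SIGCHLD-driven loop) keeps terminated children from accumulating against the per-user process limit.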
:
: Basically, at random, a handful of the process immediately fail
: (before actually doing anything) and return an exit code of 4294967295
: (UINT_MAX). I imagine that this is just an umbrella status code for
: all unexpected/unexplained errors, so I'm not sure if it means
: anything?
It's a -1.
:
: One thing to note is that if I just trap this error and execute the
: process again, it runs fine. It just seems like at the time the
: fork/exec takes place, something in the system temporarily screws up
: but I don't know what. Of course I do have a workaround (just re-run
: the process) but I'd like to know what's going on.
:
: If anybody has encounted anything like this, please let me know, and I
: can provide you with more information if need be...thanks.
vi*************@mantas.com (Vineet) writes:
> If anybody has any insight into this problem I'm running into I would really appreciate if you could write to me...
> I'm running a simple C++ program on Solaris 8 that forks and execs a bunch of processes. It's been running fine for years, but now that I've moved to faster hardware, I'm running into a problem that surfaces more frequently as the hardware I'm using gets better/faster -- it seems like some sort of race condition issue.
> Basically, at random, a handful of the process immediately fail (before actually doing anything) and return an exit code of 4294967295 (UINT_MAX). I imagine that this is just an umbrella status code for all unexpected/unexplained errors, so I'm not sure if it means anything?
That's -1, which the man page documents fork() as returning in case of an
error. In that case you should check to see what the value of errno
is. You can get a textual version of the error by calling either
perror() or strerror(errno). That will probably shed some light on things.
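The check described here might look roughly like the following. This is a sketch under stated assumptions, not the original poster's code: `run_and_wait` is a hypothetical helper name, `execvp` is chosen only for illustration, and exit status 127 for a failed exec is a shell convention rather than anything the thread specifies.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs argv[0] with the given arguments and waits for it.
// Returns the child's exit code (0..255), or -1 if fork/waitpid failed or the
// child did not exit normally; in that case `error` describes the failure.
int run_and_wait(char* const argv[], std::string& error) {
    pid_t pid = fork();
    if (pid == -1) {
        // fork() itself failed; errno says why (EAGAIN usually means a
        // process limit was hit, ENOMEM means memory or swap ran short).
        error = std::string("fork: ") + std::strerror(errno);
        return -1;
    }
    if (pid == 0) {
        // Child: execvp() returns only if the exec failed.
        execvp(argv[0], argv);
        std::perror("execvp");
        _exit(127);  // assumed convention for "could not exec"
    }
    // Parent: wait for this specific child, retrying if a signal interrupts.
    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r == -1) {
        error = std::string("waitpid: ") + std::strerror(errno);
        return -1;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    error = "child did not exit normally";
    return -1;
}
```

Logging `error` whenever the helper returns -1 would show whether the intermittent failures come from fork() (for example EAGAIN) rather than from the child program itself.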
Joe
--
Folks who don't know why America is the Land of Promise should be here
during an election campaign.
-- Milton Berle
> Post some code if you want more help.
Really, guys, this discussion belongs off-line or in a forum that is more
appropriate. This newsgroup doesn't discuss platform-specific stuff like
processes/threads, etc. This is a language newsgroup.
-Howard
"Howard" <al*****@hotmail.com> writes:
>> Post some code if you want more help.
> Really, guys, this discussion belongs off-line or in a forum that is more appropriate. This newsgroup doesn't discuss platform-specific stuff like processes/threads, etc. This is a language newsgroup.
This thread is cross-posted to comp.lang.c++, comp.unix.programmer,
comp.lang.c, and comp.unix.solaris. It's probably appropriate in
comp.unix.programmer and/or comp.unix.solaris, which do discuss
platform-specific stuff. Please trim the newsgroups line on any
followups.
--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
vi*************@mantas.com (Vineet) writes:
> Basically, at random, a handful of the process immediately fail (before actually doing anything) and return an exit code of 4294967295 (UINT_MAX). I imagine that this is just an umbrella status code for all unexpected/unexplained errors, so I'm not sure if it means anything?
A process cannot fail with an exit code of 4294967295 (UINT_MAX);
the only valid exit codes are between 0 and 255 (inclusive).
So the first question is: what is returning -1? (Whatever returns
a number with all bits set is more likely to be returning -1 than UINT_MAX.)
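Both halves of this observation can be checked with a few lines of code. The sketch below is illustrative only: `as_unsigned` and `observed_exit_code` are hypothetical names, and the figure 4294967295 assumes a 32-bit unsigned int.

```cpp
#include <climits>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Converting a negative int to unsigned wraps modulo UINT_MAX + 1, so -1
// becomes UINT_MAX (4294967295 when unsigned int is 32 bits wide).
static_assert(static_cast<unsigned int>(-1) == UINT_MAX,
              "-1 converted to unsigned int is UINT_MAX");

unsigned int as_unsigned(int value) {
    return static_cast<unsigned int>(value);
}

// Forks a child that exits with `requested` and returns the exit code the
// parent actually observes through waitpid(), or -1 if fork/waitpid failed
// or the child did not exit normally.
int observed_exit_code(int requested) {
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        _exit(requested);  // WEXITSTATUS exposes only the low 8 bits
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A child that calls `_exit(-1)` is seen by the parent as exiting with 255, so a reported value of 4294967295 most plausibly comes from a function's -1 return value being stored or printed as an unsigned int, not from a child's exit code.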
> One thing to note is that if I just trap this error and execute the process again, it runs fine. It just seems like at the time the fork/exec takes place, something in the system temporarily screws up but I don't know what. Of course I do have a workaround (just re-run the process) but I'd like to know what's going on.
Have you tried "truss -f"?
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Howard wrote:
> .... snip ...
> Really, guys, this discussion belongs off-line or in a forum that is more appropriate. This newsgroup doesn't discuss platform-specific stuff like processes/threads, etc. This is a language newsgroup.
So, instead of simply muttering, set followups.
--
Replies should be to the newsgroup
Replace this line to publish your e-mail address.
In article <40***********************@news.xs4all.nl>, Casper H.S. Dik wrote:
>> fork/exec takes place, something in the system temporarily screws up but I don't know what. Of course I do have a workaround (just re-run the process) but I'd like to know what's going on.
> Have you tried "truss -f"?
This might help spot the difference between 2 truss outputs
(which you name on the command line). Random interleaving
of parent and child contributions is left for you to sort
out in this version. I might have done more if I'd thought
I'd need it regularly.
#!/usr/bin/perl -w

# Print the most recent lines (up to 6) read from each file, for context.
sub display
{
    for ($ln=$lineno-5; $ln<=$lineno; $ln++) {
        next if ($ln<1);
        printf("%s(%d) %s\n", $ARGV[0], $ln, $left{$ln});
    }
    print "\n";
    for ($ln=$lineno-5; $ln<=$lineno; $ln++) {
        next if ($ln<1);
        printf("%s(%d) %s\n", $ARGV[1], $ln, $right{$ln});
    }
}

# Parse one truss line into the globals $syscall (name before the first
# parenthesis) and $result (last whitespace-separated token); "NA" if absent.
sub lineparse
{
    $_=shift;
    TEST: while (1) {
        $syscall="NA";
        $result="NA";
        if ($_ =~ /\s(\S+)$/) {
            $result=$1;
        }
        if ($_ =~ /^\d+:\s+([^\(]+)\(/) {
            $syscall=$1;
            last TEST;
        }
        # Informational lines of the form "pid: *** ... ***"
        if ($_ =~ /^\d+:\s+\*\*\*.*\*\*\*$/) {
            print "$_\n";
            last TEST;
        }
        die("RE not matched: $_\n");
    }
}

########################################
open(LFH, "<$ARGV[0]") or die("open $ARGV[0]");
open(RFH, "<$ARGV[1]") or die("open $ARGV[1]");
for($lineno=1;; $lineno++) {
    $left=<LFH>;
    $right=<RFH>;
    # Both files ended on the same line: no difference was found.
    last if ( (!defined($left)) && (!defined($right)) );
    if ( (!defined($left)) && (defined($right)) ) {
        die("end of $ARGV[0]");
    }
    if ( (defined($left)) && (!defined($right)) ) {
        die("end of $ARGV[1]");
    }
    chomp($left);
    chomp($right);
    #print "DEBUG $left\n";
    lineparse($left);
    $left_syscall=$syscall;
    $left_result=$result;
    $left{$lineno}=$left;
    lineparse($right);
    $right_syscall=$syscall;
    $right_result=$result;
    $right{$lineno}=$right;
    # Different system call at the same position in the two traces.
    if ($right_syscall ne $left_syscall) {
        print "syscall difference\n\n";
        display();
        exit(1);
    }
    # One side returned a number, the other did not (e.g. an error string).
    if ( ($right_result =~ /^\d+$/) && ($left_result !~ /^\d+$/)) {
        print "Non-Numerical Result (left)\n\n";
        display();
        exit(1);
    }
    if ( ($right_result !~ /^\d+$/) && ($left_result =~ /^\d+$/)) {
        print "Non-Numerical Result (right)\n\n";
        display();
        exit(1);
    }
    # One side returned 0 and the other returned something else.
    if ( ( ("0" eq $right_result) && ("0" ne $left_result) ) ||
         ( ("0" ne $right_result) && ("0" eq $left_result) ) ){
        print "Numerical Results\n\n";
        display();
        exit(1);
    }
}
exit(0);
--
Elvis Notargiacomo master AT barefaced DOT cheek http://www.notatla.org.uk/goen/