Great, thanks to everyone who offered advice. I guess this question was more
off-topic than I thought; sorry about that.
The problem of child processes returning before the parent has a chance to
register them as having been started hadn't even occurred to me. Although I
don't think that was the problem in this case, I've fixed it so it
doesn't come back to haunt me.
After taking in everyone's advice and making a few modifications to my list
manipulation functions, I'm much happier with my understanding of how linked
lists work (this is the first time I've used any kind of linked list in a
program). However, that alone didn't resolve the problem. As Jens mentioned,
it turned out that calling free() inside a signal handler was the cause.
I've changed my code to instead just call waitpid() when it's important to
have accurate child information, and all works perfectly :)
Again my thanks.
~Kieran Simkin
Digital Crocus
http://digital-crocus.com/
"Kieran Simkin" <kieran@digital-crocus.com> wrote in message
news:Rq6nc.43$3p1.34@newsfe1-win...[color=blue]
> Hi all,
> I'm having some trouble with a linked list function and was wondering if
> anyone could shed any light on it. Basically I have a singly-linked list
> which stores pid numbers of a process's children - when a child is fork()ed
> its pid is added to the linked list. I then have a SIGCHLD handler which is
> supposed to remove the pid from the list when a child exits. The problem I'm
> having is that very very occasionally and seemingly unpredictably, my
> function to remove an item from the list is just failing silently to do so;
> a child exits, the signal handler gets run, but the child's pid does not get
> removed from the list. I'm 99% sure this is a problem with my linked list
> code and not my signal handling code so I hope this is not off-topic for
> comp.lang.c.
> Anyway, here's my code. First the definitions for my structures:
>
> struct queue_node {
>     char ipaddr[IPSIZE+1];
>     int pid;
>     struct queue_node *next;
> };
>
> struct queue_list {
>     struct queue_node *head;
>     int elements;
> };
>
> struct queue_list.head points to the first node in the list and I refer to
> my list by passing a pointer to a struct queue_list as an argument to
> functions.
>
> Here's my qdeletepid() function, this function removes a node from anywhere
> in the list if its pid field matches the pid passed to the function:
>
> int qdeletepid (struct queue_list *pqueue, int pid) {
>     struct queue_node *lcur;
>     struct queue_node *lprev;
>     if (pqueue->elements == 0) {
>         listop=FALSE;
>         return(0);
>     }
>     if (pqueue->elements == 1) {
>         lcur=pqueue->head;
>         if (lcur->pid == pid) {
>             pqueue->head=NULL;
>             pqueue->elements=0;
>             free(lcur);
>         }
>         listop=FALSE;
>         return(0);
>     }
>     lcur=pqueue->head;
>     lprev=NULL;
>     while (lcur!=NULL) {
>         if (lcur->pid == pid) {
>             if (lprev==NULL) {
>                 pqueue->head=NULL;
>                 pqueue->elements=0;
>             } else {
>                 lprev->next=lcur->next;
>                 pqueue->elements=pqueue->elements-1;
>             }
>             free(lcur);
>         }
>         lprev=lcur;
>         lcur=lcur->next;
>     }
>     return(0);
> }
>
> ----------
> For completeness, here's my qaddpid() function:
>
> int qaddpid (struct queue_list *nqueue, char *ip, int pid) {
>     struct queue_node *new;
>     struct queue_node *cur;
>     struct queue_node *prev;
>     listop=TRUE;
>     new= (struct queue_node *) malloc(sizeof(struct queue_node));
>     if (new==NULL) {
>         syslog(LOG_NOTICE,"Failed to malloc");
>         return(1);
>     }
>     new->next=NULL;
>     snprintf(new->ipaddr,IPSIZE+1,"%s",ip);
>     new->pid=pid;
>     prev=NULL;
>     cur=nqueue->head;
>     while (cur != NULL) {
>         prev=cur;
>         cur=cur->next;
>     }
>     if (prev!=NULL) {
>         new->next=prev->next;
>         prev->next=new;
>     } else {
>         nqueue->head=new;
>     }
>     nqueue->elements++;
>     return(0);
> }
>
> ------
> And here's the function I have set to handle sigchld:
>
> void childhandle (int signum) {
>     pid_t cpid=0;
>     while ((cpid=waitpid(0,NULL,WNOHANG)) > 0)
>         qdeletepid(&queue,cpid);
> }
> --
> Any ideas why qdeletepid is failing to remove nodes from the list very
> occasionally? As this happens only rarely (but the script forks a lot of
> children constantly 24/7 so over the course of a couple of days the list
> gets progressively bigger and bigger) it's very difficult to debug without
> knowing the reason the problem is occurring.
>
> Help very much appreciated.
>
> ~Kieran Simkin
> Digital Crocus
>
> http://digital-crocus.com/
>
>[/color]
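
Two things in the quoted qdeletepid() are worth flagging separately from the free()-in-a-signal-handler problem. When the matching node is the head of a list holding more than one element, it sets head to NULL and elements to 0, which discards every remaining node. And after free(lcur) the loop still reads lcur->next, which touches freed memory. A possible rewrite is sketched below; qdeletepid_fixed() is a hypothetical name, the IPSIZE value is an assumption (the post does not show it), and unlike the original it returns 1 when a node was removed and stops at the first match.

```c
#include <assert.h>
#include <stdlib.h>

#define IPSIZE 45   /* assumption: the original value is not shown in the post */

struct queue_node {
    char ipaddr[IPSIZE + 1];
    int pid;
    struct queue_node *next;
};

struct queue_list {
    struct queue_node *head;
    int elements;
};

/*
 * Remove the first node whose pid matches.
 * Returns 1 if a node was removed, 0 if no node matched.
 * Calls free(), so it must not be called from a signal handler.
 */
int qdeletepid_fixed(struct queue_list *pqueue, int pid)
{
    struct queue_node **link;

    assert(pqueue != NULL);

    /* link always points at the pointer that leads to the current node:
     * first pqueue->head, then each node's next field. */
    for (link = &pqueue->head; *link != NULL; link = &(*link)->next) {
        if ((*link)->pid == pid) {
            struct queue_node *victim = *link;

            *link = victim->next;   /* same code for head, middle and tail */
            pqueue->elements--;
            free(victim);
            return 1;               /* return before touching freed memory */
        }
    }
    return 0;
}
```

Walking a pointer to the link rather than a pair of current/previous pointers removes the special cases for empty and single-element lists, which is where the original version goes wrong.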