
extract all hotmail email addresses in a file and store in separate file

P: n/a
Hi, I have a text file that contains a list of email addresses like
this:

"fo*@yahoo.com"
"to*@hotmail.com"
"je***@gmail.com"
"to***@apple.com"

I would like to:

1. Strip out the " characters and leave just the email address on
each line.
2. Extract the hotmail addresses and store them in another file.
The hotmail addresses would be deleted from the original file.

Thanks for any help
Jun 27 '08 #1
45 Replies


P: n/a
On Jun 18, 3:33 pm, Dennis <dcho...@gmail.com> wrote:
Hi, I have a text file that contents a list of email addresses like
this:

"f...@yahoo.com"
"t...@hotmail.com"
"je...@gmail.com"
"to...@apple.com"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
The hotmail addresses in the original file would be deleted.

Thanks for any help
open INFILE, "<all_emails.txt";
open HOTMAIL, ">hotmail_only.txt";
open NOTHOTMAIL, ">not_hotmail.txt";
while(<INFILE>)
{
$_ =~ s/"//g;
print HOTMAIL if $_ =~ /hotmail/i;
print NOTHOTMAIL if $_ != /hotmail/i;
}
close INFILE;
close HOTMAIL;
close NOTHOTMAIL;
Jun 27 '08 #2

P: n/a
szr
Dennis wrote:
Hi, I have a text file that contents a list of email addresses like
this:

"fo*@yahoo.com"
"to*@hotmail.com"
"je***@gmail.com"
"to***@apple.com"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
The hotmail addresses in the original file would be deleted.

Thanks for any help
To get you started, assuming there are no escaped quotes in between:

while (m!"([^"]+?)"!g) {
if ($1 =~ m!\@hotmail\.com!) {
... do something with $1
}
else { ... }
}
Or if you are using an array:

foreach my $email (map { s!"([^"]+?)"!$1!g; $_; } @email_list) {
if ($email =~ m!\@hotmail\.com!) {
...
}
else { ... }
}

(Note, untested, but should give a starting point.)

--
szr
Jun 27 '08 #4

P: n/a
On 18 Jun 2008 at 20:01, cartercc wrote:
> On Jun 18, 3:33 pm, Dennis <dcho...@gmail.com> wrote:
>> I like to
>>
>> 1. Strip out the " characters and just leave the email addresses on
>> each line.
>> 2. extract out the hotmail addresses and store it into another file.
>> The hotmail addresses in the original file would be deleted.
>
> open INFILE, "<all_emails.txt";
> open HOTMAIL, ">hotmail_only.txt";
> open NOTHOTMAIL, ">not_hotmail.txt";
> while(<INFILE>)
> {
> $_ =~ s/"//g;
> print HOTMAIL if $_ =~ /hotmail/i;
> print NOTHOTMAIL if $_ != /hotmail/i;
> }
> close INFILE;
> close HOTMAIL;
> close NOTHOTMAIL;

Firstly, you mean !~ instead of !=. Secondly, referring to $_ all the
time is unnecessary. Try:

open INFILE, "< all_emails.txt";
open HOTMAIL, "> hotmail_only.txt";
open NOTHOTMAIL, "> not_hotmail.txt";
while(<INFILE>)
{
s/"//g;
if (/hotmail/i) {
print HOTMAIL;
} else {
print NOTHOTMAIL;
}
}
close INFILE;
close HOTMAIL;
close NOTHOTMAIL;

You might also like to include some error-checking, and avoid
hard-coding the paths.
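
For readers on the C side of this cross-posted thread, here is a minimal sketch of the same two suggestions: check every open, and take the paths as parameters instead of hard-coding them. The function names split_hotmail and drop_quotes, the 512-byte line buffer, and the case-sensitive "@hotmail.com" match are all assumptions of this sketch, not something posted in the thread.

```c
#include <stdio.h>
#include <string.h>

/* Removes every '"' from s in place. */
static void drop_quotes(char *s)
{
    char *out = s;

    for (; *s != '\0'; ++s) {
        if (*s != '"')
            *out++ = *s;
    }
    *out = '\0';
}

/* Reads in_path line by line, strips quotes, and writes lines containing
   "@hotmail.com" to hot_path and all other lines to other_path.
   Returns 0 on success, -1 if a file cannot be opened or I/O fails. */
int split_hotmail(const char *in_path, const char *hot_path,
                  const char *other_path)
{
    char line[512];   /* assumption: longer lines are handled in pieces */
    FILE *in, *hot, *other;
    int status = 0;

    in = fopen(in_path, "r");
    if (in == NULL) { perror(in_path); return -1; }

    hot = fopen(hot_path, "w");
    if (hot == NULL) { perror(hot_path); fclose(in); return -1; }

    other = fopen(other_path, "w");
    if (other == NULL) {
        perror(other_path);
        fclose(in);
        fclose(hot);
        return -1;
    }

    while (fgets(line, sizeof line, in) != NULL) {
        drop_quotes(line);
        /* the newline read by fgets is kept, so fputs adds none */
        if (fputs(line, strstr(line, "@hotmail.com") ? hot : other) == EOF) {
            status = -1;
            break;
        }
    }
    if (ferror(in))
        status = -1;

    fclose(in);
    if (fclose(hot) == EOF) status = -1;
    if (fclose(other) == EOF) status = -1;
    return status;
}
```

A caller might check argc and then pass the command-line arguments straight through, for example split_hotmail(argv[1], argv[2], argv[3]). Unlike the /hotmail/i match above, strstr() here is case-sensitive, and only one line is held in memory at a time.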

Jun 27 '08 #5

P: n/a

"Dennis" <dc*****@gmail.com> wrote in message
news:e4**********************************@m36g2000hse.googlegroups.com...
Hi, I have a text file that contents a list of email addresses like
this:

"fo*@yahoo.com"
"to*@hotmail.com"
"je***@gmail.com"
"to***@apple.com"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
The hotmail addresses in the original file would be deleted.
You have perl solutions so you won't need this. But it was an interesting
little snippet:

/* Sort email addresses (possibly for some nefarious purpose) from file
   "input" */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void error(void) {puts("File error"); exit(EXIT_FAILURE);}

int main(void) {
    char line[200];
    char *p;
    size_t n;

    FILE *in,*hot,*nothot;

    in=fopen("input","r");
    if (in==0) error();

    hot=fopen("hotmail","w");
    if (hot==0) {fclose(in); error();}

    nothot=fopen("nothotmail","w");
    /* close the two files that did open before giving up */
    if (nothot==0) {fclose(in); fclose(hot); error();}

    /* stop when fgets() fails, so a last line without '\n' is still handled */
    while (fgets(line,sizeof(line),in)) {

        n=strlen(line);
        p=line;
        /* compare with ==; a single = would overwrite the character */
        if (n && line[n-1]=='\n') {line[n-1]=0; --n;}
        if (n) {
            if (line[n-1]=='"') {line[n-1]=0; --n;}   /* trailing quote */
            if (*p=='"') ++p;                         /* leading quote */

            if (strstr(p,"@hotmail.com"))
                fprintf(hot,"%s\n",p);
            else
                fprintf(nothot,"%s\n",p);
        }
    }

    fclose(in);
    fclose(hot);
    fclose(nothot);
    return 0;
}
--
Bartc
Jun 27 '08 #6

P: n/a
On Wed, 18 Jun 2008 12:33:45 -0700, Dennis wrote:
Hi, I have a
<snip />

Why are you cross-posting this to C and Perl newsgroups?
Rui Maciel
Jun 27 '08 #7

P: n/a
In article <48***********************@news.telepac.pt>,
Rui Maciel <ru********@gmail.com> wrote:
> On Wed, 18 Jun 2008 12:33:45 -0700, Dennis wrote:
>> Hi, I have a
>> <snip />
>
> Why are you cross-posting this to C and Perl newsgroups?
> Rui Maciel

I assume because he is interested in a C/Perl solution to his problem.

Jun 27 '08 #8

P: n/a
On Jun 18, 8:33 pm, Dennis <dcho...@gmail.com> wrote:
> 1. Strip out the " characters and just leave the email addresses on
> each line.

char const *const original = "\"bo*@hotmail.com\"";

char buf[50];

strcpy(buf,original);

buf[strlen(original) - 1] = 0;

> 2. extract out the hotmail addresses and store it into another file.

Take the last 12 characters, make them all lowercase, and then compare
with "@hotmail.com".

I'm not big up on the file access functions, I usually just consult
the reference at dinkumware.com when I want to use them.
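
The "last 12 characters" idea above can be sketched as follows. This is only an illustration: ends_with_hotmail is a made-up name, and the sketch assumes the quotes and the trailing newline have already been removed from the string.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if addr ends with "@hotmail.com", ignoring case; otherwise 0.
   ends_with_hotmail is a hypothetical helper name for this sketch. */
int ends_with_hotmail(const char *addr)
{
    static const char suffix[] = "@hotmail.com";   /* 12 characters */
    size_t suffix_len = sizeof suffix - 1;
    size_t addr_len = strlen(addr);
    size_t i;

    if (addr_len < suffix_len)
        return 0;

    addr += addr_len - suffix_len;   /* point at the last 12 characters */
    for (i = 0; i < suffix_len; ++i) {
        /* tolower() needs a value representable as unsigned char */
        if (tolower((unsigned char)addr[i]) != suffix[i])
            return 0;
    }
    return 1;
}
```

Comparing character by character avoids making a lowercased copy of the tail, and checking only the suffix means an address such as hotmail@yahoo.com is not counted as a hotmail address.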
Jun 27 '08 #9

P: n/a
Dennis <dc*****@gmail.com> wrote:
> Hi, I have a text file that contents a list of email addresses like
> this:
>
> "fo*@yahoo.com"
> "to*@hotmail.com"
> "je***@gmail.com"
> "to***@apple.com"
>
> I like to
>
> 1. Strip out the " characters and just leave the email addresses on
> each line.

'perldoc perlop' and look for either tr/// in combination with the 'd'
option or s/// in combination with the 'g' option. Maybe in combination
with anchoring the RE.

Or you can use substr() to grab anything between the first and last
character, excluding both.

> 2. extract out the hotmail addresses

perldoc -f grep

> and store it into another file.

perldoc -f open

> The hotmail addresses in the original file would be deleted.

perldoc -q "delete a line"

jue
Jun 27 '08 #10

P: n/a
Dennis wrote:
Hi, I have a text file that contents a list of email addresses like
this:

"fo*@yahoo.com"
"to*@hotmail.com"
"je***@gmail.com"
"to***@apple.com"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
The hotmail addresses in the original file would be deleted.

Thanks for any help
/* BEGIN new.c output */

Original original file contents:
"fo*@yahoo.com"
"to*@hotmail.com"
"je***@gmail.com"
"to***@apple.com"

Final original file contents:
fo*@yahoo.com
je***@gmail.com
to***@apple.com

Final other file contents:
to*@hotmail.com

/* END new.c output */

/* BEGIN new.c */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

#define STRINGS \
{ "\"fo*@yahoo.com\"", "\"to*@hotmail.com\"", \
"\"je***@gmail.com\"", "\"to***@apple.com\""}

struct list_node {
struct list_node *next;
void *data;
};

typedef struct list_node list_type;

void squeeze(char *s1, const int s2);
int get_line(char **lineptr, size_t *n, FILE *stream);
int list_fputs(const list_type *node, FILE *stream);
list_type *list_append
(list_type **head, list_type *tail, void *data, size_t size);
void list_free(list_type *node, void (*free_data)(void *));

int main (void)
{
int rc;
size_t n;
char fn[2][L_tmpnam];
FILE *fp[2];
char *string[] = STRINGS;
size_t size = 0;
char *buff = NULL;
list_type *head = NULL;
list_type *tail = NULL;

puts("/* BEGIN new.c output */\n");
/*
** Create input file
*/
tmpnam(fn[0]);
tmpnam(fn[1]);
fp[0] = fopen(fn[0], "w");
if (fp[0] == NULL) {
fputs("fopen(fn[0]), \"w\") == NULL\n", stderr);
exit(EXIT_FAILURE);
}
for (n = 0; n != sizeof string / sizeof *string; ++n) {
fprintf(fp[0], "%s\n", string[n]);
}
fclose(fp[0]);
/*
** Read input file into list
*/
fp[0] = fopen(fn[0], "r");
if (fp[0] == NULL) {
fputs("fopen(fn[0], \"r\") == NULL\n", stderr);
exit(EXIT_FAILURE);
}
while ((rc = get_line(&buff, &size, fp[0])) > 0) {
tail = list_append(&head, tail, buff, rc);
if (tail == NULL) {
fputs("tail == NULL\n", stderr);
break;
}
}
fclose(fp[0]);
/*
** Display input file contents
*/
puts("Original original file contents:");
list_fputs(head, stdout);
putchar('\n');
/*
** Strip out quotes from strings in memory
*/
for (tail = head; tail != NULL; tail = tail->next) {
squeeze(tail->data, '"');
}
/*
** Create output files
*/
fp[0] = fopen(fn[0], "w");
if (fp[0] == NULL) {
fputs("fopen(fn[0]), \"w\") == NULL\n", stderr);
exit(EXIT_FAILURE);
}
fp[1] = fopen(fn[1], "w");
if (fp[1] == NULL) {
remove(fn[0]);
fputs("fopen(fn[1]), \"w\") == NULL\n", stderr);
exit(EXIT_FAILURE);
}
for (tail = head; tail != NULL; tail = tail->next) {
if (strstr(tail->data, "hotmail") == NULL) {
fprintf(fp[0], "%s\n", tail->data);
} else {
fprintf(fp[1], "%s\n", tail->data);
}
}
list_free(head, free);
tail = head = NULL;
fclose(fp[0]);
fclose(fp[1]);
/*
** Read original file
** Display original file contents
*/
fp[0] = fopen(fn[0], "r");
if (fp[0] == NULL) {
fputs("fopen(fn[0], \"r\") == NULL\n", stderr);
exit(EXIT_FAILURE);
}
puts("Final original file contents:");
while ((rc = get_line(&buff, &size, fp[0])) > 0) {
puts(buff);
}
putchar('\n');
fclose(fp[0]);
/*
** Read other file
** Display other file contents
*/
fp[1] = fopen(fn[1], "r");
if (fp[1] == NULL) {
fputs("fopen(fn[1], \"r\") == NULL\n", stderr);
exit(EXIT_FAILURE);
}
puts("Final other file contents:");
while ((rc = get_line(&buff, &size, fp[1])) > 0) {
puts(buff);
}
putchar('\n');
free(buff);
buff = NULL;
size = 0;
fclose(fp[1]);
remove(fn[0]);
remove(fn[1]);
puts("/* END new.c output */");
return 0;
}

void squeeze(char *s1, const int c)
{
char *p;

for (p = s1; *s1 != '\0'; ++s1) {
if (c != *s1) {
*p++ = *s1;
}
}
*p = '\0';
}

int get_line(char **lineptr, size_t *n, FILE *stream)
{
int rc;
void *p;
size_t count;
/*
** The (char) casts in this function are not required
** by the rules of the C programming language.
*/
count = 0;
while ((rc = getc(stream)) != EOF
|| !feof(stream) && !ferror(stream))
{
++count;
if (count == (size_t)-2) {
if (rc != '\n') {
(*lineptr)[count] = '\0';
(*lineptr)[count - 1] = (char)rc;
} else {
(*lineptr)[count - 1] = '\0';
}
break;
}
if (count + 2 > *n) {
p = realloc(*lineptr, count + 2);
if (p == NULL) {
if (*n > count) {
if (rc != '\n') {
(*lineptr)[count] = '\0';
(*lineptr)[count - 1] = (char)rc;
} else {
(*lineptr)[count - 1] = '\0';
}
} else {
if (*n != 0) {
**lineptr = '\0';
}
ungetc(rc, stream);
}
count = 0;
break;
}
*lineptr = p;
*n = count + 2;
}
if (rc != '\n') {
(*lineptr)[count - 1] = (char)rc;
} else {
(*lineptr)[count - 1] = '\0';
break;
}
}
if (rc != EOF || !feof(stream) && !ferror(stream)) {
rc = INT_MAX > count ? count : INT_MAX;
} else {
if (*n > count) {
(*lineptr)[count] = '\0';
}
}
return rc;
}

int list_fputs(const list_type *node, FILE *stream)
{
int rc = 0;

while (node != NULL
&& (rc = fputs(node->data, stream)) != EOF
&& (rc = putc('\n', stream)) != EOF)
{
node = node->next;
}
return rc;
}

list_type *list_append
(list_type **head, list_type *tail, void *data, size_t size)
{
list_type *node;

node = malloc(sizeof *node);
if (node != NULL) {
node->next = NULL;
node->data = malloc(size);
if (node->data != NULL) {
memcpy(node->data, data, size);
if (*head != NULL) {
tail->next = node;
} else {
*head = node;
}
} else {
free(node);
node = NULL;
}
}
return node;
}

void list_free(list_type *node, void (*free_data)(void *))
{
list_type *next_node;

while (node != NULL) {
next_node = node->next;
free_data(node->data);
free(node);
node = next_node;
}
}

/* END new.c */

--
pete
Jun 27 '08 #11

P: n/a
pete <pf*****@mindspring.com> fell face-first on the keyboard. This was
the result: news:oe******************************@earthlink.com:
<snip 250+ lines of C>

Wow - All that just to separate @hotmail.com from anything else ? I'm
glad I stuck with perl :)

--
Marc Bissonnette
Looking for a new ISP? http://www.canadianisp.com
Largest ISP comparison site across Canada.
Jun 27 '08 #12

P: n/a
On Jun 19, 1:02 am, Tomás Ó hÉilidhe <t...@lavabit.com> wrote:
> On Jun 18, 8:33 pm, Dennis <dcho...@gmail.com> wrote:
>> 1. Strip out the " characters and just leave the email addresses on
>> each line.
>
> char const *const original = "\"b...@hotmail.com\"";
>
> char buf[50];
>
> strcpy(buf,original);
>
> buf[strlen(original) - 1] = 0;

That does not strip both of the " characters.
char const is confusing, and the second const is unnecessary.
fix:
const char *original = "\"...\"";

>> 2. extract out the hotmail addresses and store it into another file.
>
> Take the last 12 characters, make them all lowercase, and then compare
> with "@hotmail.com".

@ does not belong to C's basic character set, so, that's not possible.

Jun 27 '08 #13

P: n/a
On Wed, 18 Jun 2008 12:33:45 -0700, Dennis wrote:
Hi, I have a text file that contents a list of email addresses like
this:

"fo*@yahoo.com"
"to*@hotmail.com"
"je***@gmail.com"
"to***@apple.com"

I like to

1. Strip out the " characters and just leave the email addresses on each
line.
2. extract out the hotmail addresses and store it into another file. The
hotmail addresses in the original file would be deleted.
perl -nie 'if (/\@hotmail.com@$/) { s/"//g; print; }' text_file

HTH,
M4
Jun 27 '08 #14

P: n/a

"Marc Bissonnette" <dragnet\_@_/internalysis.com> wrote in message
news:Xn*********************************@216.196.97.131...
> pete <pf*****@mindspring.com> fell face-first on the keyboard. This was
> the result: news:oe******************************@earthlink.com:
>> Dennis wrote:
>>> Hi, I have a text file that contents a list of email addresses like
>>> this:
>> /* BEGIN new.c output */
>> <snip 250+ lines of C>
>
> Wow - All that just to separate @hotmail.com from anything else ? I'm
> glad I stuck with perl :)

I think pete just enjoys writing huge amounts of C code. Or showing off..

I thought my 50-line answer (posted to comp.lang.c only) might have been a
bit long because it didn't make clever use of scanf(), but at least it could
deal with /any number/ of email addresses from a file.

This code I /think/ only deals with the 4 email addresses in the OP's
example..

--
Bartc
Jun 27 '08 #15

P: n/a
On Jun 19, 12:13 pm, "Bartc" <b...@freeuk.com> wrote:
> "Marc Bissonnette" <dragnet\_@_/internalysis.com> wrote in message
> news:Xn*********************************@216.196.97.131...
>> pete <pfil...@mindspring.com> fell face-first on the keyboard. This was
>> the result: news:oe******************************@earthlink.com:
>>> Dennis wrote:
>>>> Hi, I have a text file that contents a list of email addresses like
>>>> this:
>>> /* BEGIN new.c output */
>>> <snip 250+ lines of C>
>>
>> Wow - All that just to separate @hotmail.com from anything else ? I'm
>> glad I stuck with perl :)
>
> I think pete just enjoys writing huge amounts of C code. Or showing off..

Or using concrete functions he has written in the past to write
concrete programs.
<snip>

Jun 27 '08 #16

P: n/a

<vi******@gmail.com> wrote in message
news:56**********************************@l64g2000hse.googlegroups.com...
> On Jun 19, 12:13 pm, "Bartc" <b...@freeuk.com> wrote:
>> "Marc Bissonnette" <dragnet\_@_/internalysis.com> wrote in message
>> news:Xn*********************************@216.196.97.131...
>>> pete <pfil...@mindspring.com> fell face-first on the keyboard. This was
>>> the result: news:oe******************************@earthlink.com:
>>>> Dennis wrote:
>>>>> Hi, I have a text file that contents a list of email addresses like
>>>>> this:
>>>> /* BEGIN new.c output */
>>>> <snip 250+ lines of C>
>>>
>>> Wow - All that just to separate @hotmail.com from anything else ? I'm
>>> glad I stuck with perl :)
>>
>> I think pete just enjoys writing huge amounts of C code. Or showing off..
>
> Or using concrete functions he has written in the past to write
> concrete programs.

I thought it was some sort of unwritten rule here that when posting code
solutions you tend not to import large elements of your own library.
Otherwise everyone would post their own different version of getline() and
so on.

And also there's the possibility, as seems to have happened here, of using
something inappropriate just because it's there. There's no reason at all to
use a linked list to read all the input into memory (and risking
out-of-memory or thrashing for large input).

(Although I suspect pete may have created this over-the-top solution on
purpose..)

> concrete programs.

Which is more concrete, this code which has a memory requirement of N or
code using fixed memory?

--
Bartc
Jun 27 '08 #17

P: n/a
On Jun 19, 12:55 pm, "Bartc" <b...@freeuk.com> wrote:
> <vipps...@gmail.com> wrote in message
> news:56**********************************@l64g2000hse.googlegroups.com...
>> On Jun 19, 12:13 pm, "Bartc" <b...@freeuk.com> wrote:
>>> "Marc Bissonnette" <dragnet\_@_/internalysis.com> wrote in message
>>> news:Xn*********************************@216.196.97.131...
>>>> pete <pfil...@mindspring.com> fell face-first on the keyboard. This was
>>>> the result: news:oe******************************@earthlink.com:
>>>>> Dennis wrote:
>>>>>> Hi, I have a text file that contents a list of email addresses like
>>>>>> this:
>>>>> /* BEGIN new.c output */
>>>>> <snip 250+ lines of C>
>>>>
>>>> Wow - All that just to separate @hotmail.com from anything else ? I'm
>>>> glad I stuck with perl :)
>>>
>>> I think pete just enjoys writing huge amounts of C code. Or showing off..
>>
>> Or using concrete functions he has written in the past to write
>> concrete programs.
>
> I thought it was some sort of unwritten rule here that when posting code
> solutions you tend not to import large elements of your own library.
> Otherwise everyone would post their own different version of getline() and
> so on.

There's no such rule.

> And also there's the possibility, as seems to have happened here, of using
> something inappropriate just because it's there. There's no reason at all to
> use a linked list to read all the input into memory (and risking
> out-of-memory or thrashing for large input).

What do you mean by thrashing? The code risks nothing, as all the calls to
malloc, etc. are checked.

> (Although I suspect pete may have created this over-the-top solution on
> purpose..)

Yes, presumably the purpose was to provide the newbie with a concrete
example.

>> concrete programs.
>
> Which is more concrete, this code which has a memory requirement of N or
> code using fixed memory?

It doesn't matter as long as error checking is there.
Jun 27 '08 #18

P: n/a
vi******@gmail.com wrote:
> On Jun 19, 12:55 pm, "Bartc" <b...@freeuk.com> wrote:
>> <vipps...@gmail.com> wrote in message
>> ...There's no
>> reason at all to use a linked list to read all the input into memory
>> (and risking out-of-memory or thrashing for large input).
>
> What do you mean by thrashing? The code risks nothing as all the calls to
> malloc, etc are checked.

I mean the slow-down that occurs when memory gets nearly full.

>> Which is more concrete, this code which has a memory requirement of
>> N or code using fixed memory?
>
> It doesn't matter as long as error checking is there.

No, "Sorry out of memory" is just as acceptable as "Task completed"!

--
bartc
Jun 27 '08 #19

P: n/a
On Jun 19, 2:29 pm, "Bartc" <b...@freeuk.com> wrote:
> vipps...@gmail.com wrote:
>> On Jun 19, 12:55 pm, "Bartc" <b...@freeuk.com> wrote:
>>> <vipps...@gmail.com> wrote in message
>>> ...There's no
>>> reason at all to use a linked list to read all the input into memory
>>> (and risking out-of-memory or thrashing for large input).
>>
>> What do you mean by thrashing? The code risks nothing as all the calls to
>> malloc, etc are checked.
>
> I mean the slow-down that occurs when memory gets nearly full.

While true, this has nothing to do with C.

>>> Which is more concrete, this code which has a memory requirement of
>>> N or code using fixed memory?
>>
>> It doesn't matter as long as error checking is there.
>
> No, "Sorry out of memory" is just as acceptable as "Task completed"!

A concrete example of code is one that cannot "break", i.e. behave
unexpectedly.
Jun 27 '08 #20

P: n/a
On Jun 19, 6:41 am, vipps...@gmail.com wrote:
> That does not strip both of the " characters

Wups, meant to write strcpy(buf,original+1);

> char const is confusing, and the second const is unnecessary.
> fix:
> const char *original = "\"...\"";

"char const" is confusing? :-O

You're right that the second const is unnecessary, just like my
breakfast this morning was unnecessary. I

> @ does not belong to C's basic character set, so, that's not possible.

I had a feeling it mightn't be.

One might argue that if you're dealing with strings that have an @
symbol in them on a particular platform, that the compiler for that
platform will have the @ character.
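
A small sketch of the quote-stripping step with both ends handled. One detail worth noting: once the copy starts at original + 1, the closing quote sits at index strlen(original) - 2 of buf, so that is the byte to overwrite. strip_quotes is a hypothetical helper name, and the explicit size check is an addition that was not in the snippet under discussion.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst without one leading and one trailing '"'.
   Returns 0 on success, -1 if dst (of dst_size bytes) is too small.
   strip_quotes is a hypothetical helper name, not from the thread. */
int strip_quotes(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (len > 0 && src[0] == '"') {        /* skip the opening quote */
        ++src;
        --len;
    }
    if (len > 0 && src[len - 1] == '"')    /* drop the closing quote */
        --len;

    if (len + 1 > dst_size)                /* room for the text plus '\0' */
        return -1;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

With char buf[50], a call such as strip_quotes(buf, sizeof buf, original) leaves the bare address in buf, and a string that has no quotes at all is copied unchanged.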

Jun 27 '08 #21

P: n/a
Bartc wrote:
> "Marc Bissonnette" <dragnet\_@_/internalysis.com> wrote in message
> news:Xn*********************************@216.196.97.131...
>> pete <pf*****@mindspring.com> fell face-first on the keyboard. This was
>> the result: news:oe******************************@earthlink.com:
>>> Dennis wrote:
>>>> Hi, I have a text file that contents a list of email addresses like
>>>> this:
>>> /* BEGIN new.c output */
>>> <snip 250+ lines of C>
>>
>> Wow - All that just to separate @hotmail.com from anything else ? I'm
>> glad I stuck with perl :)
>
> I think pete just enjoys writing huge amounts of C code. Or showing off..

I can see why you might think that.

> I thought my 50-line answer (posted to comp.lang.c only) might have been a
> bit long because it didn't make clever use of scanf(), but at least it could
> deal with /any number/ of email addresses from a file.
>
> This code I /think/ only deals with the 4 email addresses in the OP's
> example..

It deals with how many and whichever string literals
are placed into this macro:

#define STRINGS \
{ "\"fo*@yahoo.com\"", "\"to*@hotmail.com\"", \
"\"je***@gmail.com\"", "\"to***@apple.com\""}

The program uses the STRINGS macro to initialize the input file.

--
pete
Jun 27 '08 #22

P: n/a
Bartc wrote:
> <vi******@gmail.com> wrote in message
> news:56**********************************@l64g2000hse.googlegroups.com...
>> On Jun 19, 12:13 pm, "Bartc" <b...@freeuk.com> wrote:
>>> "Marc Bissonnette" <dragnet\_@_/internalysis.com> wrote in message
>>> news:Xn*********************************@216.196.97.131...
>>>> pete <pfil...@mindspring.com> fell face-first on the keyboard.
>>>> This was the
>>>> result: news:oe******************************@earthlink.com:
>>>>> Dennis wrote:
>>>>>> Hi, I have a text file that contents a list of email addresses
>>>>>> like this:
>>>>> /* BEGIN new.c output */
>>>>> <snip 250+ lines of C>
>>>>
>>>> Wow - All that just to separate @hotmail.com from anything else ?
>>>> I'm glad I stuck with perl :)
>>>
>>> I think pete just enjoys writing huge amounts of C code. Or showing
>>> off..
>>
>> Or using concrete functions he has written in the past to write
>> concrete programs.
>
> I thought it was some sort of unwritten rule here that when posting
> code solutions you tend not to import large elements of your own
> library. Otherwise everyone would post their own different version of
> getline() and so on.

As it is, everyone does post different versions of code for the same
task (as this thread itself has brilliantly illustrated), so as long as
the post contains all the code needed to compile into a working,
self-sufficient program, I don't see any harm in including something
from a personal library.

And pete has pre-written functions to read files into linked lists. He
occasionally posts a link here in clc to his website, which contains
this and other C code.

> And also there's the possibility, as seems to have happened here, of
> using something inappropriate just because it's there. There's no
> reason at all to use a linked list to read all the input into memory
> (and risking out-of-memory or thrashing for large input).

Well, reading a file into a linked list isn't exactly inappropriate, but
it may be overkill for the small fragment that the OP posted. But it
could be that the OP's actual file contains hundreds or thousands of
email addresses. Constructing a linked list will obviously take more
storage than a plain linear array, but it makes some tasks like sorting
lines, inserting lines, deleting lines, etc., much easier. I
suspect that this is the reason why pete uses them.

> (Although I suspect pete may have created this over-the-top solution
> on purpose..)

Hmm.

>> concrete programs.
>
> Which is more concrete, this code which has a memory requirement of N
> or code using fixed memory?

Either code could run out of memory on a sufficiently memory-starved
system. Besides, the linked-list approach has other advantages (which
may not be very pertinent to the particular task the OP wanted) which
must be considered in a fair comparison.

Jun 27 '08 #23

P: n/a
On 19 Juni, 07:41, vipps...@gmail.com wrote:
On Jun 19, 1:02 am, Tomás Ó hÉilidhe <t...@lavabit.comwrote:On Jun 18, 8:33 pm, Dennis <dcho...@gmail.comwrote:
1. Strip out the " characters and just leave the email addresses on
each line.
char const *const original = "\"b...@hotmail.com\"";
char buf[50];
strcpy(buf,original);
buf[strlen(original) - 1] = 0;

That does not strip both of the " characters.
char const is confusing, and the second const is unnecessary.
fix:
const char *original = "\"...\"";
2. extract out thehotmailaddresses and store it into another file.
Take the last 12 characters, make them all lowercase, and then compare
with "@hotmail.com".

@ does not belong to C's basic character set, so, that's not possible.
What does this mean? That you can't use '@' in strings without relying
on the particular implementation?
so much for portability
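
For reference, the "take the last 12 characters, lowercase them, and compare" idea quoted above can be sketched as a small helper. This is only an illustration: the name ends_with_ci is made up here, and the function compares against whatever suffix the caller passes, so the helper itself contains no '@' in its source.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `s` ends with `suffix`, ignoring case; otherwise 0.
   The name ends_with_ci is hypothetical, not from the thread. */
int ends_with_ci(const char *s, const char *suffix)
{
    size_t slen = strlen(s);
    size_t xlen = strlen(suffix);
    size_t i;

    if (xlen > slen)
        return 0;                 /* s is too short to end with suffix */

    s += slen - xlen;             /* point at the last xlen characters of s */
    for (i = 0; i < xlen; i++) {
        /* Cast to unsigned char: passing a negative value to tolower()
           is undefined behaviour. */
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)suffix[i]))
            return 0;
    }
    return 1;
}
```

Called on an address that has already had its quotes and newline removed, it returns 1 for a hotmail.com address in any mix of upper and lower case.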
Jun 27 '08 #24

ev**********@hushmail.com wrote:
On 19 Juni, 07:41, vipps...@gmail.com wrote:
<snip>
>@ does not belong to C's basic character set, so, that's not
possible.

What does this mean? That you can't use '@' in strings without relying
on the particular implementation?
The relevant clause in the standard is 5.2.1(3). The extract that
follows is from n1256, which is not the official standard but a working
draft.

-----------
Both the basic source and basic execution character sets shall have the
following members: the 26 uppercase letters of the Latin alphabet

A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

the 26 lowercase letters of the Latin alphabet

a b c d e f g h i j k l m
n o p q r s t u v w x y z

the 10 decimal digits

0 1 2 3 4 5 6 7 8 9

the following 29 graphic characters

! " # % & ' ( ) * + , - . / :
; < = ? [ \ ] ^ _ { | } ~

the space character, and control characters representing horizontal tab,
vertical tab, and form feed. The representation of each member of the
source and execution basic character sets shall fit in a byte. In both
the source and execution basic character sets, the value of each
character after 0 in the above list of decimal digits shall be one
greater than the value of the previous. In source files, there shall be
some way of indicating the end of each line of text; this International
Standard treats such an end-of-line indicator as if it were a single
new-line character. In the basic execution character set, there shall
be control characters representing alert, backspace, carriage return,
and new line. If any other characters are encountered in a source file
(except in an identifier, a character constant, a string literal, a
header name, a comment, or a preprocessing token that is never
converted to a token), the behavior is undefined.
-----------
so much for portability
But in practice most implementations do support the @ character, at
least those that I'm aware of, which is a tiny fraction of all the
implementations out there, so you might disregard my "most"
comment. :-)

Jun 27 '08 #25

Bartc wrote:
>
"Dennis" <dc*****@gmail.com> wrote in message
news:e4**********************************@m36g2000 hse.googlegroups.com...
>Hi, I have a text file that contents a list of email addresses like
this:

"fo*@yahoo.com"
"to*@hotmail.com"
"je***@gmail.com"
"to***@apple.com"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
The hotmail addresses in the original file would be deleted.

You have perl solutions so you won't need this. But was an interesting
little snippet:

/* Sort email addresses (possibly for some nefarious purpose) from
file "input" */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void error(void) {puts("File error"); exit(0);}

int main(void) {
char line[200];
char *p;
int n;

FILE *in,*hot,*nothot;

in=fopen("input","r");
if (in==0) error();

hot=fopen("hotmail","w");
if (hot==0) {fclose(in); error();};

nothot=fopen("nothotmail","w");
if (nothot==0) {fclose(in); fclose(nothot); error();};

while (1) {

fgets(line,sizeof(line),in);
if (feof(in)) break;
fgets could fail due to an I/O error too, not necessarily end-of-file.
You need to check ferror() too before proceeding to be absolutely safe,
since you don't check fgets for a NULL return. The latter strategy
involves only one check unless NULL is returned, but your strategy would
involve two checks (perhaps full function calls) after every fgets
call.
n=strlen(line);
Better to make 'n' unsigned long.
p=line;
if (line[n-1]='\n') {line[n-1]=0; --n;};
if (n) {
if (line[n-1]='""') {line[n-1]=0; --n;};
if (*p=='"') ++p;
if (strstr(p,"@hotmail.com"))
fprintf(hot,"%s\n",p);
else
fprintf(nothot,"%s\n",p);
};
};
fclose(in);
fclose(hot);
fclose(nothot);
}
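
As it appears in the quoted program, the newline and quote tests use = where a comparison was presumably intended, and line[n-1] is read before n is known to be non-zero. A minimal sketch of just that stripping step, written with == and a length guard, might look like this (strip_line is a made-up name, and this is one way to do it, not the poster's code):

```c
#include <string.h>

/* Strip one trailing newline and one pair of surrounding double quotes
   from `line`, in place. Returns a pointer to the first kept character.
   The name strip_line is hypothetical. */
char *strip_line(char *line)
{
    size_t n = strlen(line);

    if (n > 0 && line[n - 1] == '\n')   /* drop the newline fgets() keeps */
        line[--n] = '\0';
    if (n > 0 && line[n - 1] == '"')    /* drop the closing quote */
        line[--n] = '\0';
    if (line[0] == '"')                 /* skip the opening quote */
        line++;
    return line;
}
```

The returned pointer can be passed straight to strstr() or fprintf(), as p is in the program above.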
Jun 27 '08 #26

Bartc wrote:
>
"Dennis" <dc*****@gmail.com> wrote in message
news:e4**********************************@m36g2000 hse.googlegroups.com...
>Hi, I have a text file that contents a list of email addresses like
this:

"fo*@yahoo.com"
"to*@hotmail.com"
"je***@gmail.com"
"to***@apple.com"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
The hotmail addresses in the original file would be deleted.

You have perl solutions so you won't need this. But was an interesting
little snippet:

/* Sort email addresses (possibly for some nefarious purpose) from
file "input" */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void error(void) {puts("File error"); exit(0);}

int main(void) {
char line[200];
char *p;
int n;

FILE *in,*hot,*nothot;

in=fopen("input","r");
if (in==0) error();

hot=fopen("hotmail","w");
if (hot==0) {fclose(in); error();};

nothot=fopen("nothotmail","w");
if (nothot==0) {fclose(in); fclose(nothot); error();};

while (1) {

fgets(line,sizeof(line),in);
if (feof(in)) break;

n=strlen(line);
p=line;
if (line[n-1]='\n') {line[n-1]=0; --n;};
if (n) {
if (line[n-1]='""') {line[n-1]=0; --n;};
Also what do you mean by '""' here? Did you mean to write '"'?

<snip rest>

Jun 27 '08 #27

Bartc wrote:
>
"Dennis" <dc*****@gmail.com> wrote in message
news:e4**********************************@m36g2000 hse.googlegroups.com...
>Hi, I have a text file that contents a list of email addresses like
this:

"fo*@yahoo.com"
"to*@hotmail.com"
"je***@gmail.com"
"to***@apple.com"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
The hotmail addresses in the original file would be deleted.

You have perl solutions so you won't need this. But was an interesting
little snippet:

/* Sort email addresses (possibly for some nefarious purpose) from
file "input" */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void error(void) {puts("File error"); exit(0);}

int main(void) {
char line[200];
Also according to the relevant RFCs (2821 & 2822), just the domain part
of an email address can be up to 255 characters. To be safe you might
want to make line at least 512 bytes.

<snip rest>

Jun 27 '08 #28

ev**********@hushmail.com said:
On 19 Juni, 07:41, vipps...@gmail.com wrote:
<snip>
>@ does not belong to C's basic character set, so, that's not possible.

What does this mean? That you can't use '@' in strings without relying
on the particular implementation?
so much for portability
The portable solution is to read everything in from a file, including the
string you're comparing against that includes the '@' character. If the
system supports '@', it's going to be in the execution character set, and
if it doesn't, there's not a lot of point trying to use it anyway, is
there?
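
A rough sketch of that approach, assuming the pattern sits on the first line of a separate text file: load_pattern and matches are names invented for this example, and the source below contains no '@' at all.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of `cfg` into `buf` (capacity `size`) and drop the
   trailing newline. Returns 1 if a non-empty pattern was read, else 0.
   Both function names here are hypothetical. */
int load_pattern(FILE *cfg, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, cfg) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if any */
    return buf[0] != '\0';
}

/* Return 1 if `pattern` occurs anywhere in `address`, else 0. */
int matches(const char *address, const char *pattern)
{
    return strstr(address, pattern) != NULL;
}
```

The program would call load_pattern() once at start-up and then matches() for each address it reads.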

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Jun 27 '08 #29

On 19 Juni, 17:09, santosh <santosh....@gmail.com> wrote:
evanevank...@hushmail.com wrote:
On 19 Juni, 07:41, vipps...@gmail.com wrote:

<snip>
@ does not belong to C's basic character set, so, that's not
possible.
What does this mean? That you can't use '@' in strings without relying
on the particular implementation?

The relevant clause in the standard is 5.2.1(3). The extract that
follows is from n1256 which not the official standard but a working
draft.

-----------
[snip]
-----------
so much for portability

But in practise most implementations do support the @ character, at
least those that I'm aware of, which is a tiny fraction of all the
implementations out there, so you might disregard my "most"
comment. :-)
Thanks for the reply.

But what would happen if @ is not supported? Is it only in the source
code as character constants it is not supported?
I guess reading text from a file which contains @ works since
it is just bytes. Do toupper(), isdigit() and similar functions
just ignore unsupported characters?

And how much can be assumed to be supported, where does one draw
the line? For example @ is probably supported while a Japanese
character is not, right?
Jun 27 '08 #30

ev**********@hushmail.com wrote:
On 19 Juni, 17:09, santosh <santosh....@gmail.com> wrote:
>evanevank...@hushmail.com wrote:
On 19 Juni, 07:41, vipps...@gmail.com wrote:

<snip>
>@ does not belong to C's basic character set, so, that's not
possible.
What does this mean? That you can't use '@' in strings without
relying on the particular implementation?

The relevant clause in the standard is 5.2.1(3). The extract that
follows is from n1256 which not the official standard but a working
draft.

-----------
[snip]
-----------
so much for portability

But in practise most implementations do support the @ character, at
least those that I'm aware of, which is a tiny fraction of all the
implementations out there, so you might disregard my "most"
comment. :-)

Thanks for the reply.

But what would happen if @ is not supported? Is it only in the source
code as character constants it is not supported?
It could be either one or both.
I guess reading text from a file which contains @ works since
it is just bytes.
Does toupper(), isdigit() and similar functions
just ignore unsupported characters?
The is*() functions return false; toupper() and tolower() return their
argument unchanged.
And how much can be assumed to be supported, where does one draw
the line? For example @ is probably supported while a japanese
character is not, right?
It depends on the implementation. Some implementations have locales for
non-latin environments. You need to read the documentation for your
implementation and set the appropriate locale after program start-up.

Jun 27 '08 #31

On Thu, 19 Jun 2008 08:38:40 +0200, Martijn Lievaart wrote:
perl -nie 'if (/\@hotmail.com@$/) { s/"//g; print; }' text_file
Or even:

perl -nie 's/"//g; print if /\@hotmail.com@$/' text_file

M4
Jun 27 '08 #32


"santosh" <sa*********@gmail.com> wrote in message
news:g3**********@registered.motzarella.org...
Bartc wrote:
>"Dennis" <dc*****@gmail.com> wrote in message
>>"fo*@yahoo.com"
I like to
1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
> fgets(line,sizeof(line),in);
if (feof(in)) break;

Fgets could fail due to an I/O error too, not necessarily end-of-file.
OK. What happens to the buffer in that case, would it be an empty string?
And would feof() ever become true?
> if (line[n-1]='""') {line[n-1]=0; --n;};

Also what do you mean by '""' here? Did you mean to write '"'?
I can't actually see clearly, but yes it should be a single " inside single
quotes. Although two double quotes surprisingly still works; I would have
expected in this case to compare a char widened to whatever size '""' was,
not for the '""' to be narrowed to char.
>char line[200];
Also according to the relevant RFCs (2821 & 2822), just the domain part
of an email address can be up to 255 characters. To be safe you might
want to make line at least 512 bytes.
OK I was guessing that. Although someone with a 511-char email won't be very
popular with his friends. And he wouldn't be getting any spam via this
program either...

--
Bartc

Jun 27 '08 #33

Bartc wrote:
>
"santosh" <sa*********@gmail.com> wrote in message
news:g3**********@registered.motzarella.org...
>Bartc wrote:
>>"Dennis" <dc*****@gmail.com> wrote in message
>>>"fo*@yahoo.com"
I like to
1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another
file.
>> fgets(line,sizeof(line),in);
if (feof(in)) break;

Fgets could fail due to an I/O error too, not necessarily
end-of-file.

OK. What happens to the buffer in that case, would it be an empty
string?
When a read error occurs (i.e., fgets returns NULL and ferror is true)
then the array contents are indeterminate. In the case of end-of-file
(i.e., fgets returns NULL and feof is true) where no characters were
read, the array contents are left unchanged.
And would feof() ever become true?
Yes, when end-of-file is encountered.
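
To make that concrete, here is a minimal sketch of a read loop driven by the return value of fgets(), with ferror() consulted only after the loop ends. The function name count_lines and the 512-byte buffer are choices made for this example.

```c
#include <stdio.h>

/* Count the chunks fgets() returns from `in`. That equals the number of
   lines as long as every line fits in the 512-byte buffer, which is an
   assumption of this sketch. Returns -1 if a read error occurred. */
long count_lines(FILE *in)
{
    char line[512];
    long count = 0;

    /* fgets() returns NULL on end-of-file and on read error alike,
       so one test per call is enough to stop the loop. */
    while (fgets(line, sizeof line, in) != NULL)
        count++;

    /* Only now ask which of the two conditions stopped the loop. */
    if (ferror(in))
        return -1;
    return count;
}
```

The buffer is never inspected after fgets() has returned NULL, so its indeterminate contents after an error do not matter.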

<snip>

Jun 27 '08 #34

vi******@gmail.com wrote:
Tomás Ó hÉilidhe <t...@lavabit.com> wrote:
.... snip ...
>
>Take the last 12 characters, make them all lowercase, and then
compare with "@hotmail.com".

@ does not belong to C's basic character set, so, that's not
possible.
No problem. getc etc. return integer equivalents of unsigned
char. There are lots of chars in any char set that do not have
lowercase versions, and the case conversion routines will simply
return the original char.
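
A small sketch of that point, assuming the default "C" locale and an execution character set that contains '@': lower_in_place is a made-up name, and the cast to unsigned char matters because tolower() expects an unsigned char value or EOF.

```c
#include <ctype.h>

/* Lowercase `s` in place. Characters with no lowercase form, such as
   '@' and '.', are returned unchanged by tolower(), so they pass through.
   The name lower_in_place is hypothetical. */
void lower_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}
```

After the call, the tail of the string can be compared with strcmp() against a lowercase pattern.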

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
Jun 27 '08 #35

Martijn Lievaart schreef:
On Thu, 19 Jun 2008 08:38:40 +0200, Martijn Lievaart wrote:
>perl -nie 'if (/\@hotmail.com@$/) { s/"//g; print; }' text_file

Or even:

perl -nie 's/"//g; print if /\@hotmail.com@$/' text_file
Don't you mean this?

perl -ne 's/"//g; print if /\@hotmail\./' text_file

--
Affijn, Ruud

"Gewoon is een tijger."
Jun 27 '08 #36

On Fri, 20 Jun 2008 11:32:27 +0200, Dr.Ruud wrote:
Martijn Lievaart schreef:
>On Thu, 19 Jun 2008 08:38:40 +0200, Martijn Lievaart wrote:
>>perl -nie 'if (/\@hotmail.com@$/) { s/"//g; print; }' text_file

Or even:

perl -nie 's/"//g; print if /\@hotmail.com@$/' text_file

Don't you mean this?

perl -ne 's/"//g; print if /\@hotmail\./' text_file
I think I meant this:

perl -ni -e 's/"//g; print if /\@hotmail.com@$/' text_file

(-i makes a backup, -ie probably takes the 'e' as the backup suffix.)

M4
Jun 27 '08 #37

szr
Martijn Lievaart wrote:
On Fri, 20 Jun 2008 11:32:27 +0200, Dr.Ruud wrote:
>Martijn Lievaart schreef:
>>On Thu, 19 Jun 2008 08:38:40 +0200, Martijn Lievaart wrote:
>>>perl -nie 'if (/\@hotmail.com@$/) { s/"//g; print; }' text_file

Or even:

perl -nie 's/"//g; print if /\@hotmail.com@$/' text_file

Don't you mean this?

perl -ne 's/"//g; print if /\@hotmail\./' text_file

I think I ment this:

perl -ni -e 's/"//g; print if /\@hotmail.com@$/' text_file

(-i makes a backup, -ie probably takes the 'e' as the backup suffix.)
Maybe I'm missing something, but I don't understand why you have an @
near the end of your regex just before the $ ? I can't find any mention
of it in perldoc or my Perl Pocket Reference, but it's possible I'm
missing something.

Thanks.

--
szr
Jun 27 '08 #38

On Fri, 20 Jun 2008 13:59:05 -0700, szr wrote:
Maybe I'm missing something, but I don't understand why you have an @
near the end of your regex just before the $ ? I can't find any mention
of it in perldoc or my Perl Pocket Reference, but it's possible I'm
missing something.
You're missing nothing. I'm blind.

M4
Jun 27 '08 #39

Tomás Ó hÉilidhe wrote:
On Jun 18, 8:33 pm, Dennis <dcho...@gmail.com> wrote:
>1. Strip out the " characters and just leave the email addresses on
each line.

char const *const original = "\"bo*@hotmail.com\"";
Please learn to look at the groups to which an article is posted and
strip out those for which your reply is not relevant. In this case
comp.lang.perl.misc.

--

Henry Law Manchester, England
Jun 27 '08 #40

ev**********@hushmail.com writes:
On 19 Juni, 07:41, vipps...@gmail.com wrote:
[...]
@ does not belong to C's basic character set, so, that's not possible.

What does this mean? That you can't use '@' in strings without
relying on the particular implementation? so much for portability
It means that it's possible to have a conforming C implementation on a
system where the '@' character is not supported. Likewise for '$' and
'`' (backtick); those happen to be the only three ASCII printable
characters that the C standard doesn't require.

As it happens, the vast majority of character sets in current use are
based on ASCII, and most non-ASCII systems with C implementations use
some variant of EBCDIC. And, as it happens, both ASCII and EBCDIC can
represent '@', '$', and '`'. So, although the C standard doesn't
*require* all systems to support these characters, you're not likely
to run across an implementation that doesn't support them.

(IMHO it wouldn't hurt for a future revision of the C standard to
require support for these three characters, even if they're only
usable in character constants, string literals, comments, and a few
other similar contexts.)

So vippstar's statement that "that's not possible" is a bit
over-stated. You can't use the '@' character in completely 100%
theoretically portable strictly conforming C code. But you can almost
certainly use it safely if you don't mind the *theoretical* loss of
portability that's unlikely to be an issue in real life.

Unless (a) there's some system out there that I've never heard of that
doesn't support the '@' character, and (b) you want to search for
hotmail.com addresses on such a system.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 2 '08 #41

In article <lz************@stalkings.ghoti.net>,
Keith Thompson <ks***@mib.org> wrote:
>As it happens, the vast majority of character sets in current use are
based on ASCII, and most non-ASCII systems with C implementations use
some variant of EBCDIC. And, as it happens, both ASCII and EBCDIC can
represent '@', '$', and '`'.
'@' is apparently in the invariant code-sets (according
to wikipedia) but '$' and '`' are not, at least according to the
shading of the table (which might be wrong).
http://en.wikipedia.org/wiki/EBCDIC

I've looked through some of the IBM EBCDIC code pages but did not
happen upon any that were missing the '`' (though it was not in
the same place in all of the ones I looked at.) The EBCDIC code
page list, in case someone is sufficiently bored to look, is at
http://www-306.ibm.com/software/glob..._es.jsp#EBCDIC
--
"What we have to do is to be forever curiously testing new
opinions and courting new impressions." -- Walter Pater
Jul 2 '08 #42

On 2 Jul 2008 at 19:00, Keith Thompson wrote:
You can't use the '@' character in completely 100% theoretically
portable strictly conforming C code. But you can almost certainly use
it safely if you don't mind the *theoretical* loss of portability
that's unlikely to be an issue in real life.
Worrying about real life! Whatever next? I sense that KT is almost one
of us.

Jul 2 '08 #43

In article <sl*******************@nospam.invalid>,
Antoninus Twink <no****@nospam.invalid> wrote:
>On 2 Jul 2008 at 19:00, Keith Thompson wrote:
>You can't use the '@' character in completely 100% theoretically
portable strictly conforming C code. But you can almost certainly use
it safely if you don't mind the *theoretical* loss of portability
that's unlikely to be an issue in real life.

Worrying about real life! Whatever next? I sense that KT is almost one
of us.
I *was* absolutely shocked to see KT mention real-world considerations.

What's going on here? Something we should know about? Death in the
family? What?

Jul 2 '08 #44

ro******@ibd.nrc-cnrc.gc.ca (Walter Roberson) writes:
In article <lz************@stalkings.ghoti.net>,
Keith Thompson <ks***@mib.org> wrote:
>>As it happens, the vast majority of character sets in current use are
based on ASCII, and most non-ASCII systems with C implementations use
some variant of EBCDIC. And, as it happens, both ASCII and EBCDIC can
represent '@', '$', and '`'.

'@' is apparently in the invariant code-sets (according
to wikipedia) but '$' and '`' are not, at least according to the
shading of the table (which might be wrong).
http://en.wikipedia.org/wiki/EBCDIC

I've looked through some of the IBM EBCDIC code pages but did not
happen upon any that were missing the '`' (though it was not in
the same place in all of the ones I looked at.) The EBCDIC code
page list, in case someone is sufficiently bored to look, is at
http://www-306.ibm.com/software/glob..._es.jsp#EBCDIC
I took a cursory look at the EBCDIC table in the Wikipedia article and
saw all three characters. I didn't consider variations or invariant
code-sets, and I'm no expert on EBCDIC, so feel free to take what I
wrote with a grain of salt.

[This is partly a test of a new news server; I had problems with
aioe.org, so I'm trying motzarella.org.]

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jul 3 '08 #45

Walter Roberson wrote:
Keith Thompson <ks***@mib.org> wrote:
>As it happens, the vast majority of character sets in current
use are based on ASCII, and most non-ASCII systems with C
implementations use some variant of EBCDIC. And, as it happens,
both ASCII and EBCDIC can represent '@', '$', and '`'.

'@' is apparently in the invariant code-sets (according to
wikipedia) but '$' and '`' are not, at least according to the
shading of the table (which might be wrong).
http://en.wikipedia.org/wiki/EBCDIC
I vaguely remember reading some standard (about 30 years or more
ago) that mentioned the '$' code as being "monetary symbol". I
don't recall what, if anything, it said about '@' and '`'.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Jul 3 '08 #46
