Finding People's Names in Files

Crazy question, but has anyone attempted this or seen Python code that
does? For example, if a text file contained 'Guido' and/or 'Robert'
and/or 'Susan', then we should return True, otherwise return False.
Oct 11 '07 #1
On Oct 11, 5:22 pm, brad <byte8b...@gmail.com> wrote:
> Crazy question, but has anyone attempted this or seen Python code that
> does? For example, if a text file contained 'Guido' and/or 'Robert'
> and/or 'Susan', then we should return True, otherwise return False.

Can't you just use the regex function re.findall()?
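
For a fixed list of names, a minimal sketch along these lines might do. Note that findall() belongs to the re module rather than to str. The KNOWN_NAMES list and both function names are made up for illustration and are not from the thread.

```python
import re

# Hypothetical list of names to look for; the thread only mentions these three.
KNOWN_NAMES = ["Guido", "Robert", "Susan"]

# \b anchors keep 'Robert' from matching inside 'Roberta'.
NAME_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, KNOWN_NAMES)) + r")\b")


def find_known_names(text):
    """Return every known name that occurs in text, in order of appearance."""
    return NAME_RE.findall(text)


def contains_known_name(text):
    """Return True if at least one known name occurs in text."""
    return NAME_RE.search(text) is not None
```

This only helps when the names are known in advance, which is the limitation raised later in the thread.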

Oct 11 '07 #2
On 11/10/2007, brad <by*******@gmail.com> wrote:
> Crazy question, but has anyone attempted this or seen Python code that
> does? For example, if a text file contained 'Guido' and/or 'Robert'
> and/or 'Susan', then we should return True, otherwise return False.

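text = open(fname).read()  # fname: path of the file to scan

def a_function():
    # Return True if any of the known names occurs in the text.
    for name in ['Guido', 'Robert', 'Susan']:
        if name in text:
            return True
    return False

if a_function():
    print("A name was found")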

:)
Oct 11 '07 #3
co*********@gmail.com wrote:
> On Oct 11, 5:22 pm, brad <byte8b...@gmail.com> wrote:
>> Crazy question, but has anyone attempted this or seen Python code that
>> does? For example, if a text file contained 'Guido' and/or 'Robert'
>> and/or 'Susan', then we should return True, otherwise return False.
>
> Can't you just use the regex function re.findall()?

I mean *any* possible person's name... I don't *know* the names
beforehand :)
Oct 11 '07 #4
On 10/11/07, brad <by*******@gmail.com> wrote:
> co*********@gmail.com wrote:
>> On Oct 11, 5:22 pm, brad <byte8b...@gmail.com> wrote:
>>> Crazy question, but has anyone attempted this or seen Python code that
>>> does? For example, if a text file contained 'Guido' and/or 'Robert'
>>> and/or 'Susan', then we should return True, otherwise return False.
>>
>> Can't you just use the regex function re.findall()?
>
> I mean *any* possible person's name... I don't *know* the names
> beforehand :)

"I cannot combine some characters

dhcmrlchtdj

which the divine Library has not foreseen and which in one of
its secret tongues do not contain a terrible meaning. No one can
articulate a syllable which is not filled with tenderness and fear,
which is not, in one of these languages, the powerful name of a god."

Jorge Luis Borges, The Library of Babel
Oct 11 '07 #5
co*********@gmail.com wrote:
> However...how can you know it is a name...

OK, I admitted in my first post that it was a crazy question, but if one
could find an answer, one would be onto something. Maybe it's not a 100%
answerable question, but I would guess that it is an 80% answerable
question... I just don't know how... yet :)

Besides admitting that it's a crazy question, I should stop and explain
how it would be useful to me at least. Is a credit card number itself
valuable? I would think not. One can easily regex-match and Luhn-check
credit card numbers located in files with a great degree of accuracy, but a
number without a name is not very useful to me. So, if one could
associate names to Luhn-checked numbers automatically, then one would be
onto something. Or at least say, "hey, this file has Luhn-validated CCs
*AND* it seems to have people's names in it as well." Now then, I'd have
less to review or perhaps as much as I have now, but I could push the
files with numbers and names to the top of the list so that they would
be reviewed first.

Brad
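
The "re and Luhn check" step described above could be sketched roughly like this. It is only an illustration: the candidate pattern (13 to 16 digits with optional single spaces or hyphens), the function names, and the review_order() ranking are assumptions, not the code behind the poster's tool.

```python
import re

# Candidate card numbers: 13 to 16 digits, optionally separated by single
# spaces or hyphens. This pattern is an illustrative assumption.
CANDIDATE_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return Luhn-valid candidate numbers found in text, separators removed."""
    found = []
    for match in CANDIDATE_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            found.append(digits)
    return found


def review_order(files):
    """Sort (path, n_cards, n_names) tuples so files with both come first."""
    return sorted(files, key=lambda f: (not (f[1] and f[2]), -f[1]))
```

With something like review_order(), files that contain both Luhn-valid numbers and likely names float to the top of the review list, which is the prioritisation described in the post.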

Oct 11 '07 #6
On Oct 11, 9:11 am, brad <byte8b...@gmail.com> wrote:
> cokofree...@gmail.com wrote:
>> However...how can you know it is a name...
>
> OK, I admitted in my first post that it was a crazy question, but if one
> could find an answer, one would be onto something. Maybe it's not a 100%
> answerable question, but I would guess that it is an 80% answerable
> question... I just don't know how... yet :)
>
> Besides admitting that it's a crazy question, I should stop and explain
> how it would be useful to me at least. Is a credit card number itself
> valuable? I would think not. One can easily regex-match and Luhn-check
> credit card numbers located in files with a great degree of accuracy, but a
> number without a name is not very useful to me. So, if one could
> associate names to Luhn-checked numbers automatically, then one would be
> onto something. Or at least say, "hey, this file has Luhn-validated CCs
> *AND* it seems to have people's names in it as well." Now then, I'd have
> less to review or perhaps as much as I have now, but I could push the
> files with numbers and names to the top of the list so that they would
> be reviewed first.
>
> Brad

What the hell are you doing? Your post sounds to me like you have a
huge amount of stolen, or at the very least misapprehended, data. Now
you want to search it for credit card numbers and names so that you
can use them.

I am not cool with this! This is a public forum about a programming
language. What makes you think that anybody in this forum will be cool
with that? Perhaps you aren't doing anything illegal, but it sure is
coming off that way. If you are doing something illegal I hope you get
caught.

At the very least, you might want to clarify why you are looking for
such capability so that you don't get effectively black-listed (well,
by me at least).

Matt

Oct 11 '07 #7
On Thu, 11 Oct 2007 11:22:50 -0400, brad wrote:
> Crazy question, but has anyone attempted this or seen Python code that
> does? For example, if a text file contained 'Guido' and/or 'Robert'
> and/or 'Susan', then we should return True, otherwise return False.

It'll be hard to handle the Dweezils and Moon Units of the world (I
believe these are Frank Zappa's kids?), but you could compile a list of
reasonably common names by gaining access to a usenet news spool, and
pulling the names from the headers.

But then this is starting to sound dangerously like a spam campaign - in
which case, "Please don't!".
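
Pulling names out of message headers might look something like the sketch below. It assumes the From: header values have already been read from whatever source you are allowed to use; names_from_headers() is a made-up helper, and the "alphabetic and capitalised" filter is a crude heuristic.

```python
from email.utils import parseaddr


def names_from_headers(from_headers):
    """Collect capitalised name tokens from a list of From: header values."""
    names = set()
    for header in from_headers:
        # parseaddr splits 'Real Name <addr>' into (real name, address).
        display_name, _address = parseaddr(header)
        for token in display_name.split():
            # Keep simple alphabetic, capitalised tokens such as 'Guido'.
            if token.isalpha() and token[0].isupper():
                names.add(token)
    return names
```

Headers that carry only an address contribute nothing, and the result is a set of tokens, not a list of people.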
Oct 11 '07 #8
On Oct 11, 12:49 pm, Matimus <mccre...@gmail.com> wrote:
> On Oct 11, 9:11 am, brad <byte8b...@gmail.com> wrote:
>> cokofree...@gmail.com wrote:
>>> However...how can you know it is a name...
>>
>> OK, I admitted in my first post that it was a crazy question, but if one
>> could find an answer, one would be onto something. Maybe it's not a 100%
>> answerable question, but I would guess that it is an 80% answerable
>> question... I just don't know how... yet :)
>>
>> Besides admitting that it's a crazy question, I should stop and explain
>> how it would be useful to me at least. Is a credit card number itself
>> valuable? I would think not. One can easily regex-match and Luhn-check
>> credit card numbers located in files with a great degree of accuracy, but a
>> number without a name is not very useful to me. So, if one could
>> associate names to Luhn-checked numbers automatically, then one would be
>> onto something. Or at least say, "hey, this file has Luhn-validated CCs
>> *AND* it seems to have people's names in it as well." Now then, I'd have
>> less to review or perhaps as much as I have now, but I could push the
>> files with numbers and names to the top of the list so that they would
>> be reviewed first.
>>
>> Brad
>
> What the hell are you doing? Your post sounds to me like you have a
> huge amount of stolen, or at the very least misapprehended, data. Now
> you want to search it for credit card numbers and names so that you
> can use them.
>
> I am not cool with this! This is a public forum about a programming
> language. What makes you think that anybody in this forum will be cool
> with that? Perhaps you aren't doing anything illegal, but it sure is
> coming off that way. If you are doing something illegal I hope you get
> caught.
>
> At the very least, you might want to clarify why you are looking for
> such capability so that you don't get effectively black-listed (well,
> by me at least).
>
> Matt

Go have a beer and calm down a bit :) It's a legitimate purpose,
although it could be (and probably is being) used by bad guys right now.
My intent, as you can see from the links below, is to catch it before
the bad guys do.

http://filebox.vt.edu/users/rtilley/public/find_ccns/
http://filebox.vt.edu/users/rtilley/public/find_ssns/

Brad

Oct 11 '07 #9
brad <by*******@gmail.com> writes:
> Crazy question, but has anyone attempted this or seen Python code that
> does? For example, if a text file contained 'Guido' and/or 'Robert'
> and/or 'Susan', then we should return True, otherwise return False.

A few ideas:

1. If you don't have a list of names, find a list of words that
doesn't contain proper nouns (there are a few word lists out there,
not sure if any exclude people's names, though). Look for short runs
of two or three "words" (punctuation-separated tokens) in the email
that aren't in the dictionary. Some of them will be people's names.

2. Send the text through Google translate and look for runs of words
that are unchanged. Some of them will be people's names.

3. Search the literature and look for fancy algorithms. Here are some
papers (the last mentions some commercial software to do this):

http://citeseer.ist.psu.edu/bikel99algorithm.html

http://citeseer.ist.psu.edu/618945.html

http://arxiv.org/html/cmp-lg/9706017
John
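
Idea 1 above could be sketched as follows. This is a rough heuristic under stated assumptions: dictionary is any set of lowercase ordinary words you supply, tokens are runs of letters and apostrophes, and candidate_names() is a made-up name. Expect plenty of false positives (typos, jargon, product names).

```python
import re


def candidate_names(text, dictionary, min_run=2, max_run=3):
    """Return runs of consecutive tokens that are absent from dictionary."""
    tokens = re.findall(r"[A-Za-z']+", text)
    runs, current = [], []
    for token in tokens + [None]:  # None is a sentinel that flushes the last run
        if token is not None and token.lower() not in dictionary:
            current.append(token)
            continue
        # A dictionary word (or the sentinel) ends the current run.
        if min_run <= len(current) <= max_run:
            runs.append(" ".join(current))
        current = []
    return runs
```

For example, with a dictionary containing the ordinary words of "The card belongs to Guido van Rossum and was used by Susan.", the only run of two or three unknown tokens is "Guido van Rossum"; the lone "Susan" is skipped because min_run is 2.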
Oct 11 '07 #10
On 10/11/07, by*******@gmail.com <by*******@gmail.com> wrote:
> On Oct 11, 12:49 pm, Matimus <mccre...@gmail.com> wrote:
>> On Oct 11, 9:11 am, brad <byte8b...@gmail.com> wrote:
>>> cokofree...@gmail.com wrote:
>>>> However...how can you know it is a name...
>>>
>>> OK, I admitted in my first post that it was a crazy question, but if one
>>> could find an answer, one would be onto something. Maybe it's not a 100%
>>> answerable question, but I would guess that it is an 80% answerable
>>> question... I just don't know how... yet :)
>>>
>>> Besides admitting that it's a crazy question, I should stop and explain
>>> how it would be useful to me at least. Is a credit card number itself
>>> valuable? I would think not. One can easily regex-match and Luhn-check
>>> credit card numbers located in files with a great degree of accuracy, but a
>>> number without a name is not very useful to me. So, if one could
>>> associate names to Luhn-checked numbers automatically, then one would be
>>> onto something. Or at least say, "hey, this file has Luhn-validated CCs
>>> *AND* it seems to have people's names in it as well." Now then, I'd have
>>> less to review or perhaps as much as I have now, but I could push the
>>> files with numbers and names to the top of the list so that they would
>>> be reviewed first.
>>>
>>> Brad
>>
>> What the hell are you doing? Your post sounds to me like you have a
>> huge amount of stolen, or at the very least misapprehended, data. Now
>> you want to search it for credit card numbers and names so that you
>> can use them.
>>
>> I am not cool with this! This is a public forum about a programming
>> language. What makes you think that anybody in this forum will be cool
>> with that? Perhaps you aren't doing anything illegal, but it sure is
>> coming off that way. If you are doing something illegal I hope you get
>> caught.
>>
>> At the very least, you might want to clarify why you are looking for
>> such capability so that you don't get effectively black-listed (well,
>> by me at least).
>>
>> Matt
>
> Go have a beer and calm down a bit :) It's a legitimate purpose,
> although it could be (and probably is being) used by bad guys right now.
> My intent, as you can see from the links below, is to catch it before
> the bad guys do.
>
> http://filebox.vt.edu/users/rtilley/public/find_ccns/
> http://filebox.vt.edu/users/rtilley/public/find_ssns/
>
> Brad

In case you're doing this for PCI validation, be aware that just the
CC number is considered sensitive and you'd get some false negatives
if you filter on anything except that.

Random strings that match CC checksums are really quite rare and false
positives from that alone are unlikely to be a problem. Unless I
deployed this and there was a significant false positive rate I
wouldn't risk the false negatives, personally.
Oct 11 '07 #11
Chris Mellon wrote:
> In case you're doing this for PCI validation, be aware that just the
> CC number is considered sensitive and you'd get some false negatives
> if you filter on anything except that.
>
> Random strings that match CC checksums are really quite rare and false
> positives from that alone are unlikely to be a problem. Unless I
> deployed this and there was a significant false positive rate I
> wouldn't risk the false negatives, personally.

Yes, it is for PCI. Our rate of false positives is low, very low. I
wasn't aware that a number alone was a PCI violation. Thank you! On
another note, we're a university (Virginia Tech) and we're subject to
FERPA, HIPAA, GLBA, etc. in addition to PCI. So we do these checks for
U.S. Social Security Numbers too in an effort to prevent or lessen the
chance of ID theft. Unfortunately, there is no Luhn check for SSNs. We
follow the Social Security Administration verification guideline
religiously... here's a web front-end to my logic:

http://black.cirt.vt.edu/public/valid_ssn/index.html

but we still have many false positives on SSNs, so being able to identify
*names and numbers* in files would still be a benefit to us.

Brad
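
Since SSNs carry no checksum, about all a scanner can do is a structural check. The sketch below is not the logic behind the page linked above; it assumes the dashed AAA-GG-SSSS form and applies only the commonly cited structural rules (area not 000, 666, or 900-999; group not 00; serial not 0000). The function names are hypothetical.

```python
import re

# Assumes the dashed form AAA-GG-SSSS; undashed numbers are not considered.
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def plausible_ssn(area, group, serial):
    """Apply basic structural rules to the three parts of a candidate SSN."""
    # For three-digit strings, string comparison matches numeric order.
    if area in ("000", "666") or area >= "900":
        return False
    if group == "00" or serial == "0000":
        return False
    return True


def find_ssns(text):
    """Return dashed candidates in text that pass the structural rules."""
    return [m.group() for m in SSN_RE.finditer(text)
            if plausible_ssn(*m.groups())]
```

Any nine-digit number in this shape passes unless it hits one of the excluded ranges, which is why false positives stay high and why a nearby name would be a useful extra signal.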
Oct 11 '07 #12
