Matching filenames with typos

Hello,

I'm working on a script that places the results of soccer games from different
seasons in a row, to show the history of the game.
I've gathered a lot of scores from different websites on a FreeBSD
webserver. The scores are all placed in a directory named after the season,
with the names of the teams as the filename.
So for example results of the game 'AC Milan - Ajax' are in different files
for different seasons:

./0405/AC Milan - Ajax.txt
./0304/AC Milan - Ajax.txt
./0203/AC Milan - Ajax.txt
(team names separated with '-')

My script creates an HTML page with an overview of the results of all
seasons.
The problem is that I gathered the names of the teams for the results from
different websites, and some websites use 'AC Milan', others just
'Milan'.
Some websites use the name 'Ajax', others 'Ajax FC', others 'Ajax
Amsterdam'.
Since I gathered tens of thousands of results for hundreds of teams,
renaming all the files is not an option.
Is there a way to improve the matching of these files, with the knowledge
that:

- two- or three-character strings can be left out (like FC, Utd.)
- make a match when, for example, two out of three names in the filename
match
(like: the game 'name1 name2 - name3' matches both 'name1 - name3' and
'name2 - name3')

I hope I have made my question clear, and that someone can help me.

Thanks!
Dec 4 '06 #1
In article <45*********************@news.xs4all.nl>, Peter v.d. Berger
<pv*******@xs4all.nl> wrote:
> Is there a way to improve the matching of these files, with the knowledge
> that:
>
> - two- or three-character strings can be left out (like FC, Utd.)
> - make a match when, for example, two out of three names in the filename
> match
> (like: the game 'name1 name2 - name3' matches both 'name1 - name3' and
> 'name2 - name3')

Create an array of unique team names and use a regular expression to
test if each name occurs in the file name. Generate a new name that
contains the two team names and either use that name as a key or rename
the old file to the new name. Example (untested):

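use strict;
use warnings;

my $name  = 'AC Milan - Ajax FC';
my @teams = qw( Ajax Milan );    # unique, canonical team names

# Append every known team name that occurs in the file name.
my $newname = '';
for my $team ( @teams ) {
    # \Q...\E keeps characters such as '.' in a team name literal
    if( $name =~ /\Q$team\E/i ) {
        $newname .= $team;
    }
}
print "New name is '$newname'\n";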

should produce

New name is 'AjaxMilan'
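
A slightly fuller sketch of the "use that name as a key" idea, in case it helps. It splits the file name on ' - ' first, so the home and away sides keep their order, and then collects the seasons for each fixture in a hash. The team list, the file paths in the loop, and the names canonical_team and fixture_key are made up for illustration; only the './0405/AC Milan - Ajax.txt' layout comes from the question.

```perl
use strict;
use warnings;

# Hypothetical list of canonical team names; in practice it would be
# maintained by hand or collected from the file names themselves.
my @teams = ( 'Milan', 'Ajax' );

# Map one side of a fixture (e.g. 'Ajax FC') to a canonical team name.
# Returns undef when no known name occurs in it.
sub canonical_team {
    my ($side) = @_;
    for my $team (@teams) {
        return $team if $side =~ /\b\Q$team\E\b/i;
    }
    return;
}

# Turn 'AC Milan - Ajax FC.txt' into the key 'Milan - Ajax',
# keeping home and away in their original order.
sub fixture_key {
    my ($filename) = @_;
    ( my $base = $filename ) =~ s/\.txt\z//i;
    my ( $home, $away ) = split /\s+-\s+/, $base, 2;
    return unless defined $home && defined $away;
    my $home_team = canonical_team($home);
    my $away_team = canonical_team($away);
    return unless defined $home_team && defined $away_team;
    return "$home_team - $away_team";
}

# Group files from several season directories under one key.
# These paths are invented examples in the layout the question describes.
my %history;
for my $path (
    './0405/AC Milan - Ajax.txt',
    './0304/Milan - Ajax FC.txt',
    './0203/AC Milan - Ajax Amsterdam.txt',
) {
    my ( $season, $file ) = $path =~ m{([^/]+)/([^/]+)\z} or next;
    my $key = fixture_key($file) or next;
    push @{ $history{$key} }, $season;
}

for my $key ( sort keys %history ) {
    print "$key: @{ $history{$key} }\n";
}
```

With the three example paths this prints "Milan - Ajax: 0405 0304 0203". One thing to watch: the first name in @teams that matches wins, so if one canonical name is contained in another, the longer name should probably come first in the list.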

FYI: this newsgroup is defunct. Try comp.lang.perl.misc in the future.
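
If keeping a team list by hand is too much work, the rules from the question could also be approximated directly on the words of each name. This is only a sketch of one possible reading: tokens of three characters or fewer are dropped, and two sides count as the same team when they still share at least one word. That is looser than "two out of three names match", and it will wrongly pair names that share a city (two clubs with the same first word, say), so treat the result as candidates to check. The function names are hypothetical, and the word splitting assumes plain ASCII names.

```perl
use strict;
use warnings;

# Words of a team name, lowercased, punctuation stripped, and short
# tokens (three characters or fewer, e.g. 'AC', 'FC', 'Utd.') dropped.
sub name_tokens {
    my ($name) = @_;
    my @words = map { lc $_ } $name =~ /(\w+)/g;
    my @long  = grep { length($_) > 3 } @words;
    # If every word is short, keep them all rather than return nothing.
    return @long ? @long : @words;
}

# Two team names match if they share at least one remaining token.
sub same_team {
    my ( $x, $y ) = @_;
    my %seen = map { $_ => 1 } name_tokens($x);
    return scalar grep { $seen{$_} } name_tokens($y);
}

# Two fixtures match if home matches home and away matches away.
sub same_fixture {
    my ( $f1, $f2 ) = @_;
    my ( $h1, $a1 ) = split /\s+-\s+/, $f1, 2;
    my ( $h2, $a2 ) = split /\s+-\s+/, $f2, 2;
    return 0 unless defined $a1 && defined $a2;
    return same_team( $h1, $h2 ) && same_team( $a1, $a2 );
}

print "match\n"
    if same_fixture( 'AC Milan - Ajax', 'Milan - Ajax Amsterdam' );
```

Because the comparison is done per side, 'AC Milan - Ajax' and 'Ajax - AC Milan' are treated as different fixtures, which keeps home and away games apart.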

Dec 5 '06 #2
