Hi,
I'm just getting my head around regex and am fairly sure it can handle
the task at hand, but the pattern is too complex for me to work out, so
I'm throwing it out there for you experts to comment on. I have two
questions. In the interests of space and focus, I'll post a separate
thread for the other use case (clustering).
Use Case 1:
Filenames may or may not contain a TrackNumber.
Examples:
01 - Calexico - Sonic Wind (instrumental mix).mp3
Gustav_Mahler-Symphony#10-Slatkin-St_Louis-1-Adagio.mp3
Carl Orff - Carmina Burana - 08 - Uf dem anger- Chramer, gip .mp3
01-linkin_park_-_foreword-mp3.mp3
[03] (Wish I Could Fly Like) Superman.mp3
Other forms might be: (XX), XX-, -XX-, - XX -, - XX, -XX
where XX is a one- or two-digit number.
Specific examples of things that should not be captured:
Jethro Tull - 1999 - Live At House Of Blues - 13 - Hunting Girl.mp3
The 1999 is not a track number, but the 13 is. A rule that the number
should have no more than two digits should handle this one.
Prince - Northrop - 06-13-2000 - 33 - Kiss.mp3
The date should not be captured, but the 33 should.
UB40 - 08 - Sing Our Own Song.mp3
The 40 shouldn't be captured, but the 08 should.
Blink 182 - Take Off Your Pants And Jacket - 06 - The Rock Show.mp3
The 182 should not be captured, but the 06 should.
One more case:
08_Smokie_Living Near The Edge.mp3
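To make the rules above concrete, here is a rough Python sketch of one
possible starting point, not a finished answer. The names TRACK_RE and
find_track_number are made up for illustration. It assumes the track
number is one or two digits, bounded on each side by the start or end of
the name or by a space, underscore, hyphen, or bracket, and that a number
joined to another number by -, . or / is part of a date. The extension
is stripped first so that the "3" in ".mp3" is never considered.

```python
import os
import re

# Hypothetical pattern: 1-2 digits with a separator (or string edge) on each side.
TRACK_RE = re.compile(r"""
    (?<![^\s_\-\[(])   # previous char, if any, is whitespace, _, -, [ or (
    (?<!\d[-./])       # not the tail of a date such as 06-13
    (\d{1,2})          # the candidate track number
    (?![^\s_\-\])])    # next char, if any, is whitespace, _, -, ] or )
    (?![-./]\d)        # not the head of a date such as 06-13
    """, re.VERBOSE)


def find_track_number(filename):
    """Return the first candidate track number as an int, or None."""
    stem, _ext = os.path.splitext(filename)  # drop ".mp3"
    match = TRACK_RE.search(stem)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for name in [
        "UB40 - 08 - Sing Our Own Song.mp3",
        "Prince - Northrop - 06-13-2000 - 33 - Kiss.mp3",
        "08_Smokie_Living Near The Edge.mp3",
    ]:
        print(name, "->", find_track_number(name))
```

This returns only the first candidate, so a short number standing alone
in a band or album name (for example a hypothetical "Blink 18 - ...")
would still be picked up by mistake.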
Phew... sorry for the length of the post. Can one put together a
regex to tackle this problem?
If so, I will be both amazed and grateful for your suggestions.
Thanks.
P.S. Part 2 of this will deal with clustering...