
Interpret date if IsDate is False

100+
P: 675
I am attempting to scrape as much as I can from certain web pages and/or emails. Date is giving me the most trouble.

I can code this, and have started, but I wondered if anyone has a solution already written, that they would share. I would like such a function, when passed 'Input' to return a date variable. At minimum, it should handle the following:

Input             Interpreted as     American date
------------      --------------     -------------
"4/7/05"          7 April, 2005      #4/7/05#
"8/6"             8 June, 2009       #6/8/09#
"8/2002"          1 August, 2002     #8/1/02#
"August 03"       1 August, 2003     #8/1/03#
"4.6.06"          4 June, 2006       #6/4/06#
"8th of March"    8 March, 2009      #3/8/09#
"8 th of Mar"     8 March, 2009      #3/8/09#
"December 17 th"  17 December, 2009  #12/17/09#
"Dec 17th"        17 December, 2009  #12/17/09#
"August 2009"     1 August, 2009     #8/1/09#
"Aug09"           1 August, 2009     #8/1/09#
"September"       1 September, 2009  #9/1/09#
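
For the rows in the table above that contain a month name, one possible rule set can be sketched as follows. This is a Python illustration only, not an existing solution, and it would need porting to Access Basic. The function names are hypothetical, and three rules are assumptions inferred from the examples: a number with an ordinal suffix ("8th", "17 th") is the day, a bare number is the year, and two-digit years use a 30-year pivot (00-29 become 20xx).

```python
import re
from datetime import date

MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november", "december"]

# A 1-2 digit number followed by an ordinal suffix, with optional space ("17 th").
ORDINAL = re.compile(r"(\d{1,2})\s*(?:st|nd|rd|th)\b")


def find_month(text):
    """Return 1-12 for the first word that abbreviates a month name, else None."""
    for word in re.findall(r"[a-z]+", text):
        if len(word) >= 3:
            for number, name in enumerate(MONTH_NAMES, start=1):
                if name.startswith(word):
                    return number
    return None


def expand_year(digits):
    """Pass 4-digit years through; expand 2-digit years with an assumed pivot of 30."""
    year = int(digits)
    if len(digits) == 4:
        return year
    return 2000 + year if year < 30 else 1900 + year


def parse_month_name_date(text, default_year=2009):
    """Return a date for input containing a month name, or None if there is none."""
    s = text.lower()
    month = find_month(s)
    if month is None:
        return None
    day = 1  # assumption: a missing day means the first of the month
    ordinal = ORDINAL.search(s)
    if ordinal:
        day = int(ordinal.group(1))
        # Remove the day so it cannot be mistaken for a year below.
        s = s[:ordinal.start()] + " " + s[ordinal.end():]
    year_match = re.search(r"\d{4}|\d{2}", s)
    year = expand_year(year_match.group()) if year_match else default_year
    try:
        return date(year, month, day)
    except ValueError:  # e.g. "31st of June"
        return None
```

With `default_year=2009`, `parse_month_name_date("Dec 17th")` gives 17 December 2009 and `parse_month_name_date("August 03")` gives 1 August 2003. Purely numeric input such as "4/7/05" returns `None` and needs separate handling.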
Apr 16 '09 #1
9 Replies


FishVal
Expert 2.5K+
P: 2,653
Just subscribing.
Apr 16 '09 #2

JustJim
Expert 100+
P: 407
I'll watch this with interest too.
Apr 17 '09 #3

NeoPa
Expert Mod 15k+
P: 31,707
The problem with this is that humans are too non-algorithmic generally! They don't like to follow decent rules as computers do.

It's possible, but very laborious, to produce a routine for this, but breaking it down into something neat is pretty well impossible, as there's so little common to the various ways of portraying the dates.

You can be clever, but you still end up with a large chunk of code, most of which isn't very general.

BTW I'd be happy if someone proved me wrong. Impressed too of course.
Apr 18 '09 #4

FishVal
Expert 2.5K+
P: 2,653
Frankly speaking, it is not that hard a task to write code to recognize dates in the formats provided in post #1. Sure, it is not extremely easy, but it is not hard either.
Apr 18 '09 #5

100+
P: 675
I have, over the years, collected a large number of photos of birds (30K+). I am currently attempting to standardize the names, then I will generate an Access database to cross-reference by location, season, plumage, sex, and age. The photographers or web designers are not always good at fully identifying all the data.

Date is important in that it gives clues to plumage. Birds molt twice a year, and knowing the date identifies plumage. But I don't have to have an exact date. I can work without the year, as the plumage in Feb 1988 would be the same as Feb 2008. Season (winter, spring, ...) would suffice if all birds molted at the same time.

I'm going to incorporate a date into the file name, with the form "99 XXX00", where 99 is the day, XXX is the month or season, and 00 is the year. Just about any piece of that pattern can be missing, except both month and year. If I only have year, I use 4 digit year.

Processing this date into either a date field or separate day, month, and year fields is easy.

So, from various sources, I need to paste into a text field some not-quite-random-garbage and produce "3 August09" or "2003" or "April" or "???" or just an empty field. I have to also convert the bird's name to my standards, as well as the location, and any comments by the photographer or website.
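
The file-name date form described above ("99 XXX00", with pieces allowed to be missing) can be sketched like this. It is a Python illustration with a hypothetical function name, not the code actually in use. Two points are assumptions: the month or season text is passed in exactly as it should appear, and a day without a month is dropped.

```python
def format_name_date(day=None, month=None, year=None):
    """Build the '99 XXX00' file-name date from whichever parts are known.

    month is the month or season text exactly as it should appear
    (for example "Mar", "August" or "Winter").
    """
    if month is None:
        # Year only: use all four digits. Nothing known: empty field.
        return f"{year:04d}" if year is not None else ""
    day_part = f"{day} " if day is not None else ""
    year_part = f"{year % 100:02d}" if year is not None else ""
    return f"{day_part}{month}{year_part}"
```

For example, `format_name_date(24, "Mar", 2009)` gives "24 Mar09", `format_name_date(year=2003)` gives "2003", and `format_name_date(month="April")` gives "April".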

Most of the code here is Basic, not using the capabilities of Access. Access does help with the bird names, and a photographers table so I know where "My neighbor's yard" is in terms of City (or park or area), Country.

Back to the date. It can be done, to some degree of accuracy, with a "brute force" method. 200 lines of code, each executed once. Nothing clever, like "Format(Join(Split(txtInput)), ....)"

Each picture must be done individually. There is no automatic process for this, as a picture may be identified as "Same as previous post". So I don't have to cover all possibilities. It's just that the more I can do without typing, the faster the task. I can handle about 2 pictures/minute now if they are by the same photographer, and in chronological order (lots of luck here).
Apr 18 '09 #6

FishVal
Expert 2.5K+
P: 2,653
Ok.

Brute force, a fortiori, could take advantage of Access as RDBMS.
Let us say the different possible formats could be stored in a table and applied to the input in a loop. This gives, BTW, an opportunity to create a self-teaching application.
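
A rough Python sketch of that table-of-formats loop, for the purely numeric inputs, might look like the following. Everything here is an assumption for illustration: the `FORMATS` list stands in for rows of an Access table, the function name is hypothetical, and the day/month order given to each row simply follows the examples in post #1 (month first for "4/7/05", day first for "8/6" and "4.6.06").

```python
import re
from datetime import date

# Stand-in for a table of formats; each row is a regular expression with
# named groups d (day), m (month) and y (year). The first valid match wins.
FORMATS = [
    r"(?P<m>\d{1,2})/(?P<d>\d{1,2})/(?P<y>\d{4}|\d{2})",    # "4/7/05" -> 7 April 2005
    r"(?P<d>\d{1,2})\.(?P<m>\d{1,2})\.(?P<y>\d{4}|\d{2})",  # "4.6.06" -> 4 June 2006
    r"(?P<m>\d{1,2})/(?P<y>\d{4})",                         # "8/2002" -> 1 August 2002
    r"(?P<d>\d{1,2})/(?P<m>\d{1,2})",                       # "8/6"    -> 8 June
]


def parse_with_formats(text, default_year=2009):
    """Try each stored format in turn; return a date or None."""
    s = text.strip()
    for pattern in FORMATS:
        match = re.fullmatch(pattern, s)
        if match is None:
            continue
        parts = match.groupdict()
        year_text = parts.get("y")
        if year_text is None:
            year = default_year
        elif len(year_text) == 4:
            year = int(year_text)
        else:
            # Assumed pivot: 00-29 -> 20xx, 30-99 -> 19xx.
            yy = int(year_text)
            year = 2000 + yy if yy < 30 else 1900 + yy
        try:
            # A format without a day group means the first of the month.
            return date(year, int(parts["m"]), int(parts.get("d") or 1))
        except ValueError:
            continue  # matched the shape but not a real date; try the next row
    return None
```

"Teaching" the application would then amount to adding a row to the table when an input comes back as `None`.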
Apr 18 '09 #7

100+
P: 675
My last email included
Another shot of the pretty Blue Tit in the Apple tree. March 24 th in my garden, in France. <name>
I replaced the name for privacy, but I'm already taking advantage of Access. Access has a table of photographers, and another related table of locations. I have every species of bird on the planet in a set of tables running from Kingdom, Phylum, Class, Order, Family, and Genus down to Species, by English common name.

1) Message displayed in Outlook Express
2) Double-click Attachment to see if I even want picture. If no, delete email. Otherwise
3) Highlight entire message and cntl-C to clipboard
4) Toggle to my picture processing program BirdPix using cntl+tab or cntl+shift+tab
5) Click cmdNewPaste -
a: Clears all fields on form
b: Pastes into an unbound textbox, txtWorkspace, from clipboard
6) Highlight "Blue Tit" in txtWorkspace, using mouse
7) Press cmdSpecies
a: Copies "Blue Tit" to species lookup textbox
b: Fills Name with "Blue Tit (Cyanistes caeruleus)", the name I want
c: Fills PictureName textbox with "Blue Tit (Cyanistes caeruleus) ~ ~"
8) Select from listbox first letter of photographer's name from email
9) Select correct name from dropdown combobox, already dropped from 8)
10) Select "City, Country" from displayed listbox containing this photographer's Home Country, Home City/Country, Frequently Visited places, last countries traveled to, etc.
a: Fills PictureName textbox with "Blue Tit (Cyanistes caeruleus) ~ ~ Paris, France"
11) Highlight "March 24 th" in txtWorkspace
12) Press cmdDate
13) Fills txtDate
a: Fills PictureName textbox with "Blue Tit (Cyanistes caeruleus) ~ ~ Paris, France ~ 24 Mar09"
14) Press cmdFinalPicName
a: Copies "562983760 ~ Blue Tit (Cyanistes caeruleus) ~ ~ Paris, France ~ 24 Mar09.jpg" to clipboard
15) Use cntl+shift+tab to get back to picture
16) Press s
a: Save dialog displayed
b: cntl+V to paste name into save dialog
c: Press Enter to save picture and close dialog
17) If there is a second picture of this bird, or the next email contains another picture of this bird, then
18) Use cntl+tab to return to BirdPix
19) Press cmdFinalPicName
a: Generates same name with a different 9 digit number
b: Returns to picture for save
20) Loop as needed

Above seems lengthy, but each kept picture can, on average, be handled in 30 seconds. I'm not sure what I learn, or what I save, by adding a table for different date patterns.

The species name is bad enough. It will take a species by common (English only) name, by Latin (scientific) name, a combination, or a partial name. Just the name "Tit" produces a dropdown list of 55 species ending in the word "Tit".
Apr 18 '09 #8

JustJim
Expert 100+
P: 407
Hi OldBirdman,

It looks like you've got a very capable application developed there. My thinking on the date problem is that if you have a text box bound to a field of datatype Date, Access is fairly good at interpreting some user input. For the others, would it be more convenient just to have a calendar control near the date text box?

I agree that code could be written to manage the examples you gave in the OP; it would be complicated, but not complex. I think the major downfall is that it would be fairly high maintenance for a while until you had covered all possible formats.

Since you get input from all over the world, ambiguous dates (dd/mm or mm/dd where the day number <=12) would still need some input from you, I think.
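
The ambiguity test can be sketched as below: list every (day, month) reading of a numeric pair and ask for help only when there is more than one. This is a Python illustration with a hypothetical function name. It checks ranges only and makes no attempt to validate the day against the length of the month.

```python
def day_month_candidates(first, second):
    """Return the possible (day, month) readings of a numeric pair like 8/6."""
    candidates = []
    if 1 <= first <= 31 and 1 <= second <= 12:
        candidates.append((first, second))   # read as dd/mm
    if 1 <= second <= 31 and 1 <= first <= 12:
        reading = (second, first)            # read as mm/dd
        if reading not in candidates:        # 7/7 is the same date either way
            candidates.append(reading)
    return candidates


def needs_user_input(first, second):
    """True when the pair has more than one valid reading."""
    return len(day_month_candidates(first, second)) > 1
```

So "8/6" needs a decision from the user, while "17/12" and "12/17" each have only one reading, and "7/7" is the same date both ways.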

All the best

Jim
Apr 18 '09 #9

NeoPa
Expert Mod 15k+
P: 31,707
I went through a large number of algorithmic approaches on Saturday trying to find a consistent way through this, but there were exceptions in the data for each one I tried.

As Jim said, it's not complex. Nothing mathematically convoluted, just laborious to work through. Very untidy.
Apr 20 '09 #10
