Parsing text into dates?

I'm developing a web-application where the user sometimes has to enter
dates in plain text, although a format may be provided to give clues.
On the server side this piece of text has to be parsed into a datetime
python-object. Does anybody have any pointers on this?

Besides the actual parsing, my main concern is the different locale
date formats and how to be able to parse those strange us-like
"month/day/year" compared to the clever and intuitive european-style
"day/month/year" etc.

I've searched Google, but haven't found any good references that helped
me solve this problem, especially with regards to the locale date
format issues.

Best regards,
Thomas

Jul 19 '05 #1
On 16 May 2005 13:59:31 -0700, "Thomas W" <th***********@gmail.com>
wrote:
I'm developing a web-application where the user sometimes has to enter
dates in plain text, although a format may be provided to give clues.
On the server side this piece of text has to be parsed into a datetime
python-object. Does anybody have any pointers on this?

Besides the actual parsing, my main concern is the different locale
date formats and how to be able to parse those strange us-like
"month/day/year" compared to the clever and intuitive european-style
"day/month/year" etc.


<rant>
Well I'm from a locale that uses the dd/mm/yyyy style and I think it's
only marginally less stupid than the mm/dd/yyyy style.
</rant>

How much intuition is required to determine in an international
context what was meant by 01/12/2004? First of December or 12th of
January? The consequences of misinterpretation can be enormous.

If this application is being deployed from a central server where the
users can be worldwide, you have two options:

(a) try to work out somehow what the user's locale is, and then work
with dates in the legacy format "appropriate" to the locale.

(b) Use the considerably-less-stupid ISO 8601 standard format
yyyy-mm-dd (e.g. 2004-12-01) -- throughout your web-application, not
just in your data entry.
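
Option (b) is easy to enforce on the server with the standard library. A minimal sketch, assuming the form field arrives as a plain string; the function name parse_iso_date is made up for illustration:

```python
from datetime import date, datetime


def parse_iso_date(text):
    """Parse a yyyy-mm-dd string into a datetime.date.

    Raises ValueError for anything else, including dd/mm/yyyy input.
    """
    return datetime.strptime(text.strip(), "%Y-%m-%d").date()


print(parse_iso_date("2004-12-01"))  # 2004-12-01
```

Rejecting everything that is not yyyy-mm-dd means the server never has to guess between day-first and month-first input.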

Having said all of that, [bottom-up question] how are you handling
locale differences in language, script, currency symbol, decimal
"point", thousands separator, postal address formats, surname /
given-name order, etc etc etc? [top-down question] What *is* your
target audience?
Jul 19 '05 #2
John Machin wrote:
If this application is being deployed from a central server where the
users can be worldwide, you have two options:

(a) try to work out somehow what the user's locale is, and then work
with dates in the legacy format "appropriate" to the locale.
And this inevitably screws a large number of Canadians (and probably
others), those poor conflicted folk caught between their European roots
and their American neighbours, some of whom use mm/dd/yy and others of
whom use dd/mm/yy on a regular basis. And some of us who switch
willy-nilly, much as we do between metric and imperial. :-(
(b) Use the considerably-less-stupid ISO 8601 standard format
yyyy-mm-dd (e.g. 2004-12-01) -- throughout your web-application, not
just in your data entry.


+1 (emphatically!) (I almost always use this form even on government
submissions, and nobody has complained yet. Of course, they haven't
started changing the forms yet, either...)

-Peter
Jul 19 '05 #3
"Thomas W" wrote:
I'm developing a web-application where the user sometimes has to enter dates in plain text, although a format may be provided to give clues. On the server side this piece of text has to be parsed into a datetime python-object. Does anybody have any pointers on this?

Besides the actual parsing, my main concern is the different locale
date formats and how to be able to parse those strange us-like
"month/day/year" compared to the clever and intuitive european-style
"day/month/year" etc.

I've searched Google, but haven't found any good references that helped me solve this problem, especially with regards to the locale date
format issues.

Best regards,
Thomas


Although it is not a solution to the general localization problem, you
may try the mx.DateTime.DateTimeFrom() factory function
(http://www.egenix.com/files/python/m....html#DateTime) for the
parsing part. Some time ago I also wrote a more robust and
customized version of such a parser. Ambiguous US/European-style
dates are disambiguated by the optional argument USA (False by
default <wink>). Below is the doctest and the documentation (with
epydoc tags); mail me offlist if you'd like to check it out.

George

#=======================================================

def parseDateTime(string, USA=False, implyCurrentDate=False,
                  yearHeuristic=_20thcenturyHeuristic):
    '''Tries to parse a string as a valid date and/or time.

    It recognizes most common (and less common) date and time formats.

    Examples:
        # doctest was run successfully on...
        >>> str(datetime.date.today())
        '2005-05-16'
        >>> str(parseDateTime('21:23:39.91'))
        '21:23:39.910000'
        >>> str(parseDateTime('16:15'))
        '16:15:00'
        >>> str(parseDateTime('10am'))
        '10:00:00'
        >>> str(parseDateTime('2:7:18.'))
        '02:07:18'
        >>> str(parseDateTime('08:32:40 PM'))
        '20:32:40'
        >>> str(parseDateTime('11:59pm'))
        '23:59:00'
        >>> str(parseDateTime('12:32:9'))
        '12:32:09'
        >>> str(parseDateTime('12:32:9', implyCurrentDate=True))
        '2005-05-16 12:32:09'
        >>> str(parseDateTime('93/7/18'))
        '1993-07-18'
        >>> str(parseDateTime('15.6.2001'))
        '2001-06-15'
        >>> str(parseDateTime('6.15.2001'))
        '2001-06-15'
        >>> str(parseDateTime('1980, November 20'))
        '1980-11-20'
        >>> str(parseDateTime('4 Mar 79'))
        '1979-03-04'
        >>> str(parseDateTime('July 4'))
        '2005-07-04'
        >>> str(parseDateTime('15/08'))
        '2005-08-15'
        >>> str(parseDateTime('5 Mar 3:45pm'))
        '2005-03-05 15:45:00'
        >>> str(parseDateTime('01 02 2003'))
        '2003-02-01'
        >>> str(parseDateTime('01 02 2003', USA=True))
        '2003-01-02'
        >>> str(parseDateTime('3/4/92'))
        '1992-04-03'
        >>> str(parseDateTime('3/4/92', USA=True))
        '1992-03-04'
        >>> str(parseDateTime('12:32:09 1-2-2003'))
        '2003-02-01 12:32:09'
        >>> str(parseDateTime('12:32:09 1-2-2003', USA=True))
        '2003-01-02 12:32:09'
        >>> str(parseDateTime('3:45pm 5 12 2001'))
        '2001-12-05 15:45:00'
        >>> str(parseDateTime('3:45pm 5 12 2001', USA=True))
        '2001-05-12 15:45:00'

    @param USA: Disambiguates strings that are valid dates in both (month,
        day, year) and (day, month, year) order (e.g. 05/03/2002). If True,
        the first format is assumed.
    @param implyCurrentDate: If True and the date is not given, the current
        date is implied.
    @param yearHeuristic: If not None, a callable f(year) that transforms the
        value of the given year. The default heuristic transforms 2-digit
        years to 4-digit years assuming they are in the 20th century::
            lambda year: (year >= 100 and year
                          or year >= 10 and 1900 + year
                          or None)
        The heuristic should return None if the year is not considered valid.
        If yearHeuristic is None, no year transformation takes place.
    @return:
        - C{datetime.date} if only the date is recognized.
        - C{datetime.time} if only the time is recognized and
          implyCurrentDate is False.
        - C{datetime.datetime} if both date and time are recognized.
    @raise ValueError: If the string cannot be parsed successfully.
    '''
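
The body of parseDateTime is not posted here, so the following is only a rough sketch of how a USA-style flag can settle ambiguous numeric dates using nothing but the standard library. It is not the implementation documented above: the name parse_day_month_year is hypothetical, and it assumes three numbers with a four-digit year last.

```python
import re
from datetime import date


def parse_day_month_year(text, usa=False):
    """Parse three numbers, year last, into a datetime.date.

    The usa flag only decides which order is tried first, so a string
    that is valid in just one order still parses.
    """
    parts = [int(p) for p in re.split(r"[^0-9]+", text.strip()) if p]
    if len(parts) != 3 or parts[2] < 1000:
        raise ValueError(f"expected two numbers and a 4-digit year: {text!r}")
    a, b, year = parts
    # Each pair is (month, day); the preferred order goes first.
    orders = [(a, b), (b, a)] if usa else [(b, a), (a, b)]
    for month, day in orders:
        try:
            return date(year, month, day)
        except ValueError:
            continue
    raise ValueError(f"not a valid date in either order: {text!r}")


print(parse_day_month_year("01 02 2003"))            # 2003-02-01
print(parse_day_month_year("01 02 2003", usa=True))  # 2003-01-02
```

As in the doctest above, '6.15.2001' still parses with usa=False because only one order yields a valid date.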

Jul 19 '05 #4
On 16 May 2005 17:51:31 -0700, "George Sakkis" <gs*****@rutgers.edu>
wrote:

#=======================================================

def parseDateTime(string, USA=False, implyCurrentDate=False,
                  yearHeuristic=_20thcenturyHeuristic):
    '''Tries to parse a string as a valid date and/or time.

    It recognizes most common (and less common) date and time formats.

Impressive!

    Examples:

    [snip]
    >>> str(parseDateTime('15.6.2001'))
    '2001-06-15'
    >>> str(parseDateTime('6.15.2001'))
    '2001-06-15'


A dangerous heuristic -- 6.12.2001 (meaning 2001-12-06) can be easily
typoed into 6.13.2001 or 6.15.2001 on the numeric keypad.
Jul 19 '05 #5
"John Machin" <sj******@lexicon.net> wrote:
On 16 May 2005 17:51:31 -0700, "George Sakkis" <gs*****@rutgers.edu>
wrote:

#=======================================================

def parseDateTime(string, USA=False, implyCurrentDate=False,
                  yearHeuristic=_20thcenturyHeuristic):
    '''Tries to parse a string as a valid date and/or time.

    It recognizes most common (and less common) date and time formats.

Impressive!

    Examples:

    [snip]
    >>> str(parseDateTime('15.6.2001'))
    '2001-06-15'
    >>> str(parseDateTime('6.15.2001'))
    '2001-06-15'


A dangerous heuristic -- 6.12.2001 (meaning 2001-12-06) can be easily
typoed into 6.13.2001 or 6.15.2001 on the numeric keypad.


Sure, but how is this different from a typo of 2001-12-07 instead of
2001-12-06 ? There's no way you can catch all typos anyway by parsing
alone. Besides, 6.15.2001 is to be interpreted as 2001-06-15 in US
format. Currently the 'USA' flag is used only for ambiguous dates, but
that's easy to change to apply to all dates. Essentially you would gain
a little extra safety at the expense of a little lost recall over the
set of parseable dates.
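
To make that trade-off concrete, here is a small sketch of the stricter behaviour, where the flag fixes the order for every date instead of only the ambiguous ones. It uses datetime.strptime rather than the parser above, and the name parse_fixed_order is hypothetical:

```python
from datetime import date, datetime


def parse_fixed_order(text, usa=False):
    """Parse d.m.yyyy (or m.d.yyyy when usa is True) with no fallback."""
    fmt = "%m.%d.%Y" if usa else "%d.%m.%Y"
    return datetime.strptime(text, fmt).date()


print(parse_fixed_order("6.12.2001"))            # 2001-12-06
print(parse_fixed_order("6.15.2001", usa=True))  # 2001-06-15
```

With usa=False, the typo '6.15.2001' now raises ValueError instead of silently becoming 2001-06-15; the price is that a legitimately month-first string is rejected too.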

George

Jul 19 '05 #6
"Thomas W" <th***********@gmail.com> wrote in message
news:11*********************@g47g2000cwa.googlegroups.com...
I'm developing a web-application where the user sometimes has to enter
dates in plain text, although a format may be provided to give clues.
On the server side this piece of text has to be parsed into a datetime
python-object. Does anybody have any pointers on this?

Besides the actual parsing, my main concern is the different locale
date formats and how to be able to parse those strange us-like
"month/day/year" compared to the clever and intuitive european-style
"day/month/year" etc.

I've searched Google, but haven't found any good references that helped
me solve this problem, especially with regards to the locale date
format issues.
There is no easy answer if you want to be able to enter three
numbers. There are two answers that work, although there will
be a lot of complaining. One is to use the international yyyy-mm-dd
form, and the other is to accept a 4 digit year, an alphabetic month
and a two digit day in any order.

Otherwise, if you get 4 digits as the first component, and it passes your
validation (whatever that is) for reasonable years, you're probably
pretty safe to assume that you've got yyyy-mm-dd. Otherwise,
if you can't get a clean answer (one is > 31, one is 12 < x < 32
and one is <= 12), just give them a list of possibilities and politely
suggest that they enter it as yyyy-mm-dd next time.

I don't validate separators. As long as there is something that isn't a
number or a letter, it's a separator and which one doesn't matter. At
times I've even taken the transition between a digit and a letter as
a separator.
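
A simplified sketch of that approach, using only the standard library. It assumes three numbers with a four-digit year either first (taken as yyyy-mm-dd) or last; two-digit years and alphabetic months are left out, and the name candidate_dates is made up for illustration:

```python
import re
from datetime import date


def candidate_dates(text):
    """Return every valid date that the three numbers in text could mean."""
    # Any run of non-digits counts as a separator; which one doesn't matter.
    tokens = re.findall(r"\d+", text)
    if len(tokens) != 3:
        raise ValueError(f"expected three numbers: {text!r}")
    a, b, c = (int(t) for t in tokens)
    if len(tokens[0]) == 4:
        # Four digits first: assume yyyy-mm-dd.
        return [date(a, b, c)]
    if len(tokens[2]) != 4:
        raise ValueError(f"expected a 4-digit year first or last: {text!r}")
    found = set()
    for day, month in ((a, b), (b, a)):
        try:
            found.add(date(c, month, day))
        except ValueError:
            pass  # this order is not a real calendar date
    return sorted(found)


print(candidate_dates("01/12/2004"))
# [datetime.date(2004, 1, 12), datetime.date(2004, 12, 1)]
```

One result is a clean answer; two results are the list of possibilities to show the user; an empty list means neither order is a real date.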

John Roth
Best regards,
Thomas


Jul 19 '05 #7
The beautiful brand new cookbook2 has "Fuzzy parsing of Dates" using
dateutil.parser, which you run once you have a decent guess at locale
(page 127 of cookbook)


Jul 19 '05 #8
"Thomas W" <th***********@gmail.com> writes:
I'm developing a web-application where the user sometimes has to enter
dates in plain text, although a format may be provided to give clues.
On the server side this piece of text has to be parsed into a datetime
python-object. Does anybody have any pointers on this?


Why are you making it possible for the users to screw this up? Don't
give them a text widget to fill in, leaving you to figure out what the
format is; give them three widgets so you *know* what's what.

In doing that, you can also go to dropdown widgets for month, with
month names (in a locale appropriate for the page language), and for
the days in the month. For the latter, you can use JScript to get the
number of days in the list right, but make sure you fill it in with a
full 31 in case the user has JScript disabled. Finally, if you are
dealing with a fixed range of years, you can use a dropdown list for
that as well, eliminating having to deal with any text from the user
at all.
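
Even with three widgets the server still has to check the combination, since a full 31-day list lets someone submit 31 February. A minimal sketch, assuming the three values arrive as strings; the function names are hypothetical:

```python
import calendar
from datetime import date


def days_in_month(year, month):
    """Number of days in the given month, for sizing the day list."""
    return calendar.monthrange(year, month)[1]


def date_from_fields(year, month, day):
    """Build a date from three form fields submitted as strings."""
    try:
        return date(int(year), int(month), int(day))
    except ValueError as exc:
        raise ValueError(f"not a valid date: {year}-{month}-{day}") from exc


print(days_in_month(2004, 2))                # 29
print(date_from_fields("2005", "5", "17"))   # 2005-05-17
```

Non-numeric input and impossible dates both surface as ValueError, so the form handler has a single error case to report back.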

If the spec calls for plain text entry and you've already tried to get
that changed to something sane, my apologies.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jul 19 '05 #9
On Tue, 17 May 2005 16:44:12 -0500, Mike Meyer <mw*@mired.org> wrote:
"Thomas W" <th***********@gmail.com> writes:
I'm developing a web-application where the user sometimes has to enter
dates in plain text, although a format may be provided to give clues.
On the server side this piece of text has to be parsed into a datetime
python-object. Does anybody have any pointers on this?


Why are you making it possible for the users to screw this up? Don't
give them a text widget to fill in, leaving you to figure out what the
format is; give them three widgets so you *know* what's what.

In doing that, you can also go to dropdown widgets for month, with
month names (in a locale appropriate for the page language), and for
the days in the month.


My experience: drop-down lists generate off-by-one errors. They also
annoy the bejaysus out of users -- e.g. year of birth, a 60+ element
list. It's quite possible of course that YMMV :-)

BTW: I have seen a web page with a drop-down list for year of birth
where the first 18 entries were <current year>, <current year - 1>,
etc for a transaction that wasn't for minors.


Jul 19 '05 #10
