python ETL

Hi,
My company is involved in the development of many data marts and
data warehouses, and I am currently looking into migrating our old set of
tools (written in Korn shell) to a new, more dynamic and robust one. I am
looking into Python, as I have heard that it could be a good candidate
for the job, and wanted to know if anyone knows of an existing open
source project that implements ETL using Python, or any libraries that
may ease the production of such tools.

Thanks.

Aug 1 '05 #1
I'm not an expert in such matters; I had to Google for the definition of
ETL ("extract, transform, and load", which appears to be just a buzzword
for "data munging"). But it seems to me that "ETL" is so utterly broad
in scope that we can't tell you anything until you give us some more
information.

What are your sources of data? What kind of data are you dealing with?
What kinds of munging do you want to do? What formats are the data going to?

However, given that your current toolset is written as Korn shell
scripts, I'm pretty confident that Python will be up to the task.
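
To make "extract, transform, and load" a little more concrete, here is a
minimal sketch using only the standard library (csv and sqlite3) on a
current Python 3. Everything specific in it is an assumption made up for
illustration, since the question does not say what the data looks like:
the CSV columns (customer, amount), the sales table, and the run_etl
helper are all hypothetical.

```python
import csv
import io
import sqlite3


def extract(text_source):
    """Extract: read rows from a CSV text stream as dictionaries."""
    return csv.DictReader(text_source)


def transform(rows):
    """Transform: trim whitespace, normalise case, convert amounts to float."""
    for row in rows:
        yield (
            row["customer"].strip().title(),
            float(row["amount"]),
        )


def load(conn, records):
    """Load: insert the cleaned records into a (hypothetical) sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def run_etl(text_source, conn):
    """Chain the three stages; rows stream through one at a time."""
    load(conn, transform(extract(text_source)))


if __name__ == "__main__":
    # Made-up sample data standing in for a real source file.
    sample = io.StringIO("customer,amount\n alice ,10.50\nBOB,3\n")
    conn = sqlite3.connect(":memory:")
    run_etl(sample, conn)
    print(conn.execute("SELECT customer, amount FROM sales").fetchall())
```

Because transform is a generator, rows are processed one at a time rather
than read into memory all at once. A real job would swap in its own
sources, rules, and target database.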

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Aug 1 '05 #2
Robert is right; you have not really given much information.

However, I would have to assume that if homebrew shell scripts have been
doing the work adequately, then the marts and warehouses are not very
large and the datasets are primarily text rather than binary.

If this is the case and you are only seeking incremental improvement,
then Python would be a very good choice. Perl would also do the job.
Just about any language would work. Yes, there are many reasons to
choose Python. However, you would have to build any scalability and
metadata management.
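
As a rough idea of what building your own metadata management could
involve at its simplest, the sketch below records one row per job run in
an audit table, so a later run can ask what was last loaded. It is an
illustration under assumptions, not a recommendation of a design: the
etl_runs table, its columns, and the helper names are all hypothetical,
and sqlite3 merely stands in for whatever database is actually in use.

```python
import sqlite3
from datetime import datetime, timezone


def init_metadata(conn):
    """Create the (hypothetical) etl_runs audit table if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_runs ("
        "job_name TEXT, source TEXT, rows_loaded INTEGER, "
        "status TEXT, finished_at TEXT)"
    )
    conn.commit()


def record_run(conn, job_name, source, rows_loaded, status):
    """Append one row of run metadata with a UTC timestamp."""
    conn.execute(
        "INSERT INTO etl_runs VALUES (?, ?, ?, ?, ?)",
        (job_name, source, rows_loaded, status,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def last_successful_run(conn, job_name):
    """Return (source, rows_loaded, finished_at) of the latest 'ok' run, or None."""
    return conn.execute(
        "SELECT source, rows_loaded, finished_at FROM etl_runs "
        "WHERE job_name = ? AND status = 'ok' "
        "ORDER BY rowid DESC LIMIT 1",
        (job_name,),
    ).fetchone()
```

A job would call init_metadata once, record_run at the end of each run,
and last_successful_run before starting, for example to decide which
source files still need loading.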

If you seek a radical improvement, it is available, but I do not know of
any free tools that will do it. A question like this will probably not
be answered in a newsgroup post or even the exchange of a few emails.

Choosing an effective tool for the organization is not a trivial
process. It requires knowledge of both the tools and the organization's
methodologies and processes. If you do not have staff who can do this,
most companies find it is much cheaper and faster to pay someone who
does know (a consultant) to assist them in assessing their requirements,
tool selection, and forming an implementation plan.

Yes, your company staff can learn a lot by experimenting and playing
with several tools, but shareholders might not view that approach as the
most effective.
Aug 1 '05 #3
On Mon, 01 Aug 2005 10:49:36 -0500, Paul Watson <pw*****@redlinepy.com> wrote:
ar*****@gmail.com wrote:
Hi,
My company is involved in the development of many data marts and
data-warehouses, and I currently looking into migrating our old set of
tools (written in Korn) to a new, more dynamic and robust one.
.... However, I would have to assume that if homebrew shell scripts have been
doing the work adequately, then the marts and warehouses are not very
large and the datasets are primarily text rather than binary.

If this is the case and you are only seeking incremental improvement,
then Python would be a very good choice. Perl would also do the job.
Just about any language would work. Yes, there are many reasons to
choose Python. However, you would have to build any scalability and
metadata management.

If you seek a radical improvement, it is available, but I do not know of
any free tools that will do it. A question like this will probably not
be answered in a newsgroup post or even the exchange of a few emails.

Choosing an effective tool for the organization is not a trivial
process. It requires knowledge of both the tools and the organization's
methodologies and processes. If you do not have staff who can do this,
most companies find it is much cheaper and faster to pay someone who
does know (a consultant) to assist them in assessing their requirements,
tool selection, and forming an implementation plan.


But remember: sometimes, a bunch of shell scripts or a Python script is the
right tool for the problem.

Sometimes, I think a bunch of shell scripts is the right tool for a lot of
the problems people throw XMLthis, XMLthat, .NET, SQL servers, consultants
and money at.

There is no real reason (with the little information we have[1]) to believe
that the original poster is doing his employer a disservice by looking at
doing things himself, in plain old Python, instead of letting someone tear
down and rebuild whatever workflow/methodology/process stuff they have right
now.

/Jorgen
[1] Unless "ETL" and "data mart" carry some deep meaning which
I've missed, that is.

--
// Jorgen Grahn <jgrahn@ Ph'nglui mglw'nafh Cthulhu
\X/ algonet.se> R'lyeh wgah'nagl fhtagn!
Aug 4 '05 #4
