Fill a Server - Python

bob_smith_17280

I think this is a silly task, but I have to do it. I have to fill a
file server (1 TB SATA RAID Array) with files. I wrote a Python script
to do this, but it's a bit slow... here it is:

import shutil
import os
import sys
import time

src = "G:"
des = "C:scratch"

os.chdir(src)
try:
    for x in xrange(5000):
        for root, dirs, files in os.walk(src):
            for f in files:
                shutil.copyfile(os.path.join(root, f),
                                "C:\scratch\%s%s" %(f,x))
    print "Done!!!"

except Exception, e:
    print e
    time.sleep(15)
    sys.exit()

The problem with this is that it only copies about 35 GB/hour. I would
like to copy at least 100 GB/hour... more if possible. I have tried to
copy from the IDE CD drive to the SATA array with the same results. I
understand the throughput on SATA to be roughly 60MB/sec which comes
out to 3.6 GB/min which should be 216 GB/hour. Can someone show me how
I might do this faster? Is shutil the problem?
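
One way to find out whether shutil is the bottleneck is to time a single large copy and see what rate the hardware actually delivers. The following is only a minimal sketch in Python 3 syntax, not the script above; the helper name `timed_copy`, the 1 MB default buffer, and whatever paths are passed in are assumptions for illustration.

```python
import os
import shutil
import time


def timed_copy(src_path, dst_path, buffer_size=1024 * 1024):
    """Copy one file and return the observed rate in MB/s (1 MB = 10**6 bytes)."""
    start = time.perf_counter()
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        # copyfileobj reads and writes in chunks of buffer_size bytes.
        shutil.copyfileobj(fsrc, fdst, buffer_size)
    elapsed = time.perf_counter() - start
    size_mb = os.path.getsize(dst_path) / 1e6
    # Guard against a zero interval when the file is tiny.
    return size_mb / max(elapsed, 1e-9)
```

Running it on the same large file with a few different `buffer_size` values gives a rough idea of whether the buffer matters. The operating system's file cache can inflate the figure for small or recently read files, so a file of several GB is a fairer test.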

Also, my first attempt at this did a recursive copy creating subdirs in
dirs as it copied. It would crash every time it went 85 subdirs deep.
This is an NTFS filesystem. Would this limitation be in the filesystem
or Python?

Thanks
Bob Smith

"I once worked for a secret government agency in South Africa... among
other things, we developed and tested AIDS. I am now living in an
undisclosed location in North America. The guys who live with me never
let me drive the same route to work and they call me 'Bob Smith 17280'
as there were more before me." -- Bob Smith

Jul 18 '05 #1

Peter Hansen

bo*************@hotmail.com wrote:
[snip code involving copyfile:]

shutil.copyfile(os.path.join(root, f),

The problem with this is that it only copies about 35 GB/hour. I would
like to copy at least 100 GB/hour... more if possible. I have tried to
copy from the IDE CD drive to the SATA array with the same results. I
understand the throughput on SATA to be roughly 60MB/sec which comes
out to 3.6 GB/min which should be 216 GB/hour. Can someone show me how
I might do this faster? Is shutil the problem?

Have you tried doing this from some kind of batch file, or
manually, measuring the results? Have you got any way to
achieve this throughput, or is it only a theory? I see
no reason to try to optimize something if there's no real
evidence that it *can* be optimized.

Also, my first attempt at this did a recursive copy creating subdirs in
dirs as it copied. It would crash every time it went 85 subdirs deep.
This is an NTFS filesystem. Would this limitation be in the filesystem
or Python?

In general, when faced with the question "Is this a limitation
of Python or of this program X of Microsoft origin?", the answer
should be obvious... ;-)

More practically, perhaps: use your script to create one of those
massively nested folders. Wait for it to crash. Now go in
"manually" (with CD or your choice of fancy graphical browser)
to the lowest level folder and attempt to create a subfolder
with the same name the Python script was trying to use. Report
back here on your success, if any. ;-)

(Alternatively, describe your failure in terms other than "crash".
Python code rarely crashes. It does, sometimes, fail and print
out an exception traceback. These are printed for a very good
reason: they are more descriptive than the word "crash".)
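
To capture the exact failure rather than a paraphrase, the script's exception handler could print the full traceback instead of only the exception message. A minimal sketch in Python 3 syntax follows; `risky_copy` is a hypothetical stand-in for the copy loop, and the path in it is made up.

```python
import traceback


def risky_copy():
    # Hypothetical stand-in for the copy loop: opening a file in a
    # directory that does not exist raises an OSError subclass.
    open("no_such_directory/85/file.bat", "rb")


def run_and_report():
    """Run the copy; on failure print and return the full traceback text."""
    try:
        risky_copy()
    except Exception:
        report = traceback.format_exc()
        print(report)
        return report
    return None
```

The returned text names the exception type, the message, and the line that raised it, which is the detail worth pasting into a follow-up post.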

-Peter

Jul 18 '05 #2

Fredrik Lundh

<bo*************@hotmail.com> wrote:

Also, my first attempt at this did a recursive copy creating subdirs in
dirs as it copied. It would crash every time it went 85 subdirs deep.
This is an NTFS filesystem. Would this limitation be in the filesystem
or Python?

see the "Max File Name Length" on this page (random google link)
for an explanation:

http://www.ntfs.com/ntfs_vs_fat.htm

(assuming that "crash" meant "raise an exception", that is)

</F>

Jul 18 '05 #3

Fredrik Lundh

Also, my first attempt at this did a recursive copy creating subdirs in
dirs as it copied. It would crash every time it went 85 subdirs deep.
This is an NTFS filesystem. Would this limitation be in the filesystem
or Python?

see the "Max File Name Length" on this page (random google link)
for an explanation:

http://www.ntfs.com/ntfs_vs_fat.htm

also:

print len(os.path.join("c:\\scratch", *map(str, range(85))))
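
For readers on Python 3, the same check can be sketched as below. It uses `ntpath` rather than `os.path` so that the Windows separator is applied on any platform; the function name `nested_path_length` is an assumption for illustration.

```python
import ntpath


def nested_path_length(base, depth):
    """Length of base\\0\\1\\...\\(depth-1) joined with Windows separators."""
    return len(ntpath.join(base, *map(str, range(depth))))


print(nested_path_length("c:\\scratch", 85))  # prints 255
```

The base contributes 10 characters, folders 0 to 9 contribute 2 each including the separator, and folders 10 to 84 contribute 3 each, giving 10 + 20 + 225 = 255.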

</F>

Jul 18 '05 #4

bob_smith_17280

You are correct, Peter, the exception read something like this:

"Folder 85 not found."

I am paraphrasing, but that is the crux of the error. It takes about an
hour to produce the error, so if you want an exact quote from the
exception, let me know and give me a while. I looked through the nested
dirs several times after the crash and they always went from 0 - 84...
sure enough, directory 85 had not been created... why, I do not know.
Doesn't really matter now as the script I posted achieves similar
results without crashing... still slow though.

As far as drive throughput, it's my understanding that SATA is
theoretically capable of 150 MB/sec (google for it). However, in
practice, one can normally expect a sustained throughput of 60 to 70
MB/sec. The drives are 7,200 RPM... not the more expensive 10,000 RPM
drives. I have no idea how RAID 5 might impact performance either. It's
hardware RAID on a top-of-the-line DELL server. I am not a hardware
expert so I don't understand how *sustained* drive throughput, RPM and
RAID together factor into this scenario.
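
The figures in this thread mix MB/sec and GB/hour, so a quick conversion makes them easier to compare. This is a small sketch in Python 3 syntax, assuming decimal units (1 GB = 1000 MB) as the 216 GB/hour figure in the first post does; the function names are illustrative.

```python
def mb_per_sec_to_gb_per_hour(mb_per_sec):
    """Convert MB/s to GB/hour using decimal units (1 GB = 1000 MB)."""
    return mb_per_sec * 3600 / 1000


def gb_per_hour_to_mb_per_sec(gb_per_hour):
    """Convert GB/hour to MB/s using decimal units (1 GB = 1000 MB)."""
    return gb_per_hour * 1000 / 3600


print(mb_per_sec_to_gb_per_hour(60))             # prints 216.0
print(round(gb_per_hour_to_mb_per_sec(35), 1))   # prints 9.7
print(round(gb_per_hour_to_mb_per_sec(100), 1))  # prints 27.8
```

So the observed 35 GB/hour is roughly 10 MB/sec, well under the 60 to 70 MB/sec sustained rate mentioned above, and the 100 GB/hour target needs a little under 28 MB/sec.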

Jul 18 '05 #5

bob_smith_17280

I think you solved it, Fredrik.

The first ten folders looked like this:

D:\0\1\2\3\4\5\6\7\8\9

22 chars long.

The rest looked like this:

\10\11\12\13....\82\83\84

~222 chars long.

Subdir 84 had one file in it named XXXXXXXXXXX.bat

That file broke the 255 limit, then subdir 85 wasn't created and when
the script tried to copy a file to 85, an exception was raised. Not
that it matters. Interesting to know that limits still exist and that
this is an NTFS issue.
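
The arithmetic can be checked with a short sketch in Python 3 syntax. It assumes the layout described above, a `D:\` root with folders 0 through 84 nested in order, and it uses the placeholder `XXXXXXXXXXX.bat` as if it were the real file name, which it is not; the function name `deepest_folder` is illustrative.

```python
import ntpath


def deepest_folder(root, last):
    """Build root\\0\\1\\...\\last with Windows separators on any platform."""
    return ntpath.join(root, *map(str, range(last + 1)))


folder = deepest_folder("D:\\", 84)
# Placeholder file name taken from the post, not the real name.
file_path = ntpath.join(folder, "XXXXXXXXXXX.bat")

print(len(deepest_folder("D:\\", 9)))  # prints 22
print(len(folder))                     # prints 247
print(len(file_path) > 255)            # prints True
```

The two-digit folders account for 225 characters (75 components of 3 characters each), close to the estimate above, so the folder path alone is 247 characters and any file name longer than a few characters pushes the full path past 255.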

Jul 18 '05 #6
