randomly write to a file

Hi,
I am developing a desktop search. For the index of the files I have
developed an algorithm with which I should be able to read and write
a line if I know its line number.
I can read a specified line using the linecache module, but I am stuck
on how to write to the nth line of a file EFFICIENTLY, which means I
don't want to traverse the file sequentially to reach the nth line.

Please help.
Regards
Rohit

May 7 '07 #1
On May 7, 2:51 pm, rohit <rohitsethi...@gmail.com> wrote:
> Hi,
> I am developing a desktop search. For the index of the files I have
> developed an algorithm with which I should be able to read and write
> a line if I know its line number.
> I can read a specified line using the linecache module, but I am stuck
> on how to write to the nth line of a file EFFICIENTLY, which means I
> don't want to traverse the file sequentially to reach the nth line.
>
> Please help.
> Regards
> Rohit

Hi,

Looking through the archives, it looks like some recommend reading the
file into a list and working on it that way. And if the file is too big,
then use a database. See the links below:

http://mail.python.org/pipermail/tut...ch/045571.html
http://mail.python.org/pipermail/tut...ch/045572.html

I also found this interesting idea that explains what would be needed
to accomplish this task:

http://mail.python.org/pipermail/pyt...il/076890.html

Have fun!

Mike

May 7 '07 #2
On Mon, 07 May 2007 16:51:37 -0300, rohit <ro***********@gmail.com>
wrote:
> I am developing a desktop search. For the index of the files I have
> developed an algorithm with which I should be able to read and write
> a line if I know its line number.
> I can read a specified line using the linecache module, but I am stuck
> on how to write to the nth line of a file EFFICIENTLY, which means I
> don't want to traverse the file sequentially to reach the nth line.

You can only replace a line in-place with another of exactly the same
length. If the lengths differ, you have to write the modified line and all
the following ones.
If all your lines are of fixed length, you have a "record". To read record
N (counting from 0):

    def read_record(a_file, N, record_length):
        a_file.seek(N * record_length)
        return a_file.read(record_length)

And then you are reinventing ISAM.
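
The fixed-length record idea can be extended from reading to writing. What follows is only a minimal sketch: the 32-byte record size, the space padding, and the helper names `pack`, `read_record`, `write_record` and `demo` are assumptions made for illustration, not something taken from the post.

```python
import os
import tempfile

RECORD_LENGTH = 32  # assumed fixed record size in bytes, chosen for illustration


def pack(text, length=RECORD_LENGTH):
    """Encode text and pad it with spaces to exactly `length` bytes."""
    data = text.encode("utf-8")
    if len(data) > length:
        raise ValueError("record too long: %d > %d bytes" % (len(data), length))
    return data.ljust(length, b" ")


def read_record(a_file, n, length=RECORD_LENGTH):
    """Return record n (counting from 0) with the space padding stripped."""
    a_file.seek(n * length)
    # Note: a value that really ends in spaces would lose them here.
    return a_file.read(length).rstrip(b" ").decode("utf-8")


def write_record(a_file, n, text, length=RECORD_LENGTH):
    """Overwrite record n in place; no other record has to move."""
    a_file.seek(n * length)
    a_file.write(pack(text, length))


def demo():
    """Create a small record file, replace the middle record, read all back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            for name in ("/home/a.txt", "/home/b.txt", "/home/c.txt"):
                f.write(pack(name))
        with open(path, "r+b") as f:
            write_record(f, 1, "/tmp/renamed.txt")
            return [read_record(f, i) for i in range(3)]
    finally:
        os.remove(path)
```

The cost of this layout is the padding: every record takes the full record length, and anything longer is rejected rather than truncated.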

--
Gabriel Genellina

May 7 '07 #3
Rohit,

Consider using an SQLite database. It comes with Python 2.5 and
higher. SQLite will do a nice job keeping track of the index. You can
easily find the line you need with a SQL query and you can write to
it as well. When you have a file and you write to one line of the
file, all of the rest of the lines will have to be shifted to
accommodate the potentially larger new line.
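
As a rough illustration of this suggestion, here is a sketch that uses the standard `sqlite3` module with one row per line. The table name `lines`, its columns, and the helper names are assumptions for the example; a real index would connect to a file on disk rather than `:memory:`.

```python
import sqlite3


def make_index(paths):
    """Create an in-memory table mapping a line number to one path."""
    conn = sqlite3.connect(":memory:")  # a real index would use a file name
    conn.execute("CREATE TABLE lines (lineno INTEGER PRIMARY KEY, path TEXT)")
    conn.executemany(
        "INSERT INTO lines (lineno, path) VALUES (?, ?)",
        list(enumerate(paths)),
    )
    conn.commit()
    return conn


def read_line(conn, lineno):
    """Fetch the path stored under lineno, or None if there is none."""
    row = conn.execute(
        "SELECT path FROM lines WHERE lineno = ?", (lineno,)
    ).fetchone()
    return row[0] if row else None


def write_line(conn, lineno, path):
    """Replace the path stored under lineno, whatever its new length."""
    conn.execute("UPDATE lines SET path = ? WHERE lineno = ?", (path, lineno))
    conn.commit()
```

Because `lineno` is the primary key, both the lookup and the update go through SQLite's own index instead of scanning the data, and a longer replacement value does not require the caller to shift anything.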

-Nick Vatamaniuc

May 7 '07 #4
Nick,
I just wanted to ask: for time-constrained applications like searching,
won't SQLite be an expensive approach?
I mean, searching and editing the files directly is less expensive in
terms of time taken.
So I need an approach that will allow me to write randomly to a line
in a file without using a database.
May 7 '07 #5
Hi Gabriel,
I am storing file names and their paths, each written to the file
on a single line.
If I use records, that would waste too much space, as there is
no upper limit on the number of characters in a path.
The next best approach I can think of is reading the file into memory,
editing it, and writing out the portion that has just been altered and
the following lines.
But is there a better approach you can suggest?
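
The "rewrite the altered line and everything after it" approach described here might look roughly like the sketch below. The function names are hypothetical, and the sketch still walks the file sequentially to find line n, so it only saves rewriting the part of the file before that line.

```python
import os
import tempfile


def replace_line(path, n, new_line):
    """Replace line n (counting from 0), rewriting only line n and the tail."""
    with open(path, "r+b") as f:
        # Reaching line n is still a sequential scan in this sketch.
        for _ in range(n + 1):
            start = f.tell()
            if not f.readline():
                raise IndexError("file has no line %d" % n)
        tail = f.read()      # everything after the old line n
        f.seek(start)        # back to where line n began
        f.write(new_line.encode("utf-8") + b"\n" + tail)
        f.truncate()         # drop leftover bytes if the file got shorter


def demo(new_line):
    """Replace the middle line of a three-line file and return the result."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"/home/a.txt\n/home/b.txt\n/home/c.txt\n")
        replace_line(path, 1, new_line)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

The tail is held in memory while it is rewritten, so the cost grows with the distance between line n and the end of the file.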

May 7 '07 #6
On Mon, 07 May 2007 12:51:37 -0700, rohit wrote:
> I can read a specified line using the linecache module, but I am
> stuck on how to write to the nth line of a file EFFICIENTLY, which
> means I don't want to traverse the file sequentially to reach the
> nth line.

Unless you are lucky enough to be using an OS that natively supports
random access to the lines of a text file, if such a thing even exists,
you can't, because you don't know how long each line will be.

If you can guarantee fixed-length lines, then you can use file.seek() to
jump to the appropriate byte position.

If the lines are random lengths, but you can control access to the files
so other applications can't write to them, you can keep an index table,
which you update as needed.

Otherwise, if the files are small enough, say up to 20 or 40MB each, just
read them entirely into memory.

Otherwise, you're out of luck.
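
For the small-file case, reading everything into memory could be sketched as follows. The function names are hypothetical, and the sketch assumes a UTF-8 text file in which every line ends with a newline.

```python
import os
import tempfile


def replace_line_in_memory(path, n, new_line):
    """Load every line, replace line n (counting from 0), write all back."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    lines[n] = new_line + "\n"   # raises IndexError if the file is too short
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)


def demo():
    """Replace the last line of a three-line file and return the new text."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write("/home/a.txt\n/home/b.txt\n/home/c.txt\n")
        replace_line_in_memory(path, 2, "/srv/c.txt")
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)
```

Every edit rewrites the whole file, which is why this only makes sense while the file stays small.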
--
Steven.
May 8 '07 #7
On Mon, 07 May 2007 14:41:02 -0700, Nick Vatamaniuc wrote:
> Rohit,
>
> Consider using an SQLite database. It comes with Python 2.5 and higher.
> SQLite will do a nice job keeping track of the index. You can easily
> find the line you need with a SQL query and you can write to it as
> well. When you have a file and you write to one line of the file, all of
> the rest of the lines will have to be shifted to accommodate the
> potentially larger new line.

Using a database for tracking line number and byte position -- isn't
that a bit overkill?

I would have thought something as simple as a list of line lengths would
do:

    offsets = [35,  # first line is 35 bytes long
               19,  # second line is 19 bytes long...
               45, 12, 108, 67]

To get to the nth line, you have to seek to byte position:

    sum(offsets[:n])
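
A small sketch of the line-length index in use. Here the list of per-line byte counts is called `lengths`, and a second list of starting positions is precomputed so that each lookup avoids summing a slice; the names `build_offsets`, `read_line` and `demo` are made up for the example.

```python
import io
from itertools import accumulate


def build_offsets(lengths):
    """Turn per-line byte lengths into the starting byte offset of each line."""
    # offsets[n] == sum(lengths[:n]), but computed once for every n.
    return [0] + list(accumulate(lengths))[:-1]


def read_line(f, lengths, offsets, n):
    """Seek straight to line n (counting from 0) and read exactly that line."""
    f.seek(offsets[n])
    return f.read(lengths[n])


def demo():
    """Index a three-line byte string and fetch its last line directly."""
    data = b"alpha\nbe\ngamma!\n"
    lengths = [len(line) for line in data.splitlines(keepends=True)]
    offsets = build_offsets(lengths)
    return read_line(io.BytesIO(data), lengths, offsets, 2)
```

The index has to be kept in step with the file: whenever a line changes length, its entry in `lengths` and every later entry in `offsets` must be updated.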

--
Steven.
May 8 '07 #8
Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> wrote:
> On Mon, 07 May 2007 14:41:02 -0700, Nick Vatamaniuc wrote:
>> Rohit,
>>
>> Consider using an SQLite database. It comes with Python 2.5 and higher.
>> SQLite will do a nice job keeping track of the index. You can easily
>> find the line you need with a SQL query and you can write to it as
>> well. When you have a file and you write to one line of the file, all of
>> the rest of the lines will have to be shifted to accommodate the
>> potentially larger new line.
>
> Using a database for tracking line number and byte position -- isn't
> that a bit overkill?
>
> I would have thought something as simple as a list of line lengths would
> do:
>
>     offsets = [35,  # first line is 35 bytes long
>                19,  # second line is 19 bytes long...
>                45, 12, 108, 67]
>
> To get to the nth line, you have to seek to byte position:
>
>     sum(offsets[:n])

...and then you STILL can't write there (without reading and rewriting
all the succeeding part of the file) unless the line you're writing is
always the same length as the one you're overwriting, which doesn't seem
to be part of the constraints in the OP's original application. I'm
with Nick in recommending SQLite for the purpose -- it _IS_ quite
"lite", as its name suggests. BSD-DB (a DB that's much more complicated
to use, being far lower-level, but by the same token affords you
extremely fine-grained control of operations) might be an alternative
IF, after first having coded the application with SQLite, you can indeed
prove, profiler in hand, that it's a serious bottleneck. However,
premature optimization is the root of all evil in programming.
Alex
May 8 '07 #9
On Mon, 07 May 2007 20:00:57 -0700, Alex Martelli wrote:
> Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> wrote:
>> On Mon, 07 May 2007 14:41:02 -0700, Nick Vatamaniuc wrote:
>>> Rohit,
>>>
>>> Consider using an SQLite database. It comes with Python 2.5 and
>>> higher. SQLite will do a nice job keeping track of the index. You can
>>> easily find the line you need with a SQL query and you can write to
>>> it as well. When you have a file and you write to one line of the
>>> file, all of the rest of the lines will have to be shifted to
>>> accommodate the potentially larger new line.
>>
>> Using a database for tracking line number and byte position -- isn't
>> that a bit overkill?
>>
>> I would have thought something as simple as a list of line lengths
>> would do:
>>
>>     offsets = [35,  # first line is 35 bytes long
>>                19,  # second line is 19 bytes long...
>>                45, 12, 108, 67]
>>
>> To get to the nth line, you have to seek to byte position:
>>
>>     sum(offsets[:n])
>
> ...and then you STILL can't write there (without reading and rewriting
> all the succeeding part of the file) unless the line you're writing is
> always the same length as the one you're overwriting, which doesn't seem
> to be part of the constraints in the OP's original application. I'm
> with Nick in recommending SQLite for the purpose -- it _IS_ quite
> "lite", as its name suggests.

Hang on, as I understand it, Nick was just suggesting using SQLite for
holding indexes into the file! That's why I said it was overkill. So
whether the indexes are in a list or a database, you've _still_ got to
deal with writing to the file.

If I've misunderstood Nick's suggestion, if he actually meant to read the
entire text file into the database, well, that's just a heavier version
of reading the file into a list of strings, isn't it? If the database
gives you more and/or better functionality than file.readlines(), then I
have no problem with using the right tool for the job.
--
Steven.
May 8 '07 #10
Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> wrote:
> ...
> Hang on, as I understand it, Nick was just suggesting using SQLite for
> holding indexes into the file! That's why I said it was overkill. So
> whether the indexes are in a list or a database, you've _still_ got to
> deal with writing to the file.
>
> If I've misunderstood Nick's suggestion, if he actually meant to read the
> entire text file into the database, well, that's just a heavier version
> of reading the file into a list of strings, isn't it? If the database
> gives you more and/or better functionality than file.readlines(), then I
> have no problem with using the right tool for the job.

Ah well, I may have misunderstood myself. I'd keep the whole thing in
an SQLite table, definitely NOT a table + an external file -- no, that's
not going to be heavier than reading things in memory, SQLite is smarter
than one might think :-). Obviously, I'm assuming that one's dealing
with an amount of data that doesn't just comfortably and easily fit in
memory, or at least one that gives pause at the thought of sucking it
all into memory and writing it back out again at every program run.
Alex
May 8 '07 #11
