emulating read and readline methods

Sean Davis

I have a large file that I would like to transform and then feed to a
function (psycopg2 copy_from) that expects a file-like object (needs
read and readline methods).

I have a class like so:

class GeneInfo():
    def __init__(self):
        #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
        self.fh = gzip.open("/tmp/gene_info.gz")
        self.fh.readline() #deal with header line

    def _read(self,n=1):
        for line in self.fh:
            if line=='':
                break
            line=line.strip()
            line=re.sub("\t-","\t",line)
            rowvals = line.split("\t")
            yield "\t".join([rowvals[i] for i in
                             [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"

    def readline(self,n=1):
        return self._read().next()

    def read(self,n=1):
        return self._read().next()

    def close(self):
        self.fh.close()

and I use it like so:

a=GeneInfo()
cur.copy_from(a,"gene_info")
a.close()

It works well except that the end of file is not caught by copy_from.
I get errors like:

psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error
during .read() call
CONTEXT: COPY gene_info, line 1000: ""

for a 1000 line test file. Any ideas what is going on?

Thanks,
Sean

Sep 10 '08 #1

Diez B. Roggisch

I'm a bit lost why the above actually works - as _read() appears to be
re-created instead of re-used for each invocation, and thus can't work IMHO.

Anyway, I think the real problem is that you don't follow the
readline protocol. readline() is supposed to return "" when there are
no more lines to read; yours raises StopIteration instead.

Diez
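
To make the mismatch concrete, here is a minimal sketch in Python 3 syntax (the thread's code is Python 2, where `gen.next()` is what `next(gen)` is today). The `lines` helper and the `io.StringIO` stand-in for the gzip file are assumptions made for illustration, not part of the original code.

```python
import io

def lines(fh):
    # Generator comparable to _read() in the original post: one item per line.
    for line in fh:
        yield line

fh = io.StringIO("a\nb\n")   # stand-in for the gzip file
gen = lines(fh)
collected = [next(gen), next(gen)]

# An exhausted generator signals the end by raising StopIteration ...
try:
    next(gen)
    raised = False
except StopIteration:
    raised = True

# ... while a file object signals EOF by returning an empty string.
eof_value = fh.readline()

# next() accepts a default, which bridges the two conventions.
bridged = next(gen, "")
```

The two-argument form of `next()` is one simple way to turn the generator's end-of-data signal into the empty string a file-like consumer expects.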

Sep 10 '08 #2

MRAB

Each time readline() and read() call self._read() they are creating a
new generator. They then get one value from the newly-created
generator and then discard that generator. What you should do is
create the generator in __init__ and then use it in readline() and
read().
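
A minimal sketch of that suggestion, in Python 3 syntax. The class name `GeneInfoReader`, the `columns` parameter, and the in-memory sample data are hypothetical and chosen for illustration; the original opens a gzip file and selects a fixed list of columns. Whether one line per `read()` call satisfies every version of `copy_from` is an assumption carried over from the thread, where it was reported to work.

```python
import io
import re

class GeneInfoReader:
    """File-like wrapper (hypothetical name) that transforms each line."""

    def __init__(self, fh, columns):
        self.fh = fh
        self.columns = columns
        self.fh.readline()           # deal with header line
        self._rows = self._read()    # create the generator exactly once

    def _read(self):
        for line in self.fh:
            line = re.sub("\t-", "\t", line.strip())
            rowvals = line.split("\t")
            yield "\t".join(rowvals[i] for i in self.columns) + "\n"

    def readline(self, size=-1):
        # The default makes next() return "" at EOF instead of raising.
        return next(self._rows, "")

    def read(self, size=-1):
        # Simplification: returns one transformed line and ignores size.
        return self.readline()

    def close(self):
        self.fh.close()

sample = io.StringIO("h1\th2\th3\nA\t-\tC\nD\tE\tF\n")
reader = GeneInfoReader(sample, columns=[0, 1, 2])
rows = [reader.readline(), reader.read(), reader.readline(), reader.read()]
reader.close()
```

Here the last two calls return empty strings, which is what a consumer of a file-like object treats as end of file.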

I wonder whether it's expecting readline() and read() to return an
empty string at the end of the file instead of raising StopIteration.

Sep 10 '08 #3

MRAB

On Sep 10, 10:52 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:

I'm a bit lost why the above actually works - as _read() appears to be
re-created instead of re-used for each invocation, and thus can't work IMHO.

Each generator that's created reads a single line from the file
(self.fh), yields the result, and is then discarded; none of the
individual generators reads more than one line from the file.
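
This behaviour can be reproduced with a short sketch (Python 3 syntax; the module-level `fh` and the `upper()` transformation are stand-ins chosen for illustration). The position lives in the shared file object, not in the generator, so each throwaway generator picks up where the previous one stopped.

```python
import io

fh = io.StringIO("one\ntwo\nthree\n")   # stand-in for the shared self.fh

def _read():
    for line in fh:
        yield line.upper()

# A brand-new generator per call, as in the original readline()/read().
first = next(_read())
second = next(_read())
third = next(_read())

# Once the file is exhausted, a fresh generator has nothing to yield.
try:
    next(_read())
    hit_stop_iteration = False
except StopIteration:
    hit_stop_iteration = True
```

The final call is the situation `copy_from` runs into at the end of the input.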

Sep 10 '08 #4

John Machin

On Sep 11, 8:01 am, MRAB <goo...@mrabarnett.plus.com> wrote:

I wonder whether it's expecting readline() and read() to return an
empty string at the end of the file instead of raising StopIteration.

Don't wonder; ReadTheFantasticManual:

read( [size])

... An empty string is returned when EOF is encountered
immediately. ...

readline( [size])

... An empty string is returned only when EOF is encountered
immediately.
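
The contract quoted above is what lets a caller loop until the data runs out. The `consume` function below is a hypothetical consumer written for illustration; it is not psycopg2's actual implementation, only a sketch of the kind of loop that relies on "" meaning EOF.

```python
import io

def consume(filelike, size=4):
    """Hypothetical consumer: keep reading until "" signals EOF."""
    chunks = []
    while True:
        chunk = filelike.read(size)
        if chunk == "":
            break
        chunks.append(chunk)
    return chunks

chunks = consume(io.StringIO("abcdefghij"))

# "Only when EOF is encountered immediately": a blank line is "\n", not "".
blank = io.StringIO("\n")
blank_line = blank.readline()
after_blank = blank.readline()
```

A wrapper that raises an exception where this loop expects "" never gives the caller a clean way to stop.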

Sep 10 '08 #5

Sean Davis

Thanks. This was indeed my problem--not reading the manual closely
enough.

And the points about the iterator being re-instantiated were also
right on point. Interestingly, in this case, the code was working
because read() and readline() were still returning the next line each
time since the file handle was being read one line at a time.

Sean

Sep 11 '08 #6

MRAB

After further thought, do you actually need a generator? read() and
readline() could just call _read(), which would read a line from the
file and return the result or an empty string. Or the processing could
be done in readline() and read() just could call readline().
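
A sketch of the second variant, with the processing in readline() and read() delegating to it (Python 3 syntax). The class name `LineTransformer`, the `columns` parameter, and the sample data are assumptions for illustration; the transformation mirrors the one in the original post.

```python
import io
import re

class LineTransformer:
    """Generator-free file-like wrapper (hypothetical name)."""

    def __init__(self, fh, columns):
        self.fh = fh
        self.columns = columns
        self.fh.readline()   # deal with header line

    def readline(self, size=-1):
        line = self.fh.readline()
        if line == "":
            return ""        # pass EOF through unchanged
        line = re.sub("\t-", "\t", line.strip())
        rowvals = line.split("\t")
        return "\t".join(rowvals[i] for i in self.columns) + "\n"

    def read(self, size=-1):
        # Simplification: one transformed line per call, size ignored.
        return self.readline()

    def close(self):
        self.fh.close()

source = io.StringIO("id\tname\tnote\n1\tabc\t-\n2\tdef\txyz\n")
t = LineTransformer(source, columns=[0, 1])
out = [t.readline(), t.read(), t.readline()]
t.close()
```

Because the underlying file's readline() already returns "" at EOF, no exception handling is needed in this version.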

Sep 11 '08 #7
