
how to convert gpr file to csv format: using python

P: 4
Hi
I am a beginner. I would like to know how to convert a file in gpr format to csv format using Python.
Baber
Jan 11 '07 #1
16 Replies


bartonc
Expert 5K+
P: 6,596
Very well. Let's move this to the Python forum. Welcome to TSDN.
Jan 11 '07 #2

bartonc
Expert 5K+
P: 6,596
Welcome to the Python Forum on TheScripts.com.
I don't recognize gpr. Is it some other text format or from a program?
Jan 11 '07 #3

Expert 100+
P: 511
Well, you should help us to help you by providing an example of the gpr format and your expected output, which in this case is csv.
Looking up the gpr extension, I can only find that it relates to some modeling software system...
Jan 12 '07 #4

bartonc
Expert 5K+
P: 6,596
Hey ghostdog! Where you been so long?
I actually found the GenePix Results format, but don't know if this is the correct one:
GPR Header
A sample GPR file header and a description of each entry are shown below:

Entry    Description

ATF 1.0    File type and version number.
29 48    Number of optional header records and number of data fields (columns).
"Type=GenePix Results 3"    Type of ATF file.
"DateTime=2002/02/09 17:15:48"    Date and time when the image was acquired.
"Settings=C:\Genepix\Genepix.gps"    The name of the settings file that was used for analysis.
"GalFile=C:\Genepix\Demo.gal"    The GenePix Array List file used to associate Names and IDs to each entry.
"PixelSize=10"    Resolution of each pixel in µm.
"Wavelengths=635 532"    Installed laser excitation sources in nm.
"ImageFiles=C:\Genepix\demo.tif 0 C:\Genepix\Genepix.tif 1"    The name and path of the associated TIF file(s).
"NormalizationMethod=None"    The type of normalization method used, if applicable.
"NormalizationFactors=1 1"    The normalization factor applied to each channel.
"JpegImage=C:\Genepix\demo.jpg"    The name and path of the associated Jpeg image files.
"StdDev=Type 1"    The type of standard deviation calculation selected in the Options settings.
"RatioFormulation=W1/W2 (635/532)"    The ratio formulation of the ratio image, showing which image is numerator and which is denominator.
"Barcode=00331"    The barcode symbols read from the image.
"BackgroundSubtraction=LocalFeature"    The background subtraction method selected in the Options settings.
"ImageOrigin=0, 0"    The origin of the image relative to the scan area.
"JpegOrigin=390, 4320"    The origin of the Results JPEG image (the bounding box of the analysis Blocks) relative to the scan area origin.
"Creator=GenePix 4.1.1.4"    The version of the GenePix Pro software used to create the Results file.
"Scanner=GenePix 4000B [serial number]"    Type and serial number of scanner used to acquire the image.
"FocusPosition=0"    The focus position setting used to acquire the image, in microns.
"Temperature=19.6127"    The temperature of the scanner, in degrees C.
"LinesAveraged=1"    The line average setting used to acquire the image.
"Comment=hyb 2673"    User-entered file comment.
"PMTGain=500 600"    The PMT settings during acquisition.
"ScanPower=100 100"    The amount of laser transmission during acquisition.
"LaserPower=1 1"    The power of each laser, in volts.
"LaserOnTime=5 5"    The laser on-time for each laser, in minutes.
"Filters=<Empty> <Empty>"    Emission filters used during acquisition (GenePix 4100 and 4200 only.)
"ScanRegion=100,100,2000,2000"    The coordinate values of the scan region used during acquisition, in pixels.
"Supplier="    Header field supplied in GAL file.
Data record column headings    Column titles for each measurement (see below).
Data Records    Extracted data.

GPR Data
The list below describes each column of data in the Results file.

Column Title    Description

Block    the block number of the feature.
Column    the column number of the feature.
Row    the row number of the feature.
Name    the name of the feature derived from the Array List (up to 40 characters long, contained in quotation marks).
ID    the unique identifier of the feature derived from the Array List (up to 40 characters long, contained in quotation marks).
X    the X-coordinate in µm of the center of the feature-indicator associated with the feature, where (0,0) is the top left of the image.
Y    the Y-coordinate in µm of the center of the feature-indicator associated with the feature, where (0,0) is the top left of the image.
Dia.    the diameter in µm of the feature-indicator.
F635 Median    median feature pixel intensity at wavelength #1 (635 nm).
F635 Mean    mean feature pixel intensity at wavelength #1 (635 nm).
F635 SD    the standard deviation of the feature pixel intensity at wavelength #1 (635 nm).
B635 Median    the median feature background intensity at wavelength #1 (635 nm).
B635 Mean    the mean feature background intensity at wavelength #1 (635 nm).
B635 SD    the standard deviation of the feature background intensity at wavelength #1 (635 nm).
% > B635 + 1 SD    the percentage of feature pixels with intensities more than one standard deviation above the background pixel intensity, at wavelength #1 (635 nm).
% > B635 + 2 SD    the percentage of feature pixels with intensities more than two standard deviations above the background pixel intensity, at wavelength #1 (635 nm).
F635 % Sat.    the percentage of feature pixels at wavelength #1 that are saturated.
F532 Median    median feature pixel intensity at wavelength #2 (532 nm).
F532 Mean    mean feature pixel intensity at wavelength #2 (532 nm).
F532 SD    the standard deviation of the feature intensity at wavelength #2 (532 nm).
B532 Median    the median feature background intensity at wavelength #2 (532 nm).
B532 Mean    the mean feature background intensity at wavelength #2 (532 nm).
B532 SD    the standard deviation of the feature background intensity at wavelength #2 (532 nm).
% > B532 + 1 SD    the percentage of feature pixels with intensities more than one standard deviation above the background pixel intensity, at wavelength #2 (532 nm).
% > B532 + 2 SD    the percentage of feature pixels with intensities more than two standard deviations above the background pixel intensity, at wavelength #2 (532 nm).
F532 % Sat.    the percentage of feature pixels at wavelength #2 that are saturated.
Ratio of Medians    the ratio of the median intensities of each feature for each wavelength, with the median background subtracted.
Ratio of Means    the ratio of the arithmetic mean intensities of each feature for each wavelength, with the median background subtracted.
Median of Ratios    the median of pixel-by-pixel ratios of pixel intensities, with the median background subtracted.
Mean of Ratios    the geometric mean of the pixel-by-pixel ratios of pixel intensities, with the median background subtracted.
Ratios SD    the geometric standard deviation of the pixel intensity ratios.
Rgn Ratio    the regression ratio of every pixel in a 2-feature-diameter circle around the center of the feature.
Rgn R²    the coefficient of determination for the current regression value.
F Pixels    the total number of feature pixels.
B Pixels    the total number of background pixels.
Sum of Medians    the sum of the median intensities for each wavelength, with the median background subtracted.
Sum of Means    the sum of the arithmetic mean intensities for each wavelength, with the median background subtracted.
Log Ratio    log (base 2) transform of the ratio of the medians.
Flags    the type of flag associated with a feature.
Normalize    the normalization status of the feature (included/not included).
F1 Median - B1    the median feature pixel intensity at wavelength #1 with the median background subtracted.
F2 Median - B2    the median feature pixel intensity at wavelength #2 with the median background subtracted.
F1 Mean - B1    the mean feature pixel intensity at wavelength #1 with the median background subtracted.
F2 Mean - B2    the mean feature pixel intensity at wavelength #2 with the median background subtracted.
SNR 1    the signal-to-noise ratio at wavelength #1, defined by (Mean Foreground 1 - Mean Background 1) / (Standard deviation of Background 1)
F1 Total Intensity    the sum of feature pixel intensities at wavelength #1
Index    the number of the feature as it occurs on the array.
"User Defined"    user-defined feature data read from the GAL file (GenePix Pro 4.1).
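Going by the header layout described here, a converter only needs the first number on the second line to know how many optional header records to skip before the column headings start. A minimal sketch of that idea, assuming the file is tab-delimited and each header record sits on one line; the function name gpr_to_csv and the tiny three-column sample are made up for illustration and are not taken from a real GPR file:

```python
import csv
import io

def gpr_to_csv(src, dst):
    """Copy the column headings and data records of an ATF/GPR stream to CSV.

    src and dst are text file objects. Assumed layout: line 1 is
    'ATF <version>', line 2 is '<header record count> <column count>',
    followed by that many header records, then headings and data.
    """
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst)
    next(reader)                        # 'ATF  1.0' line
    n_header = int(next(reader)[0])     # number of optional header records
    for _ in range(n_header):
        next(reader)                    # skip the "Key=Value" records
    rows = 0
    for record in reader:               # headings first, then data records
        if record:                      # ignore blank lines
            writer.writerow(record)
            rows += 1
    return rows

# Hypothetical sample: 2 header records, 3 columns
sample = (
    'ATF\t1.0\n'
    '2\t3\n'
    '"Type=GenePix Results 3"\n'
    '"PixelSize=10"\n'
    '"Block"\t"Column"\t"Name"\n'
    '1\t1\t"VPS8"\n'
    '1\t2\t"NTG1"\n'
)
out = io.StringIO()
count = gpr_to_csv(io.StringIO(sample), out)
print(out.getvalue())
```

With real files you would pass open(path, newline='') objects instead of the StringIO stand-ins. The csv module takes care of quoting on the way out, so a Name or ID containing a comma should still end up as a single CSV field.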
Jan 12 '07 #5

Expert 100+
P: 511
Hey barton,
I've been lurking around :-)...
Anyway, thanks for the gpr format. If it's correct, then it's now up to the OP to specify his requirements. :)
Jan 12 '07 #6

P: 4
This example of gpr file is a good one.
gpr format (microarray data file) is like this:

Description
line 1
line 2
Line n

col1 col2 ..... coln
line1 val1 val2 valn
line2 etc etc
line3 etc

Now, I want to know how to convert gpr to csv with Python.
Jan 16 '07 #7

bvdet
Expert Mod 2.5K+
P: 2,851
If I understand this format correctly, it is a tab delimited file. The script below will replace each tab with a comma and output to another file:
    import os

    def tab_to_csv(tab_name, csv_name):
        """Copy tab_name to csv_name, replacing each tab with a comma.

        Returns True on success, False if either file could not be opened
        or written.
        """
        try:
            with open(tab_name, 'r') as f1, open(csv_name, 'w') as f2:
                for line in f1:
                    f2.write(line.replace('\t', ','))
            return True
        except (IOError, OSError):
            return False

    if __name__ == '__main__':
        # Example paths; change these to point at your own files
        gpr_file = os.path.join('H:\\', 'TEMP', 'temsys', 'GPR.gpr')
        csv_file = os.path.join('H:\\', 'TEMP', 'temsys', 'GPR.txt')
        if tab_to_csv(gpr_file, csv_file):
            print('Tab delimited file conversion to comma delimited file was successful')
        else:
            print('There was an error')
Jan 16 '07 #8

bvdet
Expert Mod 2.5K+
P: 2,851
Here's some more information I found on the gpr format:
ATF - Axon Text File format (*.atf)

ATF is a tab-delimited text file format that can be read by typical spreadsheet programs such as Microsoft Excel. It is used for GenePix Array List (GAL) files, and GenePix Results (GPR) files.

An ATF text file consists of records. Each line in the text file is a record. Each record may consist of several fields, separated by a field separator (column delimiter). The tab and comma characters are field separators. Space characters around a tab or comma are ignored and considered part of the field separator. Text strings are enclosed in quotation marks to ensure that any embedded spaces, commas and tabs are not mistaken for field separators.

The group of records at the beginning of the file is called the file header. The file header describes the file structure and includes column titles, units, and comments.
It would be great if baber could provide us with a sample gpr file so we could test it.
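The point about quotation marks matters for a plain tab-to-comma replacement: a quoted field keeps its embedded commas together once the result is read back as CSV, while an unquoted one falls apart. A small check of that, using made-up records modelled on the Block entries (the variable names are just for illustration):

```python
import csv

# Hypothetical record: one quoted field with embedded commas, then two plain fields
quoted = '"Block1= 400, 400, 100"\t1\t2'
# Same record without the protective quotation marks
unquoted = 'Block1= 400, 400, 100\t1\t2'

def tabs_to_commas_then_parse(record):
    """Replace tabs with commas, then parse the result the way a CSV reader would."""
    return next(csv.reader([record.replace('\t', ',')]))

quoted_fields = tabs_to_commas_then_parse(quoted)
unquoted_fields = tabs_to_commas_then_parse(unquoted)
print(len(quoted_fields), quoted_fields)
print(len(unquoted_fields), unquoted_fields)
```

The quoted record comes back as three fields and the unquoted one as five, so a simple replace is only safe when every field that can contain a comma is quoted in the source file.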
Jan 16 '07 #9

dshimer
Expert 100+
P: 136
This looks like a very straightforward text file: you could read in all the lines, build a list from each line, evaluate each list based on its contents, then just write it back out delimited by commas.

That said, I'll admit I'm still a bit confused by the format. Does this imply that each line "line 1" etc, is comprised of a bunch of data organized in columns? Or that there are N lines containing something, then a string of n entries of "col" data, followed by further strings of value data? In any case I can think of several ways to easily read and analyze the data, I just am not totally clear on what is being described.

Jan 16 '07 #10

bartonc
Expert 5K+
P: 6,596
So this IS GenePix, right?
Jan 17 '07 #11

Expert 100+
P: 511
I don't really know what your desired output is, but since you specified csv, I guessed you just want the fields comma separated. Here's a bit of code:
    import fileinput

    # inplace=True redirects print() into the file being read,
    # so the file is rewritten with commas between the fields
    for line in fileinput.FileInput("file", inplace=True):
        print(','.join(line.split()))

output:
    line,1
    line,2
    Line,n

    col1,col2,.....,coln
    line1,val1,val2,valn
    line2,etc,etc
    line3,etc
Jan 17 '07 #12

bvdet
Expert Mod 2.5K+
P: 2,851
It works except as indicated below. Before:
    ATF    1
    8    5
    Type=GenePix ArrayList V1.0
    BlockCount=4
    BlockType=0
    URL=http://genome-www.stanford.edu/cgi-bin/dbrun/SacchDB?find+Locus+%22[ID]%22
    "Block1= 400, 400, 100, 24, 175, 5, 175"
    "Block2= 4896, 400, 100, 24, 175, 5, 175"
    "Block3= 400, 4896, 100, 24, 175, 5, 175"
    "Block4= 4896, 4896, 100, 24, 175, 5, 175"
    Block    Column    Row    Name    ID
    1    1    1    VPS8    YAL002W
    1    2    1    NTG1    YAL015C
After:
    ATF,1
    8,5
    Type=GenePix ArrayList V1.0
    BlockCount=4
    BlockType=0
    URL=http://genome-www.stanford.edu/cgi-bin/dbrun/SacchDB?find+Locus+%22[ID]%22
    "Block1= 400, 400, 100, 24, 175, 5, 175"
    "Block2= 4896, 400, 100, 24, 175, 5, 175"
    "Block3= 400, 4896, 100, 24, 175, 5, 175"
    "Block4= 4896, 4896, 100, 24, 175, 5, 175"
    Block,Column,Row,Name,ID
    1,1,1,VPS8,YAL002W
    1,2,1,NTG1,YAL015C
To prevent duplicate commas at embedded spaces, strip trailing tab and newline characters and split on tabs:
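    import fileinput

    # gpr_file is the path to your file; True = edit in place,
    # '.bak' = keep the original as a backup with that extension
    for line in fileinput.input(gpr_file, True, '.bak'):
        print(','.join(line.rstrip('\t\n').split('\t')))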
Good post ghostdog. I did not know about fileinput.
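The difference between the two splitting approaches can be checked without touching a file. A small sketch, where the helper names by_whitespace and by_tab are made up here and the sample lines are modelled on the GAL header shown earlier in this post:

```python
def by_whitespace(text):
    """Join on commas after splitting at any run of whitespace."""
    return ','.join(text.split())

def by_tab(text):
    """Join on commas after stripping trailing tabs/newline and splitting at tabs only."""
    return ','.join(text.rstrip('\t\n').split('\t'))

# Header record with embedded spaces and trailing tabs
header_record = 'Type=GenePix ArrayList V1.0\t\t\t\n'
# Column heading line
headings = 'Block\tColumn\tRow\tName\tID\n'

print(by_whitespace(header_record))   # spaces inside the value become commas
print(by_tab(header_record))          # the value stays in one piece
print(by_tab(headings))
```

One thing to keep in mind: stripping trailing tabs also drops any empty fields at the end of a data row, so such rows can end up with fewer columns than the headings line.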
Jan 17 '07 #13

P: 4
Thanks a lot, now I can convert .gpr to .csv.

Baber
Jan 22 '07 #14

bartonc
Expert 5K+
P: 6,596
Awesome! Thanks for the update.
Jan 23 '07 #15

P: 1
Hi friends,
I want to know how to get a .gpr file (microarray data file) and how to run the file in MATLAB.
Please help me as soon as possible. I need it for my project.
Mar 20 '07 #16

bvdet
Expert Mod 2.5K+
P: 2,851
Hello vijayachitra,

I don't know how to get GPR files. You can probably find some sample files on the internet. You have not given us enough information about what data you need to parse from a GPR file. Since you have found this thread, you can see that information can easily be extracted, but what information and in what format? How about this from our example:
    import re

    def readBlockData(fn):
        """Return a dict mapping each 'BlockN' header key to its list of ints."""
        dd = {}
        with open(fn) as f:
            for line in f:
                # Drop the surrounding quotes, tabs and newline
                line = line.strip('"\n\t')
                if re.match(r'Block\d', line):
                    tem = line.split('=')
                    dd[tem[0]] = [int(i) for i in tem[1].strip().split(', ')]
        return dd

    if __name__ == '__main__':
        dd = readBlockData('your_file')
        for key in dd:
            print('%s = %s' % (key, dd[key]))

    '''
    Example output (key order may differ by Python version):
    Block4 = [4896, 4896, 100, 24, 175, 5, 175]
    Block3 = [400, 4896, 100, 24, 175, 5, 175]
    Block2 = [4896, 400, 100, 24, 175, 5, 175]
    Block1 = [400, 400, 100, 24, 175, 5, 175]
    '''
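If you need more than the Block entries, the same idea extends to every Key=Value record in the header. A rough sketch, assuming each header record sits on a single line and that the first number on line 2 is the record count; the function name read_header and the shortened sample below are hypothetical:

```python
import io

def read_header(stream):
    """Collect the 'Key=Value' records of an ATF header into a dict of strings."""
    stream.readline()                                # 'ATF <version>' line
    n_header = int(stream.readline().split()[0])     # optional header record count
    header = {}
    for _ in range(n_header):
        # Remove trailing tabs/newline, then the surrounding quotation marks
        record = stream.readline().strip().strip('"')
        # Split at the first '=' only, so values may contain '=' themselves
        key, _sep, value = record.partition('=')
        header[key] = value.strip()
    return header

# Hypothetical, shortened GAL-style header
sample = (
    'ATF\t1\n'
    '3\t5\n'
    'Type=GenePix ArrayList V1.0\t\t\n'
    'BlockCount=4\t\t\n'
    '"Block1= 400, 400, 100, 24, 175, 5, 175"\t\t\n'
    'Block\tColumn\tRow\tName\tID\n'
)
header = read_header(io.StringIO(sample))
for key in sorted(header):
    print('%s = %s' % (key, header[key]))
```

All values are kept as strings here; converting them (as the int() calls do for the Block entries) is left to whoever knows which fields they need.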
Mar 21 '07 #17
