Bytes | Software Development & Data Engineering Community

repeated delimiters with csv.DictReader

Hi All,
Just a quick question. I have some space-delimited text files. Some of the columns in these files have been padded with extra spaces so that when you open a file in a simple text editor, the columns all line up and it is easy to read. I am using csv.DictReader to read in the files, so that it automatically uses the first row as the dictionary keys. The problem is that when I read a padded file in with csv.DictReader, I end up with a lot of blank columns.

Here is my read-in line:
isc = csv.DictReader(open(inisc), delimiter=' ')
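A quick way to see where the blank columns come from is to run a padded line through csv.reader with the same delimiter. The csv module treats every single space as a separator, so N consecutive spaces produce N-1 empty fields between the real values. The helper name raw_fields and the sample line are illustrative only, not taken from the original files.

```python
import csv


def raw_fields(line):
    """Split one line the way csv.reader does with delimiter=' '."""
    return next(csv.reader([line], delimiter=" "))


# Three spaces between the values: every space after the first
# closes an empty field, so two blank columns appear.
print(raw_fields("lat   long"))  # ['lat', '', '', 'long']
```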

Is there a way to specify that when Python encounters one or more spaces in a row, they should be treated as a single delimiter rather than several?

I tried:
isc = csv.DictReader(open(inisc), delimiter=' '+)
and some similar variations (though I do not remember them all now) to no avail.
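Two things stop `delimiter=' '+` from working: the expression is a Python syntax error, and the csv module requires the delimiter to be a one-character string, so it cannot express "one or more spaces" at all. The `+` idea is regular-expression syntax, though, and the re module can split on that pattern directly. A minimal sketch (split_on_spaces is a hypothetical name):

```python
import re

# "One or more spaces", written as a regular expression.
SPACES = re.compile(r" +")


def split_on_spaces(line):
    """Split a line on runs of spaces, ignoring leading/trailing padding."""
    return SPACES.split(line.strip())


print(split_on_spaces("  2010 03 21    45.123  -122.456  "))
# ['2010', '03', '21', '45.123', '-122.456']
```

For whitespace in general (spaces and tabs), `line.split()` with no argument gives the same result without a regex.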

If I alter the input text files to remove the padding and leave only one space between columns, this works great. I can stick with that method; it is just that, for readability of the text files (which can be quite big), the padding is nice. These files are used for purposes other than input to my code, so if there is a way to keep the padding and make csv.DictReader happy, that would be best.
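One way to keep both the padding and csv.DictReader is to collapse each line's whitespace before the reader sees it, since DictReader accepts any iterable of strings, not just a file. This is a sketch under one assumption: no field contains an embedded space (true for numeric date and position columns). The helper names and the sample data are hypothetical.

```python
import csv
import io


def collapse_spaces(lines):
    """Yield each non-blank line with every run of whitespace reduced to one space.

    Assumes no field contains an embedded space (hypothetical helper).
    """
    for line in lines:
        fields = line.split()  # any run of whitespace counts as one separator
        if fields:  # skip blank lines
            yield " ".join(fields)


def read_padded(f):
    """Wrap csv.DictReader so space-padded, column-aligned files read cleanly."""
    return csv.DictReader(collapse_spaces(f), delimiter=" ")


# Hypothetical sample standing in for a padded input file.
sample = io.StringIO(
    "yyyy mm dd    lat     long\n"
    "2010 03 21   45.123 -122.456\n"
    "2010 03 21  -45.123  122.456\n"
)
rows = list(read_padded(sample))
for row in rows:
    print(row["lat"], row["long"])
```

With a real file, `isc = read_padded(open(inisc))` would take the place of the original read-in line, and the rest of the DictReader-based code should not need to change.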

Thanks,
Monica
Mar 21 '10 #1
bvdet
2,851 Expert Mod 2GB
Monica,

Can you post a sample of the text file? You could parse the file without the csv module if the file is consistently formatted.

BV
Mar 21 '10 #2
Hi BV,
Attached is a small sample of one of the text files (I tried copying and pasting it below, but that messed up the format). The problem is that some of the columns contain position data (lat, long). In some cases the lat and long are negative, and in others they are positive. This means that not only does the number of characters in each entry change as the position changes, but some entries also have a '-' in front of them. These entries are padded so they remain aligned. Unfortunately, this means the format (the number of spaces between columns) is not consistent throughout the file.

You will also see that the month, day, hour, minute, and second columns are not consistent. I have padded these with zeros, so they now follow the more standard yyyy mm dd hh mm ss.ss format and are no longer a problem. Padding the position fields with zeros as well solves my delimiter problem while keeping the fields aligned. However, it looks horrible and makes the positions hard to read.
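Zero padding is not actually needed for the positions if the file is split on whitespace rather than on single spaces: str.split() with no argument treats any run of whitespace as one separator and ignores leading and trailing whitespace, the minus sign stays attached to its number, and float() accepts it. The zero-padded date fields are harmless too, since int("03") == 3. A sketch, assuming a hypothetical column order of yyyy mm dd hh mm ss.ss lat long (the real ex_text.txt layout is not reproduced here):

```python
def parse_record(line):
    """Parse one space-padded record into numbers.

    Assumes a hypothetical column order: yyyy mm dd hh mm ss.ss lat long
    """
    fields = line.split()  # padding of any width collapses here
    date_parts = [int(x) for x in fields[:5]]  # yyyy, mm, dd, hh, mm
    seconds = float(fields[5])  # ss.ss
    lat, lon = float(fields[6]), float(fields[7])  # signs parse as-is
    return date_parts, seconds, lat, lon


# The same layout with positive and negative positions, padded with spaces.
print(parse_record("2010 03 21 14 05 07.25   45.5  -122.25"))
print(parse_record("2010 03 21 14 05 07.25  -45.5   122.25"))
```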
Attached file: ex_text.txt (1.9 KB)
Mar 21 '10 #3
bvdet
This code will create a dictionary with line 1 as the keys and the columnar data as the values:
```python
fn = 'ex_text-2.txt'

# 'with' closes the file automatically, even if an error occurs
with open(fn) as f:
    # the first line holds the column labels
    labels = [item for item in f.readline().strip().split() if item]

    # maps each label to the list of values in that column
    dd = {}

    for line in f:
        # strip the newline character and any padding
        line = line.strip()
        # in case a blank line is encountered
        if line:
            lineList = [item for item in line.split() if item]
            for i, item in enumerate(lineList):
                dd.setdefault(labels[i], []).append(item)
```
To print the results:
```python
for label in labels:
    print("%s: %s" % (label, dd[label]))
```
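As you can see, it is very straightforward to read a formatted file into a list or dictionary. I used list comprehensions to filter out empty items, although split() with no argument already treats any run of whitespace as a single separator and never returns empty strings, so the filter is only a safeguard.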
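The code above stores the data column by column ({label: [values]}), whereas csv.DictReader yields one dict per row. If the rest of the program expects DictReader-style rows, the columns can be zipped back together. A minimal sketch; columns_to_rows is a hypothetical helper, and it assumes every column has the same length (zip stops at the shortest column otherwise):

```python
def columns_to_rows(labels, dd):
    """Convert {label: [v1, v2, ...]} into [{label: v1, ...}, {label: v2, ...}]."""
    columns = [dd[label] for label in labels]
    # zip(*columns) walks the columns in parallel, yielding one tuple per row
    return [dict(zip(labels, values)) for values in zip(*columns)]


labels = ["lat", "long"]
dd = {"lat": ["45.1", "-45.1"], "long": ["-122.4", "122.4"]}
print(columns_to_rows(labels, dd))
# [{'lat': '45.1', 'long': '-122.4'}, {'lat': '-45.1', 'long': '122.4'}]
```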
Mar 21 '10 #4
Thanks bvdet!
This does seem to read in everything from the text files directly, regardless of the white-space padding.

Now I just need to figure out how to integrate it into the rest of my code so that the data is written to the SQL database correctly. I am a newbie to Python, so it sometimes takes me a while to figure out how to proceed. If I can, though, I want to work out how to adapt my code to your solution myself, since that is how I will learn the most.
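The thread does not say which SQL database is involved. As one possible sketch, assuming SQLite through Python's built-in sqlite3 module and a hypothetical positions table with lat and lon columns, row dicts like the ones csv.DictReader produces can be inserted with executemany and `?` placeholders, which let the driver handle quoting instead of building SQL strings by hand:

```python
import sqlite3


def insert_positions(conn, rows):
    """Insert (lat, long) from row dicts into a hypothetical positions table."""
    params = [(float(row["lat"]), float(row["long"])) for row in rows]
    conn.executemany("INSERT INTO positions (lat, lon) VALUES (?, ?)", params)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute("CREATE TABLE positions (lat REAL, lon REAL)")
insert_positions(conn, [{"lat": "45.5", "long": "-122.25"}])
print(conn.execute("SELECT lat, lon FROM positions").fetchall())
# [(45.5, -122.25)]
```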
Mar 23 '10 #5
