parsing tab-delimited text file into arrays

Hi, I'm new to Python and have the task of reading a user-supplied text file that is tab-delimited and contains 4 columns in each line: Authors, Year, Title and Journal.

So far I am only able to open the file, and I don't know how to begin parsing the data.

The recommended way of sorting the data is to use the following three lists (which I set as):
    authorsList = []
    journalsList = []
    papersList = []
In papersList, each paper's entry holds its title, the year published, the index of each of its authors in authorsList and the index of its journal in journalsList; in this way the name of each journal and author is stored in only one place.
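
To make that layout concrete, here is a small sketch of what the three lists might hold after two lines have been read. The authors, titles and journal are made up for illustration, and storing each paper as a tuple of (title, year, author indices, journal index) is only one possible reading of the description above. The `describe` helper is hypothetical and just shows how the indices lead back to the names.

```python
# Hypothetical sample data, invented for illustration.
authorsList = ["Smith, J.", "Jones, A."]
journalsList = ["Journal of Examples"]

# Assumed entry layout:
# (title, year, list of indices into authorsList, index into journalsList)
papersList = [
    ("First paper", 2008, [0, 1], 0),
    ("Second paper", 2009, [1], 0),
]


def describe(paper):
    """Turn the stored indices back into author and journal names."""
    title, year, author_idx, journal_idx = paper
    names = [authorsList[i] for i in author_idx]
    return "%s (%d), %s, %s" % (
        title, year, "; ".join(names), journalsList[journal_idx])


for paper in papersList:
    print(describe(paper))
```

Each author and journal name appears once, and a paper only carries small integers that point at them.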

What I have learned so far in Python: basic I/O, loops and conditions, defining functions and a little exception handling. I've been searching Google, but a lot of the answers to this same question use the csv module and regular expressions. I tried to learn those myself but couldn't understand the code that was suggested. Is there a way to do it without the csv and re modules?

I was thinking of doing something like this:
    for line in openfile:
        a, b, c, d = line.split("\t")
        authorsList.append(a)
        papersList.append(b, c)
        journalsList.append(d)
but I don't think that is right at all.
Any suggestions or tips?
Thanks for your time and consideration.
May 2 '10 #1
erbrose
This is what I've been doing... I'm pretty new to Python too, though...
    TmpArr = []

    for line in openfile:
        # remove only the trailing newline; a plain strip() would also eat
        # leading or trailing tabs and so drop an empty first or last column
        line = line.rstrip('\n')
        TmpArr.append(line.split('\t'))
Now you have a list of lists (TmpArr), one inner list per line of the file. You can sort by a column by doing something like this:

    TmpArr.sort(key=lambda a: a[0])
if, say, you wanted to sort by the authors column (column 0).
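
As a sketch of sorting by the other columns: the rows below are made-up sample data in the column order from the question (authors, year, title, journal). Unlike `TmpArr.sort(...)`, the built-in `sorted` returns a new list and leaves TmpArr as it was. The year is converted with `int` so it sorts as a number and not as text.

```python
# Made-up rows in the order: authors, year, title, journal.
TmpArr = [
    ["Smith, J.", "2009", "Second paper", "Journal B"],
    ["Jones, A.", "2008", "First paper", "Journal A"],
    ["Adams, K.", "2010", "Third paper", "Journal A"],
]

# Sort by the authors column (column 0).
by_author = sorted(TmpArr, key=lambda row: row[0])

# Sort by year (column 1), compared as integers.
by_year = sorted(TmpArr, key=lambda row: int(row[1]))

# Sort by journal first, then by year within each journal.
by_journal_then_year = sorted(TmpArr, key=lambda row: (row[3], int(row[1])))

for row in by_journal_then_year:
    print(row)
```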
May 3 '10 #2
Glenton
We'll assume that the authors are in a nice consistent format.

Then something like the following (untested) code should work.

    authorsList = []
    journalsList = []
    papersList = []
    for line in openfile:
        # drop the trailing newline so it does not end up in the journal name
        authors, yea, tit, jou = line.rstrip("\n").split("\t")
        authInd = []  # This is the index we will add to the papers list.
        # authors is still one string here, so split it into single names.
        # The ";" separator is an assumption; use whatever your file has.
        for a in authors.split(";"):
            a = a.strip()
            # first check if it's in authorsList
            if a not in authorsList:
                # if a is not in authorsList, then add it
                authorsList.append(a)
            # add the index to authInd
            authInd.append(authorsList.index(a))
        papersList.append(authInd)

Etc. Since you need similar stuff for the journal, you might want to write a function that does the necessary work, and then pass both the author and journal stuff to the function.
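
One way the function idea could look, as a rough sketch and not a tested solution for the real file. The names `index_of` and `parse_lines` are made up here, the ";" between author names is an assumption about the file format, and the sample lines are invented. It uses only `split` and `strip`, so no csv or re module is needed.

```python
def index_of(name, names):
    """Return the position of name in names, appending it first if it is new."""
    if name not in names:
        names.append(name)
    return names.index(name)


def parse_lines(lines, author_sep=";"):
    """Build the three lists from tab-delimited lines.

    author_sep is an assumed separator between author names.
    """
    authorsList, journalsList, papersList = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        authors, year, title, journal = line.split("\t")
        # Look up (or add) every author, and the journal, with the same helper.
        authInd = [index_of(a.strip(), authorsList)
                   for a in authors.split(author_sep)]
        jouInd = index_of(journal.strip(), journalsList)
        papersList.append((title, int(year), authInd, jouInd))
    return authorsList, journalsList, papersList


# Invented sample lines standing in for the opened file.
sample = [
    "Smith, J.; Jones, A.\t2008\tFirst paper\tJournal A\n",
    "Jones, A.\t2009\tSecond paper\tJournal A\n",
]
authorsList, journalsList, papersList = parse_lines(sample)
print(authorsList)
print(journalsList)
print(papersList)
```

Because an open file can be looped over line by line, passing `openfile` in place of `sample` should work the same way. A line that does not have exactly four tab-separated fields will raise a ValueError at the `split("\t")` unpacking, which may be a good place to practise exception handling.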

Good luck.
May 4 '10 #3
