473,388 Members | 956 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,388 software developers and data experts.

Formatting Results so that They Can be Nicely Imported into a Spreadsheet.

Hi All.
Let's say I have some badly formatted text called doc:
doc=
"""
friendid
Female

23 years old

Los Gatos

United States
friendid
Male

24 years old

San Francisco, California

United States
"""

How would I get these results to be displayed in a format similar to:
friendid;Female;23 years old;Los Gatos;United States
friendid;Male; 24 years old;San Francisco, California;United States
The latter is a lot easier to organize and can be quickly imported
into Excel's column format.

Thanks Much,
Sam

Aug 4 '07 #1
8 1251
On Aug 5, 9:35 am, SMERSH009 <SMERSH0...@gmail.comwrote:
Hi All.
Let's say I have some badly formatted text called doc:

doc=
"""
friendid
Female

23 years old

Los Gatos

United States
friendid
Male

24 years old

San Francisco, California

United States
"""

How would I get these results to be displayed in a format similar to:
friendid;Female;23 years old;Los Gatos;United States
friendid;Male; 24 years old;San Francisco, California;United States

The latter is a lot easier to organize and can be quickly imported
into Excel's column format.
You write a script to read your input file and write it out either in
CSV format, using the Python csv module, or just use
';'.join(output_row) if you are sure that there are no ';' characters
in the data.

Is this a homework question?

Aug 5 '07 #2
On Aug 4, 6:35?pm, SMERSH009 <SMERSH0...@gmail.comwrote:
Hi All.
Let's say I have some badly formatted text called doc:

doc=
"""
friendid
Female

23 years old

Los Gatos

United States
friendid
Male

24 years old

San Francisco, California

United States
"""

How would I get these results to be displayed in a format similar to:
friendid;Female;23 years old;Los Gatos;United States
friendid;Male; 24 years old;San Francisco, California;United States

The latter is a lot easier to organize and can be quickly imported
into Excel's column format.

Thanks Much,
Sam
d = doc.split('\n')

f = [i.split() for i in d if i]

g = [' '.join(i) for i in f]

rec = []
temprec = []
for i in g:
if i:
if i == 'friendid':
rec.append(temprec)
temprec = [i]
else:
temprec.append(i)
rec.append(temprec)

output = [';'.join(i) for i in rec if i]

for i in output: print i

## friendid;Female;23 years old;Los Gatos;United States
## friendid;Male;24 years old;San Francisco, California;United
States

Aug 5 '07 #3
<me********@aol.comwrote in message
news:11*********************@z24g2000prh.googlegro ups.com...
On Aug 4, 6:35?pm, SMERSH009 <SMERSH0...@gmail.comwrote:
>Hi All.
Let's say I have some badly formatted text called doc:

doc=
"""
friendid
Female

23 years old

Los Gatos

United States
friendid
Male

24 years old

San Francisco, California

United States
"""

How would I get these results to be displayed in a format similar to:
friendid;Female;23 years old;Los Gatos;United States
friendid;Male; 24 years old;San Francisco, California;United States

The latter is a lot easier to organize and can be quickly imported
into Excel's column format.

Thanks Much,
Sam

d = doc.split('\n')

f = [i.split() for i in d if i]

g = [' '.join(i) for i in f]

rec = []
temprec = []
for i in g:
if i:
if i == 'friendid':
rec.append(temprec)
temprec = [i]
else:
temprec.append(i)
rec.append(temprec)

output = [';'.join(i) for i in rec if i]

for i in output: print i

## friendid;Female;23 years old;Los Gatos;United States
## friendid;Male;24 years old;San Francisco, California;United States
also, I would suggest you use CSV format. CSV stands for "Comma Seperated
Variable" and Excel can load such a sheet directly.

Instead of seperating using ; seperate using , Of course, this provides a
problem when there is a , in a string. Resolution is to quote the string.
Being such, you can just go ahead and quote all strings. So you would want
the output to be:

"friendid","Female","23 years old","Los Gatos","United States"
"friendid","Male","24 years old","San Francisco, California","United States"

Numbers should not be quoted if you wish to treat them as numeric and not
text.
Aug 5 '07 #4
On Aug 4, 9:21?pm, "Jim Langston" <tazmas...@rocketmail.comwrote:
<mensana...@aol.comwrote in message

news:11*********************@z24g2000prh.googlegro ups.com...


On Aug 4, 6:35?pm, SMERSH009 <SMERSH0...@gmail.comwrote:
Hi All.
Let's say I have some badly formatted text called doc:
doc=
"""
friendid
Female
23 years old
Los Gatos
United States
friendid
Male
24 years old
San Francisco, California
United States
"""
How would I get these results to be displayed in a format similar to:
friendid;Female;23 years old;Los Gatos;United States
friendid;Male; 24 years old;San Francisco, California;United States
The latter is a lot easier to organize and can be quickly imported
into Excel's column format.
Thanks Much,
Sam
d = doc.split('\n')
f = [i.split() for i in d if i]
g = [' '.join(i) for i in f]
rec = []
temprec = []
for i in g:
if i:
if i == 'friendid':
rec.append(temprec)
temprec = [i]
else:
temprec.append(i)
rec.append(temprec)
output = [';'.join(i) for i in rec if i]
for i in output: print i
## friendid;Female;23 years old;Los Gatos;United States
## friendid;Male;24 years old;San Francisco, California;United States

also, I would suggest you use CSV format.
Well, the OP asked for a specific format. One is not
always at liberty to change it.
CSV stands for "Comma Seperated
Variable" and Excel can load such a sheet directly.
And Excel can load the shown format directly also,
just specify the delimiter.
>
Instead of seperating using ; seperate using , Of course, this provides a
problem when there is a , in a string.
Which explains the popularity of using tabs as delimiters.
The data deliverable specification I use at work
uses the pipe character | which never appears as data
in this particular application.
Resolution is to quote the string.
Which makes the file bigger and isn't necessary
when tabs and pipes are used as delimiters.
Being such, you can just go ahead and quote all strings. So you would want
the output to be:

"friendid","Female","23 years old","Los Gatos","United States"
"friendid","Male","24 years old","San Francisco, California","United States"
Which I would do if I had a specification that
demanded it or was making files for others. For my
own use, I wouldn't bother as it's unnecessary work.
>
Numbers should not be quoted if you wish to treat them as numeric and not
text.
A good reason not to use quotes at all. Besides which,
Excel can handle that also.

Aug 5 '07 #5
On Aug 4, 8:25 pm, "mensana...@aol.com" <mensana...@aol.comwrote:
On Aug 4, 9:21?pm, "Jim Langston" <tazmas...@rocketmail.comwrote:
<mensana...@aol.comwrote in message
news:11*********************@z24g2000prh.googlegro ups.com...
On Aug 4, 6:35?pm, SMERSH009 <SMERSH0...@gmail.comwrote:
>Hi All.
>Let's say I have some badly formatted text called doc:
>doc=
>"""
>friendid
>Female
> 23 years old
> Los Gatos
> United States
>friendid
>Male
> 24 years old
> San Francisco, California
> United States
>"""
>How would I get these results to be displayed in a format similar to:
>friendid;Female;23 years old;Los Gatos;United States
>friendid;Male; 24 years old;San Francisco, California;United States
>The latter is a lot easier to organize and can be quickly imported
>into Excel's column format.
>Thanks Much,
>Sam
d = doc.split('\n')
f = [i.split() for i in d if i]
g = [' '.join(i) for i in f]
rec = []
temprec = []
for i in g:
if i:
if i == 'friendid':
rec.append(temprec)
temprec = [i]
else:
temprec.append(i)
rec.append(temprec)
output = [';'.join(i) for i in rec if i]
for i in output: print i
## friendid;Female;23 years old;Los Gatos;United States
## friendid;Male;24 years old;San Francisco, California;United States
also, I would suggest you use CSV format.

Well, the OP asked for a specific format. One is not
always at liberty to change it.
CSV stands for "Comma Seperated
Variable" and Excel can load such a sheet directly.

And Excel can load the shown format directly also,
just specify the delimiter.
Instead of seperating using ; seperate using , Of course, this provides a
problem when there is a , in a string.

Which explains the popularity of using tabs as delimiters.
The data deliverable specification I use at work
uses the pipe character | which never appears as data
in this particular application.
Resolution is to quote the string.

Which makes the file bigger and isn't necessary
when tabs and pipes are used as delimiters.
Being such, you can just go ahead and quote all strings. So you would want
the output to be:
"friendid","Female","23 years old","Los Gatos","United States"
"friendid","Male","24 years old","San Francisco, California","United States"

Which I would do if I had a specification that
demanded it or was making files for others. For my
own use, I wouldn't bother as it's unnecessary work.
Numbers should not be quoted if you wish to treat them as numeric and not
text.

A good reason not to use quotes at all. Besides which,
Excel can handle that also.
Thanks for all your posts guys.
mensanator's was the most helpful, and I only ended up needing to use
a few lines from that code.
The only question that remains for me--and this is just for my
knowledge-- what does the "if i" mean in this code snippet?
f = [i.split() for i in d if i]
How is it helpful to leave a dangling "if i"? Why not just f =
[i.split() for i in d]?

And yes John, this was indeed a "homework question." It was for my
daughter's preschool. You are going to help her ace her beginner
Python class! (No, this was not a homework question).

Aug 5 '07 #6
On Aug 5, 4:06?am, SMERSH009 <SMERSH0...@gmail.comwrote:
On Aug 4, 8:25 pm, "mensana...@aol.com" <mensana...@aol.comwrote:


On Aug 4, 9:21?pm, "Jim Langston" <tazmas...@rocketmail.comwrote:
<mensana...@aol.comwrote in message
>news:11*********************@z24g2000prh.googlegr oups.com...
On Aug 4, 6:35?pm, SMERSH009 <SMERSH0...@gmail.comwrote:
Hi All.
Let's say I have some badly formatted text called doc:
doc=
"""
friendid
Female
23 years old
Los Gatos
United States
friendid
Male
24 years old
San Francisco, California
United States
"""
How would I get these results to be displayed in a format similar to:
friendid;Female;23 years old;Los Gatos;United States
friendid;Male; 24 years old;San Francisco, California;United States
The latter is a lot easier to organize and can be quickly imported
into Excel's column format.
Thanks Much,
Sam
d = doc.split('\n')
f = [i.split() for i in d if i]
g = [' '.join(i) for i in f]
rec = []
temprec = []
for i in g:
if i:
if i == 'friendid':
rec.append(temprec)
temprec = [i]
else:
temprec.append(i)
rec.append(temprec)
output = [';'.join(i) for i in rec if i]
for i in output: print i
## friendid;Female;23 years old;Los Gatos;United States
## friendid;Male;24 years old;San Francisco, California;United States
also, I would suggest you use CSV format.
Well, the OP asked for a specific format. One is not
always at liberty to change it.
CSV stands for "Comma Seperated
Variable" and Excel can load such a sheet directly.
And Excel can load the shown format directly also,
just specify the delimiter.
Instead of seperating using ; seperate using , Of course, this provides a
problem when there is a , in a string.
Which explains the popularity of using tabs as delimiters.
The data deliverable specification I use at work
uses the pipe character | which never appears as data
in this particular application.
Resolution is to quote the string.
Which makes the file bigger and isn't necessary
when tabs and pipes are used as delimiters.
Being such, you can just go ahead and quote all strings. So you would want
the output to be:
"friendid","Female","23 years old","Los Gatos","United States"
"friendid","Male","24 years old","San Francisco, California","United States"
Which I would do if I had a specification that
demanded it or was making files for others. For my
own use, I wouldn't bother as it's unnecessary work.
Numbers should not be quoted if you wish to treat them as numeric and not
text.
A good reason not to use quotes at all. Besides which,
Excel can handle that also.

Thanks for all your posts guys.
mensanator's was the most helpful, and I only ended up needing to use
a few lines from that code.
The only question that remains for me--and this is just for my
knowledge-- what does the "if i" mean in this code snippet?
f = [i.split() for i in d if i]
d is a list of strings, some of which are empty (caused
by the previous split). "if i" evaluates False for
an empty string, so that particular i will not get
split and ends up ommitted from f.
How is it helpful to leave a dangling "if i"? Why not just f =
[i.split() for i in d]?
Compare the result of having "if i":
[[], ['friendid'], ['Female'], ['23', 'years', 'old'], ['Los',
'Gatos'], ['United', 'States'], ['friendid'], ['Male'], ['24',
'years', 'old'], ['San', 'Francisco,', 'California'], ['United',
'States']]

to not having it:
>>ff = [i.split() for i in d]
ff
[[], ['friendid'], ['Female'], [], [], ['23', 'years', 'old'], [], [],
['Los', 'Gatos'], [], [], ['United', 'States'], ['friendid'],
['Male'], [], [], ['24', 'years', 'old'], [], [], ['San',
'Francisco,', 'California'], [], [], ['United', 'States'], []]

All the blank lines became empty lists which actually
wouldn't change anything if they get joined. But that
won't always be the case. Knowing how to use conditionals
in list comprehensions is worth knowing.

Even with the "if i" included, we end up with an
empty list at the start. This because the first "blank"
line wasn't blank, it was a space, so it passes the
"if i" test.
>
And yes John, this was indeed a "homework question." It was for my
daughter's preschool. You are going to help her ace her beginner
Python class! (No, this was not a homework question).
Aug 5 '07 #7
me********@aol.com <me********@aol.comwrote:
...
Even with the "if i" included, we end up with an
empty list at the start. This because the first "blank"
line wasn't blank, it was a space, so it passes the
"if i" test.
....and you can fix that by changing the test to [... if i.split()].
Alex
Aug 5 '07 #8
En Sun, 05 Aug 2007 06:06:54 -0300, SMERSH009 <SM********@gmail.com>
escribió:
The only question that remains for me--and this is just for my
knowledge-- what does the "if i" mean in this code snippet?
f = [i.split() for i in d if i]
How is it helpful to leave a dangling "if i"? Why not just f =
[i.split() for i in d]?
`if i` means `if i is considered true`. In this case we are talking about
strings: "" is false and all other strings are true. So this is a way to
say `if i is not the empty string`, effectively filtering out empty lines.
Perhaps using more meaningful names for variables makes the code more
clear:

exploded_lines = [line.split() for line in lines if line]

--
Gabriel Genellina

Aug 12 '07 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: scooterm | last post by:
### Question Can anyone name a text editor or XML editor (prefer open source) that can open a simple table XML file in a spreadsheet-type GUI, without adding any extra formatting or junk to the...
1
by: Josh Schmidt | last post by:
I'm using the Excel application object to create a spreadsheet from a recordset. Everything seems to work fine except any date fields are not displaying properly. For example 4/18/2004 is...
2
by: Colleyville Alan | last post by:
I am using Access and have embedded the ActiveX control Formula One that came with Office 2000. (ver 3.04). I have created and formatted a spreadsheet and now I want to copy the info with...
0
by: Stewart Allen | last post by:
Hi there Opinions on the best way to do this. I want to import data from an Excel spreadsheet into Access and want the data to be tested before the tables are updated. At the moment all the...
0
by: ImraneA | last post by:
Hi there Have a database, where front-end interface allows user to select a ms access database. From there, standard tables are linked. Routine, that creates a spreadsheet, for each table a...
1
by: Will | last post by:
1. I have a list of products in an excel spreadsheet, the fields are: Prefix, Stock Code, Suffix and Description. I want to import them into Access and then merge the 3 fields Prefix, Stock Code,...
7
by: Middletree | last post by:
I am trying to display dates in a spreadsheet, but the dates need to be in a format that will allow them to be sorted in Excel. The datatype in the SQL Server database is datetime. In this case, I...
6
by: DeniseY | last post by:
I have an Access report that is created on the fly by the user selecting the fields to be included. The Access report comes out fine, but I want it to automatically output to an Excel spreadsheet....
8
by: victor.herasme | last post by:
Hi, i am building a little script and i want to output a series of columns more or less like this: 1 5 6 2 2 8 2 9 5 The matter is that i don't know in advance how many columns...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.