Parsing problems: A journey from a text file to a directory tree

Hi everybody,

Some of my colleagues want me to write a script for easy folder and
subfolder creation on the Mac.

The script is supposed to scan a text file containing directory trees
in the following format:

[New client]
|-Invoices
|-Offers
|--Denied
|--Accepted
|-Delivery notes

As you can see, the folder hierarchy is expressed by the number of
minus signs, and each section header is framed by brackets (as in
Windows config files).

After the scan process, the script is supposed to show a dialog, where
the user can choose from the different sections (e.g. 'Alphabet',
'Months', 'New client' etc.). Then the script will create the
corresponding folder hierarchy in the currently selected folder (done
via AppleScript).

But currently I simply don't know how to parse these folder lists and
how to save them in an array accordingly.

First I thought of an array like this:

dirtreedb = {'New client': {'Invoices': {}, 'Offers': {'Denied': {},
'Accepted': {}}, 'Delivery notes': {}}}

But this doesn't do the trick, as I also have to save the hierarchy
level of the current folder as well...

Argh, I really can't get my head around this problem and I need your
help. I have the feeling that the answer is not that complicated, but
I just don't get it right now...

Yours desperately,

Martin

Sep 16 '07 #1
Since you are going to need a dialog anyway, I would use the wxWindows
tree control. It already knows how to do what you describe. Then you
can just walk all the branches and create the folders.

-Larry
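
The "walk all the branches and create the folders" step can be sketched without any GUI toolkit, working directly on the nested dict from the original post. This is only a minimal sketch: it assumes Python 3 (for exist_ok), the helper name create_tree is made up for illustration, and the demo writes into a temporary directory rather than the folder selected via AppleScript.

```python
import os
import tempfile

def create_tree(tree, parent):
    """Recursively create one folder per key of a nested dict under `parent`."""
    for name, children in tree.items():
        path = os.path.join(parent, name)
        os.makedirs(path, exist_ok=True)
        create_tree(children, path)  # the nesting itself encodes the level

dirtreedb = {'New client': {'Invoices': {},
                            'Offers': {'Denied': {}, 'Accepted': {}},
                            'Delivery notes': {}}}

# Demo in a throwaway directory so nothing is written to a real location.
with tempfile.TemporaryDirectory() as root:
    create_tree(dirtreedb['New client'], root)
    # Collect every created folder, relative to the demo root.
    created = sorted(
        os.path.relpath(os.path.join(dirpath, sub), root)
        for dirpath, subdirs, _ in os.walk(root)
        for sub in subdirs
    )
```

Because each folder's depth is implied by how deeply its key is nested, the walk needs no separately stored hierarchy level.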

Sep 17 '07 #2
On Sep 19, 4:51 am, "Michael J. Fromberger"
<Michael.J.Fromber...@Clothing.Dartmouth.EDU> wrote:
.
. # This expression matches "header" lines, defining a new section.
. new_re = re.compile(r'\[([\w ]+)\]\s*$')
Directory names can contain more different characters than those which
match [\w ] ... and which ones depends on the OS; might as well just
allow anything, and leave it to the OS to complain. Also consider
using line.rstrip() (usually a handy precaution on ANY input text
file) instead of having \s*$ at the end of your regex.
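
A minimal sketch of that suggestion, assuming a more permissive pattern is acceptable: anything between the brackets is taken as the section name, and trailing whitespace is removed with rstrip() before matching. The names header_re and section_name are illustrative only and do not come from the quoted code.

```python
import re

# Permissive header pattern: any characters between the brackets are accepted,
# and the OS is left to complain about names it cannot create.
header_re = re.compile(r'\[(.+)\]$')

def section_name(line):
    """Return the section name for a header line, or None for any other line."""
    match = header_re.match(line.rstrip())
    return match.group(1) if match else None
```

With this pattern a header such as "[Client #1 (2007)]" is accepted, while a line like "[New client]." is still rejected because of the trailing dot.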
.
. while new_level < len(state):
. state.pop()
Hmmm ... consider rewriting that as the slightly less obfuscatory

while len(state) > new_level:
    state.pop()

If you really want to make the reader slow down and think, try this:

del state[new_level:]
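
For readers who want to convince themselves that the loop and the slice deletion do the same thing, here is a small sketch. The function names are hypothetical, and it assumes new_level is never negative (a negative value would make the loop pop from an empty list).

```python
def truncate_with_pop(state, new_level):
    """Shorten `state` to at most `new_level` items, one pop at a time."""
    while len(state) > new_level:
        state.pop()
    return state

def truncate_with_del(state, new_level):
    """Shorten `state` to at most `new_level` items with a single slice deletion."""
    del state[new_level:]
    return state
```

Both variants leave the list untouched when new_level is already at or beyond its length.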

A warning message if there are too many "-" characters might be a good
idea:

[foo]
|-bar
|-zot
|---plugh
.
. state[-1][key] = {}
. state.append(state[-1][key])
.
And if the input line matches neither regex?
. return out

To call this, pass a file-like object to parse_folders(), e.g.:

test1 = '''
[New client].
Won't work with the dot on the end.
Michael J. Fromberger | Lecturer, Dept. of Computer Science

Sep 18 '07 #3
Hi, John,

Your comments below are all reasonable. However, I would like to point
out that the purpose of my example was to provide a demonstration of an
algorithm, not an industrial-grade solution to every aspect of the
original poster's problem. I am confident the original poster can deal
with these aspects of his problem space on his own.

In article <11**********************@q3g2000prf.googlegroups.com>,
John Machin <sj******@lexicon.net> wrote:
[...]
. while new_level < len(state):
. state.pop()

Hmmm ... consider rewriting that as the slightly less obfuscatory

while len(state) > new_level:
    state.pop()

This seems to me to be an aesthetic consideration only; I'm not sure I
understand your rationale for reversing the sense of the comparison.
Since it does not change the functionality, it's hardly worthy of
complaint, but I don't see any improvement, either.

A warning message if there are too many "-" characters might be a good
idea:

[foo]
|-bar
|-zot
|---plugh

Perhaps so. Again, the original poster will have to decide what should
be the correct response to input of this sort; at present, the
implementation is tolerant of such variations, without loss of
generality.

And if the input line matches neither regex?

I believe it should be clear that such lines are ignored. Again, this
is an opportunity for the original poster to determine an alternative
response -- perhaps an exception could be raised, if that is his desire.
The problem specification did not constrain this case.

To call this, pass a file-like object to parse_folders(), e.g.:

test1 = '''
[New client].

Won't work with the dot on the end.

My mistake. The period was a copy-and-paste artifact, which I missed.

Cheers,
-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
Sep 19 '07 #4
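
The full parse_folders() listing is not reproduced in this thread, so the following is only a sketch rebuilt from the quoted fragments (the state stack, new_level, and the returned out dict), not the code as originally posted. It assumes Python 3. The strict flag, entry_re, and the permissive header_re are additions for illustration, covering the two open questions above: entries with too many "-" characters and lines that match neither pattern.

```python
import io
import re

header_re = re.compile(r'\[(.+)\]$')    # "[Section name]"
entry_re = re.compile(r'\|(-+)(.+)$')   # "|--Folder name"

def parse_folders(lines, strict=False):
    """Parse section headers and dash-prefixed entries into nested dicts.

    With strict=True, a line matching neither pattern, an entry before the
    first header, or an entry nested more than one level below the previous
    one raises ValueError. Otherwise such input is tolerated: unmatched
    lines are skipped and over-deep entries go under the deepest open folder.
    """
    out = {}
    state = []  # state[n] is the dict that receives entries with n + 1 dashes
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line:
            continue
        header = header_re.match(line)
        if header:
            out[header.group(1)] = {}
            state = [out[header.group(1)]]
            continue
        entry = entry_re.match(line)
        if not entry or not state:
            if strict:
                raise ValueError('line %d: unexpected input %r' % (lineno, line))
            continue
        new_level, key = len(entry.group(1)), entry.group(2).strip()
        if strict and new_level > len(state):
            raise ValueError('line %d: %r is nested too deeply' % (lineno, key))
        del state[new_level:]           # close folders deeper than the new parent
        state[-1][key] = {}
        state.append(state[-1][key])
    return out

test1 = '''
[New client]
|-Invoices
|-Offers
|--Denied
|--Accepted
|-Delivery notes
'''
tree = parse_folders(io.StringIO(test1))
```

Whether over-deep entries and unmatched lines should be tolerated, reported, or rejected remains a choice for the original poster; the flag only makes both behaviours available.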
