Bytes | Software Development & Data Engineering Community

different string representation (buffer gap)

Hi all --

I'm contemplating the idea of writing a simple emacs-like editor in
python (for fun and the experience of doing so). In reading through
Craig Finseth's "The Craft of Text Editing"
(http://www.finseth.com/~fin/craft/), I've come across the "buffer
gap" representation for the text data
of the buffer. Very briefly, it keeps the unallocated memory of the
character array at the editing point, so that as long as there is
memory available, an insert/delete has very low (constant) cost. Of
course, moving the editing point means copying some character data so
that the gap moves with you, but...
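
For readers who have not met the technique, here is a minimal sketch of a buffer gap in Python. It is only an illustration under assumptions: the class name GapBuffer, its method names, and the choice of a bytearray as backing store are all made up for this example and come neither from Finseth's text nor from any library.

```python
class GapBuffer:
    """Toy buffer gap over a bytearray (hypothetical names, illustration only)."""

    def __init__(self, text=b"", gap_size=16):
        self._buf = bytearray(text) + bytearray(gap_size)
        self._gap_start = len(text)       # the editing point
        self._gap_end = len(self._buf)    # one past the last unused byte

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def move_point(self, pos):
        """Move the gap so it starts at logical position pos (copies text)."""
        if not 0 <= pos <= len(self):
            raise IndexError("point out of range")
        if pos < self._gap_start:
            n = self._gap_start - pos
            # Slide the n bytes before the gap to the far end of the gap.
            self._buf[self._gap_end - n:self._gap_end] = self._buf[pos:self._gap_start]
            self._gap_start -= n
            self._gap_end -= n
        elif pos > self._gap_start:
            n = pos - self._gap_start
            # Slide the n bytes after the gap to the near end of the gap.
            self._buf[self._gap_start:self._gap_start + n] = self._buf[self._gap_end:self._gap_end + n]
            self._gap_start += n
            self._gap_end += n

    def insert(self, data):
        """Insert bytes at the point; cheap while the gap has room."""
        if len(data) > self._gap_end - self._gap_start:
            self._grow(len(data))
        self._buf[self._gap_start:self._gap_start + len(data)] = data
        self._gap_start += len(data)

    def delete(self, count):
        """Delete up to count bytes after the point by widening the gap."""
        self._gap_end = min(self._gap_end + count, len(self._buf))

    def _grow(self, needed):
        # Enlarge the gap in place; at least doubles the storage.
        extra = max(needed, len(self._buf))
        self._buf[self._gap_end:self._gap_end] = bytearray(extra)
        self._gap_end += extra

    def text(self):
        """Return the contents as one contiguous bytes object (a full copy)."""
        return bytes(self._buf[:self._gap_start] + self._buf[self._gap_end:])


g = GapBuffer(b"hello world")
g.move_point(5)
g.insert(b",")
print(g.text())  # b'hello, world'
```

Note that text() makes a full copy, which is exactly the cost the question below is worried about when handing the buffer to something like the re module.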

Anyway, I'm wondering what straightforward ways there are to leverage /
implement this representation in Python. Ideally, it would be great
if one could use a BufferGap class in all the places you'd use a
python string transparently, to use standard regular expressions, for
example. Glancing quickly at the regexmodule.c, and its use of
PyString_Whatever, I'm not certain this is easy to do efficiently
(must one copy the buffer's contents into a Python-native string
before one can use something like a regular expression match on the
buffer's contents?).

Anyone have ideas / suggestions on how one would represent an editing
buffer in a way that would remain most (transparently) compatible with
the Python standard library string operations (and yet remain
efficient for editing)? If one is embedding the interpreter in the
editor (or writing the editor in pure python), and using python for
editor extensibility, it seems desirable to keep complexity down for
extension writers, and to allow them to think of the buffer as a
string.

Thanks...
Jul 18 '05 #1
[snip]

My advice: find some pre-existing editor component and adapt it to suit
your needs. Writing an editor component, whether it be for a GUI or
console, is a pain in the ass.

If you are looking for a GUI editor component, stick with wxPython and
the wxStyledTextCtrl, which is a binding to the fabulous Scintilla
editing component: http://scintilla.org

I have no advice for a console editor, but I would suggest you take a
look at curses or ncurses, whichever one is the open source version.

Once you have that thing embedded in your application, the rest should
basically write itself. How do I know? Because I wrote an editor with
it myself: http://pype.sourceforge.net
- Josiah
Jul 18 '05 #2
ma**************@yahoo.com (manders2k) writes:
> Anyway, I'm wondering what straightforward ways there are to leverage /
> implement this representation in Python. Ideally, it would be great
> if one could use a BufferGap class in all the places you'd use a
> python string transparently, to use standard regular expressions, for
> example. Glancing quickly at the regexmodule.c, and its use of
> PyString_Whatever, I'm not certain this is easy to do efficiently
> (must one copy the buffer's contents into a Python-native string
> before one can use something like a regular expression match on the
> buffer's contents?).


You can do some of that with the array module, but Python's regexp
library doesn't give any way to search backwards for a regexp, so that's
another problem you'll face trying to write an editor in Python.
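
One common workaround for the missing backward search, sketched here under assumptions: scan forward up to the editing point and keep the last match. The helper name search_backward is made up for this example. It relies only on the pos/endpos arguments of a compiled pattern's finditer and on the re module accepting a bytearray as the subject of a bytes pattern. It is not true backward matching: it returns the last of the non-overlapping forward matches, which can differ from what an editor's reverse regexp search would pick, and it costs time proportional to the text before the point.

```python
import re


def search_backward(pattern, text, point):
    """Return the last forward match of `pattern` ending at or before `point`.

    `pattern` is a compiled bytes pattern; `text` is any bytes-like
    object such as a bytearray. Returns None if nothing matches.
    """
    last = None
    # endpos=point makes the scan behave as if the text stopped at the point.
    for match in pattern.finditer(text, 0, point):
        last = match
    return last


digits = re.compile(rb"\d+")
buf = bytearray(b"a1 b22 c333 d4444")
m = search_backward(digits, buf, 11)
print(m.group(), m.span())  # b'333' (8, 11)
```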
Jul 18 '05 #3
Josiah Carlson <jc******@nospam.uci.edu> wrote in message news:<bv**********@news.service.uci.edu>...
> My advice: find some pre-existing editor component and adapt it to suit
> your needs. Writing an editor component, whether it be for a GUI or
> console, is a pain in the ass.


:-) It might be a pain in the ass, but it sounds like the most
edifying (and probably the most fun) part of the process to me.

Reading up on what's involved, getting a basic editor put together
actually sounds quite easy. A few tens of hours of work, maybe.
Making the editor feature-rich, extending it in a substantial way,
well, that doesn't sound hard, just work-intensive.

Thanks for the pointers though...
Jul 18 '05 #4
Paul Rubin <http://ph****@NOSPAM.invalid> wrote in message news:<7x************@ruckus.brouhaha.com>...
> You can do some of that with the array module, but Python's regexp
> library doesn't give any way to search backwards for a regexp, so that's
> another problem you'll face trying to write an editor in Python.


Yeah, I have a feeling that it might be easier to code up the buffer
in C/C++, and embed it in the interpreter. I'm not sure how much of a
performance bottleneck having this very low-level component written in
python will be on modern machines; probably not such a big deal.
Writing a buffer class and fiddling with pointers and whatnot actually
sounds easier to do in C++ than emulating this style of thing in
Python (then again, I'm a heck of a lot more comfortable with C++ than
Python at this point, so that might not speak to the difficulty of the
task).

What I guess I wish were the case is that I could implement the
"string interface" on my BufferGap, so that everywhere that Python (at
the C API level) expects a string, a BufferGap could be used instead.
That way, all the libraries that inspect and operate on strings would
work transparently, without having to be recoded (copy / paste, end up
with a lot of mostly identical, redundant code) to operate on this
other string representation. Maybe this just isn't possible with the
current C-Python implementation. I suspect it would be with many
possible C++-Python implementations, but we don't have one of those
lying around so...

I'm pretty sure that for modest size buffers (even a megabyte of
text), copying the contents of the buffer into a python-string
representation before operating on it with python-libraries would be
transparently fast. It just seems...wasteful, and potentially very
bad news if someone ever tried to do a regex search on a buffer that
occupied more than half of the physical memory of the machine or
somesuch.
Jul 18 '05 #5
manders2k:
> I'm not sure how much of a
> performance bottleneck having this very low-level component written in
> python will be on modern machines; probably not such a big deal.
The performance bottleneck in split buffers is often the cost of copying
array ranges. I once wrote a patch for Python's array class to provide
copying within an array but the patch contents didn't make it to SourceForge
and I haven't had time to follow it up.

http://mail.python.org/pipermail/pat...il/012043.html
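
As a rough illustration of "copying within an array", here is how a range can be moved inside a bytearray with slice assignment. This is only a sketch, not the patch mentioned above: the helper name move_range is made up, and the slice on the right-hand side creates a temporary copy, so it is safe for overlapping ranges but not a true in-place move.

```python
def move_range(buf, src, dst, count):
    """Copy `count` bytes inside bytearray `buf` from index src to index dst.

    The right-hand slice is a temporary copy, so overlapping source and
    destination ranges are handled correctly.
    """
    if min(src, dst) < 0 or max(src, dst) + count > len(buf):
        raise IndexError("range falls outside the buffer")
    buf[dst:dst + count] = buf[src:src + count]


data = bytearray(b"abcdef")
move_range(data, 0, 2, 3)
print(data)  # bytearray(b'ababcf')
```
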
> Writing a buffer class and fiddling with pointers and whatnot actually
> sounds easier to do in C++ than emulating this style of thing in
> Python (then again, I'm a heck of a lot more comfortable with C++ than
> Python at this point, so that might not speak to the difficulty of the
> task).
Split buffers don't need to use pointers. I have written several split
buffer implementations including

* the implementation in Scintilla (scintilla/src/CellBuffer.[h,cxx])
http://cvs.sourceforge.net/viewcvs.p...lla/scintilla/

* a templated C++ implementation
http://mailman.lyra.org/pipermail/sc...ch/000903.html

* a generic implementation that is part of my SinkWorld project written in a
subset of C++ that can be automatically translated into Java or C#
http://cvs.sourceforge.net/viewcvs.p...lla/sinkworld/

Also in SinkWorld is a split-buffer-based data structure called lv (in
lv.h) for partitioning a document into segments such as lines. While the
line starts could be stored in a standard split buffer, inserting text would
then mean adjusting every following line start position. To fix this,
there is also a 'step', with all positions after the step position adding
the step value to their values. The step is moved to the position where text
is being inserted or deleted but due to locality of modification, the move
is mostly short.
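
To make the 'step' idea concrete, here is a small Python sketch. It is not the SinkWorld lv code: the class name LineStarts and its methods are made up for illustration, a plain list stands in for the split buffer, and only length changes within a line are handled (no adding or removing of lines).

```python
class LineStarts:
    """Line start positions with a lazily applied 'step' (hypothetical sketch)."""

    def __init__(self, starts):
        self._starts = list(starts)
        # Entries at index >= _step_index are implicitly offset by _step_value.
        self._step_index = len(self._starts)
        self._step_value = 0

    def position(self, line):
        """Return the true start position of `line`."""
        value = self._starts[line]
        if line >= self._step_index:
            value += self._step_value
        return value

    def _move_step(self, index):
        # Only the entries between the old and new step index are touched,
        # which is cheap when successive edits are close together.
        if index > self._step_index:
            for i in range(self._step_index, index):
                self._starts[i] += self._step_value
        else:
            for i in range(index, self._step_index):
                self._starts[i] -= self._step_value
        self._step_index = index

    def insert_text(self, line, length):
        """Record `length` characters inserted in `line` (negative to delete)."""
        self._move_step(line + 1)
        self._step_value += length


ls = LineStarts([0, 10, 20, 30])
ls.insert_text(1, 5)
print([ls.position(i) for i in range(4)])  # [0, 10, 25, 35]
```
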
> What I guess I wish were the case is that I could implement the
> "string interface" on my BufferGap, so that everywhere that Python (at
> the C API level) expects a string, a BufferGap could be used instead.
IIRC, at one stage there was explicit support in Python (perhaps in the
buffer class) for multiple segment buffers but it was never used so has
probably rotted.
> That way, all the libraries that inspect and operate on strings would
> work transparently, without having to be recoded (copy / paste, end up
> with a lot of mostly identical, redundant code) to operate on this
> other string representation.


I'd like to see this implemented and have been meaning to look into it
myself.

Neil
Jul 18 '05 #6
"manders2k" <ma**************@yahoo.com> wrote in message

[snip]
> What I guess I wish were the case is that I could implement the
> "string interface" on my BufferGap, so that everywhere that Python (at
> the C API level) expects a string, a BufferGap could be used instead.
> That way, all the libraries that inspect and operate on strings would
> work transparently, without having to be recoded (copy / paste, end up
> with a lot of mostly identical, redundant code) to operate on this
> other string representation. Maybe this just isn't possible with the
> current C-Python implementation.


Have you looked at the mmap standard module?
http://www.python.org/doc/current/lib/module-mmap.html
"Memory-mapped file objects behave like both strings and like file objects.
Unlike normal string objects, however, these are mutable. You can use mmap
objects in most places where strings are expected; for example, you can use
the re module to search through a memory-mapped file"
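
A quick sketch of what that looks like, with some assumptions spelled out: the helper name find_in_mmap is made up, an anonymous mapping (mmap.mmap(-1, size)) is used instead of a real file, and the example is written for Python 3, where mmap contents and the pattern are bytes rather than str. Note also that an mmap has a fixed size and slice assignment must keep the length unchanged, so this gives a mutable, searchable buffer but not cheap inserts.

```python
import mmap
import re


def find_in_mmap(data, pattern):
    """Copy `data` into an anonymous mmap and search it in place with re.

    Returns (matched_bytes, start_offset), or None if there is no match.
    """
    with mmap.mmap(-1, max(len(data), 1)) as mm:
        mm[:len(data)] = data            # mutable, same-length slice assignment
        match = re.search(pattern, mm)   # re reads the mapping directly
        if match is None:
            return None
        return match.group(), match.start()


print(find_in_mmap(b"hello world", rb"w\w+"))  # (b'world', 6)
```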

Michael
Jul 18 '05 #7
