Bytes IT Community

Extracting Rich Text data formats from win32clipboard

Hi,

I'm trying to use Mark Hammond's win32clipboard module to extract more
complex data than just plain ASCII text from the Windows clipboard.
For instance, when you select all the content on a web page, you can
paste it into an app like FrontPage, or something Rich Text-aware, and
it will preserve all the formatting, HTML, etc. I'd like to include
that behavior in the application I'm writing.

In the interactive session below, before I run the clipboard_grab()
function, I've selected all of the www.google.com homepage in IE and
hit Control-C. The function cycles through all the formats stored on
the clipboard and loads up a data list with each type it finds.

Here's where it gets interesting: while data[2] is the textual data
that I would expect to see if I pasted the clipboard into Notepad,
data[0] and data[1] are in a weird, non-ASCII (binary?) format.
Are these pointers to (or metadata for) the actual HTML or rich text?
How do I use this data? Is there a reference I can use that will help
me decipher this information? Any help would be greatly appreciated.

Thanks!

----

Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, traceback
>>> import win32clipboard
>>> def clipboard_grab():
...     global format, formats, data
...     win32clipboard.OpenClipboard()
...     # EnumClipboardFormats(0) would return the first available format;
...     # passing 1 (CF_TEXT) starts the walk after CF_TEXT instead.
...     format = 1
...     formats = []
...     data = []
...     while 1:
...         format = win32clipboard.EnumClipboardFormats(format)
...         print "FORMAT:", format
...         if not format:
...             break
...         try:
...             datum = win32clipboard.GetClipboardData(format)
...             formats.append(format)
...             data.append(datum)
...         except:
...             print format, traceback.format_exception(*sys.exc_info())
...     win32clipboard.EmptyClipboard()  # note: this discards the clipboard contents
...     win32clipboard.CloseClipboard()
...

>>> clipboard_grab()
FORMAT: 49171
FORMAT: 16
FORMAT: 7
FORMAT: 0
>>> len(data)
3
>>> data[0]
'\x00\x00\x00\x00\x18\x01\x00\x00\x01\x00\x00\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe3\xc0\xc2w\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa2\xc0\xe9\x02\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00K\xc1\xc2w\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00L\xc1\xc2w\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\r\x00\xc2w\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> data[1]
'\t\x04\x00\x00'
>>> data[2]
'\r\n \tWeb\t \tImages\t \tGroups\t \tDirectory\t \tNews\t \r\n\r\n\t\r\n\t \x07 Advanced Search\r\n \x07 Preferences\r\n \x07 Language Tools\r\n\r\n\r\nAdvertise with Us - Business Solutions - Services & Tools - Jobs, Press, & Help\r\n\r\nc2003 Google - Searching 3,307,998,701 web pages'

Jul 18 '05 #1


> >>> clipboard_grab()
> FORMAT: 49171
> FORMAT: 16
> FORMAT: 7


7 = CF_OEMTEXT
16 = CF_LOCALE
49171 = 0xC013 = apparently OLE private data

That should help you with some searches. Basically the CF_OEMTEXT is the
only one that's going to be useful for you, unless you can figure out what
to do with the OLE private data.
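For anyone decoding these by hand, here is a rough sketch that labels format numbers and unpacks data[1]. It is written for Python 3 rather than the 2.3 interpreter used above. The table is only a hand-copied subset of the standard CF_* constants, describe_format and decode_cf_locale are made-up names, and the locale part assumes CF_LOCALE holds a 4-byte little-endian LCID, which is how the Win32 docs describe it.

```python
import struct

# A hand-copied subset of the standard CF_* constants (winuser.h); not exhaustive.
STANDARD_FORMATS = {
    1: "CF_TEXT",
    2: "CF_BITMAP",
    3: "CF_METAFILEPICT",
    7: "CF_OEMTEXT",
    8: "CF_DIB",
    13: "CF_UNICODETEXT",
    14: "CF_ENHMETAFILE",
    15: "CF_HDROP",
    16: "CF_LOCALE",
    17: "CF_DIBV5",
}


def describe_format(fmt):
    """Give a rough label for a clipboard format number."""
    if fmt in STANDARD_FORMATS:
        return STANDARD_FORMATS[fmt]
    if 0xC000 <= fmt <= 0xFFFF:
        # Registered at run time via RegisterClipboardFormat; the number only
        # means something together with its name (GetClipboardFormatName).
        return "registered format 0x%04X" % fmt
    return "other format %d" % fmt


def decode_cf_locale(raw):
    """Read CF_LOCALE data as a little-endian 32-bit locale identifier (LCID)."""
    (lcid,) = struct.unpack("<I", raw[:4])
    langid = lcid & 0xFFFF    # the low word is the language identifier
    primary = langid & 0x3FF  # equivalent of the PRIMARYLANGID macro
    sub = langid >> 10        # equivalent of the SUBLANGID macro
    return lcid, primary, sub


for fmt in (49171, 16, 7):
    print(fmt, describe_format(fmt))

# data[1] from the session above, written as a Python 3 bytes literal
print(decode_cf_locale(b"\t\x04\x00\x00"))
```

On that reading, data[1] is LCID 0x0409 (1033): primary language 9 (LANG_ENGLISH) and sublanguage 1 (SUBLANG_ENGLISH_US), i.e. U.S. English. The 0x09 byte just happens to print as a tab.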

-Mike
Jul 18 '05 #2

Thanks for your help, Neil! Your example code gave me an idea of what I
should be seeing when the HTML/RTF stuff is working properly. I'd
been using a non-IE browser (Firebird) for testing, and it wasn't
giving me those results. Thanks for getting me on track!
Trader

"Neil Hodgson" <nh******@bigpond.net.au> wrote in message news:<rl*******************@news-server.bigpond.net.au>...
Trader:
> >>> clipboard_grab()
> FORMAT: 49171
> FORMAT: 16
> FORMAT: 7
> FORMAT: 0


Now add in:

for f in formats:
    if f >= 0xC000:
        print win32clipboard.GetClipboardFormatName(f)

Formats from 0xC000 up are dynamically registered clipboard types. I get:

FORMAT: 13
FORMAT: 49278
FORMAT: 49245
FORMAT: 49171
FORMAT: 16
FORMAT: 7
FORMAT: 0

HTML Format
Rich Text Format
Ole Private Data
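A sketch of the enumeration loop that does both steps at once, listing every format and naming the registered ones. Two deliberate differences from the original clipboard_grab(): it starts EnumClipboardFormats at 0, which the Win32 docs describe as the way to get the first available format (so CF_TEXT should no longer be skipped), and it never calls EmptyClipboard(). It is Python 3, list_clipboard_formats is a hypothetical name, and the clipboard module is passed in as a parameter (normally win32clipboard from pywin32) only so the logic can be exercised off Windows.

```python
def list_clipboard_formats(cb):
    """Return (format_id, name) pairs for every format currently on the clipboard.

    `cb` is assumed to be the win32clipboard module. `name` is None for the
    predefined CF_* formats, which have no registered name.
    """
    cb.OpenClipboard()
    try:
        result = []
        # 0 asks EnumClipboardFormats for the first available format.
        fmt = cb.EnumClipboardFormats(0)
        while fmt:
            # IDs from 0xC000 up are registered at run time; look up the name.
            name = cb.GetClipboardFormatName(fmt) if fmt >= 0xC000 else None
            result.append((fmt, name))
            fmt = cb.EnumClipboardFormats(fmt)
        return result
    finally:
        # Always release the clipboard; there is no EmptyClipboard() call here.
        cb.CloseClipboard()
```

On Windows you would call it as list_clipboard_formats(win32clipboard) and print each pair. Because the loop ends when EnumClipboardFormats returns 0, the trailing "FORMAT: 0" line from the original output does not show up as an entry.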

The HTML has a prologue and then some HTML:

Version:1.0
StartHTML:000000195
EndHTML:000001891
StartFragment:000001597
EndFragment:000001710
StartSelection:000001597
EndSelection:000001710
SourceURL:http://sydney.citysearch.com.au/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD><TITLE>CitySearch.com.au Australia - Your guide to the city of
Sydney</TITLE>
...
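The prologue is what makes that buffer usable: StartFragment/EndFragment bracket the part that was actually selected. A minimal Python 3 parser, assuming raw holds the bytes returned for the "HTML Format" type, that the offsets are byte offsets into that same buffer, and that the payload is UTF-8 (my understanding of the CF_HTML format, not something verified in this thread). parse_cf_html is a hypothetical name.

```python
def parse_cf_html(raw):
    """Split a CF_HTML ("HTML Format") buffer into (header dict, fragment text)."""
    # The header is plain "Key:Value" lines that end where the markup starts;
    # this assumes no header value (e.g. SourceURL) contains a "<".
    lt = raw.find(b"<")
    header_bytes = raw if lt == -1 else raw[:lt]
    header = {}
    for line in header_bytes.decode("ascii", "replace").splitlines():
        # Split on the first colon only, so "SourceURL:http://..." stays intact.
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    # Zero-padded decimal offsets such as "000001597" parse fine with int().
    start = int(header["StartFragment"])
    end = int(header["EndFragment"])
    fragment = raw[start:end].decode("utf-8", "replace")
    return header, fragment
```

With Neil's header above, the fragment would be raw[1597:1710], and header["SourceURL"] would give the page it came from.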

The RTF looks normal:

{\rtf1\ansi\ansicpg-1\deff0\deflang3081{\fonttbl{\f0\froman\fcharset0 Times New Roman;}
{\f1\ftech\fcharset0 Symbol;}{\f2\fswiss\fcharset0 Arial;}
{\f3\fswiss\fcharset0 Courier New;}{\f4\ftech\fcharset0 Wingdings;}}
{\colortbl\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;
\red255\green255\blue0;\red255\green255\blue255;\
...
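Since registered IDs like 49245 are handed out at run time, it seems safer to ask for a format by name than to hard-code its number. A Python 3 sketch under that assumption; read_registered_format is a hypothetical name, the module is passed in (normally win32clipboard) so it can be tested off Windows, and I am assuming GetClipboardData returns raw bytes for a registered format like this one.

```python
def read_registered_format(cb, name):
    """Return the clipboard contents for a registered format name, or None.

    `cb` is assumed to be the win32clipboard module (or anything with the
    same functions); it is a parameter only so the logic can be tested.
    """
    # RegisterClipboardFormat returns the existing ID when the name is
    # already registered, so this is effectively a lookup by name.
    fmt = cb.RegisterClipboardFormat(name)
    cb.OpenClipboard()
    try:
        if not cb.IsClipboardFormatAvailable(fmt):
            return None
        return cb.GetClipboardData(fmt)
    finally:
        cb.CloseClipboard()
```

On Windows, read_registered_format(win32clipboard, "Rich Text Format") followed by writing the bytes to a .rtf file ought to give something WordPad can open; the data may carry a trailing NUL byte worth stripping first.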

Neil

Jul 18 '05 #3
