Bytes | Software Development & Data Engineering Community

Replacing a string inside of a PDF

I am having a lot more trouble with this than I thought I would. Here
is what I want to do, in pseudocode:

Open c:\some.pdf
Replace "Replace this" with "Replaced!"
Save c:\some_edited.pdf

I can do this in Notepad and it works fine, but when I start reading
the files in code I think I run into some encoding problem. I tried
saving the file with every encoding option. When I open a PDF in the
text editor I normally use, it says the file is ANSI with Mac-style
carriage returns. WinMerge will not let me compare the files because it
says they are binary.

Anyone know what I have to do?
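For reference, here is a minimal VB.NET sketch of the pseudocode above, assuming the search text is stored literally (uncompressed) in the file. The idea is to never let a UTF-8 or ANSI decoder touch the bytes: the file is read with File.ReadAllBytes and mapped to a String through ISO-8859-1 (code page 28591), which turns each byte 0-255 into the character with the same value and back again, so nothing is lost. The module and function names (PdfTextReplace, ReplaceInPdf) are hypothetical. Because a PDF's cross-reference table records the byte offset of every object, the sketch pads the replacement with spaces to the original length instead of shrinking the file ("Replaced!" is three characters shorter than "Replace this").

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module PdfTextReplace
    ' ISO-8859-1 (code page 28591) maps byte N to character N and back,
    ' so decoding and re-encoding the file is lossless for every byte value.
    Private ReadOnly Latin1 As Encoding = Encoding.GetEncoding(28591)

    ' Hypothetical helper: replaces findText with replaceText in the raw
    ' bytes of inputPath, writes the result to outputPath, and returns the
    ' number of occurrences replaced. Assumes the text is stored uncompressed.
    Public Function ReplaceInPdf(inputPath As String, outputPath As String,
                                 findText As String, replaceText As String) As Integer
        If findText.Length = 0 OrElse replaceText.Length > findText.Length Then
            Throw New ArgumentException("Replacement must be no longer than a non-empty search text.")
        End If

        ' Pad to the original length so later byte offsets stay valid.
        Dim padded As String = replaceText.PadRight(findText.Length)

        Dim contents As String = Latin1.GetString(File.ReadAllBytes(inputPath))

        ' Count matches with an ordinal (character-for-character) comparison.
        Dim count As Integer = 0
        Dim pos As Integer = contents.IndexOf(findText, StringComparison.Ordinal)
        While pos >= 0
            count += 1
            pos = contents.IndexOf(findText, pos + findText.Length, StringComparison.Ordinal)
        End While

        ' String.Replace(String, String) is also ordinal.
        File.WriteAllBytes(outputPath, Latin1.GetBytes(contents.Replace(findText, padded)))
        Return count
    End Function
End Module
```

Usage would be PdfTextReplace.ReplaceInPdf("c:\some.pdf", "c:\some_edited.pdf", "Replace this", "Replaced!"). Refusing longer replacements is a deliberate simplification; growing the file would also mean rewriting the cross-reference offsets.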

Jul 20 '06 #1
Please explain: are you reading the file into a binary string and then
writing that binary string out to another file?


Jul 20 '06 #2
Samuel,

I have tried it several ways. The end goal is just to end up with an
edited PDF. If I have to overwrite the original file that is fine.

Jul 21 '06 #3
I'm assuming that I should somehow be using a BinaryReader and a
BinaryWriter; I just don't know how to work with the data inside as
strings and then put it back into the writer.

Jul 21 '06 #4
I think the key to your question is how to actually read the file (I
should have realized earlier that this is the main issue).

Did you manage to read parts of the file? Only if you can do that can you
replace the text.

Jul 21 '06 #5
Maybe this link will be useful:
http://groups.google.com/group/micro...ace184fa716b5a

Jul 21 '06 #6
I have written code that at least reads the internals of the file as a
string or a stream, and then I can find the chunk I want to replace easily
enough, but I think it loses some special characters, or maybe mangles
the line endings (I believe PDF files use Mac-style CR-only line endings
instead of the CR LF that a lot of Windows-based files use).

So I guess my problem is actually reading and writing. I can write
code that looks like it reads the file with a StreamReader, but I think I
am really losing data. I can write code that reads it as binary, but
then I have trouble working with the contents. After all that is
worked out I have to figure out how to write the edited file back to
disk (I believe the BinaryWriter will do that, but I have not tested it
much).
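The lost data most likely comes from the decoding step. IO.File.OpenText decodes the file as UTF-8, and PDF files that contain binary data are expected to have bytes above 127: the PDF specification recommends a comment on the second line with at least four such bytes (commonly %âãÏÓ) precisely to mark the file as binary. Invalid UTF-8 sequences are replaced with U+FFFD, so the original bytes cannot be recovered when the string is written back. A hypothetical helper that tests whether an encoding survives a bytes-to-String-and-back round trip:

```vbnet
Imports System
Imports System.Text

Module EncodingRoundTrip
    ' True when decoding the bytes to a String and encoding them back
    ' with the same Encoding reproduces the original bytes exactly.
    Public Function RoundTrips(data As Byte(), enc As Encoding) As Boolean
        Dim text As String = enc.GetString(data)
        Dim back As Byte() = enc.GetBytes(text)
        If back.Length <> data.Length Then Return False
        For i As Integer = 0 To data.Length - 1
            If back(i) <> data(i) Then Return False
        Next
        Return True
    End Function
End Module
```

On the five bytes of a %âãÏÓ header this returns False for Encoding.UTF8 and Encoding.ASCII, and True for Encoding.GetEncoding(28591) (ISO-8859-1), which round-trips all 256 byte values.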

I'm not sure what else I can tell you. This is just a matter of me not
fully understanding how I am supposed to read and edit a file like this,
as opposed to the other formats I have worked with, which were all
plain text.

Thanks a lot for the feedback. I looked at the other post you linked
to and read the linked page. I think that would be useful to me if the
PDFs were compressed, but I can open these in Notepad and find my
string right now (and the edit works when I do it that way).

Jul 21 '06 #7
You may be able to take a string identical to the one you want to
replace and convert it to bytes in a binary stream (it doesn't have to be a
file), then search for that byte sequence within the main binary stream
(binary buffer) that holds the PDF file and replace it with the bytes
created from the string you want to use as the replacement.
You still have the problem of the funny characters, which you can imitate by
using CR instead of CRLF (or whatever the file normally uses).
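Samuel's suggestion can be done entirely on bytes, without ever building a String from the file. A minimal sketch of that approach; ByteReplace, IndexOfBytes and ReplaceBytes are hypothetical names, and the search is a plain nested loop rather than anything clever. The search and replacement bytes could come from something like Encoding.ASCII.GetBytes("ABC1234567890"), with ChrW(13) concatenated in if the match has to span a CR.

```vbnet
Imports System
Imports System.Collections.Generic

Module ByteReplace
    ' Index of the first occurrence of pattern in buffer at or after
    ' startIndex, or -1 if there is none. Simple O(n*m) scan.
    Public Function IndexOfBytes(buffer As Byte(), pattern As Byte(), startIndex As Integer) As Integer
        If pattern.Length = 0 Then Throw New ArgumentException("Pattern must not be empty.")
        For i As Integer = startIndex To buffer.Length - pattern.Length
            Dim j As Integer = 0
            While j < pattern.Length AndAlso buffer(i + j) = pattern(j)
                j += 1
            End While
            If j = pattern.Length Then Return i
        Next
        Return -1
    End Function

    ' Returns a new array in which every occurrence of find is replaced.
    Public Function ReplaceBytes(buffer As Byte(), find As Byte(), replacement As Byte()) As Byte()
        Dim output As New List(Of Byte)(buffer.Length)
        Dim pos As Integer = 0
        Dim hit As Integer = IndexOfBytes(buffer, find, 0)
        While hit >= 0
            ' Copy the untouched bytes before the match, then the replacement.
            For k As Integer = pos To hit - 1
                output.Add(buffer(k))
            Next
            output.AddRange(replacement)
            pos = hit + find.Length
            hit = IndexOfBytes(buffer, find, pos)
        End While
        ' Copy whatever follows the last match.
        For k As Integer = pos To buffer.Length - 1
            output.Add(buffer(k))
        Next
        Return output.ToArray()
    End Function
End Module
```

Usage would be along the lines of File.WriteAllBytes(outPath, ByteReplace.ReplaceBytes(File.ReadAllBytes(inPath), find, replacement)). Keeping find and replacement the same length avoids disturbing the byte offsets recorded in the PDF's cross-reference table.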

Finally, once the code works, please send it over; it seems
interesting to me (if that's OK with you and your company).

Regards,
Samuel

Jul 21 '06 #8
I'm not sure I know how to do what you are saying, but here is a test I
made that writes the file using a string converted into a byte array.
This is not working.

::::::::::::::::::::::::::::::::::::::::::::::::::::
Public Function ByteTest()
    Dim PDFFile As String
    Dim PDFFolder As IO.Directory

    Response.Write("Start Byte:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")

    For Each PDFFile In PDFFolder.GetFiles(Server.MapPath("PDF"))
        'Open the file
        Dim FileStream As IO.StreamReader
        FileStream = IO.File.OpenText(PDFFile)

        'Load the file into a string
        Dim Contents As String = FileStream.ReadToEnd

        'Replace text in string
        Contents = Contents.Replace("ABC1234567890", "ABC1111111111")

        'Close stream
        FileStream.Close()

        'Create byte based output file
        Dim OutputFileName As String = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "BYTE.pdf")
        Dim fs As FileStream = File.Create(OutputFileName)
        fs.Close()

        'Convert the string to bytes
        Dim info As Byte() = New System.Text.UTF8Encoding(True).GetBytes(Contents)

        'Write string as bytes to output file
        fs = File.OpenWrite(OutputFileName)
        fs.Write(info, 0, info.Length)
        fs.Close()

    Next

    Response.Write("Stop Byte:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")

End Function
::::::::::::::::::::::::::::::::::::::::::::::::::::

Jul 21 '06 #9
Here is another test I wrote that successfully generates a bunch of
useless files encoded in different ways.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Public Function StringTest()
    Dim PDFFile As String
    Dim PDFFolder As IO.Directory

    Response.Write("Start String:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")

    For Each PDFFile In PDFFolder.GetFiles(Server.MapPath("PDF"))
        'Open the file
        Dim FileStream As IO.StreamReader
        FileStream = IO.File.OpenText(PDFFile)

        'Load the file into a string
        Dim Contents As String = FileStream.ReadToEnd

        'Replace text in string
        Contents = Contents.Replace("ABC1234567890", "ABC1111111111")

        'Close stream
        FileStream.Close()

        'Create ASCII output file
        Dim OutputFileName As String = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-ASCII.pdf")
        Dim fs As FileStream = File.Create(OutputFileName)
        Dim PDFStream As StreamWriter = New StreamWriter(fs, System.Text.Encoding.ASCII)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create BigEndianUnicode output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-BigEndianUnicode.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.BigEndianUnicode)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create default formatted output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-Default.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.Default)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create Unicode output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-Unicode.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.Unicode)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create UTF7 output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-UTF7.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.UTF7)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

        'Create UTF8 output file
        OutputFileName = Server.MapPath("PDFOutput\" & DateTime.Now.ToFileTimeUtc.ToString & "STRING-UTF8.pdf")
        fs = File.Create(OutputFileName)
        PDFStream = New StreamWriter(fs, System.Text.Encoding.UTF8)
        PDFStream.Write(Contents)
        PDFStream.Close()
        fs.Close()

    Next

    Response.Write("Stop String:" & DateTime.Now.ToLongTimeString & ":" & Now.Millisecond & "<br>")

End Function
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
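Since WinMerge won't compare these files, a quick way to see how each output differs from its input is to find the first byte at which they diverge. A hypothetical sketch (ByteCompare, FirstDifference and Report are made-up names). Note that StreamWriter writes the encoding's byte-order mark at the start of a new file, so the UTF8, Unicode and BigEndianUnicode outputs should already differ at offset 0; for the other outputs the offset shows where the damage begins.

```vbnet
Imports System
Imports System.IO

Module ByteCompare
    ' Offset of the first byte where a and b differ, or -1 if they are
    ' identical. If one array is a prefix of the other, the offset is the
    ' length of the shorter one.
    Public Function FirstDifference(a As Byte(), b As Byte()) As Integer
        Dim shorter As Integer = Math.Min(a.Length, b.Length)
        For i As Integer = 0 To shorter - 1
            If a(i) <> b(i) Then Return i
        Next
        If a.Length = b.Length Then Return -1
        Return shorter
    End Function

    ' Prints the two file sizes and where the edited copy first diverges.
    Public Sub Report(originalPath As String, editedPath As String)
        Dim a As Byte() = File.ReadAllBytes(originalPath)
        Dim b As Byte() = File.ReadAllBytes(editedPath)
        Dim offset As Integer = FirstDifference(a, b)
        If offset < 0 Then
            Console.WriteLine("Identical ({0} bytes).", a.Length)
        Else
            Console.WriteLine("{0} vs {1} bytes; first difference at offset {2}.", a.Length, b.Length, offset)
        End If
    End Sub
End Module
```

For a correct same-length replacement, the sizes should match and the first difference should land exactly on the edited text.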

Jul 21 '06 #10
