On Jul 2, 5:05 pm, "Gillard" <gillard_geor...@hotmail.com> wrote:
1 get this to convert pdf2text: ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02pl2-win32.zip
2 use this sub
Sub Pdf2Txt(ByVal options As String, ByVal pdfFile As String, ByVal txtFile As String)
    Dim arguments As String = options & " " & pdfFile & " " & txtFile
    'make sure to provide the path with the pdfFile and the txtFile
    System.Diagnostics.Process.Start("pdftotext.exe", arguments)
End Sub
"SteveB" <stephen.b...@usbank.com> wrote in message
news:a9**********************************@34g2000hsh.googlegroups.com...
I have posted this question in the Visual Basic 2005 and Visual
Basic .Net 2005 discussion groups, also.
Hi. I am developing an application/web page with VB.Net that will
populate a SQL database from text extracted from PDF documents.
However, I am having a difficult time finding or developing the
appropriate code to convert the PDF streams into text strings. Has
anyone developed code to convert PDFs to text?
I was able to write a Perl script that would call a PDF to text
conversion application, but I am having difficulty writing a
similar shell command in VB. Any ideas?
Once I have the text strings, I can parse the data easily into the
SQL database tables.
I tried your suggestion and this app works great from a command line.
However, when I try to call pdftotext as you suggested, I keep getting
this exception:
System.ComponentModel.Win32Exception was unhandled by user code
  ErrorCode=-2147467259
  Message="The system cannot find the file specified"
  Source="System"
  StackTrace:
    at System.Diagnostics.Process.StartWithShellExecuteEx(ProcessStartInfo startInfo)
    at System.Diagnostics.Process.Start()
    at System.Diagnostics.Process.Start(ProcessStartInfo startInfo)
    at System.Diagnostics.Process.Start(String fileName)
    at _Default.Pdf2Txt(String options, String pdffile, String textfile) in D:\documents and settings\srbray\My Documents\Visual Studio 2005\Websites\RegCC\FRB.aspx.vb:line 48
    at _Default.Submit1_Click(Object sender, EventArgs e) in D:\documents and settings\srbray\My Documents\Visual Studio 2005\Websites\RegCC\FRB.aspx.vb:line 27
    at System.Web.UI.WebControls.Button.OnClick(EventArgs e)
    at System.Web.UI.WebControls.Button.RaisePostBackEvent(String eventArgument)
    at System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String eventArgument)
    at System.Web.UI.Page.RaisePostBackEvent(IPostBackEventHandler sourceControl, String eventArgument)
    at System.Web.UI.Page.RaisePostBackEvent(NameValueCollection postData)
    at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
This is my code:
Protected Sub Submit1_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles Submit1.Click
    Dim Path As String = System.IO.Path.GetDirectoryName(File1.PostedFile.FileName)
    Dim FileName As String
    Dim MyText() As String
    Dim NewFileName As String
    Dim DataPath As String = "D:\Documents and Settings\srbray\My Documents\Visual Studio 2005\WebSites\RegCC\Data\"
    Dim ArchivePath As String = "D:\Documents and Settings\srbray\My Documents\Visual Studio 2005\WebSites\RegCC\Archive\"
    Dim MMM As String = MonthName(Month(Now()), True)
    Dim YYYY As String = Year(Now())
    'Create new archive directory.
    My.Computer.FileSystem.CreateDirectory(ArchivePath & YYYY & "\" & MMM)
    ArchivePath = ArchivePath & YYYY & "\" & MMM & "\"
    System.IO.Directory.SetCurrentDirectory(DataPath)
    If Not File1.PostedFile Is Nothing And File1.PostedFile.ContentLength > 0 Then
        For Each oneFile As String In My.Computer.FileSystem.GetFiles(Path, FileIO.SearchOption.SearchTopLevelOnly, "*.pdf")
            FileName = System.IO.Path.GetFileName(oneFile)
            MyText = Split(FileName, ".")
            NewFileName = MyText(0) & ".txt"
            movepdffile(oneFile, DataPath & FileName)
            Pdf2Txt("-layout", DataPath & FileName, DataPath & NewFileName)
        Next oneFile
    Else
        MsgBox("Please select the file(s) to upload.")
    End If
    'Insert code here to:
    'Convert .pdf documents into .txt documents with additional code to
    'import data into the Float Reg CC database.
    'Move .pdf files from working directory to archive directory and delete .txt files.
    'My.Computer.FileSystem.MoveFile(DataPath & FileName, ArchivePath & FileName, True)
    'My.Computer.FileSystem.DeleteFile(DataPath & NewFileName)
End Sub
Sub Pdf2Txt(ByVal options As String, ByVal pdffile As String, ByVal textfile As String)
    Dim exe As String = "D:\xpdf-win32\pdftotext.exe"
    Dim cmd As String = ("'" & exe & "' " & options & " '" & pdffile & "' '" & textfile & "'")
    MsgBox(cmd)
    System.Diagnostics.Process.Start(cmd)
End Sub
Sub movepdffile(ByVal origin As String, ByVal destination As String)
    Try
        My.Computer.FileSystem.MoveFile(origin, destination, False)
        'Report success only when MoveFile did not throw.
        MsgBox("Move is successful.")
    Catch Exc As Exception
        MsgBox("Error: " & Exc.Message)
    End Try
End Sub
I believe I can make this work, but I am missing something minor....
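One possible reading of the trace above, offered as a sketch and not a confirmed fix: the last framework call is Process.Start(String fileName), so the whole cmd string (executable, options, and both paths) is being treated as the name of the file to launch, which would explain "The system cannot find the file specified". Windows command lines also expect double quotes, not single quotes, around paths that contain spaces. The version below passes the executable and its arguments separately and quotes each path. The module name and the Quote helper are made up for illustration, and the pdftotext location is simply the one from the post.

```vbnet
Imports System.Diagnostics

Module Pdf2TxtSketch
    ' Location taken from the post above; adjust to wherever xpdf is installed.
    Private Const PdfToTextExe As String = "D:\xpdf-win32\pdftotext.exe"

    ' Hypothetical helper: wraps a path in double quotes so that
    ' embedded spaces survive command-line parsing.
    Function Quote(ByVal path As String) As String
        Return """" & path & """"
    End Function

    Sub Pdf2Txt(ByVal options As String, ByVal pdfFile As String, ByVal txtFile As String)
        Dim startInfo As New ProcessStartInfo()
        startInfo.FileName = PdfToTextExe   ' executable only, no arguments
        startInfo.Arguments = options & " " & Quote(pdfFile) & " " & Quote(txtFile)
        startInfo.UseShellExecute = False   ' launch the exe directly
        startInfo.CreateNoWindow = True

        Using proc As Process = Process.Start(startInfo)
            ' Wait so the .txt file exists before any later code reads it.
            proc.WaitForExit()
            If proc.ExitCode <> 0 Then
                Throw New ApplicationException("pdftotext exited with code " & proc.ExitCode.ToString())
            End If
        End Using
    End Sub
End Module
```

The call in Submit1_Click would stay the same: Pdf2Txt("-layout", DataPath & FileName, DataPath & NewFileName). If the error persists after this change, it may be worth checking that the account the web site runs under can actually read D:\xpdf-win32 and write to the Data folder.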