
Converting PDF content to XML

Hi,

I am new to ASP.NET with C#. Can anyone tell me the steps for creating a tool that automatically converts PDF content to XML? Please help me and suggest the best approach.
Dec 27 '10 #1
Rabbit
12,516 Expert Mod 8TB
The first step is to learn the PDF specification, which can be found on the Adobe website. Once you have learned it, we can work on the implementation.
Dec 27 '10 #2
OK sir, thank you. Please keep the suggestions coming.
Dec 27 '10 #3
Hi sir, I have studied the PDF specification. What is the next step? Please guide me toward my goal.
Jan 1 '11 #4
Rabbit
Once you know the PDF format, you can start writing your code. You'll want to create a filestream to read the PDF file and then a different filestream to write out the XML. After the file is created, you loop through the PDF file and use your new knowledge of the specification to interpret it into the format you want.
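As a rough sketch of that shape (not a real PDF parser), the C# below opens one stream for reading and another for writing, checks the `%PDF-x.y` header that a PDF file starts with, and writes a skeleton XML document. The class name `PdfToXmlSketch`, the `document` element, and the `pdfVersion` attribute are made up for illustration; interpreting the actual page content is left as a placeholder.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class PdfToXmlSketch
{
    // Reads the 8-byte "%PDF-x.y" header and returns "x.y",
    // or null when the stream does not start with a PDF header.
    public static string ReadPdfVersion(Stream input)
    {
        byte[] header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = input.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }
        string text = Encoding.ASCII.GetString(header, 0, read);
        bool isPdf = read == 8 && text.StartsWith("%PDF-", StringComparison.Ordinal);
        return isPdf ? text.Substring(5) : null;
    }

    // Reads from one stream and writes XML to a different stream.
    public static void Convert(Stream input, Stream output)
    {
        string version = ReadPdfVersion(input);
        if (version == null)
            throw new InvalidDataException("Input does not start with a PDF header.");

        var settings = new XmlWriterSettings { Indent = true };
        using (XmlWriter xml = XmlWriter.Create(output, settings))
        {
            xml.WriteStartDocument();
            xml.WriteStartElement("document");          // hypothetical root element
            xml.WriteAttributeString("pdfVersion", version);
            // Placeholder: loop over the PDF objects here and emit
            // one XML element per piece of interpreted content.
            xml.WriteEndElement();
            xml.WriteEndDocument();
        }
    }

    // Convenience wrapper that uses two file streams.
    public static void ConvertFile(string pdfPath, string xmlPath)
    {
        using (FileStream input = File.OpenRead(pdfPath))
        using (FileStream output = File.Create(xmlPath))
        {
            Convert(input, output);
        }
    }
}
```

Keeping `Convert` stream-based means it can be tried against a `MemoryStream` before touching real files.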

This is just the overview. If you get stuck on a particular part, post your code, tell us what it should do and what it's doing wrong, and we will take a look at it.
Jan 1 '11 #5
I tried to read the contents of a PDF file, but it throws an invalid exception at the line "ContentScanner.TextWrapper text = (ContentScanner.TextWrapper)level.CurrentWrapper;" in the code below:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using it.stefanochizzolini.clown.documents;
using it.stefanochizzolini.clown.files;
using it.stefanochizzolini.clown.documents.contents;
using it.stefanochizzolini.clown.documents.contents.objects;
using it.stefanochizzolini.clown.tools;
using it.stefanochizzolini.clown.documents.contents.composition;
using it.stefanochizzolini.clown.documents.contents.fonts;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string filePath = @"C:\Documents and Settings\XML\Desktop\Copyright.pdf";

            File file;
            Document document;
            try
            {
                // Open the PDF file.
                file = new File(filePath);

                // Get the PDF document.
                document = file.Document;
            }
            catch
            {
                Console.WriteLine("Sorry, Some Errors in File");
                // Wait for the user to press Enter on an empty line.
                for (; ; )
                {
                    if (Console.ReadLine() == "")
                        break;
                }
                return;
            }

            // Page stamper is used to draw contents on existing pages.
            PageStamper stamper = new PageStamper();

            foreach (Page page in document.Pages)
            {
                Console.WriteLine("\nScanning page " + (page.Index + 1) + "...\n");

                stamper.Page = page;

                // Wrap the page contents in a scanner.
                Extract(new ContentScanner(page), stamper.Foreground);

                stamper.Flush();
            }

            // Wait for the user to press Enter on an empty line.
            for (; ; )
            {
                if (Console.ReadLine() == "")
                    break;
            }
        }

        private static void Extract(ContentScanner level, PrimitiveFilter builder)
        {
            if (level == null)
                return;

            while (level.MoveNext())
            {
                ContentObject content = level.Current;
                if (content is Text)
                {
                    // This is the line that throws the exception.
                    ContentScanner.TextWrapper text = (ContentScanner.TextWrapper)level.CurrentWrapper;
                    //ContentScanner.GraphicsState test = level.getState();
                    foreach (ContentScanner.TextStringWrapper textString in text.TextStrings)
                    {
                        Console.WriteLine("Text [font size: " + textString.Style.FontSize + " ], [font Name: " +
                            textString.Style.Font.Name + " ]: " + textString.Text);
                    }
                }
                else if (content is ShowText)
                {
                    Font font = level.State.Font;
                    Console.WriteLine(font.Decode(((ShowText)content).Text));
                }
                else if (content is ContainerObject)
                {
                    // Scan the inner level.
                    Extract(level.ChildLevel, builder);
                }
            }
        }
    }
}
Jan 6 '11 #6
Rabbit
Please use code tags. What's the rest of the error text? It looks like you're using a custom library, and it would be hard for someone to figure out what's going on with it unless they've used it before.
Jan 6 '11 #7
1. Extract the text, replace special characters with Unicode entities, and wrap the content with style information (use a stack to store the font information for each chunk of paragraph text).
2. List all styles used.
3. List the tags using a DTD or schema.
4. Map styles to tags, or open a saved template of style-to-tag mappings.
5. Validate the mapping using the DTD or schema.
6. Save the style-to-tag mapping as a template.
7. Convert the content wrapped with style information into content wrapped with tags, according to the style-to-tag mapping.
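
A minimal sketch of how steps 1, 4 and 7 might fit together, assuming the extracted text arrives as a sequence of (style, text) runs. The names `StyledRun`, `StyleTagSketch`, and the `unmapped` fallback tag are hypothetical, and the font stack from step 1 is left out to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical container for one run of text and the style it was drawn in.
sealed class StyledRun
{
    public string Style { get; }
    public string Text { get; }
    public StyledRun(string style, string text) { Style = style; Text = text; }
}

static class StyleTagSketch
{
    // Step 1: replace XML-special and non-ASCII characters
    // with numeric character references such as &#xE9;.
    public static string ToEntities(string text)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            bool pair = char.IsSurrogatePair(text, i);
            int codePoint = char.ConvertToUtf32(text, i); // throws on a lone surrogate
            if (pair) i++; // skip the second half of the surrogate pair

            bool special = codePoint == '<' || codePoint == '>' ||
                           codePoint == '&' || codePoint == '"';
            if (special || codePoint > 127)
                sb.Append("&#x").Append(codePoint.ToString("X")).Append(';');
            else
                sb.Append((char)codePoint);
        }
        return sb.ToString();
    }

    // Steps 4 and 7: look up each run's style in the mapping
    // and wrap the escaped text in the corresponding tag.
    public static string Wrap(IEnumerable<StyledRun> runs,
                              IDictionary<string, string> styleToTag)
    {
        var sb = new StringBuilder();
        foreach (StyledRun run in runs)
        {
            string tag;
            if (!styleToTag.TryGetValue(run.Style, out tag))
                tag = "unmapped"; // hypothetical fallback for styles with no tag
            sb.Append('<').Append(tag).Append('>')
              .Append(ToEntities(run.Text))
              .Append("</").Append(tag).Append(">\n");
        }
        return sb.ToString();
    }
}
```

With a mapping of `"Times-Bold/14"` to `"title"`, a run with that style and the text `Title` comes out as `<title>Title</title>`. The dictionary is also the thing that would be saved as a template in step 6.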

These are the modules I planned. I have finished the 2nd module. Can you help me with the 3rd one? How do I list the tags?
Jan 18 '11 #8
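
One possible way to list the tags for step 3, assuming the tag set is defined in an XML Schema (XSD) file rather than a DTD: an XSD is itself XML, so its `xs:element` declarations can be read with LINQ to XML. This is only a sketch. `SchemaTagSketch` and `ListElementNames` are made-up names, elements declared only by `ref` are skipped, and a DTD would need a different approach because it is not XML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class SchemaTagSketch
{
    // The standard XML Schema namespace.
    static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Returns the distinct names of all xs:element declarations,
    // global and nested, in document order.
    public static List<string> ListElementNames(string xsdText)
    {
        return XDocument.Parse(xsdText)
            .Descendants(Xs + "element")
            .Select(e => (string)e.Attribute("name")) // null when only ref="..." is used
            .Where(name => name != null)
            .Distinct()
            .ToList();
    }
}
```

For a file on disk, the schema text could be read with `System.IO.File.ReadAllText` and passed to `ListElementNames`. The resulting list is what the styles from step 2 would be mapped onto in step 4.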
