How can I modify the XML structure internally so that the program works?

Dear Friends,

I have a Python application that takes an XML document as input. The XML document is supplied externally, and I cannot change its structure. However, there is a problem with the arrangement of the elements in the XML. I am using xml.dom.minidom for parsing.

A simple change of position would be enough, but I have no idea how to change an element of the DOM, i.e. self.tree = MD.parse(fichero). Please advise a good way to do this.

Please refer to the problematic HTML and the normal HTML structure attached here.
N.B. We have no option to edit the source HTML, because it may also come from a CD.

Thanks
Anes

Attached Files

	working html.txt (1.9 KB, 755 views)
	problem html.txt (2.2 KB, 801 views)

Jan 15 '16 #1
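
For the DOM side of the question, a minimal sketch of moving a node with xml.dom.minidom might look like the following. The sample markup, the tag names, and the helper name move_before are assumptions made for illustration, since the attached files are not reproduced here; parseString, getElementsByTagName, insertBefore, and toxml are standard minidom calls.

```python
import xml.dom.minidom as MD

# Hypothetical input: the <span> is assumed to belong before the <h2>.
SAMPLE = "<body><h1>Title</h1><h2>Section</h2><span>note</span></body>"

def move_before(node, reference):
    """Move `node` so that it becomes the sibling just before `reference`."""
    # insertBefore() detaches a node that already has a parent, so no
    # explicit removeChild() call is needed.
    reference.parentNode.insertBefore(node, reference)

tree = MD.parseString(SAMPLE)
span = tree.getElementsByTagName("span")[0]
h2 = tree.getElementsByTagName("h2")[0]
move_before(span, h2)

# Serialize the modified tree to check the new element order.
result = tree.documentElement.toxml()
print(result)
```

The same calls should apply to a tree obtained with MD.parse(fichero); the source file itself is never modified, only the in-memory DOM.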


dwblas

626

Expert 512MB

What is the problem, and what do you want to extract? It might be easier to process this as a plain text file and split or group by the <h1>, <h2>, and <span> tags, depending on what you need. I will post some code later tonight, time permitting.

Jan 15 '16 #2

dwblas


This code should be self-explanatory. The combined records are printed, but you could also search for a string within each record or write the records to a file.


def process_group(group_in):
    # Print one combined record; skip the empty group before the first tag.
    if group_in:
        print(" ".join(group_in))

# Tags that mark the start of a new record.
starters = ("<h1", "<h2", "<span", "</body")

with open("problem_or_working_html.txt", "r") as fp_in:
    this_group = []
    for rec in fp_in:
        rec = rec.strip()
        # startswith() accepts a tuple, so no inner loop is needed.
        if rec.startswith(starters):
            process_group(this_group)
            this_group = []
        this_group.append(rec)

# Process the last group.
process_group(this_group)

Jan 15 '16 #3

amskape

Dear dwblas,
Thanks for your fantastic answer. It works fine with a few small indentation changes.


 
#!/usr/bin/env python

def process_group(group_in):
    print(" ".join(group_in))

with open("problem_html.txt", "r") as fp_in:
    starters = ["<h1", "<h2", "<span", "</body"]
    this_group = []
    for rec in fp_in:
        rec = rec.strip()
        for start_lit in starters:
            if rec.startswith(start_lit):
                process_group(this_group)
                # Reset so each group holds only its own records.
                this_group = []
        this_group.append(rec)

# Process the last group.
process_group(this_group)

However, in my current situation the result is a DOM element. A normal Python print shows it as:


[<DOM Element: body at 0xb199054c>]

So it is a NodeList of elements, and the strip() method cannot be applied to a node list. Please advise a solution for this case.

With lots of gratitude

Anes

Jan 16 '16 #4
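
A NodeList has no strip(), but each node in it can be serialized back to a string with toxml(), after which line-based grouping works on text again. A minimal sketch, assuming a small made-up document and a hypothetical helper name node_lines:

```python
import xml.dom.minidom as MD

# Hypothetical document standing in for the real input.
SAMPLE = "<html><body>\n<h1>Title</h1>\n<p> text </p>\n<h2>Sub</h2>\n</body></html>"

def node_lines(node_list):
    """Return the stripped, non-empty serialized children of each node."""
    lines = []
    for node in node_list:
        for child in node.childNodes:
            text = child.toxml().strip()  # toxml() returns a plain string
            if text:  # skip whitespace-only text nodes
                lines.append(text)
    return lines

tree = MD.parseString(SAMPLE)
body = tree.getElementsByTagName("body")  # a NodeList, like the printout above
records = node_lines(body)
print(records)
```

Each entry of records is an ordinary string, so it can be passed through startswith() and grouped in the same way as lines read from a file.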
