|
P: 80
|
Is there a way to keep the format of HTML source code? For example, when you view the source code of a website, is there any way to keep the same structure? I have tried read() and readlines(), and I have tried saving the source code and then opening it in my program, but I still failed to keep the same format. Is there any way around this problem?
| |
| Expert Mod 2.5K+
P: 2,509
|
HTML was designed for data display with a focus on how it looks. It won't display the same in a GUI widget as it does in a browser. A browser, such as Google Chrome, uses the HTML tags to interpret page content. Is that what you are referring to? Maybe you could parse the source code with BeautifulSoup or lxml.html and display the content only.
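A minimal sketch of the "display the content only" idea. It uses the standard library's html.parser rather than BeautifulSoup or lxml.html, purely so it runs without extra installs; the names TextExtractor and extract_text and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical sample page, not taken from any real site.
sample = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><p>Hello</p><p>World</p></body></html>"
)
print(extract_text(sample))
```

The string returned by extract_text could then be inserted into a text widget in place of the raw markup.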
| | |
P: 80
|
For example, this is the first line of the google.com source code, and it is shown on one line:
<!doctype html><html itemscope itemtype="http://schema.org/WebPage"><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><meta
I want this to be displayed all on one line, unlike the two lines shown in the above example.
Hope that makes sense.
| | | 100+
P: 121
|
As far as I can tell, you don't actually have an issue.
If you do have one, I cannot reproduce it. I don't believe it has anything to do with Python. I think it is a limit on line width in whatever you use to view the output, because in Python the line is correctly stored as one line when I use readlines().
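A quick way to check this without any network access, as a sketch: io.BytesIO stands in for the downloaded page, and the byte string is an invented example with one very long first line.

```python
import io

# Stand-in for a downloaded page: one very long first line, then a short one.
raw = b"<!doctype html><html>" + b"<meta>" * 100 + b"\n<body></body>\n"

lines = io.BytesIO(raw).readlines()

print(len(lines))     # how many lines Python sees
print(len(lines[0]))  # length of the first line, trailing newline included
```

However long the first line is, it stays a single list element. Any break in the middle of it is added by whatever displays the text.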
| | |
P: 80
|
Are you outputting the source code in the tkinter Text widget?
| | |
P: 80
|
Plus, readline outputs the whole source code on one line, which I do not want.
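If the single-line output comes from handing the list returned by readlines() (or its str() form) straight to the widget, which is only a guess since no code had been posted at this point, then joining and decoding the lines first keeps the line breaks. A small sketch, with io.BytesIO standing in for the HTTP response and UTF-8 assumed as the page encoding:

```python
import io

# Stand-in for the response object; the markup is an invented example.
fake_response = io.BytesIO(b"<html>\n<body>\n</body>\n</html>\n")
lines = fake_response.readlines()  # list of bytes objects, one per line

as_repr = str(lines)                       # the list's repr: one long line
as_text = b"".join(lines).decode("utf-8")  # real text, line breaks intact

print(as_repr)
print(as_text)
```

In as_repr each newline appears as the two characters backslash and n, so a text widget shows everything on one line. as_text contains real newline characters.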
| | | Expert 100+
P: 335
|
You will have to post some code for a better answer. Incomplete questions equal incomplete answers. This works for me:

    # Python 2 (urllib2 does not exist in Python 3)
    import urllib2

    fp = urllib2.urlopen('http://www.python.org/')

    for rec in fp:
        print rec
        print "-"*60
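For anyone on Python 3, where urllib2 is gone, a rough equivalent of this kind of loop might look like the sketch below. The function name print_records is made up, UTF-8 is assumed as the encoding, and io.BytesIO stands in for the object that urllib.request.urlopen() would return so the example runs offline.

```python
import io


def print_records(fp, encoding="utf-8"):
    """Print each line of a bytes file-like object, followed by a separator.

    Returns the number of lines seen.
    """
    count = 0
    for rec in fp:
        # Lines arrive as bytes in Python 3; decode before printing.
        print(rec.decode(encoding, errors="replace").rstrip("\r\n"))
        print("-" * 60)
        count += 1
    return count


# Stand-in for a real response; the markup is an invented example.
demo = io.BytesIO(b"<html>\n<body>hello</body>\n</html>\n")
n = print_records(demo)
```

With a real page, the response from urllib.request.urlopen(url) can be passed in place of demo, since it is also iterable line by line.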
| | |
P: 80
|
To bvdet: I have downloaded the lxml parser. How would you parse the URL?
To dwlabs: I have trouble explaining this, so here is my full code. If you run it and type in any URL, it will get the source code and display it in a text box. When you compare the source code from the website with the source code copied into the text box, the format is different. The only exception is python.org.

    #!/usr/bin/env python
    from tkinter import *
    from tkinter import ttk
    import tkinter.messagebox
    import os
    import urllib.request

    sourcecode = ""
    storecode = ""


    def geturl(*args):  # accept an argument so it can be bound to <Return>
        global storecode
        path = urlname.get()
        if path == "":
            tkinter.messagebox.showinfo("error", "please enter a url")
            return  # nothing to fetch
        if not path.startswith(("http://", "https://")):
            path = "http://" + path
        with urllib.request.urlopen(path) as url:
            # read() returns bytes; decode (UTF-8 assumed) so the widget gets text
            sourcecode = url.read().decode("utf-8", errors="replace")
        storecode = sourcecode
        string2 = "source code copied from : " + path
        tkinter.messagebox.showinfo("copied", string2)
        Text.delete("1.0", END)  # delete what is currently in the text box
        Text.insert(END, storecode)


    # count the spaces in the source code
    def count_white_space():
        global storecode
        path = urlname.get()
        if path == "":
            tkinter.messagebox.showinfo("error", "please enter a url")
            return
        if not path.startswith(("http://", "https://")):
            path = "http://" + path
        with urllib.request.urlopen(path) as url:
            sourcecode = url.read().decode("utf-8", errors="replace")
        storecode = sourcecode
        # count on the decoded text; str() of the readlines() list would also
        # count the spaces that separate the items in the list's repr
        whitespace = sourcecode.count(' ')
        string1 = "There are " + str(whitespace) + " white spaces in: " + path
        tkinter.messagebox.showinfo("whitespace", string1)


    app = Tk()
    app.title(" text editor")

    content = ttk.Frame(app, padding=(3, 3, 12, 12))
    content.grid(column=0, row=0, sticky=(N, S, E, W))

    # create label
    labeltext = StringVar()
    labeltext.set("enter url:")
    label1 = ttk.Label(content, textvariable=labeltext).grid(column=0, row=1, columnspan=1, rowspan=1, sticky=(N, W), padx=5)

    # create text box
    urlname = StringVar()  # text entered in the text box is stored in urlname

    urlname_entry = ttk.Entry(content, textvariable=urlname, width=67)
    urlname_entry.grid(column=1, row=1, columnspan=3, rowspan=5, sticky=(N, W))
    # focus on the text box so the user doesn't have to click on it
    urlname_entry.focus()

    # create buttons
    button1 = ttk.Button(content, text="get source", command=geturl)
    button2 = ttk.Button(content, text="count white spaces", command=count_white_space)
    button1.grid(column=3, row=0, columnspan=1, rowspan=2, sticky=(N, W))
    button2.grid(column=4, row=0, columnspan=2, rowspan=2, sticky=(N, W))

    scroll = tkinter.Scrollbar(content, borderwidth=2)
    Text = tkinter.Text(content, wrap=CHAR, width=50, height=20)
    scrollh = tkinter.Scrollbar(content, borderwidth=2, orient=HORIZONTAL)

    scrollh.config(command=Text.xview)
    Text.config(xscrollcommand=scrollh.set)

    scroll.config(command=Text.yview)
    # wrap=NONE overrides wrap=CHAR above, so long lines are not wrapped
    Text.config(yscrollcommand=scroll.set, wrap=tkinter.NONE)

    Text.grid(row=2, column=1, columnspan=1, rowspan=3, sticky=(N))
    scroll.grid(row=2, column=2, sticky='ns', rowspan=3)
    scrollh.grid(row=6, rowspan=1, column=1, sticky='ew')

    app.columnconfigure(0, weight=1)
    app.rowconfigure(0, weight=1)
    content.columnconfigure(0, weight=3)
    content.columnconfigure(1, weight=3)
    content.columnconfigure(2, weight=3)
    content.columnconfigure(3, weight=1)
    content.columnconfigure(4, weight=1)
    content.rowconfigure(1, weight=1)

    #text = Text(app, width=80,height=40, wrap='none').grid(row=2, column=2)

    # adds spacing between widgets
    for child in app.winfo_children():
        child.grid_configure(padx=5, pady=5)
    for child in content.winfo_children():
        child.grid_configure(padx=5, pady=5)

    app.bind('<Return>', geturl)  # Enter can also be hit

    app.mainloop()

By the way, I'm using Python 3.2.
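A small sketch for the URL handling in this kind of program: it prepends http:// only when the address has no scheme yet, so an address that already starts with https:// is left alone. The helper name normalize_url is hypothetical, and raising ValueError for an empty entry is just one possible choice.

```python
def normalize_url(path):
    """Return path with an http:// prefix added when no scheme is present."""
    path = path.strip()
    if not path:
        raise ValueError("please enter a url")
    if not path.startswith(("http://", "https://")):
        path = "http://" + path
    return path


print(normalize_url("www.python.org"))
print(normalize_url("https://www.python.org"))
```

A button callback could call normalize_url(urlname.get()) and show the message box when ValueError is raised.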
| | | Expert Mod 2.5K+
P: 2,509
|
Parsing the XML won't help you obtain the output you want. I tried it and the first line was 863 characters long. I am using Python 2.7.2.
| | |
P: 80
|
So I'm guessing there is no way around this problem?
Question stats - viewed: 305
- replies: 9
- date asked: Feb 15 '12
|