Community for Developers & IT Professionals
Bytes IT Community

I used read() and readlines(), but the format is not the same

moroccanplaya
P: 80
Is there a way to keep the formatting of HTML source code? For example, when you view the source code of a website, is there any way to keep the same structure? I have tried using read() and readlines(), and I also tried saving the source code and then opening it in my program, but I still failed to keep the same format. Is there any way around this problem?
Feb 15 '12 #1
9 Replies


bvdet
Expert Mod 2.5K+
P: 2,509
HTML was designed for data display with a focus on how it looks. It won't display the same in a GUI widget as it does in a browser. A browser, such as Google Chrome, uses the HTML tags to interpret page content. Is that what you are referring to? Maybe you could parse the source code with BeautifulSoup or lxml.html and display the content only.
Feb 16 '12 #2
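bvdet's suggestion of displaying only the content can be sketched with the standard library's html.parser module instead of BeautifulSoup or lxml.html, so nothing extra has to be installed. This is a minimal sketch, not a renderer: it drops the tags and the contents of script and style elements and keeps the remaining text. The names TextExtractor and page_text are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []        # non-empty text chunks, in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Return the text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample = ("<!doctype html><html><head><title>Demo</title>"
          "<script>var x = 1;</script></head>"
          "<body><p>Hello</p><p>world</p></body></html>")
print(page_text(sample))  # prints Demo, Hello and world on separate lines
```

The string passed to page_text() would be the decoded page source; the tag structure itself is discarded, which is the point of showing "the content only".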

moroccanplaya
P: 80
For example, this is the first line of the google.com source code, and it is shown on one line:

<!doctype html><html itemscope itemtype="http://schema.org/WebPage"><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><meta

I want this to be displayed all on one line, unlike the two lines shown in the example above.

Hope that makes sense.
Feb 17 '12 #3

Smygis
100+
P: 121
As far as I can tell, you don't actually have an issue.

And if you do have an issue, I cannot reproduce it. I don't believe it has anything to do with Python; I think it's rather a character-width limit in whatever you use to view the output. In Python the line is correctly stored as one line when I use readlines().
Feb 17 '12 #4
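Smygis's point can be checked without a network connection. In this sketch io.BytesIO stands in for the binary file-like object that urlopen() returns (an assumption made so the example runs offline), and line_lengths() is a hypothetical helper. readlines() only splits where the data contains a newline byte, so a long minified line stays a single element no matter how a viewer wraps it on screen.

```python
import io


def line_lengths(stream):
    """Length of each line readlines() returns, not counting the newline itself."""
    return [len(line.rstrip(b"\r\n")) for line in stream.readlines()]


# io.BytesIO stands in for the binary response object that urlopen() returns.
long_line = b"<head>" * 200   # 1200 bytes and no newline, like minified HTML
page = io.BytesIO(long_line + b"\n<body></body>\n")
print(line_lengths(page))     # [1200, 13]
```

How such a line looks is up to the viewer: in a Tkinter Text widget, wrap="none" shows it unbroken with horizontal scrolling, while wrap="char" or wrap="word" folds it at the widget's edge.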

moroccanplaya
P: 80
Are you outputting the source code in the tkinter Text widget?
Feb 17 '12 #5

moroccanplaya
P: 80
Plus, readline() outputs the whole source code on one line, which I do not want.
Feb 17 '12 #6
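The three read methods return different things for the same data, which may explain the "one line" readline() output: readline() stops at the first newline byte, so if a site sends most of its markup before its first newline (as the google.com line quoted earlier suggests), that single line is most of the page. A minimal sketch, again using io.BytesIO in place of the urlopen() response; compare_reads() and the sample bytes are hypothetical.

```python
import io


def compare_reads(raw):
    """Return what read(), readline() and readlines() give for the same bytes."""
    whole = io.BytesIO(raw).read()        # every byte, as one bytes object
    first = io.BytesIO(raw).readline()    # bytes up to and including the first b"\n"
    lines = io.BytesIO(raw).readlines()   # a list of all lines, newlines kept
    return whole, first, lines


# Hypothetical page whose first line holds most of the markup.
raw = b"<html><head><title>t</title></head><body>\n<p>x</p>\n</body></html>\n"
whole, first, lines = compare_reads(raw)
print(len(lines))
print(first)
```

Joining the list from readlines() gives back exactly what read() returned, so neither call changes the line structure; only readline() stops early.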

dwblas
Expert 100+
P: 335
You will have to post some code for a better answer; incomplete questions get incomplete answers. This works for me (Python 2):
import urllib2
fp = urllib2.urlopen('http://www.python.org/')

for rec in fp:
    print rec
    print "-"*60
Feb 17 '12 #7
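Since the question uses Python 3.2, where urllib2 became urllib.request and print is a function, here is a sketch of the same loop for Python 3. Iterating over the response yields bytes, so each line is decoded first; UTF-8 is assumed here rather than read from the response headers. Passing end="" avoids the doubled newline that the Python 2 print statement produces, since each line already ends with one. The function name print_lines and its out parameter are hypothetical.

```python
import io


def print_lines(response, encoding="utf-8", out=None):
    """Print each line of a binary response, then a separator line.

    Each bytes line is decoded first, and end="" is passed because the line
    already ends with a newline. out=None makes print() write to sys.stdout.
    """
    for rec in response:
        print(rec.decode(encoding, errors="replace"), end="", file=out)
        print("-" * 60, file=out)


# With a live page (needs network access):
#     import urllib.request
#     with urllib.request.urlopen("http://www.python.org/") as fp:
#         print_lines(fp)
print_lines(io.BytesIO(b"<html>\n<body></body>\n</html>\n"))
```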

moroccanplaya
P: 80
To bvdet: I have downloaded the lxml parser. How would you parse the URL?

To dwblas: I have trouble explaining this, so here is my full code. If you run it and type in any URL, it gets the source code and displays it in a text box. When you compare the source code on the website with the source code copied into the text box, the format is different; the only exception is python.org.

#!/usr/bin/env python
from tkinter import *
from tkinter import ttk
import tkinter.messagebox
import urllib.request

storecode = ""  # source code of the most recently fetched page

def geturl(*args):  # accept an argument so the <Return> binding can call it
    global storecode
    path = urlname.get()
    if path == "":
        tkinter.messagebox.showinfo("error", "please enter a url")
        return
    if not path.startswith(("http://", "https://")):
        path = "http://" + path
    with urllib.request.urlopen(path) as url:
        # read() returns bytes in Python 3; decode them to text first
        charset = url.info().get_content_charset() or "utf-8"
        storecode = url.read().decode(charset, errors="replace")
    string2 = "source code copied from : " + path
    tkinter.messagebox.showinfo("copied", string2)
    textbox.delete(1.0, END)  # clear whatever is currently in the text box
    textbox.insert(END, storecode)

# count the white spaces in the source code
def count_white_space():
    global storecode
    path = urlname.get()
    if path == "":
        tkinter.messagebox.showinfo("error", "please enter a url")
        return
    if not path.startswith(("http://", "https://")):
        path = "http://" + path
    # fetch the page whether or not the user typed the scheme
    with urllib.request.urlopen(path) as url:
        charset = url.info().get_content_charset() or "utf-8"
        storecode = url.read().decode(charset, errors="replace")
    # count spaces in the decoded text; str() of a list of bytes would also
    # count the separators Python puts between the list items
    whitespace = storecode.count(" ")
    string1 = "There are " + str(whitespace) + " white spaces in: " + path
    tkinter.messagebox.showinfo("whitespace", string1)

app = Tk()
app.title(" text editor")

content = ttk.Frame(app, padding=(3, 3, 12, 12))
content.grid(column=0, row=0, sticky=(N, S, E, W))

# create label
labeltext = StringVar()
labeltext.set("enter url:")
ttk.Label(content, textvariable=labeltext).grid(column=0, row=1, columnspan=1, rowspan=1, sticky=(N, W), padx=5)

# create the URL entry box; the text typed into it is stored in urlname
urlname = StringVar()
urlname_entry = ttk.Entry(content, textvariable=urlname, width=67)
urlname_entry.grid(column=1, row=1, columnspan=3, rowspan=5, sticky=(N, W))
# focus the entry box so the user does not have to click on it
urlname_entry.focus()

# create buttons
button1 = ttk.Button(content, text="get source", command=geturl)
button2 = ttk.Button(content, text="count white spaces", command=count_white_space)
button1.grid(column=3, row=0, columnspan=1, rowspan=2, sticky=(N, W))
button2.grid(column=4, row=0, columnspan=2, rowspan=2, sticky=(N, W))

# text widget with vertical and horizontal scrollbars; wrap=NONE keeps each
# source line on one screen line instead of folding it at the widget edge
scroll = tkinter.Scrollbar(content, borderwidth=2)
scrollh = tkinter.Scrollbar(content, borderwidth=2, orient=HORIZONTAL)
textbox = tkinter.Text(content, wrap=NONE, width=50, height=20,
                       xscrollcommand=scrollh.set, yscrollcommand=scroll.set)
scrollh.config(command=textbox.xview)
scroll.config(command=textbox.yview)

textbox.grid(row=2, column=1, columnspan=1, rowspan=3, sticky=(N))
scroll.grid(row=2, column=2, sticky='ns', rowspan=3)
scrollh.grid(row=6, rowspan=1, column=1, sticky='ew')

app.columnconfigure(0, weight=1)
app.rowconfigure(0, weight=1)
content.columnconfigure(0, weight=3)
content.columnconfigure(1, weight=3)
content.columnconfigure(2, weight=3)
content.columnconfigure(3, weight=1)
content.columnconfigure(4, weight=1)
content.rowconfigure(1, weight=1)

# add spacing between widgets
for child in app.winfo_children():
    child.grid_configure(padx=5, pady=5)
for child in content.winfo_children():
    child.grid_configure(padx=5, pady=5)

app.bind('<Return>', geturl)  # pressing Enter also fetches the source

app.mainloop()

By the way, I'm using Python 3.2.
Feb 17 '12 #8
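One plausible cause for "the only exception is python.org", offered as an assumption rather than a confirmed diagnosis: some sites, Google among them, can serve different markup depending on the User-Agent header, and urllib identifies itself as Python-urllib/<version> by default, so the page it receives need not match what the browser's View Source shows. The sketch below sends a browser-like User-Agent through urllib.request.Request and decodes the bytes from read() with the charset the server reports. BROWSER_UA and the helper names are hypothetical, and the User-Agent string is only an example.

```python
import urllib.request

# Hypothetical browser-like User-Agent; the exact string is only an example.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def normalize_url(path):
    """Prefix http:// only when no scheme was typed, so https:// URLs survive."""
    if not path.startswith(("http://", "https://")):
        path = "http://" + path
    return path


def build_request(path, user_agent=BROWSER_UA):
    """Request that sends the given User-Agent instead of urllib's default."""
    return urllib.request.Request(normalize_url(path),
                                  headers={"User-Agent": user_agent})


def decode_body(raw, charset=None):
    """Decode the bytes from read(), falling back to UTF-8 if no charset is known."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_source(path):
    """Download a page and return its source as str (needs network access)."""
    with urllib.request.urlopen(build_request(path)) as response:
        charset = response.info().get_content_charset()
        return decode_body(response.read(), charset)


# Usage (network required): print(fetch_source("www.python.org")[:200])
```

Comparing fetch_source() output with and without the user_agent argument against the browser's View Source would show whether the server is in fact varying its markup.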

bvdet
Expert Mod 2.5K+
P: 2,509
Parsing the HTML with lxml won't help you get the output you want. I tried it, and the first line was 863 characters long. I am using Python 2.7.2.
Feb 18 '12 #9

moroccanplaya
P: 80
So I'm guessing there is no way around this problem?
Feb 18 '12 #10
