HTTP Request, character encoding and fsockopen

Hi guys,

This is a weird problem, and I'm not sure I've understood it correctly.

Here is a practical example that describes it:

I'm connecting to google.com on port 80 using fsockopen, and I send a
regular GET request without any additional HTTP headers (accepted
encoding, cookies, accepted charset, conditional headers, etc.).

After sending the request to the stream opened with fsockopen, I start
reading the response headers, and then comes the body of the web page.
Everything seems logical up to this point.

The problem is that, right after the headers, the body of the page
starts with a few odd alphanumeric characters, up to about 4 characters
long, that look like a hex value, e.g. 2A, or maybe two values: 8c9d.
Then comes the regular HTML code of the page, if any.

At the end of the grabbed content, there's also one of these
alphanumeric groups, or a "0" (zero).

My guess is that the characters right after the headers are used by
browsers to identify the encoding of the stream, e.g. bytes that
indicate that the page is going to arrive as UTF-8?

Anyway, the question is how to make sure I get the page right, and why
file_get_contents(url_goes_here) doesn't return those alphanumeric
characters, considering it already strips the response headers.

I still think it's some sort of "first byte of the stream" that informs
the application about the encoding of the content, but I'm here to hear
your input and a solution.

Thank you,

Vladimir Ghetau

http://www.Vladimirated.com/
Jan 19 '08 #1
Hi,

You could try using HTTP/1.0 or simply leaving off the HTTP version.
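
A minimal sketch of the HTTP/1.0 approach, assuming a plain GET over port 80. The helper names build_get_request and fetch_raw are made up for illustration and are not part of PHP. A server should not use chunked transfer coding when the request line says HTTP/1.0, so the body should arrive without the hex size lines, although a misbehaving server could still send them.

```php
<?php
// Build a bare GET request. Declaring HTTP/1.0 tells the server that the
// client does not understand chunked transfer coding.
function build_get_request(string $host, string $path = '/', string $version = '1.0'): string
{
    return "GET {$path} HTTP/{$version}\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Send the request over a socket and return [rawHeaders, body].
// Hypothetical helper: error handling is kept to the bare minimum.
function fetch_raw(string $host, string $path = '/', int $port = 80): array
{
    $fp = fsockopen($host, $port, $errno, $errstr, 10);
    if ($fp === false) {
        throw new RuntimeException("Connection failed: {$errstr} ({$errno})");
    }
    fwrite($fp, build_get_request($host, $path));
    $response = stream_get_contents($fp);
    fclose($fp);
    if ($response === false) {
        throw new RuntimeException('Could not read the response');
    }

    // Headers and body are separated by the first empty line.
    $parts = explode("\r\n\r\n", $response, 2);
    return [$parts[0], $parts[1] ?? ''];
}
```

Calling fetch_raw('www.google.com') would then return the header block and the body as two strings.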

HTTP/1.1 clients must be able to handle "chunked transfer coding",
which is the encoding you're seeing. Each chunk is preceded by its
size in hex, and a chunk of size 0 marks the end of the body.

Details:

http://www.w3.org/Protocols/rfc2616/....html#sec3.6.1
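
To make the format concrete, here is a rough sketch of a decoder for a chunked body that has already been read into a string. The function name decode_chunked is an assumption for this example; it expects well-formed input and ignores any trailer headers after the last chunk.

```php
<?php
// Decode a complete chunked body that is already in memory.
// Assumes well-formed input; trailers after the last chunk are ignored.
function decode_chunked(string $body): string
{
    $decoded = '';
    $offset = 0;
    $length = strlen($body);

    while ($offset < $length) {
        // The size line ends at the next CRLF.
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            break; // truncated input
        }
        // Drop optional chunk extensions (";name=value") before parsing hex.
        $sizeLine = substr($body, $offset, $lineEnd - $offset);
        $size = (int) hexdec(trim(explode(';', $sizeLine, 2)[0]));
        if ($size === 0) {
            break; // a zero-size chunk marks the end of the body
        }
        $decoded .= substr($body, $lineEnd + 2, $size);
        // Skip the chunk data and the CRLF that follows it.
        $offset = $lineEnd + 2 + $size + 2;
    }
    return $decoded;
}

// Two chunks of 5 and 7 bytes, then the terminating zero-size chunk.
$wire = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
echo decode_chunked($wire), "\n"; // Hello, world
```

The "5", "7" and "0" lines in $wire play the same role as the 2A, 8c9d and trailing "0" values described in the question.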

Peace,
John Peters

Jan 19 '08 #2
Hello,

Those are chunked transfer encoding blocks. You need to decode and
reassemble them. They let the client know when the server response has
ended for responses whose length is not known in advance, for instance
pages generated dynamically with PHP.
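
As an illustration of decoding the blocks while reading from the connection, here is a small sketch that reads one chunk at a time and stops at the zero-size chunk. The function name read_chunked_body is hypothetical and this is not the code from the class linked below. It assumes the stream is positioned just after the blank line that ends the response headers.

```php
<?php
// Read a chunked body from an open stream, stopping at the zero-size chunk.
// $fp is assumed to be positioned just after the blank line ending the headers.
function read_chunked_body($fp): string
{
    $body = '';
    while (($line = fgets($fp)) !== false) {
        // Size line: hex digits, optionally followed by ";extension".
        $size = (int) hexdec(trim(explode(';', $line, 2)[0]));
        if ($size === 0) {
            break; // last chunk: the response body is complete
        }
        // fread may return fewer bytes than requested, so loop until done.
        $remaining = $size;
        while ($remaining > 0 && !feof($fp)) {
            $data = fread($fp, $remaining);
            if ($data === false || $data === '') {
                break 2; // connection closed early
            }
            $body .= $data;
            $remaining -= strlen($data);
        }
        fgets($fp); // consume the CRLF that terminates the chunk data
    }
    return $body;
}

// Demo on an in-memory stream instead of a live socket.
$fp = fopen('php://memory', 'r+');
fwrite($fp, "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n");
rewind($fp);
echo read_chunked_body($fp), "\n"; // Wikipedia
fclose($fp);
```

With a real connection, the same function could be called on the handle returned by fsockopen once the header lines have been read.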

You may want to take a look at this HTTP client class to learn how to
decode them:

http://www.phpclasses.org/httpclient
--

Regards,
Manuel Lemos

Jan 19 '08 #3
