python - Getting size of pdf without downloading

Is it possible to find out the size of a PDF, e.g. http://example.com/abc.pdf, using the requests module in Python without actually downloading it? I am writing an application where, if the net speed is slow and the PDF is big, the download should be postponed to a later time.

Use an HTTP HEAD request

The response headers give more details about the file to download, without fetching the whole file.

>>> import requests
>>> url = "http://www.pdf995.com/samples/pdf.pdf"
>>> req = requests.head(url)
>>> req.content
''
>>> req.headers["content-length"]
'433994'
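If you want the HEAD check as a reusable function, a minimal sketch could look like the one below. The names `remote_size` and `should_postpone` are made up for illustration, and the code assumes the server actually sends a numeric Content-Length (not every server does, so a missing value is reported as None rather than guessed).

```python
def remote_size(session, url, timeout=10):
    """Return the size in bytes reported by a HEAD request, or None if unknown.

    `session` is expected to be a requests.Session (hypothetical helper,
    not part of the requests library itself).
    """
    # requests does not follow redirects on HEAD by default, so ask for it.
    response = session.head(url, allow_redirects=True, timeout=timeout)
    response.raise_for_status()
    value = response.headers.get("Content-Length")
    if value is None or not value.strip().isdigit():
        return None  # header missing or not a plain number
    return int(value)


def should_postpone(size, max_bytes):
    """Postpone only when the size is known and exceeds the limit."""
    return size is not None and size > max_bytes


# Usage (needs network access):
#   import requests
#   size = remote_size(requests.Session(), "http://www.pdf995.com/samples/pdf.pdf")
#   if should_postpone(size, max_bytes=5 * 1024 * 1024):
#       ...  # schedule the download for later
```

Treating an unknown size as "do not postpone" is just one possible policy; you may prefer the opposite on a slow connection.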

Or try a streaming read

>>> req = requests.get(url, stream=True)
>>> res = req.iter_content(30)
>>> res
<generator object generate at 0x7f9ad3270320>
>>> next(res)
'%PDF-1.3\n%\xc7\xec\x8f\xa2\n30 0 obj\n<</Len'
>>> next(res)
'gth 31 0 R/Filter /FlateDecode'
>>> next(res)
'>>\nstream\nx\x9c\xed}\xdd\x93%\xb7m\xef\xfb\xfc\x15s\xf7%nu\xf6\xb8'

If you are able to work out the size of the PDF from the initial bytes of the file, you can then decide whether to go on or not.
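A variation on the streaming idea, sketched under some assumptions: with stream=True the headers arrive before the body, so you can look at Content-Length first and, if it is missing, count the bytes as they come in and give up once a limit is crossed. This is a swapped-in technique, not a way of decoding the size from the PDF bytes themselves. The function name `download_if_small` and the `max_bytes` parameter are hypothetical.

```python
def download_if_small(session, url, max_bytes, chunk_size=8192, timeout=10):
    """Return the body as bytes, or None if it is larger than max_bytes.

    `session` is expected to be a requests.Session (illustrative helper).
    """
    # stream=True defers the body; leaving the with block closes the connection.
    with session.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        declared = response.headers.get("Content-Length")
        if declared is not None and declared.strip().isdigit():
            if int(declared) > max_bytes:
                return None  # too big according to the header, body not read

        # No usable header (or it looked small enough): read with a hard cap.
        chunks = []
        received = 0
        for chunk in response.iter_content(chunk_size):
            received += len(chunk)
            if received > max_bytes:
                return None  # turned out bigger than allowed, stop here
            chunks.append(chunk)
        return b"".join(chunks)


# Usage (needs network access):
#   import requests
#   data = download_if_small(requests.Session(), url, max_bytes=1000000)
#   if data is None:
#       ...  # postpone the download
```

Note that a download abandoned halfway has still used some bandwidth, so the cap is a safety net rather than a free size check.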

Use the Range request header

HTTP allows asking for retrieval of just a range of bytes.

If the server supports that, you can use a trick: ask for a range of bytes that is only available in big files. If you get some bytes back (and the status is OK), you know the file is large.

If you get the exception ChunkedEncodingError: IncompleteRead(0 bytes read), you know the file is smaller.

Call it like this:

>>> headers = {"Range": "bytes=999500-999600"}
>>> req = requests.get(url, headers=headers)

This works only if the server allows serving partial content.
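The range trick can be wrapped up roughly as follows. This sketch relies on status codes instead of the exception mentioned above: a server that honours ranges is expected to answer 206 (Partial Content) when the requested byte exists and 416 (Range Not Satisfiable) when it lies past the end of the file. The name `probe_larger_than` is made up, and any other status is treated as "cannot tell".

```python
def probe_larger_than(session, url, threshold, timeout=10):
    """Return True if the file has more than `threshold` bytes, False if it
    does not, or None if the server gives no usable answer.

    `session` is expected to be a requests.Session (illustrative helper).
    """
    # Ask for the single byte at offset `threshold` (offsets start at 0).
    headers = {"Range": f"bytes={threshold}-{threshold}"}
    # stream=True so a server that ignores Range does not send us everything.
    with session.get(url, headers=headers, stream=True, timeout=timeout) as response:
        if response.status_code == 206:
            return True   # that byte exists, so the file is larger
        if response.status_code == 416:
            return False  # requested range is past the end of the file
        return None       # e.g. 200: Range was ignored, size unknown


# Usage (needs network access):
#   import requests
#   big = probe_larger_than(requests.Session(), url, threshold=999500)
```

When the answer is 206, the Content-Range response header usually carries the total size as well, which may save you from guessing thresholds.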

python http-headers request
