python - Getting the size of a PDF without downloading it



Is it possible to know the size of a PDF, e.g. http://example.com/abc.pdf, using the requests module in Python without downloading it? I am writing an application where, if the net speed is slow and the PDF is big, the download should be postponed to a later time.

Use an HTTP HEAD request

The response provides details about the file in its headers (such as Content-Length) without fetching the whole file.
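
The decision the question describes can be layered on top of a HEAD request. This is a minimal sketch, not a definitive implementation: the helper names `content_length`, `remote_size` and `should_postpone`, the `bytes_per_second` estimate and the time threshold are all hypothetical. A server may also omit Content-Length (for example with chunked transfer encoding), in which case the sketch gets `None` and, by its own assumption, postpones.

```python
from typing import Any, Mapping, Optional


def content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if missing or malformed."""
    # requests already uses a case-insensitive dict; the loop also copes with
    # plain dicts whose keys use a different capitalization.
    for name, value in headers.items():
        if name.lower() == "content-length":
            try:
                return int(value)
            except ValueError:
                return None
    return None


def remote_size(session: Any, url: str) -> Optional[int]:
    """Fetch headers only; `session` is e.g. a requests.Session()."""
    # requests does not follow redirects for HEAD unless told to.
    response = session.head(url, allow_redirects=True)
    return content_length(response.headers)


def should_postpone(size: Optional[int], bytes_per_second: float,
                    max_seconds: float) -> bool:
    """Postpone when the estimated download time exceeds max_seconds."""
    if size is None or bytes_per_second <= 0:
        # Assumption: an unknown size or speed is treated as "too big for now".
        return True
    return size / bytes_per_second > max_seconds
```

With requests installed, a call might look like `should_postpone(remote_size(requests.Session(), url), 50_000, 30)`; how the connection speed is measured is left open. Passing the session in keeps the helpers free of a hard dependency on requests and easy to test with a stub.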

    >>> import requests
    >>> url = "http://www.pdf995.com/samples/pdf.pdf"
    >>> req = requests.head(url)
    >>> req.content
    b''
    >>> req.headers["Content-Length"]
    '433994'

Or use a streaming read and look at the first bytes:

    >>> req = requests.get(url, stream=True)
    >>> res = req.iter_content(30)
    >>> next(res)
    b'%PDF-1.3\n%\xc7\xec\x8f\xa2\n30 0 obj\n<</Len'
    >>> next(res)
    b'gth 31 0 R/Filter /FlateDecode'
    >>> next(res)
    b'>>\nstream\nx\x9c\xed}\xdd\x93%\xb7m\xef\xfb\xfc\x15s\xf7%nu\xf6\xb8'

You may be able to work out the PDF size from these initial bytes and then decide whether to go on or not.
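
One case where the first bytes do reveal the size is a linearized ("fast web view") PDF: its first object is a linearization dictionary whose `/L` entry is the length of the whole file, and the PDF specification requires that dictionary to lie within the first 1024 bytes. The sample file above is not linearized (its first object is an ordinary stream), so for it this sketch returns `None`. The helper names and the 1024-byte read are assumptions, and the regular expressions are a shortcut, not a real PDF parser.

```python
import re
from typing import Any, Optional

# First object of a linearized PDF: << /Linearized 1 /L <file length> ... >>
_LINEARIZED_DICT = re.compile(rb"/Linearized\b(.*?)>>", re.DOTALL)
# "/L" followed by whitespace and digits; "/Length" does not match because
# a letter, not whitespace, follows the "L".
_FILE_LENGTH = re.compile(rb"/L\s+(\d+)")


def linearized_file_length(head: bytes) -> Optional[int]:
    """Return the /L value of a linearization dictionary, if present."""
    match = _LINEARIZED_DICT.search(head)
    if match is None:
        return None  # not linearized, or the dictionary is not in `head`
    length = _FILE_LENGTH.search(match.group(1))
    return int(length.group(1)) if length else None


def first_bytes(session: Any, url: str, n: int = 1024) -> bytes:
    """Read at most n body bytes; `session` is e.g. a requests.Session()."""
    buf = b""
    with session.get(url, stream=True) as response:
        for chunk in response.iter_content(chunk_size=256):
            buf += chunk
            if len(buf) >= n:
                break  # leaving the with-block closes the unread response
    return buf[:n]
```

Usage would be `linearized_file_length(first_bytes(requests.Session(), url))`; a `None` result only means this shortcut found no size, not that the file is small.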

Use a Range request header

HTTP allows asking for the retrieval of a range of bytes.
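
A Range request names an inclusive byte span. A server that honours it answers 206 Partial Content with a Content-Range header such as `bytes 0-0/433994`, where the number after the slash is the complete length of the file, so even a one-byte request (`bytes=0-0`) reveals the size. A 416 Range Not Satisfiable reply may carry `bytes */433994` instead. The helpers below are a minimal, hypothetical sketch of building the header and reading that total; they assume the server reports a concrete length rather than `*`.

```python
import re
from typing import Dict, Optional

# Matches "bytes 0-0/433994", "bytes 0-99/*" and "bytes */433994".
_CONTENT_RANGE = re.compile(r"bytes\s+(?:\d+-\d+|\*)/(\d+|\*)")


def range_header(first: int, last: int) -> Dict[str, str]:
    """Headers asking for bytes first..last, both ends inclusive."""
    return {"Range": f"bytes={first}-{last}"}


def total_from_content_range(value: str) -> Optional[int]:
    """Complete file length from a Content-Range value, or None if unknown."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None or match.group(1) == "*":
        return None
    return int(match.group(1))
```

With requests this would be used as `response = requests.get(url, headers=range_header(0, 0))` followed by `total_from_content_range(response.headers.get("Content-Range", ""))`.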

If the server supports that, you can use a trick: ask for a range of bytes that only exists in big files. If you get the bytes back (with a successful status, normally 206 Partial Content), you know the file is large.

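If you instead get the exception ChunkedEncodingError: IncompleteRead(0 bytes read), you know the file is smaller. A server that follows the HTTP specification will more commonly answer such a request with 416 Range Not Satisfiable.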

Call it like this:

    >>> headers = {"Range": "bytes=999500-999600"}
    >>> req = requests.get(url, headers=headers)

This works only if the server allows serving partial content.
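
The probe can be wrapped in a small function. This sketch checks the status code rather than waiting for the ChunkedEncodingError mentioned earlier: 206 means the byte at `offset` exists, so the file has more than `offset` bytes; 416 means it does not; a plain 200 means the server ignored the Range header and is sending the whole file, so the probe is inconclusive. The names `interpret_range_status` and `has_byte_at` and the session argument are hypothetical.

```python
from typing import Any, Optional


def interpret_range_status(status_code: int) -> Optional[bool]:
    """Map the status of a one-byte Range probe to 'is the file that big?'."""
    if status_code == 206:  # Partial Content: the requested byte exists
        return True
    if status_code == 416:  # Range Not Satisfiable: the file is shorter
        return False
    return None  # e.g. 200: Range ignored, so the probe tells us nothing


def has_byte_at(session: Any, url: str, offset: int) -> Optional[bool]:
    """True if the file at url is longer than `offset` bytes (0-based)."""
    headers = {"Range": f"bytes={offset}-{offset}"}
    # stream=True: do not download a full body if the server ignores Range.
    with session.get(url, headers=headers, stream=True) as response:
        return interpret_range_status(response.status_code)
```

`has_byte_at(requests.Session(), url, 999_500)` mirrors the request above while asking for a single byte.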
