Python - enumerating numbers in strings
This seems like a problem that should have a straightforward answer; sadly, I'm not fluent in Python as I'm still learning, and I have not been able to find anything helpful on Google.
My goal is to enumerate the numbers in a string based on how much padding the number has. I think the best way to describe it is with an example:
0-file would be enumerated from 0-file to 9-file, but 000-file would be enumerated from 000-file to 999-file.
Ultimately I want to be able to handle [number][a-z], [a-z][number], and [a-z][number].* (so file10name.so wouldn't match), but I think I can figure that part out myself with regex later on.
So, my question boils down to this:
How do I get the length of the 'padding' in the file? How do I identify where in the string the number is, so I can replace it? How do I add the padding when I'm iterating (I'm assuming zfill, but I'm interested if there's a better method)?
Quick edit: yes, that was 'pseudo regex'. It was only meant to convey the concept, hence why it wouldn't match things like "-". The padding could be any number, not just 0, but that's alright. Both answers I've received so far are perfect; I can adapt them to my needs. I'm handling full paths, so it's great to have that there for other people to see in the future. :)
You should figure out the right specification for the files you're trying to match before coding it up. The pseudo-regexps you gave for the filenames you're trying to match ("[number][a-z] or [a-z][number]") don't include the examples you gave, such as 0-file.
However, taking your stated specification at face value, and assuming you wish to include uppercase Latin letters as well, here's a simple function that will match [number][a-z] or [a-z][number], and return the appropriate prefix, suffix, and number of numeric digits.
    import re

    def find_number_in_filename(fn):
        # Case 1: digits followed by letters, e.g. "000foo"
        m = re.match(r"(\d+)([a-zA-Z]+)$", fn)
        if m:
            prefix, suffix, num_length = "", m.group(2), len(m.group(1))
            return prefix, suffix, num_length
        # Case 2: letters followed by digits, e.g. "bar14"
        m = re.match(r"([a-zA-Z]+)(\d+)$", fn)
        if m:
            prefix, suffix, num_length = m.group(1), "", len(m.group(2))
            return prefix, suffix, num_length
        # No match: report zero digits
        return fn, "", 0

    example_fn = ("000foo", "bar14", "baz0", "file10name")
    for fn in example_fn:
        prefix, suffix, num_length = find_number_in_filename(fn)
        if num_length == 0:
            print("%s: does not match" % fn)
        else:
            print("%s -> %s[%d-digits]%s" % (fn, prefix, num_length, suffix))
            # Build every zero-padded name with the same number of digits
            all_numbered_versions = [("%s%0" + str(num_length) + "d%s") % (prefix, ii, suffix)
                                     for ii in range(0, 10**num_length)]
            print("\t%s through %s" % (all_numbered_versions[0], all_numbered_versions[-1]))

The output will be:

    000foo -> [3-digits]foo
        000foo through 999foo
    bar14 -> bar[2-digits]
        bar00 through bar99
    baz0 -> baz[1-digits]
        baz0 through baz9
    file10name: does not match
Notice that I'm using the standard printf-style string formatting to convert the numbers to 0-padded strings, e.g. %03d for 3-digit numbers with 0-padding. Using the newer str.format may be preferable for future-proofing.
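For comparison, here is a minimal sketch of the padding alternatives mentioned in this thread (printf-style, str.format, and zfill) with a width that is only known at run time. The helper name pad_number is hypothetical and only for illustration; it is not part of the answer's code.

```python
def pad_number(n, width):
    """Return n as a decimal string, left-padded with zeros to `width` digits.

    Hypothetical helper for illustration. Numbers with more digits than
    `width` are returned unpadded and untruncated.
    """
    # The nested {width} field lets the padding width be chosen at run time.
    return "{:0{width}d}".format(n, width=width)

width = 3
print(pad_number(7, width))    # str.format with a dynamic width
print(str(7).zfill(width))     # zfill pads an existing string with zeros
print("%0*d" % (width, 7))     # printf-style: '*' takes the width from the tuple
```

All three calls produce the same zero-padded string for a non-negative integer, so the choice is mostly a matter of taste.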
If your input includes full paths and filenames with extensions (e.g. /home/someone/project/foo000.txt) and you want to match based on the last piece of the path only, then use os.path.split and .splitext to do the trick.
Update: fixed missing path separator.
    import re
    import os.path

    def find_number_in_filename(path):
        # remove path and extension
        head, tail = os.path.split(path)
        head = os.path.join(head, "")  # include / or \ on end of head if it's missing
        fn, ext = os.path.splitext(tail)

        # digits followed by letters: the extension becomes part of the suffix
        m = re.match(r"(\d+)([a-zA-Z]+)$", fn)
        if m:
            prefix, suffix, num_length = head, m.group(2) + ext, len(m.group(1))
            return prefix, suffix, num_length
        # letters followed by digits: the directory becomes part of the prefix
        m = re.match(r"([a-zA-Z]+)(\d+)$", fn)
        if m:
            prefix, suffix, num_length = head + m.group(1), ext, len(m.group(2))
            return prefix, suffix, num_length
        return path, "", 0

    example_paths = ("/tmp/bar14.so", "/home/someone/0000baz.txt", "/home/someone/baz00bar.zip")
    for path in example_paths:
        prefix, suffix, num_length = find_number_in_filename(path)
        if num_length == 0:
            print("%s: does not match" % path)
        else:
            print("%s -> %s[%d-digits]%s" % (path, prefix, num_length, suffix))
            all_numbered_versions = [("%s%0" + str(num_length) + "d%s") % (prefix, ii, suffix)
                                     for ii in range(0, 10**num_length)]
            print("\t%s through %s" % (all_numbered_versions[0], all_numbered_versions[-1]))
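The question's own examples (such as 0-file) contain a "-", which the patterns above deliberately reject. If a looser rule is acceptable, one possible sketch is to locate the first run of digits with re.search and rebuild the name around it. The function name enumerate_padded is hypothetical, and "first digit run wins" is an assumption, not the asker's stated specification.

```python
import re

def enumerate_padded(name):
    """Yield every variant of `name` with its first digit run replaced by
    each value of the same width (hypothetical helper, for illustration)."""
    m = re.search(r"\d+", name)
    if m is None:
        return  # no digits at all: nothing to enumerate
    width = m.end() - m.start()          # length of the padding
    head, tail = name[:m.start()], name[m.end():]
    for i in range(10 ** width):
        # Re-insert the counter, zero-padded to the original width.
        yield "{}{:0{w}d}{}".format(head, i, tail, w=width)

names = list(enumerate_padded("00-file"))
print("%s through %s (%d names)" % (names[0], names[-1], len(names)))
```

Because this is a generator, the names are produced lazily instead of being built as one list of 10**width items. With full paths, it would probably make sense to apply it to the last path component only, since a directory name may contain digits too.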