python: insert newline with regular expression (sub | substitute)

Python lets you use regular expressions on multiline strings.
You can use the re.MULTILINE flag to make '^' and '$' match the beginning and end of each line, respectively.
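
As a quick illustration of the flag, here is a minimal sketch. The sample text and the variable names are made up for this example.

```python
import re

text = "first line\nsecond line\nthird line"

# Without re.MULTILINE, '^' matches only at the very start of the string.
first_words_default = re.findall(r'^\w+', text)
print(first_words_default)      # ['first']

# With re.MULTILINE, '^' also matches right after each newline.
first_words_multiline = re.findall(r'^\w+', text, flags=re.MULTILINE)
print(first_words_multiline)    # ['first', 'second', 'third']

# Likewise, '$' then matches at the end of every line, not just the last one.
terminated = re.sub(r'$', ';', text, flags=re.MULTILINE)
print(repr(terminated))         # 'first line;\nsecond line;\nthird line;'
```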

The problem: how do you insert a newline character when doing substitutions?
Usually, the replacement pattern is a raw string. Note the argument order: re.sub(pattern, replacement, string).
ex: print(re.sub('t(.)x', r'T\1X', 'mytext')) # displays 'myTeXt'
Here we use a raw string for the replacement pattern (a string prefixed with the 'r' character); otherwise we would need to double the escape character (r'T\1X' => 'T\\1X').
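One thing to know about raw strings is that Python does not interpret their content: r'\n' is two characters (a backslash followed by 'n'), not a newline. In the replacement argument of re.sub this turns out not to matter, because the re module processes backslash escapes in the replacement itself and converts that '\n' into a newline. => to insert newlines, either form works: a normal string, where Python produces the newline, or a raw string, where re does. To get a literal backslash followed by 'n' in the output, double the backslash in the raw string (r'\\n').

Let's continue with our example, and say we want to insert a newline before the captured group (the letter 'e' in our case):

  • print(re.sub('t(.)x', r'\n\1', 'mytext')) # displays 'my' <newline> 'et': re converts the two-character '\n' into a newline
  • print(re.sub('t(.)x', '\n\\1', 'mytext')) # displays 'my' <newline> 'et': Python already turned '\n' into a newline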
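
To check what a given replacement string actually produces, printing repr() of the result makes the newline visible. The following is a minimal sketch reusing the 'mytext' example; the variable names are only illustrative.

```python
import re

pattern = r't(.)x'        # matches 'tex' in 'mytext' and captures 'e'

raw_repl = r'\n\1'        # 4 characters: backslash, n, backslash, 1
plain_repl = '\n\\1'      # 3 characters: newline, backslash, 1
print(len(raw_repl), len(plain_repl))    # 4 3

# re.sub processes backslash escapes in the replacement string,
# so both forms end up inserting a real newline.
raw_result = re.sub(pattern, raw_repl, 'mytext')
plain_result = re.sub(pattern, plain_repl, 'mytext')
print(repr(raw_result))      # 'my\net'
print(repr(plain_result))    # 'my\net'

# Doubling the backslash in the raw string keeps a literal backslash + 'n'.
literal_result = re.sub(pattern, r'\\n\1', 'mytext')
print(repr(literal_result))  # 'my\\net'
```

If the replacement must be taken completely literally, passing a function as the replacement (for example lambda m: ...) bypasses escape processing altogether.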

[sources]

  • http://docs.python.org/lib/module-re.html
  • http://www.amk.ca/python/howto/regex/regex.html read part 2.2 on raw strings
