Fun with Java RegEx String Replacement


In the Pachyderm presentation publishing code, there is a step that compiles the various versions of images for use in the final product: resizing as needed, wrapping each one in the .swf file and burning in the metadata. It uses an XML format, provided by JSwiff, to replace the freeze-dried content of a templated Flash file wrapper with the dynamically defined data (image and text).

We just do a simple find-and-replace, looking for special tags that we've placed in the XML version of the templated .swf - things like "{tombstoneTitleShort}" - and replacing each one with a value like "My Most Excellent Photo". Seems simple. But I just came across a case where it failed. The extended text for an image included a $ - which would be fine, but it's a Magic Regex Character. In a pattern it symbolizes the end of a line of text (likewise, ^ symbolizes the beginning of a line); in the replacement string handed to String.replaceAll(), which is where my image text ends up, it introduces a reference to a captured group. It was unescaped and not followed by a valid group number, so the String.replaceAll() method was barfing appropriately. I think this is what was happening... Looked like it from the debug output, anyway...
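Here's a minimal sketch of the failure, not the actual Pachyderm code. The class name, the helper method and the sample strings are all made up for illustration, and I'm assuming the tag gets quoted with Pattern.quote() so that its braces are matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; none of these names come from Pachyderm.
class DollarInReplacement {
    // Quote the tag so "{" and "}" are treated as literal characters.
    static final String TAG = Pattern.quote("{tombstoneTitleShort}");

    /** Returns true if replaceAll() rejects the value as a replacement string. */
    static boolean failsAsReplacement(String template, String value) {
        try {
            template.replaceAll(TAG, value);
            return false;
        } catch (IllegalArgumentException | IndexOutOfBoundsException e) {
            // "$" followed by a non-digit gives "Illegal group reference";
            // "$5" with no fifth capture group gives "No group 5".
            return true;
        }
    }

    public static void main(String[] args) {
        String template = "<text>{tombstoneTitleShort}</text>";
        System.out.println(failsAsReplacement(template, "My Most Excellent Photo")); // false
        System.out.println(failsAsReplacement(template, "Print, $5 each"));          // true
    }
}
```

The pattern is perfectly happy in both cases. It's only the replacement value that trips things up.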

I just switched that bit of string replacement to use the Apache Commons Lang StringUtils replacement method (org.apache.commons.lang.StringUtils.replace(String, String, String)), which does a plain literal replacement with no regex involved, and all appears fine now.
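To show what "literal replacement" buys you, here's a small sketch. To keep it dependency-free it uses the JDK's own String.replace(CharSequence, CharSequence) as a stand-in for StringUtils.replace(); both treat the search text and the replacement as plain text. The class name, the fill() helper and the "{title}" tag are hypothetical.

```java
// Hypothetical demo class; String.replace() stands in for StringUtils.replace().
class LiteralReplace {
    /** Replaces every occurrence of tag with value, with no regex interpretation. */
    static String fill(String template, String tag, String value) {
        return template.replace(tag, value);
    }

    public static void main(String[] args) {
        // Neither the braces in the tag nor the "$" in the value need escaping.
        System.out.println(fill("<t>{title}</t>", "{title}", "Print, $5 each"));
        // <t>Print, $5 each</t>
    }
}
```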

As an aside, there are a lot of handy little goodnesses in the Apache Commons Lang library (as in the other Commons libraries). I need to make sure I'm taking advantage of it more, rather than writing my own utility code...

Update: Nope. That didn't work as cleanly as I'd hoped. Now it was complaining that some of the replaced characters were invalid UTF-8. So, I'm now replacing characters in my replacement string to attempt to escape them properly before feeding them to String.replaceAll( pattern, value ), using the "oldReplace" method from this tip.
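For the record, here's roughly what that kind of escaping looks like. This is a sketch of the general idea and not the "oldReplace" code from the tip; the class and method names are mine. The only two characters that are special in a replaceAll() replacement string are the backslash and the dollar sign, and the JDK's Matcher.quoteReplacement() escapes exactly those, so the hand-rolled version is only there to show the mechanics.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class illustrating replacement-string escaping.
class EscapedReplaceAll {
    /** Hand-rolled escaping: backslashes first, then dollar signs. */
    static String escapeReplacement(String value) {
        return value.replace("\\", "\\\\").replace("$", "\\$");
    }

    /** Regex-based replace that treats both the tag and the value as literal text. */
    static String fill(String template, String tag, String value) {
        return template.replaceAll(Pattern.quote(tag), Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        System.out.println(escapeReplacement("Print, $5 each"));
        // Print, \$5 each
        System.out.println(fill("<t>{title}</t>", "{title}", "Print, $5 each"));
        // <t>Print, $5 each</t>
    }
}
```

The order matters in the hand-rolled version: escaping the dollar signs first would add backslashes that the second pass would then double up.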

