Wednesday, November 27, 2013

How to clean away control character from a text file

I found this little example on the internet and saved it as a shorthand - it has come in handy many times. Unfortunately I have forgotten the name of the original author.

When files are sent from windows to unix, you will sometimes see that the when you open the files in a unix text editor has ^M's in them.

For example
------------------------------------------------------------------
import java.util.Hashtable; ^M
import java.util.Properties; ^Mimport java.io.IOException;
import org.xml.sax.AttributeList; ^M
import org.xml.sax.HandlerBase; ^Mimport org.xml.sax.SAXException;

/**^M
  * XMLHandler: This class parses the elements contained^M
  * within a XML message and builds a Hashtable^M

------------------------------------------------------------------
Notice that some programs are not consistent in the way they insert the line breaks so you end up with some lines that have both a carrage return and a ^M and some
lines that have a ^M and no carrage return (and so blend into one). There are two steps to clean this up.

1. replace all extraneous ^M:

:%s/^M$//g

BE SURE YOU MAKE the ^M USING "CTRL-V CTRL-M" NOT BY TYPING "CARROT M"! This expression will replace all the ^M's that have carriage returns
after them with nothing. (The dollar ties the search to the end of a line)

2. replace all ^M's that need to have carriage returns:

:%s/^M/ /g

Once again: BE SURE YOU MAKE the ^M USING "CTRL-V CTRL-M" NOT BY TYPING "CARROT M"! This expression will replace all the ^M's that didn't have
carriage returns after them with a carriage return.

Voila! Clean file. Map this to something if you do it frequently.

No comments:

Post a Comment