Cleaning up after Microsoft

I spend a depressing amount of time cleaning up after Microsoft. Specifically, cleaning up the “helpful” HTML code generated by MS Word and/or Internet Exploder on Windows when people copy content from MS Word and paste it into a WYSIWYG editor in Internet Explorer. “Helpful” in that it tries to carry the document’s formatting along, fails so spectacularly that it boggles my mind how such a “feature” was designed, and more often than not completely borks whatever website is the unsuspecting recipient of the control-V-of-death.

I’m not going to tell people not to use MS Word. It’s what people use. Trying to get them to switch to anything else would be tilting at windmills. People use Word.

I’m not going to tell people not to use Internet Explorer. I don’t use it. Nobody I work with uses it. But people do – most often, it’s people who don’t really know what a “browser” is, or that there are options, or that IE is a dangerous beast. They use IE. Fine.

But… I just found a plugin for WordPress that should at least mitigate the damage of the Word/IE duopoly.

Here’s an example. I just worked up a simple document in Word. It’s pretty fantastic. I’m proud of it. Teacher will give me an A+, for sure. It looks like this:

[Image: the sample document in MS Word]

It’s a work of art. Now, I select the contents of that fantastic piece of literature and hit control-C to copy it. I switch over to Internet Explorer, and paste it into the Visual editor on a WordPress site. And it looks kinda like hell. The source code of the pasted content looks like this:

[Image: the borked MsoNormal markup produced by the paste]

WTF? MsoNormal? Margins? Font-size and font-family? For the love of Xenu, why do you bork my content like this? Now, most people just see the result and say “Man, does WordPress suck. I’m not going to use THAT again.” – they don’t realize that it’s Word/IE that’s borking their content, and that it would be equally borked on any web-based content management system that offers a WYSIWYG editor.

So, after activating the plugin, pasting the same content from my most awesome Word Document into the Visual editor of a WordPress site generates code like this:

[Image: the cleaned-up markup]

It’s not perfect, but it’s cleaner. Some of the formatting won’t be exactly what was in the MS Word document, but that’s probably for the better. Apparently, if I used proper styles to define Headings in my document, it would convert them to h1/h2/etc… in the pasted markup. Ahhh… much better.
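For the curious, here is roughly what that kind of cleanup involves. This is only a minimal sketch in Python, not the plugin's actual code: the function name `clean_word_html` and the exact set of rules are my own assumptions, based on the debris described above (MsoNormal classes, inline margins and fonts, Office-specific tags). A real cleaner would use a proper HTML parser rather than regular expressions.

```python
import re


def clean_word_html(html: str) -> str:
    """Strip common MS Word paste debris from an HTML fragment.

    Hypothetical helper for illustration only; regex-based, so it can
    be fooled by unusual markup.
    """
    # Remove HTML comments, including Word's conditional comments.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Remove Office namespace tags such as <o:p> and </o:p>.
    html = re.sub(r"</?\w+:\w+[^>]*>", "", html)
    # Remove class attributes whose value starts with "Mso".
    html = re.sub(
        r"\s+class\s*=\s*(\"Mso[^\"]*\"|'Mso[^']*'|Mso\w*)",
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Remove inline style attributes (margins, font-size, font-family).
    html = re.sub(
        r"\s+style\s*=\s*(\"[^\"]*\"|'[^']*')",
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Unwrap all <span> and <font> tags, keeping the text inside them.
    html = re.sub(r"</?(span|font)\b[^>]*>", "", html, flags=re.IGNORECASE)
    return html.strip()


if __name__ == "__main__":
    pasted = (
        '<p class="MsoNormal" style="margin: 0in 0in 10pt;">'
        '<span style="font-size: small; font-family: Calibri;">'
        "Hello <b>world</b><o:p></o:p></span></p>"
    )
    print(clean_word_html(pasted))
```

Note that this sketch throws away every inline style and every span, which is blunt but matches the spirit of the thing: losing Word's formatting is probably for the better.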

If you’re using WordPress with people that are using MS Word and/or Internet Explorer, get the plugin. You’ll be doing them a favour, and saving yourself some grief.

3 thoughts on “Cleaning up after Microsoft”

  1. Wow, thanks! My students like to write in Word because it has a French spellcheck built in, but then they get tag gumbo when copied across to WPMU, because our division mandates the use of IE for students (grr). Anyway, this should help a bunch, thanks!

  2. A lot of my work as a DBA has involved cleaning up Microsoft text. It disturbs me that databases will store text in that character set, because migrating it to plaintext or UTF8 sometimes requires a lot of stupid work. There are scripts to convert those weird characters they use for single-quotes and ellipses and whatnot, but they don’t always work or catch everything. Grr. Pisses me off.

    I’ve worked with people who write HTML (directly) using Word and then wonder why cutting and pasting the text breaks things. Why anyone would code anything using Word is beyond me.

  3. Came across your blog after landing on your WPMU .htaccess signup trick.

    Thanks for this nifty tip! This plugin should come in handy… I’ve always hated to clean after Microsoft (like yourself).

Comments are closed.

If you need to post a response, trackbacks are enabled and will be displayed normally.

The Trackback URL for this post is:
http://darcynorman.net/2009/04/01/cleaning-up-after-microsoft/trackback/