WordPress plugin to clean up pasted Word content

The basic installation of WordPress 2.5.1 doesn’t enable the “clean up Word content” options of its WYSIWYG editor, TinyMCE, by default. This means that if you copy some content from Microsoft Word, it is not cleaned up when it is pasted into WordPress.

I have found that the tidying of pasted content is a very, very good thing to enable, especially for corporate users who are copying content from Word and pasting it into WordPress using Microsoft Internet Explorer. So, I have created a WordPress plugin to enable content cleanup by default. This article describes the plugin and how to install it.

The options enabled by this plugin will only work if you are pasting into WordPress using Microsoft Internet Explorer. So, if you aren’t using MSIE, then the plugin won’t be of much use to you.

The plugin enables two paste options that are disabled by default:

paste_auto_cleanup_on_paste
When enabled, this option means that all pasted content from Microsoft Word will be tidied up before it is pasted. This option is disabled by default in TinyMCE, but my plugin enables it.

The traditional way to use this option in TinyMCE is to add an extra button to the editor, offering a special “Paste from Word” option. If the user selects the “Paste from Word” option, the content is tidied first.

In my experience, corporate users (who are often not very fluent in Word, let alone web content management) will simply “copy” from Word, and “paste” into TinyMCE, ignoring any special “Paste from Word” option. So, my plugin enables Word tidying by default, to avoid the chance of users pasting Word content without using a special “paste from Word” option.

paste_convert_headers_to_strong
This option converts heading elements (h1 through h6) to strong elements on paste. This option is disabled by default in TinyMCE, but my plugin enables it.

I have seen several examples of corporate users creating content in Word, and correctly using Word’s “Heading 1” to “Heading 6” styles to mark up the headers in their content. Whilst this is good practice, and should generally be encouraged, it can in some circumstances lead to problems when the content is pasted into TinyMCE. By default, TinyMCE converts “Heading 1” into an h1 tag, “Heading 2” into an h2 tag, and so on. This is also a good thing in theory, but in practice it can lead to one person’s content (created in Word with “Heading 1” styles throughout) being pasted with very large h1 headers throughout, and another person’s content (created in Word using “Bold” and “font size” to indicate headers) being pasted with normal-sized strong headers in TinyMCE.

The ideal would be to train all users to create and structure their content using the appropriate multi-level header styles in Word. The reality is that it is sometimes better to convert all headers to strong tags, to cope with all levels of Word ability. The resulting web content may not be structurally ideal, but at least it is more consistent across several Word users. So, I have chosen to enable this option by default. If you don’t want to enable this option, simply delete the following line from the plugin:

$init['paste_convert_headers_to_strong'] = true;
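
For context, here is a minimal sketch of how a plugin like this could be put together. It is not a copy of the actual plugin file: the function name and header comment are made up for illustration, and it assumes the settings are changed through WordPress’s tiny_mce_before_init filter, which passes the TinyMCE settings array to the function.

```php
<?php
/*
Plugin Name: TinyMCE Paste Options (illustrative sketch)
Description: Enables TinyMCE's Word cleanup options by default.
*/

// Hypothetical function name; $init is the TinyMCE settings array.
function tinymce_paste_options_sketch( $init ) {
    // Tidy Word content automatically on every paste.
    $init['paste_auto_cleanup_on_paste'] = true;

    // Convert h1-h6 to strong; delete this line to keep real headings.
    $init['paste_convert_headers_to_strong'] = true;

    // A filter must return the (modified) value it was given.
    return $init;
}

// Assumption: the settings are adjusted via the tiny_mce_before_init filter.
add_filter( 'tiny_mce_before_init', 'tinymce_paste_options_sketch' );
```

Each option is a single line in the function, so removing or adding one is a one-line change.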

Other options
TinyMCE offers other paste options too, which you can add into a customised version of this plugin very easily. For example, if you turn on paste_auto_cleanup_on_paste (which my plugin does, as described above), then the following two options will also take effect, as they are enabled by default in TinyMCE:

paste_remove_spans, which removes span elements from pasted Word content
paste_remove_styles, which removes style attributes from pasted Word content

You could, for example, turn one of these options off, by adding a line such as this to my plugin:

$init['paste_remove_styles'] = false;

I prefer to leave these two options enabled. A full list of all of the paste options can be found on the TinyMCE wiki.
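
If you would rather not edit the plugin file itself, a similar change could probably be made from a small separate plugin. This is only a sketch: the plugin and function names are hypothetical, and it assumes the settings pass through the tiny_mce_before_init filter. The priority of 20 is there so that it runs after filters registered at add_filter’s default priority of 10.

```php
<?php
/*
Plugin Name: TinyMCE Keep Pasted Styles (illustrative sketch)
*/

// Hypothetical function name; $init is the TinyMCE settings array.
function keep_pasted_styles_sketch( $init ) {
    // Keep style attributes on pasted Word content.
    $init['paste_remove_styles'] = false;

    return $init;
}

// Priority 20 runs after filters added at the default priority (10).
add_filter( 'tiny_mce_before_init', 'keep_pasted_styles_sketch', 20 );
```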

If this plugin sounds useful for your own WordPress installation, then you should download v1.0 of my TinyMCE Paste Options plugin. You can use the plugin in any way, and customise it in any way. The only thing I would request is that you don’t redistribute it, but link to this page instead.

Once you have downloaded the plugin, unzip the downloaded zip file, and copy the tinymce_paste_options.php file to the /wp-content/plugins/ folder of your WordPress site. You will then need to enable the plugin via the WordPress “Manage Plugins” admin interface. Don’t forget that the Word paste options only affect MSIE!

18 thoughts on “WordPress plugin to clean up pasted Word content”

  1. Pingback: { s } » themes/plugins for 06-11-2008

  2. I don’t use IE for the many many reasons as to why any sensible and responsible blogger/computer user wouldn’t use IE, but I’m glad that there are people like you to help fix the many problems it creates for the blogger community.

    This is a good idea and I hope it helps ease the frustration of the people who use IE in conjunction with WordPress.

  3. Pingback: Remove Microsoft Word Garbage from WordPress Posts, by Default · Pressed Words

  4. I am one of those computer users who is less than savvy. I know this is what I need to do (when I cut and paste from Word it kicks my right side bar to the basement every time)…but what you are explaining is a foreign language to me. I desperately need this help, as I have much content to import from Word. Can you please translate for my pea brain? I don’t have any clue what “Tiny…whatever” is. =( Thanks so much for any help.
    Diane Heeney

  5. Pingback: Bookmarks for July 30th through August 5th → Stevey.com

  6. Hello Dave,

    Can you tell me if your plugin – WordPress plugin to clean up pasted Word content is compatible with WordPress’ MU (multi-user) version?

    Thanks!

  7. Hi spicer,

    I’ve not tried it with WordPress MU, but I would expect so, going by this FAQ on their site:

    http://mu.wordpress.org/faq/

    Specifically:

    “Do plugins work?”

    “Plugins work just like in regular WordPress, they can be activated and deactivated on a per-blog level. We have something extra called “mu-plugins” which auto-executes any PHP file in that directory, like plugins that are enabled by default. Most plugins work, but some that modify core tables or create tables of their own in the DB might have difficulties, depending on how they’re coded. Best way to find out is to test!”

  8. Hi levani,

    As mentioned in the article above, my plugin is only of use for Microsoft Internet Explorer. The person in the forum link you posted is using Firefox.

  9. Hi,

    We had some issues with IE display. I found out about this plugin today, installed it, made one “Word” copy/paste test, and it seems to do the job of cleaning the mess.

    I asked the contributor to experiment with it for a week or so, we will see…

    Tks for the work!

