<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Morten Bock</title><link>http://www.mortenbock.dk/</link><description>A feed of new content on my website</description><item><author>Morten Bock</author><category>umbraco</category><category>wordpress</category><category>blog</category><category>linq2xml</category><description>
&lt;p&gt;As you may have noticed, my blog has changed a bit this week. I
finally got around to porting it to Umbraco. I did this for several
reasons, but mainly because I feel at home with Umbraco and can
tweak it to do just about anything I want. WordPress, on the other
hand, is PHP, and I just suck at that. So there you go...&lt;/p&gt;

&lt;p&gt;Anyhow, after setting up my document types in Umbraco, I needed
to figure out how to get all my old content into the new site.
WordPress can export the entire content as XML, so that part
was easy. The exported file was 3 MB, mainly because of leftover
tag data from back when I was using the Ultimate Tag Warrior
(I will miss the cool plugin names from WordPress), which spat out
a whole lot of empty tags.&lt;/p&gt;

&lt;p&gt;The exported format is basically an RSS feed, but with some
extra elements added by WordPress. One of those is an
&amp;lt;excerpt:encoded&amp;gt; element, whose prefix has no namespace
declaration at the top, so an XML parser rejects the file. I needed to
fix this before handling the file in my import routine, so I just
added the missing xmlns:excerpt declaration to the rss element:&lt;/p&gt;

&lt;pre&gt;
&amp;lt;rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:excerpt="http://purl.org/rss/1.0/modules/excerpt/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:wp="http://wordpress.org/export/1.0/"&amp;gt;
   
&lt;/pre&gt;
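
&lt;p&gt;To see why the declaration matters, here is a minimal sketch. It is
not part of the import code, and the NamespaceCheck class and its sample
strings are made up for illustration. XDocument.Parse throws an
XmlException when it meets a prefix that was never declared:&lt;/p&gt;

&lt;pre&gt;
using System.Xml;
using System.Xml.Linq;

public static class NamespaceCheck
{
    // Hypothetical sample data, not taken from the real export.
    public const string Item = "&amp;lt;item&amp;gt;&amp;lt;excerpt:encoded&amp;gt;teaser&amp;lt;/excerpt:encoded&amp;gt;&amp;lt;/item&amp;gt;";
    public const string Undeclared = "&amp;lt;rss version=\"2.0\"&amp;gt;" + Item + "&amp;lt;/rss&amp;gt;";
    public const string Declared = "&amp;lt;rss version=\"2.0\" xmlns:excerpt=\"http://purl.org/rss/1.0/modules/excerpt/\"&amp;gt;" + Item + "&amp;lt;/rss&amp;gt;";

    // Returns true when the text parses as namespace-aware XML.
    public static bool Parses(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // An undeclared prefix such as excerpt: ends up here.
            return false;
        }
    }
}
&lt;/pre&gt;

&lt;p&gt;Parses(Undeclared) should come back false and Parses(Declared)
true, which is the difference the extra xmlns:excerpt attribute makes.&lt;/p&gt;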

&lt;p&gt;Sweet, now the XML is all nice and tidy and ready to be
imported. So, how to do the import? Well, I decided to do it
through the Umbraco API using a dashboard usercontrol. To get the
content out of the XML file, I chose Linq2Xml (LINQ to XML), which is
pretty neat for navigating through the document.&lt;/p&gt;

&lt;p&gt;The first thing I did was disable Lucene's file locking on the
index directory, because it made my import fail due to the number of
operations done. I also set the script timeout fairly high (300
seconds) just to be sure:&lt;/p&gt;

&lt;pre&gt;
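// Let the request run for up to 300 seconds.
Server.ScriptTimeout = 300;
// Turn off Lucene's lock files for file-system index directories.
Lucene.Net.Store.FSDirectory.SetDisableLocks(true);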
&lt;/pre&gt;

&lt;p&gt;Now, to load the XML file. Pretty easy. I later added the
option of pasting the XML into a textarea instead, hence the
commented-out line:&lt;/p&gt;

&lt;pre&gt;
// Load the export from disk (or parse it from the textarea instead).
XDocument loaded = XDocument.Load(Server.MapPath("~/usercontrols/wordpress.2009-08-01.xml"));
//XDocument loaded = XDocument.Parse(wpxmltextbox.Text);

// Namespaces declared on the rss element, needed to address prefixed elements.
XNamespace wpns = XNamespace.Get("http://wordpress.org/export/1.0/");
XNamespace contentns = XNamespace.Get("http://purl.org/rss/1.0/modules/content/");

// Keep only the items whose wp:post_type is "post".
var q = from c in loaded.Descendants("item")
        where (string)c.Element(wpns + "post_type") == "post"
        select c;
&lt;/pre&gt;
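
&lt;p&gt;The query can be tried in isolation with a small sketch like this
one. The PostFilter class and the two sample items are hypothetical;
only the namespace URI and the element names come from the code above.
Note that the namespace has to be part of the name: Element("post_type")
without it would return null, because the element lives in the wp
namespace.&lt;/p&gt;

&lt;pre&gt;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PostFilter
{
    static readonly XNamespace Wp = "http://wordpress.org/export/1.0/";

    // Tiny stand-in for the export: one post and one page (made-up data).
    public static XDocument Sample()
    {
        return new XDocument(
            new XElement("rss",
                new XElement("channel",
                    new XElement("item",
                        new XElement("title", "Hello"),
                        new XElement(Wp + "post_type", "post")),
                    new XElement("item",
                        new XElement("title", "About"),
                        new XElement(Wp + "post_type", "page")))));
    }

    // Same filter as the query above, returning the titles of the matching items.
    public static List&amp;lt;string&amp;gt; PostTitles(XDocument doc)
    {
        return doc.Descendants("item")
                  .Where(c =&amp;gt; (string)c.Element(Wp + "post_type") == "post")
                  .Select(c =&amp;gt; (string)c.Element("title"))
                  .ToList();
    }
}
&lt;/pre&gt;

&lt;p&gt;PostFilter.PostTitles(PostFilter.Sample()) should contain only
"Hello", since the second item is a page.&lt;/p&gt;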

&lt;p&gt;So now I have all my blog posts in the variable "q". Time to feed
them into Umbraco. The code is not too nicely structured, but it does the
job, and it's a one-time deal, so no need to go crazy here.&lt;/p&gt;

&lt;pre&gt;
// Document type for the posts, and the user (id 0) that will own the imported nodes.
DocumentType dt = DocumentType.GetByAlias("BlogPost");
User author = User.GetUser(0);

foreach (XElement item in q)
{
    string posttitle = (string)item.Element("title");
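    // The string to strip is missing from this listing, and an empty oldValue makes Replace throw.
    // legacysiteroot is a hypothetical stand-in for the old site's root URL.
    string legacysiteroot = "http://www.mortenbock.dk";
    string legacyurl = ((string)item.Element("link")).Replace(legacysiteroot, string.Empty);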
    string legacyid = (string)item.Element(wpns + "post_id");
    string posturlnodename = Server.UrlDecode((string)item.Element(wpns + "post_name"));
    string postbody = (string)item.Element(contentns + "encoded");
    string posttags = string.Empty;
    DateTime createdate = DateTime.Parse((string)item.Element(wpns + "post_date"));

    int i = 0;
    foreach (XElement tag in item.Elements("category"))
    {
        if ((string)tag.Attribute("domain") == "tag" &amp;amp;&amp;amp; !string.IsNullOrEmpty((string)tag.Attribute("nicename")))
        {
            if (i &amp;gt; 0)
            {
                posttags += ",";
            }
            posttags += (string)tag.Attribute("nicename");
            i++;
        }
    }

    // 1049 is the id of the parent node that the posts are created under.
    Document doc = Document.MakeNew(posturlnodename, dt, author, 1049);
    doc.getProperty("blogPostTitle").Value = posttitle;
    doc.getProperty("blogPostBody").Value = WordpressPostParser.ParseCodeBlocks(WordpressPostParser.ChangeImageUrls(WordpressPostParser.CreateParagraphTags(postbody)));
    doc.getProperty("blogPostLegacyUrl").Value = legacyurl;
    doc.getProperty("blogPostLegacyID").Value = legacyid;
    doc.CreateDateTime = createdate;

    if (!string.IsNullOrEmpty(posttags))
    {
        umbraco.editorControls.tags.library.addTagsToNode(doc.Id, posttags, "default");
        doc.getProperty("blogPostTags").Value = posttags;
    }

    doc.Publish(author);
    umbraco.library.UpdateDocumentCache(doc.Id);


    //comments here...
    foreach (XElement comment in item.Elements(wpns + "comment"))
    {
        if ((string)comment.Element(wpns + "comment_approved") == "1")
        {

            string commentAuthor = (string)comment.Element(wpns + "comment_author");
            string commentEmail = (string)comment.Element(wpns + "comment_author_email");
            string commentUrl = (string)comment.Element(wpns + "comment_author_url");
            string commentIP = (string)comment.Element(wpns + "comment_author_IP");
            string commentBody = (string)comment.Element(wpns + "comment_content");
            DateTime commentDate = DateTime.Parse((string)comment.Element(wpns + "comment_date"));
            
            Document commentdoc = Document.MakeNew(commentAuthor, DocumentType.GetByAlias("BlogComment"), author, doc.Id);
            commentdoc.getProperty("blogCommentAuthor").Value = commentAuthor;
            commentdoc.getProperty("blogCommentAuthorEmail").Value = commentEmail;
            commentdoc.getProperty("blogCommentAuthorURL").Value = commentUrl;
            commentdoc.getProperty("blogCommentAuthorIP").Value = commentIP;
            commentdoc.getProperty("blogCommentBody").Value = commentBody;
            commentdoc.CreateDateTime = commentDate;

            commentdoc.Publish(author);
            umbraco.library.UpdateDocumentCache(commentdoc.Id);
        }

    }

}
   
&lt;/pre&gt;
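
&lt;p&gt;The tag loop could also be written with LINQ and string.Join. This
is just an alternative sketch, not what the import above uses, and the
TagList class name is made up:&lt;/p&gt;

&lt;pre&gt;
using System.Linq;
using System.Xml.Linq;

public static class TagList
{
    // Joins the nicename of every category with domain="tag" into a
    // comma-separated string, skipping missing or empty nicenames.
    public static string FromItem(XElement item)
    {
        string[] names = item.Elements("category")
            .Where(t =&amp;gt; (string)t.Attribute("domain") == "tag")
            .Select(t =&amp;gt; (string)t.Attribute("nicename"))
            .Where(n =&amp;gt; !string.IsNullOrEmpty(n))
            .ToArray();
        return string.Join(",", names);
    }
}
&lt;/pre&gt;

&lt;p&gt;An item without any usable tags gives an empty string, so the
string.IsNullOrEmpty check before addTagsToNode would still work.&lt;/p&gt;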

&lt;p&gt;I am using some external methods to parse the body text of the
posts. This is because WordPress doesn't store paragraph markup, but
stores plain line breaks and generates the paragraph tags at render
time... brrrr... There are also some [source] tags left over from the
syntax highlighter plugin that I need to convert.&lt;/p&gt;

&lt;p&gt;These are the three methods I am using to parse the text:&lt;/p&gt;

&lt;pre&gt;
// Wraps the post in paragraph tags, treating each blank line as a paragraph break.
public static string CreateParagraphTags(string postbody)
{
    StringBuilder sb = new StringBuilder();
    sb.Append("&amp;lt;p&amp;gt;");
    sb.Append(postbody.Replace("\n\n", "&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;"));
    sb.Append("&amp;lt;/p&amp;gt;");
    return sb.ToString();
}

// Points image and link URLs at the new media folder instead of /wp-content.
public static string ChangeImageUrls(string postbody)
{
    string parsedstring = Regex.Replace(postbody, "src=\"/wp-content", "src=\"/media/images", RegexOptions.Singleline);
    return Regex.Replace(parsedstring, "href=\"/wp-content", "href=\"/media/images", RegexOptions.Singleline);
}

// Converts [source]...[/source] blocks into pre elements, escaping angle brackets in the code.
public static string ParseCodeBlocks(string postbody)
{
    Regex regPattern = new Regex(@"(\[source(.*?)\])(.*?)(\[/source\])", RegexOptions.Singleline);
    Dictionary&amp;lt;string, string&amp;gt; replaceValues = new Dictionary&amp;lt;string, string&amp;gt;();

    int i = 0;
    foreach (Match match in regPattern.Matches(postbody))
    {
        string code = match.Groups[3].Value;
        if (code.Contains("&amp;lt;"))
        {
            code = code.Replace("&amp;lt;", "&amp;amp;lt;").Replace("&amp;gt;", "&amp;amp;gt;");
        }
        postbody = postbody.Replace(match.Value, string.Format("[[[replacecode{0}]]]", i));
        replaceValues.Add(string.Format("[[[replacecode{0}]]]", i), "&amp;lt;pre&amp;gt;" + code + "&amp;lt;/pre&amp;gt;");
        i++;
    }

    foreach (KeyValuePair&amp;lt;string, string&amp;gt; replaceValue in replaceValues)
    {
        postbody = postbody.Replace(replaceValue.Key, replaceValue.Value);
    }

    return postbody;
}
   
&lt;/pre&gt;

&lt;p&gt;It's not perfect. For example, it added some stray &amp;lt;p&amp;gt;
tags inside my code blocks, because the paragraph conversion runs
before the code blocks are extracted, so blank lines in code get
treated as paragraph breaks. It was no more than I could handle with
manual updates. I also added some unit tests for these methods. It
is just so much nicer to work with RegEx when you have tests that show
whether you are breaking existing matches while changing this stuff.&lt;/p&gt;
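
&lt;p&gt;As a rough idea of what such tests can look like, here is a
framework-free sketch around the [source] pattern. It is not the actual
test suite; the SourceTagChecks class and the sample inputs (including
the [source:c#] form) are assumptions made for illustration.&lt;/p&gt;

&lt;pre&gt;
using System;
using System.Text.RegularExpressions;

public static class SourceTagChecks
{
    // Same pattern as in ParseCodeBlocks above.
    static readonly Regex SourceBlock =
        new Regex(@"(\[source(.*?)\])(.*?)(\[/source\])", RegexOptions.Singleline);

    // Returns the code captured by the first [source] block, or null when nothing matches.
    public static string FirstCode(string postbody)
    {
        Match m = SourceBlock.Match(postbody);
        return m.Success ? m.Groups[3].Value : null;
    }

    // Runs the checks and throws on the first failure.
    public static void Run()
    {
        Expect(FirstCode("a [source:c#]int i;[/source] b") == "int i;", "tag with a language");
        Expect(FirstCode("[source]x\n\ny[/source]") == "x\n\ny", "body spanning blank lines");
        Expect(FirstCode("no code here") == null, "no source tag");
    }

    static void Expect(bool ok, string name)
    {
        if (!ok) throw new Exception("Failed: " + name);
    }
}
&lt;/pre&gt;

&lt;p&gt;Calling SourceTagChecks.Run() after each change to the pattern gives
a quick signal when an existing match stops working.&lt;/p&gt;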

&lt;p&gt;So there you have it. Posts imported and ready to go. It's so
easy I don't know why I didn't get around to it before :-)&lt;/p&gt;
</description><guid>1506</guid><link>http://www.mortenbock.dk/blog/2009/10/13/importing-wordpress-posts-to-umbraco.aspx</link><pubDate>13. oktober 2009</pubDate><title>Importing Wordpress posts to Umbraco</title></item><item><author>Morten Bock</author><category>blog</category><category>plugin</category><category>syntaxhighlighter</category><category>theme</category><category>wordpress</category><description>&lt;p&gt;For a long time I have been considering how to get some code highlighting into my blog, so that I can easily write snippets in several different languages.&lt;/p&gt;&lt;p&gt;Out of the blue I then found &lt;a target="_blank" title="Code Highlighting for Wordpress" href="http://erik.range-it.de/wordpress/plugins/syntaxhighlighter/"&gt;SyntaxHighlighter&lt;/a&gt; by &lt;a target="_blank" href="http://erik.range-it.de/"&gt;Erik Range&lt;/a&gt;. &lt;!--more--&gt;It uses the same &lt;a target="_blank" href="http://www.dreamprojections.com/SyntaxHighlighter/"&gt;dp.SyntaxHighlighter&lt;/a&gt; as the &lt;a target="_blank" href="http://forum.umbraco.org/"&gt;Umbraco Forum&lt;/a&gt;, which I actually like quite a lot, so now we will see how it works here.&lt;/p&gt;&lt;p&gt;So here goes testing :-)&lt;/p&gt;&lt;p&gt;&lt;pre&gt;
// Re-insert source as (formatted) textarea
foreach( $theSources[2] as $sourceID =&gt; $sourceBlock ) {
    $theBrush = $theSources[1][$sourceID].$optionString;
    $thePost = str_replace(
        "{sourceID:{$sourceID}}",
        sprintf( $textArea, $theBrush, $syntaxOptions['syntaxCols'], $syntaxOptions['syntaxRows'], $sourceBlock ),
        $thePost
    );
}
&lt;/pre&gt;&lt;/p&gt;&lt;p&gt;And it works. Installation was painless, but it took me a while to discover that I had left the wp_footer() tag out of my WordPress template, so the relevant JavaScript files were never linked. Other than that, I think it runs just fine. You can even choose which JavaScript files to include. There is no reason to fetch 35k of JavaScript when you only need 5 :-)&lt;/p&gt;&lt;p&gt;Now I just need to consider whether I should drop one of the sidebars, so there is actually room for the width of the code...&lt;/p&gt;</description><guid>1131</guid><link>http://www.mortenbock.dk/blog/2006/11/20/tester-syntaxhighlighter-i-wordpress.aspx</link><pubDate>20. november 2006</pubDate><title>Testing SyntaxHighlighter in WordPress</title></item><item><author>Morten Bock</author><category>blog</category><category>css</category><category>theme</category><category>wordpress</category><category>xhtml</category><description>&lt;p&gt;Sometimes it is nice to create something from the ground up. So I have decided to build my own WordPress theme entirely from scratch. So far I have sorted out the structure with two sidebars, so there is room for plenty of good links, and a clear separation of the posts on the front page.&lt;/p&gt;&lt;p&gt;I still need to make a footer, but that will come later... all in good time. Until then, &lt;em&gt;"I gotta go see a man about a horse"&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Suggestions for a color scheme are very welcome. Black and white tends to get a bit tiring in the long run...&lt;/p&gt;&lt;p&gt;If you think your blog is missing from my blogroll, send me an email or leave a comment. You can still be the first! :-)&lt;/p&gt;</description><guid>1079</guid><link>http://www.mortenbock.dk/blog/2006/08/05/struktur-og-positioner-pa-plads.aspx</link><pubDate>05. august 2006</pubDate><title>Structure and positions in place</title></item><item><author>Morten Bock</author><category>blog</category><category>css</category><category>tag-ping</category><category>theme</category><category>wordpress</category><category>xhtml</category><description>&lt;p&gt;So I have made a start on building a WordPress theme. I know it looks a bit modest so far, but it takes some time to build this kind of thing from the ground up :-) But everything will be in the most beautiful XHTML, and there will be no styles or anything of the sort in the page source. I promise (almost)&lt;/p&gt;</description><guid>1077</guid><link>http://www.mortenbock.dk/blog/2006/08/03/tid-til-at-lege-med-theme.aspx</link><pubDate>03. august 2006</pubDate><title>Time to play with the theme</title></item><item><author>Morten Bock</author><category>plugin</category><category>wordpress</category><description>&lt;p&gt;I have now installed Ultimate Tag Warrior, which makes it possible not only to put a post in a category, but also to describe it with some tags/keywords.&lt;/p&gt;</description><guid>1073</guid><link>http://www.mortenbock.dk/blog/2006/07/31/ny-plugin-pa-plads.aspx</link><pubDate>31. juli 2006</pubDate><title>New plugin in place</title></item><item><author>Morten Bock</author><category>blog</category><category>plugin</category><category>rss-feed</category><category>wordpress</category><description>
&lt;p&gt;Welcome to mortenbock.dk!&lt;/p&gt;

&lt;p&gt;I have just installed WordPress, and now I need to get started on
modifying, customizing, and so on.&lt;/p&gt;

&lt;p&gt;Hold off a little longer on subscribing to feeds and the like, since
they will in all likelihood change URL before long.&lt;/p&gt;

&lt;p&gt;But now we are up and running!&lt;/p&gt;
</description><guid>1074</guid><link>http://www.mortenbock.dk/blog/2006/07/31/hello-world.aspx</link><pubDate>31. juli 2006</pubDate><title>Hello world!</title></item></channel></rss>