Archive for the ‘Open Office Xml’ category

Outlook using Word 2007 as HTML Rendering Engine

September 7, 2007


Interesting postings here, here and here.

A common business need: Generating server-side documents on the fly

August 30, 2007


I gathered a list of common Open XML questions related to programmability:

  1. What are the Open XML File Formats and what can I do with them?
  2. Can you show me the internal structure of a Word 2007 document?
  3. What are WordprocessingML, SpreadsheetML, PresentationML, and DrawingML?
  4. Do you have a .NET API that I can use to generate documents programmatically (server-side)?
  5. What is the architecture of a server-side OBA document generation solution?
  6. How can I generate a document programmatically and have more control over document content?
  7. How do you add images to an Open XML document?
  8. How can I pull data from my data source and create a table in a document?
  9. How can I add styles and format to my document content?
  10. What about compatibility with previous versions of Office?

Office Business Applications + Open XML File Formats

When you are trying to create a document assembly solution and want to understand how to use the Open XML File Formats to generate a document programmatically, you may be faced with some of the questions above. All these questions have been answered in multiple MSDN articles, SDKs, blogs, trainings, forums, and newsgroups. However, I am the kind of person who loves end-to-end documentation and code samples that take you from zero to a working solution. We all have limited time to learn new technologies, and walkthrough articles with downloadable code samples are always a nice option.

Some time ago I tried to do the same thing and blogged about how to generate a document using a document template, content controls, and XML mapping. I also created a short video and article that show how to bind custom XML to a document template. This approach is great when you are trying to replace placeholder data in document templates such as an invoice or contract. However, your business needs may be different and you may want more control over document content and formatting. In that case a better approach is to manipulate the WordprocessingML content stored in the different document parts.

I wrote a new article that helps answer the Open XML questions listed in this blog entry. I split the article into two parts, plus a code sample download. The first part discusses the theory and basic concepts you need to work with the Open XML File Formats: for example, the Open XML Package Architecture, WordprocessingML basics, the Open XML object model, and the conceptual architecture of a document integration solution.

The second part explains all the coding that needs to happen to generate a simple sales document from scratch. I show you how to deal with images, tables, styles, and formatting. I also show how to create a helper class that pulls data from your line of business systems (in this case the AdventureWorks sample database to keep the LOB piece as simple as possible), and a helper class that uses the Open XML object model and WordprocessingML to create a document.

You can find the articles and code samples here:

Many thanks go to Doug Mahugh, Wouter van Vugt, and Frank Rice for sharing all their knowledge and helping me put this together. I hope this helps you get started with custom document generation with Open XML.



Office Open XML Developer Resources

August 30, 2007


Speaking of Tech Ed 2007, some of the discussions I had with ISVs revolved around Office Open XML and the current activity with the ISO standardisation process. There were basically two types of things ISVs wanted to know. The first was about the ISO standardisation process itself, why it matters, how it affects ISVs and how Office Open XML compares to existing document standards. The second common topic of discussion was around development resources and how to actually work with the Office Open XML formats. By the way, if you haven’t seen the Ecma Office Open XML standard (Ecma-376) it is available on the Ecma website.

Some good information to summarise the first category of discussion is available over at Sean McBreen’s blog. Sean is the Director of the group that looks after ISVs in New Zealand and has done a good job of answering some of the popular misconceptions around the Ecma Office Open XML standard; in particular around Intellectual Property Rights and why we need multiple standards. Rather than posting all the information I am going to link to the relevant posts on Sean’s blog. I recommend you read them to help you understand the current interest and activity around the ISO standardisation of Office Open XML. 

The post titled IPR on Ecma Office Open XML explains the three options an implementer of the standard can select from to use with their implementation of the Office Open XML format.  

The post titled “My top 4 questions on Ecma Office Open XML and simple answers” answers the questions we hear asked most commonly.

Another thing to make clear is that Office Open XML is already a standard, an Ecma standard known as Ecma-376. All the discussion at the moment is around whether it should be an ISO standard as well. One side of the debate is of the opinion that there is already an ISO standard document format (ODF) and we don’t need another one. My top 4 questions on Ecma Office Open XML and simple answers and Why have another document standard? provide our view on why it is important to have multiple standards.

The other popular misconception is that Office Open XML is a Microsoft-only format. While it is true that the initial work was done by Microsoft, the technology was first submitted to Ecma in late 2005 and has since been through significant change based on the recommendations of the Ecma Technical Committee (which includes representatives from Apple, Barclays Capital, BP, The British Library, Essilor, Intel, NextPage, Novell, Statoil, Toshiba, and the United States Library of Congress). I also noticed recently that Apple lists, among the features of its latest release, iWork ’08, the ability to import Word, PowerPoint and Excel documents that use the OOXML format.

For the second category of discussion, the meat and potatoes of how to use the Office Open XML formats, I can recommend the following resources.

The web site is a great place to start:

Earlier in 2007 a series of Open XML developer workshops were run in 30 countries. The content of the workshop is available online, including all the presentations, samples, and lab manuals. This is a great way to get up to speed fast on using Open XML formats.

There is a book called “Open XML Explained” available for download. The sample documents shown in the book are available here. The author of the book, Wouter Van Vugt, is a software development trainer/consultant who specializes in the Open XML file formats. He participates in the forums here on, and has a blog where he covers Open XML and other .NET development topics.

There are also all sorts of interesting articles about working with Open XML in the Library section of the website. 

On MSDN there is the XML in Office Developer Portal which contains information about using Open XML and includes a link to a preview of the SDK for Open XML Formats. The SDK provides strongly-typed part classes to manipulate Open XML documents. There is also an MSDN Forum called the Microsoft SDK for Open XML Formats where you can get assistance with the SDK.

There are a bunch of videos on Channel 9 about using Open XML, including this one about how Mindjet (an ISV in the U.S.) are using Open XML in their latest version of MindManager Pro.

I know a number of New Zealand ISVs are already using Office Open XML to output content from their own applications for reporting, automatic document generation, document re-purposing, archiving etc. If any of you have found Open XML resources that may be useful to others, why not share them in a comment?   


Open XML workshop videos

August 30, 2007


MSDN has published the complete set of videos of the San Francisco Open XML developer workshop that was hosted by Mindjet in June. This was one of the series of workshops that we did in over 30 locations this spring, covering the content that was recently posted on

If the workshop didn’t come to your city, now you can watch the videos from anywhere, and you can also download all of the presentations, hands-on labs, and demo files used in the workshops. The focus of the workshop was on document generation for .NET developers, but there are also Java versions of the labs available as you can find on the content page.

I participated in 16 of these workshops, starting with the one Wouter and I did in Paris last December. For the San Francisco workshop, I was lucky to be joined by Chris Predeek of the Ted Pattison Group, who was the author of many of the hands-on labs we used. It was great to work with Chris, and I learned a lot watching him walk through the code for the hands-on labs. He also covered topics such as the packaging API, XML programming in .NET, XSLT, and the Microsoft SDK for Open XML formats.

The attendees at this workshop were an enthusiastic and interesting bunch, and many of them have longstanding experience in XML formats. Here’s a video covering some of their reactions to the training.

I’ve never been involved in a video production like this one, and it was really fun. There were lights mounted in the ceiling, cables running everywhere, cameramen and a sound person and even a guy who re-applied makeup to my head every hour to keep it from shining too much. It often felt like chaos at the time, and yet the finished result looks so … organized.

At the risk of forgetting somebody, I’d like to thank a few of the people who helped create these videos. Producer Brad Cochrane and his crew were great, and Anthony Roy and the folks at Mindjet were gracious hosts. Isaac Leonard, Erick Watson, William Leong, and Don Campbell helped with numerous little details that kept things running smoothly during the event itself, Michael Scherotter gave a great overview of Mindjet’s use of Open XML, and Pauls Zommers, Kelly Bowen-McCombs and Erika Ehrli all helped get the final videos published.

And I’d like to add a special note of appreciation to the attendees, who put up with all sorts of hassles and interruptions during the workshop so that we could get these videos produced for a wider audience. Thanks, everyone!


Open XML for developers: Nice gifts coming our way

August 16, 2007


If you are an Open XML fan (like me), I strongly recommend you check out the latest news shared by Doug Mahugh and Wouter van Vugt. Doug released all the ppt files, code samples, and hands-on labs content for the Open XML developer workshop. I attended this training and I can tell you that I owe much of my understanding of Open XML to Doug and his superb workshop.

Also, Wouter wrote the book “Open XML Explained.” He did a pretty amazing job explaining WordprocessingML, SpreadsheetML, PresentationML, and DrawingML. The best thing is that it is a free download.

Have fun!


XPS Showcase

August 16, 2007


The team has been busy and updated the XPS site at I’d recommend taking a look at the showcase section: there are details on a range of software and hardware solutions (from eighteen vendors last time I looked) that support XPS. Got something that isn’t listed there? If you have information on products that you think are suitable for the showcase, send an email to the folks at


ECMA Office Open XML – Options for Generating Word Documents (on the server;))

August 6, 2007


In my recent workshops with Andreas on the ECMA Office Open XML file formats, I demonstrated a way of generating Word documents using the packaging API and custom XML data islands. But the approach I presented in the workshop is not the only one… and I have to admit… it’s not the easiest one.

Therefore I thought I’d share my thoughts on generating documents based on the new file format without needing Office installed on your machine, including the advantages and disadvantages of each approach. These are the options I’ll look at (as always, you can find downloads for the samples I’ve created at the end of each blog entry):

Easiest way possible
Using Custom XML islands together with content controls

Hard and powerful
Using Custom XML islands together with dynamically generated content controls

Medium but still powerful
Custom attached schema to mark content in document relevant for automated generation

Out of discussion (in my opinion)
Invent custom “markup tags” for marking content in documents and generate documents based on these tags

In this and subsequent blog entries I’ll discuss each of these, including their advantages and disadvantages. In this entry I’ll start with the easiest one: just leveraging custom XML data islands.

Generating Documents with Custom XML Data Islands & Content Controls

This approach leverages two features introduced with Word 2007: custom XML data islands and content controls. Your code that generates the document (probably running on the server) just updates the custom XML island in the Office Open XML package, and the content is bound to the document surface through content controls.

What are typical characteristics of scenarios for this approach?
Your document has a fixed, form-like structure that does not need to be extended dynamically. You only need to fill in fields, including the fields of fixed-size tables, on the server side without adding Word-specific markup (such as new table rows).

How does this approach work?
The first thing you need to do is agree on the data structure you are working with. This data structure is modeled by a schema that describes the business information stored in your document. Based on this schema you can create a document template, design its UI with content controls, and bind the controls using the Word Content Control Toolkit published on CodePlex. On the server you simply use the System.IO.Packaging API that ships with the .NET Framework 3.0 to modify the custom XML. Whenever the user opens the document in Word 2007, information from the custom XML is bound to the document UI automatically, and any changes in the document are reflected back to the custom XML data island. More details follow later in this post…
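To make the binding a little more concrete: in word/document.xml, each bound content control (w:sdt) carries a w:dataBinding element in its properties. Its w:xpath attribute points into the custom XML, and its w:storeItemID matches the ds:itemID of the custom XML part's properties part (for example customXml/itemProps1.xml). The following is a minimal sketch in current C# with LINQ to XML, not the .NET 3.0 packaging code used later in this post; the helper name GetBindings, the document fragment and its GUID are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// WordprocessingML main namespace (ECMA-376).
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Lists (storeItemID, xpath) for every w:dataBinding in a document.xml tree.
// w:dataBinding only appears inside a content control's properties (w:sdtPr).
List<(string StoreItemId, string XPath)> GetBindings(XDocument mainDocument) =>
    mainDocument.Descendants(w + "dataBinding")
        .Select(binding => ((string)binding.Attribute(w + "storeItemID") ?? "",
                            (string)binding.Attribute(w + "xpath") ?? ""))
        .ToList();

// Hypothetical document.xml fragment: one text content control bound to
// InvoiceNumber. The storeItemID GUID is made up for this sketch.
var sample = XDocument.Parse($@"<w:document xmlns:w=""{w.NamespaceName}"">
  <w:body>
    <w:sdt>
      <w:sdtPr>
        <w:dataBinding w:xpath=""/InvoiceElement[1]/InvoiceNumber[1]""
                       w:storeItemID=""{{00000000-0000-0000-0000-000000000001}}"" />
        <w:text />
      </w:sdtPr>
      <w:sdtContent>
        <w:p><w:r><w:t>INV-0001</w:t></w:r></w:p>
      </w:sdtContent>
    </w:sdt>
  </w:body>
</w:document>");

// Print each binding: which data store item, and which node inside it.
foreach (var (storeItemId, xpath) in GetBindings(sample))
    Console.WriteLine($"{storeItemId} -> {xpath}");
```

Because Word resolves these XPaths when the document is opened, the server only has to write valid XML into the custom XML part; it never touches the content controls themselves.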

What are the advantages of this approach?

  • It’s very easy to implement.
  • You can write program logic that works directly with your business-data structure based on an XML schema.
  • With content controls, business data is automatically updated in both the document and the custom XML data island without writing additional client-side code.
  • You don’t need to know anything about Word programming on the client or WordprocessingML in your custom code.

What are the disadvantages of this approach?

  • It works only with Word 2007. Although you can open the files with older versions of Office without any problems, those versions don’t offer content controls, because the feature was not implemented before Word 2007.
  • This is a very limited approach because it does not allow you to extend content in the document such as table rows, columns or complete sections in the document at all.

Sample Implementation

Last but not least, I want to round off this first blog entry with a simple sample implementation demonstrating this approach. The first step is an XML schema describing our content. Let’s assume we are using Word to generate invoice documents automatically. A schema for an invoice could look similar to the following (we assume this schema for the rest of this post and in the subsequent posts I’ll write on this topic):

<xs:schema targetNamespace="" elementFormDefault="qualified" xmlns="" xmlns:mstns="" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="InvoiceElement">
    <xs:complexType><xs:sequence>
      <xs:element name="InvoiceNumber" type="xs:string" />
      <xs:element name="InvoiceDate" type="xs:date" />
      <xs:element name="Customer">
        <xs:complexType><xs:sequence>
          <xs:element name="CompanyName" type="xs:string" />
          <xs:element name="Contact">
            <xs:complexType><xs:sequence>
              <xs:element name="Firstname" type="xs:string" />
              <xs:element name="Lastname" type="xs:string" />
              <xs:element name="Email" type="xs:string" />
            </xs:sequence></xs:complexType>
          </xs:element>
          <xs:element name="Street" type="xs:string" />
          <xs:element name="ZipCode" type="xs:string" />
          <xs:element name="City" type="xs:string" />
          <xs:element name="Country" type="xs:string" />
        </xs:sequence></xs:complexType>
      </xs:element>
      <xs:element name="InvoiceItems">
        <xs:complexType><xs:sequence>
          <xs:element name="InvoiceItem" maxOccurs="unbounded">
            <xs:complexType><xs:sequence>
              <xs:element name="ItemNumber" type="xs:int" />
              <xs:element name="ItemName" type="xs:string" />
              <xs:element name="ItemAmount" type="xs:decimal" />
              <xs:element name="ItemUnitPrice" type="xs:decimal" />
              <xs:element name="ItemToals" type="xs:decimal" />
            </xs:sequence></xs:complexType>
          </xs:element>
        </xs:sequence></xs:complexType>
      </xs:element>
      <xs:element name="InvoiceTotals" type="xs:decimal" />
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>

Based on this schema we can now generate a class with xsd.exe, a tool shipping with the .NET Framework that generates .NET types from XML schemas and vice versa. This lets us later use the XmlSerializer from the .NET Framework’s System.Xml.Serialization namespace to serialize instances of the generated type to XML and to deserialize XML instances based on this schema back into that type. Thus we save ourselves from writing the typical XPath code when working with XML in a DOM;)

Visual Studio 2005 Command Prompt

As I am demoing the stuff in a web application, I copy the generated code file to the App_Code sub directory of my web application. Now we are set to proceed with the next steps. If you don’t know why I am doing this right now you will definitely understand better when we start writing the code for generating the document in our web application on the server.

Next we can design the document for our invoice, including the content controls. To find the content controls, you need to enable the Developer tab in the Word Options. The invoice we’re designing could look similar to the following one; the content controls I am using are text controls (either plain or rich text), drop-down lists and date pickers.


As you can see, we’ve marked the parts of the document we would like to bind to our XML schema structure with Word content controls from the Developer tab, where you will find them in the “Controls” group, as the image above also shows. For this simple type of document generation, our document contains a fixed number of item rows (5 in my example, but of course it can be more). Each row of the table contains several content controls, which we will bind to an InvoiceItem in an XML instance using the Word Content Control Toolkit, as you can see in the next figure below. But before we can do that, we have to create a sample XML document that we will add to our document as a custom XML data island with the Word Content Control Toolkit. This XML instance, based on our schema, needs to have 5 InvoiceItem elements because our template supports 5 items, as you can see above (that’s it for this rather simple approach; of course you can do it dynamically as well, which is more complex and covered in one of my subsequent blog entries).

<InvoiceElement xmlns="">
    <CompanyName>Microsoft Austria</CompanyName>
    <Street>Am Euro Platz 3</Street>
    <!-- ... 3 more InvoiceItem Items ... -->
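Rather than typing the sample instance by hand, you could generate it. This LINQ to XML sketch is not part of the original sample, and BuildSampleInvoice is a hypothetical helper. It builds an InvoiceElement instance with a chosen number of empty InvoiceItem rows, following the element order of the schema above. It assumes no target namespace, since the namespace URI isn't shown in this post; with a namespaced schema every element name would need an XNamespace.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Builds a sample InvoiceElement instance with `itemRows` empty InvoiceItem
// elements, in the element order of the invoice schema. No namespace is used
// because the schema's namespace URI is elided in the post.
XDocument BuildSampleInvoice(int itemRows) =>
    new XDocument(
        new XElement("InvoiceElement",
            new XElement("InvoiceNumber", ""),
            new XElement("InvoiceDate",
                DateTime.Today.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)),
            new XElement("Customer",
                new XElement("CompanyName", ""),
                new XElement("Contact",
                    new XElement("Firstname", ""),
                    new XElement("Lastname", ""),
                    new XElement("Email", "")),
                new XElement("Street", ""),
                new XElement("ZipCode", ""),
                new XElement("City", ""),
                new XElement("Country", "")),
            new XElement("InvoiceItems",
                // One InvoiceItem per fixed table row in the template.
                Enumerable.Range(1, itemRows).Select(i =>
                    new XElement("InvoiceItem",
                        new XElement("ItemNumber", i),
                        new XElement("ItemName", ""),
                        new XElement("ItemAmount", 0),
                        new XElement("ItemUnitPrice", 0),
                        new XElement("ItemToals", 0)))),
            new XElement("InvoiceTotals", 0)));

// Five rows, matching the five item rows in the template table.
Console.WriteLine(BuildSampleInvoice(5));
```

The row count is a parameter so the same helper still works if you later extend the template to more fixed rows.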

Now save the previously created Word document into your web project’s folder as a template, close Word, and open the Word Content Control Toolkit. Create a new custom XML part using the toolkit and paste the XML shown above into the XML pane in edit mode. Then switch to bind mode and bind each of the XML elements to one content control in your document as shown below:


As you can see in the image above, the members of each InvoiceItem are bound to separate content controls in the document. This document now acts as the template for our solution. The solution we’re creating is a simple ASP.NET page that uses the previously created classes (generated from the schema with xsd.exe) to bind content from the classes to the UI of the application. The application looks similar to the following one and uses ASP.NET object binding to bind the objects’ content to the UI of the ASP.NET web application. I don’t want to spend too much time on ASP.NET in this posting, as you can download the sample from the attachments of this post anyway.


For this first blog post we implement the event procedure of the “Content Control simple…” button, which generates a document based on the first approach explained in this post. We use ASP.NET object binding together with a small, self-written class called InvoiceAdapter (you can see it when downloading the sample), which adds a pre-populated InvoiceElement instance with 5 InvoiceItem instances to the Session; these are bound to the UI shown above.

So assuming that we have an InvoiceElement stored in the session, we can let the user enter information into the page and hit one of the buttons to generate a document based on the information entered into the ASP.NET based UI with a very small amount of code as follows:

// Requires: using System; using System.IO; using System.IO.Packaging; using System.Xml.Serialization;
InvoiceElement element =
    Session[InvoiceAdapter.CurrentInvoiceTemplateKey] as InvoiceElement;
if (element == null || element.InvoiceItems.Length != 5)
{
    StatusLabel.Text = "You need exactly 5 rows as in your template for this approach!";
    return;
}

// Then copy the document template, replacing an earlier copy if present
string templatePath = Server.MapPath("~/InvoiceTemplate.docx");
string generatedPath = Server.MapPath(...);
if (File.Exists(generatedPath))
    File.Delete(generatedPath);
File.Copy(templatePath, generatedPath);

// Now you can open the copied file using the packaging APIs
using (Package p = Package.Open(generatedPath, FileMode.Open, FileAccess.ReadWrite))
{
    // Now we can get the path to our custom XML in the template
    Uri partUri = new Uri("/customXml/item1.xml", UriKind.Relative);
    PackagePart part = p.GetPart(partUri);
    // Overwrite existing part, therefore open with FileMode.Create
    using (Stream s = part.GetStream(FileMode.Create))
    {
        // Now use the XmlSerializer to write back the content
        XmlSerializer serializer = new XmlSerializer(typeof(InvoiceElement));
        serializer.Serialize(s, element);
    }
    // Flush memory-content back to the package
    p.Flush();
}
StatusLabel.Text = "Document generated as " + string.Format("Invoice{0}.docx", element.InvoiceNumber);

That’s fairly easy – and we’ve reached an attractive goal with that little amount of code. Think of what that meant in the past when you had to work with the good, old COM automation object model (which meant you had to install Office on the server). Now you don’t need to install Office on the server to generate documents and generating documents is just simple XML processing.
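Since it really is just XML inside a ZIP package, the same part replacement can also be done without System.IO.Packaging: the part /customXml/item1.xml is stored as the ZIP entry customXml/item1.xml. The sketch below swaps in System.IO.Compression.ZipArchive (a different API from the post's Package class, available from .NET Framework 4.5 on) and writes an XDocument instead of serializing the xsd.exe-generated class, to keep it self-contained. ReplacePartXml is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Replaces the content of one part in an Open XML package opened as a ZIP
// archive. [Content_Types].xml and the relationship parts are left untouched,
// which works because the part keeps its name and .xml extension.
void ReplacePartXml(Stream package, string entryName, XDocument newContent)
{
    using var zip = new ZipArchive(package, ZipArchiveMode.Update, leaveOpen: true);
    ZipArchiveEntry old = zip.GetEntry(entryName)
        ?? throw new FileNotFoundException("Part not found in package", entryName);
    old.Delete();                                  // remove the old part...
    ZipArchiveEntry replacement = zip.CreateEntry(entryName);
    using Stream output = replacement.Open();      // ...and re-create it under the same name
    newContent.Save(output);
}

// Usage: an in-memory stand-in for the copied template. In the web app this
// would be a FileStream opened with FileMode.Open and FileAccess.ReadWrite.
var docx = new MemoryStream();
using (var seed = new ZipArchive(docx, ZipArchiveMode.Create, leaveOpen: true))
using (var seedWriter = new StreamWriter(seed.CreateEntry("customXml/item1.xml").Open()))
    seedWriter.Write("<InvoiceElement><InvoiceNumber>0</InvoiceNumber></InvoiceElement>");

docx.Position = 0;
ReplacePartXml(docx, "customXml/item1.xml",
    new XDocument(new XElement("InvoiceElement", new XElement("InvoiceNumber", "4711"))));
```

Unlike Package.GetPart, this sketch does not validate the package structure; it relies on the template already containing the part and its relationships.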

Wrap-Up and final comments

That’s of course the easiest way of generating documents on the server. But it’s so simple and restricted that I’d rather call it “filling in business information on the server” instead of “generating documents”:)) The biggest advantage of this approach is that it is (a) simple and (b) works the same simple way technically if you need to process the information entered or modified by the user on the server. Still, for professional document generation we need more possibilities, of course. Therefore I’ll take a look at other approaches in subsequent posts;)

You can download the sample I’ve created for showing this approach here.


Migrating Legacy Office File to 2007 File Format Using Office Migration Planning Manager (OMPM) Tools

July 6, 2007


The Office Migration Planning Manager (OMPM) is a collection of tools that enables preparation and conversion for migration to the 2007 Microsoft Office system. OMPM checks for, and reports on, file properties to help analyse an environment.

The OMPM File Scanner (offscan), a command-line tool that scans files for conversion, was used.

The OMPM File Scanner was used to perform two types of scans:

§ A light scan that quickly identifies the Office documents on a user’s computer or network file system.

§ A deep scan that you can perform on Office documents to gather document properties that provide indicators of potential conversion issues.

The SQL Server 2005 database creation tool lets you create the OMPM database, import scan logs, and import the action logs of converted files for further analysis.

Note: When running the command-line utility ImportActions.bat to import action logs from desktop workstations, you need to create an additional folder level: change the folder hierarchy from C:\OFCLogs\MACHINENAME to C:\OFCLogs\YOURDOMAIN\MACHINENAME.
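The folder change from the note can be scripted before running ImportActions.bat. This is a hypothetical C# helper, not an OMPM tool; the commented usage assumes YOURDOMAIN and MACHINENAME correspond to Environment.UserDomainName and Environment.MachineName, which you should verify against the OMPM documentation for your environment.

```csharp
using System;
using System.IO;

// Creates <root>\<domain>\<machine> (e.g. C:\OFCLogs\YOURDOMAIN\MACHINENAME)
// and returns its path. Directory.CreateDirectory also creates missing parent
// folders and does nothing if the folder already exists.
string CreateOfcLogFolder(string root, string domain, string machine)
{
    string folder = Path.Combine(root, domain, machine);
    Directory.CreateDirectory(folder);
    return folder;
}

// Assumed usage on the Windows machine that collects the logs:
// CreateOfcLogFolder(@"C:\OFCLogs", Environment.UserDomainName, Environment.MachineName);
```

After creating the folder, the machine's action logs would be moved from C:\OFCLogs\MACHINENAME into the new domain-level folder.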

Access 2007 reporting is used to analyse the scan and conversion results and to export the scanned records to a text file for conversion.

The OMPM Office File Converter (OFC), a command-line tool, converts Word, Excel and PowerPoint files to the 2007 Office file formats in bulk.

There are two approaches for performing file conversions with the OFC tool:

1. FoldersToConvert: This approach is not recommended, as it only performs the conversion without creating action logs, which prevents the converted files from being analysed by the Access reporting tool after conversion.

2. FileList: This approach is the recommended one as it performs the conversion and also creates the action logs which can be imported into the SQL Server database for analysis of the converted files by the Access reporting tool.

OMPM Migration Flow of Office File Conversion



Official online documentation:


Document Migration considerations:

Office File Converter (OFC)

Open XML Overview



Open XML API CTP Released!

June 5, 2007



Open Office File Format Nuggets

May 24, 2007

Various links to videos on the topic