How to read Microsoft Word document contents using C#/.NET?


Your .NET application may need to read the text content of a Microsoft Word document. You can do this with the APIs exposed by the 'Microsoft.Office.Interop.Word.dll' assembly. Because these APIs automate the Word application itself, Word must be installed on the machine that runs the code.

 

Let's take a quick look at how to do it using C#/.NET. The complete source code is included below for easy reference.

 

 

First, create the Word application object and open the document by passing its file path to the 'wordApp.Documents.Open' method (as shown below). Next, get the document's content range and read its 'Text' property. The API exposes several other properties and methods, but they are not needed for reading text content, so we will not discuss them here.

 



 

Here's the complete source code for you to use. Make sure the COM objects are released in the 'finally' block as shown; otherwise the Word process can keep running in the background after your method returns:

 

    // Requires a reference to Microsoft.Office.Interop.Word and these directives:
    // using System;
    // using System.IO;
    // using System.Reflection;
    // using System.Runtime.InteropServices;
    // using Word = Microsoft.Office.Interop.Word;

    public static string GetTextFromWordDocument(object filePath)
    {
        // The interop API takes the path as an object, so validate it as a string first.
        var filePathAsString = filePath as string;
        if (string.IsNullOrEmpty(filePathAsString))
        {
            throw new ArgumentNullException(nameof(filePath));
        }

        if (!File.Exists(filePathAsString))
        {
            throw new FileNotFoundException("Could not find file", filePathAsString);
        }

        var textFromWordDocument = string.Empty;
        Word.Application wordApp = new Word.Application();
        Word.Document wordDocument = null;
        Word.Range wordContentRange = null;

        try
        {
            // Open the document read-only (third argument) and read its whole content range.
            wordDocument = wordApp.Documents.Open(ref filePath, Missing.Value, true);
            wordContentRange = wordDocument.Content;
            textFromWordDocument = wordContentRange.Text;
        }
        catch (COMException)
        {
            // Handle or log the COM exception here; an empty string is returned in this case.
        }
        finally
        {
            // Close the document and quit Word without saving changes.
            if (wordDocument != null) { wordDocument.Close(false); }
            if (wordApp != null) { wordApp.Quit(false); }

            // Release each COM object once, innermost first.
            ReleaseComObject(wordContentRange);
            ReleaseComObject(wordDocument);
            ReleaseComObject(wordApp);
        }

        return textFromWordDocument;
    }
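
The listing calls a 'ReleaseComObject' helper whose body is not shown in this post. The following is only a minimal sketch of what such a helper might look like, not the original implementation. The 'ComHelper' class name and the 'bool' return value are assumptions made for illustration; 'Marshal.IsComObject' and 'Marshal.ReleaseComObject' are the standard .NET interop calls.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComHelper
{
    // Hypothetical helper: the post calls ReleaseComObject but does not show its body.
    // Returns true if a COM reference was released, false if there was nothing to release.
    public static bool ReleaseComObject(object comObject)
    {
        // Nothing to do for null references or ordinary managed objects.
        if (comObject == null || !Marshal.IsComObject(comObject))
        {
            return false;
        }

        // Decrement the reference count of the runtime callable wrapper once.
        Marshal.ReleaseComObject(comObject);
        return true;
    }
}
```

Because the null check lives inside the helper, the 'finally' block can call it for every variable without checking which ones were actually assigned. You could copy the method into the same class as 'GetTextFromWordDocument' as a private static method, so the calls in the listing compile unchanged.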

 

Was it helpful? Do let me know if you have any queries. Stay tuned for more updates.

 

 


6 comments

  1. C# is a part of .NET, and it is something new and a very good development.

  2. Where can I find 'Microsoft.Office.Interop.Word.dll'?

    1. You can install the VSTO (Visual Studio Tools for Office) SDK, which you can get from the Microsoft sites.

    2. You can also download it from the NuGet Package Manager. Search for Interop.Word and you will be able to select the version that you need to install.

 
© 2008-2017 Kunal-Chowdhury.com - Microsoft Technology Blog for developers and consumers | Designed by Kunal Chowdhury