How to read Microsoft Word document contents using C#/.NET?

Your .NET application may need to read the text content of a Microsoft Word document. You can do this with the APIs exposed by the Word interop assembly, 'Microsoft.Office.Interop.Word.dll'. Because the interop API drives the Word application itself, Microsoft Word must be installed on the machine that runs the code.


Let's take a quick look at how to do it in C#/.NET. The complete source code is shared below for easy reference.




First, create the 'Word' application object and open the document by passing its file path to the 'wordApp.Documents.Open' method (as shown below). Next, get the document's content range and read its 'Text' property. The API exposes a number of other properties and methods, but they are not needed for reading the text content, so they are not covered here.



Here's the complete source code for you to use. Make sure that every COM object is properly released in the 'finally' block; 'ReleaseComObject' is a helper method that is not part of this listing:


    // Required at the top of the file:
    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using Word = Microsoft.Office.Interop.Word;

    public static string GetTextFromWordDocument(object filePath)
    {
        // Validate the path as a string before handing it to Word.
        var filePathAsString = filePath as string;
        if (string.IsNullOrEmpty(filePathAsString))
        {
            throw new ArgumentNullException(nameof(filePath));
        }

        if (!File.Exists(filePathAsString))
        {
            throw new FileNotFoundException("Could not find file", filePathAsString);
        }

        var textFromWordDocument = string.Empty;
        Word.Application wordApp = new Word.Application();
        Word.Document wordDocument = null;
        Word.Range wordContentRange = null;

        try
        {
            // Open the document read-only; the other optional arguments keep their defaults.
            wordDocument = wordApp.Documents.Open(filePathAsString, ReadOnly: true);
            wordContentRange = wordDocument.Content;
            textFromWordDocument = wordContentRange.Text;
        }
        catch (COMException)
        {
            // handle the COM exception (for example, log it); an empty string is returned
        }
        finally
        {
            // Close the document without saving changes, then quit Word.
            if (wordDocument != null) { wordDocument.Close(false); }
            if (wordApp != null) { wordApp.Quit(false); }

            // Release each COM object exactly once, innermost first.
            ReleaseComObject(wordContentRange);
            ReleaseComObject(wordDocument);
            ReleaseComObject(wordApp);
        }

        return textFromWordDocument;
    }
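
The listing calls a 'ReleaseComObject' helper without showing it. The sketch below is one possible way to write such a helper; it is an assumption, not the original implementation. The class name 'ComHelper' is hypothetical, and the choice of 'Marshal.ReleaseComObject' (which decrements the reference count of the runtime callable wrapper once) over 'Marshal.FinalReleaseComObject' is a design guess.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class; the name is chosen only for illustration.
public static class ComHelper
{
    // Releases a COM object if one was actually created.
    // Null references and ordinary managed objects are ignored, so the
    // helper is safe to call from a finally block for every variable.
    public static void ReleaseComObject(object comObject)
    {
        if (comObject == null)
        {
            return;
        }

        if (!Marshal.IsComObject(comObject))
        {
            return;
        }

        try
        {
            // Decrement the reference count of the runtime callable wrapper.
            Marshal.ReleaseComObject(comObject);
        }
        catch (Exception)
        {
            // Assumption: a failed release during cleanup should not
            // hide an exception thrown earlier by the caller.
        }
    }
}
```

If the helper lives in the same class as 'GetTextFromWordDocument', the calls in the 'finally' block work unchanged; otherwise they would be written as 'ComHelper.ReleaseComObject(...)'.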


Was it helpful? Do let me know if you have any queries. Stay tuned for more updates.



Kunal Chowdhury