How to read Microsoft Word document contents using C#/.NET?


Your .NET application may need to read the text content of a Microsoft Word document. You can do this with the APIs exposed by the 'Microsoft.Office.Interop.Word.dll' assembly.

 

Let's take a quick look at how to do it using C#/.NET. The complete source code is shared below for easy reference.

 


 

First, create the 'Word' application object and open the document by passing its file path to the 'wordApp.Documents.Open' method (as shown below). Next, read the document's content range and extract its Text. The API exposes several other properties and methods, but they are not needed for reading text content, so they are not discussed here.

 

You may also like to read:


How to read Microsoft Excel document contents using C#/.NET?

How to read Microsoft PowerPoint document contents using C#/.NET?


 

Here's the complete source code for you to use. Please make sure to release the COM objects properly, as marked in the 'finally' block:

 

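  // Requires a reference to Microsoft.Office.Interop.Word and these usings:
  //   using System; using System.IO; using System.Reflection;
  //   using System.Runtime.InteropServices;
  //   using Word = Microsoft.Office.Interop.Word;
  public static string GetTextFromWordDocument(object filePath)
  {
      // The parameter is typed as object because Documents.Open takes it by ref.
      var filePathAsString = filePath as string;
      if (string.IsNullOrEmpty(filePathAsString))
      {
          throw new ArgumentNullException(nameof(filePath));
      }

      if (!File.Exists(filePathAsString))
      {
          throw new FileNotFoundException("Could not find file", filePathAsString);
      }

      var textFromWordDocument = string.Empty;
      Word.Application wordApp = new Word.Application();
      Word.Document wordDocument = null;
      Word.Range wordContentRange = null;

      try
      {
          // Third argument (ReadOnly) is true: the document is opened read-only.
          wordDocument = wordApp.Documents.Open(ref filePath, Missing.Value, true);
          wordContentRange = wordDocument.Content;
          textFromWordDocument = wordContentRange.Text;
      }
      catch (COMException)
      {
          // Handle or log the COM exception here; an empty string is returned.
      }
      finally
      {
          // Close without saving changes, then shut down the Word process.
          if (wordDocument != null) { wordDocument.Close(false); }
          if (wordApp != null) { wordApp.Quit(false); }

          // Release each COM object exactly once, innermost first.
          ReleaseComObject(wordContentRange);
          ReleaseComObject(wordDocument);
          ReleaseComObject(wordApp);
      }

      return textFromWordDocument;
  }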
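
The code above calls a 'ReleaseComObject' helper whose body is not shown in this post. Here is a minimal sketch of what such a helper might look like. The class name 'ComHelper' and the null and type checks are assumptions made for illustration, not the original implementation; only 'Marshal.IsComObject' and 'Marshal.ReleaseComObject' are real framework calls.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComHelper
{
    // Hypothetical helper: the article calls ReleaseComObject but does not show it.
    public static void ReleaseComObject(object comObject)
    {
        // Skip nulls and ordinary managed objects. Marshal.ReleaseComObject
        // throws an ArgumentException for anything that is not a COM wrapper.
        if (comObject == null || !Marshal.IsComObject(comObject))
        {
            return;
        }

        // Decrement the reference count of the runtime callable wrapper.
        Marshal.ReleaseComObject(comObject);
    }
}
```

Because the helper ignores null references, it is safe to call from the 'finally' block even when opening the document failed and 'wordDocument' or 'wordContentRange' was never assigned.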

 

Was it helpful? Do let me know if you have any queries. Stay tuned for more updates.

 

 




