Unstructured Python Library

A quick post about using the unstructured Python library to work with documents (e.g. in ETL processes).

Before you can use the library, you’ll need to install it with one of the following commands.

  • pip install unstructured[all-docs]
  • pip install unstructured[docx]

If you only work with Word (docx) files, use the second command, which installs just the dependencies needed for docx.

Why only install what you need?

  • unstructured[docx]: 56 packages installed
  • unstructured[all-docs]: 129 packages installed
  • Difference: 73 additional packages (130% more)
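The percentage above is the number of extra packages divided by the size of the smaller install. Here is a minimal sketch of that arithmetic, plus one way to count the packages in your own environment. The helper names extra_overhead and count_installed are made up for this post, and your counts will likely differ depending on the unstructured version and your platform.

```python
from importlib.metadata import distributions


def extra_overhead(base: int, full: int) -> tuple[int, int]:
    """Return (additional packages, percent increase rounded to a whole number)."""
    additional = full - base
    return additional, round(additional / base * 100)


def count_installed() -> int:
    """Count the distinct distributions visible to the current interpreter."""
    names = {dist.metadata["Name"] for dist in distributions()}
    return len(names)


# Numbers from this post: 56 packages for [docx], 129 for [all-docs].
print(extra_overhead(56, 129))  # (73, 130)

# Run this in a fresh virtual environment after each install to compare.
print(count_installed())
```

Running count_installed() in a clean virtual environment once per install option is a quick way to check the difference for yourself.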

You’ll then import the library into your Python script with statements like the following.

  • from unstructured.partition.docx import partition_docx
  • from unstructured.partition.auto import partition
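To show where those imports end up, here is a small sketch that partitions a file and counts the resulting elements by category. The functions load_elements and summarize are hypothetical helpers written for this post, not part of unstructured. The imports sit inside load_elements as a design choice so the script still loads when only one of the extras is installed.

```python
from collections import Counter


def load_elements(path: str):
    """Partition a file into a list of unstructured elements.

    Hypothetical helper: uses partition_docx for .docx files and falls
    back to the generic partition function for everything else.
    """
    if path.lower().endswith(".docx"):
        from unstructured.partition.docx import partition_docx

        return partition_docx(filename=path)

    from unstructured.partition.auto import partition

    return partition(filename=path)


def summarize(elements) -> Counter:
    """Count elements by their category (for example Title or NarrativeText)."""
    return Counter(element.category for element in elements)
```

With a Word file on disk (say, a hypothetical report.docx), summarize(load_elements("report.docx")) gives a quick overview of what the library found before you pass the elements further down your ETL pipeline.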

Enjoy!
