Windows IT Pro is the leading independent community for IT professionals deploying Microsoft Windows server and client applications and technologies.
  
  


August 06, 2009

Tool Time: Export PDF Text with Pdftotext


If you occasionally need to export text from PDF files, pdftotext might be a handy addition to your personal toolbox. Part of Foo Labs' free Xpdf package, pdftotext is a command-line tool that automates the export process.

Using pdftotext is straightforward. To export the text from a file named vmware.pdf, run pdftotext like this:

pdftotext vmware.pdf

This command automatically creates a new file named vmware.txt in the same folder as vmware.pdf. Where possible, pdftotext removes embedded hyphenation and line breaks. By default, the output also contains a page-break (form feed) character between pages; to leave those out, add the -nopgbrk option:

pdftotext vmware.pdf -nopgbrk

To send the text output to the screen instead of a file, put a hyphen (-) in place of the output file name at the end of the command:

pdftotext vmware.pdf -

You can use multiple parameters together as well:

pdftotext vmware.pdf -nopgbrk -
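If you need to export text from many PDF files at once, the commands above can be wrapped in a small script. The following is a minimal Python sketch, not part of the Xpdf package: the function names build_command and export_folder are hypothetical, and the script assumes pdftotext is on your PATH. It uses only the -nopgbrk option and the - output name described above, in the same order as the examples.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, no_page_breaks=False, to_stdout=False):
    """Return the pdftotext argument list for one PDF file."""
    cmd = ["pdftotext", str(pdf_path)]
    if no_page_breaks:
        cmd.append("-nopgbrk")  # leave page-break characters out of the output
    if to_stdout:
        cmd.append("-")  # a lone hyphen sends the text to the screen
    return cmd


def export_folder(folder, no_page_breaks=True):
    """Run pdftotext on every .pdf file in a folder (hypothetical helper)."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # check=True raises an error if pdftotext reports a failure
        subprocess.run(build_command(pdf, no_page_breaks), check=True)
```

Calling export_folder on a folder should leave one .txt file next to each PDF file, just as running pdftotext by hand would.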

Pdftotext works only with actual text, so you won't be able to export images or scanned text that hasn't had optical character recognition (OCR) performed on it. However, it works extremely well in its specific niche.
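One practical consequence is that a scanned PDF file without OCR text produces little or no output. The Python sketch below uses that fact to flag files that probably need OCR. It is a rough heuristic under stated assumptions: the names has_extractable_text and pdf_has_text and the 20-character threshold are made up for illustration, and pdftotext is assumed to be on your PATH.

```python
import subprocess


def has_extractable_text(text, min_chars=20):
    """Return True if the text holds at least min_chars visible characters."""
    # split() with no arguments drops spaces, newlines, and form feeds
    visible = "".join(text.split())
    return len(visible) >= min_chars


def pdf_has_text(pdf_path):
    """Export a PDF file to standard output and check for real text."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    # Decode leniently; the output encoding depends on pdftotext's settings
    text = result.stdout.decode("utf-8", errors="replace")
    return has_extractable_text(text)
```

A file for which pdf_has_text returns False is a candidate for OCR, although a PDF file that is simply very short would be flagged as well.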

The Xpdf package contains several other tools that can be useful for manipulating PDF files. Pdftoppm and pdftops convert PDF files to the Portable Pixmap (PPM) and PostScript formats, respectively. Pdfimages extracts the images from a PDF file, pdfinfo reports general PDF metadata, and pdffonts lists the fonts a PDF file uses, which helps diagnose font-related problems. If you work with PDF files and like command-line tools, Xpdf is well worth checking out.
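The metadata that pdfinfo reports can also be read by a script. The Python sketch below assumes that pdfinfo prints one "Key: value" pair per line; check the output of your version before relying on it. The names parse_pdfinfo and read_pdfinfo are hypothetical, and pdfinfo is assumed to be on your PATH.

```python
import subprocess


def parse_pdfinfo(output):
    """Turn "Key: value" lines into a dictionary of strings."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as times stay whole
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def read_pdfinfo(pdf_path):
    """Run pdfinfo on one file and return its metadata as a dictionary."""
    result = subprocess.run(
        ["pdfinfo", str(pdf_path)],
        capture_output=True,
        check=True,
    )
    return parse_pdfinfo(result.stdout.decode("utf-8", errors="replace"))
```

Every value comes back as a string, so a field such as a page count would need converting with int() before you do arithmetic on it.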








 
 Windows IT Pro is a Division of Penton Media Inc.
 © 2009 Penton Media, Inc. Terms of Use | Privacy Statement