PDFspy

A product of Apago

PDFspy is the ultimate “get info” utility for your PDF documents. It can extract a comprehensive list of attributes from a PDF file into an XML-based format.

Version 1.1 adds several features and enhancements, including:

  • Support for PDF 1.7 (Acrobat 9)
  • Improved reliability with a wider variety of PDF files
  • Element now shows CMYK separations that are actually used by text and vector elements
  • New element that shows the number of shading objects in the PDF file
  • Several new checks for -validate option
  • Output is again written to stdout when the -o option is not used; the -quiet option is recommended when writing to stdout
  • Fixed calculation of page labels
  • Improved text extraction algorithm
  • Calculates color simulation values for ICCBased, Separation and DeviceN colorspaces
  • Improved Unicode, ISO Latin and AdobePDF character set support

Some examples of the many types of information PDFspy can extract: 

  • Page information (count, size, boxes)
  • Font usage (name, type, embedding & subset status, use of Unicode)
  • Colorspaces used (alternates, separation names, index bases)
  • Images (size, resolution, compression, colorspace)
  • Use of transparency, smooth shadings and patterns
  • Presence (or absence) of hidden text and optional content/layers
  • Hyperlinks (size, location and destination)
  • Annotations (size, location, type, contents, colors)
  • PDF/X compliance (including output intent details)
  • Metadata (info dictionary & XMP)
  • Security and Encryption settings

Example flows

Create HTML preflight reporting

PDFspy extracts information about each page in a PDF and writes it to an XML report. The XSLT Transform element then applies a stylesheet to convert the XML into a formatted HTML preflight report organized by page.
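
The flow itself relies on an XSLT stylesheet. As a rough illustration of the same XML-to-HTML step, the sketch below swaps in Python's standard library instead. The report layout is hypothetical: the element and attribute names (report, page, number, width, height, font, embedded) and the sample file name are assumptions made for this example, not PDFspy's actual schema.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical report layout; PDFspy's actual element names may differ.
SAMPLE_REPORT = """\
<report file="brochure.pdf">
  <page number="1" width="612" height="792">
    <font name="Helvetica" embedded="false"/>
  </page>
  <page number="2" width="612" height="792">
    <font name="Minion-Regular" embedded="true"/>
  </page>
</report>
"""


def report_to_html(report_xml: str) -> str:
    """Render one HTML section per page of the (assumed) XML report."""
    root = ET.fromstring(report_xml)
    title = html.escape(root.get("file", "unknown"))
    parts = [f"<html><body><h1>Preflight report: {title}</h1>"]
    for page in root.iter("page"):
        number = html.escape(page.get("number", "?"))
        size = html.escape(f'{page.get("width", "?")} x {page.get("height", "?")}')
        parts.append(f"<h2>Page {number}</h2>")
        parts.append(f"<p>Size: {size}</p>")
        parts.append("<ul>")
        for font in page.iter("font"):
            name = html.escape(font.get("name", "unnamed"))
            # Flag fonts that are not embedded, a common preflight problem.
            status = "embedded" if font.get("embedded") == "true" else "NOT embedded"
            parts.append(f"<li>{name}: {status}</li>")
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(report_to_html(SAMPLE_REPORT))
```

In a real flow the stylesheet would carry this logic; the point here is only that grouping the output by page follows directly from iterating over per-page elements in the report.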

Sort black and color jobs

The flow uses Apago's PDFspy to examine the contents of a PDF and determine whether it contains color elements or only black ones.
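
One possible way to make that decision from the XML report is sketched below. It assumes, hypothetically, that the report lists each separation actually used as a separation element with a name attribute; the real PDFspy element names may differ, and content in RGB or other colorspaces is not handled here.

```python
import xml.etree.ElementTree as ET

# Plates that still count as a black-only job (an assumption for this sketch).
BLACK_PLATES = {"Black"}


def classify_job(report_xml: str) -> str:
    """Return "black" if every used separation is black, otherwise "color"."""
    root = ET.fromstring(report_xml)
    # Collect the names of all separations the (assumed) report says are used.
    used = {sep.get("name") for sep in root.iter("separation")}
    # A report with no separations at all is treated as black-only.
    return "black" if used <= BLACK_PLATES else "color"


if __name__ == "__main__":
    sample = '<report><separation name="Cyan"/><separation name="Black"/></report>'
    print(classify_job(sample))
```

The subset test keeps the rule easy to extend: adding a spot color that should print on the black device only requires adding its name to BLACK_PLATES.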