Restricted and confidential property of Oracle.
Solely for use by recipient under agreement forbidding disclosure.
often disable many of these techniques even if some hidden JavaScript remains in the file. However,
Clean Content cannot guarantee that all mechanisms of activating JavaScript can be prevented.
Actions
The PDF file format supports a set of interactive features called actions. Actions can be associated
with outline items, annotations, form fields, individual pages, or the document as a whole. Example
actions include jumping to a particular destination in a document, thread, or URI location, launching
an external file, playing a sound or movie, importing or submitting form data, executing JavaScript
code, and numerous other actions. An action can be triggered based on specific user or document
interactions like opening the document, viewing a page, or selecting an outline item. Each triggering
event can execute one or more actions in sequence.
Some actions are innocuous and present no obvious risk. Other actions can be
dangerous and have been used to introduce malware into PDF documents. For
example, a user may open a document that automatically launches a malicious executable or runs
malicious JavaScript code as soon as the user opens the document.
Clean Content can scrub every type of PDF action from a document. Each type of
action can be targeted for scrubbing individually, so that a particular application
environment can dictate the level of risk each type of action presents. The list of PDF actions that
can be scrubbed includes GoTo, GoToR, GoToE, Launch, Thread, URI, Sound, Movie, Hide, Named,
Set OCG State, Rendition, GoTo3DView, Rich Media, JavaScript, SubmitForm, Reset Form,
Import Data, Transition, and a general category of Unknown that applies to actions that may be
added to the PDF format at a later date.
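The per-type targeting and the Unknown catch-all described above can be modeled with a short
Python sketch. It is illustrative only and does not show the Clean Content API: the names
KNOWN_ACTION_TYPES, scrub_category, and is_scrub_target are hypothetical, and the action
type strings are simply the names used in the list above.

```python
# Hypothetical model of per-type scrub targeting; not the Clean Content API.
KNOWN_ACTION_TYPES = frozenset({
    "GoTo", "GoToR", "GoToE", "Launch", "Thread", "URI", "Sound", "Movie",
    "Hide", "Named", "Set OCG State", "Rendition", "GoTo3DView",
    "Rich Media", "JavaScript", "SubmitForm", "Reset Form", "Import Data",
    "Transition",
})


def scrub_category(action_type):
    """Map an action type to its scrub category.

    Any type that is not in the known list falls into the general
    "Unknown" category.
    """
    return action_type if action_type in KNOWN_ACTION_TYPES else "Unknown"


def is_scrub_target(action_type, targets):
    """Return True if the action's category was selected for scrubbing."""
    return scrub_category(action_type) in targets


# Example: scrub Launch actions and anything unrecognized, keep the rest.
targets = {"Launch", "Unknown"}
print(is_scrub_target("Launch", targets))        # True
print(is_scrub_target("FutureAction", targets))  # True
print(is_scrub_target("GoTo", targets))          # False
```

Including Unknown in the target set is what lets a configuration cover action types that do
not exist yet.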
The level of risk represented by each type of action depends heavily on the origin and lifecycle of
the document. The Launch and JavaScript actions provide a mechanism to execute malicious code.
The Sound, Movie, Rendition, and Rich Media actions provide a mechanism to execute media
players that may target known exploits in those players. The GoToR, GoToE, and URI actions may
link to external resources that present their own risk when activated. The various form actions may
cause data to be retrieved from or sent to an external server. The Hide and Set OCG State actions
may cause specific content in the PDF document to remain hidden from view. These interactive
features, when leveraged appropriately, allow PDF documents to provide a rich level of
functionality ranging from interactive presentations to powerful forms processing.
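One way to turn the risk groupings above into a set of scrub targets is sketched below in
Python. The group labels, the RISK_GROUPS table, and the targets_for function are hypothetical
and are not Clean Content option names. Which actions count as "form actions" is also an
assumption made for this example.

```python
# Risk groups paraphrased from the paragraph above. The labels are
# hypothetical; they are not Clean Content option names.
RISK_GROUPS = {
    "code_execution": {"Launch", "JavaScript"},
    "media_player": {"Sound", "Movie", "Rendition", "Rich Media"},
    "external_link": {"GoToR", "GoToE", "URI"},
    # Assumption: these three are the "form actions" the text refers to.
    "form_data": {"SubmitForm", "Reset Form", "Import Data"},
    "hidden_content": {"Hide", "Set OCG State"},
}


def targets_for(groups):
    """Return the union of the action types in the selected risk groups."""
    targets = set()
    for group in groups:
        targets |= RISK_GROUPS[group]
    return targets


# Example: an environment that only worries about code execution and
# links to external resources.
print(sorted(targets_for(["code_execution", "external_link"])))
# ['GoToE', 'GoToR', 'JavaScript', 'Launch', 'URI']
```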
Clean Content is designed to remove the entire sequence of actions if the sequence includes any
action that is a scrub target. This avoids the complexity of removing a subset of actions from a
sequence whose actions may depend on one another. There is one exception to this
rule. If the first action in a sequence is not a scrub target but some other action in the sequence is a
scrub target, then only the first action will remain after scrubbing. This approach was taken because
there are many cases where the first action is a simple GoTo (internal hyperlink) action while
a riskier action follows; leaving the GoTo action allows expected outline and linking behavior
to be maintained while still removing the risky action.
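The sequence rule and its exception can be sketched in Python as follows. This is an
illustrative model only, not Clean Content's implementation: the function name scrub_sequence,
the representation of a sequence as a list of action type names, and the scrub_targets set are
all assumptions made for the example.

```python
def scrub_sequence(actions, scrub_targets):
    """Return the actions that remain after scrubbing one action sequence.

    actions: list of action type names in execution order (hypothetical model).
    scrub_targets: set of action type names selected for scrubbing.
    """
    # No action in the sequence is a target: the sequence is left intact.
    if not any(action in scrub_targets for action in actions):
        return list(actions)
    # Exception: the first action is not a target, so it alone survives.
    if actions[0] not in scrub_targets:
        return [actions[0]]
    # Otherwise the entire sequence is removed.
    return []


targets = {"JavaScript", "Launch"}
print(scrub_sequence(["GoTo", "Named"], targets))             # ['GoTo', 'Named']
print(scrub_sequence(["GoTo", "JavaScript", "URI"], targets))  # ['GoTo']
print(scrub_sequence(["Launch", "GoTo"], targets))            # []
```

Note that in the second example the URI action is dropped even though it is not a scrub
target, because only the first action of an affected sequence is kept.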
Private Application Data
The PDF file format supports storing private data in PDF documents to allow extended functionality
to be created by an application. This data is stored in the Page-Piece dictionary construct in the PDF
file. For example, it is common for applications such as Adobe Illustrator and Adobe Photoshop to