The $boundaryBox parameter
By default, the SetaPDF-Extractor component extracts all content in a page's content stream, including content that is positioned outside the visible area of the page. To give you more control over this behavior, you can pass a page boundary box constant to the \SetaPDF_Extractor::getResultByPageNumber() method.
This demo shows how the chosen boundary box affects the extracted text.
PHP
<?php

// load and register the autoload function
require_once __DIR__ . '/../../../../../bootstrap.php';

$boxes = [
    \SetaPDF_Core_PageBoundaries::MEDIA_BOX,
    \SetaPDF_Core_PageBoundaries::CROP_BOX,
    \SetaPDF_Core_PageBoundaries::BLEED_BOX,
    \SetaPDF_Core_PageBoundaries::TRIM_BOX,
    \SetaPDF_Core_PageBoundaries::ART_BOX,
];

// let the user choose which boundary box to use
$boundaryBox = displaySelect('Page Boundary box:', $boxes);

$path = $assetsDirectory . '/pdfs/misc/Page-Boundaries.pdf';
$document = \SetaPDF_Core_Document::loadByFilename($path);

$extractor = new \SetaPDF_Extractor($document);
// only content inside the selected boundary box is extracted
$result = $extractor->getResultByPageNumber(1, $boxes[$boundaryBox]);

echo '<pre>';
echo htmlspecialchars($result);
echo '</pre>';
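
To compare the boxes side by side instead of selecting one at a time, the same call can be repeated for each constant. The following is only a minimal sketch: extractPerBox() and the array labels are hypothetical names that are not part of SetaPDF, and the usage part assumes that the autoloader is already registered and that $path points to a PDF whose boundary boxes differ. It also assumes the default extraction strategy, which returns a string (as the demo above relies on).

```php
<?php

/**
 * Hypothetical helper (not part of SetaPDF): calls $extract once per
 * boundary box and returns the extracted text keyed by the given label.
 */
function extractPerBox(callable $extract, array $boxes): array
{
    $results = [];
    foreach ($boxes as $label => $box) {
        $results[$label] = (string) $extract($box);
    }

    return $results;
}

// Usage with SetaPDF. This part is skipped unless the library is loaded
// and $path has been set by the surrounding script (both are assumptions).
if (class_exists(\SetaPDF_Extractor::class) && isset($path)) {
    $document = \SetaPDF_Core_Document::loadByFilename($path);
    $extractor = new \SetaPDF_Extractor($document);

    $texts = extractPerBox(
        function ($box) use ($extractor) {
            // same call as in the demo, with a different box each time
            return $extractor->getResultByPageNumber(1, $box);
        },
        [
            'media' => \SetaPDF_Core_PageBoundaries::MEDIA_BOX,
            'crop'  => \SetaPDF_Core_PageBoundaries::CROP_BOX,
            'bleed' => \SetaPDF_Core_PageBoundaries::BLEED_BOX,
            'trim'  => \SetaPDF_Core_PageBoundaries::TRIM_BOX,
            'art'   => \SetaPDF_Core_PageBoundaries::ART_BOX,
        ]
    );

    // print how much text each box yields
    foreach ($texts as $label => $text) {
        printf("%-6s %d bytes\n", $label, strlen($text));
    }
}
```

Passing the extraction step as a callback keeps the helper independent of the library, so it can be tried with a stub before pointing it at a real document.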