The $boundaryBox parameter
By default, the SetaPDF-Extractor component extracts all content in a page's content stream,
including content that is positioned outside the visible area of the page. To give you more control
over this behavior, you can pass a page boundary box constant
to the \setasign\SetaPDF2\Extractor\Extractor::getResultByPageNumber() method.
This demo shows its behavior.
PHP
<?php
use setasign\SetaPDF2\Core\Document;
use setasign\SetaPDF2\Core\PageBoundaries;
use setasign\SetaPDF2\Extractor\Extractor;
// load and register the autoload function
require_once __DIR__ . '/../../../../../bootstrap.php';

// the page boundary boxes offered for selection
$boxes = [
    PageBoundaries::MEDIA_BOX,
    PageBoundaries::CROP_BOX,
    PageBoundaries::BLEED_BOX,
    PageBoundaries::TRIM_BOX,
    PageBoundaries::ART_BOX,
];
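// let the user choose a box; the returned value is used as the key into $boxes
$boundaryBox = displaySelect('Page Boundary box:', $boxes);

$path = $assetsDirectory . '/pdfs/misc/Page-Boundaries.pdf';
$document = Document::loadByFilename($path);

$extractor = new Extractor($document);
// extract page 1, passing the selected boundary box as the second argument
$result = $extractor->getResultByPageNumber(1, $boxes[$boundaryBox]);
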
echo '<pre>';
echo htmlspecialchars($result);
echo '</pre>';
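
To compare all boxes in one run instead of selecting them one at a time, the demo can be reduced to a loop. The following is a minimal sketch, not part of the original demo: it assumes that SetaPDF is loaded through a Composer autoloader at vendor/autoload.php, that a file named Page-Boundaries.pdf sits next to the script (both paths are hypothetical), and that the default extraction strategy returns a string, as the htmlspecialchars() call above suggests. The array keys are labels chosen here for output only.

```php
<?php

use setasign\SetaPDF2\Core\Document;
use setasign\SetaPDF2\Core\PageBoundaries;
use setasign\SetaPDF2\Extractor\Extractor;

// Assumption: SetaPDF is available through Composer; adjust to your own autoloader.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical input path; replace it with a PDF of your own.
$path = __DIR__ . '/Page-Boundaries.pdf';

// The keys are illustrative labels used only in the output below.
$boxes = [
    'MediaBox' => PageBoundaries::MEDIA_BOX,
    'CropBox'  => PageBoundaries::CROP_BOX,
    'BleedBox' => PageBoundaries::BLEED_BOX,
    'TrimBox'  => PageBoundaries::TRIM_BOX,
    'ArtBox'   => PageBoundaries::ART_BOX,
];

$document = Document::loadByFilename($path);
$extractor = new Extractor($document);

foreach ($boxes as $label => $box) {
    // Same call as in the demo, repeated once per boundary box.
    $result = $extractor->getResultByPageNumber(1, $box);

    // Assumes $result is a string (default strategy, as in the demo above).
    echo $label . ': ' . strlen($result) . " bytes extracted\n";
}
```

Run from the command line, this prints one line per box. Differences in the reported lengths indicate that the chosen box changes which content is extracted from the page.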
