Circular references in the Page Tree cause PDF::Reader to crash with SystemStackError #530

@tomascco

Pages-tree-refs.pdf (source)
Running the following script with the attached PDF raises the following error:

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "pdf-reader"
end

PDF::Reader.new("Pages-tree-refs.pdf").pages
# /usr/local/bundle/gems/pdf-reader-2.12.0/lib/pdf/reader/reference.rb:65:in `hash': stack level too deep (SystemStackError)

This is caused by a circular reference with Page Tree objects:

% ...
1 0 obj
  << /Type /Catalog
     /Pages 2 0 R
  >>
endobj

2 0 obj
  << /Type /Pages
     /Kids [6 0 R 3 0 R]
     /Count 2
     /MediaBox [0 0 595 842]
  >>
endobj

3 0 obj
  << /Type /Pages
     /Kids [4 0 R]
     /Count 1
     /MediaBox [0 0 595 842]
  >>
endobj

4 0 obj
  << /Type /Pages
     /Kids [5 0 R]
     /Count 1
     /MediaBox [0 0 595 842]
  >>
endobj

5 0 obj
  << /Type /Pages
     /Kids [3 0 R]
     /Count 1
     /MediaBox [0 0 595 842]
  >>
endobj
% ...

Here we can see that 2 0 R is the root, which has two children: 6 0 R and the problematic 3 0 R:

3 0 R --> 4 0 R --> 5 0 R --> 3 0 R <-- the cycle restarts here.
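
One possible way to guard against this kind of cycle is to track the /Pages nodes on the current path while walking the tree, and to bail out when a node shows up among its own ancestors. The sketch below is only an illustration, not PDF::Reader's actual traversal code: `OBJECTS`, `collect_pages`, and `CircularPageTreeError` are hypothetical names, the object table is a plain Hash standing in for the parsed file, and object 6 is assumed to be a leaf /Page because its definition is elided in the excerpt above.

```ruby
require "set"

# Hypothetical stand-in for the parsed object table: object number => dictionary.
# Object 6 is ASSUMED to be a leaf /Page; the excerpt does not show it.
OBJECTS = {
  2 => { Type: :Pages, Kids: [6, 3] },
  3 => { Type: :Pages, Kids: [4] },
  4 => { Type: :Pages, Kids: [5] },
  5 => { Type: :Pages, Kids: [3] },
  6 => { Type: :Page }
}.freeze

# Hypothetical error class, used here instead of letting the stack overflow.
class CircularPageTreeError < StandardError; end

# Depth-first walk that returns the object numbers of all leaf /Page nodes.
# `ancestors` holds the /Pages nodes on the current path; meeting one of
# them again means the tree loops back on itself.
def collect_pages(objects, ref, ancestors = Set.new)
  raise CircularPageTreeError, "cycle detected at object #{ref}" if ancestors.include?(ref)

  node = objects.fetch(ref)
  return [ref] if node[:Type] == :Page

  path = ancestors + [ref] # new Set, so sibling branches do not share state
  node.fetch(:Kids, []).flat_map { |kid| collect_pages(objects, kid, path) }
end

begin
  collect_pages(OBJECTS, 2)
rescue CircularPageTreeError => e
  puts e.message # prints "cycle detected at object 3"
end
```

Tracking only the current path (rather than every node ever visited) flags true cycles without rejecting a node that is merely referenced from two different parents. Whether such sharing should also be treated as an error is a separate design question.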

I would like to take a shot at solving this. May I?

Context: I've been using PDF::Reader as a dependency of a gem created for my undergraduate thesis (https://github.com/tomascco/rubrik). As part of my research, I've tested PDF::Reader against some of the PDFs on the pdf.js repository (https://github.com/mozilla/pdf.js/tree/master/test/pdfs) and found some cases like this one.

I'd also like to give some feedback as someone who has used PDF::Reader as a dependency for a higher-level PDF interface.

Would these patches and suggestions be welcome? @yob
