  Puppet / PUP-799

speedup re-parsing by serializing valid parse result



    • Epic
    • Status: Closed
    • Normal
    • Resolution: Won't Do
    • None
    • None
    • Compiler
    • None
    • Parser Goes Native
    • Agent
    • 5


      The new parser/evaluator can gain a speedup by reading the AST from a fast-loading, pre-parsed and validated serialization format.
      A simple test (using Ruby's Marshal) shows that this could speed up startup by a factor of 2-3. This is very promising, as it also means that we can slot in a native parser to produce the serialized AST when it does not exist. Using the native parser on the first parse would also give a great boost to initial parsing, but it cannot break the 2-3x speedup barrier, since we still need to read and instantiate the AST. With some luck we may gain more if our deserialization can beat Marshal, and we may get better throughput by using tabulation and other performance tricks in the serializer.
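      The caching idea can be sketched as follows. The helper names (`load_or_parse`, `expensive_parse`) and the trivial array "AST" are hypothetical stand-ins, not Puppet's actual API; only the use of Marshal for dumping and loading the parse result comes from the ticket.

```ruby
require 'tmpdir'

# Minimal sketch of the caching idea (hypothetical helpers, not Puppet's
# actual API): parse only when no up-to-date serialized AST exists;
# otherwise load the pre-parsed, pre-validated result with Marshal.
def load_or_parse(source_path, cache_path)
  if File.exist?(cache_path) && File.mtime(cache_path) >= File.mtime(source_path)
    # Fast path: deserialize the already validated AST.
    Marshal.load(File.binread(cache_path))
  else
    ast = expensive_parse(File.read(source_path)) # stands in for the real parser
    File.binwrite(cache_path, Marshal.dump(ast))  # populate the cache
    ast
  end
end

# Toy "parser" standing in for the real one: builds a trivial array AST.
def expensive_parse(text)
  text.split.map { |word| [:word, word] }
end

Dir.mktmpdir do |dir|
  src   = File.join(dir, 'site.pp')
  cache = File.join(dir, 'site.pp.ast')
  File.write(src, 'notify hello')
  first  = load_or_parse(src, cache) # parses, then writes the cache
  second = load_or_parse(src, cache) # loads the Marshal dump instead
  puts first == second               # true
end
```

      The mtime comparison is the simplest staleness check; a real implementation would also need to invalidate the cache when the parser version changes.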

      This opens up further possibilities, such as pre-parsed and validated modules that are signed and ready to be used (compare to a jar of class files).

      We are currently exploring the implementation and writing design documents. Some of the features required to deliver this have already been implemented.


      A complication when serializing is how to handle location information, which is currently computed lazily. Computing and storing it in each element is costly and space inefficient. If, instead, the source position and length are stored in the model elements (they are now in an adapter), the model could rely on location data being kept at the root of the model, and thus would not need a Locator adapter per element. The adapter could be reconstructed on demand.
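      The root-Locator idea can be sketched as follows. The class and method names are illustrative, not Puppet's actual classes: one Locator per parse result holds the source, each node stores only its offset and length, and line numbers are computed on demand.

```ruby
# One Locator shared by all nodes of a model. Each node stores only its
# character offset and length; line numbers are derived on demand from a
# precomputed table of line starts. (Illustrative names, not Puppet's API.)
class Locator
  def initialize(source_text)
    @text = source_text
    # Precompute the offset at which each line starts.
    @line_starts = [0]
    source_text.each_char.with_index { |c, i| @line_starts << i + 1 if c == "\n" }
  end

  # 1-based line number of the last line starting at or before the offset.
  def line_for_offset(offset)
    @line_starts.rindex { |s| s <= offset } + 1
  end

  # Source text covered by a node's span.
  def extract(offset, length)
    @text[offset, length]
  end
end

# A model element carries only its span, not a Locator of its own.
Node = Struct.new(:offset, :length)

locator = Locator.new("a = 1\nb = 2\n")
node = Node.new(6, 5)                           # covers "b = 2"
puts locator.line_for_offset(node.offset)       # prints 2
puts locator.extract(node.offset, node.length)  # prints b = 2
```

      Because nodes hold only two integers, serializing them is cheap, and a per-element adapter exposing richer location data could be rebuilt lazily from the root Locator.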

      The one remaining issue is how to handle the source text; the Locator has the ability to retrieve the source text for an expression. We could skip this support, make it optional (used only if the source happens to be available), rely on source and model being in sync and load the text on demand (though they may be out of sync), or store a copy of the source in the serialized model.

      There are pros and cons to all approaches. Not storing the source seems most reasonable, as the ability to refer to source is mostly valuable when reporting errors, and a source file with errors never produces a serialized model.
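      For the "load the text on demand" alternative above, the out-of-sync risk could be guarded against by storing a digest of the source in the serialized model and verifying it before trusting an extract. This is an illustrative sketch of that option, not what the ticket implemented:

```ruby
require 'digest'
require 'tmpdir'

# The serialized model stores only a SHA-256 digest of the source; a later
# extract re-reads the file and verifies the digest, so a stale extract is
# detected rather than silently wrong. (Illustrative class, not Puppet's API.)
class SourceRef
  attr_reader :path, :digest

  def initialize(path)
    @path   = path
    @digest = Digest::SHA256.file(path).hexdigest
  end

  # Returns the source text for the span, or nil if the file has changed.
  def extract(offset, length)
    text = File.read(@path)
    return nil unless Digest::SHA256.hexdigest(text) == @digest
    text[offset, length]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'site.pp')
  File.write(path, "notify { 'x': }")
  ref = SourceRef.new(path)
  puts ref.extract(0, 6)       # prints notify
  File.write(path, 'changed')  # source drifts out of sync
  puts ref.extract(0, 6).nil?  # true: stale source is detected
end
```

      The cost is one file read and digest per extract, which is acceptable if extracts happen only on the error-reporting path.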


              Unassigned
              Henrik Lindberg
              Kurt Wall


