摘要:Syntax errors are generally easy to fix for humans, but not for parsers in general nor LR parsers in particular. Traditional "panic mode" error recovery, though easy to implement and applicable to any grammar, often leads to a cascading chain of errors that drown out the original. More advanced error recovery techniques suffer less from this problem but have seen little practical use because their typical performance was seen as poor, their worst case unbounded, and the repairs they reported arbitrary. In this paper we introduce the CPCT algorithm, and an implementation of that algorithm, that address these issues. First, CPCT reports the complete set of minimum cost repair sequences for a given location, allowing programmers to select the one that best fits their intention. Second, on a corpus of 200,000 real-world syntactically invalid Java programs, CPCT is able to repair 98.37%±0.017% of files within a timeout of 0.5s. Finally, CPCT uses the complete set of minimum cost repair sequences to reduce the cascading error problem, where incorrect error recovery causes further spurious syntax errors to be identified. Across the test corpus, CPCT reports 435,812±473 error locations to the user, reducing the cascading error problem substantially relative to the 981,628±0 error locations reported by panic mode.
关键词:Parsing; error recovery; programming languages