SuperC: Parsing all of C by taming the preprocessor

Paul Gazzillo, Robert Grimm

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

C tools, such as source browsers, bug finders, and automated refactorings, need to process two languages: C itself and the preprocessor. The latter improves expressivity through file includes, macros, and static conditionals. But it operates only on tokens, making it hard to even parse both languages. This paper presents a complete, performant solution to this problem. First, a configuration-preserving preprocessor resolves includes and macros yet leaves static conditionals intact, thus preserving a program's variability. To ensure completeness, we analyze all interactions between preprocessor features and identify techniques for correctly handling them. Second, a configuration-preserving parser generates a well-formed AST with static choice nodes for conditionals. It forks new subparsers when encountering static conditionals and merges them again after the conditionals. To ensure performance, we present a simple algorithm for table-driven Fork-Merge LR parsing and four novel optimizations. We demonstrate the effectiveness of our approach on the x86 Linux kernel.
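
The idea of keeping static conditionals as choice nodes can be illustrated with a small sketch. It is not the paper's implementation and does no LR parsing: it only handles `#ifdef NAME` / `#else` / `#endif` over whitespace-separated tokens. The names `Choice`, `preserve_conditionals`, and `project`, as well as the `CONFIG_64BIT` example, are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Choice:
    """Static choice node: one alternative per branch of a conditional."""
    condition: str
    then_branch: list
    else_branch: list


def preserve_conditionals(lines):
    """Build a nested list of tokens and Choice nodes from source lines.

    Toy subset only: #ifdef NAME, #else, #endif; no macros or includes.
    """
    root = []
    # Each stack entry pairs the open Choice (or None) with the list being filled.
    stack = [(None, root)]
    for line in lines:
        text = line.strip()
        if text.startswith("#ifdef"):
            choice = Choice(text.split()[1], [], [])
            stack[-1][1].append(choice)
            stack.append((choice, choice.then_branch))
        elif text == "#else":
            choice, _ = stack.pop()
            stack.append((choice, choice.else_branch))
        elif text == "#endif":
            stack.pop()
        elif text:
            stack[-1][1].extend(text.split())
    return root


def project(nodes, defined):
    """Recover the token list of one configuration from the choice tree."""
    out = []
    for node in nodes:
        if isinstance(node, Choice):
            taken = node.then_branch if node.condition in defined else node.else_branch
            out.extend(project(taken, defined))
        else:
            out.append(node)
    return out


SOURCE = """
int size =
#ifdef CONFIG_64BIT
8
#else
4
#endif
;
""".splitlines()

tree = preserve_conditionals(SOURCE)
```

Calling `project(tree, {"CONFIG_64BIT"})` selects the first branch and `project(tree, set())` the second, while `tree` itself keeps both alternatives in a single structure, which is the property the abstract calls preserving a program's variability.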

Original language: English (US)
Title of host publication: PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation
Pages: 323-334
Number of pages: 12
DOIs: https://doi.org/10.1145/2254064.2254103
State: Published - 2012
Event: 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI'12 - Beijing, China
Duration: Jun 11 2012 → Jun 16 2012

Other

Other: 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI'12
Country: China
City: Beijing
Period: 6/11/12 → 6/16/12

Fingerprint

Macros
Linux

Keywords

  • C
  • Fork-merge LR parsing
  • LR parsing
  • Preprocessor
  • SuperC

ASJC Scopus subject areas

  • Software

Cite this

Gazzillo, P., & Grimm, R. (2012). SuperC: Parsing all of C by taming the preprocessor. In PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 323-334) https://doi.org/10.1145/2254064.2254103

@inproceedings{637faeee82aa4010ac0f091ada582960,
title = "SuperC: Parsing all of C by taming the preprocessor",
abstract = "C tools, such as source browsers, bug finders, and automated refactorings, need to process two languages: C itself and the preprocessor. The latter improves expressivity through file includes, macros, and static conditionals. But it operates only on tokens, making it hard to even parse both languages. This paper presents a complete, performant solution to this problem. First, a configuration-preserving preprocessor resolves includes and macros yet leaves static conditionals intact, thus preserving a program's variability. To ensure completeness, we analyze all interactions between preprocessor features and identify techniques for correctly handling them. Second, a configuration-preserving parser generates a well-formed AST with static choice nodes for conditionals. It forks new subparsers when encountering static conditionals and merges them again after the conditionals. To ensure performance, we present a simple algorithm for table-driven Fork-Merge LR parsing and four novel optimizations. We demonstrate the effectiveness of our approach on the x86 Linux kernel.",
keywords = "C, Fork-merge LR parsing, LR parsing, Preprocessor, SuperC",
author = "Paul Gazzillo and Robert Grimm",
year = "2012",
doi = "10.1145/2254064.2254103",
language = "English (US)",
isbn = "9781450312059",
pages = "323--334",
booktitle = "PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation",

}

TY - GEN

T1 - SuperC

T2 - Parsing all of C by taming the preprocessor

AU - Gazzillo, Paul

AU - Grimm, Robert

PY - 2012

Y1 - 2012

AB - C tools, such as source browsers, bug finders, and automated refactorings, need to process two languages: C itself and the preprocessor. The latter improves expressivity through file includes, macros, and static conditionals. But it operates only on tokens, making it hard to even parse both languages. This paper presents a complete, performant solution to this problem. First, a configuration-preserving preprocessor resolves includes and macros yet leaves static conditionals intact, thus preserving a program's variability. To ensure completeness, we analyze all interactions between preprocessor features and identify techniques for correctly handling them. Second, a configuration-preserving parser generates a well-formed AST with static choice nodes for conditionals. It forks new subparsers when encountering static conditionals and merges them again after the conditionals. To ensure performance, we present a simple algorithm for table-driven Fork-Merge LR parsing and four novel optimizations. We demonstrate the effectiveness of our approach on the x86 Linux kernel.

KW - C

KW - Fork-merge LR parsing

KW - LR parsing

KW - Preprocessor

KW - SuperC

UR - http://www.scopus.com/inward/record.url?scp=84863470851&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863470851&partnerID=8YFLogxK

U2 - 10.1145/2254064.2254103

DO - 10.1145/2254064.2254103

M3 - Conference contribution

AN - SCOPUS:84863470851

SN - 9781450312059

SP - 323

EP - 334

BT - PLDI'12 - Proceedings of the 2012 ACM SIGPLAN Conference on Programming Language Design and Implementation

ER -