Context-Free Path Querying: Algorithms and Applications
This project is a space for the development, evaluation, and comparison of context-free path querying (CFPQ) algorithms.
Main parts:
 Data set for CFPQ evaluation.
 Collection of CFPQ algorithms implemented on top of the GraphBLAS API.
 Our fork of RedisGraph, where we are working on a CFPQ extension.
 Meerkat is a parser combinator library for CFPQ.
 CoFRA is a CFL-reachability-based framework for developing static analysis tools. It contains ReSharper and Rider plugins as a demo.
 YaccConstructor is a sandbox for CFPQ algorithms development.
Participants
Publications

ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium
Regular expressions are used in SPARQL property paths to query RDF graphs. However, regular expressions can only define the most limited class of languages, called regular languages. Context-free languages are a wider class containing all regular languages. There are no context-free expressions to define them, so it is necessary to write grammars. We propose an extension of regular expressions, called recursive expressions, to support the definition of a subset of context-free languages. The goal of our work is therefore to provide simple operators allowing the definition of languages as close as possible to context-free languages.

ADBIS 2020. Advances in Databases and Information Systems. Lecture Notes in Computer Science.
Context-free path queries (CFPQ) extend regular path queries (RPQ) by allowing context-free grammars to be used as constraints for paths. Algorithms for CFPQ are actively developed, but J. Kuijpers et al. have recently concluded that existing algorithms are not performant enough to be used in real-world applications. Thus the development of new algorithms for CFPQ is justified. In this paper, we provide a new CFPQ algorithm which is based on linear algebra operations such as the Kronecker product and transitive closure, and which handles grammars presented as recursive state machines. Thus, the proposed algorithm can be implemented by using high-performance libraries and modern parallel hardware. Moreover, it avoids grammar growth, which opens the possibility of query optimization.
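The idea in the abstract above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function name `cfpq_kronecker`, the edge-triple encoding, and the `boxes` dictionary are assumptions made for this example, and the transitive closure is computed by a plain graph search from each start state instead of by matrix operations from a high-performance library. Terminal and nonterminal names are assumed to be disjoint.

```python
def cfpq_kronecker(n, graph_edges, rsm_edges, boxes):
    """Toy Kronecker-product-style CFPQ (illustrative only).

    n           -- number of graph vertices, numbered 0..n-1
    graph_edges -- iterable of (u, label, v) triples
    rsm_edges   -- iterable of (p, label, q) triples; a label is a
                   terminal or a nonterminal name
    boxes       -- {nonterminal: (start_state, set_of_final_states)}
    Returns {nonterminal: set of (u, v) vertex pairs}.
    """
    edges = set(graph_edges)
    while True:
        # Kronecker product of the RSM and the current graph:
        # (p, u) -> (q, v) iff p -l-> q in the RSM and u -l-> v in the graph.
        product = {}
        for p, rsm_label, q in rsm_edges:
            for u, graph_label, v in edges:
                if rsm_label == graph_label:
                    product.setdefault((p, u), set()).add((q, v))

        # Rows of the (reflexive) transitive closure for the start states.
        derived = set()
        for nonterminal, (start, finals) in boxes.items():
            for u in range(n):
                seen = {(start, u)}
                stack = [(start, u)]
                while stack:
                    for nxt in product.get(stack.pop(), ()):
                        if nxt not in seen:
                            seen.add(nxt)
                            stack.append(nxt)
                for state, v in seen:
                    if state in finals:
                        derived.add((u, nonterminal, v))

        if derived <= edges:      # fixpoint: nothing new was derived
            break
        edges |= derived          # new nonterminal-labeled edges feed the next round

    return {nt: {(u, v) for u, label, v in edges if label == nt}
            for nt in boxes}


# RSM for S -> a S b | a b: states 0..3, start state 0, final state 3.
rsm = [(0, "a", 1), (1, "S", 2), (1, "b", 3), (2, "b", 3)]
# Chain graph 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4.
graph = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
result = cfpq_kronecker(5, graph, rsm, {"S": (0, {3})})
print(result)  # {'S': {(0, 4), (1, 3)}} (set order may vary)
```

Because the recursive state machine is used directly, the grammar never has to be rewritten into a normal form, which is the sense in which this family of algorithms avoids grammar growth.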

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
Context-free path querying (CFPQ) is widely used for graph-structured data analysis in different areas. It is crucial to develop highly efficient algorithms for CFPQ since the size of the input data is typically large. We show how to reduce CFPQ evaluation to solving systems of matrix equations over R, a problem for which there exist high-performance solutions. Also, we demonstrate the applicability of our approach to real-world data analysis.

GRADES-NDA'20: Proceedings of the 3rd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
A recent study showed that the applicability of context-free path querying (CFPQ) algorithms with relational query semantics integrated with graph databases is limited because of low performance and high memory consumption of existing solutions. In this work, we implement a matrix-based CFPQ algorithm by using appropriate high-performance libraries for linear algebra and integrate it with the RedisGraph graph database. Also, we introduce a new CFPQ algorithm with single-path query semantics that allows us to extract one found path for each pair of nodes. Finally, we provide the evaluation of our algorithms for both semantics, which shows that the matrix-based CFPQ implementation for the RedisGraph database is performant enough for real-world data analysis.
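As a rough illustration of what a matrix-based CFPQ algorithm with relational query semantics computes, here is a minimal pure-Python sketch. It assumes a grammar in Chomsky normal form without epsilon rules, and it uses nested lists where a real implementation would use a sparse linear algebra library. The names `bool_matmul` and `cfpq_matrix` and the rule encodings are hypothetical choices for this example, not the API of RedisGraph or GraphBLAS.

```python
def bool_matmul(x, y):
    """Boolean product of two square matrices given as lists of lists."""
    n = len(x)
    return [[any(x[i][k] and y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cfpq_matrix(n, edges, terminal_rules, binary_rules):
    """Toy matrix-based CFPQ for a grammar in Chomsky normal form.

    n              -- number of graph vertices, numbered 0..n-1
    edges          -- iterable of (src, label, dst) triples
    terminal_rules -- list of (head, terminal) pairs for rules A -> a
    binary_rules   -- list of (head, left, right) triples for rules A -> B C
    Returns {nonterminal: set of (i, j)} such that some path from i to j
    spells a string derivable from that nonterminal.
    """
    nonterminals = {head for head, _ in terminal_rules}
    for head, left, right in binary_rules:
        nonterminals.update((head, left, right))

    # One n x n Boolean matrix per nonterminal.
    matrices = {a: [[False] * n for _ in range(n)] for a in nonterminals}

    # Initialization from the rules A -> a and the labeled edges.
    for src, label, dst in edges:
        for head, terminal in terminal_rules:
            if terminal == label:
                matrices[head][src][dst] = True

    # Apply T[A] |= T[B] * T[C] for every rule A -> B C until a fixpoint.
    changed = True
    while changed:
        changed = False
        for head, left, right in binary_rules:
            prod = bool_matmul(matrices[left], matrices[right])
            for i in range(n):
                for j in range(n):
                    if prod[i][j] and not matrices[head][i][j]:
                        matrices[head][i][j] = True
                        changed = True

    return {a: {(i, j) for i in range(n) for j in range(n) if m[i][j]}
            for a, m in matrices.items()}


# Grammar for a^n b^n (n >= 1) in CNF: S -> A B | A S1, S1 -> S B, A -> a, B -> b.
terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")]
# Chain graph 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4.
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
answer = cfpq_matrix(5, edges, terminal_rules, binary_rules)
print(sorted(answer["S"]))  # [(0, 4), (1, 3)]
```

The sketch only answers reachability (relational semantics). Extracting one witness path per pair, as the single-path semantics in the abstract requires, would need extra bookkeeping that is not shown here.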

Programming and Computer Software
Path querying with conjunctive grammars is known to be undecidable. There is an algorithm for path querying with linear conjunctive grammars which provides an over-approximation of the result, but there is no algorithm for arbitrary conjunctive grammars. We propose the first algorithm for path querying with arbitrary conjunctive grammars. The proposed algorithm is matrix-based and allows us to efficiently apply GPGPU computing techniques and other optimizations for matrix operations.

Proceedings of the 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
The recently proposed matrix-multiplication-based algorithm for context-free path querying (CFPQ) offloads the most performance-critical parts onto Boolean matrix multiplication. Thus, it is possible to achieve high performance of CFPQ by means of modern parallel hardware and software. In this paper, we provide results of an empirical performance comparison of different implementations of this algorithm on both real-world data and synthetic data for the worst cases.

Logic, Language, Information, and Computation
The Bar-Hillel theorem states that context-free languages are closed under intersection with a regular set. This theorem has a constructive proof and thus provides a formal justification of correctness of the algorithms for applications mentioned above. Mechanization of the Bar-Hillel theorem, therefore, is both a fundamental result of formal language theory and a basis for the certified implementation of the algorithms for applications. In this work, we present the mechanized proof of the Bar-Hillel theorem in Coq.

Proceedings of the Institute for System Programming
One of the problems in graph data analysis is querying for specific paths. Such queries are usually performed by means of a formal grammar that describes the allowed edge labeling of the paths. A path query is said to be evaluated using relational query semantics if its result is the set of triples (A, v1, v2) such that there is a path from v1 to v2 whose edge labels form a string derivable from the nonterminal A. We focus on the Boolean languages that use Boolean grammars to describe the labeling of paths. Although path querying using relational query semantics and Boolean grammars is known to be undecidable, in this work we propose a path querying algorithm which uses relational query semantics and Boolean grammars and approximates the exact solution. To achieve better performance than the naive algorithm, we restrict the considered class of graphs to acyclic graphs.

Proceedings of the 9th ACM SIGPLAN International Symposium on Scala
Transparent integration of a domain-specific language for specification of context-free path queries (CFPQs) into a general-purpose programming language, as well as static checking of errors in queries, may greatly simplify the development of applications using CFPQs. LINQ and ORM can be used for the integration, but they have issues with flexibility: query decomposition and reuse of subqueries are a challenge. Adaptation of the parser combinator technique for path querying may solve these problems. Conventional parser combinators process linear input, and only the Trails library is known to apply this technique for path querying. We demonstrate that it is possible to create general parser combinators for CFPQ which support arbitrary context-free grammars and arbitrary input graphs. We implement a library of such parser combinators and show that it is applicable for realistic tasks.
GRADES-NDA '18: Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)

Proceedings of the 13th Central & Eastern European Software Engineering Conference in Russia (CEE-SECR '17)
There are several solutions for CFPQ, but how to provide a structural representation of the query result which is practical for answer processing and debugging is still an open problem. In this paper we propose a graph parsing technique which allows one to build such a representation with respect to a given grammar in polynomial time and space for an arbitrary context-free grammar and graph. The proposed algorithm is based on the generalized LL parsing algorithm, whereas previous solutions are mostly based on the CYK or Earley algorithms; this reduces time complexity in some cases.

Perspectives of System Informatics
We present a technique for syntax analysis of a regular set of input strings. This problem is relevant for the analysis of string-embedded languages when a host program generates clauses of the embedded language at run time. Our technique is based on a generalization of the RNGLR algorithm, which, inherently, allows us to construct a finite representation of the parse forest for a regularly approximated set of input strings. This representation can be further utilized for semantic analysis and transformations in the context of reengineering, code maintenance, program understanding, etc. The approach in question implements relaxed parsing: non-recognized strings in the approximation set are ignored with no error detection.
Resources

Sources and data set on GitHub
Sources of matrix-based algorithm implementations, sources of the testing system, and the collected data set for CFPQ algorithm evaluation (graphs and queries).