Learning Caching Strategies for Dynamic Workloads on Graph Databases

The project seeks to improve the performance of mixed read-write workloads on graph databases by optimising the page cache. The approach will exploit a range of factors that affect cache performance, such as workload characteristics and the underlying graph topology, rather than relying on generic strategies such as LRU. Of particular interest is the application of machine learning techniques to adapt the replacement policy dynamically as workload and topology evolve over time. The project will be conducted in close collaboration with Neo4j, a world leader in graph database technology.

Start date

2 January 2025

Duration

3.5 years

Application deadline

Funding information

UKRI standard stipend, tuition fees, research training support grant

About

As the volume and complexity of data have changed, particularly in recent years, database systems have begun to evolve along different paths for different use cases. While relational databases have traditionally been popular, graph databases have more recently risen to prominence. A graph database uses nodes (vertices) and relationships (edges) to form a graph (network) that represents entities and the associations between them.

Data caching plays a central role in maintaining low latency and high throughput of data access by ensuring that data items likely to be accessed in the near future are available in main memory. However, most existing caching strategies employed by graph databases do not take graph topology into account when determining which items should be kept in cache. They also do not adapt well to dynamic changes in the graph topology and the query workload.

In this project, we plan to investigate how the topology of a graph can be exploited to optimise the caching strategy, with the aim of improving hit rate and overall throughput. We will also explore how machine learning techniques can be used to dynamically learn an optimal caching policy based on the current topological context and data access patterns.
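To make the contrast concrete, the following is a minimal illustrative sketch in Python, not the project's method and not Neo4j's page cache. It compares a generic LRU policy with a hypothetical topology-aware policy that evicts the cached page belonging to the lowest-degree node. The class names, the `degree` map, and the one-page-per-node simplification are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Baseline: evicts the least recently used page, ignoring graph structure."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page id -> payload, least recently used first

    def access(self, page_id):
        """Return True on a hit, False on a miss (the page is then loaded)."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page_id] = None
        return False


class DegreeAwareCache:
    """Hypothetical variant: evicts the cached page with the lowest node degree."""

    def __init__(self, capacity, degree):
        self.capacity = capacity
        self.degree = degree  # assumed map: page id -> degree of the node it holds
        self.pages = OrderedDict()

    def access(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
            return True
        if len(self.pages) >= self.capacity:
            # min() returns the first minimum in iteration order, so ties
            # between equal-degree pages fall back to least recently used.
            victim = min(self.pages, key=lambda p: self.degree.get(p, 0))
            del self.pages[victim]
        self.pages[page_id] = None
        return False


def hit_rate(cache, trace):
    """Replay a sequence of page accesses and return the fraction of hits."""
    hits = sum(cache.access(p) for p in trace)
    return hits / len(trace)
```

On an access trace that keeps returning to a highly connected "hub" node between visits to low-degree nodes, the degree-aware variant tends to retain the hub page where LRU may evict it. A learned policy of the kind the project proposes would replace the fixed degree heuristic with one adapted to the observed workload.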

The successful candidate will work in the Distributed Systems and Concurrency group at the University of Surrey and benefit from close collaboration with Neo4j, a world leader in graph database technology. The supervisory team has a strong publication record in systems, distributed systems and concurrency, programming languages, and formal verification, including publications at flagship venues such as OSDI, ATC, EuroSys, VLDB, and PODC.

Further details on the proposed project can be found at Learning Caching Strategies for Dynamic Workloads on Graph Databases

Eligibility criteria

Open to candidates who pay UK/home rate fees. See UKCISA for further information.

Applicants to the Computer Science PhD programme are expected to hold a first or upper second-class (2:1) UK degree in a relevant discipline (or an equivalent overseas qualification).

Interest in systems research (e.g., operating systems, distributed and concurrent systems) is a plus. 

An ideal candidate will possess solid coding skills and some experience with empirical performance evaluation.

How to apply

Applications should be submitted via the Computer Science PhD programme page. In place of a research proposal, you should upload a document stating the title of the project that you wish to apply for and the name of the relevant supervisor.

Studentship FAQs

Read our studentship FAQs to find out more about applying and funding.

Contact details

Gregory Chockler
06 BB 02
Telephone: +44 (0)1483 682651
E-mail: g.chockler@surrey.ac.uk
