Learning Caching Strategies for Dynamic Workloads on Graph Databases

Graph databases are becoming increasingly critical in modern AI applications, from social networks to drug discovery, as they efficiently represent and process complex relationships in data. However, their performance faces a significant challenge: the vast disparity between fast memory access and slow disk operations, with disk access being up to 100,000 times slower than memory. 

While modern graph databases prioritize in-memory processing, the sheer volume of data often exceeds available memory, necessitating intelligent caching strategies to enable efficient data movement between memory and disk. However, current caching strategies, designed for traditional relational databases, fail to consider the interconnected nature of graph data, leading to suboptimal performance in graph-based applications.

This research project, supported by Neo4j, the world leader in graph database technology, will seek to develop novel caching strategies for graph databases that exploit the range of factors affecting cache performance (such as workload characteristics and the underlying graph topology) rather than relying on generic strategies such as LRU. The application of machine learning techniques to dynamically adapt the replacement policy as workload and topology evolve over time will be of particular interest. The project will be conducted in close collaboration with Neo4j.

Start date

1 July 2025

Duration

3.5 years

Application deadline

Funding source

Matched funding: University of Surrey and Neo4j

Funding information

  • UKRI standard stipend (£19,237 p.a. 2024/25 rates)
  • Full home or O/S tuition fees (as applicable)
  • Research, training and support grant of up to £3,000 over the project.

About

As the volume and complexity of data have changed, particularly in recent years, database systems have begun to evolve along different paths for different use cases. While relational databases have traditionally been popular, graph databases have more recently risen to prominence. A graph database uses nodes (vertices) and relationships (edges) to create a graph (network) that represents entities and the associations between them.

Data caching plays a central role in maintaining low latency and high throughput of data access by ensuring that data items likely to be accessed in the near future are available in main memory. However, most existing caching strategies employed by graph databases rely on generic mechanisms, such as LRU, and do not take into account the intricate temporal and spatial dependencies specific to graph data. They also do not adapt well to dynamic changes in the graph topology and the query workload.
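To make the generic baseline concrete, the following is a minimal sketch of an LRU cache with hit-rate accounting. It is an illustration only, not Neo4j's implementation: the class name `LRUCache`, its methods, and the access trace are all hypothetical. The cache evicts purely on recency and knows nothing about which nodes are neighbours in the graph.

```python
from collections import OrderedDict


class LRUCache:
    """Generic least-recently-used cache (hypothetical sketch).

    It tracks only recency of access and ignores graph structure entirely.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # keys ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def access(self, node_id):
        """Record an access to node_id, evicting the oldest entry if needed."""
        if node_id in self.items:
            self.items.move_to_end(node_id)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
        self.items[node_id] = True

    def hit_rate(self):
        """Fraction of accesses served from the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical access trace: a traversal that keeps returning to hub node 0.
trace = [0, 1, 0, 2, 0, 3, 0, 4, 1, 2]
cache = LRUCache(capacity=2)
for node_id in trace:
    cache.access(node_id)
print(cache.hit_rate())
```

In this toy trace every hit is on the hub node, and the hub's neighbours are evicted before they are revisited. A policy aware of topology or workload could make different eviction choices, which is the kind of question the project sets out to study.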

In this project, we plan to investigate how a variety of factors, such as workload characteristics and the underlying graph topology, impact caching strategies for graph databases, with the goal of improving hit rate and overall throughput. We will also aim to explore how machine learning techniques can be used to dynamically learn an optimal caching policy based on topological features and current data access patterns.

The successful candidate will work in the Distributed Systems and Concurrency group at the University of Surrey and benefit from close collaboration with Neo4j, a world leader in graph database technology. The supervisory team has a strong publication record in systems, distributed systems and concurrency, programming languages, and formal verification, including publications at flagship venues such as OSDI, ATC, EuroSys, VLDB, and PODC, among others.

Further details on the proposed project can be found at Learning Caching Strategies for Dynamic Workloads on Graph Databases.

External supervisors

Eligibility criteria

Open to any UK or international candidates.

Applicants are expected to hold a first or upper second-class (2:1) UK degree in a relevant discipline (or equivalent overseas qualification).

Interest in systems research (e.g., operating systems, distributed and concurrent systems) is a plus. 

An ideal candidate will possess solid coding skills and some experience with empirical performance evaluation.

How to apply

Applications should be submitted via the Computer Science PhD programme page. In place of a research proposal you should upload a document stating the title of the project that you wish to apply for and the name of the relevant supervisor.

Studentship FAQs

Read our studentship FAQs to find out more about applying and funding.

Contact details

Gregory Chockler
06 BB 02
Telephone: +44 (0)1483 682651
E-mail: g.chockler@surrey.ac.uk

Studentships at Surrey

We have a wide range of studentship opportunities available.