But while much of our research is driven by applications, we also conduct research to optimize various aspects of the data science pipeline.

  • We develop learned index structures so that data extraction is efficient for the specific data that we are dealing with.
  • We develop data summaries so that machine learning algorithms can be executed faster on smaller amounts of data.
  • We develop methods for workload-aware model materialization, i.e., we look for the best ways to store learned models so that our applications can use them as efficiently as possible.
  • Finally, we work on end-to-end optimizations for the data science pipeline. For example, when the data are updated, we consider how to update our models as efficiently as possible, without having to retrain from scratch.
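
To make the last item concrete, here is a minimal sketch of updating a model without retraining from scratch. It assumes a simple least-squares line as a stand-in for the model; the names `IncrementalLineFit` and `fit_from_scratch` are hypothetical and chosen only for illustration, not taken from our actual methods. The model keeps a few running sums, so each new data point costs constant time and no earlier point is revisited.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IncrementalLineFit:
    """Least-squares line y = slope * x + intercept, kept as running sums.

    Illustrative stand-in for a model that supports cheap updates.
    """
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def update(self, x: float, y: float) -> None:
        # O(1) work per new data point; earlier points are never re-read.
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def coefficients(self) -> Tuple[float, float]:
        # Closed-form solution of the normal equations from the running sums.
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two points with distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


def fit_from_scratch(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Baseline for comparison: refit on all points, O(n) per refit."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    points = [(0.0, 1.0), (1.0, 2.5), (2.0, 5.5)]
    model = IncrementalLineFit()
    for x, y in points:
        model.update(x, y)

    # A new observation arrives: update in O(1) instead of refitting.
    new_point = (3.0, 6.0)
    model.update(*new_point)

    print("incremental: ", model.coefficients())
    print("from scratch:", fit_from_scratch(points + [new_point]))
```

Both calls should agree up to floating-point rounding. The same pattern of maintaining sufficient statistics carries over to some richer models, but many models do not admit such an exact constant-time update.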

Video: Algorithmic Data Science - Research