We explore the optimization of queries with client-site user-defined
functions (UDFs). Many UDFs can only be executed at the client site, for
reasons of security, confidentiality, or availability of resources. How
should a query be
optimized to take client-site UDFs into account? We demonstrate that, in this
context, the known execution techniques for expensive server-site UDFs
perform badly, because the network latencies involved cannot be ignored. We
blend
well-known distributed database algorithms with established techniques to
handle expensive server-site UDFs. The resulting query execution techniques
are implemented in the Cornell Predator database system, and we present
performance results to demonstrate their effectiveness.
We also reconsider the question of expensive UDF placement in the context of
client-site UDFs. The known techniques, namely rank ordering, turn out to be
inadequate. We demonstrate query plan optimizations for client-site UDFs and
show their effectiveness in performance tests. Finally, we propose a
System-R-style optimizer for query plans involving client-site operations.