CMU Silicon Valley welcomes Kostas Tsioutsiouliklis

Date/Time: Wednesday, November 4, 2015, 11:30 am (PT) / 2:30 pm (ET) 

Location: CMU Silicon Valley Campus: Bldg 23, Rm 118

Open to Carnegie Mellon students, faculty, and staff only

Set Cover at Web Scale

Given a set of elements U, the classic Set Cover problem requires selecting a minimum size subset A from a family of finite subsets F of U, such that the elements covered by A are the ones covered by F. It naturally occurs in many settings in web search, web mining, and web advertising. The greedy algorithm that iteratively selects a set in F that covers the most uncovered elements yields an optimum (1+ln|U|)-approximation but is inherently sequential. In this work we give the first MapReduce Set Cover algorithm that scales to problem sizes of ~1 trillion elements and runs in log_p(Delta) iterations for a nearly optimum approximation ratio of pln(Delta), where Delta is the cardinality of the largest set in F. This work is in collaboration with Stergios Stergiou and was published at KDD'15. In this lecture, we will go into the technical details of our work but also elaborate on our motivation, which was web search, especially web crawling.

Kostas Tsioutsiouliklis is Director of Content Science at Yahoo! Labs in Sunnyvale, CA. His team focuses on Content Acquisition and Content Understanding for web search and for the Yahoo! homepage. He holds a Ph.D. in Computer Science from Princeton University, and a Diploma in Computer Engineering from the University of Patras, Greece. Prior to Yahoo! Labs, Kostas was a Research Staff Member at NEC Labs, and more recently a member of the Search and Relevance team at Twitter, working on trends.