# REU-CAAR Summer Research Group

*The Department welcomes comments, suggestions and corrections. Send email to editor [at] cs [dot] umd [dot] edu.*

Since 2013, the Computer Science Department has hosted students from around the country to participate in a Research Experience for Undergraduate/Combinatorics and Algorithms for Real Problems/ (REU-CAAR) program each summer. This year, twenty students from around the country have come to do research with professors in the Departments of Computer Science and the Department Electrical and Computer Engineering. Professor William Gasarch and Professor Samir Khuller lead the program, which offers very diverse periences for student researchers. The REU program pairs a professor with two or more student researchers and together, they tackle a research problem. This year's research groups include the following:

**Quantum Computing**

*Professor Andrew Childs* with *Jiahui Liu, Prayaag Venkat, and Justin Yirka*

Quantum bits, or qubits, can store information in a linear combination of basis states corresponding to classical bit strings. By representing and manipulating information using quantum states, some information-processing tasks can be accomplished much more efficiently than with classical information. Several possible projects in this area are available with researchers in the Joint Center for Quantum Information and Computer Science (QuICS, quics.umd.edu). These projects (linked to below) will explore questions related to extracting information from quantum states, the relationship between a multipartite quantum state and its subsystems, the number of qubits that must be exchanged to perform a distributed computation, and protocols for compressing quantum messages when the receiver has partial information about the message.

**Cryptography: Security versus Efficiency**

*Professor Dana Dachman-Soled and Professor Uzi Vishkin with Shir Maimon, Robert Metzger Laura Sullivan-Russett*

Several developments in computer systems during the last several decades have caused performance programmers to increasingly work closely with system architecture features to improve efficiency in critical applications. But, is there an inherent tension between programming for performance and security? In this project, we first identify certain facets of this tension and then explore some of them. We will then consider feasibility of turning this tension into key-recovery attacks on commonly used cryptosystems such as AES and RSA.

**The Mathematics Needed for Kidney Exchange Programs**

*Professor John Dickerson with Linyi Chen, Samsara Counts, and Cameron Moy*

Consider the following scenario:

Alice needs a kidney and Bob is happy to give her his. But Bob's is incompatible! Boo!

Carol needs a kidney and Dan is happy to give her his. But Dan's is incompatible! Boo!

But if Bob's kidney is fine for Carol and Dan's kidney is fine for Alice. Then they could do a kidney exchange where Bob's kidney goes to Carol and Dan's kidney goes to Alice.

This is a simple scenario, perhaps too simple. But imagine if there is a database of people who need kidney's and people who are willing to be donors (there is!). Then one can imagine exchanges like the one above. They may even be far more complex (the longest ever involved a cycle of length 16).

Several mathematical issues arise from making this work. How to tell how good a kidney is? How to balance the many factors of compatability and the difficulty of transporting kidneys?

**Machines that Learn Languages from Interactions**

*Professor Hal Daumé III with Yang Cao, Kealyssa Castillo-Martin, Carl Denton, and Isabella Huang*

Natural language processing systems currently learn mostly from labeled data for a specific task-domain pairs. While this approach is useful, it cannot scale very well. Systems need to be able to learn to improve their performance from naturalistic interaction with users in addition to labeled data. This offers systems the opportunity to test proposed generalizations and receive feedback on their performance. Ideally, this feedback should be acquired naturally, or at a very low cost, through interaction with users.

For instance, machine translation systems trained on large quantities of government documents do very well at translating more government documents, but quite poorly at translating anything else. However, a translation system that is deployed to facilitate conversations between users feedback. The system must balance testing potential generalizations (exploration) with satisfying immediate user needs (exploitation), and must be able to get the most out of user feedback.

This general idea can be explored in many natural language processing applications, eventually leading to fully interactive systems that learn continually throughout their lifetime.

**Using Game Theory on the Quiz show Jeopardy **

*Professor William Gasarch with Jessica Abramson and Natalie Collina*

In the final round of the quiz show Jeopardy Alice has a dollars, Bob has b dollars, Carol has c dollars, and they each need to bet some amount on the last question, knowing just the category. What if all players know all players prob of getting the question right? How much should they bet? Variants can also be asked. This is just one of several projects involving probability that we can work on. And they are all fun!

**Phase Retrieval in Image Processing**

*Professor Thomas Goldstein with Justin Hontz, Valarie McCulloch, and Ziyan Zhong*

There are many applications where computers must create images from data. In some cases this amounts to solving simple linear systems (of the type you've seen in linear algebra). But in other cases this involves solving large non-linear systems of equations (which is a hard problem). This is the Phase Retrievel Problem.

Recently, several new approaches to phase retrieval have emerged, some of which have strong theoretical guarantees. However, despite the recent explosion of work in this area, surprisingly little has been done to build efficient, practical implementations of these new methods, and benchmark them against one another.

In this research project, students will learn about phase retrieval problems and the new tools that have emerged to solve them. Students will then implement several new methods and benchmark them against one another using a variety of real and synthetic datasets. If time permits, students will learn to use MPI for distributed computing, and this tool will be used to solve extremely large problem instances in the cloud. This problem has natural applications in data storage and electron microscopy.

**Data Center Scheduling**

*Professor Samir Khuller with Xi Chen and Tu Luan*

In modern datacenters, technologies like MapReduce divide large problems into small jobs, which need to be scheduled on complex architectures. This project introduced a general framework that converts algorithms for offline scheduling and knapsack-type problems (in which the entire input is known upfront) to the online setting (i.e., the entire problem input is not available upfront). This framework is then used to produce algorithms that have the best known approximation performance for a variety of scheduling problems. The algorithms perform well on real world data sets.