Monday, April 25, 2016

Why I gave your paper a Strong Accept

See also: Why I gave your paper a Strong Reject

I know this blog is mostly about me complaining about academics, but there's a reason I stay engaged with the research community: I learn stuff. Broadly speaking, I think it's incredibly important for industry both to stay abreast of what's going on in the academic world and to have some measure of influence on it. For those reasons, I serve on a few program committees a year and do other things like help review proposals for Google's Faculty Research Award program.

Apart from learning new things, there are other reasons to stay engaged. One is that I get a chance to meet and often work with some incredible colleagues, either professors (to collaborate with) or students (to host as interns and, in many cases, hire as full-time employees later on).

I also enjoy serving on program committees more than just going to conferences and reading papers that have already been published. I feel like it's part of my job to give back and contribute my expertise (such as it is) to help guide the work happening in the research community. Way too many papers could use a nudge in the right direction from someone who knows what's happening in the real world -- as a professor and grad student, I gained a great deal from my interactions with colleagues in industry.

Whenever I serve on a program committee, I make it a point to champion at least a couple of papers at the PC meeting. My colleagues can attest to times I've (perhaps literally) pounded my fist on the table and argued that we need to accept some paper. So to go along with my recent post on why I tend to mark papers as reject, here are some of the reasons that make me excited to give out a Strong Accept.

(Disclaimer: This blog represents my personal opinion. My employer and my dog have nothing to do with it. Well, the dog might have swayed me a little.)

The paper is perfect and flawless. Hah! Just kidding! This never happens. No paper is ever perfect -- far from it. Indeed, I often champion papers with significant flaws in the presentation, the ideas, or the evaluation. What I try to do is decide whether the problems can be fixed through shepherding. Not everything can be fixed, mind you. Minor wording changes or a slight shift in focus are fixable. Major new experiments or a total overhaul of the system design are not. When I champion a paper, I only do so if I'm willing to be on the hook to shepherd it, should it come to that at the PC meeting (and it often does).

Somebody needs to stand up for good papers. Arguably, no paper would ever get accepted unless some PC member were willing to go to bat for it. Sadly, it's a lot easier for the PC to find flaws in a paper (hence leading to rejection) than it is to stand up for a paper and argue for acceptance -- despite the paper's flaws. Every PC meeting I go to, someone says, "This is the best paper in my pile, and we should take it -- that's why I gave it a weak accept." Weak accept!?!? WEAK!?! If that's the best you can do, you have no business being on a program committee. Stand up for something.

In an effort to balance this out, I try to take a stand for a couple of papers every time I go to a PC meeting, even though I might not be successful in convincing others that those papers should be accepted. Way better than only giving out milquetoast scores like "weak accept" or -- worse -- the cop-out "borderline".

The paper got me excited. This is probably the #1 reason I give out Strong Accepts. Usually by the end of the first page, I'm already getting excited about the rest of the paper. The problem sounds compelling. The approach is downright sexy. The summary of results sounds pretty sweet. All right, so I'm jazzed about this one. Sometimes it's a big letdown when I get into the meat and find out that the approach ain't all it was cracked up to be in the intro. But when I get turned on by a paper, I'll let the small stuff slide for sure.

It's hard to predict when a paper will get me hot under the collar. Sometimes it's because the problem is close to stuff I work on, and I naturally gravitate to those kinds of papers. Other times it's a problem I really wish I had solved. Much of the time, it's because the intro and motivation are just really eloquent and convincing. The quality of writing matters a lot here.

I learned a lot reading the paper. Ultimately, a paper is all about what the reader takes away from it. A paper on a topic slightly out of my area that does a fine job explaining the problem and the solution is a beautiful thing. Deciding how much "tutorial" material to fit into a paper can be challenging, especially if you're assuming that the reviewers are already experts in the topic at hand. But more often than not, the PC members reading your paper might not know as much about the area as you expect. Good exposition is usually worth the space. The experts will skim it anyway, and you might sell the paper to a non-expert like me.

There's a real-world evaluation. This is not a requirement, and indeed it's somewhat rare, but if a paper evaluates its approach on anything approximating a real-world scale (or dataset) it's winning major brownie points in my book. Purely artificial, lab-based evaluations are more common, and less compelling. If the paper includes a real-life deployment or retrospective on what the authors learned through the experience, even better. Even papers without that many "new ideas" can get accepted if they have a strong and interesting evaluation (cough cough).

The paper looks at a new problem, or has a new take on an old problem. Creativity -- either in terms of the problem you're working on, or how you approach that problem -- counts for a great deal. I care much more about a creative approach to solving a new and interesting (or old and hard-to-crack) problem than a paper that is thoroughly evaluated along every possible axis. Way too many papers are merely incremental deltas on top of previous work. I'm not that interested in reading the Nth paper on time synchronization or multi-hop routing, unless you are doing things really differently from how they've been done before. (If the area is well-trodden, it's also unlikely you'll convince me you have a solution that the hundreds of other papers on the same topic have failed to uncover.) Being bold and striking out in a new research direction might be risky, but it's also more likely to catch my attention after I've reviewed 20 papers on less exciting topics.


Wednesday, April 20, 2016

Why I gave your paper a Strong Reject

Also see: Why I gave your paper a Strong Accept.

I'm almost done reviewing papers for another conference, so you know what that means -- time to blog.

I am starting to realize that trying to educate individual authors through my witty and often scathing paper reviews may not be scaling as well as I would like. I wish someone would teach a class on "How to Write a Decent Goddamned Scientific Paper", and assign this post as required reading. But alas, I'll have to make do with those poor souls who stumble across this blog. Maybe I'll start linking this post to my reviews.

All of this has probably been said before (strong reject) and possibly by me (weak accept?), but I thought I'd share some of the top reasons why I tend to shred papers that I'm reviewing.

(Obligatory disclaimer: This post represents my opinion, not that of my employer. Or anyone else for that matter.)

The abstract and intro suck. By the time I'm done reading the first page of the paper, I've more or less decided if I'm going to be reading the rest in a positive or negative light. In some cases, I won't really read the rest of the paper if I've already decided it's getting The Big SR. Keep in mind I've got a pile of 20 or 30 other papers to review, and I'm not going to spend my time picking apart the nuances of your proofs and evaluation if you've bombed the intro.

Lots of things can go wrong here. Obvious ones are pervasive typos and grammatical mistakes. (In some cases, this is tolerable, if it's clear the authors are not native English speakers, but if the writing quality is really poor I'll argue against accepting the paper even if the technical content is mostly fine.) A less obvious one is not clearly summarizing your approach and your results in the abstract and intro. Don't make me read deep into the paper to understand what the hell you're doing and what the results were. It's not a Dan Brown novel -- there's no big surprise at the end.

The best papers have really eloquent intros. When I used to write papers, I would spend far more time on the first two pages than anything else, since that's what really counts. The rest of the paper is just backing up what you said there.

Diving into your solution before defining the problem. This is a huge pet peeve of mine. Many papers go straight into the details of the proposed solution or system design before nailing down what you're trying to accomplish. At the very least you need to spell out the goals and constraints. Better yet, provide a realistic, concrete application and describe it in detail. And tell me why previous solutions don't work. In short -- motivate the work.

Focusing the paper on the mundane implementation details, rather than the ideas. Many systems papers make this mistake. They waste four or five pages telling you all about the really boring aspects of how the system was implemented -- elaborate diagrams with boxes and arrows, detailed descriptions of the APIs, what version of Python was used, how much RAM was on the machine under the grad student's desk.

To a first approximation, I don't care. What I do care about are your ideas, and how those ideas will translate beyond your specific implementation. Many systems people confuse the artifact with the idea -- something I have blogged about before. There are papers where the meat is in the implementation details -- such as how some very difficult technical problem was overcome through a new approach. But for the vast majority of papers, the implementation doesn't matter that much, nor should it. Don't pad your paper with this crap just to make it sound more technical. I know it's an easy few pages to write, but it doesn't usually add that much value.

Writing a bunch of wordy bullshit that doesn't mean anything. Trust me, you're not going to wow and amaze the program committee by talking about dynamic, scalable, context-aware, Pareto-optimal middleware for cloud hosting of sensing-intensive distributed vehicular applications. If your writing sounds like the automatically-generated, fake Rooter paper ("A theoretical grand challenge in theory is the important unification of virtual machines and real-time theory. To what extent can web browsers be constructed to achieve this purpose?"), you might want to rethink your approach. Be concise and concrete. Explain what you're doing in clear terms. Bad ideas won't get accepted just because they sound fancy.

Overcomplicating the problem so you get a chance to showcase some elaborate technical approach. A great deal of CS research starts with a solution and tries to work backwards to the problem. (I'm guilty of this, too.) Usually when sitting down to write the paper, the authors realize that the technical methods they are enamored with require a contrived, artificial problem to make the methods sound compelling. Reviewers generally aren't going to be fooled by this. If, by simplifying the problem just a little bit, you render your beautiful design unnecessary, it might be time to work on a different problem.

Figures with no descriptive captions. This is a minor one but drives me insane every time. You know what I mean: A figure with multiple axes, lots of data, and the caption says "Figure 3." The reviewer then has to read deep into the text to understand what the figure is showing and what the take-away is. Ideally, figures should be self-contained: the caption should summarize both the content of the figure and the meaning of the data presented. Here is an example from one of my old papers:


Isn't that beautiful? Even someone skimming the paper -- an approach I do not endorse when it comes to my publications -- can understand what message the figure is trying to convey.

Cursory and naive treatment of related work. The related work section is not a shout-out track on a rap album ("This one goes out to my main man, the one and only Docta Patterson up in Bezerkeley, what up G!"). It's not there to be a list of citations just to prove you're aware of those papers. You're supposed to discuss the related work and place it in context, and contrast your approach. It's not enough to say "References [1-36] also have worked on this problem." Treat the related work with respect. If you think it's wrong, say so, and say why. If you are building on other people's good ideas, give them due credit. As my PhD advisor used to tell me, stand on the shoulders of giants, not their toes.

Wednesday, March 2, 2016

Everything I did wrong as a professor

I really screwed things up as a young faculty member at Harvard. It worked out OK in the end, but, man, I wish I could go back in time to when I was a new professor and give my younger self some much-needed advice. No, not the "you shouldn't be a professor, get another kind of job" advice -- I wouldn't have listened to that -- but one of the reasons I ended up leaving academia is that I burned myself out. Maybe that could have been avoided had I taken a different approach to the job.

What did I get wrong? Let me count the ways...

Working on too many projects at once. I thrive on having many balls in the air. As a junior faculty member, though, I probably should have stayed focused on just one or two juicy projects, and let all the others fall to the side. I did not have a good filter for thinking about which projects I should take on and where they might lead. It was difficult to say no to any new research direction, since for all I knew it might lead somewhere great.

This one is tricky. When I first heard about using sensor networks to monitor volcanic eruptions, I thought it was a terrible idea and unlikely to lead anywhere. It turned out to be one of my most exciting and productive projects. So what the hell do I know?

Taking on high-risk projects with factors out of my control. Managing risk was not something I spent much time thinking about. I worked hard to build collaborations with the medical community to use sensor networks for things like disaster triage and monitoring patients with Parkinson's Disease. The volcano monitoring project also had a tremendous amount of risk (not just that of the volcano trying to destroy our sensors). I got lucky in some cases but it would have been better, probably, to stick to "core" CS projects that didn't involve straying too far from the lab. I can sure as hell figure out how to program a Linux box to do what I want -- had that volcano not been erupting, though, we wouldn't have gotten our paper published.

Taking on too many students. This goes along with the too-many-projects problem described above. I dreamed of having a big group, and I did. I had something like a dozen PhD students, undergrads, and postdocs rotating through my group at any given time. This ends up being a vicious cycle: the more people in the group, the more time I had to spend writing grant proposals, and the less time I had to mentor them and go deep on their research. I seriously had PhD students for whom I reckon I spent more time writing grants to cover their salary than they spent working in the lab. If I had had just, say, three or four good PhD students, things would have been so much easier to manage.

Wasted too much time courting companies for money. I did not know how to play the funding game, and had unrealistic expectations of the value of visiting companies and drumming up interest in my work with them. I took countless trips to random companies up and down the Eastern seaboard, most of which did not pan out in terms of funding or collaborations. I should have stuck with the more obvious funding opportunities (NSF, Microsoft) and not blown so much energy on getting little bits of money here and there from companies that didn't understand how to fund academic research.

Way, way, way too much travel. I had to go to every goddamn conference, workshop, program committee meeting, NSF panel, you name it. I never turned down an invitation to speak somewhere, no matter how far afield from my core community and how little influence it would have on my career. I'd travel at least twice a month, sometimes more. I'd go to a conference, come home, and turn around and see the same set of people just a few weeks later at some other event. There were times when I felt that my airline status was more important to maintain than my marriage.

Conferences are a huge time sink. You don't go to see the talks -- if I need to read a paper and have questions about it I can email the authors. Sometimes it was just about having beers with my academic friends in an exotic location (like, say, upstate New York). Still, what an expensive and tiring way to maintain a social life. There are also way too many of them -- there should be just one big event a year where everyone shows up.

All the boondoggles. I wasted an incredible amount of time on little side projects that didn't need me. Each one might not be much of a time sink, but they really add up. Editorial board of a journal. PC chair of random, forgettable workshops. Serving on all manner of random committees. I found it hard to avoid this trap because you think, well, saying no means you're not going to get asked to do the more important thing next time. I have a feeling it doesn't work that way.

Hard to say if I could have really done things differently, and I know lots of faculty who seem to keep their shit together despite doing everything "wrong" on the list above. So maybe it's just me.

Wednesday, January 6, 2016

Academics, we need to talk.

Although I made the move to industry a bit more than five years ago, I still serve on program committees and review articles for journals and the like. So it's painful for me to see some of my academic colleagues totally botch it when it comes to doing industry-relevant research. Profs, grad students: we need to talk.

(Standard disclaimer: This is my personal blog and the opinions expressed here are mine alone, and most certainly not that of my employer.)

Of course, many academics do a great job of visionary, out-of-the-box, push-the-envelope research that inspires and drives work in industry. I'm talking about stuff like Shwetak Patel's increasingly insane ways of inferring activities from random signals in the environment; Dina Katabi's rethinking of wireless protocols; and pretty much anything that David Patterson has ever done. Those folks (and many others) are doing fine. Keep it up.

But the vast majority of papers and proposals I read are, well, crap. Mostly I'm involved in mobile, wireless, and systems, and in all three of these subfields I see plenty of academic work that tries to solve problems relevant to industry, but often gets it badly wrong. I don't quite know why this happens, but I have some ideas of how we might be able to fix it.

Now, I don't think all academic research has to be relevant to industry. In some sense, the best research (albeit the riskiest and often hardest to fund) is stuff going way beyond where industry is focused today. Many academics kind of kid themselves about how forward-thinking their work is, though. Working on biomolecular computation? That's far out. Working on building a faster version of MapReduce? Not so much. I'd argue most academics work on the latter kind of problem -- and that's fine! -- but don't pretend you're immune from industry relevance just because you're in a university.

My first piece of advice: do a sabbatical or internship in industry. It's probably the best way to get exposed to the real-world problems -- and the scale and complexity -- that you want to have as inspiration for your work. It drives me insane to see papers that claim that some problem is "unsolved" when most of the industry players have already solved it, but they didn't happen to write an NSDI or SIGCOMM paper about it. Learn what's going on in the real world: which problems are being actively worked on today, and which will be important a few years down the line. If you buy me a beer I'll spend a lot of time telling you what's hard at Google and what we need help with. Hint: It ain't yet another multihop routing protocol.

And by "industry", I do not mean "an academic research lab that happens to reside at a company." You can spend time there and learn a lot, but it isn't getting quite the level of product exposure I have in mind.

For this to work, you have to work on a real product team. I know this sounds like a waste of time because you probably won't get a paper out of it, but it can be an extremely eye-opening experience. Not only do you start to understand the constraints that real products have -- like, oh, it actually has to work -- but you also pick up on good engineering practices: bug tracking, code reviews, documentation. You get your hands dirty. You might even launch something that other people use (I know, crazy, right?).

Second: don't get hung up on who invents what. Coming from academia, I was trained to fiercely defend my intellectual territory, pissing all over anything that seemed remotely close to my area of interest. Industry is far more collaborative and credit is widely shared. If another team has good ideas, and can help you achieve your goals faster, join them -- and build upon that common success. So much bad academic work seems to boil down to someone trying to be so differentiated and unique that they paint themselves into a sparkly, rainbow-colored corner that, yes, ensures they stand out, but means they're often going about things the wrong way. Unfortunately, the publish-or-perish culture of academia often forces people to add arbitrary differentiation to their work so they can't be accused of being derivative. That's too bad, because a tremendous amount of value can be derived by refining and improving upon someone else's ideas.

Third: hold yourself to a higher standard. Program committees can be brutal, but I don't think they go far enough when it comes to setting the bar for real-world relevance. Collecting data from a dozen students running your little buggy mobile app is nothing compared to industry scale. Even if you can't get a million or a hundred million people using your app, what would it take? Wouldn't it have to work on a lot of different phone platforms? Not badly impact battery life? Actually be secure? Use a server running somewhere other than under a grad student's desk? Possibly have a unit test or two in the code?

Now, I know what you're thinking -- all of this is a distraction, since you don't need to go this extra mile to get that paper submitted. That's true. But if you're trying to do industry-relevant work, it helps to look at things through an industry lens, which means going beyond the "it ran once and I got the graphs" prototypes you're probably building. Why doesn't Google just rewrite Android in Haskell, or why doesn't Facebook just throw away their filesystem and use yours instead? Maybe it has something to do with some of these annoying "engineering problems".

Finally, try to keep an open mind about how your research can have impact and relevance. A lot of academics get so laser-focused on impressing the next program committee that they fail to see the big picture. It's like being a chef who's only out to impress the restaurant critics and can't cook a decent goddamn hamburger. My PhD advisor never seemed to care particularly about publishing papers; rather, he wanted to move the needle for the field, and he did (multiple times). Over-indexing on what's going to impress the tiny set of people reviewing SOSP or MobiCom papers is a missed opportunity. Racking up publications is fine, but if you want to have impact on the real world, there's a lot more you can do.

Wednesday, August 26, 2015

What I learned about mobile usage in Indonesia

A couple of weeks ago I traveled to Jakarta to understand mobile (and especially mobile browser) usage in Indonesia. Indonesia is a huge country with a population of nearly 250 million people and a vast number of them are getting online. For many, smartphones are the first and only device they use for accessing the Internet. I wanted to share some of the things I learned interviewing a number of Indonesian smartphone users.

Phones for sale at a Lotte store in Jakarta.

I want to emphasize that this is my personal blog, and the opinions expressed here are mine, and not that of my employer.

Some of my key takeaways from the week...

Smartphones are central to users' lives
For everyone I interviewed, their smartphone was absolutely central to their life and was a major window to the outside world. For nearly all of these users, the smartphone is the first and only Internet-connected device they own, and they rely on their phones a great deal. Desktop or laptop Internet usage was limited to office workers or students, and even then the smartphone dominated.

I saw a wide range of phones, from top-of-the-line Samsung devices all the way down to 2-3 year old, low-end Androids running badly out-of-date software. Even so, people make heavy, heavy use of their phones: for messaging, games, watching YouTube, downloading music, taking and sharing pictures ... all of the same things that "we" (meaning for the sake of this article the relatively wealthy and well-connected citizens of, say, North America or Europe) use our phones for as well.

The US-centric mindset is that the phone is a "second screen" and that laptops, desktops, etc. are the main device that people use. Not so here. It's not just "mobile first", it's "mobile only".

Mobile data is cheap and connectivity nearly ubiquitous
I was surprised at how inexpensive mobile data was and how well connected the city and suburbs of Jakarta were. For 100,000 rupiah -- less than $8 -- I bought 2GB of data. A huge range of data pack options were available, but the typical price seems to be around $4 per GB. Now, for many Indonesians this is not as cheap as it sounds to me, but it's still quite affordable -- less than filling up a tank of gas for your motorbike.

Everyone I met used prepaid mobile data: Typically they would "top up" by buying a voucher or card at a kiosk -- with cash -- which would give them another couple of GB of data. The carrier sends an SMS when the quota is about to run out, and much like filling up on gas, you'd head to the kiosk and get another card. Various other approaches were used -- some people would SMS a person they knew, who would top up the quota for them; they'd then pay that person back with a bank transfer. We didn't meet anyone who had an account with a mobile carrier and got billed regularly.

Some users had an "unlimited" data plan, but when they went over a certain quota the speed would drop down to something almost unbearable -- as bad as 16 Kbps in some cases.

Overall, though, network performance was quite good, and I used my phone extensively on Telkomsel's network with few problems, even out in the boonies. The folks we interviewed generally did not express problems with connectivity -- only when they would travel into more rural areas was this a problem. Check out OpenSignalMap's coverage map of Telkomsel for example -- it's pretty broad.

Very few users used WiFi with any regularity on their phones. Sometimes they would join a WiFi hotspot at work or while out shopping, but cellular data seemed to be the typical way to connect.

The main use cases are messaging, social networking, and search, in that order
Everyone I met used Blackberry Messenger and WhatsApp extensively. Many users were on Facebook as well, and other social networking and messaging apps such as Line, Path, and Twitter were often mentioned. For whatever reason, BBM (on Android) is hugely popular here although I got the sense that younger folks were gravitating towards WhatsApp. Users would have dozens or even hundreds of BBM and WhatsApp contacts, and many of them were getting frequent chat notifications from these apps during our interviews. Facebook seems to be tremendously popular as well.

We often hear that "for many users in emerging markets, Facebook is the Internet". I didn't get that sense at all; people know about the Internet and the web, for sure, and Facebook is just another app for them (although an important one).

After messaging and social networking, searching for content on the Web is pretty important. Google is widely used and highly regarded -- everyone calls it "Mbah Google" meaning the wise old grandfather who knows all. Browsers were mostly used for web searches and not much else -- indeed, none of the folks I interviewed had much if any knowledge about things like bookmarks, tabs, Incognito mode, or anything else in the browser.

"Death of the Web" is greatly exaggerated
There is often a lot of hand-wringing about native apps spelling the "death" of the web. Apps are popular, sure, but they don't seem to replace any of the use cases for the Web on mobile, at least for these users. I wouldn't expect -- or even want -- mobile websites to replace WhatsApp or Facebook. That seems like a losing proposition to me, and I don't fully understand the drive to make mobile websites more "like apps". Despite the popularity of apps, the Web and Web search still play a huge role on mobile devices for these users -- I don't see that going away any time soon.

Nobody can update anything, because their storage is full
Nearly everyone I met had maxed out the storage on their phones -- typically 8GB or more -- presumably by downloading pictures, videos, games, and so forth. (It seems that WhatsApp automatically saves all images downloaded in a conversation to the device, which might be a major contributor, given its popularity here.) As a result, nobody was able to update their apps, even when Chrome (for example) reminds them to do so. We saw a lot of out-of-date app versions being used, and people told us they had been unable to update due to storage constraints. (I was expecting people to tell me they didn't update apps because of data quota limits, but that didn't seem to be a major issue.) I don't know what can be done about this -- some way to automatically zap old WhatsApp images or something -- but it obviously creates problems for users when they are using buggy or insecure versions of things.

The future looks bright
Despite all of the challenges I saw, I came away with an extremely optimistic outlook for mobile users in Indonesia. I was impressed with how pervasive smartphones and mobile network connectivity were. I was glad to see that data cost was not a huge barrier to use -- apart from YouTube, people seemed able to purchase enough mobile data for their typical needs. Devices and connectivity are only going to get better and more affordable. It's a really exciting time to be working in this space.

Tuesday, May 5, 2015

A modest proposal: SOSIGCOMMOBIXDI

I have a problem: there are way too many conferences to attend. Even worse, the degree of overlap between the systems, mobile, and networking communities means that I am basically running into the same people at all of these events. You have a problem, too: You are paying money (and time) to attend all of these separate conferences.

Conservatively, there are five "top tier" conferences that are "must attend" events every year: SOSP/OSDI, NSDI, MobiSys, MobiCom, SIGCOMM. (Not to mention excellent venues like USENIX ATC, EuroSys, CoNext, SenSys, the list goes on.) And then all of the "smaller workshops because we don't like how big the conferences are but you pretty much have to go anyway": HotOS, HotMobile, HotNets.

Realistically, nobody makes it to all of these events (unless you're, say, a poor junior faculty member going for tenure and have to show your face in as many places as possible). So you pick and choose based on whether you have a paper accepted, or whether you have enough travel money lying around, or whether you just have to get away from home for a few days.

Consider the costs of running all of these separate events. For the attendees, there is the high cost (and time investment) of travel, registration fees, and taking time away from work and home to attend each conference. A single conference trip probably costs $1500-2000 -- more if you are traveling overseas -- and takes anywhere from three days to a week of time away from home. Especially for those with young children at home, each trip takes a serious toll.

Organizing a conference is also a huge amount of work, regardless of whether it's a workshop for 50 people or a big conference for 500. This is especially true for the poor general chair who has to work out all of the details of the venue, hotel, meals, A/V setup, finances, etc.

You know where this is going: Why don't we have one, big, annual conference spanning the systems, networking, and mobile research communities? (And, while we're at it, why not throw in databases for good measure?) SOSIGCOMMOBIXDI would run for, say, 5 days, with parallel (yes!) sessions covering each of these major areas. It would happen at roughly the same week each year, so people can plan their travel and vacation schedules well in advance. It's like FCRC, for systems!

I can hear the objections now! Let me take them one by one.

But won't it be too big? SIGCOMM and SOSP/OSDI already have something like 600 people in attendance; these are hardly close-knit communities. Given the amount of overlap across these various conferences, I estimate that there would be no more than 3,000 people attending SOSIGCOMMOBIXDI, although I'll grant I might be underestimating -- let's be generous and say 5,000 people. Organizing an event for 5,000 people is no big deal. Most large cities have hotels and convention centers that can comfortably handle events of this size. Hell, medical conferences typically have 10,000 or more (way more) attendees. It is understood how to run conferences at this scale. It's not something a typical professor has experience doing, so best to rely on a professional events organization like USENIX.

I have been to 5,000-person conferences and if anything, it's more energizing -- and there is much more to do -- than these 500-person events where everyone is expected to sit in the same room listening to the same talks all day long. You have room for smaller breakouts and workshops; a larger, more interesting industry presence; and greater draw for really interesting keynote speakers.

But I want a single track! Get over it. The single-track "constraint" is often cited by senior people who remember what it was like in the early days of the field when conferences were 1/5th the size that they are now, and every PC member read every paper. The people who complain about parallel tracks are often the ones who spend most of the conference out in the hall chatting with their colleagues -- they're not listening to every talk anyway. Even if they sit in the room all day they're probably on their laptops pretending to listen to the talk, or writing blog posts (like I'm doing now).

Ever been to the morning session on the third day of a conference? Crickets. Where are all of the "single-trackers" then?

Moreover, the single-track "constraint" severely limits the number of papers a conference can publish every year. Most 2.5-day conferences can take no more than 25-30 papers and still fit in a single-track model. To squeeze more papers in, we've gotten rid of the more memorable aspects of these conferences: keynotes, panels, breakouts. It doesn't scale.

Removing the single-track requirement also opens up a bunch of possibilities for changing up the format of the conference. Sure, you want some large plenary sessions and a few tracks of papers. But hosting a few mini-workshops, working sessions, or BoFs during the day is possible too. Squeeze in poster and demo sessions here and there. Even set some space aside for an industry trade show (these are often really fun, but most academic conferences rarely have them).

Worried you're going to miss something? The papers are all online, and USENIX even posts videos of all of the talks. So, I claim that the single-track model is outdated.

But then there's only one paper submission deadline a year! Not necessarily. We could have rolling submissions for SOSIGCOMMOBIXDI, much like SIGMOD and some other venues do. Since SOSIGCOMMOBIXDI practically consists of multiple federated events, each one can have its own set of deadlines, and they could be staggered across sub-events. Paper submission and evaluation are only loosely coupled to the timing of the conference itself.

But ACM and USENIX won't get as much conference registration revenue if there's only one event! Oh, I hadn't thought of that.

Thursday, April 30, 2015

Flywheel: Google's Data Compression Proxy for the Mobile Web

Next week, we'll be presenting our work on the Chrome Data Compression proxy, codenamed Flywheel, at NSDI 2015. Here's a link to the full paper. Our wonderful intern and Berkeley PhD student Colin Scott will be giving the talk. (I'm happy to answer questions about the paper in the comments section below.)

It's safe to say that the paper would have never happened without Colin -- most of us are too busy building and running the service to spend the time it takes to write a really good paper. Colin's intern project was specifically to collect data and write a paper about the system (he also contributed some features and did some great experiments). It was a win-win situation since we got to offload most of the paper writing to Colin, and he managed to get a publication out of it!


Rather than summarize the paper, I thought I'd provide some backstory on Flywheel and how it came about. It's a useful story to understand how a product like this goes from conception to launch at a company like Google.

(That said, standard disclaimer applies: This is my personal blog, and the opinions expressed here are mine alone.)

Backstory: Making the mobile web fast

When I moved to Seattle in 2011, I was given the mission to start a team with a focus on improving mobile Web performance. I started out by hiring folks like Ben Greenstein and Michael Piatek to help figure out what we should do. We spent the first few months taking a very academic approach to the problem: Since we didn't understand mobile Web performance, of course we needed to measure it!

We built a measurement tool, called Velodrome, which allowed us to automate the process of collecting Web performance data on a fleet of phones and tablets -- launching the browser with a given URL, measuring a bunch of things, taking screenshots, and uploading the data to an AppEngine-based service that monitored the fleet and provided a simple REST API for clients to use. We built a ton of infrastructure for Velodrome and used it on countless experiments. Other teams at Google also started using Velodrome to run their own measurements and pretty soon we had a few tables full of phones and tablets churning away 24/7. (This turned out to be a lot harder than we expected -- just keeping them running continuously without having to manually reboot them every few hours was a big pain.)
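Velodrome itself was never released, so for flavor, here's a purely hypothetical sketch of what one iteration of that per-device loop might look like if you strung it together with adb today -- the device serial, the fleet endpoint, and the fixed wait are all made up for illustration: launch a URL in the browser, wait for the page to settle, grab a screenshot, and post the result to the monitoring service.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"net/url"
	"os/exec"
	"time"
)

// adb runs a single adb command against a specific device and returns its output.
func adb(serial string, args ...string) ([]byte, error) {
	cmd := exec.Command("adb", append([]string{"-s", serial}, args...)...)
	return cmd.CombinedOutput()
}

// measure loads one URL in the browser on one device, screenshots it, and
// uploads the result to a (hypothetical) fleet-monitoring endpoint.
func measure(serial, pageURL, fleetAPI string) error {
	// Launch the URL via an Android VIEW intent aimed at Chrome.
	if _, err := adb(serial, "shell", "am", "start",
		"-a", "android.intent.action.VIEW", "-d", pageURL, "com.android.chrome"); err != nil {
		return err
	}

	start := time.Now()
	time.Sleep(15 * time.Second) // crude stand-in for "wait until the page has settled"

	// Pull a screenshot off the device.
	png, err := adb(serial, "exec-out", "screencap", "-p")
	if err != nil {
		return err
	}

	// Post the screenshot and elapsed time to the fleet service.
	resp, err := http.Post(
		fmt.Sprintf("%s/upload?device=%s&url=%s&elapsed_ms=%d",
			fleetAPI, serial, url.QueryEscape(pageURL), time.Since(start).Milliseconds()),
		"image/png", bytes.NewReader(png))
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	if err := measure("emulator-5554", "https://example.com", "https://fleet.example.com"); err != nil {
		log.Fatal(err)
	}
}
```

Even a toy loop like this hints at why keeping the fleet healthy was such a pain: every one of those steps can hang or fail on a real phone.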

At the same time we started working with the PageSpeed team, which had built the gold standard proxy for optimizing Web performance. PageSpeed was focused completely on desktop performance at the time, and we wanted to develop some mobile-specific optimizations and incorporate them. We did a bunch of prototyping work and explorations of various things that would help.

The downside to PageSpeed is that sites have to install it -- or opt into Google's PageSpeed Service. We wanted to do something that would reach more users, so we started exploring building a browser-based proxy that users, rather than sites, could turn on to get faster Web page load times. (Not long after this, Amazon announced their Silk browser for the Kindle Fire, which was very much the same idea. Scooped!)

Starting a new project

Hence we started the Flywheel project. Initially our goal was to combine PageSpeed's optimizations, the new SPDY protocol, and some clever server-side pre-rendering and prefetching to make Web pages load lightning fast, even on cellular connections. The first version of Flywheel, developed over about a year and a half, was built on top of PageSpeed Service.

Early in the project, we learned of the (confidential at the time) effort to port Chrome to Android and iOS. The Chrome team was excited about the potential for Flywheel, and asked us to join their team to launch it as a feature in the new browser. The timing was perfect. However, the Chrome leadership was far more interested in a proxy that could compress Web pages, which is especially important for users in emerging markets on expensive mobile data plans. Indeed, many of the original predictive optimizations we were using in Flywheel would have resulted in substantially greater data usage for the user (e.g., prefetching the next few pages you were expected to visit). It also turned out that compression is way easier than performance, so we decided to focus our efforts on squeezing out as many bytes as possible. (A common mantra at the time was "no bytes left behind".)
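To make "no bytes left behind" concrete: at its core, a data compression proxy is just a service that sits between the browser and the origin, fetches pages on the client's behalf, and sends back fewer bytes. Here's a deliberately tiny sketch of that shape in Go (the language we eventually rewrote the service in, as described below). It handles only plain GET-over-HTTP, ignores the client's Accept-Encoding, and skips everything that made Flywheel actually hard -- image transcoding, SPDY, caching, robustness -- so treat it as an illustration of the idea, not a description of the real system.

```go
package main

import (
	"compress/gzip"
	"io"
	"log"
	"net/http"
)

// handleProxy fetches the requested page from the origin and re-serves it
// gzip-compressed. A browser configured to use an HTTP proxy sends an
// absolute URL in the request line, so r.URL points at the origin server.
func handleProxy(w http.ResponseWriter, r *http.Request) {
	upstream, err := http.Get(r.URL.String()) // GET-only, HTTP-only, for brevity
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer upstream.Body.Close()

	// Go's default transport has already transparently decompressed the
	// origin's body if it arrived gzipped, so recompressing here is safe.
	w.Header().Set("Content-Type", upstream.Header.Get("Content-Type"))
	w.Header().Set("Content-Encoding", "gzip")
	w.WriteHeader(upstream.StatusCode)

	gz := gzip.NewWriter(w)
	defer gz.Close()
	io.Copy(gz, upstream.Body) // the "fewer bytes" part, in miniature
}

func main() {
	// Point a browser's HTTP proxy setting at localhost:8080 to try it.
	log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handleProxy)))
}
```

The real thing ran as a globally distributed service with caches and image transcoders in the pipeline, but the basic contract with the browser is the same: send your requests through me, and I'll send back smaller responses.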

Rewriting in Go

As we got closer to launching, we were really starting to feel the pain of bolting Flywheel onto PageSpeed Service. Originally, we planned to leverage many of the complex optimizations used by PageSpeed, but as we focused more on compression, we found that PageSpeed was not well-suited to our needs, for a bunch of reasons. In early 2013, Michael Piatek convinced me that it was worth trying to rewrite the service from scratch in Go -- both as a way of doing a clean redesign and as a way of leveraging Go's support for building Google-scale services. It was a big risk, but we agreed that if the rewrite wasn't bearing fruit within a couple of months, we'd stop work on it and go back to PageSpeed.


Fortunately, Michael and the rest of the team executed at lightning speed and in just a few months we had substantially reimplemented Flywheel in Go, a story documented elsewhere on this blog. In November 2013 I submitted a CL to delete the thousands of lines of the PageSpeed-based Flywheel implementation, and we switched over entirely to the new, Go-based system in production.

PageSpeed Service in C++ was pushing 270 Kloc at the time. The Go-based rewrite was just 25 Kloc, 13 Kloc of which were tests. The new system was much easier to maintain, faster to develop, and gave our team sole ownership of the codebase, rather than having to negotiate changes across the multiple teams sharing the PageSpeed code. The bet paid off. The team was much happier and more productive on the new codebase, and we managed to migrate seamlessly to the Go-based system well before the full public launch.

Launching

We announced support for Flywheel in the M28 beta release of Chrome at Google I/O in 2013, and finally launched the service to 100% of Chrome Mobile users in January 2014. Since then we've seen tremendous growth of the service. More than 10% of all Chrome Mobile users have Flywheel enabled, with percentages running much higher in countries (like Brazil and India) where mobile data costs are high. The service handles billions of requests a day from millions of users. Chrome adoption on mobile has skyrocketed over the last year, and is now the #1 mobile browser in many parts of the world. We also recently launched Flywheel for Chrome desktop and ChromeOS. Every day I check the dashboards and see traffic going up and to the right -- it's exciting.

We came up with the idea for Flywheel in late 2011, and launched in early 2014 -- about 2.5 years of development work from concept to launch. I have no idea if that's typical at Google or anywhere else. To be sure, we faced a couple of setbacks which delayed launch by six months or more -- mostly factors out of our control. We decided to hold off on the full public release until the Go rewrite was done, but there were other factors as well. Looking back, I'm not sure there's much we could have done to accelerate the development and launch process, although I'm sure it would have gone faster had we been doing it as a startup, rather than at Google. (By the same token, launching as part of Chrome is a huge opportunity that we would not have had anywhere else.)

What's next?

Now that Flywheel is maturing, we have a bunch of new projects getting started. We still invest a lot of energy into optimizing and maintaining the Flywheel service. Much of the work focuses on making the service more robust to all of the weird problems we face proxying the whole Web -- random website outages causing backlogs of requests at the proxy, all manner of non-standards-compliant sites and middleboxes, etc. (Buy me a beer and I'll tell you some stories...) We are branching out beyond Flywheel to build some exciting new features in Chrome and Android to improve the mobile experience, especially for users in emerging markets. It'll be a while until I can write a blog post about these projects of course :-)