Tuesday, November 13, 2012

Herding Cats: How YouTube Processes 72 Hours of Video in 1 Minute

On September 29, at the statistical height of its popularity, the music video for Psy's "Gangnam Style" received 12.8 million views on YouTube. That translates to an average of 8900 streaming requests for the video per minute, or 148 per second. And it's not as though Psy's was the only viral video on YouTube that day. Several other videos were trending, including Carly Rae Jepsen's "Call Me Maybe," One Direction's "Live While We're Young," dramatic helmet-cam footage from a soldier in a firefight in Afghanistan, and grainy cellphone video of lightning hitting a car. Collectively, those were also getting millions of views.

"YouTube as a site gets traffic at a level every day that would DDoS a large percentage of other websites," says Rushabh Doshi, a software engineer at YouTube, referring to the distributed denial of service attacks that hackers routinely use to overload and incapacitate websites.

It's a measure of the success of the YouTube platform that all those video streams were served up without any instability to the site. But serving huge volumes of video is only one-half of the back-end engineering challenge for YouTube, which Google bought in 2006. The site must ingest huge quantities of video uploaded 24 hours a day around the world. That's Doshi's specialty: he is the tech lead for uploads. "Everything at YouTube is made immensely more complicated because of the scale at which we operate. We get 72 hours of video uploaded every minute. That's like 36 full-length movies being uploaded to your site every minute."

When Doshi joined YouTube in 2007, he says the site was getting 6 hours of video per minute. In the past five years, then, the site has seen a 12-fold increase in volume. But, Doshi says, the amount of data has exploded at an even higher rate. "The insane part is that there has been this huge shift going on from people capturing video at 240p resolution with low-bit-rate cameras to modern consumer devices that are 1080p with 25- to 30-megabit-per-second video streams," he says. "That really explodes the amount of data that you need to process."
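
To put rough numbers on that growth, here is a back-of-the-envelope sketch in Python. The upload volumes and the 25-megabit-per-second figure come from Doshi's remarks above; the 0.5-megabit-per-second rate for old 240p footage is purely an assumption for illustration, and treating every upload as 1080p gives an upper bound, not a measured figure.

```python
# Figures quoted in the article.
HOURS_PER_MINUTE_2007 = 6
HOURS_PER_MINUTE_2012 = 72
HD_MEGABITS_PER_SECOND = 25  # low end of the 25-30 Mbps range Doshi cites

# ASSUMPTION: the article gives no bit rate for 240p footage.
# 0.5 Mbps is a hypothetical value chosen only for illustration.
LOW_RES_MEGABITS_PER_SECOND = 0.5


def gigabytes_per_minute(hours_of_video, megabits_per_second):
    """Raw data arriving each minute, in decimal gigabytes."""
    seconds_of_video = hours_of_video * 3600
    megabits = seconds_of_video * megabits_per_second
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes


growth_in_hours = HOURS_PER_MINUTE_2012 / HOURS_PER_MINUTE_2007
data_2007 = gigabytes_per_minute(HOURS_PER_MINUTE_2007, LOW_RES_MEGABITS_PER_SECOND)
data_2012 = gigabytes_per_minute(HOURS_PER_MINUTE_2012, HD_MEGABITS_PER_SECOND)

print(f"Growth in hours uploaded: {growth_in_hours:.0f}x")
print(f"2007 estimate: {data_2007:.2f} GB per minute")
print(f"2012 upper bound: {data_2012:.0f} GB per minute")
```

Under those assumptions, the hours of video grow by a factor of 12 while the raw bytes grow by a far larger factor, which is the gap Doshi is describing.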

Data Deluge


Digital video has fundamentally changed the Web in the past half-decade, and no company has been more central to that evolution than YouTube. It democratized and popularized video sharing. It provided an opportunity for expression and protest for citizens of oppressive regimes, such as those in Iran, Libya, and Syria. It has also provided a vehicle through which the provocateurs of one culture can incite outrage in another; the video that prompted riots and mayhem throughout the Muslim world was spread through YouTube. And YouTube has made thousands and thousands of cats temporarily famous. With the ascendance of social media, sites and services such as Facebook, Twitter, and Instagram get most of the attention these days. But, according to Web analytics firm Alexa, the only sites that get more traffic than YouTube are Google and Facebook.

The site has become so synonymous with online video that it's easy to overlook what a technological marvel it is. The site can ingest video directly from pretty much any source: phones, tablets, connected cameras, computer webcams, what have you. And engineers at the site have worked hard to make some of the complexities of digital video (codecs, bit rate, resolution) invisible to users, doing all of the transcoding and processing on the back end.

YouTube's upload page, for instance, is embedded with software that can automatically upload and transcode multiple HD video files in real time. To decrease latency, the uploaded video is usually sent to whichever Google data center is geographically closest to the user. "We absolutely obsess over speed," Doshi says. "One key insight is that you don't have to wait to have the entire video to start processing it. The other key insight is that we can split up the video into smaller chunks and start processing each chunk separately. This plays really well to Google's strengths: we have big data centers with lots of computers, so we have the CPU power to throw at it. Instead of trying to process one video on one computer, we break it up and distribute it."
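
The split-and-distribute idea Doshi describes can be sketched in a few lines of Python. This is not YouTube's pipeline, which the company did not detail. The chunk size, the function names, and the stand-in "transcode" step (which merely upper-cases bytes) are all hypothetical, and a thread pool on one machine stands in for a data center full of computers.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # hypothetical; tiny so the demo is easy to follow


def split_into_chunks(data: bytes, size: int) -> list[bytes]:
    """Cut the upload into fixed-size pieces; the last may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fake_transcode(chunk: bytes) -> bytes:
    """Stand-in for real transcoding: just upper-cases the bytes."""
    return chunk.upper()


def process_in_parallel(data: bytes, size: int = CHUNK_SIZE, workers: int = 4) -> bytes:
    """Process every chunk separately, then join the results in order."""
    chunks = split_into_chunks(data, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, even if the
        # workers finish out of order, so reassembly is a simple join.
        results = list(pool.map(fake_transcode, chunks))
    return b"".join(results)


print(process_in_parallel(b"herding cats on youtube"))
```

The point of the sketch is the shape of the work: no single worker ever needs the whole file, and the results are put back together in the original order.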

The task is made even more complex by the process the company uses to make uploading faster. When you break up a video into discrete sections, distribute it, and then transcode it on different machines, you run the risk of an inconsistent end result. "The devil is in the details," Doshi says. He didn't reveal too much about the process, except to hint that after the different transcodes, there is a final video-processing pass. "When you stitch the whole video back together, it has to behave more or less as if it were transcoded on one machine, so it doesn't have weird jumps or large color changes."

Search and Discovery


All this happens in the background as thousands of people upload video at the same time. However, YouTube faces another challenge beyond just processing videos: The site also wants those videos to be tagged with data and easily searchable once they've arrived. But users are inconsistent with the information they volunteer with their videos.

Cristos Goodrow is YouTube's engineering director for search and discovery. If you've ever found some obscure piece of footage you were searching desperately for or serendipitously stumbled upon something delightful, you have him to thank. Then again, if you are frustratingly lost in a morass of cat clips and teenybopper music videos, that's his fault too. Goodrow says that being part of the world's most dominant search provider has its advantages (YouTube uses much of Google's search technology), but he points out that there are many things that make video search different.

"At Google, people are often looking for information, and more often they want navigational or canonical results," Goodrow says. "The other day, I wanted to get passports for my kids, and if I type 'Passport,' there's a clear result that I'd want, which is the U.S. Passport Agency. It doesn't change that often, and that's what everybody wants." The demand for video, he says, is far more subject to trends. "There are old videos that people search for and want," he says. "But for very broad queries they tend to want newer videos all the time."

There is also a chronic problem of labeling. Often, users aren't at all specific when they name their videos ("Crazy Jump!"), or they give their videos names that are personally relevant yet not unique ("John's Phone Unboxing"). People searching for a video aren't necessarily specific enough, either. So Goodrow's team uses the search algorithms to try to match the query contextually to the video. "We could fill a thousand search result pages with results from the term 'funny video,'" he says. "Some of those videos don't get much attention or aren't that funny. Others are so funny that people link to them on their blogs, or maybe they make it onto the Yahoo homepage or something like that. Those are most likely to be the funniest of the funny videos."

Yet there are perils to relying upon the wisdom of the Web. My own search for "funny video" turned up dozens of videos of people falling down or getting hit in the face (and other places) in incredibly predictable and not-so-funny ways. I was actually surprised to learn that YouTube doesn't use any sort of face detection or video analytics tools to try to figure out what each video is about, but Goodrow asserted that Google's traditional PageRank algorithm, which assigns values to Web pages based on the other pages that link to them, works well for video too. Obviously, though, PageRank doesn't know funny.
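
For readers curious how link-based scoring works in principle, here is a minimal Python sketch of the textbook PageRank iteration. It is an illustration only, not YouTube's or Google's production ranking. The page names, the link graph, the 0.85 damping factor, and the iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outbound links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: two blogs and a news site link to videos.
links = {
    "blog_a": ["funny_cat"],
    "blog_b": ["funny_cat"],
    "news_site": ["funny_cat", "dull_clip"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the video that more pages link to ends up with the highest score, which is the effect Goodrow describes. Nothing in the calculation looks at what is actually in the video.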

YouTube may bias results toward the new and trending, but, remarkably, the service never deletes a video, no matter how few people watch it, unless the video's owner requests it to be taken down. I asked Goodrow why the company wastes valuable storage resources on videos nobody watches. In a typical Google employee's response, he demurred from commenting on the relative value of one video versus another and gave me a technical answer.

"If nobody's watched a video for a long, long time, we may put it in a system that has higher latency or lower throughput than a system where we put a viral video," he said. "Then, if someone wants it, they'll be willing to wait an extra second to get it." Presumably, 2 extra seconds would be unacceptable.
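
The tiering Goodrow describes could be as simple as a rule like the one sketched below in Python. Goodrow gave no specifics, so the one-year cutoff, the tier names, and the function are all hypothetical.

```python
from datetime import datetime, timedelta

# ASSUMPTION: the article gives no threshold; one year is illustrative only.
COLD_AFTER = timedelta(days=365)


def choose_tier(last_viewed: datetime, now: datetime) -> str:
    """Pick a storage tier from how long a video has gone unwatched."""
    if now - last_viewed > COLD_AFTER:
        return "cold"  # higher latency or lower throughput, still retrievable
    return "hot"       # fast storage for videos people are watching


now = datetime(2012, 11, 13)
print(choose_tier(datetime(2012, 9, 29), now))  # recently watched
print(choose_tier(datetime(2008, 1, 1), now))   # unwatched for years
```

Either way the video stays available; the rule only decides how quickly it can be served.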

Source: http://www.popularmechanics.com/technology/gadgets/news/herding-cats-how-youtube-processes-72-hours-of-video-in-1-minute-14720513?src=rss
