# GPUniversity: Deep Learning and Beyond

NVIDIA hosted GPUniversity, a day of talks and a hands-on workshop on Deep Learning. It was held in the Husky Union Building (HUB) at the University of Washington, Seattle, on April 14, 2017. The workshop was organized to discuss the future of Artificial Intelligence computing and explore how Graphics Processing Units (GPUs) are powering this revolution.

The day had a solid lineup of speakers (Stan Birchfield, NVIDIA, and Prof. Ali Farhadi, UW-Seattle) and a workshop on signal processing using NVIDIA DIGITS.

The talks started at 10:30 am, with Dr. Stan Birchfield presenting on “Deep Learning for Autonomous Drone Flying through Forest Trails”. He is a Principal Research Scientist at NVIDIA, Seattle. Dr. Birchfield gave us a brief overview of three major projects happening at NVIDIA. The first project described how NVIDIA is currently looking at replacing the Image Signal Processor (ISP), a collection of modules for auto exposure, denoising, demosaicing, and more, with a deep learning network. Here is a blog post from NVIDIA that provides some background on these advances in deep learning.
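To make one of those ISP stages concrete: demosaicing reconstructs a full RGB image from the single-color-per-pixel Bayer mosaic a sensor actually records. Here is a toy nearest-neighbor version in NumPy (my own minimal sketch for illustration, not NVIDIA's learned replacement, which would swap this hand-written rule for a network):

```python
import numpy as np

def demosaic_rggb(bayer):
    """Naive demosaic of an RGGB Bayer mosaic.

    bayer: (H, W) array with H, W even. Each 2x2 tile holds
    [[R, G], [G, B]] samples. Returns an (H//2, W//2, 3) RGB image,
    averaging the two green samples in each tile.
    """
    r = bayer[0::2, 0::2]                          # top-left of each tile
    g = (bayer[0::2, 1::2] + bayer[1::2, 0::2]) / 2.0  # two green samples
    b = bayer[1::2, 1::2]                          # bottom-right of each tile
    return np.stack([r, g, b], axis=-1)

# One 2x2 tile: R=0.8, greens 0.4 and 0.6, B=0.2
tile = np.array([[0.8, 0.4],
                 [0.6, 0.2]])
rgb = demosaic_rggb(tile)
print(rgb)  # [[[0.8 0.5 0.2]]]
```

Real ISPs use far smarter interpolation; the point is just that each stage is a small, replaceable transform.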

The second project was about their efforts to reduce driver distraction. Using data from inside the car, the head pose and gaze of the driver are estimated. A different research team at NVIDIA is also researching the use of hand gestures for automotive interfaces. Having worked on gesture recognition using a standard camera and computer vision algorithms, I find this research exciting. Their most recent paper appeared at CVPR 2016.

He finally addressed the topic of image-to-image translation before speaking about his research. Image-to-image translation would allow one to shift images from a day view to night, from a sunny image to rainy, or from RGB to IR. The possibilities are endless. The system takes a raw image as input and provides a final image as output. Here is a publication by NVIDIA I found on the topic.

This was followed by information about Dr. Birchfield’s research on autonomous flight of drones in forests. Most drone enthusiasts have found it hard to navigate their autonomous aerial vehicles in the forest. The trees create a multipath effect and attenuate or block the signal, making GPS unreliable. However, if this problem could be solved, drones could serve multiple functions – search and rescue, environmental mapping, personal videography, and of course, drone racing!

NVIDIA’s approach to the problem eliminates the use of GPS (at this stage) and uses deep learning for computer vision instead. Their research is done using micro aerial vehicles (MAVs); for this purpose, they use a 3DR Iris+ fitted with an NVIDIA Jetson TX1. Through imitation learning (the method used in NVIDIA’s self-driving cars), the drone is taught to fly along a trail and stop at a safe distance if a human is detected. The dataset combines prior research from the University of Zurich (Giusti et al. 2016) with data collected on Pacific Northwest trails. The system also makes use of the DSO and YOLO algorithms. The distribution mismatch between training data and real flight was mitigated by recording with three cameras instead of just one. A detailed talk about this research will be presented at the GPU Technology Conference in May. You can follow the research here.
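The three-camera idea can be sketched as follows (this is my reading of the general technique, with illustrative camera names and labels, not the paper's exact formulation): the off-axis cameras see the trail as if the drone had drifted off course, so their frames can be labeled with the action that undoes that drift.

```python
# Sketch: a three-camera rig turns one flight into corrective training data.

def label_for_camera(camera):
    """Map each camera view to the corrective action it should teach.

    The left-facing camera sees the trail as if the drone had rotated
    left, so the corrective label is 'turn right', and vice versa.
    """
    return {
        "left": "turn right",
        "center": "go straight",
        "right": "turn left",
    }[camera]

# Dummy frame IDs stand in for real trail imagery.
frames = {"left": ["f0"], "center": ["f1"], "right": ["f2"]}
dataset = [(f, label_for_camera(cam))
           for cam, fs in frames.items() for f in fs]
print(dataset[0])  # ('f0', 'turn right')
```

Without the extra views, the model only ever sees well-centered flight and never learns how to recover from its own mistakes — the distribution mismatch mentioned above.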

Professor Ali Farhadi had an interactive session on Visual Intelligence. He started his presentation by showcasing the performance of YOLO in real-time.

YOLO in real-time using a mobile phone

An additional demo that followed showed the design of a $5 people-detecting computer, built using a Raspberry Pi Zero.
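Part of what makes detectors like YOLO usable in real time is cheap post-processing such as non-maximum suppression, which prunes overlapping duplicate boxes. A minimal, generic sketch (not YOLO's actual implementation):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep boxes in descending score
    order, dropping any box that overlaps a kept box by > thresh IoU."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

# Two overlapping "person" detections and one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The duplicate of the first box is suppressed; the distant box survives.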

Prof. Farhadi took us through a number of projects in his 45-minute talk. The man never fails to impress (I have been in his class, and he is an inspiring teacher!). I am going to provide a brief description of these projects and add links to publications/research websites below.

Visual recognition involves visual knowledge, data, parsing and visual reasoning. The action-centric view of visual recognition involves three parts: recognizing actions, predicting expected outcomes and devising a plan. The projects discussed include all these factors.

  1. imsitu.org: Used for situation recognition, as opposed to treating all the components of an image as objects. This enables the system to predict not just the objects or locations, but also the activity being performed and the roles of the participants. The demo on the website implements a compositional Conditional Random Field, pre-trained using semantic data augmentation on 5 million web images.
    Go ahead and try it here.
  2. Learn EVerything about ANything (LEVAN): Single-camera systems pose a problem when size is a determining factor for visual intelligence. However, if we understand the average sizes of objects, we can make better predictions by imposing a size distribution. LEVAN acts as a visual encyclopedia, helping you explore and understand in detail any topic you are curious about.
    Try the demo here. If it does not have a concept you are looking for, click and add it to the database! 🙂
  3. Visual Knowledge Extraction Engine (VisKE): In brief, VisKE does visual fact checking. It provides the most probable explanation for a statement based on visual evidence from the internet, generating a factor graph that assigns scores based on how much it visually trusts each piece of information.
    Try the demo here.
  4. Visual Newtonian Dynamics (VIND): VIND predicts the dynamics of query objects in static images. The dataset compiled includes videos aligned with Newtonian scenarios represented using game engines, and still images with their ground truth dynamics. A Newtonian neural network performs the correlation.
  5. What Happens if?: By making use of the Forces in Scenes (ForScene) dataset from the University of Washington, and using a combination of Recurrent Neural Nets with Convolutional Neural Nets, this project aims to understand the effect of external forces on objects. The system makes sequential predictions based on the force vector applied to a specific location.
  6. AI2 THOR Framework: THOR is a framework of visually realistic interactive scenes, rendered for studying actions based on visual input.
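The size-prior idea behind LEVAN above can be made concrete with a tiny sketch: score how plausible a label is given an object's observed size under a per-class distribution. The numbers and class names here are invented for illustration, not LEVAN's actual data or model.

```python
import math

# Hypothetical size priors: (mean, std) of object height in meters.
SIZE_PRIOR = {"cat": (0.25, 0.05), "horse": (1.6, 0.2)}

def size_likelihood(label, observed_height):
    """Gaussian likelihood of the observed height under the class prior."""
    mu, sigma = SIZE_PRIOR[label]
    z = (observed_height - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# A 1.5 m tall detection is far more plausible as a horse than as a cat,
# so a size prior can override an ambiguous appearance-based prediction.
print(size_likelihood("horse", 1.5) > size_likelihood("cat", 1.5))  # True
```

This is the sense in which "imposing a distribution" over sizes helps: implausibly sized hypotheses get penalized before the final prediction.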

Hope these projects shed more light on the possibilities in Computer Vision and Deep Learning.

GPUniversity workshop by the Deep Learning Institute

If you would like to get your hands dirty, try nvlabs.qwiklab.com for access to NVIDIA DIGITS or courses mentioned on the Deep Learning Institute website.

# International Women’s Day celebration with Women Techmakers: What I learnt

International Women’s Day is observed on March 8th every year. It is a global call to celebrate the social, economic, cultural and political achievements of women. It is also a time to reflect on how far we have progressed, and to encourage ordinary women to do extraordinary things for their countries and communities.

This year, Women Techmakers is hosting summits at Google offices across the globe to celebrate International Women’s Day. The summits last an entire day, with talks, discussions and hands-on activities by various partners, such as TensorFlow, Speechless and more! These, unlike the Women Techmakers meetups, are invitation-only. The attendees are chosen through an online application available on their website, which requires a short essay on “What are you passionate about solving for in your current role?”. I crossed my fingers and poured my heart into the essay I sent across. I was fortunate to be extended an invitation to attend the summit at Google Kirkland on March 4th.

On the day of:

The surprisingly sunny Seattle weather offered the most stunning view of the Kirkland office. Here is a picture of what I saw.

Google Kirkland office on a warm, sunny day

The atmosphere in the welcome room was incredibly warm and friendly. The theme of the summit was “Telling your story”. To get started on this, we were given name tags to which we could attach three qualities we best associate with. This served as an easy conversation starter, which was a blessing for an introvert such as myself. After exchanging greetings with women from companies big and small, and downing a sumptuous breakfast, we headed into a room full of ~150 talented women.

Name tags and some “swag”

The day started with Olga Garcia, Engineering Program Manager at Google, giving us something to think about. She paraphrased a line from the poem “Our Grandmothers” by Maya Angelou:

I come as one, but I stand as ten thousand.

It makes us ponder the journey of those who paved the way for us, and instills in us the confidence to do the same for others.

Senator Patty Kuderer welcomed us with her story of fighting to bring gender parity to the state of Washington. She is also an advocate for introducing more girls to STEM, something she had to forgo during her own school days. This was followed by a keynote address by Thais Melo, a Tech Lead Manager in Google Cloud. Her journey from coding in the corner to leading a team shows that we can take on any role as long as we believe in our abilities.

The Stories of Success panel that followed the keynote brought us the stories of four brilliant women: Sara Adineh, Nikisha Reyes-Grange, Angel Tian and Heather Sherman. They shared wide-ranging experiences from their respective fields and gave us insight into how they dealt with the challenges they faced.

After lunch, we split into two groups to attend fun workshops: Group 1 headed to “An Introduction to TensorFlow” and Group 2 was a part of “Develop Your Story: An Interactive Workshop”. I chose to be a part of the latter, excited to learn from Kimberly MacLean. It was two hours of learning to voice our stories through group discussions, role play, and fun games. This included “Yes, and…”, a Portkey exercise (YES! The workshop was led by someone who gets Harry Potter!), story building, and working on our elevator pitch. Time flew by, and I made some very good friends through the exercises. I also learned from others that the TensorFlow workshop was just as much fun; why can’t we be in two places at the same time!

With all our newly found knowledge and energy, we moved into the last session of the day – an evening with Waymo! Waymo is an autonomous driving company spun out of Alphabet Inc. in December 2016. As a Robotics Engineer with experience in Computer Vision and Machine Learning, I found this session the perfect chance to geek out! The interactive activity required us to think from the perspective of a software engineer, a systems engineer, a project manager and a mechanical engineer. We went through the design process for different scenarios, such as snow, rain and crowded neighborhoods.

Women Techmakers Seattle: Success is…

To learn more about what happened at all other summits, follow #WTM17 on Twitter or Google Plus.