"Among all sources of information, visual information may be the most interesting"
Violetta Shevchenko, an Amazon applied scientist and former intern, combines vision and language to create solutions to challenging problems.
Violetta Shevchenko is enthusiastic about computer vision — and that enthusiasm is thanks, in part, to a fish farm.
When Shevchenko chose to study computer science at the Southern Federal University, in her home country of Russia, she was motivated by a yearning to understand how computers work. But as an undergraduate, she had little hope of pursuing a career in science.
“I never thought I would be a scientist because I just didn't see any options in my country,” she recalled. Her mother’s experience of studying physics yet having to switch to economics in search of better opportunities led her to believe she had limited chances.
That changed after she moved to Finland to pursue a master’s degree in computational engineering at LUT University. There she learned that science could be both an interesting and viable career option, and that there were opportunities and resources available for those willing to follow that path.
During her master’s program, Shevchenko worked in collaboration with a fish farm in Finland. She used computer vision to count fish populations passing through the river. “It was more of a standard computer vision approach, without any advanced techniques,” she said. “But I loved working with the images, so I wanted to continue with computer vision research.”
That experience sparked what has become a long-time fascination.
“Among all sources of information, visual information may be the most interesting, and also the most easily perceived,” she said. “All we have to do is look around.”
Shevchenko, who recently accepted a role as an applied scientist after concluding an internship at Amazon’s office in Adelaide, Australia, talked with Amazon Science about her experiences, both in academia and at Amazon.
Having lived in Finland for one year, Shevchenko wanted to continue her academic trajectory in a warmer and sunnier region.
“I had visited Adelaide before and the city is amazing,” she said. “So that became my priority.” The University of Adelaide came up in her first online search for computer vision and machine learning PhD programs. “I was extremely lucky that I found this amazing center and people who were working in the area that I was particularly interested in.”
At the University of Adelaide’s Australian Institute for Machine Learning (AIML), where Shevchenko pursued her PhD, researchers apply machine learning to solve problems in diverse fields, such as agriculture, mining, transport, manufacturing, and medicine. She received a scholarship from the Australian Centre for Robotic Vision (ACRV), a part of the Australian Research Council Centre of Excellence program, which promoted cutting-edge research on computer vision for seven years, until 2021.
Shevchenko focused her PhD on visual question answering, which she describes as a natural next step from classical computer vision tasks. “That's the problem where we have an image, and we want to ask a computer or any artificial intelligence questions about that image. So, we want to test the ability of AI to reason over visual information.”
Research with real-world applications
During her PhD, she worked on strategies for making visual question answering more practical in real-life scenarios by using external knowledge. One potential application of this technology is assisting visually impaired people. In a traditional task, a model can extract information directly from an image and use it to answer certain questions.
“If there is an image of five horses running outside, the question may be how many horses are there. So, we will test the counting ability of the model,” Shevchenko explained. In the real world, however, researchers might want to ask a question that requires knowledge that is not necessarily in the image.
“If you ask how many mammals are in this scene, you need to know what mammals are,” she explained. “My whole PhD was about trying to make sure that the application of this task is not only restricted to research, where you have your training data, but can also be applied in the real world, where the range of the knowledge required is unrestricted.”
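The difference between the two questions can be illustrated with a toy sketch. It is not Shevchenko's method or any real visual question answering model: the detector output `detected_objects`, the lookup table `IS_A`, and both helper functions are hypothetical names invented here for illustration. Counting horses needs only what is visible in the image, while counting mammals also needs a fact from outside it.

```python
from collections import Counter

# Hypothetical output of an object detector run on the image of five horses.
detected_objects = ["horse", "horse", "horse", "horse", "horse", "tree", "fence"]

# Hypothetical external knowledge: which broader categories each label belongs to.
IS_A = {
    "horse": {"mammal", "animal"},
    "dog": {"mammal", "animal"},
    "tree": {"plant"},
}


def count_label(objects, label):
    """Answer 'how many <label>s are there?' from the image content alone."""
    return Counter(objects)[label]


def count_category(objects, category, knowledge):
    """Answer 'how many <category>s are there?' using external knowledge."""
    return sum(1 for obj in objects if category in knowledge.get(obj, set()))


horses = count_label(detected_objects, "horse")
mammals = count_category(detected_objects, "mammal", IS_A)
# With an empty knowledge base, the model cannot tell that a horse is a mammal.
mammals_without_knowledge = count_category(detected_objects, "mammal", {})
```

In this sketch the second question is only answerable because `IS_A` says a horse is a mammal; with an empty knowledge base the count drops to zero, which mirrors the gap between benchmark questions and open-ended real-world ones.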
In October 2021, Shevchenko joined Amazon as an intern. When she first heard that Amazon was opening a new office in Adelaide, she thought it could be a great opportunity. Her PhD supervisor, Anton van den Hengel, is also the director of applied science at Amazon’s Adelaide office; he talked to her about projects his team was pursuing. It sounded like a perfect fit, particularly the opportunity to work on more applied research.
“Basic research is interesting and exciting work. But sometimes you feel less motivated because you can't always see the direct outcome of your work,” she noted. “You produce a paper, but you don't know how this paper will actually influence other people, how many people will use it, how many people will actually find it beneficial.”
As an intern, Shevchenko worked with data in the Amazon catalog, where multiple images, textual descriptions, and attributes exist for each product. This data may be used by Amazon scientists to classify products, cluster them in similar groups, find duplicates, and fill in information that a seller might have omitted, among many other tasks.
“All these tasks usually require extracting representations as a first step,” she explained. “No matter what you are doing, you first need to process your data and get something we call vector embeddings. Embeddings summarize and get all the important information from your data and transfer it into numerical form, which you can further use in your models.” Her task: create representations that combine visual and textual information efficiently.
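As a rough illustration of what such combined representations make possible, the sketch below fuses a small image vector and a small text vector by normalizing and concatenating them, then compares products with cosine similarity. This is an assumption made for illustration, not the approach used at Amazon; the vectors are hand-written toy values rather than outputs of real models, and `l2_normalize`, `fuse`, and `cosine_similarity` are hypothetical helper names.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(image_vec, text_vec):
    """Combine two modalities by concatenating their normalized vectors."""
    return l2_normalize(l2_normalize(image_vec) + l2_normalize(text_vec))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy (image, text) vectors for three hypothetical catalog listings.
mug_a = fuse([0.9, 0.1, 0.0], [0.8, 0.2])
mug_b = fuse([0.85, 0.15, 0.0], [0.75, 0.25])  # likely duplicate of mug_a
lamp = fuse([0.0, 0.2, 0.9], [0.1, 0.9])

duplicate_score = cosine_similarity(mug_a, mug_b)
unrelated_score = cosine_similarity(mug_a, lamp)
```

Here the two mug listings score far higher than the mug and the lamp, which is the kind of signal that could support duplicate detection or clustering. Concatenation is only the simplest way to combine modalities; learned fusion models are a common alternative.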
She also contributed to a virtual product try-on project, a completely new area for her.
“I love that process when you're starting on a new direction, and you’re reading the literature, and diving deep into the basics of a new topic,” she said. One of the greatest challenges in this project, she added, is making sure that the model is trustworthy and works for all customers.
Combining computer vision with other areas
During her internship, which ended in April, Shevchenko had the opportunity to work with Amazon scientists from multiple backgrounds and different experiences.
“No matter which problem I faced, there was always someone from our team who I could talk to, who had a really good experience or knowledge to help me with it. That was a great opportunity.”
She also benefitted from access to resources she didn’t have in academia. “During my PhD, when I worked with neural networks, training large models, there was always this problem of not having enough computing resources,” she explained. “But with Amazon, you can use almost any resource that is available to you, which greatly accelerates your process.”
Shevchenko, who moved into her full-time role earlier this month, believes there is still a lot to explore in computer vision research. In the future, though, she expects different areas of AI to coalesce.
“Basic computer vision tasks have been solved more or less efficiently already. So, we're going to the next step, where we're combining computer vision with other areas, like natural language processing.”