KDD 2023: Graph neural networks’ new frontiers

Conference general chair and Amazon Scholar Yizhou Sun on modeling long-range dependencies, improving efficiency, and new causal models.

In 2021 and 2022, when Amazon Science asked members of the program committees of the Knowledge Discovery and Data Mining Conference (KDD) to discuss the state of their field, the conversations revolved around graph neural networks.

Yizhou Sun, an associate professor of computer science at the University of California, Los Angeles; an Amazon Scholar; and general chair of the 2023 Knowledge Discovery and Data Mining Conference.

Graph learning remains the most popular topic at KDD 2023, but as Yizhou Sun, an associate professor of computer science at the University of California, Los Angeles; an Amazon Scholar; and the conference’s general chair, explains, that doesn’t mean that the field has stood still.

Graph neural networks (GNNs) are machine learning models that produce embeddings, or vector representations, of graph nodes that capture information about the nodes’ relationships to other nodes. They can be used for graph-related tasks, such as predicting edges or labeling nodes, but they can also be used for arbitrary downstream processing tasks, which simply take advantage of the information encoded in graph structure.

But within that general definition, “the implication of ‘graph neural network’ could be very different,” Sun says. “‘Graph neural network’ is a very broad term.”

For instance, Sun explains, traditional GNNs use message passing to produce embeddings. Each node in the graph is embedded, and then each node receives the embeddings of its neighboring nodes (the passed messages), which it integrates into an updated embedding. Typically, this process is performed two to three times, so that the embedding of each node captures information about its two- to three-hop neighborhood.
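The message-passing loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the four-node path graph, the one-dimensional features, and the mean-aggregation update rule are all hypothetical choices, not something the article specifies, and real GNN layers use learned weights.

```python
# Illustrative sketch only: mean-aggregation message passing on a tiny graph.
# The graph, the features, and the update rule are assumptions for this example.

def message_passing_round(adjacency, embeddings):
    """One round: each node averages its own embedding with its neighbors'."""
    updated = {}
    for node, neighbors in adjacency.items():
        group = [embeddings[node]] + [embeddings[n] for n in neighbors]
        dim = len(embeddings[node])
        updated[node] = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
    return updated


def run_rounds(adjacency, embeddings, num_rounds):
    """Apply message passing num_rounds times and return the final embeddings."""
    for _ in range(num_rounds):
        embeddings = message_passing_round(adjacency, embeddings)
    return embeddings


# Path graph 0 - 1 - 2 - 3; only node 3 starts with a nonzero signal.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = {0: [0.0], 1: [0.0], 2: [0.0], 3: [1.0]}

after_two = run_rounds(adjacency, initial, 2)
after_three = run_rounds(adjacency, initial, 3)

# Node 0 is three hops from node 3, so two rounds cannot carry the signal to it.
print(after_two[0][0])          # 0.0
print(after_three[0][0] > 0.0)  # True
```

In this toy graph, information from node 3 reaches node 0 only on the third round, which is the sense in which the number of rounds bounds the size of the neighborhood each embedding can see.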

“If I do message passing, I can only collect information from my immediate neighbors,” Sun explains. “I need to go through many, many layers to model long-range dependencies. For some specific applications, like software analysis or simulation of physical systems, long-range dependency becomes critical.

“So people asked how we can change this architecture. They were inspired by the transformer” — the attention-based neural architecture that underlies today’s large language models — “because the transformer can be considered a special case of a graph neural network, where in the input window, every token can be connected to every other token.

“If every node can communicate with every node in the graph, you can easily address this long-range-dependency issue. But there will be two limitations. One is efficiency. For some graphs, there are many millions or even billions of nodes. You cannot efficiently talk to everyone else in the graph.”
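One way to picture the relationship Sun describes is an attention layer that takes, for each node, a list of nodes it is allowed to attend to. The sketch below is a simplified assumption-laden illustration: `attention_layer` is a hypothetical helper, scores are plain dot products, and the learned projections of a real transformer are omitted. Restricting the list to graph neighbors gives GNN-style behavior; allowing every node gives transformer-style behavior, at a cost that grows with the square of the node count.

```python
import math

# Illustrative sketch only: attention where each node attends to an allowed set.
# Dot-product scores without learned projections are an assumption for brevity.

def attention_layer(features, allowed):
    """Each node i takes a softmax-weighted average over the nodes in allowed[i].

    Returns the new features and the number of (query, key) pairs scored."""
    outputs = []
    pair_count = 0
    for i, query in enumerate(features):
        targets = allowed[i]
        scores = [sum(q * k for q, k in zip(query, features[j])) for j in targets]
        pair_count += len(targets)
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(query)
        outputs.append([
            sum(w * features[j][d] for w, j in zip(weights, targets)) / total
            for d in range(dim)
        ])
    return outputs, pair_count


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
num_nodes = len(features)

# GNN-style: attend only to yourself and your graph neighbors (path 0-1-2-3).
neighbors_only = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
# Transformer-style: every node attends to every node.
everyone = [list(range(num_nodes)) for _ in range(num_nodes)]

_, sparse_pairs = attention_layer(features, neighbors_only)
_, dense_pairs = attention_layer(features, everyone)
print(sparse_pairs, dense_pairs)  # 10 16
```

With four nodes the gap is small, but the dense count is the node count squared: a graph with a million nodes would need on the order of 10^12 pairwise scores per layer.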

The second concern, Sun explains, is that too much long-range connectivity undermines the very point of graphical representation. Graphs are useful because they capture meaningful relationships between nodes — which means leaving out the meaningless ones. If every node in the graph communicates with every other node, the meaningful connections are diluted.

To combat this problem, “people try to find a way to mimic the position encoding in the text setting or the image setting,” Sun says. “In the text setting, we just turned the position into some encoding. And later, in the computer vision domain, people said, ‘Okay, let's also do that with image patches.’ So, for example, we can break each image into six-by-six patches, and the relative position of those patches can be turned into a position encoding.

“So the next question is, in the graph setting, how we can get that natural kind of relative position? There are different ways to do that, like random walk — a very simple one. And also people try to do eigendecomposition, where we utilize eigenvectors to encode the relative position of those nodes. But eigendecomposition is very time consuming, so again, it comes down to the efficiency problem.”
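As a rough illustration of the random-walk option, one common formulation (an assumption here, since the article does not say which variant Sun has in mind) describes each node by the probability that a random walk starting there returns to it after 1, 2, ..., k steps. The function name `random_walk_encoding` and the tiny path graph are hypothetical.

```python
# Illustrative sketch only: random-walk return probabilities as a positional
# encoding. Which random-walk variant is meant in the quote is an assumption.

def random_walk_encoding(adjacency, num_steps):
    """For each node i, return [P^1[i][i], ..., P^num_steps[i][i]], where P is
    the random-walk transition matrix (row i spreads evenly over i's neighbors)."""
    n = len(adjacency)
    transition = [[0.0] * n for _ in range(n)]
    for i, neighbors in enumerate(adjacency):
        for j in neighbors:
            transition[i][j] = 1.0 / len(neighbors)

    encoding = [[] for _ in range(n)]
    power = [row[:] for row in transition]  # holds P^k, starting at k = 1
    for _ in range(num_steps):
        for i in range(n):
            encoding[i].append(power[i][i])
        # Advance from P^k to P^(k+1) with a plain matrix product.
        power = [
            [sum(power[i][m] * transition[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return encoding


# Path graph 0 - 1 - 2, given as neighbor lists indexed by node.
path = [[1], [0, 2], [1]]
print(random_walk_encoding(path, 2))  # [[0.0, 0.5], [0.0, 1.0], [0.0, 0.5]]
```

The two endpoints receive the same encoding and the middle node a different one, so the vector reflects structural position rather than node identity. The dense matrix product used here is only practical for small graphs, which echoes the efficiency concern raised in the quote.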

Efficiency

Indeed, Sun explains, improving the efficiency of GNNs is itself an active area of research — from high-level algorithmic design down to the level of chip design.

“At the algorithm level, you might try to do some sort of sampling technique, just try to make the number of operations smaller,” she says. “Or maybe just design some more efficient algorithms to sparsify the graphs. For example, let's say we wanted to do some sort of similarity search, to keep the most similar nodes to each target node. Then people can design some smart index technology to make that part very fast.
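The similarity-based sparsification mentioned above can be sketched as follows. This is a brute-force illustration under assumptions: cosine similarity, the helper names, and the four example embeddings are all hypothetical, and the exhaustive scan stands in for the "smart index" that a production system would use instead.

```python
import heapq
import math

# Illustrative sketch only: keep the k most similar nodes for each target node.
# Cosine similarity and the brute-force scan are assumptions for this example.

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_neighbors(embeddings, k):
    """For each target node, keep only its k most similar other nodes."""
    sparse = {}
    for target, vec in embeddings.items():
        candidates = [
            (cosine(vec, other), node)
            for node, other in embeddings.items()
            if node != target
        ]
        sparse[target] = [node for _, node in heapq.nlargest(k, candidates)]
    return sparse


embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}
print(top_k_neighbors(embeddings, 1))
# {'a': ['b'], 'b': ['a'], 'c': ['d'], 'd': ['c']}
```

The scan compares every pair of nodes, so its cost is quadratic in the number of nodes. An approximate nearest-neighbor index would replace the inner loop while leaving the resulting sparse graph conceptually the same.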

“And in the inference stage, we can do knowledge distillation to distill a very complicated model, let's say a graph neural network, into a very simple graph neural network — or not necessarily a graph neural network, maybe just a very simple kind of structure, like an MLP [multilayer perceptron]. Then we can do the calculation much faster. Quantization can also be applied in the inference stage to make computation much faster.
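Both inference-time ideas in the paragraph above can be sketched briefly. The code below is a simplified illustration, not a description of any particular system: the distillation loss is the usual temperature-softened KL divergence between teacher and student outputs, the quantizer is a basic symmetric int8 scheme, and all function names and example values are assumptions.

```python
import math

# Illustrative sketch only: a distillation loss and a simple weight quantizer.
# Temperature, the int8 scheme, and the example values are all assumptions.

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the student's."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)


def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127.0 if largest else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [q * scale for q in codes]


teacher_logits = [4.0, 1.0, 0.5]
print(distillation_loss(teacher_logits, teacher_logits))       # 0.0
print(distillation_loss(teacher_logits, [1.0, 1.0, 1.0]) > 0)  # True

codes, scale = quantize_int8([0.4, -1.0, 0.2])
print(codes)  # [51, -127, 25]
```

The loss is zero when the student already matches the teacher and positive otherwise, so minimizing it pushes a small model (for example an MLP) toward the large model's predictions. The quantizer trades a rounding error of at most half a scale step per weight for smaller, cheaper integer arithmetic.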

“So that's at the algorithm level. But nowadays people go deeper. Sometimes, if you want to solve the problem, you need to go to the system level. So people say, let's see how we can design this distributed system to accelerate the training, accelerate the inference.

“For example, in some cases, the memory becomes the main constraint. In this case, probably the only thing we can do is distribute the workload. Then the natural problems are how we can coordinate or synchronize the model parameters trained by each computational node. If we have to distribute the data to 10 machines, how can you coordinate with those 10 machines to make sure you only have one final version?
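One simple answer to the synchronization question, offered here only as an assumption since the article does not name a scheme, is synchronous parameter averaging: each machine trains on its own shard, and the per-machine parameter vectors are averaged element-wise into a single shared version. The helper name and the made-up parameter values below are hypothetical.

```python
# Illustrative sketch only: merge per-worker parameters by element-wise averaging.
# Synchronous averaging is an assumed scheme; the values are made up.

def average_parameters(worker_params):
    """Combine per-worker parameter vectors into one by element-wise averaging."""
    num_workers = len(worker_params)
    return [sum(values) / num_workers for values in zip(*worker_params)]


# Hypothetical setup: 10 workers, each holding its own copy of 3 parameters
# after training on its own shard of the data.
worker_params = [[float(w), 2.0, -1.0 * w] for w in range(10)]

final = average_parameters(worker_params)
print(final)  # [4.5, 2.0, -4.5]
```

Every worker would then continue from `final`, so only one version of the model exists after each synchronization step. Real distributed training systems typically average gradients rather than parameters and overlap communication with computation, which this sketch does not attempt.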

“And people now go even deeper, to do the acceleration on the hardware side. So software-hardware co-design also becomes more and more popular. It requires people to really know so many different fields.

“By the way, at KDD, compared to many other machine learning conferences, real-world problems are always our top focus. In many cases, in order to solve the real-world problem, we have to talk to people with different backgrounds, because we cannot just wrap it up into the kind of ideal problems we solved when we were in high school.”

Applications

Beyond such general efforts to improve GNNs’ versatility and accuracy, however, there’s also new research on specific applications of GNN technology.

“There’s some work on how we can do causal analysis in the graph setting, meaning that the objects actually interfere with each other,” Sun explains. “This is quite different from the traditional setting: the patients in a drug study, for example, are independent from each other.

“There is also a new trend to combine deep representation learning with the causal inference. For example, how can we represent the treatment you try as a continuous vector, instead of just a binary treatment? Can we make the treatment timewise continuous — meaning that it's not just a static kind of one-time treatment? If I put the treatment 10 days later, how would the outcome compare to putting the treatment 20 days later? Time is very important; how can we inject that time information in?

“Graphs can also be considered a good data structure to describe multiagent dynamical systems — how those objects interact with each other in a dynamic network setting. And then, how can we incorporate the generative idea into graphs? Graph generation is very useful for many fields, such as in the drug industry.

“And then there are so many applications where we can benefit from large language models [LLMs]. For example, knowledge graph reasoning. We know that LLMs hallucinate, and reasoning on KGs [knowledge graphs] is very rigorous. What would be a good combination of these two?

“With GNNs, there’s always new stuff. Graphs are just a very useful data structure to model our interconnected world.”
