How to teach Transformers to care about word order

New position encoding scheme improves state-of-the-art performance on several natural-language-processing tasks.

The Transformer is a neural-network architecture that has proven extremely useful for natural-language-processing tasks because it can recognize long-range dependencies. It could, for instance, recognize that in a sentence that includes the word “rented”, the word “flat” is more likely to mean “apartment” than it would be otherwise, even if “rented” is the second word in the sentence and “flat” the 10th.

In its most basic form, the Transformer is indifferent to word order. It can recognize the relationship between “rented” and “flat”, but it doesn’t care which comes first.

Word order, however, can make a big difference to meaning. Consider, for instance, the sentences “We rented a small but clean, well-equipped two-bed flat” and “We rented a small but clean, well-equipped flat-bed truck”.

Position embedding.png
These images map 252 words of an input text sequence (y-axis) against the 512 latent position features identified by two different position-encoding schemes. Lighter colors indicate higher values for features, darker colors lower values. FLOATER (bottom) produces a more regular encoding than an earlier scheme (top), which also learns its position feature set from training data. The vertical lines toward the bottom of the top visualization indicate that the encoding model is simply using the same encoding for input sequences longer than those it saw during training, while FLOATER’s smooth gradation from light to dark demonstrates that its encoding generalizes easily to longer sequences.

Starting with the paper that introduced the Transformer, researchers have proposed a series of position encoders that inject word-order information into the Transformer model. But last week, at the International Conference on Machine Learning, we presented a new position encoder that enables better performance than its predecessors on a range of natural-language-processing (NLP) tasks.

We designed our position encoder so that it can be integrated into existing Transformer models, conferring its benefits to NLP systems that already have been trained extensively on large data sets.

Before the Transformer was introduced in 2017, the most popular architecture for NLP was the long short-term memory, or LSTM. LSTMs process sequenced inputs in order, and each output reflects both the inputs and the outputs that preceded it.

LSTMs are very good at inferring local relationships — a word’s relationships, both syntactic and semantic, with the two or three words that immediately precede it — but they’re not as good at modeling long-range dependencies. That’s where the Transformer excels.

Position encodings are an attempt to achieve the best of both worlds: an awareness of long-range dependencies and a sensitivity to local word order. The ideal position encoding should have three properties:

  1. It should be able to handle sequences of arbitrary length; that is, it shouldn’t be locked in to some maximum sequence length.
  2. It should be learnable from training data; different encodings may work better for different tasks.
  3. It should be efficient; adding position encoding shouldn’t unreasonably inflate the size of the neural model.

Past position encoding schemes have met at best two of these criteria. For instance, the original Transformer paper proposed an encoding based on a family of sinusoidal functions; that encoding remains popular, but it is not learnable.

Our scheme, which we call FLOATER, is the first to meet all three criteria.

The naïve way to encode position would be simply to assign successive numbers to successive words in an input sequence. But this has drawbacks in a machine learning context. If at runtime the model sees a sequence of a length it did not encounter during training, it will be flummoxed about how to proceed.

So most position encoding schemes instead use position vectors, which carry information that can be used to deduce the relative positions of two inputs. If those schemes are fully learnable, however, they tend to inflate the model size; or, to keep model inflation under control, they limit the distances across which relative position can be compared.

Functional approach

Instead of learning to directly compute a position vector from each word in an input sequence, FLOATER learns a function that computes each word’s position vector from that of the word that preceded it.

Learning a general function rather than direct mappings makes FLOATER much more space efficient than other learnable encoding schemes. But a general function can also be applied to any word in a sequence, regardless of its position, so FLOATER is indifferent to sequence length.

Any given manually engineered position function — such as the sinusoidal functions proposed in the original Transformer paper — can be thought of as a special case of the general FLOATER function. So in a pretrained network, we can simply substitute FLOATER for any such function and then fine-tune it on a small set of training data.

Past work on position encoding has shown that re-encoding position information at every layer of a Transformer network improves performance on NLP tasks. If we allowed FLOATER to learn a different function for every layer, the model size would again begin to inflate.

So instead, we learn a single function that is applied at every layer. This results in different position encodings at each layer, however, because the inputs are different. Our experiments indicate that this approach strikes a good balance between model size and performance improvements.

In one set of experiments, we compared our position encoder to its two leading predecessors on four different machine translation tasks and found that it delivered the best results across the board.

In another set of experiments, we added our position encoder to Transformer models that had previously been trained on three different language-understanding and question-answering tasks.

Of 23 distinct tasks, the addition of our position encoder improved performance on 21. The two on which its performance fell slightly short were low-data versions of tasks on which, with larger sets of training data, it improved performance.

Related content

US, WA, Seattle
Do you want to join an innovative team of scientists who use machine learning to help Amazon provide the best experience to our Selling Partners by automatically understanding and addressing their challenges, needs and opportunities? Do you want to build advanced algorithmic systems that are powered by state-of-art ML, such as Natural Language Processing, Large Language Models, Deep Learning, Computer Vision and Causal Modeling, to seamlessly engage with Sellers? Are you excited by the prospect of analyzing and modeling terabytes of data and creating cutting edge algorithms to solve real world problems? Do you like to build end-to-end business solutions and directly impact the profitability of the company and experience of our customers? Do you like to innovate and simplify? If yes, then you may be a great fit to join the Selling Partner Experience Science team. Key job responsibilities Use statistical and machine learning techniques to create the next generation of the tools that empower Amazon's Selling Partners to succeed. Design, develop and deploy highly innovative models to interact with Sellers and delight them with solutions. Work closely with teams of scientists and software engineers to drive real-time model implementations and deliver novel and highly impactful features. Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation. Research and implement novel machine learning and statistical approaches. Lead strategic initiatives to employ the most recent advances in ML in a fast-paced, experimental environment. Drive the vision and roadmap for how ML can continually improve Selling Partner experience. About the team Selling Partner Experience Science (SPeXSci) is a growing team of scientists, engineers and product leaders engaged in the research and development of the next generation of ML-driven technology to empower Amazon's Selling Partners to succeed. We draw from many science domains, from Natural Language Processing to Computer Vision to Optimization to Economics, to create solutions that seamlessly and automatically engage with Sellers, solve their problems, and help them grow. Focused on collaboration, innovation and strategic impact, we work closely with other science and technology teams, product and operations organizations, and with senior leadership, to transform the Selling Partner experience.
US, WA, Seattle
The AWS AI Labs team has a world-leading team of researchers and academics, and we are looking for world-class colleagues to join us and make the AI revolution happen. Our team of scientists have developed the algorithms and models that power AWS computer vision services such as Amazon Rekognition and Amazon Textract. As part of the team, we expect that you will develop innovative solutions to hard problems, and publish your findings at peer reviewed conferences and workshops. AWS is the world-leading provider of cloud services, has fostered the creation and growth of countless new businesses, and is a positive force for good. Our customers bring problems which will give Applied Scientists like you endless opportunities to see your research have a positive and immediate impact in the world. You will have the opportunity to partner with technology and business teams to solve real-world problems, have access to virtually endless data and computational resources, and to world-class engineers and developers that can help bring your ideas into the world. Our research themes include, but are not limited to: few-shot learning, transfer learning, unsupervised and semi-supervised methods, active learning and semi-automated data annotation, large scale image and video detection and recognition, face detection and recognition, OCR and scene text recognition, document understanding, 3D scene and layout understanding, and geometric computer vision. For this role, we are looking for scientist who have experience working in the intersection of vision and language. We are located in Seattle, Pasadena, Palo Alto (USA) and in Haifa and Tel Aviv (Israel).
RO, Iasi
Amazon’s mission is to be earth’s most customer-centric company and our team is the guardian of our customer’s privacy. Amazon SDO Privacy engineering operates in Austin – TX, US and Iasi, Bucharest – Romania. Our mission is to develop services which will enable every Amazon service operating with personal data to satisfy the privacy rights of Amazon customers. We are working backwards from the customers and world-wide privacy regulations, think long term, and propose solutions which will assure Amazon Privacy compliance. Our external customers are world-wide customers of Amazon Retail Website, Amazon B2B services (e.g. Seller central, App / Skill Developers), and Amazon Subsidiaries. Our internal customers are services within Amazon who operate with personal data, Legal Representatives, and Customer Service Agents. You can opt-in for being part of one of the existing or newly formed engineering teams who will contribute to Amazon mission to meet external customers’ privacy rights: Personal Data Classification, The Right to be forgotten, The right of access, or Digital Markets Act – The Right of Portability. The ideal candidate has a great passion for data and an insatiable desire to learn and innovate. A commitment to team work, hustle and strong communication skills (to both business and technical partners) are absolute requirements. Creating reliable, scalable, and high-performance products requires a sound understanding of the fundamentals of Computer Science and practical experience building large-scale distributed systems. Your solutions will apply to all of Amazon’s consumer and digital businesses including but not limited to Amazon.com, Alexa, Kindle, Amazon Go, Prime Video and more. Key job responsibilities As an data scientist on our team, you will apply the appropriate technologies and best practices to autonomously solve difficult problems. You'll contribute to the science solution design, run experiments, research new algorithms, and find new ways of optimizing customer experience. Besides theoretical analysis and innovation, you will work closely with talented engineers and ML scientists to put your algorithms and models into practice. You will collaborate with partner teams including engineering, PMs, data annotators, and other scientists to discuss data quality, policy, and model development. Your work will directly impact the trust customers place in Amazon Privacy, globally.
JP, 13, Tokyo
The JP Economics team is a central science team working across a variety of topics in the JP Retail business and beyond. We work closely with JP business leaders to drive change at Amazon. We focus on solving long-term, ambiguous and challenging problems, while providing advisory support to help solve short-term business pain points. Key topics include pricing, product selection, delivery speed, profitability, and customer experience. We tackle these issues by building novel economic/econometric models, machine learning systems, and high-impact experiments which we integrate into business, financial, and system-level decision making. Our work is highly collaborative and we regularly partner with JP- EU- and US-based interdisciplinary teams. In this role, you will build ground-breaking, state-of-the-art causal inference models to guide multi-billion-dollar investment decisions around the global Amazon marketplaces. You will own, execute, and expand a research roadmap that connects science, business, and engineering and contributes to Amazon's long term success. As one of the first economists outside North America/EU, you will make an outsized impact to our international marketplaces and pioneer in expanding Amazon’s economist community in Asia. The ideal candidate will be an experienced economist in empirical industrial organization, labour economics, econometrics, or related structural/reduced-form causal inference fields. You are a self-starter who enjoys ambiguity in a fast-paced and ever-changing environment. You think big on the next game-changing opportunity but also dive deep into every detail that matters. You insist on the highest standards and are consistent in delivering results. Key job responsibilities Work with Product, Finance, Data Science, and Data Engineering teams across the globe to deliver data-driven insights and products for regional and world-wide launches. Innovate on how Amazon can leverage data analytics to better serve our customers through selection and pricing. Contribute to building a strong data science community in Amazon Asia.
GB, London
Are you excited about applying economic models and methods using large data sets to solve real world business problems? Then join the Economic Decision Science (EDS) team. EDS is an economic science team based in the EU Stores business. The teams goal is to optimize and automate business decision making in the EU business and beyond. An internship at Amazon is an opportunity to work with leading economic researchers on influencing needle-moving business decisions using incomparable datasets and tools. It is an opportunity for PhD students and recent PhD graduates in Economics or related fields. We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to work with large and complicated data sets. Knowledge of econometrics, as well as basic familiarity with Stata, R, or Python is necessary. Experience with SQL would be a plus. As an Economics Intern, you will be working in a fast-paced, cross-disciplinary team of researchers who are pioneers in the field. You will take on complex problems, and work on solutions that either leverage existing academic and industrial research, or utilize your own out-of-the-box pragmatic thinking. In addition to coming up with novel solutions and prototypes, you may even need to deliver these to production in customer facing products. Roughly 85% of previous intern cohorts have converted to full time economics employment at Amazon.
US, CA, Cupertino
We're looking for an Applied Scientist to help us secure Amazon's most critical data. In this role, you'll work closely with internal security teams to design and build AR-powered systems that protect our customers' data. You will build on top of existing formal verification tools developed by AWS and develop new methods to apply those tools at scale. You will need to be innovative, entrepreneurial, and adaptable. We move fast, experiment, iterate and then scale quickly, thoughtfully balancing speed and quality. Inclusive Team Culture Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Work/Life Balance Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives. Mentorship & Career Growth Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future. Key job responsibilities Deeply understand AR techniques for analyzing programs and other systems, and keep up with emerging ideas from the research community. Engage with our customers to develop understanding of their needs. Propose and develop solutions that leverage symbolic reasoning services and concepts from programming languages, theorem proving, formal verification and constraint solving. Implement these solutions as services and work with others to deploy them at scale across Payments and Healthcare. Author papers and present your work internally and externally. Train new teammates, mentor others, participate in recruiting and interviewing, and participate in our tactical and strategic planning. About the team Our small team of applied scientists works within a larger security group, supporting thousands of engineers who are developing Amazon's payments and healthcare services. Security is a rich area for automated reasoning. Most other approaches are quite ad-hoc and take a lot of human effort. AR can help us to reason deliberately and systematically, and the dream of provable security is incredibly compelling. We are working to make this happen at scale. We partner closely with our larger security group and with other automated reasoning teams in AWS that develop core reasoning services.
US, NY, New York
Search Thematic Ad Experience (STAX) team within Sponsored Products is looking for a leader to lead a team of talented applied scientists working on cutting-edge science to innovate on ad experiences for Amazon shoppers!. You will manage a team of scientists, engineers, and PMs to innovate new widgets on Amazon Search page to improve shopper experience using state-of-the-art NLP and computer vision models. You will be leading some industry first experiences that has the potential to revolutionize how shopping looks and feels like on Amazon, and e-commerce marketplaces in general. You will have the opportunity to design the vision on how ad experiences look on Amazon search page, and use the combination of advanced techniques and continuous experimentation to realize this vision. Your work will be core to Amazon’s advertising business. You will be a significant contributor in building the future of sponsored advertising, directly impacting the shopper experience for our hundreds of millions of shoppers worldwide, while delivering significant value for hundreds of thousands of advertisers across the purchase journey with ads on Amazon. Key job responsibilities * Be the technical leader in Machine Learning; lead efforts within the team, and collaborate and influence across the organization. * Be a critic, visionary, and execution leader. Invent and test new product ideas that are powered by science that addresses key product gaps or shopper needs. * Set, plan, and execute on a roadmap that strikes the optimal balance between short term delivery and long term exploration. You will influence what we invest in today and tomorrow. * Evangelize the team’s science innovation within the organization, company, and in key conferences (internal and external). * Be ruthless with prioritization. You will be managing a team which is highly sought after. But not all can be done. Have a deep understanding of the tradeoffs involved and be fierce in prioritizing. * Bring clarity, direction, and guidance to help teams navigate through unsolved problems with the goal to elevate the shopper experience. We work on ambiguous problems and the right approach is often unknown. You will bring your rich experience to help guide the team through these ambiguities, while working with product and engineering in crisply defining the science scope and opportunities. * Have strong product and business acumen to drive both shopper improvements and business outcomes. A day in the life * Lead a multidisciplinary team that embodies “customer obsessed science”: inventing brand new approaches to solve Amazon’s unique problems, and using those inventions in software that affects hundreds of millions of customers * Dive deep into our metrics, ongoing experiments to understand how and why they are benefitting our shoppers (or not) * Design, prototype and validate new widgets, techniques, and ideas. Take end-to-end ownership of moving from prototype to final implementation. * Be an advocate and expert for STAX science to leaders and stakeholders inside and outside advertising. About the team We are the Search thematic ads experience team within Sponsored products - a fast growing team of customer-obsessed engineers, technologists, product leaders, and scientists. We are focused on continuous exploration of contexts and creatives to drive value for both our customers and advertisers, through continuous innovation. We focus on new ads experiences globally to help shoppers make the most informed purchase decision while helping shortcut the time to discovery that shoppers are highly likely to engage with. We also harvest rich contextual and behavioral signals that are used to optimize our backend models to continually improve the shopper experience. We obsess about our customers and are continuously seeking opportunities to delight them.
US, CA, Palo Alto
Amazon is the 4th most popular site in the US. Our product search engine, one of the most heavily used services in the world, indexes billions of products and serves hundreds of millions of customers world-wide. We are working on a new initiative to transform our search engine into a shopping engine that assists customers with their shopping missions. We look at all aspects of search CX, query understanding, Ranking, Indexing and ask how we can make big step improvements by applying advanced Machine Learning (ML) and Deep Learning (DL) techniques. We’re seeking a thought leader to direct science initiatives for the Search Relevance and Ranking at Amazon. This person will also be a deep learning practitioner/thinker and guide the research in these three areas. They’ll also have the ability to drive cutting edge, product oriented research and should have a notable publication record. This intellectual thought leader will help enhance the science in addition to developing the thinking of our team. This leader will direct and shape the science philosophy, planning and strategy for the team, as we explore multi-modal, multi lingual search through the use of deep learning . We’re seeking an individual that can enhance the science thinking of our team: The org is made of 60+ applied scientists, (2 Principal scientists and 5 Senior ASMs). This person will lead and shape the science philosophy, planning and strategy for the team, as we push into Deep Learning to solve problems like cold start, discovery and personalization in the Search domain. Joining this team, you’ll experience the benefits of working in a dynamic, entrepreneurial environment, while leveraging the resources of Amazon [Earth's most customer-centric internet company]. We provide a highly customer-centric, team-oriented environment in our offices located in Palo Alto, California.
JP, 13, Tokyo
Our mission is to help every vendor drive the most significant impact selling on Amazon. Our team invent, test and launch some of the most innovative services, technology, processes for our global vendors. Our new AVS Professional Services (ProServe) team will go deep with our largest and most sophisticated vendor customers, combining elite client-service skills with cutting edge applied science techniques, backed up by Amazon’s 20+ years of experience in Japan. We start from the customer’s problem and work backwards to apply distinctive results that “only Amazon” can deliver. Amazon is looking for a talented and passionate Applied Science Manager to manage our growing team of Applied Scientists and Business Intelligence Engineers to build world class statistical and machine learning models to be delivered directly to our largest vendors, and working closely with the vendors' senior leaders. The Applied Science Manager will set the strategy for the services to invent, collaborating with the AVS business consultants team to determine customer needs and translating them to a science and development roadmap, and finally coordinating its execution through the team. In this position, you will be part of a larger team touching all areas of science-based development in the vendor domain, not limited to Japan-only products, but collaborating with worldwide science and business leaders. Our current projects touch on the areas of causal inference, reinforcement learning, representation learning, anomaly detection, NLP and forecasting. As the AVS ProServe Applied Science Manager, you will be empowered to expand your scope of influence, and use ProServe as an incubator for products that can be expanded to all Amazon vendors, worldwide. We place strong emphasis on talent growth. As the Applied Science Manager, you will be expected in actively growing future Amazon science leaders, and providing mentoring inside and outside of your team. Key job responsibilities The Applied Science Manager is accountable for: (1) Creating a vision, a strategy, and a roadmap tackling the most challenging business questions from our leading vendors, assess quantitatively their feasibility and entitlement, and scale their scope beyond the ProServe team. (2) Coordinate execution of the roadmap, through direct reports, consisting of scientists and business intelligence engineers. (3) Grow and manage a technical team, actively mentoring, developing, and promoting team members. (4) Work closely with other science managers, program/product managers, and business leadership worldwide to scope new areas of growth, creating new partnerships, and proposing new business initiatives. (5) Act as a technical supervisor, able to assess scientific direction, technical design documents, and steer development efforts to maximize project delivery.
US, NY, New York
Amazon Advertising is one of Amazon's fastest growing and most profitable businesses. As a core product offering within our advertising portfolio, Sponsored Products (SP) helps merchants, retail vendors, and brand owners succeed via native advertising, which grows incremental sales of their products sold through Amazon. The SP team's primary goals are to help shoppers discover new products they love, be the most efficient way for advertisers to meet their business objectives, and build a sustainable business that continuously innovates on behalf of customers. Our products and solutions are strategically important to enable our Retail and Marketplace businesses to drive long-term growth. We deliver billions of ad impressions and millions of clicks and break fresh ground in product and technical innovations every day! The Supply team (within Sponsored Products) is looking for an Applied Scientist to join a fast-growing team with the mandate of creating new ad experiences that elevate the shopping experience for hundreds of millions customers worldwide. The Applied Scientist will take end-to-end ownership of driving new product/feature innovation by applying advanced statistical and machine learning models. The role will handle petabytes of unstructured data (images, text, videos) to extract insights into what metadata can be useful for us to highlight to simplify purchase decisions, and propose new experiences that increase shopper engagement. Why you love this opportunity Amazon is investing heavily in building a world-class advertising business. This team is responsible for defining and delivering a collection of advertising products that drive discovery and sales. Our solutions generate billions in revenue and drive long-term growth for Amazon’s Retail and Marketplace businesses. We deliver billions of ad impressions, millions of clicks daily, and break fresh ground to create world-class products. We are highly motivated, collaborative, and fun-loving team with an entrepreneurial spirit - with a broad mandate to experiment and innovate. Key job responsibilities As an Applied Scientist on this team you will: Build machine learning models and perform data analysis to deliver scalable solutions to business problems. Perform hands-on analysis and modeling with very large data sets to develop insights that increase traffic monetization and merchandise sales without compromising shopper experience. Work closely with software engineers on detailed requirements, technical designs and implementation of end-to-end solutions in production. Design and run A/B experiments that affect hundreds of millions of customers, evaluate the impact of your optimizations and communicate your results to various business stakeholders. Work with scientists and economists to model the interaction between organic sales and sponsored content and to further evolve Amazon's marketplace. Establish scalable, efficient, automated processes for large-scale data analysis, machine-learning model development, model validation and serving. Research new predictive learning approaches for the sponsored products business. Write production code to bring models into production. A day in the life You will invent new experiences and influence customer-facing shopping experiences to help suppliers grow their retail business and the auction dynamics that leverage native advertising; this is your opportunity to work within the fastest-growing businesses across all of Amazon! Define a long-term science vision for our advertising business, driven fundamentally from our customers' needs, translating that direction into specific plans for research and applied scientists, as well as engineering and product teams. This role combines science leadership, organizational ability, technical strength, product focus, and business understanding.