Protege: Prompt-based diverse question generation from web articles

2023
Rich and diverse knowledge bases (KBs) are foundational building blocks for online knowledge-sharing communities such as StackOverflow and Quora and for applications such as conversational assistants (aka chatbots). A popular format for knowledge bases is question-answer pairs (or FAQs), where questions are designed to accurately match a multitude of queries. In this paper, we address the problem of automatically creating such Q&A-based knowledge bases from domain-specific, long-form textual content (e.g., web articles). Specifically, we consider the problem of question generation, which is the task of generating questions given a paragraph of text as input, with the goal of achieving both diversity and fidelity in the generated questions. Toward this goal, we propose Protege, a diverse question generation framework that consists of (1) a novel encoder-decoder-based Large Language Model (LLM) architecture that can take a variety of prompts and generate a diverse set of candidate questions, and (2) a hill-climbing algorithm that maximizes a sub-modular objective function to balance diversity with fidelity. Through our experiments on three popular public Q&A datasets, we demonstrate that Protege improves diversity by 16% and fidelity by 8% over diverse beam search and prompt-based baselines.
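The abstract does not spell out the sub-modular objective, so the following is only a minimal sketch of the general technique: greedy hill-climbing over a candidate pool, where each step adds the question with the largest marginal gain. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`tokens`, `fidelity`, `objective`, `greedy_select`), the token-overlap proxy for fidelity, the distinct-token coverage proxy for diversity, and the trade-off weight `lam`.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string (illustrative tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fidelity(question: str, passage: str) -> float:
    """Assumed fidelity proxy: fraction of question tokens found in the passage."""
    q = tokens(question)
    return len(q & tokens(passage)) / len(q) if q else 0.0


def objective(selected: List[str], passage: str, lam: float) -> float:
    """Assumed objective: summed fidelity (modular) plus lam times the number
    of distinct tokens covered by the selected set (a coverage function,
    which is monotone sub-modular)."""
    covered: Set[str] = set()
    for q in selected:
        covered |= tokens(q)
    return sum(fidelity(q, passage) for q in selected) + lam * len(covered)


def greedy_select(candidates: List[str], passage: str, k: int,
                  lam: float = 0.1) -> List[str]:
    """Hill-climbing: repeatedly add the candidate with the largest marginal gain."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        base = objective(selected, passage, lam)
        best = max(remaining,
                   key=lambda q: objective(selected + [q], passage, lam) - base)
        if objective(selected + [best], passage, lam) - base <= 0:
            break  # no candidate improves the objective
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    passage = "Protege generates diverse questions from web articles using prompts."
    candidates = [
        "What does Protege generate?",
        "What does Protege generate from web articles?",
        "Which prompts are used?",
    ]
    print(greedy_select(candidates, passage, k=2))
```

In this toy run the second pick is the question about prompts rather than the near-duplicate of the first pick, because a near-duplicate adds fidelity but covers no new tokens. For a monotone sub-modular objective under a size limit, this kind of greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal value, which is one reason the approach is a common choice for balancing competing criteria.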