-
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. The architecture of transformers, which have recently witnessed booming applications in vision tasks, has pivoted against the widespread convolutional paradigm. Relying on the tokenization process that splits inputs into multiple tokens, transformers are capable of extracting their pairwise relationships using self-attention. While being the stemming building block of transformers, what makes for a good tokenizer
-
2023. Ensuring the overall end-user experience is a challenging task in arbitrary style transfer (AST) due to the subjective nature of style transfer quality. A good practice is to provide users with many AST results instead of one. However, existing approaches require running multiple AST models or running inference with a diversified AST (DAST) solution multiple times, and thus they are either slow in speed or limited in diversity
-
DCC 2023. Incorporating neural networks into a video codec as an in-loop filter has been shown to provide significant improvements in coding efficiency. Unfortunately, the computational complexity associated with the neural network, specifically the number of multiply-accumulate (MAC) operations, makes these approaches intractable in practice. In this paper, we consider using a multiscale approach to reduce complexity
-
Data augmentation is a necessity to enhance data efficiency in deep learning. For vision-language pre-training, previous works augment data only for images or only for text. In this paper, we present MixGen: a joint data augmentation for vision-language representation learning to further improve data efficiency. It generates new image-text pairs with semantic relationships preserved by interpolating
-
Video quality assessment (VQA) has sparked a lot of interest in the computer vision community, as it plays a critical role in services that provide customers with high-quality video content. Due to the lack of high-quality reference videos and the difficulties in collecting subjective evaluations, assessing video quality is a challenging and still unsolved problem. Moreover, most of the public research
Related content
-
February 14, 2023: A diversity of outputs ensures that a style transfer model can satisfy any user’s tastes.
-
January 05, 2023: How an AWS customer uses Lookout for Vision to build custom computer vision models to automate quality inspection and detect defects.
-
January 04, 2023: As video scales up, in both duration and resolution, it raises new research questions.
-
January 03, 2023: Automated methods with a little human guidance use annotators’ time much more efficiently.
-
December 26, 2022: Combining contrastive training and selection of hard negative examples establishes new benchmarks.
-
December 16, 2022: University of Wisconsin-Madison associate professor and ARA recipient has authored a series of pioneering papers on real-time object instance segmentation.
-
December 09, 2022: Why multimodal identification is a crucial step in automating item identification at Amazon scale.
-
November 22, 2022: Francesco Locatello on the four NeurIPS papers he coauthored this year, which largely concern generalization to out-of-distribution test data.
-
November 15, 2022: Models that map spoken language to objects in an image would make it easier for customers to communicate with multimodal devices.
-
November 10, 2022: New approach can cut the setup time required to develop vision-based machine learning solutions from six to twelve months down to one or two.
-
November 08, 2022: Eliminating the need for annotation makes bias testing much more practical.
-
November 01, 2022: By plotting nonlinear trajectories through a GAN’s latent space, the method enables certain image attributes to vary while others are held fixed.
-
October 27, 2022: A model that estimates depth from 2-D images learns to adjust to differences between images produced by different cameras, reducing error by about 20%.
-
October 26, 2022: Research topics range from visual anomaly detection to road network extraction and from regression-constrained neural-architecture search to self-supervised learning for video representations.
-
October 24, 2022: Company is testing a new class of robots that use artificial intelligence and computer vision to move freely throughout facilities.
-
October 05, 2022: Inaugural recipients named as part of the JHU + Amazon Initiative for Interactive AI (AI2AI).
-
September 19, 2022: Why detecting damage is so tricky at Amazon’s scale, and how researchers are training robots to help with that gargantuan task.
-
August 16, 2022: ARA recipient is using artificial intelligence to help doctors make decisions based on radiological data.
-
July 20, 2022: Violetta Shevchenko, an Amazon applied scientist and former intern, combines vision and language to create solutions to challenging problems.
-
July 01, 2022: Two methods presented at CVPR achieve state-of-the-art results by imposing additional structure on the representational space.
-
June 24, 2022: Technique that mixes public and private training data can meet differential-privacy criteria while cutting error increase by 60%-70%.