Microsoft has made a significant technological advance with VinVL, a new vision-language (VL) model that joins its extensive portfolio of AI software. VinVL improves the way AI software reads images and translates their content into text; in other words, it identifies what an image contains and captions it accurately. VinVL is just one example of how rapidly the field of Artificial Intelligence has progressed.

What Makes VinVL Special?

Andy Readman, Principal Cloud Architect at Wirehive, notes that what makes VinVL particularly exciting is that it has recently outperformed humans at image reading and captioning. Andy continued: ‘when creating captions, there are online systems that automatically label [images] using VL. VinVL is able to recognise a greater number of images and can create more detailed and accurate captions.’ By combining VinVL with state-of-the-art VL fusion models such as OSCAR and VIVO, Microsoft had claimed the top spot in several AI benchmarks as of 31 December 2020. These benchmarks include Visual Question Answering (VQA), Microsoft COCO Image Captioning and Novel Object Captioning (nocaps).

How Is VinVL Expected to Impact Lives and Businesses?

The expected social and industrial benefits of VinVL are significant. The software is expected to be particularly beneficial to the visually impaired community. Andy points out that ‘companies will be able to automate their image captioning, affording their audience greater accessibility to the images that are shared.’ If businesses across entire industries can apply consistent alt text to their images, visually impaired people will be able to access accurate image descriptions of consistent quality.

VinVL also has the potential to streamline the way businesses link text and images. Many big data, advanced analytics and machine learning workflows are built around text, so the information held in images is often left out. By describing images in words, VinVL could allow these systems to take that information into account, opening up a wide range of new possibilities.

What Does VinVL Mean for CSP Resellers?

VinVL is expected to become available through Azure before long. Once it is, CSP partners and resellers will be able to pass it on to their customers. In doing so, partners and resellers will play a key role in benefiting the wider community, as countless individuals gain from the Cognitive Services APIs and advanced image captioning. VinVL is set to bring major benefits to entire industries and to make the world more accessible for the visually impaired community.


Sources: https://www.itpro.co.uk/cloud/microsoft-azure/358422/microsofts-new-vision-language-model-outranks-humans-at-image