Many businesses aspire to expand globally to reach new customers and accelerate growth. For Bridgeman Images, this meant engaging customers who spoke languages other than English. They needed a scalable way to overcome the language barrier, because translating everything manually was neither fast enough nor cost-efficient. Using Amazon Translate, they reduced the time needed to localize content from several months to a few weeks, translating 570 million English characters into Italian, French, German, and Spanish.
Bridgeman Images is a rights-managed image licensing company that has nearly three million active assets in its archive. To be easily searchable on their site, each of these assets has a title, a description, and a set of keywords/mediums that they index into the Amazon Elasticsearch Service (Amazon ES). Their research showed that between 20 and 30 percent of customers, aggregated across all platforms, required the image data to appear in a language other than English: Italian, French, German, or Spanish. Therefore, they decided to translate all of their metadata to give their customers the best possible experience.
Bridgeman Images researched a number of different options and decided that machine translations would provide the best overall value for their business. When preparing for the new translations, they took the opportunity to overhaul their internal metadata structures and implement a robust workflow that would minimize duplication and save on translation costs.
First, they updated their keyword system. It was originally created as a flat data structure with semicolon-delimited records. They de-duplicated these entries and created a relational structure that allows multiple assets to share the same keyword alongside its translations. The keywords are stored on an Amazon RDS MySQL instance and are pushed to an Amazon Elasticsearch Service index whenever a keyword is changed or a new one is entered into the system.
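The move from flat, semicolon-delimited keyword strings to a shared relational structure can be sketched roughly as follows. This is a minimal illustration in plain Python, not Bridgeman Images' actual schema: the function name `normalize_keywords`, the lower-casing rule, and the in-memory dictionaries standing in for the MySQL tables are all assumptions.

```python
from collections import defaultdict


def normalize_keywords(flat_records):
    """Split {asset_id: "kw1; kw2; ..."} into a keyword table and a link table.

    Returns (keyword_ids, asset_keywords), where keyword_ids maps each unique
    keyword to an integer id and asset_keywords maps each asset id to the
    list of keyword ids it uses. Both stand in for relational tables.
    """
    keyword_ids = {}                     # hypothetical "keywords" table
    asset_keywords = defaultdict(list)   # hypothetical "asset_keyword" link table

    for asset_id, raw in flat_records.items():
        for part in raw.split(";"):
            keyword = part.strip().lower()   # assumed normalization rule
            if not keyword:
                continue                     # skip empty entries such as a trailing ";"
            # Reuse the existing id, or assign the next one for a new keyword.
            kid = keyword_ids.setdefault(keyword, len(keyword_ids) + 1)
            if kid not in asset_keywords[asset_id]:
                asset_keywords[asset_id].append(kid)

    return keyword_ids, dict(asset_keywords)
```

Because each unique keyword gets a single row, it only needs to be translated once, however many assets share it.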
To handle the translations of their keywords (and other data), their next task was to create a simple wrapper for the Amazon Translate service using Python, Boto3, and Flask, deployed with Zappa onto AWS Lambda.
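A wrapper of this kind might look something like the sketch below. It uses the Boto3 `translate_text` call of the Amazon Translate client; the function names, the default language list, and the injectable `client` parameter are assumptions made for illustration, and the Flask routing and Zappa deployment are left out.

```python
TARGET_LANGUAGES = ("it", "fr", "de", "es")  # Italian, French, German, Spanish


def get_translate_client(client=None):
    """Return the given client, or create a Boto3 Amazon Translate client."""
    if client is None:
        import boto3  # imported lazily so a stub client can be used in tests
        client = boto3.client("translate")
    return client


def translate_text(text, source_lang, target_lang, client=None):
    """Translate one string and return only the translated text."""
    client = get_translate_client(client)
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )
    return response["TranslatedText"]


def translate_to_all(text, source_lang="en", targets=TARGET_LANGUAGES, client=None):
    """Translate one string into every target language; returns {lang: text}."""
    client = get_translate_client(client)  # create the client once, reuse it
    return {
        lang: translate_text(text, source_lang, lang, client)
        for lang in targets
    }
```

Calling `translate_to_all("sunset over the sea")` with valid AWS credentials would return a dictionary keyed by language code. Passing a stub object as `client` lets the wrapper be exercised without calling AWS.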
They then designed a trigger so that any time a new keyword was added to their system, a task was placed on a queue in their RabbitMQ cluster. A worker would then pick up the task and call an AWS Lambda function to fetch the translation from Amazon Translate.
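The trigger-and-worker pattern described here can be illustrated with a small sketch. Note the substitution: Python's standard-library `queue.Queue` stands in for the RabbitMQ cluster, and a plain callable stands in for the Lambda function, so this shows only the shape of the flow, not the actual integration. All names (`on_keyword_added`, `run_worker`, the task fields) are hypothetical.

```python
import queue


def on_keyword_added(task_queue, keyword_id, text):
    """Trigger: enqueue a translation task when a new keyword is created."""
    task_queue.put({"keyword_id": keyword_id, "text": text})


def run_worker(task_queue, translate, store):
    """Worker: drain the queue and save translations for each keyword.

    `translate` is any callable taking the keyword text and returning a
    {language_code: translation} dict (in production, a call to the Lambda
    function). `store` is a dict standing in for the keyword database.
    Returns the number of tasks processed.
    """
    processed = 0
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to do
        store[task["keyword_id"]] = translate(task["text"])
        task_queue.task_done()
        processed += 1
    return processed
```

Decoupling the trigger from the worker means that adding a keyword never has to wait for the translation call to finish.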
Next, they needed to bulk translate nearly 700 million characters of data, which consisted of their titles and descriptions, into four different languages. Some of the source metadata is in more than one language, so they extended the Lambda translation function to detect the original language using Amazon Comprehend.
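Language detection with Amazon Comprehend could be added along the lines of the sketch below. It relies on the Boto3 `detect_dominant_language` call, which returns a list of candidate languages with confidence scores. The function names and the rule of skipping the language the text is already in are assumptions, not details confirmed by Bridgeman Images.

```python
def detect_language(text, comprehend=None):
    """Return the most likely language code for `text`, or None if unknown."""
    if comprehend is None:
        import boto3  # imported lazily so a stub client can be used in tests
        comprehend = boto3.client("comprehend")
    response = comprehend.detect_dominant_language(Text=text)
    languages = response["Languages"]
    if not languages:
        return None
    # Pick the candidate with the highest confidence score.
    best = max(languages, key=lambda candidate: candidate["Score"])
    return best["LanguageCode"]


def targets_for(source_lang, targets=("it", "fr", "de", "es")):
    """Assumed rule: do not translate text into the language it is already in."""
    return [lang for lang in targets if lang != source_lang]
```

The detected code can then be passed as the source language to the translation call, so that a title already written in French, for example, is not translated into French again.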
To efficiently process and translate this large volume of data, Bridgeman Images relied on a RabbitMQ cluster hosted on AWS and an AWS Auto Scaling stack of Amazon EC2 instances that ran worker listeners inside Docker containers deployed with AWS Elastic Beanstalk. This setup allowed them to process nearly 14,000 assets per hour, with each asset averaging approximately 100-300 characters per translation.
“We translated roughly 570 million characters per language in the aggregate span of about 15 days. The time saving was significant – likely on the magnitude of months vs a couple of weeks to build and easily integrate with our existing technology infrastructure that AWS provides. The development cycle was super short especially refactoring as it took one developer a week to deliver it and we didn’t need to pile resources or re-skill our developers” said Sean Chambers, IT Director of Bridgeman Images.
Finally, to support ongoing translations, Bridgeman Images designed a newly structured cataloguing interface where their team could input metadata. They simply enter the source language (English, for example) and let the system provide automatic translations for Italian, German, French, and Spanish. These are put into a queue similar to the queue for their keyword triggers. They are updated on a regular basis into an Amazon Elasticsearch Service index so that they become searchable.
Here’s a simple architecture that shows how Bridgeman Images uses Amazon Translate to provide real-time translation for their customers.
“For me one of the reasons for choosing Amazon Translate was cost – 40 percent less than the other competitor we were considering,” says Sean Chambers, IT Director of Bridgeman Images.
Here’s a sneak peek at the Bridgeman Images site in action:
About the Author
Shafreen Sayyed is an AWS Solutions Architect based in London. She helps customers across the UK and Ireland, supporting various industry verticals to transform their businesses and build industry-leading cloud solutions. She has a special interest in Machine Learning and Artificial Intelligence and is passionate about finding ways to help our customers integrate these new and exciting technologies into all aspects of their business.