By Mark McQuade, Practice Manager, Data Science & Engineering – Rackspace
By Khobaib Zaamout, Data Scientist – Rackspace
Artificial intelligence (AI) and machine learning (ML) have become widely recognized for their unique capabilities in helping companies utilize data.
With AI and ML, organizations can apply data to a wide range of use cases, from generating insights that power product and content recommendations to financial forecasting that aids strategic business planning and growth.
The prowess of these technologies in making sense of data has become a vital asset during the COVID-19 pandemic.
Many organizations have started applying machine learning expertise to scale customer communications and accelerate research.
These use cases extend from utilizing ML-based chatbots to improve communication and availability, to helping industries that have seen disrupted business, such as agriculture, where AI-driven solutions help farmers monitor crop growth and flag potential issues to manage supply.
ML solutions have also been used to scan the internet for global news and outbreak data and predict the likelihood and severity of spread.
Leaders worldwide are using these technologies to guide decision making, from hospital resourcing to quarantine mandate lengths. AI and ML are becoming vital technologies behind understanding, assessing, and navigating our pandemic-afflicted world.
Supporting COVID-19 Response Efforts
Amazon and Amazon Web Services (AWS) have been actively participating in a global effort to fight the COVID-19 pandemic. Their initiatives range from providing infrastructure capacity and technical support to businesses fighting the virus to offering specific AWS services built to support research.
One example of a key service AWS offers is the COVID-19 public data lake, which is a public and centralized repository of up-to-date curated datasets on or related to the spread and characteristics of the novel coronavirus.
AWS offered this repository based on their belief that any breakthroughs in combating this virus can be accomplished faster when the data needed for experiments, research, and analysis are publicly available and in one central location.
Onica, an AWS Premier Consulting Partner, has also been actively involved in these efforts, leveraging AWS technologies to help decision makers navigate this pandemic. Onica holds AWS Competencies in Machine Learning, IoT, and many other areas, and is a member of the AWS Managed Service Provider (MSP) Partner Program.
Onica has been assisting customers in utilizing AWS services and toolsets to develop applications that can accelerate and improve our overall understanding of the pandemic and ascertain the best steps to manage these challenges.
In this post, we dive into the technical details of two COVID-19-related solutions that Onica produced and highlight their results and impact.
Plan4 Co: Utilizing Amazon Forecast for Improved COVID-19 Forecasts
Time-series forecasting is one key area of machine learning that helps businesses derive critical insights from data that has a time dimension.
Time-series analysis can be useful to see how a given asset or variable changes over time. By learning from historical data, forecasting predicts the future state of a wide range of phenomena, whether it’s churn or demand for a product or service.
Amazon Forecast brings the power of forecasting to the hands of users with no data science expertise. You only need to provide historical data, plus any additional data you believe may impact your forecasts. Amazon Forecast uses ML to combine time-series data with additional variables to build forecasts.
Onica worked with Plan4 Co, a company looking to improve the accuracy of predicting COVID-19 spread and conditions, to plan and prepare responses to the pandemic.
Plan4 Co sought to make accurate predictions about various pandemic-related metrics, with the ultimate goal of producing forecasts that outperform Institute for Health Metrics and Evaluation (IHME) deaths and hospitalization forecasts in New York for a two-week forecast horizon. To this end, they enlisted Onica’s services.
The Onica team utilized several AWS services to combine data from multiple sources and produce forecasts. The team gathered 15 time-series from various sources that capture crucial aspects of the COVID-19 pandemic in New York, including mobility and COVID-19 test results.
We cleaned, transformed, and trained several DeepAR+ models using these time-series data streams, yielding forecasts that met and exceeded the goals set out for the project.
Amazon QuickSight was used to graph and visualize the forecasts and compare them to other well-known forecasts.
The Onica team set out to produce two-week COVID-19 deaths and hospitalization forecasts for New York state with a lower Mean Absolute Percentage Error (MAPE) than IHME's forecasts over the same period.
We forecasted two COVID-19 time-series:
- Deaths time-series: Daily number of individuals who die due to COVID-19 in New York state.
- Hospitalization time-series: Daily total number of patients in New York hospitals for COVID-19-related reasons.
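As a point of reference, MAPE averages the absolute error of each daily forecast, expressed as a percentage of the actual value. The following is a minimal sketch of that calculation, not the project's evaluation code. The function name and the sample numbers are made up for illustration, and skipping days with an actual value of zero is an assumption of this sketch.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, expressed in percent."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    # Assumption: skip days where the actual value is zero, because the
    # percentage error is undefined there.
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Illustrative numbers only (not project data).
actual_deaths = [100, 80, 50, 40]
forecast_deaths = [110, 72, 55, 40]
print(round(mape(actual_deaths, forecast_deaths), 2))  # prints 7.5
```

A lower MAPE means the forecast tracked the observed series more closely over the evaluation window.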
The nature of the COVID-19 time-series makes forecasting extremely challenging. For example, because COVID-19 is a novel and ongoing phenomenon, the deaths and hospitalization time-series were short and did not cover a “full cycle” of the COVID-19 phenomenon.
This lack of data is a serious challenge to performing meaningful analysis, such as time-series decomposition. As a result, it was not possible to extract reliable repeated patterns, trends, or residuals.
Another example is COVID-19’s extreme sensitivity to an intractably large number of related factors, including mobility, government-enforced and elective cautionary measures, and public awareness of the pandemic. This makes the resulting COVID-19 time-series radically different in each state.
When faced with such challenges, local forecasting methods that fit each time-series in isolation, such as AutoRegressive Integrated Moving Average (ARIMA) and Exponential Smoothing (ETS), fall short.
Some approaches, such as the curve-fitting approach, handle these challenges by assuming the shape of the underlying time-series and then estimating the shape parameters from existing data. This approach is also naïve since it does not natively account for the related factors’ effect on the target time-series and exhibits poor performance, especially when the underlying phenomenon does not follow a known shape.
These challenges require a solution capable of incorporating related time-series in the learning and the forecast generation processes.
AWS has a proprietary algorithm (called DeepAR+) capable of learning patterns between related time-series and the target time-series alongside learning to forecast the target time-series. DeepAR+ is a supervised deep learning algorithm for forecasting time-series using Recurrent Neural Networks (RNNs) capable of learning from the target and the related time-series.
Using several AWS native services—including Amazon Simple Storage Service (Amazon S3) as the storage layer, AWS Glue for crawling, structuring, and cataloging the data, AWS Lambda for data cleansing and preparation, Amazon QuickSight for visualizations and analysis, and Amazon Forecast as the primary time-series forecasting tool—the Onica team crawled and centralized 15 related time-series relevant to the target.
These time-series included death and hospitalization forecasts from other models, mobility data, COVID-19 testing data, and New York weekend and holiday time-series.
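A calendar series such as the weekend and holiday indicator can be generated rather than collected. The sketch below shows one way to build it with the Python standard library. The function name, the date range, and the single holiday entry are illustrative assumptions, not the team's actual inputs.

```python
from datetime import date, timedelta


def calendar_indicator(start, days, holidays=frozenset()):
    """Return (ISO date, flag) pairs; flag is 1 on weekends and listed holidays."""
    rows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        is_weekend = day.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
        flag = 1 if is_weekend or day in holidays else 0
        rows.append((day.isoformat(), flag))
    return rows


# Hypothetical holiday list: 3 July 2020, the observed Independence Day.
example_holidays = {date(2020, 7, 3)}
for row in calendar_indicator(date(2020, 7, 1), 7, example_holidays):
    print(row)
```

Because the calendar is known in advance, a series like this can be extended past the last observed day to cover the forecast horizon.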
The related time-series had radically different magnitudes, which can bias the RNN-based algorithm's learning process. Therefore, we preprocessed all of the time-series by applying the min-max transformation, which rescales every series to the same range.
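The min-max transformation maps each value x to (x - min) / (max - min), so every series ends up between 0 and 1. Here is a minimal sketch, assuming each series is scaled independently and that the original minimum and maximum are kept so forecasts can be mapped back to real units. The function names and sample values are hypothetical.

```python
def min_max_scale(series):
    """Scale a series to [0, 1]; also return (lo, hi) for later inversion."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries no variation; map it to zeros.
        return [0.0 for _ in series], lo, hi
    return [(x - lo) / (hi - lo) for x in series], lo, hi


def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [lo + s * (hi - lo) for s in scaled]


# Illustrative values only: two series with very different magnitudes.
hospitalizations = [12000, 15000, 18000]
mobility_change = [-40, -35, -20]

scaled_hosp, lo, hi = min_max_scale(hospitalizations)
scaled_mob, _, _ = min_max_scale(mobility_change)
print(scaled_hosp)  # prints [0.0, 0.5, 1.0]
print(scaled_mob)   # prints [0.0, 0.25, 1.0]
print(min_max_invert(scaled_hosp, lo, hi))  # prints [12000.0, 15000.0, 18000.0]
```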
We used Amazon Forecast to train DeepAR+ models. Amazon Forecast randomly initializes the underlying RNN weights, so different training runs can converge to different solutions. For that reason, we trained many models across a large number of HyperParameter Optimization (HPO) configurations and picked the best ones.
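For readers who want to try something similar, the sketch below builds the request parameters for one DeepAR+ training run with a 14-day horizon and HPO enabled. It is an illustration under stated assumptions, not the team's configuration: the helper name, predictor names, and dataset group ARN are placeholders, and the dictionary keys follow the parameters of the boto3 `create_predictor` operation for Amazon Forecast as we understand them.

```python
# ARN of the built-in DeepAR+ algorithm in Amazon Forecast.
DEEP_AR_PLUS_ARN = "arn:aws:forecast:::algorithm/Deep_AR_Plus"


def build_predictor_request(name, dataset_group_arn, horizon_days=14, use_hpo=True):
    """Build keyword arguments for one DeepAR+ training run (hypothetical helper)."""
    return {
        "PredictorName": name,
        "AlgorithmArn": DEEP_AR_PLUS_ARN,
        "ForecastHorizon": horizon_days,   # two weeks of daily forecasts
        "PerformAutoML": False,            # the algorithm is chosen explicitly
        "PerformHPO": use_hpo,             # let the service tune hyperparameters
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        "FeaturizationConfig": {"ForecastFrequency": "D"},  # daily data
    }


# Placeholder ARN; a real one comes from your own dataset group.
group_arn = "arn:aws:forecast:us-east-1:123456789012:dataset-group/example_group"

# Several runs, since random initialization can lead to different models.
requests = [
    build_predictor_request(f"example_deepar_run_{i}", group_arn) for i in range(3)
]
print([r["PredictorName"] for r in requests])
```

With AWS credentials configured, each dictionary could then be passed to the service, for example `boto3.client("forecast").create_predictor(**requests[0])`. Training is asynchronous, so accuracy metrics only become available after each predictor finishes.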
Since Amazon Forecast seeks to minimize MAPE as the objective function, we picked the model with the lowest MAPE as the best model and produced its forecasts.
We compared our best model's performance with that of IHME's using the MAPE and Area Under the Curve (AUC) measures.
The results show that DeepAR+ outperformed IHME during the two weeks, forecasting the target time-series with lower MAPE and AUC scores than IHME's (see the figures below). These results indicate that DeepAR+ can outperform IHME's curve-fitting model, especially when more data is available.
Figure 1 – New York state’s hospitalization forecasts.
Figure 2 – New York state’s deaths forecasts.
The Onica team's use of AWS services allowed Plan4 Co to automatically gather, each day, the data needed for training models and forecasting COVID-19 metrics. These readily available forecasts offer timely and critical information about the pandemic's future to decision makers looking to navigate their organizations through our new reality.
Plan4 Co and Onica plan to analyze the models further, including through what-if analysis, scenario-based investigations, and quantifying the effect of the related time-series on the target time-series.
Blackline Safety: Building an IoT Contact Tracing Solution
The Internet of Things (IoT) is a rapidly growing area that interests organizations, businesses, and individuals alike. IoT systems consist of many interconnected devices that capture and stream data in real-time, and often require centralized data storage and real-time analysis that serve some useful purpose.
IoT has seen an explosion of applications in many aspects of life, such as smart homes and preventative maintenance. The COVID-19 pandemic has increased the need for advanced IoT technology and real-time data processing for applications, such as contact tracing.
These systems face several critical challenges regarding scaling and real-time processing. AWS offers various services that allow us to capture and stream the IoT data, process it, and derive real-time insights with minimal infrastructure management and operational costs. These include Amazon Kinesis for data streaming, Amazon EMR for high-throughput computing, and Amazon Redshift for data storage.
Onica has recently utilized these services to deploy a contact tracing solution for Blackline Safety, a globally connected safety technology provider. Blackline worked with Onica to develop a high-volume streaming and data processing infrastructure on AWS to support a contact tracing IoT solution that can track worker movement.
The effort involved migrating Blackline’s data from their conventional data ingestion and storage solution to a modern AWS infrastructure that improved the quality of insights through higher data rate reporting.
Figure 3 – Blackline Safety use case architecture diagram.
Blackline Safety required a contact tracing reporting solution to keep workers safe amidst the COVID-19 pandemic. It needed to be a highly available, scalable, and real-time solution that can handle processing, storage, and analysis of high volumes of data to support Blackline’s quickly expanding customer base and growing solutions portfolio.
Furthermore, the solution needed to be deployed promptly amidst the urgency that COVID-19 put on the company’s systems.
The Onica team deployed an AWS solution that uses Amazon Kinesis and Amazon EMR to collect the high-volume data streamed from Blackline Safety devices via integrated cellular and satellite connectivity.
The result is a workflow to ingest the higher data rate raw messages, and then enrich them to deliver high-value data in Amazon Redshift for reporting. It also integrated Delta Lake, providing scalable handling and cohesive streaming of data.
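To make the ingestion step more concrete, the sketch below shows how device messages might be packaged for an Amazon Kinesis data stream. It is a sketch under stated assumptions rather than Blackline Safety's implementation: the message fields (`device_id`, `lat`, `lon`, `ts`) and the helper names are hypothetical. The entry shape (`Data`, `PartitionKey`) and the limit of 500 records per `PutRecords` request come from the Kinesis Data Streams API.

```python
import json

MAX_RECORDS_PER_REQUEST = 500  # Kinesis PutRecords limit per request


def to_kinesis_entry(message):
    """Serialize one device message into a PutRecords entry."""
    return {
        "Data": json.dumps(message, separators=(",", ":")).encode("utf-8"),
        # Partitioning by device keeps each device's messages on one shard.
        "PartitionKey": str(message["device_id"]),
    }


def batch_entries(messages, batch_size=MAX_RECORDS_PER_REQUEST):
    """Split messages into batches small enough for a single request."""
    entries = [to_kinesis_entry(m) for m in messages]
    return [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]


# Hypothetical messages from 1,200 devices.
messages = [
    {"device_id": n, "lat": 51.05, "lon": -114.07, "ts": "2020-07-01T12:00:00Z"}
    for n in range(1200)
]
batches = batch_entries(messages)
print([len(b) for b in batches])  # prints [500, 500, 200]
```

With credentials configured, each batch could be sent with `boto3.client("kinesis").put_records(StreamName="example-stream", Records=batch)`, where the stream name is a placeholder. A production producer would also retry any entries the response reports as failed.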
As the world comes together to fight the COVID-19 pandemic, artificial intelligence, machine learning, and data science initiatives have been instrumental in developing effective solutions to make sense of the uncertainties brought about by this pandemic.
From tracking and tracing the disease to predicting future spread to analyzing COVID-19 research, AI and ML can help in a wide variety of ways.
The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
Onica – AWS Partner Spotlight
Onica is an AWS Premier Consulting Partner that provides cloud consulting, infrastructure, and managed services, ensuring customers have the best technical solutions to solve their business challenges and deliver value for their organization.