Edge computing is a term that has steadily gained prominence. It refers to a distributed framework in which data is processed close to its source rather than centrally in the cloud or a private datacenter. The rise of edge computing brings both possibilities and challenges. One of the challenges is the need to replicate data between the edge and the cloud in real time, so that information stays consistent across all resources.
This is exactly what time-series database platform provider InfluxData has addressed with its recently announced Edge Data Replication feature, a new built-in capability of its flagship InfluxDB platform that provides synchronous replication of data between devices at the edge and datacenters in the cloud. The company says the new feature marks a key step toward InfluxData’s goal of helping customers solve the challenges of using InfluxDB at the edge.
Edge computing’s paradigm shift
It’s worth taking a quick step back to look at both edge computing and its real-world scenarios. Edge computing represents a paradigm shift from the centralized model of the cloud to a more distributed one. This model is designed to reduce the latency that results from transmitting data to the cloud, to conserve bandwidth by transmitting only preprocessed data, and to prioritize agility.
In edge computing, data processing and decision-making shift from the cloud to an endpoint device (the “edge” device). The edge device processes data in place (“at the edge”) and transmits the processed data back to the cloud for storage and analysis. The cloud remains important in this paradigm, both for storage and as the place where machine learning occurs on the preprocessed data.
As Rick Spencer, vice president of product at InfluxData, explains, “The edge environment keeps very granular data for local detailed analysis… but also sends data to the cloud, so analysts have a more accurate picture of what happened at the distributed locations.”
Out and about
What does edge computing look like in the real world?
“Every industry has its edge – the retail store has its point of sales systems, the bank has its ATMs, and even traditional offices have their desktops and remote workers’ laptops,” Spencer said.
He highlighted three specific examples in more detail: industrial machinery used in manufacturing, wind turbines in the energy industry, and high-frequency trading in the financial services industry.
In manufacturing, data is generated on the factory floor by industrial machines. Edge computing allows the machines to process that data in place and transmit only the aggregated, transformed and/or labeled sets to the central data repository, whether cloud or private datacenter, for storage and further analysis. This allows for precision and agility at the edge and a unified view from the cloud.
In energy, wind turbines are another instance of a real-world edge scenario. Each turbine generates its own data, and there may be many turbines within a local network. As each turbine may generate thousands or millions of data samples per second, aggregating and summarizing the operational data from the group of turbines and replicating only the aggregate to the cloud optimizes operation and performance.
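To make the turbine scenario concrete, here is a minimal Python sketch of the aggregate-then-replicate idea: raw samples are grouped into fixed time windows at the edge, and only one summary row per turbine per window would be sent onward. The function name, the sample layout and the 60-second window are illustrative assumptions, not part of InfluxDB or any InfluxData API.

```python
from collections import defaultdict
from statistics import mean


def aggregate_windows(samples, window_seconds=60):
    """Summarize (turbine_id, timestamp, value) samples per fixed time window.

    Hypothetical helper for illustration only; the tuple layout and the
    default window size are assumptions, not an InfluxDB interface.
    """
    buckets = defaultdict(list)
    for turbine_id, timestamp, value in samples:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[(turbine_id, window_start)].append(value)

    # One summary row per turbine per window, ordered by turbine then time.
    return [
        {
            "turbine_id": turbine_id,
            "window_start": window_start,
            "count": len(values),
            "mean": mean(values),
            "max": max(values),
        }
        for (turbine_id, window_start), values in sorted(buckets.items())
    ]


# Made-up readings: (turbine id, seconds since start, measured value).
samples = [
    ("t1", 0.0, 10.0),
    ("t1", 30.0, 14.0),
    ("t1", 61.0, 20.0),
    ("t2", 5.0, 7.0),
]
summary = aggregate_windows(samples)
for row in summary:
    print(row)
```

Here four raw samples collapse into three summary rows. With thousands of samples per second, the same approach would shrink what crosses the network far more sharply.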
In financial services, high-frequency trading algorithms are deployed on devices installed physically close to the market’s servers. The data generated is then aggregated and replicated to a central repository, where AI models are trained on the aggregated data for faster time to insight.
In the cloud and at the edge
Since the cloud is still present and important for storage and analysis, one of the challenges introduced when processing significant amounts of data at the edge is the need to make sure the data across all resources – in the cloud and at the edge – remains the same.
Spencer describes the development of Edge Data Replication as stemming directly from the observation that customers already using InfluxDB at the edge were building their own makeshift solutions to this challenge, and the engineering team, therefore, “developed this functionality very much based on the needs they were seeing in their user community.”
Data replication at the edge
InfluxData’s Edge Data Replication performs synchronous replication, where data is replicated between the source and target simultaneously, and is backed with a disk-based durable queue.
As an illustration, Spencer gave the example of customers who “run InfluxDB on ferries that are disconnected from the Internet during passages. When those ferries reach docks, they reconnect to the Internet, and the replicated data in the durable queue can then be sent along to the central cloud account.”
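The ferry example is a store-and-forward pattern, which the following Python sketch illustrates in general terms. It is not InfluxDB’s implementation: the `DurableQueue` class, the SQLite file used as the on-disk store, and the `send` callback are all assumptions made for illustration. Writes are kept on disk while the link is down and are drained in order once a send succeeds.

```python
import os
import sqlite3
import tempfile


class DurableQueue:
    """Hypothetical disk-backed FIFO queue (illustrative, not InfluxDB code)."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, payload):
        # Commit immediately so the write survives a restart.
        self.db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
        self.db.commit()

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM queue").fetchone()[0]

    def flush(self, send):
        """Send queued payloads oldest first; stop at the first failure."""
        sent = 0
        rows = self.db.execute(
            "SELECT id, payload FROM queue ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            if not send(payload):
                break  # Link is down: keep this and later items queued.
            self.db.execute("DELETE FROM queue WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent


received = []


def send_offline(payload):
    return False  # Simulates the ferry at sea, with no connection.


def send_online(payload):
    received.append(payload)  # Stands in for a write to the cloud.
    return True


path = os.path.join(tempfile.mkdtemp(), "queue.db")
queue = DurableQueue(path)
queue.enqueue("engine_temp,ferry=a value=71.5")
queue.enqueue("engine_temp,ferry=a value=72.0")

sent_offline = queue.flush(send_offline)

# Reopen the file to show that the queued data persisted on disk.
queue.db.close()
queue = DurableQueue(path)
sent_online = queue.flush(send_online)
```

Deleting a row only after its send succeeds means a dropped connection can at worst cause a resend, never a loss. A production system would also need batching, retries with backoff and a cap on queue size.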
In-line data aggregations, transformation, and enrichment are enabled with Flux, InfluxData’s open-source data scripting and query language.
Edge Data Replication is a native capability within InfluxDB OSS version 2.2 and above. The feature does not have an added cost for users and is available out of the box.
As the edge computing paradigm shift continues, real-world edge scenarios are sure to become more prevalent, and vendors paying attention to the pain points of their customers will continue to develop new and creative solutions for them. With Edge Data Replication, InfluxData has made a significant first move in addressing the importance of the edge to its customers, identifying a hurdle and presenting its own solution.