Paper Discussion 13a: FRAME: Fault Tolerant and Real-Time Messaging for Edge Computing #107

Open
Others opened this issue Apr 5, 2020 · 11 comments
Labels
paper discussion s20 The discussion of a paper for the spring 2020 class.

Comments

@Others
Contributor

Others commented Apr 5, 2020

paper

@searri
Contributor

searri commented Apr 5, 2020

Reviewer: Rick Sear
Review Type: Skim

Problem being solved

Making efficient use of edge computing is tricky: edge devices need to quickly determine how urgent each message is and to keep working when messages are dropped. Current systems don't seem to balance these tasks well.

Important contributions

The paper introduces FRAME, which seeks to mitigate these issues. Each message passed to the edge device running the FRAME middleware has a "loss-tolerance level," the number of consecutive message drops that is acceptable, and a "latency deadline." In their evaluation, the researchers show that, using these features, FRAME meets its goals: it satisfies the deadlines and loss-tolerance levels and reduces the latency added by fault recovery.

@gparmer gparmer added the paper discussion s20 The discussion of a paper for the spring 2020 class. label Apr 5, 2020
@bushidocodes
Contributor

Reviewer: Sean McBride

Review Type: Simple Skim

Problem Being Solved:

The consumers of edge computing services have a high degree of QoS differentiation. How can edge compute infrastructure support this heterogeneity and ensure that the most critical workloads are appropriately prioritized over cloud-bound systems and non-latency-sensitive workloads?

Main Contributions

  1. Developed a scheduling policy that uses a loss-tolerance level (the acceptable number of consecutive message losses) and end-to-end latency deadlines to differentiate the QoS requirements of messages, and created a mathematical proof for the resulting temporal semantics.
  2. Implemented Fault-tolerant Real-time Messaging Architecture (FRAME), a clustered message broker that supports differentiated message delivery using this aforementioned scheduling policy. It is built upon the TAO real-time event service.
  3. Evaluated FCFS, FCFS- (FCFS w/o dispatch-backup replication), FRAME, and FRAME+ (no replication to secondary broker, fault tolerance via re-transmission), noting that FRAME reduced CPU utilization by roughly 50% relative to FCFS at high numbers of topics (7525).
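
To make the loss-tolerance level from item 1 concrete, here is a minimal Python sketch. It assumes, as the reviews in this thread describe it, that a level of L means at most L consecutive messages of a topic may be lost. The function name `violates_loss_tolerance` and the boolean delivery trace are hypothetical and are not taken from the paper.

```python
def violates_loss_tolerance(delivered, level):
    """Return True if the trace contains more than `level` consecutive losses.

    delivered: iterable of booleans, True = message arrived (hypothetical trace format).
    level: assumed meaning: maximum acceptable number of consecutive losses.
    """
    run = 0  # length of the current streak of lost messages
    for ok in delivered:
        run = 0 if ok else run + 1
        if run > level:
            return True
    return False
```

Under this reading, a topic with level 0 tolerates no loss at all, while a topic with level 2 can miss two messages in a row before the requirement is violated.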

@rachellkm
Contributor

Reviewer: Rachell Kim
Review Type: Skim

Problem Being Solved:

Systems supporting edge computing within Industrial Internet of Things (IIoT) applications often need to differentiate between different latency and loss tolerance requirements in order to ensure all levels of assurance within the system are met in a timely manner. The challenge is in keeping track of the discrepancies in heterogeneous requirements for the efficient operation of such systems.

Main Contributions:

The authors of this paper propose a fault-tolerant, real-time messaging architecture called FRAME which differentiates message topics by loss-tolerance levels and end-to-end latency deadlines. FRAME mitigates latency penalties and supports configurable scheduling policies to handle message topics according to their fault-tolerant and real-time requirements. The authors evaluated four different configurations of real-time messaging architectures using variations of FRAME features and revealed that FRAME was capable and efficient in terms of fault recovery, CPU utilization, and loss-tolerance.

@s-hanna15
Contributor

Reviewer: Sam Hanna
Review Type: Skim

Problem Being Solved:
This paper discusses the need for fault-tolerant real-time messaging at the edge for IIoT (Industrial Internet of Things) devices. Because some of their messages are critical, such as those for emergency response, these devices need reliable message delivery and must keep message latency under a given bound. The paper looks at providing the required latency together with the needed fault tolerance for different degrees of importance, so that each application gets the messaging guarantees it needs.

Important Areas:
The paper's approach combines publisher resend and broker backup as fault-tolerance mechanisms. The authors also prove that the worst-case response time stays within acceptable limits. They propose the FRAME architecture and test it against FCFS, FCFS-, and FRAME+. The results showed that their architecture met the intended requirements and outperformed FCFS and FCFS-.

@huachuan
Contributor

huachuan commented Apr 6, 2020

Reviewer: Huachuan Wang
Review Type: Skim

Problem being solved

Industrial Internet of Things (IIoT) applications at the edge require fault-tolerant, real-time message delivery. Differentiating service among them is challenging because of latency discrepancies and heterogeneous loss-tolerance and latency requirements. In addition, common fault-tolerance mechanisms tend to introduce additional latency, and cloud traffic may impede local, time-sensitive message delivery.

Important contributions

  1. This paper describes a new fault-tolerant real-time messaging model. This model describes timing semantics, identifies a message-loss condition, and demonstrates how timing bounds support message differentiation to meet each requirement.
  2. This paper introduces a differentiated Fault-tolerant ReAl-time MEssaging architecture called FRAME. The FRAME architecture can mitigate latency penalties caused by fault recovery, via an online algorithm that prunes the set of messages to be recovered.
  3. This paper describes their implementation of FRAME within the TAO real-time event service. FRAME can efficiently meet both types of requirements and mitigate the latency penalties caused by fault recovery.
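
One way to picture the pruning idea from item 2 is the Python sketch below. It assumes that the backup learns when a message has already been dispatched, so that only undelivered messages remain in the set to recover. That is an assumption made for illustration; the paper's actual online algorithm may work differently, and the names `RecoverySet`, `on_replicated`, and `on_dispatched` are hypothetical.

```python
class RecoverySet:
    """Toy tracker of messages that would need recovery after a primary crash."""

    def __init__(self):
        self.pending = {}  # (topic, sequence number) -> payload

    def on_replicated(self, topic, seq, payload):
        # A copy of the message has been stored for possible recovery.
        self.pending[(topic, seq)] = payload

    def on_dispatched(self, topic, seq):
        # Prune: a message already delivered to subscribers need not be recovered.
        self.pending.pop((topic, seq), None)

    def to_recover(self):
        """Messages still pending, in (topic, sequence number) order."""
        return sorted(self.pending.items())
```

The smaller this set is at the moment of a crash, the less work recovery has to do, which is the intuition behind reducing the latency penalty.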

@rebeccc
Contributor

rebeccc commented Apr 6, 2020

Reviewer: Becky Shanley
Review Type: Skim

Problem Being Solved

The IIoT (Industrial Internet of Things), like many other real-time-sensitive systems, has specific messaging latency/reliability requirements that cannot be generalized (i.e., some systems need guaranteed zero message loss but latency is unimportant, while others need both zero message loss and low latency). These variable systems currently do not differentiate between these two scenarios, so they are either 1. not meeting the strictest of their requirements or 2. overusing resources for less time-sensitive requests.

Main Contributions

While they acknowledge that some suboptimal solutions exist (premature scheduling of cloud-bound traffic, service application), this paper suggests the need for an overhaul to many of the current conventions. They provide

  1. A fault-tolerant real-time messaging model
  2. FRAME, "a differentiated Fault-tolerant ReAl-time MEssaging architecture"
  3. An implementation and evaluation of FRAME using a popular real-time middleware.

@hjaensch7
Contributor

hjaensch7 commented Apr 6, 2020

Reviewer: Henry Jaensch
Review Type: Critical

Problem Being Solved

Industrial Internet of Things applications have a unique set of requirements. These requirements manifest in a message-passing scheme that is both timely and reliable. Message passing with fault tolerance typically incurs more latency, while real-time message passing typically comes without fault tolerance. This paper introduces FRAME, a real-time fault-tolerant messaging architecture. This is an interesting area of IoT because it focuses on communication between the devices, which is crucial, especially in an industrial setting.

Main Contributions

The FRAME messaging architecture allows for customization of messaging requirements. Fault tolerance is typically implemented as a boolean: either it is provided or it isn't. FRAME allows certain messages from less critical components to tolerate some amount of loss. This way resources aren't wasted trying to guarantee fault tolerance for low-criticality components. This differentiation is also applied to latency requirements in a similar manner. Emergency systems that are highly critical demand low latency and low loss tolerance, while other components can accept some loss and some degraded speed. FRAME uses a loss-tolerance level and an end-to-end latency deadline to track latency/loss-tolerance requirements. This paper's primary contributions are this way to delimit and negotiate differentiated latency/loss-tolerance requirements, as well as an implementation and testing framework using the TAO event service.
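
As a rough illustration of the per-topic requirements described above, the following Python sketch pairs a loss-tolerance level with a latency deadline for each topic. The topic names, the numeric values, and the `needs_backup` policy are all hypothetical and chosen only for this example; they are not FRAME's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicRequirements:
    loss_tolerance: int  # acceptable consecutive losses; 0 means none
    deadline_ms: int     # end-to-end latency deadline in milliseconds


# Hypothetical topics and values, for illustration only.
REQUIREMENTS = {
    "emergency_stop": TopicRequirements(loss_tolerance=0, deadline_ms=50),
    "temperature": TopicRequirements(loss_tolerance=3, deadline_ms=500),
}


def needs_backup(topic):
    """Assumed policy for this sketch: only zero-loss topics are replicated."""
    return REQUIREMENTS[topic].loss_tolerance == 0
```

The point of the sketch is that fault tolerance becomes a per-topic quantity rather than a single yes/no switch for the whole system.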

Questions

  1. While the pub/sub relationship is typical for messaging architectures, does IIoT always conform to it? Is it necessary to have all components use the same messaging architecture, thus producing the need for varying criticality?

  2. I'm somewhat confused about why FCFS was already overloaded on the page with 12 graphs. Is that normal latency for 50 topics?

Critiques

  • While the evaluation against other architectures was thorough, some of the tables felt unnecessary. Rows of 100.0 didn't appear useful and seemed to waste space.

  • If failover is triggered once a broker crashes, isn't every message en route lost, assuming that broker is responsible for replication?

@samfrey99
Contributor

Reviewer: Sam Frey
Review Type: Skim

Problem
Industrial IoT applications often require far stricter bounds on latency and message loss than consumer applications. In addition, each industrial application accepts different levels of latency and message loss. A system that requires low latency but receives thousands of messages a minute likely doesn't need to spend computational power ensuring every message is received, whereas a system receiving only 100 messages each minute should guarantee each is read. This makes building a general solution extremely challenging, and the solutions available are less than perfect.

Contributions:
The authors propose FRAME, a fault-tolerant real-time messaging architecture. FRAME takes in constraints for latency and message loss and with them computes relative deadlines for the system. Every time a message is received, it gets copied to a buffer, and a job is created to address the message. With FRAME, the authors were able to meet the same message loss requirements as other systems even with varying latency.
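
A rough Python illustration of the buffer-and-job flow described above. It assumes jobs are served earliest-deadline-first, which is an assumption of this sketch rather than a statement about the paper's scheduler. The names `Broker`, `on_message`, and `next_job` are hypothetical and are not FRAME's or TAO's API.

```python
import heapq
import itertools


class Broker:
    """Toy broker: buffers each message and queues a job ordered by absolute deadline."""

    def __init__(self, relative_deadlines):
        # relative_deadlines: topic -> relative deadline in time units (hypothetical config)
        self.relative_deadlines = relative_deadlines
        self.buffer = {}   # message id -> payload
        self.jobs = []     # min-heap of (absolute deadline, message id, topic)
        self._ids = itertools.count()

    def on_message(self, topic, payload, arrival_time):
        """Copy the message into the buffer and create a job for it."""
        msg_id = next(self._ids)
        self.buffer[msg_id] = payload
        deadline = arrival_time + self.relative_deadlines[topic]
        heapq.heappush(self.jobs, (deadline, msg_id, topic))
        return msg_id

    def next_job(self):
        """Pop the job with the earliest absolute deadline, or None if idle."""
        if not self.jobs:
            return None
        deadline, msg_id, topic = heapq.heappop(self.jobs)
        return topic, self.buffer.pop(msg_id), deadline
```

With relative deadlines of 10 for an alarm topic and 1000 for a logging topic, an alarm that arrives later than a log message is still served first because its absolute deadline is earlier.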

@pcodes
Contributor

pcodes commented Apr 6, 2020

Reviewer: Pat Cody
Review Type: Skim

Problem Being Solved

Industrial Internet of Things devices (IIoT) often have latency-sensitive applications running on edge nodes. These edge nodes, however, cannot distinguish between different granularities of "latency-sensitive" or "loss-tolerance", and as such there is a one-size-fits-all approach to it. This causes problems when there are some messages from applications that can be dropped, or delivered late, and others that absolutely must be delivered on time with a reliable guarantee.

Paper Contributions

This paper addresses those problems by introducing a new model for delivering real-time messages, and an architecture called FRAME that allows for message differentiation according to that model. The model has proven timing bounds to allow real-time operation, and provides a way to describe when a message can be lost. In their implementation of FRAME, they used the TAO middleware and successfully created a system that can utilize their model and mitigate latency penalties from fault recovery.

@RyanFisk2
Contributor

Reviewer: Ryan Fisk

Review Type: Critical

Problem being solved

IoT devices on the edge need to be able to talk to each other reliably and in a timely manner to ensure that time critical tasks are completed on schedule. However, these devices may have different requirements in terms of latency and message reliability. Safety critical systems may need to ensure zero message loss with minimal latency, while non-critical systems can be more fault tolerant. Edge devices need to be able to schedule communications with both the devices themselves and the cloud in such a way that latency-sensitive tasks are prioritized and messages are delivered reliably.

Contributions

This paper introduces FRAME, a fault-tolerant and real-time messaging system that allows for message prioritization based on latency requirements. FRAME has proven bounds for latency and is able to differentiate between processes with different requirements and prioritize more latency-sensitive tasks. The fault tolerance comes from a backup message broker that retains a set of recent messages in case the primary broker crashes. The message publisher also retains a copy of the message in case the broker crashes before it can copy the message over to the backup system.
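
The backup broker described above can be pictured with a small Python sketch. It is a simplified illustration under its own assumptions: the per-topic retention size and the names `BackupBroker`, `replicate`, and `take_over` are hypothetical and do not reproduce the paper's algorithm.

```python
from collections import deque


class BackupBroker:
    """Toy backup that keeps only the most recent messages per topic."""

    def __init__(self, retain):
        self.retain = retain  # hypothetical: number of copies kept per topic
        self.recent = {}      # topic -> deque of (sequence number, payload)

    def replicate(self, topic, seq, payload):
        # Called by the primary for each message it copies over;
        # the bounded deque silently drops the oldest entry when full.
        self.recent.setdefault(topic, deque(maxlen=self.retain)).append((seq, payload))

    def take_over(self):
        """On primary failure, return retained messages per topic, oldest first."""
        return {topic: list(msgs) for topic, msgs in self.recent.items()}
```

Bounding the retained set keeps the backup's memory and recovery work proportional to the retention size rather than to the full message history.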

Questions

  1. What, if any, are the security concerns of this system?

  2. Why doesn't the publisher send messages to both the primary and backup brokers? Wouldn't this create less work for the primary broker and speed up message delivery?

Critiques

  1. The paper introduces this system as something that would be used when critical information needs to be delivered reliably and quickly. If this is going to be used on safety-critical systems I would have liked to see some consideration about how secure this system is.

  2. The paper assumes only one broker failure (the primary one). What would happen if the backup crashed before the primary, or if the whole system crashed (in the case of a power outage or electrical failure)?

@tuhinadasgupta
Contributor

Reviewer: Tuhina
Review Type: Comprehension

Problem :
Industrial Internet of Things (IIoT) devices often have latency-sensitive applications running on edge nodes. The edge nodes can't differentiate between "latency-sensitive" (safety-critical) and "loss-tolerant" (non-critical) traffic, which is an issue because the former must be completed on schedule. This paper introduces FRAME, a real-time fault-tolerant messaging architecture, as a possible solution to the problem.

Contribution:
FRAME has proven bounds for latency and is able to differentiate between processes with different requirements and prioritize more latency-sensitive tasks; it takes latency constraints as input and computes relative deadlines for components of the system. Every time a message is received, it gets copied to a buffer, and a job is created to address the message. The fault tolerance is provided through a backup message broker that contains the set of recent messages in case the primary broker crashes.

Questions:

  1. This paper doesn't really address security, which is pretty critical in this IoT use case. What's happening there? I guess it's a safe assumption that this system has all of the security concerns that are a part of edge computing, but I would still like to know.
  2. On a smaller scale (up to about 7525 topics), FCFS is more efficient than FRAME. Are the target users of FRAME all larger than that? Who are the ideal users of this system?
