Resilient verk improvements #180


Merged
merged 7 commits into from
Jan 22, 2019

Owner

@edgurgel edgurgel commented Jan 22, 2019

Here are some changes I made from the original PR #159

  • Change QueueManager so that it keeps verk_nodes and verk:node:#{node_id}:queues up to date. This avoids possible desynchronization between Redis and the local state of the Verk node. The main idea behind this change is: "If any job is added to the inprogress list, this node and this queue must be tracked so that other nodes can rescue the jobs if this node fails."
    QueueManager maintains these data structures only if generate_node_id is true.
  • Change Node.Manager so that it does not crash if a heartbeat fails.

edgurgel added 6 commits January 17, 2019 13:38
If generate_node_id is true it will ensure that the set of nodes
includes the local node_id and that the set of tracked queues includes
the managed queue

If generate_node_id is false then it will work as previously
This data will be tracked automatically by QueueManager
Verk.Node.register/3 is not necessary anymore. Verk.Node.expire_in/3 can
be used instead as QueueManager adds the running node to the nodes key
@edgurgel edgurgel requested a review from alissonsales January 22, 2019 02:11
@coveralls
Coverage Status

Coverage decreased (-0.5%) to 88.679% when pulling 0bb2164 on resilient-verk-improvements into 36666a9 on master.

@coveralls
coveralls commented Jan 22, 2019

Coverage Status

Coverage decreased (-0.5%) to 88.679% when pulling be7d769 on resilient-verk-improvements into 36666a9 on master.

@edgurgel
Owner Author

Updated algorithm:

  • Each time a job is moved to a queue's inprogress list, this node is added to verk_nodes (SADD verk_nodes node_id) and the queue is added to verk:node:#{node_id}:queues (SADD verk:node:123:queues queue_name).

  • Every frequency seconds, we set the node key to expire in 2 * frequency:
    PSETEX verk:node:#{node_id} 2 * frequency alive

  • Check the keys of all nodes. If a node's key has expired, that node is dead.

  • To restore, we go through all the running queues of the dead node (verk:node:#{node_id}:queues) and move their jobs from inprogress back to the queue. Each "enqueue back from in progress" is atomic (<3 Lua), so we won't have duplicates.
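
For readers who want to see the steps end to end, here is a minimal sketch in Python. It is not Verk's code: FakeRedis is a hypothetical in-memory stand-in for the few Redis commands mentioned above, the function names are made up for illustration, and the queue:<name> and inprogress:<name>:<node_id> key names are assumptions. The interval is treated as milliseconds because PSETEX takes a millisecond TTL, and removing a restored node from verk_nodes is also an assumption.

```python
from collections import defaultdict

NODES_KEY = "verk_nodes"


class FakeRedis:
    """Hypothetical in-memory stand-in for SADD, PSETEX, EXISTS and lists."""

    def __init__(self):
        self.now_ms = 0                 # manual clock, advanced by the caller
        self.sets = defaultdict(set)    # set keys
        self.lists = defaultdict(list)  # queue and inprogress lists
        self.expires_at = {}            # key -> absolute expiry time in ms

    def sadd(self, key, member):
        self.sets[key].add(member)

    def psetex(self, key, ttl_ms, value):
        # Only the expiry matters for liveness, so the value is not stored.
        self.expires_at[key] = self.now_ms + ttl_ms

    def exists(self, key):
        return self.expires_at.get(key, -1) > self.now_ms


def dequeue(redis, node_id, queue):
    """Track this node and queue, then move one job to inprogress."""
    jobs = redis.lists[f"queue:{queue}"]          # assumed key name
    if not jobs:
        return None
    # Track first, so a job never sits in inprogress untracked.
    redis.sadd(NODES_KEY, node_id)
    redis.sadd(f"verk:node:{node_id}:queues", queue)
    job = jobs.pop(0)
    redis.lists[f"inprogress:{queue}:{node_id}"].append(job)  # assumed key name
    return job


def heartbeat(redis, node_id, frequency_ms):
    """Refresh the node key so that it expires in 2 * frequency."""
    redis.psetex(f"verk:node:{node_id}", 2 * frequency_ms, "alive")


def dead_nodes(redis):
    """Return the tracked nodes whose node key has expired."""
    return sorted(
        node for node in redis.sets[NODES_KEY]
        if not redis.exists(f"verk:node:{node}")
    )


def restore(redis, node_id):
    """Move every inprogress job of a dead node back to its queue."""
    for queue in sorted(redis.sets[f"verk:node:{node_id}:queues"]):
        inprogress = redis.lists[f"inprogress:{queue}:{node_id}"]
        # Verk does this step atomically in Lua; this sketch is single-threaded.
        redis.lists[f"queue:{queue}"].extend(inprogress)
        inprogress.clear()
    # Assumption: stop tracking the node once its jobs are restored.
    redis.sets[NODES_KEY].discard(node_id)
    redis.sets.pop(f"verk:node:{node_id}:queues", None)
```

A node would call dequeue when it picks up a job and heartbeat on every interval, while any node can periodically call dead_nodes and then restore for each result.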

How to use:

  • Set the application env generate_node_id to true

If it is not true, the new code is not used and Verk works as before.

Collaborator

@alissonsales alissonsales left a comment

Looks good to me 👍

@edgurgel edgurgel merged commit 719939a into master Jan 22, 2019
@edgurgel edgurgel deleted the resilient-verk-improvements branch January 22, 2019 22:07
@jimsynz jimsynz left a comment

LGTM
