ops issues #290

Open
Evanfeenstra opened this issue Sep 9, 2024 · 8 comments

@Evanfeenstra
Collaborator

EC2 instances crashing

  • limit memory per container, and possibly set a total Docker limit?
  • rate limiting in Traefik
  • keep logs outside the instance in some cases, e.g. CloudWatch; also make sure the logs themselves are good
  • log rotation if logs are kept locally

IP addresses changing

  • static IPs on Lightning nodes ($3/month)
  • could still have a load balancer (for domains) that forwards to Traefik

Better logs in the swarm UI

@Evanfeenstra
Collaborator Author

superadmin

  • creating swarms
  • restarting EC2
  • updating Route 53 stuff

@tomsmith8
Contributor

@Evanfeenstra could you prioritise setting Docker and container limits?

@kevkevinpal could you prioritise migrating the BTC graph, updating the GitHub Actions pipeline, and deprecating the non-swarm EC2 instances?

Next up would then be setting up CloudWatch?

@Evanfeenstra
Collaborator Author

Just merged a per-container memory limit: set it once and it applies to every container.

84ab225

It's global_mem_limit in the YAML config file; it's a number in bytes.
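Since the value is in raw bytes, a small helper may be handy when picking one. This is only a rough sketch, not code from the repo: mib_to_bytes and fits_on_host are hypothetical names, and the reserve for the OS and Docker daemon is an assumed parameter. It converts MiB to bytes and checks whether N containers at the per-container cap would fit on a host, which is one way to reason about a "total Docker limit" alongside the per-container one.

```rust
/// Bytes in one mebibyte and one gibibyte.
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

/// Convert whole mebibytes to bytes, the unit global_mem_limit is given in.
fn mib_to_bytes(mib: u64) -> u64 {
    mib * MIB
}

/// Hypothetical sanity check (not part of sphinx-swarm): would `containers`
/// containers, each capped at `per_container_bytes`, fit in `host_bytes`
/// after setting aside `reserve_bytes` for the OS and the Docker daemon?
fn fits_on_host(
    per_container_bytes: u64,
    containers: u64,
    host_bytes: u64,
    reserve_bytes: u64,
) -> bool {
    // Worst case: every container uses its full cap at the same time.
    let needed = per_container_bytes.saturating_mul(containers);
    needed <= host_bytes.saturating_sub(reserve_bytes)
}
```

For example, mib_to_bytes(512) is 536870912, so a 512 MiB cap would be written as that number. With an assumed 8 GiB host and 1 GiB reserve, fits_on_host(mib_to_bytes(512), 10, 8 * GIB, GIB) is true, while 16 containers at the same cap would not fit.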

@Evanfeenstra
Collaborator Author

@tobi-bams here's a new SetGlobalMemLimit cmd, maybe you can add a frontend for it? https://github.com/stakwork/sphinx-swarm/blob/master/src/cmd.rs#L152

@tobi-bams
Contributor

Yea, sure I can.

@tomsmith8
Contributor

Update all swarms to m5.large or higher.

Do not use T-family instances: CPU credit limits and usage spikes cause the machines to become unavailable.
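To make the CPU credit point concrete, here is a rough back-of-the-envelope sketch. It assumes standard (not unlimited) credit mode, and the function name is made up for illustration. One CPU credit is one vCPU at 100% for one minute, so a sustained spike drains the balance at (vCPUs × utilization) credits per minute, minus whatever the instance earns.

```rust
/// Hypothetical helper: minutes until a burstable instance's CPU credit
/// balance is drained under sustained load, or None if it never drains.
///
/// Assumes standard credit mode, where the instance is throttled to its
/// baseline once the balance reaches zero.
fn minutes_until_drained(
    balance: f64,         // current credit balance
    earned_per_hour: f64, // credits earned per hour for the instance size
    vcpus: f64,           // number of vCPUs under load
    utilization: f64,     // sustained utilization per vCPU, 0.0..=1.0
) -> Option<f64> {
    // 1 credit = 1 vCPU at 100% for 1 minute.
    let spent_per_min = vcpus * utilization;
    let earned_per_min = earned_per_hour / 60.0;
    let net_drain = spent_per_min - earned_per_min;
    if net_drain <= 0.0 {
        None // at or below baseline: the balance does not shrink
    } else {
        Some(balance / net_drain)
    }
}
```

With example figures of a full balance of 576 credits, 24 credits earned per hour, and 2 vCPUs pinned at 100% (roughly a t3.medium; check the AWS docs for the actual instance type), the balance lasts about 360 minutes, after which the instance drops to baseline performance. Fixed-performance instances such as m5.large have no credit balance to run out.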

@tomsmith8
Contributor

@Evanfeenstra any updates on keeping logs?

  • not deleting and keeping locally
  • future -> stream logs to something like CloudWatch
