Heat stuck at bastion #397

Open
ghost opened this issue Dec 21, 2017 · 6 comments

Comments

@ghost

ghost commented Dec 21, 2017

Hi Everybody,

I am trying to deploy OCP 3.5 (and even 3.7) on Red Hat OSP 11.
When I run the Heat templates, the stack is created, all the necessary networks are created, and the bastion is created and goes through the usual cloud-init provisioning steps (adding repos, updating, installing basic packages). cloud-init then sends the finished signal and gets an HTTP 200.

After that, it gets stuck at:

$> openstack stack resource list -n 2 ocp2 | grep -i progress
| bastion_host                    | 98bd1fee-87c3-4360-bd4b-549e39d1345e | file:///Users/myself/projects/openshift-on-openstack/bastion.yaml                                              | CREATE_IN_PROGRESS | 2017-12-21T16:00:41Z | ocp2                                                     |
| deployment_write_templates      | c8be1435-3125-4e06-8234-b620dd556fa8 | OS::Heat::SoftwareDeployment                                                                                        | CREATE_IN_PROGRESS | 2017-12-21T16:01:12Z | ocp2-bastion_host-n4vsl5fz4maw                           |
| deployment_update_node_count    | 79327e5c-579d-4a95-a0b4-e93c52385afd | OS::Heat::SoftwareDeployment                                                                                        | CREATE_IN_PROGRESS | 2017-12-21T16:01:12Z | ocp2-bastion_host-n4vsl5fz4maw                           |
| deployment_tune_ansible         | a705f997-3cf0-44aa-90f1-af21e3a23ca1 | OS::Heat::SoftwareDeployment

If I force the signal with openstack stack resource signal ..., it moves on to the next step, but the Ansible template is not created and the files that are usually pushed are not present.
/etc/os-collect-config.conf points to the correct endpoint:

$> cat /etc/os-collect-config.conf
[DEFAULT]
command = os-refresh-config
collectors = ec2
collectors = cfn
collectors = local

[cfn]
metadata_url = https://10.1.3.11:13005/v1/
stack_name = ocp2-bastion_host-n4vsl5fz4maw
secret_access_key = 7e7214750d1a48c9a4cad81010fe2173
access_key_id = 494ab1ed83b441168423aec7d868267c
path = host.Metadata
$> openstack endpoint list | grep heat
| 1b24a4cf65a74e38992c4d8230a6e7da | regionOne | heat-cfn     | cloudformation | True    | internal  | http://172.17.1.16:8000/v1               |
| 2f666c5f3f25445682d8cc6ca51f9488 | regionOne | heat         | orchestration  | True    | admin     | http://172.17.1.16:8004/v1/%(tenant_id)s |
| 557a1fc9ff2549a8bc142bd305ac26bb | regionOne | heat-cfn     | cloudformation | True    | public    | https://10.1.3.11:13005/v1               |
| 622df692e35b424b93cd24f54c577df4 | regionOne | heat         | orchestration  | True    | public    | https://10.1.3.11:13004/v1/%(tenant_id)s |
| da4ed879390b4b6c9d97e114aa011f49 | regionOne | heat         | orchestration  | True    | internal  | http://172.17.1.16:8004/v1/%(tenant_id)s |
| fba19a090ed6437f86513a91e9cdc0ba | regionOne | heat-cfn     | cloudformation | True    | admin     | http://172.17.1.16:8000/v1
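
The comparison between the collector config and the endpoint list can also be scripted. This is only a minimal sketch, assuming the config file has the simple `key = value` layout shown above; the helper names `cfn_metadata_url` and `url_in_endpoints` are made up for illustration and are not part of any OpenStack tool.

```shell
# Hypothetical helper: print the metadata_url value from an
# os-collect-config style file (defaults to the path used above).
cfn_metadata_url() {
    awk -F'=' '$1 ~ /^[ \t]*metadata_url[ \t]*$/ {
        sub(/^[ \t]+/, "", $2); sub(/[ \t]+$/, "", $2); print $2
    }' "${1:-/etc/os-collect-config.conf}"
}

# Hypothetical helper: succeed if the URL (ignoring one trailing slash)
# appears as a substring of the endpoint listing read from stdin.
url_in_endpoints() {
    grep -qF "${1%/}"
}

# Example usage, run where both the config file and the CLI are available:
#   openstack endpoint list | url_in_endpoints "$(cfn_metadata_url)" \
#       && echo "metadata_url matches a registered endpoint"
```

A match only shows that the URL is registered in Keystone; it does not prove the bastion can actually reach it.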

After a few hours, it times out and the stack goes into a failed state.

Does anyone have a clue why?

Thanks a lot for your support
P.

parameters.yaml

parameters:
  ssh_key_name: myself
  bastion_image: rhel-guest-image-7.2-20160302.0.x86_64
  bastion_flavor: m1.medium
  master_image: rhel-guest-image-7.2-20160302.0.x86_64
  master_flavor: m1.medium
  infra_image: rhel-atomic-cloud-7.2-10.x86_64
  infra_flavor: m1.medium
  node_image: rhel-atomic-cloud-7.2-10.x86_64
  node_flavor: m1.medium
  loadbalancer_image: rhel-atomic-cloud-7.2-10.x86_64
  loadbalancer_flavor: m1.medium
  ocp_version: 3.5
  osp_version: 11

  external_network: internet_access
  container_subnet: 192.168.1.0/24
  loadbalancer_type: neutron

  dns_nameserver: 8.8.4.4,8.8.8.8
  node_count: 2

  rhn_username: ""
  rhn_password: "."
  rhn_pool: ""
  extra_rhn_pools: ""
  deployment_type: openshift-enterprise
  domain_name: "example.com"
  master_hostname: "openshift-master"
  node_hostname: "openshift-node"
  ssh_user: cloud-user
  master_docker_volume_size_gb: 25
  infra_docker_volume_size_gb: 25
  node_docker_volume_size_gb: 25

  system_update: false

resource_registry:
  #OOShift::LoadBalancer: ../openshift-on-openstack/loadbalancer_dedicated.yaml
  OOShift::LoadBalancer: ../openshift-on-openstack/loadbalancer_neutron.yaml
  OOShift::ContainerPort: ../openshift-on-openstack/sdn_openshift_sdn.yaml
  OOShift::IPFailover: ../openshift-on-openstack/ipfailover_keepalived.yaml
  OOShift::DockerVolume: ../openshift-on-openstack/volume_docker.yaml
  OOShift::DockerVolumeAttachment: ../openshift-on-openstack/volume_attachment_docker.yaml
  OOShift::RegistryVolume: ../openshift-on-openstack/registry_ephemeral.yaml
@Doc-Savage

@pburgisser - Did you ever figure out this problem? I seem to be stuck in exactly the same place...

-Andy

@daleking

I have the same issue.

What I've noticed is that the wait_handle in bastion.yaml is not set up until after the success signal is sent by fragments/bastion-boot.sh. I can see this in /var/log/containers/heat/heat-engine.log on the controller node(s). Moving the wait_condition resource to the top helps, but I haven't yet worked out the exact dependencies needed to make it work properly.

@Doc-Savage

@daleking - Thanks for the info. I have since gone over to openshift-on-openstack-123 and have made it a good bit further, although I had to flail about wildly to get there. I may come back to this problem once I get over the hump.

@tomassedovic
Contributor

Hey folks, I'm really sorry but none of the past maintainers of this repo are able to dedicate much time to it (including myself).

The good news is that the openshift-ansible project (the main OpenShift installer -- this repo uses it under the hood, too) now includes playbooks for various cloud providers including OpenStack:

https://github.com/openshift/openshift-ansible/tree/master/playbooks/openstack

If it helps any, this is what most Red Hat engineers involved with running OpenShift on OpenStack these days are working on.

I'll update the readme to reflect this, but in the meantime, this project is not really maintained anymore.

@hakanelgin

Hi Doc,

Maybe I can help you.

My setup is RH OCP 3.7 on OSP 12 with RHEL 7.5.
It looks like the ready signal is not getting back to your stack engine.

  • Which VMs are already deployed? Bastion, master, infra?
  • If the bastion is deployed and you can log in via the VIP IP of the OSP console, check /var/log/cloud-init-output.log and search that file for ‘part-0’. If you see e.g. part-012, it means the cloud-init user-data script part-012 had some trouble and was not fully executed. You can find those files in /var/lib/cloud/instance/scripts/. They are Linux commands, so check whether each command executed correctly.
  • Does the OSP API use TLS? If so, do you have the server certificate on your bastion host? Send a curl command to the heat-cfn endpoint and you will know.
  • Check whether all packages are installed.
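
The log search and the curl probe from the list above could look roughly like this. It is a sketch under stated assumptions, not a tested procedure for this repo: the function names `cloud_init_parts` and `probe_url` are hypothetical, and the exact wording cloud-init uses around a failing part may differ between versions, so the first helper simply lists every part name the log mentions.

```shell
# Hypothetical helper: list the distinct user-data script parts
# (part-001, part-012, ...) mentioned in a cloud-init output log.
cloud_init_parts() {
    grep -o 'part-0[0-9]*' "${1:-/var/log/cloud-init-output.log}" | sort -u
}

# Hypothetical helper: print the HTTP status code a URL answers with.
# curl exits non-zero if the TLS handshake fails, for example when the
# bastion does not trust the certificate presented by the endpoint.
probe_url() {
    curl --silent --show-error --output /dev/null \
         --write-out '%{http_code}\n' "$1"
}

# Example usage on the bastion (URL taken from the config earlier in the thread):
#   cloud_init_parts
#   probe_url https://10.1.3.11:13005/v1/
```

Any part name that shows up can then be inspected by hand under /var/lib/cloud/instance/scripts/.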

@daleking

OK, solved my issue - the WaitCondition signals were OK, but the heat agents were not installed in my cloud image (official Red Hat 7.5), so the SoftwareDeployment steps were not being run.

The following workaround ensures that openstack-heat-agents is installed so that the OS::Heat::SoftwareDeployment tasks do not time out:

daleking@475e997
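
For anyone wanting to check an image before deploying, here is a minimal sketch of the idea (it is not the content of the commit above). It assumes the agents show up as the commands os-collect-config, os-refresh-config and os-apply-config on PATH, and that the openstack-heat-agents package is available from an enabled repository; the helper name `missing_commands` is hypothetical.

```shell
# Hypothetical helper: print the names of the given commands that are
# not found on PATH, separated by spaces (empty output means all present).
missing_commands() {
    local missing="" cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    echo "${missing# }"
}

# Example usage on the bastion image (the yum step is an assumption based
# on the package name mentioned in this comment):
#   if [ -n "$(missing_commands os-collect-config os-refresh-config os-apply-config)" ]; then
#       sudo yum install -y openstack-heat-agents
#   fi
```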
