Discussion:
Juju Openstack - Blocked
James Beedy
2018-02-14 20:21:54 UTC
Hello,

I am experiencing some issues with a base-openstack deploy.

I can get a base-openstack to deploy using MAAS with no apparent errors
from Juju's perspective. After some init ops (uploading an image, creating
my networks and flavors) I try to launch an instance, and it errors out
with a "host not found" error that I'm unsure how to diagnose.

My nova/ceph nodes and neutron node [0] all have a single flat 1G mgmt-net
interface configured via MAAS, with VLANs trunked in on enp4s0f0 (untracked
by MAAS).


Looking to the nova logs, I find [1] [2]


The bundle I'm using [3] is a lightly modified version of the openstack base
bundle [4] with modifications to match my machine tags and mac addresses
for my machines.

I've gone back and forth with network and charm config trying different
combinations in hope this error is caused by some misconfiguration on my
end, but I am now convinced this is something outside of my scope as an
operator, and am hoping for some insight from the greater community.

I seem to be able to reproduce this consistently (using both Juju < 2.3.2
and 2.3.2).

Not even sure if I should create a bug somewhere as I'm not 100% sure this
isn't my fault. Let me know if additional info is needed.


Thanks,


James



[0] https://imgur.com/a/x62m9
[1] cat /var/log/nova/nova-api-metadata.log | http://paste.ubuntu.com/p/cTSjjRnHQ8/
[2] cat /var/log/nova/nova-compute.log | http://paste.ubuntu.com/p/YFnMs93xhG/
[3] https://paste.ubuntu.com/p/8hsQnmFYh4/
[4] https://github.com/openstack-charmers/openstack-bundles/blob/master/stable/openstack-base/bundle.yaml
Junaid Ali
2018-02-15 08:58:34 UTC
Hi James,

Have you checked neutron-server.log in neutron-api and nova-scheduler.log
in nova-cloud-controller?


--
Junaid
--
Juju-dev mailing list
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
James Page
2018-02-15 15:24:46 UTC
Hi James
Post by James Beedy
Looking to the nova logs, I find [1] [2]
You can ignore most of those errors. Until the charm is fully configured,
the daemons will log some error messages about broken db/rmq etc. Newer
reactive charms tend to disable services until config is complete; older
classic ones do not.

The compute node is not recording an error, which would indicate some sort
of scheduler problem - /var/log/nova/nova-scheduler.log from the
nova-cloud-controller would tell us more.
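
As a rough illustration of separating the startup noise from later failures, the sketch below keeps only ERROR/CRITICAL lines stamped at or after a chosen cutoff (for example, the time the deploy settled). It assumes the usual "date time.ms pid LEVEL logger ..." line layout; the function name `errors_since` and the sample lines are hypothetical and not taken from the pastes in this thread.

```python
import re
from datetime import datetime

# Assumed line layout: "<date> <time>.<ms> <pid> <LEVEL> <logger> ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \d+ (?P<level>[A-Z]+) (?P<rest>.*)$"
)


def errors_since(lines, cutoff, levels=("ERROR", "CRITICAL")):
    """Return the log lines at one of `levels` stamped at or after `cutoff`."""
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        stamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        if stamp >= cutoff and match.group("level") in levels:
            hits.append(line)
    return hits


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "2018-02-15 10:00:01.123 1234 ERROR nova.scheduler [-] startup noise",
        "2018-02-15 12:00:00.000 1234 INFO nova.scheduler [-] all good",
        "2018-02-15 12:30:05.456 1234 ERROR nova.scheduler [-] later failure",
    ]
    for hit in errors_since(sample, datetime(2018, 2, 15, 11, 0, 0)):
        print(hit)
```

To run it against a real file, pass an open file object as `lines`, e.g. `errors_since(open("/var/log/nova/nova-scheduler.log"), cutoff)`.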

Post by James Beedy
The bundle I'm using [3] is a lightly modified version of the openstack base
bundle [4] with modifications to match my machine tags and mac addresses
for my machines.
Seems reasonable - bundles are meant as a starting point after all!

Post by James Beedy
Not even sure if I should create a bug somewhere as I'm not 100% sure this
isn't my fault. Let me know if additional info is needed.
Let's dig into the scheduler log and see.

Cheers

James
James Beedy
2018-02-15 15:49:38 UTC
Junaid and James,


Thanks for the response. Here are the logs.


Nova-Cloud-Controller

$ cat /var/log/nova/nova-scheduler.log | http://paste.ubuntu.com/p/TQjDSXQSDt/

$ cat /var/log/nova/nova-conductor.log | http://paste.ubuntu.com/p/68TcmMCr82/

$ sudo cat /var/log/nova/nova-api-os-compute.log | http://paste.ubuntu.com/p/5xWpXbD5PC/


Neutron-Gateway

$ sudo cat /var/log/neutron/neutron-metadata-agent.log | http://paste.ubuntu.com/p/MW3qkQqntJ/

$ sudo cat /var/log/neutron/neutron-openvswitch-agent.log | http://paste.ubuntu.com/p/qz3vfzG9b9/


Neutron-api

$ sudo cat /var/log/neutron/neutron-server.log | http://paste.ubuntu.com/p/sCCNw4bXtW/

Thanks,


James
James Beedy
2018-02-19 22:42:52 UTC
I wanted to chime back in with the solution to the issue I was
experiencing.

The source of the issue was undoubtedly me. I was trying to launch an
instance with a flavor whose root disk was too small for a xenial.img (2G).

The traceback that pointed at the insufficient disk was being hidden
behind the user-facing error I was seeing, a block_device_mapping
error [0]. This, alongside the spurious error messages logged at service
startup, led me down unwarranted rabbit holes.

I thought I was deploying the most stripped down instance, with the bare
essentials, from the Horizon UI. In reality (thanks beisner) you can deploy
an even more stripped down instance (without a block device) from the
command line:

`openstack server create foo --flavor <flavor-name> --image <image-name>`

Deploying an instance without a block device allowed us to bypass the
block_device_mapping error and get the real underlying traceback [1].
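
For anyone wanting to rule this out before launching, here is a minimal sketch of the kind of comparison involved: the flavor's root disk against the image's size and minimum-disk requirement. It is not the check Nova itself runs; the function name `flavor_disk_problems`, its parameters, and the numbers in the usage lines are hypothetical, and real values would have to be read from your own flavor and image records.

```python
GIB = 1024 ** 3  # bytes per GiB


def flavor_disk_problems(flavor_root_gb, image_size_bytes, image_min_disk_gb=0):
    """Return a list of reasons the flavor's root disk is too small.

    An empty list means no problem was found by this simplified check.
    """
    problems = []
    if flavor_root_gb < image_min_disk_gb:
        problems.append(
            "flavor root disk (%d GB) is below the image min_disk (%d GB)"
            % (flavor_root_gb, image_min_disk_gb)
        )
    if flavor_root_gb * GIB < image_size_bytes:
        problems.append(
            "flavor root disk (%d GB) is smaller than the image (%d bytes)"
            % (flavor_root_gb, image_size_bytes)
        )
    return problems


if __name__ == "__main__":
    # Hypothetical numbers: a 1 GB root disk against a 2 GiB image.
    for problem in flavor_disk_problems(1, 2 * GIB):
        print(problem)
```

Returning the list of reasons, rather than a bare boolean, keeps the underlying cause visible instead of collapsing it into a generic failure.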

Massive thanks to Beisner for taking the time to talk through that with me,
and for pointing out the command above that led us to the real error.


~James

[0] https://paste.ubuntu.com/p/pTPh5vhBPp/
[1] https://paste.ubuntu.com/p/3XtxTTFXHM/
Ryan Beisner
2018-02-20 04:53:02 UTC
Great! Thanks for confirming. I'm happy to have helped de-obfuscate the
situation. I may or may not have caused that same little flavor overrun
"once" - cannot confirm. ;-)

Cheers,

Ryan