
FortiOS: update logic in the initial playbook#3336

Merged
ipspace merged 5 commits intoipspace:devfrom
a-v-popov:fos-initial-update
Apr 23, 2026

Conversation

@a-v-popov
Collaborator

Fix for #3335

  • Task "Enable multi-VDOM mode if a traffic VDOM is defined" is always executed and sets only the proper vdom-mode.
  • Task "Wait after VDOM mode change" is executed only when netlab_vdom_timer > 0 is set
  • Creation of the traffic VDOM is offloaded to the template

netlab_vdom_timer defaults to 0 if not set. For backward compatibility it might be a better choice to set it to 60.

The Jinja template is optimized to reduce flow control after the VDOM switch, but I am not sure whether the system interfaces stanza has always been available from a VDOM.

@a-v-popov a-v-popov force-pushed the fos-initial-update branch from 0fe8965 to f5372a2 Compare April 19, 2026 20:48
@sdargoeuves
Collaborator

Thanks for this, it's looking great — and nice tidy-up on the hostname part.

I did a quick test, using a vagrant box on 7.4.8, created without VDOM, just the bare minimum configuration.

Test 1 - no netlab_vdom in the topology file

netlab up completes correctly, I haven't checked much further, but it seems ok!

Test 2 - set netlab_vdom: netlab to let netlab configure a multi-vdom FW

It fails with this error message:

TASK [Ensure `root` VDOM is set as admin] *****************************************************************************************
fatal: [fw1]: FAILED! => {"changed": false, "meta": {"http_status": 429, "raw": "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>429 Too Many Requests</title>\n</head><body>\n<h1>Too Many Requests</h1>\n<p>The user has sent too many requests\nin a given amount of time.</p>\n<p>Additionally, a 429 Too Many Requests\nerror was encountered while trying to use an ErrorDocument to handle the request.</p>\n</body></html>\n"}, "msg": "Error in repo"}

Now if I try to run netlab initial a little while later (I waited more than 1 minute), it works successfully.

Test 3 - set netlab_vdom: netlab and netlab_vdom_timer: 60

It now works correctly, as the 1-minute wait allows the multi-vdom change to be applied.


My concern is Test 2 — failing unless the user sets an extra variable isn't great. I'd expect the reverse case (multi-VDOM box, topology disables it) to hit the same issue — have you tried it?

You already suggested defaulting netlab_vdom_timer to 60 for backward compatibility — I'd lean that way as the simplest fix.

Alternatively (or additionally), we could add retry logic on this task when we see a 429 (or any failure), which would make the playbook more resilient with or without vdom-mode transition. Hopefully the error response/type is consistent across FortiOS versions!

@ipspace
Owner

ipspace commented Apr 20, 2026

Alternatively (or additionally), we could add retry logic on this task when we see a 429 (or any failure), which would make the playbook more resilient with or without vdom-mode transition.

I would suggest adding something along these lines to the failing task:

  register: result
  until: result is success
  retries: "{{ netlab_check_retries | default(20) }}"
  delay: "{{ netlab_check_delay | default(5) }}"

We use the same trick in https://github.com/ipspace/netlab/blob/dev/netsim/ansible/tasks/readiness-check/ssh.yml

The playbook continues immediately if the task succeeds; otherwise, it retries for however long we specify. This would also make the Fortinet implementation similar to the others (using the same retry/delay variables).
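Applied to the failing task, the pattern could look roughly like this (a sketch only; the module invocation itself stays whatever it is in the playbook today):

```yaml
- name: Ensure `root` VDOM is set as admin
  # ... existing module invocation, unchanged ...
  register: result
  until: result is success
  retries: "{{ netlab_check_retries | default(20) }}"
  delay: "{{ netlab_check_delay | default(5) }}"
```

With retries 20 and delay 5, a transient 429 gets up to roughly 100 seconds to clear before the playbook gives up.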

Owner

@ipspace ipspace left a comment


A bunch of minor nits. At the very minimum, we should optimize the wait loops.

Also, the configuration-related comments are pure guesswork; I'm afraid I'm missing some nuances.

Comment thread netsim/ansible/tasks/fortios/initial.yml Outdated
- sleep: 10 # time in seconds between checks
- delay: 60 # Initial delay in seconds before first check
+ sleep: 10 # time in seconds between checks
+ delay: "{{ netlab_vdom_timer }}" # Initial delay in seconds before first check
Owner


This should be replaced with until/delay/retries parameters (see my response to @sdargoeuves comment)

Collaborator Author


To my understanding, the key part was to sit there quietly for delay seconds first. With until, it will hit the device with an HTTP query immediately. But I might be missing something.
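If the initial quiet period really matters, one way to keep it while still using the retry pattern would be an explicit pause in front of the retrying task; a hedged sketch (task names are illustrative):

```yaml
- name: Wait after VDOM mode change
  ansible.builtin.pause:
    seconds: "{{ netlab_vdom_timer }}"
  when: netlab_vdom_timer|default(0)|int > 0

# The readiness check that follows can then use until/retries/delay
# and only starts polling after the quiet period has elapsed.
```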

Comment thread netsim/ansible/tasks/fortios/initial.yml Outdated
system_global:
hostname: '{{ inventory_hostname.replace("_","-") }}'
register: hostname_result
retries: 5
Owner


We could use the same netlab_check_... variables here for consistency.

  retries: 5
- delay: 10 # Initial delay and waiting time between retries
+ delay: 10 # waiting time between retries
until: hostname_result is not failed and hostname_result.meta.http_status == 200
Owner


Would a simple hostname_result is success work here?

delay: 10 # waiting time between retries
until: hostname_result is not failed and hostname_result.meta.http_status == 200
when: >-
vdom_mode_result.meta is defined and
Owner


If I understand it correctly, this applies the hostname only when we switched into multi-VDOM mode before, but isn't the hostname applied in the configuration template? I'm probably missing some nuance here.

Also, if the purpose of this task is to check the API readiness (and setting the hostname is just a convoluted way of doing that), then maybe we could use a simpler R/O call and have it executed all the time (without the "when" condition)?
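If we go down that path, a read-only readiness probe might look something like this (assuming the `fortios_monitor_fact` module from the `fortinet.fortios` collection; the `system_status` selector is an assumption):

```yaml
- name: Wait for the FortiOS API to become ready
  fortinet.fortios.fortios_monitor_fact:
    selector: system_status
  register: api_status
  until: api_status is success
  retries: "{{ netlab_check_retries | default(20) }}"
  delay: "{{ netlab_check_delay | default(5) }}"
```

Being read-only, it could run unconditionally without risking configuration side effects.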

Owner


One more thought: if the "when" condition is identifying whether we changed the VDOM, maybe we can just put the two waiting tasks in a block and apply the when condition to the block? That would skip them completely for @a-v-popov setup while still allowing @sdargoeuves setup to work.

@ipspace
Owner

ipspace commented Apr 20, 2026

One other thought: Do we have to check HTTPS and API readiness? Is there a point where the HTTPS call succeeds but the API is not ready yet?

@a-v-popov
Collaborator Author

a-v-popov commented Apr 20, 2026

I think I know what the problem is now.
AFAIK, the only officially supported way to authenticate to the API is to use an API key.
I am using API keys, and this is most likely why I don't see the issue.
An alternative is to bump the rate limiter in the FortiOS cloud-init image.

Re hostname, I was trying to keep the guard logic as before. But if I understand correctly, there is nothing special about the hostname specifically. I think we should just remove it from the playbook and let it be assigned in the template.

I will update the PR.

UPDATE: or maybe not https://community.fortinet.com/t5/Support-Forum/429-Status-code-with-message-quot-Too-many-Request-quot/m-p/397611
I would try it anyway

@a-v-popov
Collaborator Author

One other thought: Do we have to check HTTPS and API readiness? Is there a point where the HTTPS call succeeds but the API is not ready yet?

I am not 100% sure about the failure scenario; I suspect some interaction between the Ansible FortiOS module frantically trying to do its magic and FortiOS trying to protect itself from a DoS attack. An API key should short-circuit the whole thing.

I pushed a second draft. It switches to HTTPS to reduce the number of connections. It keeps netlab_vdom_timer|default(0) for now; we will decide about it depending on your feedback.

- generate api-user key via ssh
- store it in host_vars/<node>/auth.yml
- remove hostname from initial playbook, assigned in the template
- switch to HTTPS port 443
@a-v-popov a-v-popov force-pushed the fos-initial-update branch from 12f58c3 to c0b971a Compare April 20, 2026 12:23
"password": "{{ ansible_ssh_pass }}"
"#":
- config global
- execute api-user generate-key netlab
Collaborator


This line fails, as there was no api-user netlab when I tested:

    "stdout_lines": [
        "Warning: Permanently added '10.194.59.11' (ECDSA) to the list of known hosts.",
        "",
        "FGVMxxxxxY5C # config global",
        "",
        "",
        "command parse error before 'global'",
        "Command fail. Return code 1",
        "",
        "FGVMxxxxxY5C # execute api-user generate-key netlab",
        "",
        "Could not find api-user netlab.",
        "Command fail. Return code -3",
        "",
        "FGVMxxxxxY5C # exit",
        "",
        "Connection to 10.194.59.11 closed."
    ]
}

If I manually configure the API user like this, it then works:

config system api-user
    edit "netlab"
        set accprofile "super_admin"
    next
end

And also (I would need to test further without having to do this step manually) it seems that I no longer need to wait 60s when configuring multi-vdom mode.

It's looking very good, we just need to add the api-user. Is this something you have built in as part of your vagrant box?

Collaborator Author


Is this something you have built in as part of your vagrant box?

It should be part of the Vagrant box if it was built with the cloud-init image.
https://github.com/a-v-popov/netlab/blob/205a09a6cde536a3adbe2b781ae1bfb4ccaaeea5/netsim/install/libvirt/fortios/openstack/latest/user_data#L28

Collaborator


Ah, my issue is that I haven't used the cloud-init image to create my Vagrant boxes.
I'd be keen to add the creation of the api-user netlab as part of the task.

@ipspace
Owner

ipspace commented Apr 21, 2026

I like where this is going, and creating an API user as part of the initial configuration definitely makes sense. However, once the two of you agree that the new approach is ready to be used, we need to update the caveats documentation to reflect the new reality. It would also be nice to add information about the minimum FortiOS release that's supposed to work, and if you decide that FortiOS 7.0 is too ancient (I have no problem with that), that has to be documented.

- refactored changes to initial template to align with a standard
  multi-vdom FortiOS configuration
@a-v-popov a-v-popov force-pushed the fos-initial-update branch from 6d3228e to f78f89f Compare April 21, 2026 07:51
@a-v-popov
Collaborator Author

a-v-popov commented Apr 21, 2026

I decided to curb the previous optimization in the template in favor of alignment with the FortiOS saved-configuration structure and general readability. The commits should be squashed to cancel out the mess.

I am leaving the netlab_vdom_timer default at 0 to let everyone immediately benefit from the change.
Previous behavior can be effectively restored with the following configuration:

netlab_generate_api_token: false
netlab_vdom_timer: 60
ansible_httpapi_use_ssl: false
ansible_httpapi_port: 80

Let me know if this is not to your liking.

I tried to incorporate all the feedback, but please let me know if I missed anything, and/or feel free to amend anything you want.

@a-v-popov a-v-popov marked this pull request as ready for review April 21, 2026 08:05
@a-v-popov
Collaborator Author

a-v-popov commented Apr 21, 2026

@ipspace I was able to start a topology with your favorite 7.0(.19) with the settings mentioned above and no VDOM, as it seems multi-vdom is not allowed without a proper license on that release.


Comment on lines +16 to +21
responses:
"password": "{{ ansible_ssh_pass }}"
"#":
- config global
- execute api-user generate-key netlab
- exit
Collaborator


      responses:
        "password": "{{ ansible_ssh_pass }}"
        "#":
        - config system api-user
        - edit "netlab"
        - set accprofile "super_admin"
        - next
        - end
        - config global
        - execute api-user generate-key netlab
        - exit

I've just tested with this, and I've got the famous: it works for me!
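For context, a complete task wrapped around these responses might look roughly like this (the `ssh` command line, prompt patterns, and `no_log` handling are assumptions, not the actual PR code):

```yaml
- name: Create api-user and generate the netlab API token over SSH
  ansible.builtin.expect:
    command: ssh -o StrictHostKeyChecking=no admin@{{ ansible_host }}
    responses:
      "password": "{{ ansible_ssh_pass }}"
      "#":
      - config system api-user
      - edit "netlab"
      - set accprofile "super_admin"
      - next
      - end
      - config global
      - execute api-user generate-key netlab
      - exit
  register: api_key_output
  no_log: true    # the output contains the freshly generated token
```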

Collaborator Author


Logically, I do not see much difference between the admin user and the netlab api-user. If one is set in a Vagrant box, the other could be configured at the same time. Screen scraping works, but it has its drawbacks. On the other end of the spectrum, one could prebuild the whole config including both users and feed it via a cloud image along with a license. Or use admin-scp. But in any case the API token is sort of a problem, because it is expendable.

In my setup API keys are generated by a licensing script, so I intend to set netlab_generate_api_token: false for my workloads anyway.

- add api-user via SSH
- reuse netlab_check_*
@a-v-popov
Collaborator Author

I was directed to Jinja's Native Python Types when my calculations failed.
https://jinja.palletsprojects.com/en/stable/nativetypes/
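For anyone hitting the same issue: the default Jinja environment renders every expression to a string, while the native-types environment preserves Python types, which is what makes numeric calculations in templates behave. A minimal illustration (plain jinja2, outside Ansible):

```python
from jinja2 import Environment
from jinja2.nativetypes import NativeEnvironment

template = "{{ x + 1 }}"

# Default environment: the result comes back as a string
as_str = Environment().from_string(template).render(x=1)
# Native environment: the result keeps its Python type
as_int = NativeEnvironment().from_string(template).render(x=1)

print(repr(as_str), repr(as_int))  # → '2' 2
```

In Ansible, the equivalent switch is the `jinja2_native` configuration setting.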

@sdargoeuves
Collaborator

@a-v-popov this is looking great, thank you so much!

I've run a few more tests, including one using 7.0.3, and it passes as long as I don't use the multi-vdom option and we use port 80 with TLS disabled:

    # netlab_vdom: "netlab"
    ansible_httpapi_use_ssl: false
    ansible_httpapi_port: 80

The other two variables don't seem to be required with this specific version of FortiOS.

Other tests involved the 7.4.8 version, and now my existing boxes work perfectly. I don't even need to add the vdom_timer variable, even when using a multi-vdom setup.

I think this is a great improvement, thank you again!

@ipspace
Owner

ipspace commented Apr 22, 2026

It's so nice to see things work, but I need two things to get this merged (seeing that we're in the "ready for review" stage):

@sdargoeuves
Collaborator

  • Updated documentation (at least the new variable(s) should be described) -- I can do that if needed, but would need at least the minimum software release we're OK with.

I'm not sure about the minimum release we are ok with. I have 7.0.3 working (which is the only version I have ≤ 7.2), as well as 7.4 and 7.6. I do not have a working 7.2, nor anything before 7.0, but as @a-v-popov mentioned, it works fine with 7.0.19.

Pretty much! I'm OK merging this as long as we agree on the defaults for the new variables, which currently would introduce a breaking change for those older versions (< 7.2, unless it's ≤ 7.2).

If we use the values below as the defaults, we might avoid a breaking change based on the versions I have tested:

ansible_httpapi_use_ssl: false
ansible_httpapi_port: 80

How do you see this part @ipspace @a-v-popov ?

@ipspace
Owner

ipspace commented Apr 23, 2026

I'm not sure about the minimum release we are ok with. I have 7.0.3 working

7.0 is worth mentioning only because it still uses the old licensing model. From that perspective, we could retain the current "7.0" versus "7.4+" split. Obviously, based on one of the @a-v-popov comments, I'd drop 6.x.

Pretty much! I'm OK merging this as long as we agree on the defaults for the new variables, which currently would introduce a breaking change for those older versions (< 7.2, unless it's ≤ 7.2).

I'm perfectly fine with having the default settings that work well with 7.4+ and a caveat saying "you need to adjust these for 7.0", documenting the whole thing as a breaking change.

If we use the values below as the defaults, we might avoid a breaking change based on the versions I have tested:

The real question is thus "do we want to enforce HTTPS?" Is there an advantage to doing that? If not, I'd stick with HTTP and document how to switch over to HTTPS in the caveats.

What would you prefer @a-v-popov ?

@a-v-popov
Collaborator Author

a-v-popov commented Apr 23, 2026

The real question is thus "do we want to enforce HTTPS?"

The reason I chose HTTPS is that on 7.6 I saw HTTP simply responding with a redirect to HTTPS.
I don't know whether the rate limiter counts this simple path or not; I have no reason to use HTTP.
With 7.0, HTTPS is explicitly listed as an evaluation license restriction. The error is as follows:

...
ansible.module_utils.connection.ConnectionError: Could not connect to https://192.168.121.106:443/api/v2/monitor/system/status: [Errno 104] Connection reset by peer

Personally, I don't care much, as I need to have ~/.netlab.yml anyway for other reasons. Up to you.

minimum software release we're OK with

No idea; API keys should work across the board, including 6.4/7.0.
In fact, this is the only thing that is officially documented by Fortinet, AFAIK.
The fortinet.fortios collection version might be more important; I am using the latest, which is 2.5.1 at the moment.

at least the new variable(s) should be described

  • netlab_generate_api_token: false - this is for me, because the API token is generated when I license a VM anyway
  • netlab_vdom_timer: 60 - this is to keep people mad about backward compatibility off my back
  • ansible_httpapi_xxx - standard settings that come into play because I switched to HTTPS; irrelevant if the defaults are set as proposed by @sdargoeuves

@ipspace
Owner

ipspace commented Apr 23, 2026

The reason I chose HTTPS is that on 7.6 I saw HTTP simply responding with a redirect to HTTPS. I don't know whether the rate limiter counts this simple path or not; I have no reason to use HTTP.

I'm OK with HTTPS as the default.

With 7.0 HTTPS is explicitly listed as an evaluation license restriction.

Added a caveat and a breaking change description.

@ipspace ipspace merged commit b965f90 into ipspace:dev Apr 23, 2026
5 checks passed
