FYI: I gave up on ZeroTier and other ways of dealing with CGNAT and obtained a static IP from my ISP (they gave me a /31). Since then, this DNSMasq issue has become much less frequent. I don't think it is completely gone, but I have had so many other issues with this change that I can't say for sure.
I have been following this subject for quite a while. My R7000 has experienced the "maybe died, we need to re-exec it" issue for quite a few releases now. I don't know which release it started on. Current firmware is r55460.
My router is configured with a static IP address for my WAN. I use the DDNS, WireGuard tunnel, and MAC Priority QoS features of the router. I no longer use the router for DHCP or DNS. I moved those services to a Raspberry Pi 3 based DNSMasq server running Raspbian Lite and DNSMasq version 2.85.
I have five R7000s, and I have experienced the issue on them over several releases of DD-WRT. I switch routers each time I upgrade firmware so that I have the previous release ready to fall back on. I have tried defaulting the routers and reconfiguring them from scratch. I've changed power supplies. I've changed the NTP interval and DHCP lease times. None of these changes has prevented the DNSMasq re-exec from occurring.
I moved DHCP and DNS off the router to the Pi and reconfigured DD-WRT appropriately. I used the same list of reserved IP addresses. There have been no DNSMasq issues on my Pi. DNS and DHCP work great, and DNS resolution seems quicker than before (new sites load quicker). I have configured rsyslog to send dnsmasq and dnsmasq-dhcp messages to my syslog server so that I can monitor it.
What is interesting to note is that even with the DHCP and DNS functions offloaded from the R7000 router, the "maybe died, we need to re-exec it" issue still occurs. It does not occur nearly as often on the router as it did before. As far as I can tell, the only remaining function that uses DNSMasq on the router is NTP, but there may be others.
Now when the R7000 router's DNSMasq re-exec occurs, it does not interrupt my network activity. The same devices are on my network as before and they are assigned the same reserved IP addresses. To me this rules out the likelihood that some rogue IoT device on my network, or my DNSMasq configuration, is the cause of the DNSMasq re-exec issue.
I did not expect to see the "maybe died, we need to re-exec it" issue still occurring on the router after moving DHCP and DNS services to a different device.
Joined: 16 Nov 2015 Posts: 6447 Location: UK, London, just across the river..
Posted: Fri Mar 29, 2024 18:08 Post subject:
I guess my R7000 on 55460 is fine... I just looked at my syslog and I don't have any problems with DNSMasq.
I'm also using SmartDNS and a VPN client, as well as ipset dnsmasq host-blocking rules and an ad-blocker via script, plus some ipset (IP) firewalling. So far, after a little over 1 day of use, no issues with it.
Very likely it is a misconfiguration problem on your side.
- I assume you didn't swap saved configuration backups between different routers? (It shouldn't be a problem, but just in case...)
- Or you have a spamming device that floods dnsmasq or your DNS with requests; increase the limit with dns-forward-max=400
- Try adding the option dns-loop-detect to the additional dnsmasq options so that it will detect loops.
I have been rotating these same R7000 routers around for several years without issues with upgrades or unusual issues noted in the syslog. The DNSMasq re-exec issue started for me in the third quarter of last year.
Regarding the current router setup, I do not use Shortcut Forwarding Engine, STP, or SmartDNS.
In the Basic Setup > WAN section the connection is set to Static with the IP information as provided by my ISP. Static DNS is set to my network DNSMasq server.
My router has DHCP disabled: under Basic Setup > Network Setup, DHCP Type is set to DHCP Server and DHCP Server is set to Disabled. All check boxes in that section are unchecked.
Basic Setup > Network Setup > Router IP > Local DNS is set to my network DHCP server. I saved the configuration and then applied the settings.
Under Services > Services Management > Dnsmasq Infrastructure, I disabled all options except for Enable DNSMasq, saved the configuration, then disabled DNSMasq, saved the configuration again, and applied the settings. After waiting 120 seconds I rebooted the router.
Again, DNSMasq is running fine on my Pi server without any "maybe died, we need to re-exec it" issues, or any other issues logged.
Because my router still has an occasional DNSmasq re-exec issue logged, I would like to make the DNS option changes that you suggested. Since my router is configured to not use DNSMasq there is no box for me to add these DNSMasq options in.
Where do you suggest that these DNSMasq options be entered? Is there a .config file that I can add them to? Or Administration > Commands > Saved to Startup?
Well, I am glad to see I'm not the only one having this issue. I just switched from a WZR-600DHP using r50841 to a WZR-1750DHP using r55460. All settings were entered by hand after a factory reset. I'm also being plagued by "[dnsmasq] : maybe died, we need to re-exec it", and when it does die, the timing seems random. It happens often enough to be very annoying. I also have SmartDNS disabled. I never had the issue when using the WZR-600DHP with r50841. Hopefully this can be figured out.
OK, so this DNSMasq re-exec issue had me bothered, so I went through all the suggestions, which included:
Set NTP Server as IP. (I pointed to my own internal NTP server)
Set NTP Update period to 86400
Set ignore WAN DNS (I already had this)
Disable SFE (Didn't apply to me, I use CTF.)
Disable Forced DNS redirection DoT (I don't use it yet)
I tried setting the DNS servers in Additional Options with "server=" and set the GUI fields to 0.0.0.0. I then went back to setting them in the GUI for both IPv4 and IPv6.
I set dns-forward-max=350 because I did see some errors while it was at 150.
SmartDNS was already disabled and I don't intend to use it.
no-negcache was already set.
None of the above seemed to help. DNSMasq kept dying. So I decided to look at /tmp/dnsmasq.conf directly. I copied it to my machine, dumped it into Excel, and sorted the options alphabetically so I could eyeball duplicate entries. I found that I had three options in Additional Options that were duplicated in /tmp/dnsmasq.conf:
bogus-priv
domain=mydomain.lan
interface=br0
So I removed those three entries from Additional Options and rebooted, and those entries are no longer duplicated in /tmp/dnsmasq.conf. So far, I have been up for 1 day 4 hours with no more re-exec. It's weird because I had those options in Additional Options on my older WZR-600DHP and they didn't pose an issue. I just copied and pasted the Additional Options from one router to the other. Maybe this version of DNSMasq is more sensitive to dupes?
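For anyone who wants to repeat the duplicate check without a spreadsheet, here is a minimal Python sketch. It assumes you have copied /tmp/dnsmasq.conf from the router to your own machine; the function name `find_duplicate_options` and the sample contents are made up for illustration. It only flags lines that are repeated exactly, so options that legitimately appear several times with different values are not reported.

```python
from collections import defaultdict


def find_duplicate_options(conf_text):
    """Return {option_line: [line numbers]} for lines that appear more than once.

    Blank lines and comment lines (starting with '#') are ignored.
    Only exact repeats are reported.
    """
    seen = defaultdict(list)
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        seen[line].append(lineno)
    return {opt: nums for opt, nums in seen.items() if len(nums) > 1}


# Hypothetical file contents, loosely modelled on the options named above.
SAMPLE_CONF = """\
interface=br0
domain=mydomain.lan
bogus-priv
dns-forward-max=350
no-negcache
bogus-priv
domain=mydomain.lan
interface=br0
"""

if __name__ == "__main__":
    for option, line_numbers in sorted(find_duplicate_options(SAMPLE_CONF).items()):
        print(f"{option} appears on lines {line_numbers}")
```

To run it against a real copy of the file, read the file contents with `open(path).read()` and pass the text to the function instead of `SAMPLE_CONF`.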
Joined: 08 May 2018 Posts: 14249 Location: Texas, USA
Posted: Wed Apr 17, 2024 3:51 Post subject:
There have been many changes where once-required additional options are now default configuration options. Perhaps the person who shall remain nameless who claims that "it just happens because it's ..." should check the configuration on the one device that is giving the log entry? Anyhow, this is probably why I don't have the issue on any devices in inventory: I have removed all redundant config entries and adapted my configuration to the changes(?).
Don't get me wrong, I think it's a bug too, it's just not a DD-WRT bug. DNSMasq just dies without much of a clue, and even enabling debug doesn't help much. Looking at /tmp/dnsmasq.conf directly to go through all the settings really helped the most. So far 2 days, 5 hours and still no re-exec. If I had to guess as to which of the three options I removed caused the re-exec, I would wager it's interface=br0 being listed twice. If I have some time this weekend, I'll add those lines back in one at a time to see which one triggers a re-exec. Maybe someone can open a ticket with thekelleys.org.uk if they know how.
I am now at 04/16/2024 - r55799 and the issue still exists with a Netgear R6400v2. There are moments when dnsmasq restarts as many as 3 times within an hour and then not even once for a day.
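To put numbers on how irregular the restarts are, a small Python sketch like the one below can count re-exec messages per hour in a saved syslog. The message text is the one quoted in this thread; everything else is an assumption: that each line starts with a BSD-style syslog timestamp such as "Apr 16 14:03:22", the function name `count_reexecs_per_hour`, and the sample lines, which are made up and not real router output.

```python
import re
from collections import Counter

# Message text as quoted in this thread.
REEXEC_MARKER = "maybe died, we need to re-exec it"

# Assumption: lines start with a BSD-style syslog timestamp, e.g. "Apr 16 14:03:22".
TIMESTAMP = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}"
)


def count_reexecs_per_hour(lines):
    """Count dnsmasq re-exec messages, grouped by 'Mon DD HH:00'."""
    counts = Counter()
    for line in lines:
        if "dnsmasq" not in line or REEXEC_MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # Timestamp not in the assumed format; skip the line.
        key = f"{match['month']} {int(match['day']):2d} {match['hour']}:00"
        counts[key] += 1
    return counts


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    "Apr 16 14:03:22 router [dnsmasq] : maybe died, we need to re-exec it",
    "Apr 16 14:41:09 router [dnsmasq] : maybe died, we need to re-exec it",
    "Apr 16 14:50:00 router some unrelated message",
    "Apr 16 15:12:45 router [dnsmasq] : maybe died, we need to re-exec it",
]

if __name__ == "__main__":
    for hour, count in sorted(count_reexecs_per_hour(SAMPLE_LOG).items()):
        print(f"{hour}  {count} re-exec(s)")
```

If your syslog uses a different timestamp layout, only the `TIMESTAMP` pattern should need adjusting.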
Joined: 08 May 2018 Posts: 14249 Location: Texas, USA
Posted: Thu Apr 18, 2024 21:55 Post subject:
@Megrez7: Have you read through the entire thread, tried everything, and ruled out any misconfiguration or duplicated entries in dnsmasq.conf and other related files on the current release (04-17-2024-r55819 as of this post)? I've run the gamut of configurations on dnsmasq alone and cannot brute-force the log entry into existence on any of my test or deployed devices.
@kernel-panic69 Yes, I tried and followed all the recommendations and sent logs as per the instructions, as I wrote earlier in this thread. Furthermore, I have reset the router twice in the last 3 months, configuring everything from scratch. As of yesterday I was running r55779; I updated to r55819 today.
I guess this is difficult to diagnose but I see I am not the only one, so there must be some problem.
I have a new idea. Over the last few months I lost my internet connection a few times, but my provider claimed all was fine on their side and that my router was simply not obtaining a public IP address for some reason. Restarting the router never helped. The problem disappeared on its own after a few hours. One day I found out that powering off the desktop switch helped immediately. This leads to the conclusion that a device inside the network (the switch or a device connected to it) may be causing the issue.
I am leaving for a week, so I will have a chance to disconnect devices from the network for that time. I will leave only the NAS connected directly to the router (static IP) and a tablet, so that there is at least one device connected to the router and requesting an IP from it. Maybe this will give some clue.
Joined: 08 May 2018 Posts: 14249 Location: Texas, USA
Posted: Fri Apr 19, 2024 17:11 Post subject:
Megrez7 wrote:
I have a new idea. Over the last few months I lost my internet connection a few times, but my provider claimed all was fine on their side and that my router was simply not obtaining a public IP address for some reason. Restarting the router never helped. The problem disappeared on its own after a few hours. One day I found out that powering off the desktop switch helped immediately. This leads to the conclusion that a device inside the network (the switch or a device connected to it) may be causing the issue.
Is the desktop switch possibly a managed switch (with its own DHCP server or even client; I have no idea how your devices are connected to the upstream modem)? That could cause an issue. It sounds like a conflict, and I think it is worth investigating further on your end before we blame dnsmasq or DD-WRT. I am still waiting for someone to corroborate whether or not they still have the issue or have checked their configuration, but I am not holding my breath. I have too many devices deployed in service in mixed environments with recent releases (55630) that do not display this issue.
I am not blaming anything or anyone! Since I see a few more people having the same issue, I am just trying to help solve it.
These are two simple unmanaged TP-Link TL-SG105 switches. Only one DHCP server within Netgear R6400. One subnet for wired and home wireless devices + second for guest wireless. I have attached network graph. I have never seen this behaviour as described earlier in this post.
Again, I followed all the recommendations in this thread and have already reset the router twice this year, configuring it manually from scratch.
I will observe through next week, when these two switches and all devices connected through them will be turned off. This could potentially provide some hints. I can imagine that a broken PC network adapter or switch might cause these conflicts.
Meanwhile, let me know if there are any other ideas or things to check or test.