
Anyone else on Cloudflare or Linode seeing increased latency and inexplicable timeouts since around May 1? They don’t map to significant changes in server load. Requests just occasionally time out for no apparent reason. Retry and they load normally. (info from Pingdom)
32 replies and sub-replies as of Jan 21 2021

I isolated the problem to 3 (of my 16) Linode webservers consistently performing far below normal speed. Not sure why, but it was fixed immediately by rebooting them, and I fixed it (hopefully) for good by moving them to different physical hosts by resizing to a different plan.
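One way to spot a few slow machines in a fleet like this is to compare each server's median response time with the fleet-wide median. This is a minimal sketch, not the method used above: the server names, the sample data, and the `factor` threshold are all hypothetical.

```python
from statistics import median

def slow_servers(latencies_ms, factor=2.0):
    """Return the names of servers whose median latency is more than
    `factor` times the median of all per-server medians.

    latencies_ms maps a server name to a list of response times in ms.
    The threshold of 2.0 is an arbitrary assumption, not a recommendation.
    """
    per_server = {
        name: median(samples)
        for name, samples in latencies_ms.items()
        if samples  # ignore servers with no samples
    }
    fleet_median = median(per_server.values())
    return sorted(
        name for name, m in per_server.items() if m > factor * fleet_median
    )
```

Feeding it per-server timings from access logs or a load balancer would flag the machines worth rebooting or migrating first.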
Is there a way you can tell if they have moved to new physical hosts? Or is it just a law of the jungle that resizing a VPS will trigger a move?
You can see the name of the physical host of a given node in their control panel.
Rebooting to fix is only supposed to work on windows!
I’ve seen odd issues after maintenance. Reboots fixed it.
Are you running standard or dedicated instances? If it's standard, it could be another tenant on the host.
These webservers are all dedicated.
Welp, I'm completely out of super obvious ideas you've already thought of then :)
I had to remove one site from Cloudflare, kept having latency and connectivity issues, solved once kicked off.
I'll expect to hear more about this in their next sponsor slot
We used to have terrible problems with CPU steal on Linode. Every so often some nutter would end up on the same physical host as us and start doing something terrible to it. Each time we’d have to migrate hosts to avoid it. From memory, Linode’s own dash doesn’t show steal.
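On a Linux guest, steal time can be read from inside the VM via the aggregate `cpu` line of `/proc/stat`, where the eighth counter is steal. A minimal sketch, assuming a Linux guest; the function names are illustrative:

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (steal, total)."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    steal = values[7] if len(values) > 7 else 0
    # guest and guest_nice are already included in user and nice, so skip them.
    total = sum(values[:8])
    return steal, total

def steal_percent(before, after):
    """Percentage of CPU time stolen between two samples of the 'cpu' line."""
    s0, t0 = parse_cpu_line(before)
    s1, t1 = parse_cpu_line(after)
    elapsed = t1 - t0
    return 100.0 * (s1 - s0) / elapsed if elapsed > 0 else 0.0

def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()
```

Calling `read_cpu_line()` twice a second or so apart and passing both lines to `steal_percent` gives a rough reading. Tools such as `top` report the same counter as `st`.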
We had dozens and dozens of boxen. After doing a load of migrations we only found two proper resolutions: scale up to maximum-size nodes (in order to fully tenant the host), or migrate each app to GCP once its boxes got hit.
RPKI related? Cloudflare have been active with this recently in connection with ISPs that don’t utilise it.
Nothing for me on Cloudflare (I don’t use Linode tho)
I use both and started seeing a slowdown this weekend. No timeouts tho and latency seems to have recovered
Can you identify if it is origin latency or edge latency? Assuming edge, can you narrow it down to region?
Pingdom’s reported timed-out edge regions are all over the world — it seems effectively random. (My origins are all Linode Dallas, so no diversity there.) I don’t think I pay for the right Cloudflare products to get detailed performance breakdowns from edge to origin.
we’d love to get to the bottom of the errors you’re seeing and better understand what features we can build to help you troubleshoot issues like this on your own. can you email me with details on the site in question? rustam@cloudfare.com
I do see a pingdom incident, but it seems questionable: status.pingdom.com Any chance you have pingdom pointed at both origin and edge and can see any correlation of events there? We had some issues on our PPV last night (not yet identified), but that was limited to Fastly.
can you add a pingdom check for the same endpoint but bypassing cloudflare?
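The idea of probing the same endpoint both through the edge and directly at the origin can be sketched as below. The two URLs are placeholders, the verdict labels are illustrative, and a direct-to-origin hostname is an assumption: it only exists if the origin is reachable without going through Cloudflare.

```python
import time
import urllib.request

def default_fetch(url, timeout):
    """Fetch a URL and read the body; raises on timeout or HTTP error."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()

def probe(url, timeout=10.0, fetch=default_fetch, clock=time.monotonic):
    """Return (ok, seconds) for one request; any exception counts as a failure."""
    start = clock()
    try:
        fetch(url, timeout)
        ok = True
    except Exception:
        ok = False
    return ok, clock() - start

def compare(edge_url, origin_url, **kwargs):
    """Probe both paths once and give a rough hint about where a failure sits."""
    edge_ok, edge_s = probe(edge_url, **kwargs)
    origin_ok, origin_s = probe(origin_url, **kwargs)
    if not origin_ok:
        verdict = "origin"      # the origin fails even without the edge in front
    elif not edge_ok:
        verdict = "edge path"   # origin answers directly, so suspect edge or edge-to-origin
    else:
        verdict = "ok"
    return {"verdict": verdict, "edge_s": edge_s, "origin_s": origin_s}
```

A single pair of probes proves little for intermittent timeouts; running `compare` on a schedule and counting verdicts over a day would be more telling.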
I think it started when @caseyliss signed up for a pair of Linode servers to check if he’d remembered to lower the toilet seat.
I read @Iconfactory mentioning some Linode dropdowns yesterday.
I see none of the above from @linode in their Frankfurt location.
We use Cloudflare, Pingdom and Linode. Not seeing this latency increase at all. We did notice a change at Linode around April 23rd that affected IPv6 resolution.
I downloaded Overcast on my iPad this morning and was getting a very long spinner when the app first opened, followed by this error. Maybe related? ¯\_(ツ)_/¯
Hey Marco – no known issues with our network and Cloudflare (no other customers seem to be reporting issues recently). If we can be of any help, just let us know!
Do you have any app-server side metrics? Anything fun in server logs? Could be file descriptor limits or ephemeral port exhaustion if tracerts to Cloudflare look good. Could also just be Cloudflare dropping your pingdom bot traffic if you don't see it in person.
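A rough check for the two resource limits mentioned here, sketched for a Linux app server. It assumes the usual `/proc` files (`/proc/sys/net/ipv4/ip_local_port_range` and `/proc/net/tcp`); the function names are illustrative. The port count is only an approximation, since a local port can be reused across different remote endpoints, and IPv6 sockets are listed separately in `/proc/net/tcp6`.

```python
import resource

def parse_port_range(text):
    """Parse the contents of /proc/sys/net/ipv4/ip_local_port_range."""
    low, high = (int(x) for x in text.split())
    return low, high

def ephemeral_ports_in_use(proc_net_tcp, low, high):
    """Count sockets in /proc/net/tcp text whose local port is in [low, high]."""
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # first line is a header
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like 0100007F:C000 (hex address, hex port)
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if low <= local_port <= high:
            count += 1
    return count

def fd_soft_limit():
    """Soft limit on open file descriptors for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

If the count approaches `high - low + 1`, or a process's open descriptors approach `fd_soft_limit()`, new outbound connections can fail intermittently, which would look like the random timeouts described in the thread.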
I haven't seen this problem with anything I have behind Cloudflare, but I don't have any Linode stuff behind CF. And my Linode stuff hasn't really had problems.
We had exactly that problem (unexplained one-shot timeouts) on Digital Ocean / Frankfurt center too, unsuccessfully reinstalled on multiple servers there, and finally had to move to another region. Timing coincidence sure looks odd. Interested in what you can find out.
I have seen this and it has been driving me crazy trying to track down the issue. Giving a reboot a try.
What did linode support have to say about that? They usually are very responsive.