Convopage : @marcoarment : Anyone else on Cloudflare or Linode seeing increased latency and inexplicable timeouts since around May 1? They don’t map to significant changes in server load. Requests just occasionally time out for no apparent reason. Retry and they load normally. (info from Pingdom)

Anyone else on Cloudflare or Linode seeing increased latency and inexplicable timeouts since around May 1? They don’t map to significant changes in server load. Requests just occasionally time out for no apparent reason. Retry and they load normally. (info from Pingdom)

32 replies and sub-replies as of Jan 21 2021

Marco Arment@marcoarment

I isolated the problem to 3 (of my 16) Linode webservers consistently performing far below normal speed. Not sure why, but it was fixed immediately by rebooting them, and I fixed it (hopefully) for good by moving them to different physical hosts by resizing to a different plan.

Joel D 🕯🦴@joeld

Is there a way you can tell if they have moved to new physical hosts? Or is it just a law of the jungle that resizing a VPS will trigger a move?

Thomas Mertz@tmertz

You can see the name of the physical host of a given node in their control panel.

Greg Marra@themarranator

Rebooting to fix is only supposed to work on windows!

Greg Benedict@gbenedict

I’ve seen odd issues after maintenance. Reboots fixed it.

Nate Vack@njvack

Are you running standard or dedicated instances? If it's standard, it could be another tenant on the host.

Marco Arment@marcoarment

These webservers are all dedicated.

Nate Vack@njvack

Welp, I'm completely out of super obvious ideas you've already thought of then :)

ViridisFructus@ViridisFructus

I had to remove one site from Cloudflare, kept having latency and connectivity issues, solved once kicked off.

CJ@JonesTheSteam72

I'll expect to hear more about this in their next sponsor slot

Kuba Baran@smartkidpl

😉

Aidan Fitzpatrick@afit

We used to have terrible problems with CPU steal on Linode. Every so often some nutter would end up on the same physical host as us and start doing something terrible to it. Each time we’d have to migrate hosts to avoid it. From memory, Linode’s own dash doesn’t show steal.

Aidan Fitzpatrick@afit

We had dozens and dozens of boxen. After doing a load of migrations we only found two proper resolutions: scale up to maximum-size nodes (in order to fully tenant the host), or migrate each app to GCP once its boxes got hit.
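
For anyone wanting to check steal directly on a box: on Linux the aggregate `cpu` line in `/proc/stat` carries a steal counter, and sampling it twice gives a rough percentage. The snippet below is only a sketch under that assumption; the helper names (`parse_cpu_line`, `steal_percent`, `sample_steal`) are made up for illustration and are not anything Linode provides.

```python
import time

# Counters on the aggregate "cpu" line of /proc/stat, in order (Linux).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels may omit trailing fields; treat missing ones as 0.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Share of elapsed CPU ticks between two samples that were stolen."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    return 100.0 * deltas["steal"] / total if total > 0 else 0.0


def sample_steal(interval=5.0, path="/proc/stat"):
    """Hypothetical helper: read /proc/stat twice, `interval` seconds apart."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return steal_percent(first, second)
```

Calling `sample_steal()` on a guest returns the steal percentage over the sampling window. A value that stays high while the guest's own load is flat would be consistent with a noisy neighbour, though it does not prove one.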

Tim@timbo_baggins

RPKI related? Cloudflare have been active with this recently in connection with ISPs that don’t utilise it.

Guilherme Rambo@_inside

Nothing for me on CloudFlare (I don’t use Linode tho)

Ben Cunningham@codeblue87

I use both and started seeing a slowdown this weekend. No timeouts tho and latency seems to have recovered

Weston Houghton@unsane1

Can you identify if it is origin latency or edge latency? Assuming edge, can you narrow it down to region?

Marco Arment@marcoarment

Pingdom’s reported timed-out edge regions are all over the world — it seems effectively random. (My origins are all Linode Dallas, so no diversity there.) I don’t think I pay for the right Cloudflare products to get detailed performance breakdowns from edge to origin.

Rustam X. Lalkaka@lalkaka

we’d love to get to the bottom of the errors you’re seeing and better understand what features we can build to help you troubleshoot issues like this on your own. can you email me with details on the site in question? rustam@cloudfare.com

Weston Houghton@unsane1

I do see a Pingdom incident, but it seems questionable: status.pingdom.com. Any chance you have Pingdom pointed at both origin and edge and can see any correlation of events there? We had some issues on our PPV last night (not yet identified), but that was limited to Fastly.

Adrian Mester@AdrianMester

can you add a pingdom check for the same endpoint but bypassing cloudflare?

David R. Greenberg@CheckwDavid

I think it started when @caseyliss signed up for a pair of Linode servers to check if he’d remembered to lower the toilet seat.

ivanCantarino@ivancantarino

Yesterday I saw @Iconfactory mention some Linode dropouts.

Thomas Mertz@tmertz

I see none of the above from @linode in their Frankfurt location.

Matthew Gregg@braintube

We use Cloudflare, Pingdom and Linode. Not seeing this latency increase at all. We did notice a change at Linode around April 23rd that affected IPv6 resolution.

Parker Wightman@parkerwightman

I downloaded Overcast on my iPad this morning and was getting a very long spinner when the app first opened, followed by this error. Maybe related? ¯\_(ツ)_/¯

Linode@linode

Hey Marco – no known issues with our network and Cloudflare (no other customers seem to be reporting issues recently). If we can be of any help, just let us know!

🧩 Alex@codatory

Do you have any app-server side metrics? Anything fun in server logs? Could be file descriptor limits or ephemeral port exhaustion if tracerts to Cloudflare look good. Could also just be Cloudflare dropping your pingdom bot traffic if you don't see it in person.
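
A rough way to look at the two limits mentioned above on a Linux host, as a sketch only: the descriptor check uses Python's `resource.getrlimit` and `/proc/self/fd`, and the port range comes from `/proc/sys/net/ipv4/ip_local_port_range`. The function names are hypothetical, and the descriptor check covers only the process running the script, not the web server itself.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit) for the current process (Linux only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft


def port_range_size(text):
    """Number of ephemeral ports given the 'low high' text from the kernel."""
    low, high = (int(v) for v in text.split())
    return high - low + 1


def read_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the kernel's ephemeral port range setting as raw text."""
    with open(path) as f:
        return f.read()
```

To inspect a running server rather than this script, the same information should be under `/proc/<pid>/fd` and `/proc/<pid>/limits` for that process. Comparing `port_range_size(read_port_range())` with the number of outbound connections in use gives a sense of how close the box is to running out of ephemeral ports.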

🧩 Alex@codatory

I haven't seen this problem with anything I have behind Cloudflare, but I don't have any Linode stuff behind CF. And my Linode stuff hasn't really had problems.

Raphael Sebbe@rsebbe

We had exactly that problem (unexplained one-shot timeouts) on Digital Ocean / Frankfurt center too, unsuccessfully reinstalled on multiple servers there, and finally had to move to another region. Timing coincidence sure looks odd. Interested in what you can find out.

mark dorison@markdorison

I have seen this and it has been driving me crazy trying to track down the issue. Giving a reboot a try.

Red F@frf37

What did linode support have to say about that? They usually are very responsive.