Convopage : @marcoarment : Apple’s platforms offer a few quality modes for variable-speed audio playback. Safari on macOS — not other browsers, and not even iOS Safari — continues to use the worst/fastest one. Depending on the API in use, they could most likely fix this with a single line of code. (url)

Tim Baker 🚀@IAmTimBaker

+@OvercastFM +@marcoarment Why is it that playing podcasts at 2x speed sounds amazing on Overcast on iOS/iPadOS, and even on macOS via M1 support for iOS apps, but very robotic when listening through Safari macOS? Is there a different audio API you use in the app?

34 replies and sub-replies as of Dec 18 2020

CM@chimp_magnet

Seems like Apple’s own Podcast app uses the worst one as well.

Dan Studnicky@danstudnicky

Even Podcasts.app on Big Sur sounds like warbly garbage at anything other than 1x. Podcasts on iOS sounded great at other speeds around 13.2 or so.

Dan Studnicky@danstudnicky

Er, 12.2 not 13.2.

David Klein@diklein

Ha I’ve been wondering about this but assumed the answer

Tim Baker 🚀@IAmTimBaker

This is super interesting, and shocking that Apple has not addressed this issue on Safari.

Patrick@PatrickCahiII

I had absolutely no idea that you could even do this on the website. Had never even considered it. This is amazing.

Ville Turpeinen@vinski_

Could it be that they chose the fastest because they’re too obsessed with battery life 🤔

Jer☢️Noble@jernoble

As the person who wrote the line of code in question, I have no idea what you’re talking about. trac.webkit.org/changeset/1810…

Jer☢️Noble@jernoble

What algorithm specifically would you have Safari switch to?

Marco Arment@marcoarment

Try AVAudioTimePitchAlgorithmTimeDomain.

Jer☢️Noble@jernoble

You’ll note in the WebKit commit I posted that this is the algorithm we moved away from in favor of Spectral, in this case because of user complaints about YouTube playback. WSOLA algorithms like TimeDomain work well for speech but fall apart when there are any harmonics in the audio.

Jer☢️Noble@jernoble

My fever dream would be to let web authors “hint” their content as “speech” vs “music” and pick the right algorithm in response.

Marco Arment@marcoarment

What does iOS Safari use? On macOS, what does Chrome use? Both sound significantly better than Safari, last time I checked.

Jer☢️Noble@jernoble

iOS Safari and macOS Safari use the exact same code in WebKit to pick the pitch algorithm. Chrome, as far as I know, uses their own WSOLA algorithm; not any provided by the OS. It’s possible that iOS and macOS have different implementations of Spectral, but I doubt it.

Chris Liscio@liscio

This is plausible, depending on when the AVPlayer class was written, and when kAudioUnitSubType_NewTimePitch showed up on both platforms. kAudioUnitSubType_TimePitch was macOS-only, and iOS had no equivalent counterpart for quite a while.

Marco Arment@marcoarment

Interesting — I just tested it and found that there is indeed no obvious difference between macOS Safari and iOS Safari on sped-up speech. May be recent — a few releases ago, that wasn't the case. FWIW, Chrome/Brave still sound FAR better than Safari (and always have on Macs).

Chris Liscio@liscio

Time-domain methods (PSOLA/WSOLA) are cheap/easy to implement compared to the frequency-domain methods, and they do very well with speech (and other simple) audio content, but they completely fail for most musical and other complex content.
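[Editor's sketch] To make the "cheap/easy" point concrete, here is a minimal WSOLA-style time stretch in plain Python. It is a simplified illustration, not the code used by WebKit, Chrome, or AVFoundation; the function name `wsola_stretch` and the `frame` and `tol` parameters are assumptions made up for this example. Each output frame is taken from near its nominal input position, nudged by up to `tol` samples to wherever the waveform best matches the natural continuation of the previous frame, then cross-faded in.

```python
import math

def wsola_stretch(x, speed, frame=256, tol=32):
    """Change the duration of x by a factor of 1/speed while keeping its pitch.

    Simplified WSOLA-style overlap-add (illustrative only); frame must be even.
    """
    if len(x) < frame:
        return list(x)                    # too short to process; returned unchanged
    hs = frame // 2                       # synthesis hop: 50% overlap in the output
    ha = max(1, int(round(hs * speed)))   # analysis hop: step through the input
    # Periodic Hann window: copies spaced frame/2 apart sum to exactly 1.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_frames = (len(x) - frame) // ha + 1
    # Zero padding keeps every index touched by the search below in range.
    padded = list(x) + [0.0] * (frame + tol + hs)
    out = [0.0] * ((n_frames - 1) * hs + frame)
    prev = 0                              # input position of the previous frame
    for k in range(n_frames):
        nominal = k * ha
        pos = nominal
        if k > 0:
            # What the previous frame would have played next if left alone.
            target = prev + hs
            best = -math.inf
            # Pick the nearby input offset whose waveform best matches it.
            for cand in range(max(0, nominal - tol), nominal + tol + 1):
                score = sum(padded[cand + i] * padded[target + i]
                            for i in range(frame))
                if score > best:
                    best, pos = score, cand
        # Window the chosen frame and overlap-add it into the output.
        for i in range(frame):
            out[k * hs + i] += padded[pos + i] * win[i]
        prev = pos
    return out
```

Calling `wsola_stretch(samples, 2.0)` returns roughly half as many samples. The similarity search tracks a single dominant periodicity, which fits the thread's observation that this family of methods does well on speech and struggles on dense musical material.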

Chris Liscio@liscio

The ideal solution would be to automatically detect the content and switch accordingly. A decent alternative would be to provide the user with options, but that's some _deep_ knob-twiddling for a web browser. This is why we make purpose-built native apps, Marco! 😀

Jer☢️Noble@jernoble

If you ran an FFT over the decoded audio and looked for multiple, simultaneous frequency spikes, you may be able to detect “speech” vs. “non-speech”. But heuristics like this are easy to get wrong, and importantly, aren’t free.
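[Editor's sketch] One possible reading of that heuristic, in plain Python: window a frame of decoded samples, take its FFT, and count distinct spectral peaks. The function names and both thresholds (`rel_threshold`, `min_peaks`) are invented for illustration and are not from WebKit. Voiced speech also contains harmonics, so a real classifier would need far more care than this, which is the "easy to get wrong" part.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def count_spectral_peaks(frame, rel_threshold=0.25):
    """Count local maxima in the magnitude spectrum that reach
    rel_threshold times the strongest bin (threshold is an assumption)."""
    n = len(frame)
    # Hann window to limit spectral leakage between neighbouring bins.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    mags = [abs(c) for c in fft(windowed)[: n // 2]]
    peak = max(mags)
    if peak == 0.0:
        return 0                      # silence: nothing to count
    return sum(
        1
        for k in range(1, len(mags) - 1)
        if mags[k] >= rel_threshold * peak
        and mags[k] > mags[k - 1]
        and mags[k] > mags[k + 1]
    )

def looks_harmonic(frame, min_peaks=3):
    """Very rough guess: several simultaneous spikes suggests non-speech."""
    return count_spectral_peaks(frame) >= min_peaks
```

Even this toy version costs a full FFT per analysed frame, which illustrates the "aren't free" caveat.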

Jer☢️Noble@jernoble

More “free” would be something like allowing the page to declare their content was “speech” vs. “music”. This would solve Marco/Overcast’s use case, but wouldn’t help sites like YouTube with user generated content.
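[Editor's sketch] The page-declared hint could reduce to a lookup like the one below. No such web API exists in the thread; the `content_hint` parameter and the function name are hypothetical. The two strings are the AVFoundation algorithm names mentioned in the conversation, and the fallback mirrors the Spectral choice described above for unhinted content.

```python
# Names of the two AVFoundation pitch algorithms discussed in the thread.
TIME_DOMAIN = "AVAudioTimePitchAlgorithmTimeDomain"
SPECTRAL = "AVAudioTimePitchAlgorithmSpectral"

def pick_pitch_algorithm(content_hint=None):
    """Map a hypothetical page-supplied hint to a pitch algorithm name.

    "speech" selects the time-domain algorithm; anything else, including a
    missing or unknown hint, falls back to the spectral one.
    """
    hint = (content_hint or "").strip().lower()
    if hint == "speech":
        return TIME_DOMAIN
    return SPECTRAL
```

A podcast site would pass `"speech"`, while a site with user-generated content would pass nothing and get the fallback, which is exactly the gap noted for YouTube.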

Jer☢️Noble@jernoble

And making the user bear the cognitive cost of figuring out what pitch-correction algorithm to use isn’t great either. We don’t have any obviously great options here.

Marco Arment@marcoarment

Maybe you’ve chosen the wrong default. Might speech be the more common case in non-1x variable-speed playback in the context of a web browser? (I’d assume so, but you probably know better.) And what’s worse when guessing wrong: speech under freq domain, or music under *SOLA?

Jer☢️Noble@jernoble

That’s certainly possible! When we made this decision at first, YouTube was the primary user of non-default-rate playback in the wild. A lot has changed since then. I’m wary of just flip-flopping what use cases get the short end of the stick, though.

Jer☢️Noble@jernoble

And now Netflix has added UI for non-1x playback. Anecdotally, I suspect between YouTube and Netflix, harmonic content is probably more commonly played at non-1x rates, but I have no data to back that up.

Marco Arment@marcoarment

I bet YouTube and video sites are indeed the most common use for browser audio playback at non-1x. But it doesn’t necessarily follow that music is the most common non-1x content, or that the *SOLA artifacts in music are less acceptable than the spectral artifacts in speech.

Jer☢️Noble@jernoble

As far as what content sounds worse, I don’t have a clear answer. From the samples I’ve been listening to, music and speech sound approximately equally bad, with speech sounding robotic and with music nearly dropping entire instruments.

othermaciej@othermaciej

Our speed-shifting could definitely be better. And it's something we're looking into. I think Jer's point is that it's not just an easy one-liner fix. We can choose a different set of tradeoffs for speech vs music, or else drive improvements to the set of available algorithms.

Anthony Ricaud@anthony_ricaud

What’s the use case for speeding up music? (genuinely intrigued why people would do that)

othermaciej@othermaciej

I wonder that myself, but apparently there are people who want to do it.

Chris Liscio@liscio

Just to add context, my app Capo offers up to 1.5x playback for music practice, and it’s useful for pushing yourself to play a passage/song faster than the 1x speed so that 1x becomes that much more comfortable to play.

Dale Curtis@DaleCurtis

FWIW, we still get complaints about Chrome's algorithm (particularly at sub-1x rates), so +1 to it being a complicated series of tradeoffs.

Alejandro Martinez@alexito4

I reported this many years ago, with a recording of a video playing in both Chrome and Safari showing how Safari sounded worse. The reply? “Can’t reproduce” 🤦🏻‍♂️ This is literally the only reason I have Chrome installed. It’s my media playback browser.

Jason@Jasonlehmania

I’ve had 3 video chat platforms (Zoom, Google Meet, and an interviewing thing) not work on Safari because of the robotic voice thing. Pretty embarrassing for Apple when the other people go “oh are you using Safari, yeah switch to Chrome it will fix that” 😂

Brad Vrolijk@buffaloseven

It’s a nightmare. Their Catalyst-based Podcasts app for macOS also uses the same lousy method. It makes me think nobody at Apple uses variable-rate playback on the Mac.