
May 16 to 31 (#10 of 2024)

2024-06-07

This English version is an abridged translation of the original newsletter issue, focused on the sections most useful for English navigation across the site.

News

1. A subtitling and transcription tool at the University of Alicante

José María Fernández Gil is one of those people who quietly improve life for many others. He is a specialist in digital accessibility and an app developer, and has worked at the University of Alicante since 2009, first at the student support center and later as head of the Digital Accessibility Unit.

On May 21, after more than a year of development, he launched an internal subtitling and transcription tool that automatically generates subtitles for videos published on the university platform.

I tried an early beta and was surprised by how well it worked. In just a few minutes, after correcting a handful of transcription errors (usually proper names or references the model did not know), I had a fully subtitled video more than 20 minutes long. Without the tool, that would have taken me hours.

According to José María himself, the app uses Whisper, OpenAI’s open speech-recognition model, and in its testing phase it had already subtitled more than 2,000 videos and 1,500 hours of footage.

Now that it is in production and integrated into the university website, it will be an enormously valuable resource. It should make it much easier for all videos created by university staff to include subtitles and therefore be accessible to deaf or hard-of-hearing users. And not just to them. Accessibility options end up helping all of us sooner or later.
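For readers curious about the basic pipeline, here is a minimal sketch of how Whisper's output can be turned into a standard SRT subtitle file. It assumes the open-source `openai-whisper` Python package and ffmpeg are installed. The function names are hypothetical, and this is not the code of the University of Alicante tool.

```python
"""Minimal sketch: turn Whisper transcription segments into an SRT file.

Assumes the open-source `openai-whisper` package and ffmpeg are installed.
Function names are hypothetical; this is NOT the University of Alicante tool.
"""


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments that have 'start', 'end' and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


def transcribe_to_srt(video_path: str, srt_path: str, model_name: str = "small") -> None:
    """Transcribe a video with Whisper and write the subtitles to srt_path."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(video_path)  # ffmpeg decodes the audio track
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
```

A call such as `transcribe_to_srt("lecture.mp4", "lecture.srt")` would produce a first draft. As in my own test, the draft would still need a human pass to fix proper names and other references the model gets wrong.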

2. Microsoft, ARM, Copilot+ PCs, and Recall

On May 20, Microsoft presented its new Surface Pro with the Qualcomm Snapdragon X Elite ARM chip as part of its Copilot+ PCs event. The underlying strategy is obvious: compete more directly with Apple’s ARM laptops and tablets, which have already demonstrated excellent performance and efficiency.

The idea that seemed most important to Microsoft was not the hardware itself, but the broader concept of the “AI PC”, a machine that can understand what we are doing and help through a conversational interface. That vision is embodied in the new Copilot+ PC branding, which is basically a way of defining the minimum hardware requirements needed to support future AI features in Windows 11.

The most controversial of those features was Recall, which stores snapshots of your interaction with the computer so you can later search your own history using AI.

This feature immediately raised privacy concerns, for obvious reasons. Microsoft says the whole process runs locally and that it has no access to the recorded data. Even so, many people remain skeptical. Can the same guarantee be made for the PC manufacturer? Can users be sure that Xiaomi, Dell, or anyone else will never be able to access the stored history? At that point it was all still an announcement, not a product in wide release.

My personal reaction was mixed: it looks immensely useful, and I would love Apple to build something like it into macOS. On the Mac, Rewind already offers a comparable feature, but I do not really trust a startup with that much sensitive data.

3. Questions ahead of WWDC24

At that moment we were still a week away from Apple’s annual developer conference, WWDC24, so there were no answers yet, only questions.

My main questions before the event were these:

  • Will Apple present a local LLM that runs on the phone? If so, what will it be used for?
  • Will the rumored OpenAI deal be confirmed? If yes, will it mean access to GPT-4, or to a custom model tailored for Apple?
  • Will Apple ship an Xcode copilot trained for Swift and SwiftUI?
  • What direction will Siri take: a web-summarizing conversational agent, or an agent that can orchestrate other apps on the phone?

Those were the questions that shaped the following issue, written after Apple finally answered them at WWDC.

4. Google’s AI Overviews

Google began tentatively integrating AI-generated summaries, called AI Overviews, into search results. Almost immediately, some obviously bad answers went viral, including the infamous suggestion to add glue to pizza so the cheese would stick.

Google responded on May 30 with a blog post saying these were exceptions and that the system was otherwise working well. That may be true. At web scale, some failures are inevitable. But the deeper issue is strategic: if Google answers the user directly, it risks cannibalizing traffic to the open web and to the publishers on which its ecosystem depends.

That is exactly the angle that Antonio Ortiz has been exploring at Error500. It also came up clearly on the Monos estocásticos podcast and in Nilay Patel’s interview with Sundar Pichai on Decoder. Pichai argues that people respond positively to getting more context directly in search results. The real question is whether that new role is compatible with Google’s previous role as a traffic broker for the web.

My own take is that many of the most famous failures are a form of social-media-driven cherry-picking. But even if the specific examples are exceptional, the strategic shift is real.

5. Vision Pro and Marvel’s What If…?

I am interested in the Vision Pro not just as a gadget, but as a concentration of technologies and APIs that developers can use to invent new experiences. That can mean information-rich augmented reality, but also immersive entertainment.

Along those lines, on May 30 Marvel released its first immersive Vision Pro story: a roughly hour-long What If…? experience produced by ILM Immersive. It combines 3D film, immersive scenes, mixed-reality sequences, and first-person interaction.

Some reviewers were not too impressed, while others thought it showed the Vision Pro at its best. My feeling is that, whatever its limitations, it is the kind of early production that helps define the grammar of a new medium. It is trying to figure out what lies between cinema, mixed reality, and video games.

6. Hans Bacher and Mulan

I also want to mention a beautiful newsletter issue I read during those days: Setting the Stage for “Mulan”, from Animation Obsessive. It explains the enormous contribution of production designer Hans Bacher to Mulan (1998).

Bacher defined a visual style inspired by traditional Chinese painting while still preserving the Disney identity. His early guiding idea was “poetic simplicity”: minimalism, clarity, and a careful balance between quiet and busy compositions, straight and curved lines, and positive and negative space.

The studio thought some of the initial designs were too simple and pushed for more detail, so the final film became a negotiation between Bacher’s elegant minimalism and Disney’s richer visual tradition. Even so, his influence remained decisive.

Mulan is one of the Disney films my whole family has watched most often and loved most deeply. I like everything about it: the designs, the colors, the backgrounds, the editing, the animation, the characters, the story. I was so taken by the film that, when I found its art book years ago, I bought it despite its eye-watering price.

My two weeks

Films

The film I would most highlight from those days is Furiosa, George Miller’s prequel to Fury Road. I found it hugely entertaining, with spectacular landscapes and action sequences, and a very solid origin story for Furiosa.

TV

Among the series we watched, the one I would single out is season 2 of Bosch: Legacy. It opens with two episodes of pure adrenaline, then settles into the procedural rhythm the series handles so well, before ending with a very effective final twist.


See you next time.