Hi there, nice to meet you. Follow me at http://belfong.com/tweets

★ Keyboard Maestro Hack of the Week: Don’t Paste Images

My number one tip for becoming a Mac power user is to get into Keyboard Maestro. Using Keyboard Maestro feels like gaining superpowers. I keep meaning to write more about Keyboard Maestro, and so I’m just going to start documenting all the little use cases I find for it. Here’s one from today.

I use MarsEdit to publish at least 99 percent of the posts on this site. (The other 1 percent are posts I create on my phone, using the web interface for Movable Type.) I use MarsEdit a lot. About once a week or so, I accidentally try to paste text in MarsEdit when I think I have text on my clipboard, but it’s actually an image. When you paste an image in MarsEdit, it’s not like pasting into Mail or Notes or TextEdit, where the image just goes into the text. Instead, MarsEdit, trying to be helpful, opens its Upload Utility window — which, if I were using WordPress or some other CMS, might allow me to upload the image to my server for referencing from the HTML of the blog post. That’s not how my system works, and not how I want it to work, so every time this happens I have to close the Upload Utility window. And every time, I try to do this by hitting the Esc key on my keyboard. But the Upload Utility window isn’t a dialog box with a Cancel button that would be triggered by Esc. It’s a regular window. So after hitting the Esc key, which doesn’t do anything in this context, I then remember, once again, that I need to hit ⌘W instead. (I think I don’t naturally think to hit ⌘W because my instincts tell me ⌘W would try to close the blog window I’m writing in.)

Today it happened again, and finally the notion occurred to me that I could fix this with Keyboard Maestro. My first thought was that I could create a macro that would close the frontmost window in MarsEdit if, and only if, the frontmost window was named “Upload Utility”. A second later it occurred to me that I could probably do better than that, and prevent the Upload Utility window from opening in the first place if I ever try to paste an image in MarsEdit.

I was right. This wasn’t just super easy to create in Keyboard Maestro, it was super quick. I’ve spent 10× more time writing about this macro here than I did creating it. I think that’s why I so seldom write about my little hacks in Keyboard Maestro — they not only save me time and eliminate annoyances once they’re created, but they’re so easy to create that I just get back to whatever I was previously doing after making a new one.

First, I have a group (think: folders) in Keyboard Maestro for every app for which I’ve created app-specific macros. You just create a new group and set it to only be available when one (or more) specific applications are active. Inside my group for MarsEdit, I created a new macro named “Don’t Paste Images”.

It’s triggered by the hot key sequence ⌘V. That means every single time I paste in MarsEdit, this macro will run. Keyboard Maestro is so frigging fast that I’ll never notice. (Keyboard Maestro macros execute so fast that in some scenarios, you have to add steps to pause for, say, 0.2 seconds to keep the macro from getting ahead of the user interface it’s manipulating.)

The macro executes a simple if-then-else action with the following pseudocode logic:

if the System Clipboard has an image
    play a sound
else
    simulate the keystroke ⌘V

That’s the whole thing. And it worked perfectly the first time I tried it. Here’s a screenshot of my macro.

So if I type ⌘V in MarsEdit, and the clipboard contains an image, I just hear a beep. (I could just default to the system beep, but I chose the standard MacOS “Bottle” sound just for this macro — I sort of want to know that it’s this macro keeping me from pasting whatever text I wrongly thought was on my clipboard, so I want a distinctive sound to play.) Nothing gets pasted, so MarsEdit’s Upload Utility window doesn’t appear.

If the clipboard doesn’t contain an image, then Keyboard Maestro simulates a ⌘V shortcut and that gets passed to MarsEdit, and from my perspective as a user, it’s just like a normal paste of the text I expected. I have a few macros that work like this, where the macro is triggered by an application’s own keyboard shortcut, and the macro will (if certain conditions are met) pass through the same simulated keyboard shortcut to the application. When I first tried this, many years ago, I was half worried that it would trigger an infinite loop, where the simulated keystroke from the Keyboard Maestro macro would re-trigger the macro. I was wrong to worry — Keyboard Maestro is too clever for that.
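
For the curious, here is roughly what those two branches look like as actual code: a minimal Python sketch using PyObjC and Quartz, not anything exported from Keyboard Maestro. The PyObjC dependency, the UTI list, and the hard-coded key code are all assumptions of the sketch.

# A sketch of the macro's if-then-else logic in Python (pip install pyobjc).
# Assumes an ANSI keyboard layout, where virtual key code 9 is "V".
import time
import Quartz
from AppKit import NSPasteboard, NSSound

pb = NSPasteboard.generalPasteboard()
types = pb.types() or []
# NSPasteboard reports a UTI for everything currently on the clipboard.
has_image = any(t in types for t in ("public.png", "public.tiff", "public.jpeg"))

if has_image:
    # Play a distinctive sound instead of pasting.
    sound = NSSound.soundNamed_("Bottle")
    if sound is not None:
        sound.play()
        time.sleep(1)  # the sound plays asynchronously; don't exit before it finishes
else:
    # Simulate ⌘V (key down, then key up) so the paste goes through as normal.
    for key_down in (True, False):
        event = Quartz.CGEventCreateKeyboardEvent(None, 9, key_down)
        Quartz.CGEventSetFlags(event, Quartz.kCGEventFlagMaskCommand)
        Quartz.CGEventPost(Quartz.kCGHIDEventTap, event)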

You almost certainly don’t have my particular problem with the occasional inadvertent pasting of images into MarsEdit. But I bet you have your own esoteric annoyances related to your own most-used apps and most-frequent tasks. Keyboard Maestro lets you effectively add your own little features to your favorite apps — often with no “scripting” at all. The best part is, while writing this very blog post, my new “Don’t Paste Images” macro saved me from seeing that cursed Upload Utility window once more, because I had the screenshot of the macro on my clipboard, when I thought I had copied the URL for it on my server.

Belfong, 6 days ago (malaysia):
Keyboard Maestro is an essential tool for me as a Mac user. I probably trigger it hundreds of times a day. This and Alfred are my most treasured tools.

‘The Hardest Working Font in Manhattan’

Extraordinary illustrated essay by Marcin Wichary, documenting a typeface — and its long, fascinating, splintered history — that exemplifies the difference between beautiful and pretty. The beauty in Gorton isn’t just in its plainness and hardworking mechanical roots — it’s in the history of the 20th century itself. Gorton became such a part of the world that the bygone world of the previous century imbues how this font makes me feel.

Do yourself a favor and read this one in a comfortable chair, with a tasty slow-sipping beverage, on a screen bigger than a phone. Everything about this piece is exemplary and astounding — the writing, the photography, the depths of research. But most of all, Wichary’s clear passion and appreciation. It’s a love letter.

★ Nokia’s Next-Day Internal Competitive Analysis of the Original iPhone Largely Got It Right

A few days ago Nokia unveiled their Design Archive at Aalto University in Finland. Fahad X spotted a real gem — an internal confidential slide deck shared within the company the day after Apple had introduced the original iPhone at Macworld Expo in January 2007. To the credit of the team that put this presentation together, they mostly got it:

  • “iPhone touch screen UI may set a new standard of state-of-art: The new user interface may change the standards of the superior user experience for the whole market”. They quote Avi Greengart (a keen and sharp observer): “visually stunning and incredibly responsive”.

  • The on-screen keyboard is mentioned only in passing, on a slide titled “Other Great Innovations”, as “Predictive, corrective input from on-screen qwerty-keypad”. Most of the existing smartphone makers simply could not believe the iPhone’s touchscreen keyboard would work. Steve Ballmer famously laughed at the iPhone’s price and lack of a hardware keyboard. BlackBerry was obsessed with hardware keyboards. This Nokia deck is remarkably open to the idea that Apple was onto something. It does have a bullet point, under “iPhone has the biggest impact on the definition of coolness” (true!), that states: “Even though Steve Jobs emphasized iPhone superiority to ‘Buttons’, it is to be expected that the Consumer QWERTY category will continue to succeed.” But still, the deck deserves credit for acknowledging the potential significance of the iPhone’s keyboard.

  • It’s a swing and a miss regarding third-party software: “No mention either of Java support, unusual user input method may be the reason. Lack of Java would shut out a big mass of existing SW.” So close to getting it right. The iPhone’s lack of support for the then-“dominant” Java ME (Micro Edition) platform did shut out all existing mobile software. But all of that software sucked, big time. Sucked to develop, sucked to distribute, sucked to install, sucked to use. Not supporting it was a huge win for the iPhone, just like not supporting Flash Player wound up being a huge win for both the iPhone in particular and the mobile web generally.

  • It’s weird in hindsight that the deck mentions Java mobile apps but mentions the web only in passing: “Browsing - Safari web browser – (reportedly most advanced ever) Full screen with touch zoom-in functionality.” This speaks to Nokia’s pre-iPhone mindset that mobile platforms were not really internet-first devices but instead were mostly for running shitty software from carriers (for consumers) or corporate IT departments. The iPhone’s excellent day-one version of Mobile Safari supported the “real internet”, not the “baby internet” (to use Steve Jobs’s own terms). It’s so easy to get blinded by the way things currently work and to assume they’ll keep working that way.

  • The Nokia team nailed the remarkable and potentially industry-changing nature of Apple’s relationship with Cingular:

    • Cingular got multi-year exclusivity to iPhone. In exchange it gave up to Apple in many respects:
      • Cingular has allowed Apple to launch a device with WLAN and inbuilt services
      • The Yahoo and Google experiences are built in to the Apple phone
      • iTunes synch is done in cradle, no OTA
      • The multi-year exclusivity of the Cingular — Apple arrangement makes one ask if Apple gets a share of data revenues or Cingular iTunes revenues?
      • This may be a start for the whole market to change
    • iPhone will be sold both in Apple stores and Cingular stores
    • This is a marriage of convenience built on mutual distrust and recognition that each has assets the other needs

    Nokia concluded, “Other US operators need desperately something against Cingular and Apple” and advised that Nokia should “work very closely especially with T-Mobile.”

The most prescient line in the whole deck was this, recognizing that the iPhone was a UI breakthrough and that the entirety of Nokia’s offerings were far behind:

Nokia needs to develop the touch UI to fight back. S60 should be focus. Maemo platform is critical strength due to openness. Nokia needs a Chief UI architect to re-energize Nokia’s UI innovation across platforms and businesses.

I’d translate that as the presentation team imploring Nokia’s leadership to recognize that Apple’s primary priority was creating and delivering a great experience, both hardware and software, and Nokia had no such institutional value placed on design quality — on coolness — and would need to reinvent itself to prioritize the same ideals if it was going to compete.

Upton Sinclair’s famed adage is apt, as ever: “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

Trying out QvQ - Qwen's new visual reasoning model

I thought we were done for major model releases in 2024, but apparently not: Alibaba's Qwen team just dropped QvQ-72B-Preview, "an experimental research model focusing on enhancing visual reasoning capabilities". It was Apache 2.0 licensed at launch; the license has since changed to the Qwen license (see the update at the end of this post).

Their blog post is titled QvQ: To See the World with Wisdom - flowery language similar to their QwQ announcement from a few weeks ago in November, QwQ: Reflect Deeply on the Boundaries of the Unknown.

It's a vision-focused follow-up to QwQ, which I wrote about previously. QwQ is an impressive openly licensed inference-scaling model: give it a prompt and it will think out loud over many tokens while trying to derive a good answer, similar to OpenAI's o1 and o3 models.

The new QvQ adds vision to the mix. You can try it out on Hugging Face Spaces - it accepts an image and a single prompt and then streams out a very long response where it thinks through the problem you have posed it. There's no option to send a follow-up prompt.
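
If you'd rather poke at the demo from code than from the browser, something like this gradio_client call should work. The Space id and endpoint signature here are my guesses, not documented; the Space's "Use via API" panel would have the real ones.

# Hypothetical programmatic access to the Hugging Face Space demo
# (pip install gradio_client). Space id and endpoint name are assumptions.
from gradio_client import Client, handle_file

client = Client("Qwen/QVQ-72B-preview")   # assumed Space id
result = client.predict(
    handle_file("pelicans.jpg"),  # the single image input
    "Count the pelicans",         # the single prompt
    api_name="/predict",          # assumed endpoint name
)
print(result)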

I've tried it out with a bunch of things, with mixed results - but it's really fun seeing how it works through a problem.

My most successful prompt was "Count the pelicans" with this image:

A photograph of four glorious young California Brown Pelicans perched on a dock. The back end of another pelican is visible at the front of the photo.

Here's the full response. Some highlights:

So I've got this picture of some pelicans, and I need to count them. Let me see... First off, I know that pelicans are those big birds with the massive beaks, right? [...]

Let me try counting again, more carefully this time. Starting from the left:

  1. The first pelican is fully visible, standing at the front.
  2. Behind it, slightly to the right, is the second pelican.
  3. Then, further back, there's a third pelican.
  4. And next to that one, there's a fourth pelican.

So, that's four pelicans in total. But, I notice that on the far right of the image, there's part of another bird, maybe a pelican's back or tail feathers, but it's not fully in the frame, so I don't think I should count it as a whole pelican.

Something about the tone of voice it uses, "First off, I know that pelicans are those big birds with the massive beaks, right?", tickles me.

I also tried this on a much harder image - it counted 17, which looks about right to me, but honestly I have trouble with that image myself.

I fed in one of the ARC-AGI puzzles that o3 had failed at.

A puzzle with colored squares, where drawing a line between the single blue squares and turning any intersected rectangles blue is clearly the solution.

It produced a very lengthy chain of thought that was almost entirely incorrect, but had some interesting "ideas" in it:

[...] Let me try to think in terms of cellular automata rules, like Conway's Game of Life.

In Game of Life, each cell's state in the next generation is determined by its current state and the states of its eight neighbors.

Perhaps a similar rule applies here.

Finally, I asked it to "Estimate the height of the dinosaur" against this image (which, as it correctly noted, is actually an inflatable dragon):

A large inflatable Christmas decoration of a reindeer riding a dragon - both reindeer and dragon are wearing Christmas hats

I'm not sure this task is even possible given the lack of items in the frame of comparable known height. It had a good go at it though! I particularly enjoyed how it began:

So I've got this inflatable Christmas decoration, and it's a dragon! But not just any dragon—it's a green dragon wearing a Santa hat and a red and white scarf. And get this, there's a reindeer on its back, also wearing a Santa hat and holding onto the dragon's wings. It's pretty festive and colorful, with lots of vibrant greens, reds, and yellows.

The dragon has these big, friendly eyes and a smiling face, which makes it look really cheerful. Its wings are spread out, and it has white claws on its feet. The reindeer is brown with antlers and seems to be enjoying the ride on the dragon's back.

Now, the question is, how tall is this inflatable dragon? I need to estimate its height. Let me see. I'm standing next to it, and I'm about 5 feet 10 inches tall. The dragon is definitely taller than me. Maybe around 8 or 9 feet high? But I should try to be more precise.

I wonder how it decided that its own height was 5 feet 10 inches!

Running QvQ locally

All of my experiments so far have used the hosted Hugging Face demo. I'm hoping to try it out on my own laptop soon - I think it should just fit in 64GB of M2, maybe with a smaller quantization.
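
A quick back-of-envelope check on that guess: 72 billion parameters at 4 bits each works out to roughly 36GB for the weights alone, before any KV cache or runtime overhead.

# Rough weights-only memory estimate for a 4-bit quantization of a 72B model.
params = 72e9
bytes_per_param = 0.5  # 4 bits per parameter
print(f"~{params * bytes_per_param / 1e9:.0f} GB")
# ~36 GB, in the same ballpark as the 38GB download mentioned below.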

Right now the options for running it yourself are limited. As a happy user of Ollama's qwq port, I'm hoping they add a QvQ release at some point soon as well.

Running QvQ on my Mac using mlx-vlm

Update: I successfully ran QvQ on my 64GB M2 MacBook Pro using Prince Canuma's mlx-vlm via uv run with the following command:

uv run --with 'numpy<2.0' --with mlx-vlm python \
  -m mlx_vlm.generate \
    --model mlx-community/QVQ-72B-Preview-4bit \
    --max-tokens 10000 \
    --temp 0.0 \
    --prompt "describe this" \
    --image pelicans-on-bicycles-veo2.jpg

My image input here was taken from my experiment with Google's Veo 2 video generation model:

A screenshot of the Google VideoFX video generation tool

Here's the full output from running that command, after it first downloaded 38GB to ~/.cache/huggingface/hub/models--mlx-community--QVQ-72B-Preview-4bit:

==========
Image: ['pelicans-on-bicycles-veo2.jpg'] 

Prompt: <|im_start|>system
You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.<|im_end|>
<|im_start|>user
describe this<|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>assistant

Alright, I've got this "Text to Video" tool to explore. It seems pretty advanced, allowing me to turn text descriptions into actual videos. The interface looks user-friendly, with a dark theme that's easy on the eyes. On the left side, there's a panel where I can input my text prompt. It already has an example filled in: "A pelican riding a bicycle along a coastal path overlooking a harbor."

That sounds interesting! I can see four different video thumbnails on the right side, each showing variations of this scene. One of them even has the pelican wearing a helmet, which adds a humorous touch. Another shows the pelican standing next to a bicycle, almost as if it's ready to hop on and start pedaling.

Below the text input, there are some style options like "photorealistic," "35mm film," "minimal," "sketchy," "handmade," and "abstract." These probably let me choose the visual style of the video. For now, I'll stick with the default settings.

At the bottom of the left panel, there are two buttons: "Start over" and "Create videos." I guess "Create videos" is the one I need to press to generate the videos based on my text prompt.

Let me think about this. I want to see what kind of videos this tool can produce based on my description. The example prompt is already filled in, so maybe I should just go ahead and click "Create videos" to see what happens.

Wait a minute, the prompt mentions a coastal path overlooking a harbor. In the thumbnails, I can see a background with water and boats, which matches the description. It's cool how the tool has interpreted the text into visual elements.

I should also pay attention to the "Visual Prompt" section, which says "View only." That probably means I can't edit it directly, but I can view the generated videos based on this prompt.

Hmm, maybe I should try changing the prompt to see different results. For example, instead of a pelican riding a bicycle, maybe I could describe a different scene. But for now, I'll stick with the given example to understand how it works.

After clicking "Create videos," I expect to see the videos being generated. There might be a loading process, depending on the complexity of the prompt and the tool's processing speed.

Once the videos are ready, I can preview them to see how well they match my description. If I'm not satisfied, I can always go back and adjust the prompt or try different style options.

Overall, this seems like a powerful tool for creating custom videos based on textual descriptions. It could be really useful for storytelling, presentations, or even just for fun, experimenting with different scenes and styles.

I think I'll spend some time exploring the different style options to see how they affect the final videos. For instance, selecting "35mm film" might give the videos a vintage, cinematic look, while "abstract" could produce more artistic and less realistic visuals.

Also, I wonder if there's a way to customize more specific aspects, like the camera angle, lighting, or movement. Maybe after generating the initial videos, there are additional editing options available.

In summary, I'm excited to use this Text to Video tool and see what kind of creative possibilities it opens up. With a bit of practice, I could probably create some impressive visual content just by describing what I want to see.

**Final Answer**

To proceed with generating videos based on the provided text prompt, I need to click on the "Create videos" button. This action will initiate the process of converting the textual description into visual content, resulting in multiple video thumbnails that can be previewed for further use or adjustments.
==========
Prompt: 0.870 tokens-per-sec
Generation: 7.694 tokens-per-sec

The license changed from Apache 2.0 to Qwen

When I wrote this post yesterday the LICENSE file in the Hugging Face repository was Apache 2.0. Just after midnight UTC on 25th December this commit landed updating the QVQ-72B-Preview license file to the Qwen license instead.

This looks to me like they were correcting a mistake, not changing their policy. The README.md for that repository has this block of YAML:

license: other
license_name: qwen

And commits to that README at one point linked to the Qwen2.5-72B-Instruct copy of the Qwen license.

The QwQ model repository continues to list Apache 2.0, which matches the YAML in its README as well.

So it looks to me like the intention is for QvQ and Qwen2.5-72B-Instruct to be Qwen licensed, while QwQ is Apache 2.0.
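
Here's a quick way to confirm that from each repo's model card metadata using huggingface_hub. The field names are the ones from the YAML block above; the repo ids are my assumption of the repositories discussed here.

# Read each repo's license metadata straight from its model card YAML
# (pip install huggingface_hub). Repo ids assumed from context.
from huggingface_hub import model_info

for repo in ("Qwen/QVQ-72B-Preview", "Qwen/QwQ-32B-Preview"):
    card = model_info(repo).card_data.to_dict()
    print(repo, card.get("license"), card.get("license_name"))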

Tags: python, ai, generative-ai, llms, hugging-face, vision-llms, uv, qwen, mlx, inference-scaling

samuel, 94 days ago (Cambridge, Massachusetts):
I wonder how large of an M4 Pro or M4 Max I’ll need to run this model locally. Visual reasoning locally means I can throw some interesting problems at it and run it all night. Find needles in a haystack of my own photo library, for instance.

kglitchy, 93 days ago:
That's a great idea! What are some use cases for information you would want to gather?

samuel, 93 days ago:
I want to classify photos so I can more easily make year-end photo books.

samuel, 93 days ago:
Also, I'm about to launch embeddings (for text) on NewsBlur (try it out at staging.newsblur.com - click on the Related buttons in feed titles and at the bottom of stories), and this sort of transformer-based classifier would be hugely helpful.

kglitchy, 92 days ago:
Great ideas, and I like the related stories idea!

Coding Font Selection ‘Tournament’

Via Jason Snell (back in October), who points first to this thread on Mastodon where a few of us posted about our preferences for the fonts we use for writing, and then describes this fun “tournament” from Typogram that lets you pick your favorite monospaced coding font from 32 choices. One limitation is that the only options are free fonts — some of my favorite monospaced fonts aren’t free and thus aren’t included (e.g. Consolas, Berkeley Mono, or Apple’s SF Mono). Another limitation is that some of the fonts in the tournament just plain suck. But it’s really pretty fun.

It’s also a good thing I procrastinated on linking to this for two months — it’s improved greatly in the weeks since Snell linked to it. The example code is now JavaScript, not CSS, which is a much better baseline for choosing a programming font. And there are some better font choices now.

I highly recommend you disable showing the font names while you play, to avoid any bias toward fonts you already think you have an opinion about. But no matter how many times I play, I always get the same winner: Adobe’s Source Code Pro. My second favorite in this tournament is IBM Plex Mono. The most conspicuous omission: Intel One Mono.

A look at Mozilla’s rebrand. “Mozilla’s new logo is a flag symbol...

A look at Mozilla’s rebrand. “Mozilla’s new logo is a flag symbol built from the Mozilla M that comes to life to reveal the company’s iconic Tyrannosaurus Rex symbol and mascot, originally designed by Shepard Fairey.”

Belfong, 170 days ago (malaysia):
I love it!