dasyatid1: “delta prime” (delta prime)

While experimenting with SDL 3 and poking its pre-init simple GUI message box support, I found out that version 3.4.8, rather than doing the simplistic X thing that I remember it doing before, invokes Zenity on my Arch Linux system. (A skim of the SDL source code suggests that if there's no Zenity, it still falls back to legacy X primitives on X, but fails completely on Wayland.) Seems basically reasonable, right?

Unfortunately, Zenity 4.2.2 on this (somewhat old) machine has a bit of an issue with startup time:

% time zenity --error --title="Oops" --text="Uh oh."
 #  zenity --error --title="Oops" --text="Uh oh.": user=1195ms sys=152ms wall=3040ms mem=115k

That's a full second of CPU time just to show a dialog box, and it does feel like about that long of a delay between hitting Return and seeing the message on the screen. Definitely an interactive flow killer. Let's also see what ldd says about what it's loading:

% ldd /usr/bin/zenity |cut -d' ' -f1
	linux-vdso.so.1
	libadwaita-1.so.0
	libgtk-4.so.1
	libpango-1.0.so.0
	libgio-2.0.so.0
	libgobject-2.0.so.0
	libglib-2.0.so.0
	libgcc_s.so.1
	libc.so.6
	libfribidi.so.0
	libgraphene-1.0.so.0
	libappstream.so.5
	libm.so.6
	libgmodule-2.0.so.0
	libpangocairo-1.0.so.0
	libcairo.so.2
	libharfbuzz.so.0
	libharfbuzz-subset.so.0
	libcairo-gobject.so.2
	libfontconfig.so.1
	libgdk_pixbuf-2.0.so.0
	libepoxy.so.0
	libgstplay-1.0.so.0
	libgstvideo-1.0.so.0
	libgstreamer-1.0.so.0
	libgstgl-1.0.so.0
	libgstallocators-1.0.so.0
	libXi.so.6
	libX11.so.6
	libpangoft2-1.0.so.0
	libcloudproviders.so.0
	libtinysparql-3.0.so.0
	libvulkan.so.1
	libpng16.so.16
	libtiff.so.6
	libjpeg.so.8
	libxkbcommon.so.0
	libwayland-client.so.0
	libwayland-egl.so.1
	libXext.so.6
	libXcursor.so.1
	libXdamage.so.1
	libXfixes.so.3
	libXrandr.so.2
	libXinerama.so.1
	libcairo-script-interpreter.so.2
	libcups.so.2
	libcolord.so.2
	libthai.so.0
	libz.so.1
	libmount.so.1
	libffi.so.8
	libpcre2-8.so.0
	/lib64/ld-linux-x86-64.so.2
	libcurl.so.4
	libxmlb.so.2
	libxml2.so.16
	libfyaml.so.0
	libsystemd.so.0
	libzstd.so.1
	libstemmer.so.0
	libfreetype.so.6
	libXrender.so.1
	libxcb.so.1
	libxcb-render.so.0
	libxcb-shm.so.0
	libpixman-1.so.0
	libgraphite2.so.3
	libexpat.so.1
	libglycin-2.so.0
	libgsttag-1.0.so.0
	libgstpbutils-1.0.so.0
	libgstbase-1.0.so.0
	liborc-0.4.so.0
	libunwind.so.8
	libdw.so.1
	libEGL.so.1
	libGLX.so.0
	libwayland-cursor.so.0
	libX11-xcb.so.1
	libgudev-1.0.so.0
	libdrm.so.2
	libgbm.so.1
	libjson-glib-1.0.so.0
	libsqlite3.so.0
	libdeflate.so.0
	libjbig.so.2.1
	liblzma.so.5
	libwebp.so.7
	liblzo2.so.2
	libavahi-common.so.3
	libavahi-client.so.3
	libgnutls.so.30
	liblcms2.so.2
	libudev.so.1
	libdatrie.so.1
	libblkid.so.1
	libnghttp3.so.9
	libngtcp2_crypto_ossl.so.0
	libngtcp2.so.16
	libnghttp2.so.14
	libidn2.so.0
	libssh2.so.1
	libpsl.so.5
	libssl.so.3
	libcrypto.so.3
	libgssapi_krb5.so.2
	libbrotlidec.so.1
	libicuuc.so.78
	libbz2.so.1.0
	libXau.so.6
	libXdmcp.so.6
	libseccomp.so.2
	libgstaudio-1.0.so.0
	libelf.so.1
	libGLdispatch.so.0
	libsharpyuv.so.0
	libdbus-1.so.3
	libleancrypto.so.1
	libp11-kit.so.0
	libunistring.so.5
	libtasn1.so.6
	libhogweed.so.7
	libnettle.so.9
	libgmp.so.10
	libbrotlienc.so.1
	libkrb5.so.3
	libk5crypto.so.3
	libcom_err.so.2
	libkrb5support.so.0
	libkeyutils.so.1
	libresolv.so.2
	libbrotlicommon.so.1
	libicudata.so.78
	libstdc++.so.6

wc -l says that's 135 mappings. This is easy to overinterpret—code being mapped into memory doesn't mean it's doing anything other than more-or-less cheaply sitting there just in case, and timing zenity --help confirms that the initial CPU suck isn't coming from the dynamic loader—but it's a decent enough proxy metric for exploding coordination and integration complexity that it still makes me cringe some.
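
For anyone wanting to repeat the comparison without relying on a particular shell's time builtin, here is a minimal sketch of the same measurement in Python. The time_command helper is a name I made up for illustration, and the command being timed is a harmless stand-in; swap in something like ["zenity", "--help"] to separate loader cost from dialog cost.

```python
import os
import subprocess
import sys
import time

def time_command(argv):
    """Run argv to completion; return (user, sys, wall) seconds for the child.

    Hypothetical helper. Child CPU time only covers processes that were
    waited for, so detached grandchildren may not be counted.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    wall = time.perf_counter() - start
    after = os.times()
    return (
        after.children_user - before.children_user,
        after.children_system - before.children_system,
        wall,
    )

# Stand-in command so the sketch runs anywhere Python does.
user, system, wall = time_command([sys.executable, "-c", "pass"])
print(f"user={user * 1000:.0f}ms sys={system * 1000:.0f}ms wall={wall * 1000:.0f}ms")
```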

Running it under strace -f shows 43 underlying clone calls and two separate execs of bwrap along with an exec of glycin-svg… I assume the sandboxing is for the SVG loader, so is that being used to display the error icon? It doesn't happen if I run zenity --text-info instead, which doesn't use an icon, though the overall CPU usage remains similar, so that's not the bottleneck.

I briefly tried running it under ltrace, but it either dumped core or didn't produce any trace output depending on what filters I set; I assume it doesn't have enough information to reconstruct the library calls here.

Out of curiosity, I also tried timing Yad, which describes itself as a fork of Zenity and which notably seems to use GTK+ 3 instead of GTK 4. With Yad, the user CPU time for a basic dialog looks closer to 200 ms, which still feels a little iffy but is overall much more reasonable. Funnily (though not unexpectedly), the ldd output is 146 lines and strace shows 66 clone calls, both more than Zenity's, but something somewhere in that stack is much more efficient.

I guess it makes sense to expect Zenity to be more mainline in availability than one of its forks, in an XDG-ish environment… that said, looking at the source for SDL_Zenity_ShowMessageBox, I also see:

/* https://gitlab.gnome.org/GNOME/zenity/-/commit/c686bdb1b45e95acf010efd9ca0c75527fbb4dea
 * This commit removed --icon-name without adding a deprecation notice.
 * We need to handle it gracefully, otherwise no message box will be shown.
 */

(The commit in question is from 2022, and a skim of the diff does say it changed "icon-name" to "icon" in a few key places.)

Hmm. Bleh. Gee, I dunno. I don't think I'm going to go further down this rabbit hole today, but this is all making me kind of sad.

dasyatid1: “delta prime” (Default)

A strange idea came to me today while thinking about garbage collector design. On priors, probably not original, but I don't remember seeing it before, so!

There are a couple of usual mechanisms for allowing the mutator to hook object reclamation in GCs. Finalizers are a traditional one, but they don't allow the mutator to interact with their timing easily, sometimes leading to exciting concurrency issues, and object resurrection during finalization seems to be a source of exciting complications in both GC and application design. Java deprecated Object.finalize back in Java 9 and has been on the path to removing it for a while now.

(I'll refer to the procedure that's meant to execute after an object becomes unreachable as the “hook”, below, to make it clearer that it's not always a finalizer in the above narrower sense.)

Then there's guardians, such as the ones in Guile; will executors in Racket seem similar. These accumulate registered objects that would otherwise have been collected, then allow the mutator to dequeue them for cleanup. This avoids the synchronization issues of finalizers, but they interact weirdly with weak references, and resurrection still feels thorny to me; Andy Wingo has written about some related issues.

Things get simpler if you don't allow the hook access to the original object. The most recent place I've run into this personally is in SBCL's sb-ext:finalize, whose documentation warns that you shouldn't close over the object in the hook closure or else it will never get collected. Similarly, in Java, phantom references are now promoted over finalizers; they use a queueing mechanism similar to guardians but can't be dereferenced. The Cleaner class abstracts over this to provide an interface that handles the queue processing in the background; it too warns to not close over the original object in the hook.

But the problem with not being able to get access to the original object is that it's common to need some of the information from it in the hook. For instance, finalization of GC objects is often used to backstop cleanup of foreign resources, but the hook needs the underlying resource handle for that—and if the object also provides explicit cleanup, then the hook needs to know whether it's been done already or not, so duplicating the shared info once isn't enough. The main way I know of to handle this is to box the shared info in a sub-object that the hook closes over, but the most straightforward way to do that also results in extra indirection during normal access. Keeping two copies of the info and only propagating writes avoids indirection on read but is more awkward to implement correctly. The Cleanable objects that Cleaner gives back above promise that they'll only run once whether explicitly or implicitly called, which helps a lot with the ergonomics of the easier cases (you trigger the same path for implicit and explicit cleanup) but still involves jumping through hoops.
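
As a concrete picture of the "box the shared info in a sub-object" pattern, here is a minimal sketch using Python's weakref.finalize, which behaves a lot like the Cleaner/Cleanable pairing (the returned object is callable and runs its function at most once). The Resource and _HandleBox names and the released list are made up for illustration; a real hook would call whatever foreign free function applies.

```python
import weakref

released = []  # stands in for the foreign "free this handle" call

class _HandleBox:
    """Shared state: the only thing the hook is allowed to see."""
    def __init__(self, handle):
        self.handle = handle
        self.closed = False

def _release(box):
    # Runs for explicit or implicit cleanup, whichever happens first.
    if not box.closed:
        box.closed = True
        released.append(box.handle)

class Resource:
    def __init__(self, handle):
        self._box = _HandleBox(handle)
        # The hook gets the box, never `self`; closing over `self`
        # would keep the object alive forever.
        self._finalizer = weakref.finalize(self, _release, self._box)

    @property
    def handle(self):
        return self._box.handle  # the extra indirection on normal access

    def close(self):
        self._finalizer()  # a second call is a no-op
```

Every read of handle goes through the box, which is exactly the indirection cost described above.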

So: what if, instead, the GC met the mutator halfway? Rather than either allowing access to the original object during finalization (with attendant complexities around reachability and resurrection) or disallowing it completely (with attendant difficulties around sharing state with the hook), you'd allocate a corpse object at registration time, which the mutator never sees while the original object is live. At finalization time, the GC would automatically copy info from the now-dead object into the corpse and pass that to the hook, while not allowing the original object pointer to escape.

Choosing what to copy could be done in a few ways. I like the idea of being able to designate any subset of the slots (maybe a static subset per concrete type if that makes it easier to synthesize the corpse type), but that could be pretty hairy to provide depending on the details. Simpler alternatives would be to copy everything (but that's probably more than needed) or extract a single slot (but the application has to degrade to the sub-object case above if the data is more complicated). In the case of a single source slot that's a pointer or can be converted into a tagged immediate, you could just reuse the slot where you'd have stored the corpse reference and avoid taking an extra allocation.
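
To make the shape of the idea a bit more concrete, here is a rough simulation in Python, with heavy caveats: there is no real collector support here, so __del__ plays the collector's part, and the corpse is allocated at death rather than at registration time. CorpseMixin, corpse_slots, and reclaim_hook are all hypothetical names. The point is only that the user-written hook sees a copy of the designated slots and never the original object.

```python
from types import SimpleNamespace

leaked = []  # stands in for "release the foreign resource"

class CorpseMixin:
    corpse_slots = ()  # hypothetical: static per-type subset of slots to copy

    @staticmethod
    def reclaim_hook(corpse):
        """Override per type; receives the corpse, never the original."""

    def __del__(self):
        # Playing the collector: copy the designated slots into a fresh
        # corpse and hand only that to the hook.
        corpse = SimpleNamespace(
            **{name: getattr(self, name) for name in self.corpse_slots}
        )
        self.reclaim_hook(corpse)

class Handle(CorpseMixin):
    corpse_slots = ("fd", "closed")

    def __init__(self, fd):
        self.fd = fd
        self.closed = False
        self.scratch = bytearray(16)  # not copied into the corpse

    def close(self):
        self.closed = True

    @staticmethod
    def reclaim_hook(corpse):
        # Backstop cleanup only if explicit cleanup never happened.
        if not corpse.closed:
            leaked.append(corpse.fd)
```

Dropping an unclosed Handle(3) should leave 3 in leaked, while one that was closed first leaves nothing behind. Python's __del__ is of course itself a classic finalizer, so this only illustrates the interface and not the collector mechanics.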

What have I reinvented? And is this useful? Or is it just too complicated or otherwise inferior to current practice?

Addendum like half an hour later (sigh): aha, I've found a problem I didn't think of the first time! If any of the copied slots are heap references, they could still create resurrection issues indirectly, so you still have to make sure the sequence of operations of the collector doesn't trip over that. Phooey. That's not so bad I guess—you could queue all the copied stuff to be marked along the way, and I think maybe it'd mean that ‘good’ acyclic cases can avoid triggering weird weak reference interactions or requiring extra collection cycles as compared to the guardian/will-executor approach where the original pointer always escapes. You could check for the ‘bad’ case and raise a ruckus, or try to exclude it a priori using static type restrictions, but those could both get awkward.

dasyatid1: “delta prime” (Default)

(This is also posted on the fediverse at mastodon.online.)

From the catalog of thoughts I had hazy intentions of actually researching and writing up in detail, and then didn't, and sat on it for like five years or whatever—so, screw it, have the sloppy version: a certain perspective on how the silo model of per-app data wound up predominating in the consumer sphere.

(Probably it's been done before, but I don't know where. We still don't have good concept-shape reference search. Maybe someone who reads this will know?)

In UX, it seems common wisdom that your design should try to follow the user's model of the interaction. What do they expect to happen? What meshes with their approach and habits? The deeper version of this (at least from my peanut-gallery perspective) was involved in the focus shift from “UI” to “UX”: if the machine model is too at odds with the user model, remake the application to fit the user, rather than dictating from the manual and expecting the user to comply.

Some of you probably remember growing up with PCs around the time they started coming into the mainstream, and becoming the de facto tech support hub for less computing-oriented family members. So you might remember this conversation:

“But where did you save the letter?”

“In Word.”

Among other things, user-managed storage and lower prevalence of networking made this less practical as a machine model where uses spanned computers. Lower-capacity physical media made user management of storage volumes more pressing. (How many diskettes does that come on?) And, a bit further down the line, lack of sophisticated search infrastructure made user organization of files more immediately beneficial.

With subsidized datacenter storage and an assumption of high-speed Internet access everywhere?

Observe what the default user model was, back in the 1990s.

Solve for the equilibrium.

dasyatid1: “delta prime” (Default)

Welcome to Cubetown

For somewhat ridiculous reasons which will for now remain unspecified, I recently wound up thinking about how one would neatly represent the rotation group of the cube (also known as chiral octahedral symmetry) in software, primarily for the purpose of applying its elements to 3D geometry. For background, in the general realm of 3D, one can represent arbitrary rotations as linear transformations using 3×3 matrices (or 4×4 ones, in 3DH coordinates), which can be applied to vectors easily through matrix–vector multiplication and can be easily composed with other linear transformations (or projective ones, in 3DH). For rotations specifically, one can also reach for unit quaternions, which are more compact and more numerically stable to manipulate on their own. So there we see two different breadths of representation, from any 3D linear transformation down to the rotation group of the sphere. But what about going down to the rotation group of the cube, specifically?

Unlike the conceptually continuous rotation group of the sphere, which thus encourages the use of floating-point numbers to model its elements, the rotation group of the cube is finite, consisting of 24 discrete possibilities. Stack Exchange is full of examples of ways to think about the derivation thereof, of which I find the “choose one of six faces to point up, then choose one of its four edges to be the north one” approach the most geometrically intuitive to imagine. (I wasn't aware until very recently of the isomorphism with the permutation group of four elements, in fact, or the relationship of that to the four long diagonals of the cube!)

Swizzles and sign bits

So what approach do we want to base our representation on? Since the primary use here is transforming 3D vectors, we can start by looking at the 3D rotation matrices for quarter-turns around each axis, since any cube rotation can be composed of a sequence of those. Each one has the effect of swapping the two other axes and negating one of them in the process; for instance, in a right-handed coordinate system, a quarter-turn anticlockwise rotation about the positive Z axis takes (x, y, z) to (−y, x, z). From this we can also observe that the result of a cube rotation can be represented as a permutation of the axes followed by a negation of some subset of them.

There are 6 (= 3!) permutations of the axes and 8 (= 2^3) possible subsets of them to negate. Multiplied together, that gives 48 possibilities, which is twice as many as we want. In fact, this corresponds to the full symmetry group of the cube, allowing both rotation and reflection operations. So how do we constrain that to only the rotations? Looking again at the effect of a quarter-turn about an axis, we find that each quarter-turn simultaneously alters the number of negations by 1 (by turning one negative axis positive or vice versa) and flips the parity of the permutation between even and odd (by doing a single transposition). The identity rotation has zero negations and the (even) identity permutation, so an even permutation must be paired with an even number of negations, and an odd permutation must be paired with an odd number of negations, neatly halving the number of valid transformations and giving us an easy way to tell which are which.
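
The parity argument is easy to check by brute force. This sketch enumerates all 48 signed permutations, keeps the ones whose matrix has determinant +1, and compares that set against the "permutation parity equals negation-count parity" rule. The function names are mine, and the convention that output axis i takes signs[i] times input axis perm[i] is an assumption that doesn't affect the count.

```python
from itertools import permutations, product

def perm_parity(perm):
    """0 for an even permutation, 1 for an odd one (inversion count mod 2)."""
    inversions = sum(
        1 for i in range(3) for j in range(i + 1, 3) if perm[i] > perm[j]
    )
    return inversions & 1

def determinant(perm, signs):
    """Determinant of the matrix sending v to (signs[i] * v[perm[i]])."""
    rows = [[signs[i] if j == perm[i] else 0 for j in range(3)] for i in range(3)]
    (a, b, c), (d, e, f), (g, h, k) = rows
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)

all_48 = [
    (perm, signs)
    for perm in permutations(range(3))
    for signs in product((1, -1), repeat=3)
]
rotations = [(p, s) for p, s in all_48 if determinant(p, s) == 1]
parity_matched = [(p, s) for p, s in all_48 if perm_parity(p) == s.count(-1) % 2]
```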

Since we're in discrete territory and aiming at a software use, let's get out our unsigned integers. Among the six permutations of the axes, there's one even and one odd permutation for each initial element, so we can map those to even and odd numbers—equivalent to using the low bit for the parity—and let the next two bits represent the initial element between 0 and 2, giving us the numbering {0=XYZ, 1=XZY, 2=YZX, 3=YXZ, 4=ZXY, 5=ZYX}. Meanwhile, subsets of an ordered finite set can be conveniently numbered in binary as bitmasks, so we could let the negation mask be a number from 0 to 7 with bits 0, 1, and 2 set if the X, Y, and Z axes respectively are negated after the permutation—but due to the no-reflections constraint above, we only need to know the signs of two of the axes to reconstruct the sign of the third based on the permutation parity, so we drop the Z bit and keep only X and Y.

Putting those together, we have the following bitfield-ish representation:

  • Bits 4–3: initial axis of the permutation (0=X, 1=Y, 2=Z)
  • Bit 2: parity of the permutation (0=even, 1=odd)
  • Bit 1: set if the Y axis is negated after permutation
  • Bit 0: set if the X axis is negated after permutation

… and that forms a dense numbering from 0 to 23!
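
Here is a minimal sketch of decoding that layout and applying it to a vector. The names are mine, and the convention that output axis i is signs[i] times input component perm[i] is an assumption (the forward-versus-reverse choice comes up again further down).

```python
# Permutation numbering from above: 0=XYZ, 1=XZY, 2=YZX, 3=YXZ, 4=ZXY, 5=ZYX.
PERMS = [(0, 1, 2), (0, 2, 1), (1, 2, 0), (1, 0, 2), (2, 0, 1), (2, 1, 0)]

def decode(index):
    """Expand a 5-bit index (0..23) into (perm, signs)."""
    perm = PERMS[index >> 2]  # bits 4-3 (axis) and bit 2 (parity) together
    neg_x = index & 1
    neg_y = (index >> 1) & 1
    # The Z sign is forced by the no-reflections rule.
    neg_z = neg_x ^ neg_y ^ ((index >> 2) & 1)
    signs = tuple(-1 if n else 1 for n in (neg_x, neg_y, neg_z))
    return perm, signs

def apply(index, v):
    """ASSUMED convention: output axis i is signs[i] * v[perm[i]]."""
    perm, signs = decode(index)
    return tuple(signs[i] * v[perm[i]] for i in range(3))
```

Under that assumed convention, the anticlockwise quarter-turn about +Z from earlier, (x, y, z) to (−y, x, z), is permutation YXZ with X negated, which encodes as 0b01101 = 13.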

What do the operations look like?

For transforming 3D vectors, it seems more convenient to have the expanded permutation already available as easily addressable elements, and similarly an expanded +1/−1 representation of the signs. In the environment I was originally thinking of this in, the type system only allowed ‘heavyweight’ objects for user-defined types to begin with, so having six extra byte fields in each one seemed fine, with the instances themselves constructed at static init time and stashed in a 24-element array. To get the expanded permutation from an index, you only need the second and third elements since you already have the first, and 4 bits × 6 permutations gives us a 24-bit lookup table that fits in an immediate constant, which feels more sympathetic to the CPU than trying to do axis-number arithmetic modulo 3. The sign bit for the post-permutation Z axis can also be reconstructed as the XOR of the low three bits of the index.
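
A sketch of that 24-bit table, assuming (my choice, not anything canonical) that each permutation's nibble holds the second element in its low two bits and the third element in its high two bits:

```python
PERMS = [(0, 1, 2), (0, 2, 1), (1, 2, 0), (1, 0, 2), (2, 0, 1), (2, 1, 0)]

# Build the packed constant: one nibble per permutation number.
PERM_TAILS = 0
for n, (_, second, third) in enumerate(PERMS):
    PERM_TAILS |= (second | (third << 2)) << (4 * n)

def expand(index):
    """Expand an index into (perm, signs) using only shifts and masks."""
    n = index >> 2                 # permutation number 0..5
    first = n >> 1                 # initial axis comes straight from bits 4-3
    nibble = (PERM_TAILS >> (4 * n)) & 0xF
    perm = (first, nibble & 3, nibble >> 2)
    neg_x = index & 1
    neg_y = (index >> 1) & 1
    neg_z = (index ^ (index >> 1) ^ (index >> 2)) & 1  # XOR of the low three bits
    signs = (1 - 2 * neg_x, 1 - 2 * neg_y, 1 - 2 * neg_z)
    return perm, signs
```

With that nibble layout the constant works out to 0x148269.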

Given a dense numbering, composition and inversion seem like reasonable places for lookup tables. Doing it the easy way, a 24×24 lookup table of uint8 is 576 bytes, but since the indices are only 5 bits long, you could alternatively pack 6 of them to a uint32 or 12 of them to a uint64, for 16 bytes per row, and drop that down to 16×24 = 384 bytes. If you were willing to reconstruct the parity bit as the XOR of the parities of the two input rotations, you could pack eight 4-bit elements per uint32 and lower that to 12×24 = 288 bytes, but the position of the parity bit makes the bit-juggling really awkward. The inversion table only has one row but is otherwise similar.
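
A sketch of building those tables, again assuming the "output axis i takes input axis perm[i]" convention and that COMPOSE[a][b] means "apply b, then a". The packing shown is the six-per-uint32 variant; all names are mine.

```python
PERMS = [(0, 1, 2), (0, 2, 1), (1, 2, 0), (1, 0, 2), (2, 0, 1), (2, 1, 0)]

def decode(index):
    """Return (perm, negs) with negs as 0/1 flags for X, Y, Z."""
    neg_z = (index ^ (index >> 1) ^ (index >> 2)) & 1
    return PERMS[index >> 2], (index & 1, (index >> 1) & 1, neg_z)

def encode(perm, negs):
    """Inverse of decode; the Z flag is implied, so it is dropped."""
    return (PERMS.index(perm) << 2) | (negs[1] << 1) | negs[0]

def compose(a, b):
    """Index of 'apply b, then a' under the assumed convention."""
    pa, na = decode(a)
    pb, nb = decode(b)
    perm = tuple(pb[pa[i]] for i in range(3))
    negs = tuple(na[i] ^ nb[pa[i]] for i in range(3))
    return encode(perm, negs)

COMPOSE = [[compose(a, b) for b in range(24)] for a in range(24)]
INVERSE = [row.index(0) for row in COMPOSE]  # the one-row inversion table

def pack_row(row):
    """Pack 24 five-bit entries into four 32-bit words, six per word."""
    words = []
    for w in range(4):
        word = 0
        for k in range(6):
            word |= row[6 * w + k] << (5 * k)
        words.append(word)
    return words

PACKED = [pack_row(row) for row in COMPOSE]  # 24 rows x 16 bytes = 384 bytes

def compose_packed(a, b):
    return (PACKED[a][b // 6] >> (5 * (b % 6))) & 0x1F
```

The parity bit of each entry is the XOR of the two input parity bits, which is what makes the 4-bit variant possible in principle.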

Fetching the rotation of an angle about an axis also seems like a reasonable place for a tiny lookup table: 4 angles × 3 axes × 5 bits = 60 bits total. An angle expressed as a number of quarter-turns is easily normalized by masking, and inverting the reference direction of the axis or switching between clockwise and anticlockwise both just negate the angle.
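
A sketch of that 60-bit table. Rather than hand-writing the twelve entries, this derives them by rotating a probe vector with integer sines and cosines and searching for the matching index; the packing order (axis-major, five bits per entry) and all names are my own assumptions, as is the right-handed anticlockwise convention from earlier.

```python
PERMS = [(0, 1, 2), (0, 2, 1), (1, 2, 0), (1, 0, 2), (2, 0, 1), (2, 1, 0)]

def apply(index, v):
    """ASSUMED convention: output axis i is +/- v[perm[i]]."""
    perm = PERMS[index >> 2]
    neg_z = (index ^ (index >> 1) ^ (index >> 2)) & 1
    negs = (index & 1, (index >> 1) & 1, neg_z)
    return tuple(-v[perm[i]] if negs[i] else v[perm[i]] for i in range(3))

def rotate_about(axis, quarter_turns, v):
    """Reference rotation: anticlockwise about the positive axis, right-handed."""
    k = quarter_turns & 3
    c, s = (1, 0, -1, 0)[k], (0, 1, 0, -1)[k]
    a, b = (axis + 1) % 3, (axis + 2) % 3
    out = list(v)
    out[a] = c * v[a] - s * v[b]
    out[b] = s * v[a] + c * v[b]
    return tuple(out)

PROBE = (1, 2, 3)  # distinct magnitudes identify a signed permutation uniquely

TURN_LUT = 0
for axis in range(3):
    for k in range(4):
        target = rotate_about(axis, k, PROBE)
        index = next(i for i in range(24) if apply(i, PROBE) == target)
        TURN_LUT |= index << (5 * (4 * axis + k))

def quarter_turn(axis, quarter_turns):
    """Rotation index for a (possibly negative) number of quarter-turns."""
    k = quarter_turns & 3  # masking normalizes the angle
    return (TURN_LUT >> (5 * (4 * axis + k))) & 0x1F
```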

Calculating the mapping of a face, given a face index from 0 to 5 where the low bit is the sign and the upper two bits are the axis, is a lookup in the expanded permutation for the new axis and an XOR of the sign bits; depending on whether you store the forward or reverse permutation and whether you want the forward or reverse mapping of the face, you might need to invert the rotation first.
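
A sketch of both directions, once more assuming the stored permutation says which input axis feeds each output axis. Under that assumption the direct lookup answers "which face ends up here?", and the forward mapping needs the permutation searched in reverse (or the rotation inverted first). Function names are mine.

```python
PERMS = [(0, 1, 2), (0, 2, 1), (1, 2, 0), (1, 0, 2), (2, 0, 1), (2, 1, 0)]

def decode(index):
    """Return (perm, negs) with negs as 0/1 flags for X, Y, Z."""
    neg_z = (index ^ (index >> 1) ^ (index >> 2)) & 1
    return PERMS[index >> 2], (index & 1, (index >> 1) & 1, neg_z)

def face_preimage(index, face):
    """Which face lands on `face` after the rotation? Direct lookup."""
    axis, neg = face >> 1, face & 1
    perm, negs = decode(index)
    return (perm[axis] << 1) | (neg ^ negs[axis])

def face_image(index, face):
    """Where does `face` go under the rotation? Needs the reverse lookup."""
    axis, neg = face >> 1, face & 1
    perm, negs = decode(index)
    new_axis = perm.index(axis)
    return (new_axis << 1) | (neg ^ negs[new_axis])
```

For the +Z quarter-turn (index 13 under these assumptions), the +X face (0) goes to +Y (2) and the +Y face goes to −X (1).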

Even though it mostly doesn't map meaningfully for the operations we're performing, I also just kind of love that the identity rotation is represented as 0.

Further directions

(or, that's what I'd call this section if I were trying to pretty it up for a journal)

There's a bunch of obvious micro-variations on this idea that I haven't really explored. Moving the parity bit to the bottom would make using a 4-bit-per-element lookup table for the composition operator easier, but looking up the permutation would become harder because its index would be split up; that could be an improvement overall, especially if you're keeping the pre-expanded objects around per above, but it maybe loses some elegance along the way. Keeping the Z sign instead of the parity is another alternative, which to me feels worse at a glance, but I could see other people having different intuitions there. Whether to index on the forward or reverse permutation of the axes makes a difference in the accesses when transforming vectors, and if you have hardware support for swizzling that wants a certain format, then maybe you want to align with that. Same thing for whether to model the negations as applied before or after the permutation.

In general, if I wanted to know performance differences between micro-level implementation variations, I'd want to actually measure, which I haven't. My baseline guess would be that if you're also doing other 3D geometry manipulation, then most of the stuff in here is cheap enough that it gets overwhelmed, and you might just fuse a given rotation into your existing matrix math before the expensive part happens elsewhere anyway, but some use cases might care.

But mostly I just went “goodness, that's a nice dense numbering for those that has a bunch of cute properties in terms of what you can do with binary integer manipulation”. So there it is. If you've seen it before, please do tell me where! I expect most of my ideas to not be original in an absolute sense, after all…

dasyatid1: “delta prime” (Default)

I find myself wanting an offline traditional word processor. For some documents where other people are involved I've been using Google Docs's word processor, and I actually quite like it from an interface standpoint—but I'd really rather not be locked into Google for other local documents. The most important thing this hinges on seems to be document style-set support, which is also congruent with how I write HTML sometimes. In general I care about very solid support for WYSIWYG rich text here, including HTML-like structure, rich inline styles, and proportional typesetting, which mostly excludes my otherwise-lovely Emacs from being suitable unless there's a really good mode I don't know about.

Things I've tried, all on my Arch Linux desktop:

LibreOffice Writer has horrifically distracting UI flickering when I move my cursor over the formatting toolbar. Toggling the hardware acceleration option in Tools → Options → LibreOffice → View does nothing. I think the schema on the style support looks workable, but when I open the Styles and Formatting sidebar, that flickers too, and it's actually bad enough that I can't find out. It looks like bug #112889 aka “Flickering-UI” might be related, but the dependencies of it (which seem to be sub-bugs / independent reports) often talk about this only happening when OpenGL rendering is off, and if that's the same as the hardware acceleration toggle then that doesn't fix it for me.

In Calligra Words, when I open the style manager, the zillion bibliography and contents styles cluttering things are a relatively minor barrier, but all the styles mysteriously displaying with a parent style of “Alphabetical List” and a next-paragraph style of whatever the dialog selected first is not confidence-inspiring. Spacing is in centimeters, which in most contexts I like, but I actually do typography in points and inches and I care about matching base line spacing… oh, wait, I just saw in the “Customize” dialog that it's configurable, but it doesn't take its default from my LC_MEASUREMENT or LC_PAPER settings? Ergh. Also the font selection is pretty hard to navigate, which doesn't seem uncommon for desktop FDO/Linux applications, but still ergh.

Things I haven't fully tried yet:

AbiWord seems like it might actually be workable, though the cluttered nested dialogs for style editing are awkward. This is kind of a surprise to me, because I'd expected AbiWord to get the least development support over the last decade or so. Or maybe that's why it's like that?

Some sites suggest WPS Writer, which I hadn't previously heard of… proprietary, from a vendor of unclear trust.

I was hoping an HTML editor like Bluefish might work, but at a glance it doesn't seem to be WYSIWYG.

Any thoughts on how to best support this use case? I'd almost be tempted to go Microsoft, but last I checked they might be expensive and have gone cloudpusher too and in any case won't work well on FDO/Linux. Proprietary software is dispreferred but potentially okay.

dasyatid1: “delta prime” (Default)

“We shouldn't have to use this extra binding, but CFFI's setf expander for mem-ref does a get-setf-expansion on the pointer argument to mem-ref. If we quote the pointer value, Clozure CL returns a bogus expansion whose getter evaluates to a quoted gensym, presumably because it's defaulting to treating the compound form as though it were a function call and trying to reproduce the “call” with its “argument” evaluated once. If we don't quote the pointer value, then it's not valid to pass the pointer to get-setf-expansion in the first place, I think, so it more rightfully bombs. This seems to imply that this is both a CFFI bug and a Clozure bug, if I'm reading the HyperSpec right.”

I might expand on this in another post later. SBCL actually does the same thing, and I'm not sure if it's required as a weird interaction in the spec, even. Hmm.

dasyatid1: “delta prime” (delta prime)

In the current function body of com.dasyatidae.sidereal.sdl2-cffi:make-basic-pixel-format: “There's a bit of a clash here. The way the values are defined in the header file uses bitfields like this, but not all combinations that you'd expect to be valid work. In particular, there's the endianness-dependent SDL_PIXELFORMAT_RGBA32 and its companions. You'd think from the name that this maps to :array-u8 :rgba, but in fact they're defined within the fuller enum as packed types within a uint32, and functions like SDL_PixelFormatEnumToMasks don't recognize a synthesized array version even though they work on SDL_PIXELFORMAT_RGB24 which is defined as an array pixel format. But it'd be horrible to replicate the entire valid list, too. Bummer.”

Really it may yet just be more authentic to replicate the entire valid list, but…
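
For illustration, here is the clash spelled out numerically in Python. The bit layout mirrors the SDL_DEFINE_PIXELFORMAT macro, and the enum values are as I recall them from SDL 2's SDL_pixels.h, so treat them as assumptions and check the header before relying on them. The Python names are mine.

```python
import sys

# Enum values recalled from SDL_pixels.h (assumptions; verify against the header).
PIXELTYPE_PACKED32 = 6
PIXELTYPE_ARRAYU8 = 7
PACKEDORDER_RGBA = 4
PACKEDORDER_ABGR = 7
PACKEDLAYOUT_8888 = 6
ARRAYORDER_RGB = 1
ARRAYORDER_RGBA = 2

def define_pixelformat(type_, order, layout, bits, bytes_):
    """Same bit layout as the SDL_DEFINE_PIXELFORMAT macro."""
    return (
        (1 << 28) | (type_ << 24) | (order << 20)
        | (layout << 16) | (bits << 8) | bytes_
    )

# A genuine array format: this one is in the enum.
RGB24 = define_pixelformat(PIXELTYPE_ARRAYU8, ARRAYORDER_RGB, 0, 24, 3)

# RGBA32 is an alias for one of two packed formats, chosen by byte order.
RGBA8888 = define_pixelformat(PIXELTYPE_PACKED32, PACKEDORDER_RGBA, PACKEDLAYOUT_8888, 32, 4)
ABGR8888 = define_pixelformat(PIXELTYPE_PACKED32, PACKEDORDER_ABGR, PACKEDLAYOUT_8888, 32, 4)
RGBA32 = RGBA8888 if sys.byteorder == "big" else ABGR8888

# What you'd synthesize from (:array-u8 :rgba): a value nothing in the enum uses.
SYNTHESIZED = define_pixelformat(PIXELTYPE_ARRAYU8, ARRAYORDER_RGBA, 0, 32, 4)

print(hex(RGB24), hex(RGBA32), hex(SYNTHESIZED))
```

The synthesized value differs from both packed candidates in the type and order fields, which would be consistent with the mask-conversion functions not recognizing it.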

dasyatid1: “delta prime” (delta prime)

As part of trying to regain some active affinity for my field (that having been in a difficult place lately for various reasons), I've been poking around at seeing if I can access SDL 2 from Common Lisp. The closest I've seen thus far is Gary Hollis's cl-sdl2 repository, but it's quite incomplete and has no license attached, and its design as far as putting Lispier interfaces on top is far enough from what I'd be looking for that I decided to explore building on CFFI myself instead.

One thing I've run into while translating declarations from the C header files is audio callbacks. Traditionally, SDL runs its audio processing on its own thread (or an unspecified thread, at least) and uses a callback-based approach similar to JACK or (from what I am told) Apple's Core Audio. The application is given an empty or full buffer for playback or capture respectively and is expected to provide or consume the audio data at the pleasure of the audio clock.

Now, like many current Lisp users, I'm running SBCL, and while CFFI-on-SBCL supports defcallback forms for making C-callable function pointers on many platforms, unfortunately, §13.9 of the SBCL manual states that running Lisp code is only supported on threads started from Lisp. So it's fine for synchronous callbacks, but we can't use a CFFI callback for the audio callback in that context. What alternatives come to mind?

  1. Use the newer SDL_QueueAudio and SDL_DequeueAudio functions that are intended to allow applications to do read/write style audio I/O from the main thread. This would be the most straightforward, but it feels instinctively iffy to me from an audio processing standpoint, and specifically I don't see a great way to get high-water/low-water notifications into the SDL event loop; the closest thing seems to be polling SDL_GetQueuedAudioSize.

  2. Write a callback in C and have it signal a Lisp-created thread with the buffer pointer, which does the audio processing and then signals back. There are two big extra complexity sources here. One is portable synchronization, though we could potentially use the SDL threading functions themselves for that. The other is getting the callback into the running process in the first place; bringing in C build system considerations is the opposite of what I was hoping for with CFFI bindings. A variation of this would be to make an audio queue similar to (1) but add richer main-loop synchronization using SDL_PushEvent.

  3. Switch to a CL implementation that allows calling into Lisp from foreign threads. This also has to be in a way that doesn't involve intense setup/teardown overhead or having to do something from the foreign thread prior, since I don't control the actual thread arrangement. It looks like Clozure CL might support this; the manual mentions “when native threads that aren't created by Clozure CL first call into lisp”, which implies that it's possible. ECL also seems like it might allow this, but its FFI and performance characteristics were somewhat weaker last I recall, and the manual didn't obviously state outright whether it would work. I vaguely recall having tried and failed to get Clozure CL to install on my machine before, so that could be its own rathole.

  4. Same as (2) except rather than writing in C, I do jiggery pokery to assemble a small machine-code shunt from Lisp and use that as the callback function. This avoids the C build system issue at the cost of being absolutely wacko and also distinctly nonportable and close to unmaintainable—maybe less so if there's a GNU Lightning equivalent as a CL library somewhere?—but then syscalls, and then it all flies apart. Let's not do this (unless I decide it would make a good joke).

  5. Give up on doing the audio processing in Lisp, leaving it with the role of the ‘scripting’ language directing a GStreamer effect graph or whatever. Awkward—it would be beneficial to pull in foreign audio library code in a more general sense (SDL_mixer is a common basic thing to get started with, and then a number of games use FMOD or the like), but losing most-to-all ability to deal with the sample data would be a pain for some use cases, and now we've potentially recreated the “how does the other code communicate with Lisp re the high-level disposition of playback/capture” on another level.

  6. Give up and switch to C++. Not that I haven't done synthesis code in C++ before, but that's not what I'm after here—I'd lose my REPL, and take on heavyweight template juggling and…

  7. Same as (6) but extend the implementation of the C++ language with all the Lisp stuff I want. Problem: then I fall into the dark void shockwave and never come out, and also I might as well design my dream language while I'm doing something like that, which idea leads to perhaps thrice as much falling into the dark void shockwave and never coming out. But flip that around, and does that idea just point in the direction of Clasp? I've never tried it before, but that might be another candidate to look at for (3) that would give me much easier C++ access along the way—which can be important in the gameish/interactive-media sort of space. One big disadvantage here would be that it doesn't currently support Windows.
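
To make the handoff in (2) a little more concrete, here is a toy simulation in Python, standing in for both sides: audio_callback plays the C shim running on the foreign audio thread, and worker plays the Lisp-created thread that does the actual processing. Everything here is hypothetical scaffolding; a real version would need the C callback and portable primitives such as SDL's own semaphores or condition variables instead of Python's queue and Event.

```python
import queue
import threading

requests = queue.Queue()

def audio_callback(buffer):
    """'Foreign' side: hand the buffer over, then block until it is filled."""
    done = threading.Event()
    requests.put((buffer, done))
    done.wait()

def worker():
    """'Lisp' side: fill each buffer it is handed, then signal back."""
    phase = 0
    while True:
        item = requests.get()
        if item is None:  # shutdown sentinel
            break
        buffer, done = item
        for i in range(len(buffer)):
            buffer[i] = (phase + i) & 0xFF  # placeholder "synthesis"
        phase += len(buffer)
        done.set()

def fake_audio_thread(results, count):
    """Simulates the audio device asking for `count` buffers in a row."""
    for _ in range(count):
        buf = bytearray(4)
        audio_callback(buf)
        results.append(bytes(buf))

results = []
worker_thread = threading.Thread(target=worker)
audio_thread = threading.Thread(target=fake_audio_thread, args=(results, 3))
worker_thread.start()
audio_thread.start()
audio_thread.join()
requests.put(None)
worker_thread.join()
```

The round trip per buffer is the obvious cost: the audio thread sits blocked for two context switches on every callback, which is part of why this feels heavier than it ought to be.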

I'm not sure what'll turn out to be best yet, but seeing if I can get Clozure CL to run seems like a reasonable next thing to try…

dasyatid1: “delta prime” (delta prime)

I think I have just discovered that Alembic (the SQLAlchemy migration tool) has no facility for writing out offline SQL files for merge revisions, and I'm not sure why.

Looking at the code for my installed 1.8.1 version, the underlying “iterate up to the target from multiple heads” functionality seems like it should work just as well as it does in online mode, but the types on the way to the public interface and CLI just have no way of stuffing multiple revision IDs into the from-side of the range.

Specifically, MigrationContext.run_migrations calls get_current_heads for its starting revision set, which supports hydra revision sets in its return type and will happily return them when called on an online database in a hydra migration state. But when get_current_heads is called in offline mode, it reads the _start_from_rev attribute, which is declared elsewhere as Optional[str], and the promising-looking util.to_list call in there doesn't have any string-splitting in it, sadly. In theory, if we can get a tuple into _start_from_rev we should be golden… that comes from opts, which is ultimately passed through via EnvironmentContext.configure.

The argument to that is typed as Optional[str] as well, but it looks like maybe if I could smuggle it in through the configuration context and unpack it in a custom env.py it might sort of maybe work—but out-of-tree jiggery-pokery in violation of the declared type (much less reaching in and setting the attribute directly or something) seems risky.
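
To make the to_list point concrete, here's a minimal sketch. The to_list below is a simplified stand-in written for illustration, not Alembic's actual source, and split_start_revs is a hypothetical helper of the sort a custom env.py might use. Neither should be read as Alembic API.

```python
from collections.abc import Iterable


def to_list(x, default=None):
    """Simplified stand-in for a to_list-style helper (not Alembic's code)."""
    if x is None:
        return default
    if isinstance(x, str):
        # A string is wrapped whole; nothing splits it on commas.
        return [x]
    if isinstance(x, Iterable):
        return list(x)
    return [x]


def split_start_revs(arg):
    """Hypothetical env.py helper: turn "a,b" into ("a", "b")."""
    if arg is None:
        return None
    return tuple(part.strip() for part in arg.split(",") if part.strip())


# A comma-joined string stays a single "revision"...
print(to_list("1a2b,3c4d"))
# ...whereas a tuple built beforehand comes through as separate entries.
print(to_list(split_start_revs("1a2b,3c4d")))
```

Whether a tuple smuggled in this way survives the rest of the offline path is exactly the part that hasn't been tested.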

So assuming I don't want to do that, I need to do at least one of giving up on the offline-primary approach I wanted and just taking a runtime dep on Alembic (presumed workable, just irritating for Reasons), enforcing linear history (also probably workable, differently irritating), or poking the Alembic discussion board until something interesting pops out.

For that third option, I'll need to find the fob for my GitHub account, which I'm pretty sure is somewhere under these piles of stuff on my desk…

dasyatid1: “delta prime” (Default)

On a tiny Fedora server, judging by the journalctl output and audit2allow:

allow ejabberd_t self:process execmem;
allow ejabberd_t tmpfs_t:file write;

I know BEAM's recently gotten a JIT, so I expect that explains the first one. I'm not sure why the second wasn't already allowed by a broader policy cluster.

dasyatid1: “delta prime” (Default)

So I shifted my Arch Linux desktop system into The Future™ to use PipeWire instead of PulseAudio. And getting my X bell working again turned out to be a bit… off?

So I wrote xbell-ringer to do what I want. And in case what you want is similar enough to what I want, version 0.1.1 is now out publicly, with its tarball available from the tags download page. See the README for more details, but basically it runs as a systemd user service and hooks up the XKB bell to XDG Sound Theme events.

A question I haven't investigated yet: does it make any sense to try to get something like this upstreamed for PipeWire? I'm not sure of the original motivation for the PipeWire X11 bell module to be using libcanberra the way it does, and having all the XSETTINGS parsing in there loaded into the same address space as the audio daemon also sounds like a hairy proposition. That's why I wrote this one to be its own bridge process, but I'm guessing users who are used to tinkering with their desktop audio daemon config might expect it to be available from that vantage point?

Anyway, share and enjoy! Feel free to give feedback in comments and report any issues either there or in the Bitbucket issue tracker.

dasyatid1: “delta prime” (Default)

Happy GNU Year! I spent New Year's Eve making an Emacs theme for silly reasons. Permutation Tango is yet another of those dark themes that leans on the Tango palette. (Is it weird that people keep using it as a general-purpose palette when it was originally meant to be for an icon theme? I'm not enough of a visual artist to know.)

Then I discovered the Modus themes, which might do what I was hoping for except much better because a lot more useful work went into them, and which will be included in Emacs 28. Oh well. I do wish theme discoverability for Emacs were better—I now have a copy of the emacsthemes.com gallery on disk to maybe poke around with…

Anyway, share and enjoy if you like!

dasyatid1: “delta prime” (delta prime)

I have a tiny Fedora server which I use for some little services, including an ejabberd instance for chat. (This used to be more useful before broad XMPP adoption started drifting away, but still.) The management of it is still pretty ad-hoc after all this time.

I decided I finally wanted to get HTTP file upload tokens working in ejabberd. I'm also using acme-tiny for obtaining TLS certificates—ejabberd has some builtin ACME support nowadays too, but I'd rather keep my ACME certificates controlled in one place and not take on the complexity of nginx→ejabberd proxying for port 80.

This turned out to be a bit more of a saga than I expected.

ejabberd configuration

First I made sure the ejabberd_http entry in the listen list included a request_handlers entry pointing to mod_http_upload…

listen:
  # ...
  - 
    port: 5243
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      # ...
      "/stash": mod_http_upload

… and adjusted the mod_http_upload configuration itself to match:

modules:
  # ...
  mod_http_upload:
    host: "up.jabber.s.@HOST@"
    docroot: "@HOME@/upload/@HOST@"
    put_url: "https://up.jabber.s.@HOST@:5243/stash/"
    thumbnail: false
    max_size: 104857600

The default configuration prefixes the vhost with “upload”, but I wanted something more XMPP-specific per above.

Then I added the private keys and certificates:

certfiles:
  # ...
  - "/etc/pki/tls/private/up_jabber_s_dasyatidae_com.key"
  - "/var/lib/acme/certs/up_jabber_s_dasyatidae_com.crt"

SELinux adjustment

Unfortunately, it turns out ejabberd can't access the acme-tiny output, spitting out a log entry of “Path /var/lib/acme/certs/up_jabber_s_dasyatidae_com.crt is empty, please make sure ejabberd has sufficient rights to read it” instead. Why? Well, looking at the audit log:

type=AVC msg=audit([redacted]): avc: denied { getattr } for pid=[redacted] comm="10_dirty_io_sch" path="/var/lib/acme/certs/up_jabber_s_dasyatidae_com.crt" dev="[redacted]" ino=[redacted] scontext=system_u:system_r:ejabberd_t:s0 tcontext=system_u:object_r:var_lib_t:s0 tclass=file permissive=0

(“dirty_io_sch”, by the way, likely refers to Erlang's ‘dirty schedulers’ for handling long-running native code: see the docs for erl_nif or for the +SDio emulator flag if you want to learn about those.)
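
The interesting fields in that AVC line are the denied permission, the type inside each of the two contexts, and the object class. As a rough sketch of how they map onto a policy rule (similar in spirit to what audit2allow prints, but not a reimplementation of it), something like the following would do. The function names are made up for this example.

```python
import re

# Pull out the denied permissions, both contexts, and the object class.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+"
    r"tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)


def context_type(context):
    """SELinux contexts read user:role:type:level; return the type field."""
    return context.split(":")[2]


def avc_to_allow(line):
    """Turn one AVC denial line into an allow rule, or None if it doesn't parse."""
    m = AVC_RE.search(line)
    if m is None:
        return None
    perms = m.group("perms").split()
    perm_text = perms[0] if len(perms) == 1 else "{ " + " ".join(perms) + " }"
    return "allow {} {}:{} {};".format(
        context_type(m.group("scontext")),
        context_type(m.group("tcontext")),
        m.group("tclass"),
        perm_text,
    )
```

Fed the denial above, this yields a rule with ejabberd_t as the source and var_lib_t as the target, which is the mismatch that matters here.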

Ah, right. So the acme-tiny package doesn't define its own SELinux enforcement. The certificates currently in /etc/pki/tls/certs are of type cert_t, but the ones placed into /var/lib/acme/certs are of the base type var_lib_t. How do we handle this?

One option would be to adjust the file contexts for /var/lib/acme/certs. At a glance, it would seem to make sense for them to be cert_t. Something like that seems more sensible in the long run, but it's harder to analyze in the short run, both regarding whether acme-tiny will run into problems accessing those files as is (though I could manually analyze this by tracing through the systemd unit files and such) and whether it might conflict with later updates to the package.

(Update: it looks like the Pagure page for acme-tiny agrees that adjusting the file contexts should work, so now I do intend to switch to that. At the time I needed to get the functionality up quickly with minimal risk of breaking acme-tiny.)

I do keep SELinux in enforcing mode on this server, but based on the way Fedora configures things, I treat it as a hardening layer, not as a primary access control layer. So I'm okay with opening a somewhat wider hole in the policy than is strictly necessary until and unless I find a better longer-term answer for acme-tiny. And as it happened, I already had an ejabberd_local SELinux module for handling some other accesses that ejabberd's optional modules might need which weren't covered by the default policy. So let's just punch a larger hole for now…

module ejabberd_local 1.1;
require {
        type ejabberd_t;
        type var_lib_t; # for acme-tiny, sigh
        class file { getattr open read };
        # ...
}

# ...
allow ejabberd_t var_lib_t:file { getattr open read };

One installation later:

# I should really have a script for this...
# Wasn't there a Makefile somewhere that was supposed to help?
% checkmodule -m -o ejabberd_local.mod ejabberd_local.te
% semodule_package -o ejabberd_local.pp -m ejabberd_local.mod
% semodule -i ejabberd_local.pp
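
As for the missing script, a minimal sketch of one might look like this. It only wraps the same three commands shown above. The function names are invented for the example, and it assumes the .te file sits in the current directory and that the caller has the privileges semodule needs.

```python
import subprocess


def policy_commands(name):
    """Return the three commands, in order, for a local module called `name`."""
    te, mod, pp = (f"{name}.{ext}" for ext in ("te", "mod", "pp"))
    return [
        ["checkmodule", "-m", "-o", mod, te],
        ["semodule_package", "-o", pp, "-m", mod],
        ["semodule", "-i", pp],
    ]


def install_policy(name, run=subprocess.run):
    """Run each step in turn; check=True stops at the first failure."""
    for cmd in policy_commands(name):
        run(cmd, check=True)
```

Calling install_policy("ejabberd_local") as root would then repeat the sequence. The run parameter is there only so the steps can be dry-run or logged without touching the system.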

ejabberd can now read the certificates, and accessing the appropriate port from my browser results in valid HTTPS.

ejabberd ACL debugging

Unfortunately, it seems like I still can't upload any files as a local XMPP user. Gajim complains of “access denied by service policy” and the XML console indeed shows an appropriate forbidden element. The default is supposed to be to allow all local users, so what gives?

Maybe an explicit “access: local” in the mod_http_upload configuration block will help? Nope.

At this point I decided to go look at the ejabberd code. (At the time I looked elsewhere for unrelated reasons, but I confirmed later with the source from Fedora 34.) Omitting the output:

% dnf download --source ejabberd
% rpm2cpio ejabberd-20.07-3.fc34.src.rpm|cpio -i
% tar -zxvf ejabberd-20.07.tar.gz

In src/mod_http_upload.erl, process_slot_request/6 calls acl:match_rule/3, and then src/acl.erl shows the guts of ACL processing. Setting the log level to 5 (debug) reveals a bit more about the internal messages being passed around. So then let's see what this looks like from the Erlang shell:

% sudo -u ejabberd ejabberdctl debug
... (skipping the warning and also suppressing the input numbering below) ...
Erlang/OTP 23 [erts-11.2.2.5] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:1] [hipe]

Eshell V11.2.2.5  (abort with ^G)
(ejabberd@localhost)> acl:match_rule(<<"dasyatidae.com">>, local, {jid, <<"rtt">>, <<"dasyatidae.com">>, <<"some-resource">>, <<"rtt">>, <<"dasyatidae.com">>, <<"some-resource">>}).
deny

Whoops. Why? As it turns out, it looks like the local access rule only works in global context:

(ejabberd@localhost)> acl:match_rule(global, local, {jid, <<"rtt">>, <<"dasyatidae.com">>, <<"some-resource">>, <<"rtt">>, <<"dasyatidae.com">>, <<"some-resource">>}).
allow

From the mod_http_upload source, it looks like it's calling this in a vhost-specific context. read_acl in the acl module seems to be the key indirection here, which itself is just an ETS lookup. And indeed:

(ejabberd@localhost)> ets:lookup(acl, {local, global}).
[{{local,global},
  [{user_regexp,{re_pattern,0,1,0,
                            <<69,82,67,80,71,0,0,0,0,8,0,0,1,128,0,0,255,255,...>>}}]}]
(ejabberd@localhost)> ets:lookup(acl, {local, <<"dasyatidae.com">>}).
[]

This seems like it might be a bug somewhere; surely the ACLs are supposed to fall back to the globally defined ones if a vhost-specific one isn't available? In any case, let's try working around it for now by defining vhost-specific ACLs:

host_config:
  "dasyatidae.com":
    # ...
    acl:
      local2:
        user_regexp:
          "": "dasyatidae.com"
# ...
access_rules:
  # ...
  ## Work around weirdness with not checking 'local' ACL globally in some contexts?
  local2:
    - allow: local2

Really this seems closer to what's wanted anyway—a local user at a vhost should have access to its vhost's instances of file uploads and other auxiliary stuff, but not necessarily other vhosts'… from a brief skim through acl.erl it doesn't seem like that's ‘naturally’ supported, but I could easily be wrong.
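
For the record, here's a toy model of the lookup behaviour seen in the shell session above, next to the fallback one might have expected. It is emphatically not ejabberd's logic: the table, the function names, and the "any entry means allow" shortcut are all assumptions made to keep the example small, and it never matches an actual JID against a pattern.

```python
# Toy ACL table keyed by (acl_name, host), mirroring the ETS keys seen above.
ACL_TABLE = {
    ("local", "global"): [("user_regexp", "")],
}


def read_acl(name, host, table):
    """Plain lookup with no fallback: a missing key means an empty ACL."""
    return table.get((name, host), [])


def match_observed(name, host, table):
    """What the shell session showed: the vhost key is consulted on its own."""
    return "allow" if read_acl(name, host, table) else "deny"


def match_with_fallback(name, host, table):
    """What one might expect: try the vhost first, then the global entry."""
    specs = read_acl(name, host, table) or read_acl(name, "global", table)
    return "allow" if specs else "deny"


print(match_observed("local", "dasyatidae.com", ACL_TABLE))
print(match_observed("local", "global", ACL_TABLE))
print(match_with_fallback("local", "dasyatidae.com", ACL_TABLE))
```

Defining the vhost-specific local2 ACL amounts to adding the missing key to the table, so the no-fallback lookup finds something.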

It works!

And with that, it works. Pushing the attach-file button in Gajim and feeding it an image successfully executes the upload and spits out an ‘aesgcm’ URI which… appears to be based on an HTTPS URI but including an encryption key. A brief Web search reveals XEP-0454, which tries to integrate OMEMO with HTTP File Upload without revealing file content plaintext to the server. The file is being encrypted with a transient key before upload, and then the transient key is being sent over the OMEMO secure channel along with the file reference. Obligatory objection here to jamming crufties into the URI scheme, but Conversations on my handset also retrieves the image and displays it inline, so it seems like at least some interop works. Also, that it still leaks the filename gives me echoes of PGP leaking subject lines… sigh.
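
A sketch of pulling such a URI apart, under the assumption that the fragment is the hex-encoded IV followed by 64 hex digits of AES-256 key (which is how XEP-0454 appears to lay it out; check the XEP before relying on this). The function name and the example URI in the usage note are invented, and no decryption is attempted.

```python
from urllib.parse import urlsplit

KEY_HEX_LEN = 64  # assumed: 32-byte key, hex-encoded, at the end of the fragment


def split_aesgcm(uri):
    """Return (https_url, iv_bytes, key_bytes) for an aesgcm:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "aesgcm":
        raise ValueError("not an aesgcm URI")
    fragment = parts.fragment
    if len(fragment) <= KEY_HEX_LEN:
        raise ValueError("fragment too short to hold both IV and key")
    iv = bytes.fromhex(fragment[:-KEY_HEX_LEN])
    key = bytes.fromhex(fragment[-KEY_HEX_LEN:])
    # Same location over HTTPS, minus the secret-bearing fragment.
    https_url = parts._replace(scheme="https", fragment="").geturl()
    return https_url, iv, key
```

For a made-up URI such as aesgcm://up.example.org:5243/stash/abc/cat.png followed by a # and the hex blob, the first return value is the plain https:// location the server sees. Note that the filename is still sitting right there in the path.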

Future notes to self:

  • Is the ACL weirdness a bug in ejabberd, or is my configuration borked? This ejabberd instance has been live for quite a long time, and I wouldn't be surprised if there were some cruft in the YAML file that's no longer the recommended WTDI.
  • Is acme-tiny packaged weirdly or incorrectly in Fedora given that it doesn't integrate cert_t or seemingly have any SELinux support at all? (ls -lZ /usr/sbin/acme_tiny says it's just a bin_t too, so I presume it has no SELinux integration.)
  • Where the heck did that script for local policy module integration go? I'm sure I had one before.
  • Where do I eventually want to go with this management-wise? The mainstream nowadays feels like it's all containers, which has its own fountains of complexity, fragmentation, and centralization in the actual production lines for them. I can't say I'm actually fond of the LSB/FHS style in organization, though—besides which, traditional multi-user with globally-installed packages doesn't map so well to dominant use patterns for servers now, and anywhere between pure-manual and Ansible-style is too clumsily stateful to feel clean. DJB-style installation sadly doesn't have much adoption either. At some point I want to see if Nix is any better…

dasyatid1: “delta prime” (Default)
fflush(stdout);
return ferror(stdout) ? 1 : 0;

(Yes, many C programs don't check the error flag on output streams, likely including some of mine… sigh.)
