Podcast ep. 002 - Paul Davis on the deep rewrite of Ardour
This is the first part of a long interview with Paul Davis, founder and lead developer of Ardour, a free/libre digital audio workstation.
Hello Paul!
Hi Alexandre, how are you?
I’m fine. How are you doing?
Great, thank you!
So you recently moved to New Mexico, is what I heard…
Yeah. It’s very different from where I used to live. But enjoying a lot so far. Unfortunately the mess you can see behind me is an indication I’ve only been here a year and unpacking all this stuff hasn’t been a very high priority (laughs).
Do you get out much these days?
Not as much as I was a few months ago. I mean, we live in a remote isolated place, so the virus is not really having a lot of impact here. But I’m not really running or cycling as much as I was.
On the other hand, we have a fairly large property here, and I’m putting in solar panels and doing gardening, so I’m spending a lot of time outside.
So you are going all green? :)
Yeah. I’m hoping the panels I put in mean that we will be net zero for electricity next year. Not sure that will work out but that’s the goal.
Great, so I think I’m probably going to do everything wrong and start with a question that is unrelated to the new release. A few months ago, there was a very insightful thread in Ardour’s forum about the extensibility of digital audio workstations. If I were foolish enough to try summing it up in a few words, I guess your concern was whether making Ardour’s source code open and free to modify and distribute really benefits the vast majority of users in a way they can appreciate. The example that you used was Reaper, a non-libre commercial DAW that is so extensible that people can make very clever hacks without ever having to touch the core source code. Which they don’t have access to anyway! Now that you’ve had some feedback, what would you say is your takeaway from this discussion?
I think the response was quite interesting. There were some good points made by people coming from different perspectives with different answers to the questions I was asking.
I don’t think the discussion really changed my mind a lot or moved me closer to having a definite answer, though. I think your summary is concise and correct. I think, you know, the heart of it really came from a conversation with someone who’s very familiar with Reaper, who wanted to do various things with Ardour. People can get mildly upset that you have to do it with the source code rather than just having some script.
The conversation sort of pointed out the merits of both sides. The open-source side is really good, and if you can have incredibly expansive, powerful scripting capabilities — that’s really good too. And if you can have both of them available in the same project, then you’ve got the best of all possible worlds.
Unfortunately I think that although I have never spoken to anyone who’s involved in Reaper about this, I suspect that their decision to have the scripting be really deep and really powerful was something that they decided to do really quite early on in the program’s history. I don’t know if it was in the very beginning because I’m not even sure Lua was an option when they began. But I think relatively early they made the decision to do that and I’m not so certain that retrofitting the level of scripting that they have into a project that didn’t make that decision early… I think that might be very difficult.
So what the future holds for that question, I don’t really know right now. I know I would like to see Ardour have even more powerful scripting than what we already offer. And what we have is pretty powerful. But exactly how we would do that and what that really means — I’m not sure.
But would you stop at continuing to integrate Lua? What I see is that Robin Gareus keeps expanding the Lua API coverage. So would you stop at Lua, or would you maybe try to introduce support for FAUST scripts, for example? Because there was another interesting discussion at cdm.link, if you remember, with Artemio Pavlov of Sinevibes. And you made a very interesting point that while Korg’s SDK is kind of fun to use and easy to deploy, it’s not as powerful as FAUST. So do you see any future for FAUST in Ardour?
I haven’t really thought too much about direct integration of FAUST. I tend to think of FAUST as a language to write DSP in. So in my conception of the role that FAUST plays, it’s more to do with whether people can write plugins easily using FAUST, and then whether we can load and run those plug-ins.
The reason I say that is because Ardour itself doesn’t really do much DSP. We don’t have built-in source code that does anything like that really. The most expensive DSP operation that we do is the fader (laughs). That’s the one part of Ardour that is handwritten in assembler because it’s actually really expensive — well, sorry, that’s not the fader, that’s the metering that is written in assembler.
But Ardour itself doesn’t really do much DSP, and so the context in which most people would add DSP would be a plug-in. Now, Ardour has made it possible to write a plug-in in Lua, and I think that provides some justification for maybe saying “Well I’d like to be able to do that, but with FAUST”.
Since it can be done in such a relatively contained way, especially ‘cause FAUST has a just-in-time compiler, I think it just comes down to whether or not somebody steps up and says “Hey, I’m going to implement this”. And if they implement it, I can’t see a reason why we would say “No, we won’t merge that”.
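[ Editor’s note: for readers curious what “FAUST has a just-in-time compiler” means in practice, here is a minimal, illustrative sketch of JIT-compiling a FAUST program from C++ with libfaust’s LLVM backend, going by its documented API. The tiny gain program and all of the surrounding setup are my own example, not anything from Ardour’s code base. ]

```cpp
// Minimal, illustrative sketch: JIT-compile a FAUST program at runtime using
// libfaust's LLVM backend (link against libfaust and LLVM). The tiny stereo
// gain below is just an example program, not anything shipped with Ardour.
#include <faust/dsp/llvm-dsp.h>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // A trivial FAUST program: a 2-in/2-out gain of 0.5.
    const std::string code = "process = *(0.5), *(0.5);";

    std::string error;
    llvm_dsp_factory* factory =
        createDSPFactoryFromString("gain", code, 0, nullptr, /*target*/ "", error);
    if (!factory) {
        std::cerr << "FAUST compilation failed: " << error << "\n";
        return 1;
    }

    dsp* instance = factory->createDSPInstance();
    instance->init(48000); // sample rate

    // Process one block of silence just to show the calling convention.
    const int nframes = 64;
    std::vector<float> in_l(nframes), in_r(nframes), out_l(nframes), out_r(nframes);
    float* inputs[2]  = { in_l.data(), in_r.data() };
    float* outputs[2] = { out_l.data(), out_r.data() };
    instance->compute(nframes, inputs, outputs);

    delete instance;
    deleteDSPFactory(factory);
    return 0;
}
```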
On the other hand, again, I’m not really sure quite what types of DSP somebody would want to do in that way. Whereas I can imagine lots of reasons why somebody would wanna write an actual plug-in using FAUST. And if they do that, that doesn’t really need us to know anything about FAUST.
Okay, so you have obviously spent a huge amount of time rewriting how Ardour implements some very basic concepts like time. So I guess my question is, how happy are you with Ardour’s architecture? Is there something you think you still really need to rewrite? Or are we basically good to go for another decade or so?
I think the answer to this is… complicated. I think that the changes that Robin and I have made over the last two and a half years have addressed a lot of basic issues that have been there pretty much since the beginning of the program’s life.
We now have complete latency compensation inside the program, regardless of how you route a signal through Ardour — track to track, track to bus, aux sends… Everything will always be latency-compensated and properly aligned.
I know it sounds like something that you can just have on top of what was already there. But to do that properly really involves rethinking completely what was happening while we were processing audio. And we’ve done that and I think the basis that we have for that is hopefully good for at least a decade.
There are other changes that we’ve made. I think some of the work that I did… well, I guess it was really two separate features. One was cue monitoring, which is being able to listen to both what you’re inputting and what’s on disk at the same time. The other was wet recording, where you can actually record the processed input rather than just the raw input.
The work I did for those also involved some pretty deep changes in how things work. Specifically, the signal flow going through a track or through a bus had to be revisited, and we’ve done that, and that put us in a good place.
So most of the things we worked on as part of the build-up to 6.0, I think, put us in a good place for the next decade.
But… There are things that we didn’t work on as part of the build-up for 6.0. And those things I still have concerns about.
What things?
Well, the two most significant ones… One of them was something we hoped to include in 6.0, but we realized that we didn’t even fully agree on what the correct solution was, and so we decided to drop it. It’s how to represent and manipulate musical time.
I spent a long time working on a development branch, and it was very complex. There was a lot to think about, there is still a lot to think about. This is also connected with ideas of what you do with the tempo map.
Oh, was that the nutempo branch?
Yes, that’s the nutempo branch. And the main goals of it were… Whenever you do an operation in a particular time domain — and by time domain I mean: are you talking about musical time, like bars and beats, or audio time, like samples — say, I want to move a region a certain amount later on the timeline. If you specify that in, for example, bars and beats, then it moves exactly that number of bars and beats. Not some approximation, but exactly that number of bars and beats. If you move it a certain number of samples, then it moves exactly that number of samples. So we keep the time domain that is relevant. We stay in that time domain as much as possible.
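[ Editor’s note: a purely hypothetical sketch of the “stay in your own time domain” idea Paul describes. The names and types here are invented for illustration and are not Ardour’s nutempo code. ]

```cpp
// Purely hypothetical sketch of keeping an operation in its own time domain;
// Ardour's actual "nutempo" work is considerably more involved than this.
#include <cstdint>
#include <iostream>

enum class TimeDomain { Beats, Samples };

// A position that remembers which domain it lives in.
struct TimePos {
    TimeDomain domain;
    int64_t    value;   // beat ticks or samples, depending on domain
};

// Moving "exactly N units" later stays in the position's own domain, with no
// lossy round trip through the other one. (Converting between domains is
// where the tempo map would come in.)
TimePos move_later(TimePos pos, int64_t amount)
{
    return { pos.domain, pos.value + amount };
}

int main()
{
    TimePos region_start{ TimeDomain::Beats, 16 };  // e.g. beat 16
    TimePos moved = move_later(region_start, 4);    // exactly 4 beats later
    std::cout << "now at beat " << moved.value << "\n";
}
```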
That work did get to a fairly advanced stage — it was all done, the program would run, and there were certain bugs that got fixed because of it. But there were a number of other aspects to it that Robin and I never fully agreed on. We will, it’s not a big problem; we’ve just deferred it for the time being.
The other part of that work that is also really complex is the tempo map. Because part of the goal of all this was to allow a much more powerful tempo map, including what are known as — I believe I got this right — rubato sections, where you have essentially no fixed time. You are playing, and all of a sudden you can play in whatever time you like.
So a lot of that work has been done. A lot of the conceptual work has been done, a lot of the coding work has been done. But it’s not in 6.0.
I think that until we get that stuff into the code base, it’s going to have problems. So that’s one thing where there’s still a significant amount of work to be done, although I hope I’ve done most of it.
The other area — which I think we just don’t know about, because we haven’t really tried to do it very much — is what I sometimes call a groove-centric or beat-centric sort of workflow. This is basically the Ableton Live / FL Studio model of doing things.
And because we haven’t really tried to do that yet, I’m not completely confident that all of the data structures and the APIs and everything that we have inside of the program are where they need to be to fully support that work. I’m saying that partly based on a little project that I started and probably won’t finish.
A couple of months ago, I came across this really cool piece of hardware called the NDLR (“noodler”). It’s just a little box that generates MIDI. I think it generates four outputs: there’s a drone, there’s a chord output, and then there are two sort of melodic outputs. It’s a really cool little box that you use to generate backgrounds that you might then play over, just to mess around with ideas and see what happens.
Some ambienty stuff?
The examples I’ve heard used the most are sort of ambient, yes. You get big pads and then maybe some sort of subtle bass groove or some ostinato or arpeggio type of thing going on.
And, you know, being a programmer, I thought: well, I could spend three hundred bucks on the box or I could just write the software. Wouldn’t it be much more cost effective to just do that?
So I started working on a version of this, and part of the reason for doing it was that it was small and standalone and it was going to let me just play around with some of the issues that we have, if and when we move towards this beat-centric stuff.
And as I started playing with how I was going to do that in this little program, initially I thought: this is going to be great, I’ll do all this and then we can use it in Ardour. Then I more and more found myself thinking: “Hmmm, I’m not even really sure I know the best way of doing this in a little program, let alone the big one”.
So that also raises questions in my mind: as we move towards doing that kind of thing, I’m not quite sure that we’ve got the architecture right for it.
But I think that that would be an additive process when we do it. I don’t think it’s going to involve tearing up all the stuff we have. I think we’re going to need to add some API and some structures and stuff like that.
So I think, in general, we’re good hopefully for a decade, perhaps even more. But there are going to be some additions that will come, I hope, reasonably soon to deal with those areas.
I have a little fun story to tell for my next question. There is this new project, Olive, a non-linear video editor. The guy who started writing it understood early on that he had made some bad internal design decisions. So he started rewriting everything using the right architecture, reusing existing components like libltc, OpenTimelineIO, OpenColorIO and so on, making his software more of a glue layer between all the right libraries. And at some point, he disabled the bug tracker so that people wouldn’t bother him with feature requests. But one guy found a loophole: he created a pull request containing changes between two branches and used the comment section to ask for stuff. So what’s your experience? How much pressure did you get from users during the time of the rewrite? Especially since it’s been two and a half years since the release of version 5.12.
I would say that users have been incredibly understanding. I felt very little pressure. I can’t speak for Robin, but I suspect he feels the same way, certainly as far as Ardour users are concerned. I think people generally seem to have understood what we are doing and just stayed quiet.
People would show up regularly on IRC, the chat channel, to ask what the state of things was. But in terms of people showing up and saying “Are you going to do this, and will you please do that”, that didn’t really happen. In fact, what I have noticed is that it’s probably been about a month since we really started saying: “Hey, we’re getting very close to 6.0. File bugs, we want to hear about them.”
And that has resulted in a lot of new bug reports and feature requests. In most senses this is all good. The bug reports in particular are invaluable, because we don’t want to release this with huge things that we just missed.
The feature requests coming in now are a little difficult and challenging. If people were right in front of me, I’d be, like, “People! We’re trying to release this software! We are trying to tidy up loose ends to get this out. I do not want to talk about your incredibly cool new idea for something”.
However, that being said, it doesn’t really feel much like pressure. Part of the point of putting things in a bug tracker is that they stick around and persist, and we can come back to them in a month, two months, three months, next year, whatever.
So I would say overall that the user community has been really great and very understanding. In fact, I’m really amazed there haven’t been more threads and more comments with people saying, like, “What the hell has happened to this project? It’s been a year, it’s been two years, it’s been two and a half years since the release…”.
People seem to have understood what has been going on and made it possible for us to do that work without feeling unduly pressured by it.
So that’s been really great.
I also noticed that you bumped the monthly donation target once or twice during that period, and every time you got 100% coverage of your expenses. Which also goes to show.
Yeah, the financial side of Ardour right now, especially during this virus period… I know we are not the only project or piece of software that is experiencing a rise. Let’s just say there are a lot of people at home right now who have decided to try to make music.
Whether this is a new level or whether it will drop off as hopefully does the pandemic situation, I don’t know.
But the financial side of Ardour… I try not to be too proud of it. I try not to be proud of most things really, but it has been really remarkably successful.
It’s not successful in the way that Reaper is successful, or in the way that Ableton Live is successful — we don’t generate anything like their revenue. But in the free software context, we bring in more than a hundred thousand dollars a year without doing most of the stuff that companies are supposed to do in order to do that.
And I’m incredibly grateful to all the people, both the ones who pay one time and the subscribers, who make it possible for myself and, to a limited extent, for Robin as well to carry on viewing this as an actual job, as a thing that we do full-time rather than squeeze in around the edges. [ Editor’s note: for clarification, Robin works full time on Ardour with additional funding from Harrison Consoles. ]
So that’s really great. At the moment, if the virus situation carries on, then we face a more difficult situation. I mean, it’s a good problem, which is whether or not we actually try to hire someone in the future, probably not full-time. One of the things the program is really lacking right now, and maybe the biggest obstacle to many people using Ardour properly, is that we do not have enough good documentation and enough video tutorials.
So one of the things I’m thinking about, with the uptick in revenue that happened with the virus, is maybe trying to divert some of that to help address that situation. I think the problem is, there is so much functionality that people just don’t know about. So many things that you can do, and people have no idea they are in there. I just think it would be a benefit to all users, both the new ones and the long-standing ones, if there were just better documentation and better tutorials.
That sounds like a really good plan. Okay, let’s talk about GTK. You’ve been steadily moving away from it for the past several years. Ardour now only uses it for packing widgets, for the file dialog, and for text input. And if I remember correctly, you were going to switch to using a constraint-based layout manager like emeus in the future. Why was it necessary to start replacing GTK+ with your custom code?
Well, it’s a very long story, 20 years long in fact. There were a number of things that sort of got the ball rolling. I suspect that the most important one was the issue of the canvas, which is the thing we use to draw the editor where you have tracks and so on.
GTK never provided anything suitable for doing that. So even in the earliest versions, we had to use a separate canvas object to do that kind of stuff. We used to use something called gnome-canvas. It was one of the 5 or 6 different canvas libraries that existed for GTK. And as we used it for a while, it became clear that it wasn’t really quite what we needed. So a guy who was very involved in Ardour for several years, Carl Hetherington, took on the task of writing our own canvas object that would be tailored exactly to what we needed in the context of a DAW. So that was the first big break, in the sense that you now have a situation where, depending on how you use Ardour, 80% of what you’re looking at is not GTK. It’s a canvas object with things going on on the canvas.
The second part of it is sort of a combination of… There are two ways of talking about it really. GTK is a desktop graphical user interface toolkit, and so it provides buttons and text entries and dialogs and a bunch of other things.
The problem is that an awful lot of the things it provides are just not really that useful in the context of creative software. And worst of all, they don’t actually work the way you’d like them to.
I think the simplest example I can give of this — and it is a little bit complicated to explain — is… There’s an idea in software engineering called Model-View-Controller (MVC) programming. When you are talking about a GUI, you have a button on the screen, and you click the button. And when you click the button, the user is making a request to change the state of something: mute this track, solo this track, turn this on, turn this off. And that’s all they are doing.
It may be that the request can’t be satisfied. It may be they’re asking for something right now that is impossible. The button is also trying to display what the current state is. It’s a view, not just a controller.
And the problem with toolkits like GTK is that they just weren’t written with this idea in mind. So when you click on the button, e.g. if it’s a toggle button to turn something on and off, you click on it and it immediately toggles. It just changes its visual appearance to say “I’ve been toggled”.
But the truth is, GTK and the button don’t know whether anything has really happened. They only know that a user clicked on it.
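[ Editor’s note: a minimal, hypothetical sketch of the MVC behaviour Paul describes — the click only requests a change, and the button redraws only when the model confirms it. The class and function names are invented for illustration; this is not Ardour’s widget code. ]

```cpp
// Minimal, hypothetical MVC-style toggle: clicking only *requests* a state
// change; the button's appearance follows the model, never the click itself.
#include <functional>
#include <iostream>

struct MuteControl {                       // the model
    bool muted = false;
    std::function<void(bool)> on_changed;  // notifies views when state changes

    void request_mute(bool yn) {
        if (!can_mute()) {                 // the request may be refused
            return;
        }
        muted = yn;
        if (on_changed) { on_changed(muted); }
    }
    bool can_mute() const { return true; } // placeholder policy
};

struct MuteButton {                        // the view + controller
    MuteControl& control;

    explicit MuteButton(MuteControl& c) : control(c) {
        control.on_changed = [this](bool state) { redraw(state); };
    }
    void clicked() {
        // Do NOT flip our own appearance here; just ask the model.
        control.request_mute(!control.muted);
    }
    void redraw(bool state) {
        std::cout << "button drawn " << (state ? "on" : "off") << "\n";
    }
};

int main() {
    MuteControl control;
    MuteButton  button(control);
    button.clicked();  // appearance changes only because the model accepted it
}
```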
So we had a bunch of these widgets that, although they work very well, or certainly adequately, for regular desktop applications, don’t work if you want to use MVC. So we’re also moving away from GTK because we needed to do our own buttons and our own drop-downs and all these other things that we needed to replace to make it work.
Well, we could make the GTK ones work but it involves just stupid levels of hacking around. So the things we still need GTK for are text entry, file dialogs, menus, and tree views.
Re-implementing any one of those is a massive task. Text entry, to me, is the most subtle one. People don’t realize that. You know, you see a little box on the screen, you start typing on a keyboard, a character shows up. I mean, how complicated is that?
Even as a developer, it seems pretty obvious. You get a message that the user pressed L, so we put an L in the box. No, it just doesn’t work like that!
No, not so easy!
At all!
And you have right-to-left languages and so on. There’s just so much stuff associated with that, that to do it right… If you look at the code that does that kind of stuff in GTK, it’s just a huge blob of code. And we really don’t want to have to reimplement that.
So we are at the stage where we’d like to avoid doing new interface work using GTK stuff as much as possible. But at the same time I don’t know if anyone really wants to make a commitment to reimplement any of those four things.
So I think GTK will stick around because we need those four things. But new dialogs we’ll try to do in different ways.
The other issue with GTK, at least with the older version that we use, is that it uses what’s called a box packing model. When you’re laying out the screen, the model is just taking rectangular boxes and stacking them either vertically or horizontally. There are different ways to construct a user interface; this one is not good, not bad — it is a mechanism. There are some things it does very well, and some things it’s not very good at.
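[ Editor’s note: a small gtkmm 3 sketch of what box packing looks like in practice — rectangular boxes stacked vertically or horizontally. Illustrative only; the widgets are made up and this is not Ardour code. ]

```cpp
// Small gtkmm 3 sketch of the box-packing model: rectangular boxes stacked
// vertically or horizontally.
#include <gtkmm.h>

int main(int argc, char* argv[]) {
    auto app = Gtk::Application::create(argc, argv, "org.example.boxpacking");
    Gtk::Window window;

    Gtk::Box column(Gtk::ORIENTATION_VERTICAL);    // stack children top to bottom
    Gtk::Box row(Gtk::ORIENTATION_HORIZONTAL);     // stack children left to right

    Gtk::Button mute("Mute"), solo("Solo");
    row.pack_start(mute, Gtk::PACK_EXPAND_WIDGET); // share the row's width
    row.pack_start(solo, Gtk::PACK_EXPAND_WIDGET);

    Gtk::Label name("Track 1");
    column.pack_start(name, Gtk::PACK_SHRINK);     // only as tall as it needs
    column.pack_start(row,  Gtk::PACK_SHRINK);

    window.add(column);
    window.show_all();
    return app->run(window);
}
```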
I think GTK4 has greatly extended their box packing model because of the problems that it faces. The change I like the most is… you know, if you have a piece of text that you want to display on the screen, and you’ve got a certain area for it — imagine a long piece of text. If you’ve got lots of width, the obvious thing is to stick the text on one line. On the other hand, if the space you want to display it in is tall and narrow, you want it wrapped onto multiple lines.
So one of the things they added in the new versions of GTK is the idea of asking something: “How tall would you be, if I made you this wide?” or “How wide would you be, if I made you this tall?”. This is one of the ways that they started addressing some of the problems with box packing.
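[ Editor’s note: the “how tall would you be if I made you this wide?” question corresponds to GTK’s height-for-width negotiation. Below is a small gtkmm 3 sketch of asking a wrapping label that question at a few widths — illustrative only, since Ardour itself is still on GTK2, which lacks this API. ]

```cpp
// Small gtkmm 3 sketch of height-for-width negotiation: ask a wrapping label
// how tall it would be at several widths.
#include <gtkmm.h>
#include <iostream>

int main(int argc, char* argv[]) {
    auto app = Gtk::Application::create(argc, argv, "org.example.hfw");

    Gtk::Label label("A long piece of text that wraps differently "
                     "depending on how much width it is given.");
    label.set_line_wrap(true);

    // "How tall would you be, if I made you this wide?"
    for (int width : { 100, 300, 600 }) {
        int min_h = 0, nat_h = 0;
        label.get_preferred_height_for_width(width, min_h, nat_h);
        std::cout << "width " << width << " -> natural height " << nat_h << "\n";
    }
    return 0;
}
```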
The other model is called constraint layout. The idea has been around for a long, long time — as long as constraint programming itself. It got a lot more publicity when Apple added support for it to their own native UI.
I think they started it with iOS.
Yeah, I think they started with iOS and then brought it back to the regular UI kit on macOS. And with this, instead of stacking boxes and stuff, you basically say things like:
- This needs to be to the left of this
- This needs to occupy half of this space
- This needs to always be one pixel below
So you set all those constraints and say: “Alright, I’ve told you the rules. Figure it out”.
What I love about this solution is that with constraint packing you really can do anything. If you wanted to do per-pixel layouts the way that a few people still do, then you could do that with constraint packing. But you can also do these more creative versions of it.
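[ Editor’s note: a hypothetical sketch of what constraint declarations of the kind Paul lists might look like in code. This is not emeus’s actual API — the types and rules are invented purely to illustrate the style. ]

```cpp
// Hypothetical constraint declarations in the style Paul describes. This is
// NOT emeus's real API; the types and rules are invented purely to illustrate.
struct Widget { const char* name; };

enum class Attr { Left, Right, Top, Bottom, Width, Height };

struct Constraint {
    Widget* target;  Attr target_attr;
    Widget* source;  Attr source_attr;
    double  multiplier;
    double  constant;
};

int main() {
    Widget fader{"fader"}, meter{"meter"}, strip{"strip"};

    Constraint rules[] = {
        // "this needs to be to the left of this"
        { &meter, Attr::Right, &fader, Attr::Left,  1.0, -1.0 },
        // "this needs to occupy half of this space"
        { &fader, Attr::Width, &strip, Attr::Width, 0.5,  0.0 },
        // "this needs to always be one pixel below"
        { &meter, Attr::Top,   &strip, Attr::Top,   1.0,  1.0 },
    };

    // A solver would now be handed the rules and told:
    // "Alright, I've told you the rules. Figure it out."
    (void) rules;
}
```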
And there is a library, emeus, that exists to do this type of thing. We didn’t switch to it as part of 6.0 because the only good wrapper for using libemeus in C++ requires a newer version of C++ than the one we currently use.
I made the executive decision, which I think Robin has mostly agreed to, that after 6.0 comes out, we are going to shift to the incredibly new C++11 standard. Only 9 years old! (laughs)
But that will allow us to start using that library. And my hope is that it will allow us to write our own packer for the canvas that will let you do any constraint layout, and that means we can use a canvas to start doing new dialogs and new arrangements of things. You could imagine, for example, the mixer strips, which right now are GTK box packing — you could imagine the whole thing being a canvas, and we’ll do it with a constraint description where anything goes.
We have to go to C++ 11 first and then we can bring in this new library and then I can start playing around with, you know, how that works and what that will let us do.
Again, there’s no reasonable time frame in which we can get rid of GTK. But I think this will allow us to fulfill the goal of not relying on it for new visual interactions.
Given everything you’ve just said, if you were to start the Ardour project today, what would be your technology stack? Would you go for web stuff, like the GridSound project you’ve probably heard of? Or would you use something like DPF? What would you do?
The most obvious choice to me right now is JUCE. It’s a library that was created in the context of building applications like Ardour. It is cross-platform in a way that GTK and Qt, for example, still struggle a little bit to be.
There are incredible differences between the way things work on the Mac, Windows, and Linux. I believe JUCE handles this better but nobody I know who’s associated with the Ardour project has ever looked in great detail at JUCE.
And the one thing I know from talking to developers in other audio tech companies is that all GUI toolkits suck basically. Well, ‘suck’ is the wrong word. All of them have their own issues. And when you start using them for something as complicated as a DAW, you start running into them. So I don’t know for certain what it would be.
The other path to go down… Yeah, DPF would be a possibility. And there’s another one of Robin’s projects, Pugl, where you probably wouldn’t do GL anymore, just because GL seems to be fading away. So… whatever the cool 2D graphics layer is — Vulkan, Metal — something very thin on top of that to let you handle events, and then a bunch of widgets, buttons and blah blah blah.
The problem is, as I said, I don’t want to rewrite menus and tree views and so on. So they would need to be part of the toolkit.
I don’t know of one that quite matches that description right now. I do think that for something like a DAW… From what I know about the history of Blender, for example, pretty much what they did was say they were not going to rely on anything but GL at the bottom, and they were going to build all those things themselves. And then at some point in the history of Blender someone said “Oh my God, this has become crazy. We need to actually turn this into a real toolkit”. And I’m not sure whether they’re still actually even using that GL-based approach at this point.
But that sort of direction: we’ve got a 2D drawing API, we’ve got some mechanism for handling events, and we are just going to build everything on top of that and move away from these desktop toolkits.
That would be the other option. But, again, we did a port a long, long time ago — 18 years ago now — from GTK1 to GTK2, which is a rather similar kind of thing. That took between six months and a year. And that was when the code base of Ardour was maybe less than half the size it is now. I know that if we ever tried to port to another toolkit, it would probably take two years to have it working.
From my perspective, that would not be worthwhile. That would just be wasted time. Even though there would be some users who would look at the final result and say “Oh my god! That’s so cool! It looks so much nicer!”, for most people it would just be, like, “What have they been doing for two years?” (laughs)
Sounds about right :) So you recently had a conversation with the guy we know as Tantacrul. He’s now the head UX/UI designer at MuseScore. Any chance of beans spilling?
At the moment, there isn’t really much to say. Martin and I had a brief conversation yesterday. It was constantly interrupted by connection issues. I was out in town and was using my hotspot, and my battery ran out, which was pretty embarrassing, and then the cell phone company dropped… Anyway, we were going to do it again this morning, and other scheduling issues came up. We’re going to do it again in the near future. Maybe there’ll be some stuff to talk about then.
He’s a fantastic guy. Anybody who hasn’t seen Tantacrul’s videos on the design of Sibelius, MuseScore, and, recently, Dorico… It’s worth watching. Funny but incredibly insightful analysis of things.
And even from my brief interaction with him it’s just clear to me that he’s just an insanely smart guy when it comes to user interface design. And that’s the kind of person that — no offense to anybody who’s been involved in Ardour — but we don’t really have anyone that has the background that he has.
Even the people that we have who are good interface designers — you know, Ben Loftis at Harrison springs to mind — he has had a lot of fabulous insights, some of which we unfortunately ignored when we shouldn’t have. But I think Martin brings an even deeper understanding of how to do things.
So I’m hoping that even if it’s just, you know, informal, once-in-a-while conversations or interactions via some medium, that we might benefit a little bit from the kind of insights that he has. Even if it’s just encouraging me or somebody else to just do more user testing which is one of the things that he’s been so good at.
Would you ever do the traditional UX focus group testing?
I don’t know how we would do that… It would actually become somebody’s job — often more than one person’s job — to do that.
And I don’t think we have anyone in the community right now who could actually do that or would do that.
Well, I mean, like, hiring an agency of sorts.
Yes, but I think — I mean, one of the reasons I’ve been hitting Martin up… Even from my brief conversations, I clarified that his job at MuseScore is actually a real job, it’s a full-time gig.
I don’t know how much time he will have. But one of the things I was interested in talking to him about originally was, you know, “Hey, are you interested in doing this sort of thing for us?”. It doesn’t need to be Martin. But watch his Dorico video from a few days ago. And some of the insights that you could see being gained so easily just from watching new users trying to do stuff… I watched that and I thought: you have to be stupid not to want to gain those kinds of insights.
The second part of the interview is here.
All interviews take time. There’s always the research stage, conversations on and off the record, editing — especially in podcasts — etc. The thing that really consumed a lot of my time here was preparing the transcript for people who would rather read text. So if you enjoy the work I do, you can subscribe on Patreon or make a one-time donation with BuyMeACoffee or Wise (see here for more info). Going forward, transcripts will be available to Patreon subscribers only.