Fri, 30 Apr 2010
Re: Getting the hobbyist back
Dear Mr Neary, thanks for your thought-provoking post. I think it raises a problem we need to be aware of as Free Software matures.
Firstly, though, I would like to say that the apparent ageism in your argument isn't helpful to your point. Your comments appear to diminish the contributions of a whole generation of people. In addition, we shouldn't just be concerned with attracting young people to contribute; the same changes will likely have reduced the chances that people of all ages will get involved.
Aside from that though there is much to discuss. You talk about the changes in Free Software since you got involved, and it mirrors my observations. While these changes may have forced fewer people to learn all the details of how the system works, they have certainly allowed more people to use the software, bringing many different skills to the party with them.
I would contend that the experience of those who want to do the compilation you rate as important often parallels the experience you describe of simply using the software a few years ago. If we can change that experience as much as we have changed installation and first use, then we will empower more people to take part in those activities.
It is instructive then to look at how the changes came about to see if there are any pointers for us. I think there are two causes of the change that are of interest to this discussion.
Firstly, one change has been an increased focus on user experience. Designing and building software that serves the users' needs has made it much more palatable for people, and reduced the investment that people have to make before using it. In the same way I think we should focus on developer experience, making it more pleasant to perform some of the tasks needed to be a hobbyist. Yes, this means hiding some of the complexity to start with, but that doesn't mean that it can't be delved into later. Progressive exposure will help people to learn by not requiring them to master the art before being able to do anything.
Secondly, there has been a push to make informed decisions on behalf of the user when providing them with the initial experience. You no longer get a base system after installation, upon which you are expected to select from the thousands of packages to build your perfect environment. Neither are you led to download multiple CDs that contain the entire contents of a distribution, much of which is installed by default. Instead you are given an environment that is already equipped to do common tasks, where each task is covered by an application that has been selected by experts on your behalf.
We should do something similar with developer tools, making opinionated decisions for the new developer, and allowing them to change things as they learn, similar to the way in which you are still free to choose from the thousands of packages in the distribution repositories. Doing this makes documentation easier to write, allows for knowledge sharing, and reduces the chances of paralysis of choice.
There are obviously difficulties with this, given that the choice of tool that one person makes on a project often dictates or heavily influences the choices other people have to make. If you choose autotools for your project then I can't build it with CMake. Our development tools are important to us as they shape the environment in which we work, so there are strong opinions, but perhaps consistency could become more of a priority. There are also things we can do with libraries, format specifications and wrappers to allow choice while still providing a good experience for the fledgling developer.
Obviously as we are talking about free software the code will always be available, but that isn't enough in my mind. It needs to be easier to go from code to something you can install and remove, allowing you to dig deeper once you have achieved that.
I believe that our effort around things like https://dev.launchpad.net/BuildBranchToArchive will go some way to helping with this.
Fri, 09 Apr 2010
Summer of Code Student Application Deadline
The deadline for students to submit their applications to Google for Summer of Code is imminent.
If you were waiting for the last minute to submit, that is now!
If you are a mentor and have the perfect student you have been working with, check with them that they have submitted the application to Google, otherwise you will be stuck.
Next week we'll start to process the huge number of applications that we have for Ubuntu.
Thu, 08 Apr 2010
Caution: python-multiprocessing, threads and glib don't mix
If you don't want to read this article, then just steer clear of python-multiprocessing, threads and glib in the same application. Let me explain why.
There's a rather famous bug in Gwibber in Ubuntu Lucid, where a gwibber-service process will start taking 100% of the CPU time of one of your cores if it can. While looking into why this bug happened I learnt a lot about how multiprocessing and GLib work, and wanted to record some of this so that others may avoid the bear traps.
Python's multiprocessing module is a nice module to allow you to easily run some code in a subprocess, to get around the restriction of the GIL for example. It makes it really easy to run a particular function in a subprocess, which is a step up from what you had to do before it existed. However, when using it you should be aware of how the way it works can interact with the rest of your app, because there are some possible nasties lurking there.
GLib is a set of building blocks for apps, most notably used by GTK+. It provides an object system, a mainloop and lots more besides. What we are most interested in here is the mainloop, signals, and thread integration that it provides.
Let's start the explanation by looking at how multiprocessing does its thing. When you start a subprocess using multiprocessing.Process, or something that uses it, it causes a fork(2), which starts a new process with a copy of the program's current memory, with some exceptions. This is really nice for multiprocessing, as you can just run any code from that program in the subprocess and pass the result back without too much difficulty.
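As a rough illustration of that copy of memory, here is a minimal sketch. It assumes a platform where the "fork" start method is available (Linux, for example) and asks for it explicitly rather than relying on the default; the names state, report and run_child are made up for the example.

```python
import multiprocessing

# Runtime state in the parent. It is changed after import, so the child
# can only see the new value if it really got a copy of the parent's memory.
state = {"value": "unset"}


def report(queue):
    # Runs in the child process: reads the copied state and passes it back.
    queue.put(state["value"])


def run_child():
    # Ask for fork explicitly; this is the fork-without-exec behaviour
    # described above (an assumption about the platform, not a given).
    ctx = multiprocessing.get_context("fork")
    state["value"] = "set in parent"
    queue = ctx.Queue()
    child = ctx.Process(target=report, args=(queue,))
    child.start()
    result = queue.get(timeout=10)
    child.join()
    return result
```

Calling run_child() should hand back the value that the parent set just before starting the child, without that value ever being passed as an argument.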
The problems occur because there isn't an exec(3) to accompany the fork(2). This is what makes multiprocessing so easy to use, but it also means there is no clean process boundary between the processes. Most notably for this example, it means the child inherits the file descriptors of the parent (critically even those marked FD_CLOEXEC).
The other piece to this puzzle is how the GLib mainloop communicates between threads. It requires some mechanism where one thread can alert another that something of interest happened. When you tell GLib that you will be using threads in your app, by calling g_thread_init (gobject.threads_init() in Python), it creates a pipe that glib uses to alert other threads. It also creates a watcher thread that polls one end of this pipe so that it can act when a thread wishes to pass something on to the mainloop.
The final part of the puzzle is what your app does in a subprocess with multiprocessing. If you purely do something such as number crunching then you won't have any issues. If however you use some glib functions that will cause the child to communicate with the mainloop then you will see problems.
As the child inherits the file descriptors of the parent it will use the same pipe for communication. Therefore if a function in the child writes to this pipe then it can put the parent into a confused state. What happens in gwibber is that it uses some gnome-keyring functions and that puts the parent into a state where the watcher thread created by g_thread_init busy-polls on the pipe, taking up as much CPU time as it can get from one core.
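To make the file descriptor point concrete, here is a small sketch using plain os.fork() and os.pipe() rather than multiprocessing or glib. It assumes a POSIX system and Python 3.4 or later, where os.pipe() marks both ends close-on-exec; the function name is made up for the example.

```python
import os


def write_from_forked_child():
    # On Python 3.4+ both ends of the pipe are created close-on-exec
    # (non-inheritable), which is the FD_CLOEXEC case mentioned above.
    read_fd, write_fd = os.pipe()
    cloexec = not os.get_inheritable(write_fd)

    pid = os.fork()
    if pid == 0:
        # Child: no exec has happened, so the descriptor is still open
        # and refers to the very same pipe as in the parent.
        try:
            os.write(write_fd, b"from child")
        finally:
            os._exit(0)

    # Parent: wait for the child, then read what it wrote into our pipe.
    os.waitpid(pid, 0)
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    return cloexec, data
```

The flag only takes effect when an exec happens, so a forked child can still write into a pipe that the parent believes is private.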
In summary, you will see issues if you use python-multiprocessing from a thread and use some glib functions in the children.
There are some ways to fix this, but no silver bullet:
- Don't use threads, just use multiprocessing. However, you can't communicate with glib signals between subprocesses, and there's no equivalent built into multiprocessing.
- Don't use glib functions from the children.
- Don't use multiprocessing to run the children; instead exec(3) a script that does what you want. This isn't as flexible or as convenient, though.
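The last option above can be sketched with the subprocess module, which does a fork followed by an exec. This is only an illustration under assumptions: it runs an inline script with the current interpreter instead of a real worker script, and CHILD_CODE and fd_state_in_exec_child are names invented for the example.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for "a script that does what you want": it just
# checks whether the descriptor number it was given is usable.
CHILD_CODE = """
import os, sys
fd = int(sys.argv[1])
try:
    os.write(fd, b"x")
    print("inherited")
except OSError:
    print("closed")
"""


def fd_state_in_exec_child():
    read_fd, write_fd = os.pipe()
    try:
        # subprocess execs a fresh interpreter; by default it also closes
        # descriptors above 2 in the child, so the pipe does not leak.
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE, str(write_fd)],
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return result.stdout.strip()
```

The price is the one mentioned in the list: the child starts from a clean slate, so anything it needs has to be passed in explicitly (arguments, stdin, files) rather than simply being there in memory.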
It may be possible to use the support for different GMainContexts for different threads to work around this, but:
- You can't access this from Python, and
- I'm not sure that every library you use will correctly implement it, and so you may still get issues.
Note that none of the parties here are doing anything particularly wrong; it's a bad interaction caused by some decisions that are known to cause issues with concurrency. I also think there are issues when using DBus from multiprocessing children, but I haven't thoroughly investigated that. I'm not entirely sure why the multiprocessing child seems to have to be run from a non-main thread in the parent to trigger this; any insight would be welcome. You can find a small script to reproduce the problem here.
Or, to put it another way, global state bad for concurrency.