Thomas Lumley InteRview

Survey analysis in R, community contributions, and more.

Mauricio “Pachá” Vargas S.


April 9, 2020

Today I interviewed Dr. Thomas Lumley, the creator of survey::. While I recover from hand surgery, I’ll be doing some interviews that I record and then transcribe with the help of software. Also, this is the first time I’ve used my Jitsi server image on DigitalOcean outside of testing, and it worked well for videoconferencing with a person in New Zealand.


1. Why do you use R? R is a language for statistics and it’s open. Unlike other software, it has the advantage that you know how the computation is actually done and you can explore the code. With some other programs, when you see that the output differs from what you get with R, there’s no way to know why, because there is no way to see what the program is doing.

2. Do you consider reproducible research a gold standard or an impossible dream? Reproducible research is a method, and it’s good to be very transparent about the results you are providing. But you have to be careful: when something is reproducible, it doesn’t necessarily mean that the methodology and other important aspects of a study are correct.

3. How did you get the idea of creating survey::? When I started analyzing surveys with R, I realized that some tasks were really difficult to do, while the same tasks could be done more easily with other software.

4. Do you feel a sense of responsibility after realizing that survey:: has many users? Yes. I implement new features and corrections when people ask for them. I’m thankful to see that users who know much more than I do about surveys are using the package. For example, I got valuable feedback from the Australian Bureau of Statistics, and they even improved some functions to speed up the computation of estimates with large datasets.

5. Many survey:: users write loops or have their tricks to compute degrees of freedom. Do you think the package can be improved to avoid that? By default, the package uses the general case. Degrees of freedom for complex designs are more of an open problem than something where you can say “this is the right way”; there is no definitive consensus in the literature.
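To make the “general case” concrete, here is a minimal sketch of how the design degrees of freedom can be read from a design object and passed on to a confidence interval. The dataset (`apiclus1`, shipped with survey::) and the design specification are my own choices for illustration and were not part of the interview.

```r
library(survey)

# Example data shipped with survey::. apiclus1 is a one-stage cluster
# sample of California schools, with school districts as clusters.
data(api)

# Declare the design: districts (dnum) are the primary sampling units,
# pw holds the sampling weights and fpc the number of districts in the population.
dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Default design degrees of freedom: number of PSUs minus number of strata.
df_design <- degf(dclus1)

# Estimate the mean of api00 and compare two confidence intervals:
# the default normal-based one (df = Inf) and a t-based one that uses
# the design degrees of freedom.
est <- svymean(~api00, dclus1)
ci_normal <- confint(est)
ci_t <- confint(est, df = df_design)
```

With few clusters, the t-based interval is wider than the normal-based one. If a different rule for the degrees of freedom suits your design better, you can pass your own value through the `df` argument instead of `degf()`.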

6. I didn’t find a BugReports field in the survey:: DESCRIPTION file. How do you handle contributions and suggestions? At some point I thought about creating a GitHub repository, but I need to find the time to learn all the details to get the most out of it. So far, the package website and direct communication have worked well.

7. How do you test your code and what’s your opinion about testthat:: and continuous integration? I test my code in very classic ways. I have used testthat::, which is an excellent package, and I have tried GitHub Actions to explore its potential.

8. What are the future plans for survey:: and which contributions would you like to see from other users? If users become interested in extending survey:: to work with more regression methods, that would be fantastic. In the past I received many suggestions and improvements that I included in the package, not just from the Bureau of Statistics, and I’m happy to see all this interest in contributing to open software.

9. Is performance something relevant in survey::? Performance is important, but it is not the most important characteristic in my opinion. If a user writes to me asking for faster functions, I’ll work on it, but the most important thing is the correctness and stability of the code.

10. Which advice would you give to people starting to write R packages? It is O.K. to have warnings in the check results when you start writing packages; don’t focus too much on them. When you write functions, start with particular cases, and once you have code that is rock solid, try to generalize your functions.