
Two other patterns for organising R projects


R is an immensely flexible language. There is usually more than a couple of ways to do anything in R, and over 6 years of working with it I have developed two unique ways to organise R code. At least, I haven't seen anyone else use them.

Anonymous function

Use a nameless function to return an object containing all the sub-functions needed, similar to a node.js package. The main advantage is that you don’t litter your default namespace with a lot of function names. For example, suppose your arith.r file contains,

(function() {
  arith <- list()
  arith$square <- function(a) { return(a * a) }
  arith$cube <- function(a) { return(arith$square(a) * a) }
  return(arith)
})()

In another script/project, you can source this file and store the individual functions in an object. This is done by asking for its “value”, as shown below,

> arith <- source("arith.r")$value

After this, you can call the functions like so,

> arith$square(2)
 [1] 4

> arith$cube(3)
 [1] 27
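By default source() evaluates the file in the global environment, so a module that accidentally assigns a top-level variable would still leak it. A minimal sketch of one way to guard against that, assuming you are happy to pass a fresh environment through the local argument of source(); the temporary file only stands in for arith.r so that the snippet runs on its own:

```r
# Write the module to a temporary file so the sketch is self-contained;
# in a real project this would simply be arith.r on disk.
module_path <- tempfile(fileext = ".r")
writeLines(c(
  "(function() {",
  "  arith <- list()",
  "  arith$square <- function(a) a * a",
  "  arith$cube <- function(a) arith$square(a) * a",
  "  arith",
  "})()"
), module_path)

# Evaluate the file in a throwaway environment and keep only the returned value.
arith <- source(module_path, local = new.env())$value

arith$square(2)
arith$cube(3)
```

Anything the file defines outside the anonymous function ends up in the throwaway environment instead of your workspace.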

This gives us a rudimentary way of creating tiny custom packages. Note that the object returned from the anonymous function can carry state (variables internal to the function), which makes the function similar to a class definition in object-oriented programming.
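To illustrate the point about state, here is a small sketch of a module that keeps a private variable. The counter object and its increment and value functions are hypothetical names chosen for this example, not part of arith.r:

```r
# Hypothetical stateful module, built with the same anonymous-function pattern.
counter <- (function() {
  count <- 0                    # private state, only reachable through the closures below
  self <- list()
  self$increment <- function() {
    count <<- count + 1         # <<- updates `count` in the enclosing function
    invisible(count)
  }
  self$value <- function() count
  self
})()

counter$increment()
counter$increment()
counter$value()
```

The variable count never appears in the calling namespace; it lives in the environment of the anonymous function and is shared by the functions stored in the returned list.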

Rscript environment

You can read from standard input and write to standard output on linux/unix from Rscript. This helps immensely when millions of small files need to be processed in parallel. We can make small Rscripts which read from stdin and write to stdout, chain them together with pipes, and run the pipeline on multiple files/streams.
For example, create a “square.r” file containing,

#!/usr/bin/env Rscript
suppressMessages(library('tidyverse'))
# header=TRUE so the first line of input is used as column names
read.table(file('stdin'),sep=",",header=TRUE) %>%
  mutate(square=value*value) %>%
  format_csv %>%
  cat

“cube.r” file containing,

#!/usr/bin/env Rscript
suppressMessages(library('tidyverse'))
# header=TRUE so the "value" and "square" columns from square.r are recognised
read.table(file('stdin'),sep=",",header=TRUE) %>%
  mutate(cube=square*value) %>%
  format_csv %>%
  cat

data.csv containing,

value
1
2
4

Now make both scripts executable (chmod +x square.r cube.r) and apply them to the csv in sequence by doing,

$ cat data.csv | ./square.r | ./cube.r

This will return,

value,square,cube
1,1,1
2,4,8
4,16,64
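If loading tidyverse for every small file is too slow, the same kind of filter can be sketched with base R alone. The helper name add_square is made up for this example, and an in-memory text connection stands in for stdin so that the snippet runs anywhere; inside an Rscript you would pass file("stdin") instead:

```r
# Hypothetical helper (not from the scripts above): reads CSV from `input`,
# appends a `square` column and writes CSV to `output`.
add_square <- function(input, output = stdout()) {
  df <- read.csv(input)                       # read.csv uses header = TRUE by default
  df$square <- df$value * df$value
  write.csv(df, output, row.names = FALSE, quote = FALSE)
  invisible(df)
}

# In a real filter script this would be: add_square(file("stdin"))
con <- textConnection("value\n1\n2\n4")
result <- add_square(con)
close(con)
```

Keeping the logic in a function that takes connections makes the filter easy to try interactively before wiring it into a pipe.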

The advantages of this pattern are,

That concludes our post. Hope this is helpful for people using R for their research.


