[view raw Rmd]

One week ago, the R Consortium ISC announced the second round of ISC Funded Projects under the 2017 edition (and the opening of the Spring 2018 call). As you may know, this program provides financial support for projects that enhance the infrastructure of the R ecosystem or which benefit large segments of the R Community. This second round includes Refactoring and updating the SWIG R module, proposed by Richard Beare; Future Minimal API: Specification with Backend Conformance Test Suite, proposed by Henrik Bengtsson; An Earth data processing backend for testing and evaluating stars, proposed by Edzer Pebesma, and our Quantities for R proposal, which was supported by Edzer Pebesma.

Quantity Calculus for R vectors

As we stated in our project presentation,

The units package has become the reference for quantity calculus in R, with a wide and welcoming response from the R Community. Along the same lines, the errors package integrates and automatises uncertainty propagation and representation for R vectors. A significant fraction of R users, both practitioners and researchers, use R to analyse measurements, and would benefit from a joint processing of quantity values with errors.

This project not only aims at orchestrating units and errors in a new data type, but will also extend the existing frameworks (compatibility with base R as well as other frameworks such as the tidyverse) and standardise how to import/export data with units and errors.

Our long-term goal is to build a robust architecture following the principles established by David Flater in his Architecture for Software-Assisted Quantity Calculus. As this technical note states, there are many software libraries and packages that implement “quantities with units” in many languages, but they differ in how they address several issues and uncertainty (if they deal with it at all). Regarding the latter, there are few but notable examples, such as the Wolfram’s closed-source units framework, and the C++ measurement class included in Boost.Units. Building on the existing units and errors packages, the new quantities package will provide a unified framework to consistently work with both, units and errors, in R.

First steps

To this end, the r-quantities organisation on GitHub serves as a hub for all the related packages, such as the existing CRAN packages units, errors and constants, as well as the new quantities R package. This division becomes an advantage, because it enables separate development and maintenance of each distinct feature. But at the same time, these packages required many changes to play nicely together. The integration stage required 14 PR on units that Edzer carefully revised and merged, as well as some changes on errors. Nonetheless, we still have to learn all the cornerstones that must be preserved to further enhance them in the future without breaking the work done.

This process has led us to some interesting challenges. The first one had to do with S3 method dispatching of generics that accept a variable number of arguments through dots. More especifically, it was about the concatenation method c(...), and the issue arises when you need to modify some arguments (i.e., convert units) and forward the dispatch to the next method in the stack (errors). This problem is fully explained in this repository, examples included, and apparently this is not possible in general. Fortunately, we found a workaround (included in the repo) that reinitialises the dispatch stack by calling the generic again if any argument was modified, and finally calls NextMethod cleanly.

The other challenge had to do with rbind and cbind. These are S3 generics, but they are special in a way: as the documentation states, method dispatching is not done via UseMethod, but by C-internal dispatching. This fact poses a serious obstacle if you need to rely on other S3 method. The final solution required to retrieve it using getS3method and a local assignment to override the generic (here, for those interested) and forward the dispatch.

First working prototype

A first working prototype of quantities can be found on GitHub. To test it, also development versions of units and errors are required. They can be installed using devtools or the remotes package:

remotes::install_github(paste("r-quantities", c("units", "errors", "quantities"), sep="/"))

There are three main functions: quantities<- and set_quantities, to set and convert measurement units and errors on R vectors, arrays and matrices, and quantities, to retrieve them.

library(quantities)

## Loading required package: units

## Loading required package: errors

set.seed(1234)

# time
t_e <- rnorm(10, 0, 0.01)
t_x <- 1:10 + t_e
quantities(t_x) <- list("s", 0.01)
t_x

## Units: s
## Errors: 0.01 0.01 0.01 0.01 0.01 ...
##  [1] 0.9879293 2.0027743 3.0108444 3.9765430 5.0042912 6.0050606 6.9942526
##  [8] 7.9945337 8.9943555 9.9910996

# position
xb <- (1:10)^3
x <- set_quantities(xb + abs(rnorm(10, 0, xb * 0.01)) * sign(t_e), m, xb * 0.01)
x

## Units: m
## Errors: 0.01 0.08 0.27 0.64 1.25 ...
##  [1]   0.9952281   8.0798709  27.2095886  63.9587464 126.1993676
##  [6] 216.2382167 341.2472374 507.3346795 722.8970185 975.8416482

From this point on, you can operate normally with these vectors as if they were plain numeric vectors.

# non-sensical operation
x + t_x

## Error: cannot convert s into m

# speed
t_v <- (t_x[-1] - diff(t_x) / set_quantities(2))
v <- diff(x) / diff(t_x)
v

## Units: m/s
## Errors: 0.1255988 0.3858872 0.9099201 1.6004630 2.7990976 ...
## [1]   6.98101  18.97657  38.05448  60.56018  89.96963 126.37488 166.04077
## [8] 215.60076 253.77087

# acceleration
t_a <- t_x[-c(1, length(t_x))]
a <- diff(v) / diff(t_v)
a

## Units: m/s^2
## Errors: 0.4496879 1.0574082 1.8883107 3.2173529 5.3458905 ...
## [1] 11.85968 19.33145 22.57969 28.99600 36.58889 39.87578 49.55744 38.23576

A certain class hierarchy is set and maintained in order to ensure a proper dispatch order. If units or errors are dropped, the object falls back to be handled by the corresponding package. Furthermore, compatibility methods are provided (units<-.errors and errors<-.units) to be able to restore them seamlessly.

class(x)

## [1] "quantities" "units"      "errors"

u <- units(x)
e <- errors(x)

# drop units (equivalent to 'drop_units(x)')
units(x) <- NULL
class(x)

## [1] "errors"

x

## Errors: 0.01 0.08 0.27 0.64 1.25 ...
##  [1]   0.9952281   8.0798709  27.2095886  63.9587464 126.1993676
##  [6] 216.2382167 341.2472374 507.3346795 722.8970185 975.8416482

# restore them
units(x) <- u
class(x)

## [1] "quantities" "units"      "errors"

x

## Units: m
## Errors: 0.01 0.08 0.27 0.64 1.25 ...
##  [1]   0.9952281   8.0798709  27.2095886  63.9587464 126.1993676
##  [6] 216.2382167 341.2472374 507.3346795 722.8970185 975.8416482

# drop errors (equivalent to 'drop_errors(x)')
errors(x) <- NULL
class(x)

## [1] "units"

x

## Units: m
##  [1]   0.9952281   8.0798709  27.2095886  63.9587464 126.1993676
##  [6] 216.2382167 341.2472374 507.3346795 722.8970185 975.8416482

# restore them
errors(x) <- e
class(x)

## [1] "quantities" "units"      "errors"

x

## Units: m
## Errors: 0.01 0.08 0.27 0.64 1.25 ...
##  [1]   0.9952281   8.0798709  27.2095886  63.9587464 126.1993676
##  [6] 216.2382167 341.2472374 507.3346795 722.8970185 975.8416482

# drop everything (equivalent to 'quantities(x) <- NULL')
drop_quantities(x)

##  [1]   0.9952281   8.0798709  27.2095886  63.9587464 126.1993676
##  [6] 216.2382167 341.2472374 507.3346795 722.8970185 975.8416482

There are mathematical operations that are not meaningful for certain units. They drop units and issue a warning.

exp(x)

## Warning in Math.units(x): Operation exp not meaningful for units

## Errors: 2.705341e-02 2.583053e+02 1.771487e+11 3.829222e+27 8.027845e+54 ...
##  [1]  2.705341e+00  3.228816e+03  6.561062e+11  5.983160e+27  6.422276e+54
##  [6]  8.148249e+93 1.591447e+148 2.151056e+220           Inf           Inf

cos(x)

## Warning in Math.units(x): Operation cos not meaningful for units

## Errors: 0.008388831 0.077967625 0.236159688 0.577972683 0.638012419 ...
##  [1]  0.54431158 -0.22397314 -0.48472698  0.42946750  0.85993122
##  [6] -0.86195837 -0.37503495 -0.03252835  0.94581264 -0.36825300

x2 <- x^2

## Warning: In 'Ops' : non-'errors' operand automatically coerced to an
## 'errors' object with zero error

x2

## Units: m^2
## Errors:   0.01990456   1.29277935  14.69317782  81.86719534 315.49841893 ...
##  [1] 9.904789e-01 6.528431e+01 7.403617e+02 4.090721e+03 1.592628e+04
##  [6] 4.675897e+04 1.164497e+05 2.573885e+05 5.225801e+05 9.522669e+05

sqrt(x)

## Error in Ops.units(x, 0.5): units not divisible

sqrt(x2)

## Units: m
## Errors: 0.01 0.08 0.27 0.64 1.25 ...
##  [1]   0.9952281   8.0798709  27.2095886  63.9587464 126.1993676
##  [6] 216.2382167 341.2472374 507.3346795 722.8970185 975.8416482

Finally, measurements must be correctly expressed. Quantities are properly formatted individually or in data frames, and units and errors are automatically represented in base graphics.

x

## Units: m
## Errors: 0.01 0.08 0.27 0.64 1.25 ...
##  [1]   0.9952281   8.0798709  27.2095886  63.9587464 126.1993676
##  [6] 216.2382167 341.2472374 507.3346795 722.8970185 975.8416482

x[1]; x[2]; x[3]

## 1.00(1) m

## 8.08(8) m

## 27.2(3) m

data.frame(
  t = t_a, 
  x = x[-c(1, length(x))],
  a = set_units(a, km/h/s)  # conversions propagate errors too
)

##           t         x              a
## 1 2.00(1) s 8.08(8) m   43(2) km/h/s
## 2 3.01(1) s 27.2(3) m   70(4) km/h/s
## 3 3.98(1) s 64.0(6) m   81(7) km/h/s
## 4 5.00(1) s  126(1) m 100(10) km/h/s
## 5 6.01(1) s  216(2) m 130(20) km/h/s
## 6 6.99(1) s  341(3) m 140(30) km/h/s
## 7 7.99(1) s  507(5) m 180(40) km/h/s
## 8 8.99(1) s  723(7) m 140(60) km/h/s

plot(t_a, a)
abline(lm(drop_quantities(a) ~ drop_quantities(t_a)))

Next steps

There is plenty to do! Apart from adding documentation and tests, we will next focus on how to import and export data with units and errors. But to this aim, we first need to identify which are the typical formats that can be found out there, e.g.:

  • Units and errors are provided for each value, as in the table above.
  • Errors are provided for each value, but units are included in the header of the table.
  • Separate columns are provided for values and errors, and units are included in the header of the table.

Any input on this from the community would be very welcome. Also there are ongoing efforts to enhance the units package to make it work with user-defined units seamlessly. The current implementation is limited by the functionality of the udunits2 package. There are several branches exploring different alternatives just in case udunits2 cannot grow as units will need in the future.

Acknowledgements

This project gratefully acknowledges financial support from the R Consortium. Also I would like to thank Edzer Pebesma for his kind support and collaboration, and of course for hosting this article.