Rcpp Top StackOverflow Q&A

Top users by number of answers

211
votes
114.8k
views

Speed up the loop operation in R

[see original]
performancerloopsrcppr-faq
Kay asked May 25 '10 at 21:55
Click to expand question

I have a big performance problem in R. I wrote a function that iterates over a data.frame object. It simply adds a new column to a data.frame and accumulates something. (simple operation). The data.frame has roughly 850K rows. My PC is still working (about 10h now) and I have no idea about the runtime.

dayloop2 <- function(temp){
    for (i in 1:nrow(temp)){    
        temp[i,10] <- i
        if (i > 1) {             
            if ((temp[i,6] == temp[i-1,6]) & (temp[i,3] == temp[i-1,3])) { 
                temp[i,10] <- temp[i,9] + temp[i-1,10]                    
            } else {
                temp[i,10] <- temp[i,9]                                    
            }
        } else {
            temp[i,10] <- temp[i,9]
        }
    }
    names(temp)[names(temp) == "V10"] <- "Kumm."
    return(temp)
}

Any ideas how to speed up this operation?

465
votes
Marek replied Jun 03 '10 at 22:34
Click to expand answer

I have a big performance problem in R. I wrote a function that iterates over a data.frame object. It simply adds a new column to a data.frame and accumulates something. (simple operation). The data.frame has roughly 850K rows. My PC is still working (about 10h now) and I have no idea about the runtime.

dayloop2 <- function(temp){
    for (i in 1:nrow(temp)){    
        temp[i,10] <- i
        if (i > 1) {             
            if ((temp[i,6] == temp[i-1,6]) & (temp[i,3] == temp[i-1,3])) { 
                temp[i,10] <- temp[i,9] + temp[i-1,10]                    
            } else {
                temp[i,10] <- temp[i,9]                                    
            }
        } else {
            temp[i,10] <- temp[i,9]
        }
    }
    names(temp)[names(temp) == "V10"] <- "Kumm."
    return(temp)
}

Any ideas how to speed up this operation?

72
votes
100.3k
views

Rcpp package doesn't include Rcpp_precious_remove

[see original]
rrcpp
Laxmi Agarwal asked Jul 17 '21 at 00:14
Click to expand question

I have been trying to create a database and installed the "DBI" package, but I am still facing this error. I reinstalled DBI and RSQLite package, but they don’t seem to work.

library("DBI")
con <- dbConnect
(RSQLite::SQLite(), dbname = ":memory:")
dbListTables(con)

Error:

Error in connection_connect(dbname, loadable.extensions, flags, vfs, extended_types) : function 'Rcpp_precious_remove' not provided by package 'Rcpp'

71
votes
Daniela replied Jul 18 '21 at 12:46
Click to expand answer

I have been trying to create a database and installed the "DBI" package, but I am still facing this error. I reinstalled DBI and RSQLite package, but they don’t seem to work.

library("DBI")
con <- dbConnect
(RSQLite::SQLite(), dbname = ":memory:")
dbListTables(con)

Error:

Error in connection_connect(dbname, loadable.extensions, flags, vfs, extended_types) : function 'Rcpp_precious_remove' not provided by package 'Rcpp'

15
votes
5.9k
views

Rcpp pass by reference vs. by value

[see original]
rrcpp
Ari B. Friedman asked Jul 02 '12 at 19:36
Click to expand question

I made a first stab at an Rcpp function via inline and it solved my speed problem (thanks Dirk!): Replace negative values by zero

The initial version looked like this:

library(inline)
cpp_if_src <- '
  Rcpp::NumericVector xa(a);
  int n_xa = xa.size();
  for(int i=0; i < n_xa; i++) {
    if(xa[i]<0) xa[i] = 0;
  }
  return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")

But when called cpp_if(p), it overwrote p with the output, which was not as intended. So I assumed it was passing by reference.

So I fixed it with the following version:

library(inline)
cpp_if_src <- '
  Rcpp::NumericVector xa(a);
  int n_xa = xa.size();
  Rcpp::NumericVector xr(a);
  for(int i=0; i < n_xa; i++) {
    if(xr[i]<0) xr[i] = 0;
  }
  return xr;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")

Which seemed to work. But now the original version doesn't overwrite its input anymore when I re-load it into R (i.e. the same exact code now doesn't overwrite its input):

> cpp_if_src <- '
+   Rcpp::NumericVector xa(a);
+   int n_xa = xa.size();
+   for(int i=0; i < n_xa; i++) {
+     if(xa[i]<0) xa[i] = 0;
+   }
+   return xa;
+ '
> cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
> 
> p
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
> cpp_if(p)
 [1] 0 0 0 0 0 0 1 2 3 4 5
> p
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

I'm not the only one who has tried to replicate this behavior and found inconsistent results:

https://chat.stackoverflow.com/transcript/message/4357344#4357344

What's going on here?

22
votes
Dirk is no longer here replied Jul 02 '12 at 20:28
Click to expand answer

I made a first stab at an Rcpp function via inline and it solved my speed problem (thanks Dirk!): Replace negative values by zero

The initial version looked like this:

library(inline)
cpp_if_src <- '
  Rcpp::NumericVector xa(a);
  int n_xa = xa.size();
  for(int i=0; i < n_xa; i++) {
    if(xa[i]<0) xa[i] = 0;
  }
  return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")

But when called cpp_if(p), it overwrote p with the output, which was not as intended. So I assumed it was passing by reference.

So I fixed it with the following version:

library(inline)
cpp_if_src <- '
  Rcpp::NumericVector xa(a);
  int n_xa = xa.size();
  Rcpp::NumericVector xr(a);
  for(int i=0; i < n_xa; i++) {
    if(xr[i]<0) xr[i] = 0;
  }
  return xr;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")

Which seemed to work. But now the original version doesn't overwrite its input anymore when I re-load it into R (i.e. the same exact code now doesn't overwrite its input):

> cpp_if_src <- '
+   Rcpp::NumericVector xa(a);
+   int n_xa = xa.size();
+   for(int i=0; i < n_xa; i++) {
+     if(xa[i]<0) xa[i] = 0;
+   }
+   return xa;
+ '
> cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
> 
> p
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
> cpp_if(p)
 [1] 0 0 0 0 0 0 1 2 3 4 5
> p
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

I'm not the only one who has tried to replicate this behavior and found inconsistent results:

https://chat.stackoverflow.com/transcript/message/4357344#4357344

What's going on here?

34
votes
101.4k
views

Replace negative values by zero

[see original]
rif-statementfor-loopconditional-statementsrcpp
Fabian Stolz asked Jun 30 '12 at 15:09
Click to expand question

We want to set all values in an array zero that are negative.

I tried out a a lot of stuff but did not yet achieve a working solution. I thought about a for loop with condition, however this seems not to work.

#pred_precipitation is our array
pred_precipitation <-rnorm(25,2,4)     

for (i in nrow(pred_precipitation))
{
  if (pred_precipitation[i]<0) {pred_precipitation[i] = 0}
  else{pred_precipitation[i] = pred_precipitation[i]}
}
72
votes
Ari B. Friedman replied Jun 30 '12 at 15:15
Click to expand answer

We want to set all values in an array zero that are negative.

I tried out a a lot of stuff but did not yet achieve a working solution. I thought about a for loop with condition, however this seems not to work.

#pred_precipitation is our array
pred_precipitation <-rnorm(25,2,4)     

for (i in nrow(pred_precipitation))
{
  if (pred_precipitation[i]<0) {pred_precipitation[i] = 0}
  else{pred_precipitation[i] = pred_precipitation[i]}
}
16
votes
4.3k
views

Using Rcpp within parallel code via snow to make a cluster

[see original]
rrcppsnow
Vincent asked May 20 '11 at 15:40
Click to expand question

I've written a function in Rcpp and compiled it with inline. Now, I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example, where the function funCPP1 can be compiled and runs well by itself, but cannot be called by snow's clusterCall function. The function runs well as a single process, but gives the following error when ran in parallel:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: NULL value passed as symbol address

And here is some code:

## Load and compile
library(inline)
library(Rcpp)
library(snow)
src1 <- '
     Rcpp::NumericMatrix xbem(xbe);
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g="numeric"),body = src1, plugin="Rcpp")

## Single process
A <- matrix(rnorm(400), 20,20)
funCPP1(A, 0.5)

## Parallel
cl <- makeCluster(2, type = "SOCK") 
clusterExport(cl, 'funCPP1') 
clusterCall(cl, funCPP1, A, 0.5)
19
votes
Dirk is no longer here replied May 20 '11 at 15:47
Click to expand answer

I've written a function in Rcpp and compiled it with inline. Now, I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example, where the function funCPP1 can be compiled and runs well by itself, but cannot be called by snow's clusterCall function. The function runs well as a single process, but gives the following error when ran in parallel:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: NULL value passed as symbol address

And here is some code:

## Load and compile
library(inline)
library(Rcpp)
library(snow)
src1 <- '
     Rcpp::NumericMatrix xbem(xbe);
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g="numeric"),body = src1, plugin="Rcpp")

## Single process
A <- matrix(rnorm(400), 20,20)
funCPP1(A, 0.5)

## Parallel
cl <- makeCluster(2, type = "SOCK") 
clusterExport(cl, 'funCPP1') 
clusterCall(cl, funCPP1, A, 0.5)
86
votes
34.8k
views

Understanding the contents of the Makevars file in R (macros, variables, ~/.R/Makevars and pkg/src/Makevars)

[see original]
rrcpp
NoBackingDown asked Apr 24 '17 at 20:59
Click to expand question

I am trying to understand the role and relationship of the macros/variables set in ~/.R/Makevars and package_directory/src/Makevars when installing/building own R packages. Suppose these files look like

~/.R/Makevars

CXX = g++
CXXSTD = -std=c++11
CXXFLAGS = -fsanitize=undefined,address -fno-omit-frame-pointer

CXX98 = g++
CXX98STD = -std=c++98

CXX11 = g++
CXX11STD = -std=c++11

CXX14 = g++
CXX14STD = -std=c++14

package_directory/src/Makevars

PKG_CPPFLAGS = -I../inst/include
CXX_STD = CXX11

As I understand it, with CXX we can select the compiler for C++ when building R packages, with CXXSTD we chose the standard and with CXXFLAGS we add compiler flags. With PKG_CPPFLAGS we add flags for the C++ preprocessor and with CXX_STD we tell that our packages uses C++11.

I have the following questions:

  • What is the relationship between CXX and CXX98, CXX11 and CXX14?
  • What is the meaning of, e.g., CXX11STD = -std=c++11 if C++11 is already implied? Is it between choosing -std=c++11 and -std=gnu++11? Should -std=gnu++11 generally be avoided for portability reasons?
  • Could the flags for CXXSTD and CXXFLAGS not just be added to CXX, such that the first three lines reduce to CXX = g++ -std=c++11 -fsanitize=undefined,address -fno-omit-frame-pointer. What is the advantage in explicity specifying CXXSTD and CXXFLAGS?
  • How does CXX_STD = CXX11 work? How is CXX11 here related to CXX11 in ~/.R/Makevars?
  • What is the relationship between CXXFLAGS and PKG_CXXFLAGS(not included in my example)?

I am aware of the information contained in Writing R Extensions and R Installation and Administration, but I am not able to extract more information beyond my current level of understanding to answer the above questions.

I am adding a Rcpp tag because I suppose that answers to these questions will be most relevant to users of Rcpp, but I am aware that this is probably not directly related to Rcpp, so the tag might be removed if deemed appropriate.

128
votes
coatless replied Apr 24 '17 at 23:30
Click to expand answer

I am trying to understand the role and relationship of the macros/variables set in ~/.R/Makevars and package_directory/src/Makevars when installing/building own R packages. Suppose these files look like

~/.R/Makevars

CXX = g++
CXXSTD = -std=c++11
CXXFLAGS = -fsanitize=undefined,address -fno-omit-frame-pointer

CXX98 = g++
CXX98STD = -std=c++98

CXX11 = g++
CXX11STD = -std=c++11

CXX14 = g++
CXX14STD = -std=c++14

package_directory/src/Makevars

PKG_CPPFLAGS = -I../inst/include
CXX_STD = CXX11

As I understand it, with CXX we can select the compiler for C++ when building R packages, with CXXSTD we chose the standard and with CXXFLAGS we add compiler flags. With PKG_CPPFLAGS we add flags for the C++ preprocessor and with CXX_STD we tell that our packages uses C++11.

I have the following questions:

  • What is the relationship between CXX and CXX98, CXX11 and CXX14?
  • What is the meaning of, e.g., CXX11STD = -std=c++11 if C++11 is already implied? Is it between choosing -std=c++11 and -std=gnu++11? Should -std=gnu++11 generally be avoided for portability reasons?
  • Could the flags for CXXSTD and CXXFLAGS not just be added to CXX, such that the first three lines reduce to CXX = g++ -std=c++11 -fsanitize=undefined,address -fno-omit-frame-pointer. What is the advantage in explicity specifying CXXSTD and CXXFLAGS?
  • How does CXX_STD = CXX11 work? How is CXX11 here related to CXX11 in ~/.R/Makevars?
  • What is the relationship between CXXFLAGS and PKG_CXXFLAGS(not included in my example)?

I am aware of the information contained in Writing R Extensions and R Installation and Administration, but I am not able to extract more information beyond my current level of understanding to answer the above questions.

I am adding a Rcpp tag because I suppose that answers to these questions will be most relevant to users of Rcpp, but I am aware that this is probably not directly related to Rcpp, so the tag might be removed if deemed appropriate.

8
votes
1.7k
views

Multithreaded & SIMD vectorized Mandelbrot in R using Rcpp & OpenMP

[see original]
multithreadingopenmprcppsimdmandelbrot
Tom Wenseleers asked Jan 03 '18 at 01:19
Click to expand question

As an OpenMP & Rcpp performance test I wanted to check how fast I could calculate the Mandelbrot set in R using the most straightforward and simple Rcpp+OpenMP implementation. Currently what I did was:

#include <Rcpp.h>
#include <omp.h>
// [[Rcpp::plugins(openmp)]]

using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericMatrix mandelRcpp(const double x_min, const double x_max, const double y_min, const double y_max,
                         const int res_x, const int res_y, const int nb_iter) {
  Rcpp::NumericMatrix ret(res_x, res_y);
  double x_step = (x_max - x_min) / res_x;
  double y_step = (y_max - y_min) / res_y;
  int r,c;
#pragma omp parallel for default(shared) private(c) schedule(dynamic,1) collapse(2)
  for (r = 0; r < res_y; r++) {
    for (c = 0; c < res_x; c++) {
      double zx = 0.0, zy = 0.0, new_zx;
      double cx = x_min + c*x_step, cy = y_min + r*y_step;
      int n = 0;
      for (n=0;  (zx*zx + zy*zy < 4.0 ) && ( n < nb_iter ); n++ ) {
        new_zx = zx*zx - zy*zy + cx;
        zy = 2.0*zx*zy + cy;
        zx = new_zx;
      }
      ret(c,r) = n;
    }
  }
  return ret;
}

And then in R:

library(Rcpp)
sourceCpp("mandelRcpp.cpp")
xlims=c(-0.74877,-0.74872);
ylims=c(0.065053,0.065103);
x_res=y_res=1080L; nb_iter=10000L;
system.time(m <- mandelRcpp(xlims[[1]], xlims[[2]], ylims[[1]], ylims[[2]], x_res, y_res, nb_iter)) 
# 0.92s
rainbow=c(rgb(0.47,0.11,0.53),rgb(0.27,0.18,0.73),rgb(0.25,0.39,0.81),rgb(0.30,0.57,0.75),rgb(0.39,0.67,0.60),rgb(0.51,0.73,0.44),rgb(0.67,0.74,0.32),rgb(0.81,0.71,0.26),rgb(0.89,0.60,0.22),rgb(0.89,0.39,0.18),rgb(0.86,0.13,0.13))
    cols=c(colorRampPalette(rainbow)(100),rev(colorRampPalette(rainbow)(100)),"black") # palette
par(mar=c(0, 0, 0, 0))
system.time(image(m^(1/7), col=cols, asp=diff(ylims)/diff(xlims), axes=F, useRaster=T)) 
# 0.5s

enter image description here

I was unsure though if there is any other obvious speed improvements I could take advantage of aside from OpenMP multithreading, e.g. via simd vectorization? (using simd options in the openmp #pragma didn't seem to do anything)

PS at first my code was crashing but I later found this was solved by replacing ret[r,c] = n; with ret(r,c) = n; Using Armadillo classes as suggested in the answer below make things very slightly faster, though the timings are almost the same. Also flipped around x and y so it comes out in the right orientation when plotted with image(). Using 8 threads speed is ca. 350 times faster than the vectorized plain R Mandelbrot version here and also about 7.3 times faster than the (non-multithreaded) Python/Numba version here (similar to PyCUDA or PyOpenCL speeds), so quite happy with that... Rasterizing/display now seems the bottleneck in R....

6
votes
coatless replied Jan 03 '18 at 02:04
Click to expand answer

As an OpenMP & Rcpp performance test I wanted to check how fast I could calculate the Mandelbrot set in R using the most straightforward and simple Rcpp+OpenMP implementation. Currently what I did was:

#include <Rcpp.h>
#include <omp.h>
// [[Rcpp::plugins(openmp)]]

using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericMatrix mandelRcpp(const double x_min, const double x_max, const double y_min, const double y_max,
                         const int res_x, const int res_y, const int nb_iter) {
  Rcpp::NumericMatrix ret(res_x, res_y);
  double x_step = (x_max - x_min) / res_x;
  double y_step = (y_max - y_min) / res_y;
  int r,c;
#pragma omp parallel for default(shared) private(c) schedule(dynamic,1) collapse(2)
  for (r = 0; r < res_y; r++) {
    for (c = 0; c < res_x; c++) {
      double zx = 0.0, zy = 0.0, new_zx;
      double cx = x_min + c*x_step, cy = y_min + r*y_step;
      int n = 0;
      for (n=0;  (zx*zx + zy*zy < 4.0 ) && ( n < nb_iter ); n++ ) {
        new_zx = zx*zx - zy*zy + cx;
        zy = 2.0*zx*zy + cy;
        zx = new_zx;
      }
      ret(c,r) = n;
    }
  }
  return ret;
}

And then in R:

library(Rcpp)
sourceCpp("mandelRcpp.cpp")
xlims=c(-0.74877,-0.74872);
ylims=c(0.065053,0.065103);
x_res=y_res=1080L; nb_iter=10000L;
system.time(m <- mandelRcpp(xlims[[1]], xlims[[2]], ylims[[1]], ylims[[2]], x_res, y_res, nb_iter)) 
# 0.92s
rainbow=c(rgb(0.47,0.11,0.53),rgb(0.27,0.18,0.73),rgb(0.25,0.39,0.81),rgb(0.30,0.57,0.75),rgb(0.39,0.67,0.60),rgb(0.51,0.73,0.44),rgb(0.67,0.74,0.32),rgb(0.81,0.71,0.26),rgb(0.89,0.60,0.22),rgb(0.89,0.39,0.18),rgb(0.86,0.13,0.13))
    cols=c(colorRampPalette(rainbow)(100),rev(colorRampPalette(rainbow)(100)),"black") # palette
par(mar=c(0, 0, 0, 0))
system.time(image(m^(1/7), col=cols, asp=diff(ylims)/diff(xlims), axes=F, useRaster=T)) 
# 0.5s

enter image description here

I was unsure though if there is any other obvious speed improvements I could take advantage of aside from OpenMP multithreading, e.g. via simd vectorization? (using simd options in the openmp #pragma didn't seem to do anything)

PS at first my code was crashing but I later found this was solved by replacing ret[r,c] = n; with ret(r,c) = n; Using Armadillo classes as suggested in the answer below make things very slightly faster, though the timings are almost the same. Also flipped around x and y so it comes out in the right orientation when plotted with image(). Using 8 threads speed is ca. 350 times faster than the vectorized plain R Mandelbrot version here and also about 7.3 times faster than the (non-multithreaded) Python/Numba version here (similar to PyCUDA or PyOpenCL speeds), so quite happy with that... Rasterizing/display now seems the bottleneck in R....

62
votes
90.7k
views

Practical limits of R data frame

[see original]
rperformancedataframercpp
Egon asked Mar 08 '11 at 14:24
Click to expand question

I have been reading about how read.table is not efficient for large data files. Also how R is not suited for large data sets. So I was wondering where I can find what the practical limits are and any performance charts for (1) Reading in data of various sizes (2) working with data of varying sizes.

In effect, I want to know when the performance deteriorates and when I hit a road block. Also any comparison against C++/MATLAB or other languages would be really helpful. finally if there is any special performance comparison for Rcpp and RInside, that would be great!

61
votes
Allan Engelhardt replied Mar 08 '11 at 15:07
Click to expand answer

I have been reading about how read.table is not efficient for large data files. Also how R is not suited for large data sets. So I was wondering where I can find what the practical limits are and any performance charts for (1) Reading in data of various sizes (2) working with data of varying sizes.

In effect, I want to know when the performance deteriorates and when I hit a road block. Also any comparison against C++/MATLAB or other languages would be really helpful. finally if there is any special performance comparison for Rcpp and RInside, that would be great!

16
votes
4.8k
views

Rcpp function check if missing value

[see original]
rrcpp
Giorgio Spedicato asked Oct 07 '14 at 16:45
Click to expand question

I'm converting R based code into Rcpp based code. The head of my function is:

NumericMatrix createMatrixOfLinkRatiosC(NumericMatrix matr, double threshold4Clean) {
int i,j; 
NumericMatrix myMatr(matr.nrow(),matr.ncol());
myMatr=matr;
....;

}

I want to handle call to the function where threshold4Clean is missing but I'm not finding how to do... Any help will be greatly appreciated.

33
votes
Kevin Ushey replied Oct 08 '14 at 17:35
Click to expand answer

I'm converting R based code into Rcpp based code. The head of my function is:

NumericMatrix createMatrixOfLinkRatiosC(NumericMatrix matr, double threshold4Clean) {
int i,j; 
NumericMatrix myMatr(matr.nrow(),matr.ncol());
myMatr=matr;
....;

}

I want to handle call to the function where threshold4Clean is missing but I'm not finding how to do... Any help will be greatly appreciated.

19
votes
3.3k
views

using C function from other package in Rcpp

[see original]
c++crrcpp
baptiste asked Dec 09 '13 at 15:28
Click to expand question

I'm trying to call a C routine from the cubature package in a c++ function to perform multidimensional integration.

The basic R example I'm trying to reproduce is

library(cubature)
integrand <- function(x) sin(x)
adaptIntegrate(integrand, 0, pi)

I could just call this R function from Rcpp following this recipe from the gallery, but there would be some performance penalty in switching back and forth from c/c++ to R. It seems more sensible to directly call the C function from C++.

The C routine adapt_integrate is exported from cubature with

 // R_RegisterCCallable("cubature", "adapt_integrate", (DL_FUNC) adapt_integrate);

I don't understand how to call it from c++, however. Here's my lame attempt,

sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double integrand(double x){
 return(sin(x));
}

// [[Rcpp::depends(cubature)]]
// [[Rcpp::export]]
Rcpp::List integratecpp(double llim, double ulim)
{
  Rcpp::Function p_cubature = R_GetCCallable("cubature", "adapt_integrate");

  Rcpp::List result = p_cubature(integrand, llim, ulim);
  return(result);
}
'
)

integratecpp(0, pi)

This fails to compile; clearly I'm doing something very silly and missing some important steps to convert the output of R_GetCCallable into an Rcpp::Function (or call it directly?). I've read several related posts dealing with function pointers, but haven't seen an example using an external C function.

11
votes
Romain Francois replied Dec 09 '13 at 16:51
Click to expand answer

I'm trying to call a C routine from the cubature package in a c++ function to perform multidimensional integration.

The basic R example I'm trying to reproduce is

library(cubature)
integrand <- function(x) sin(x)
adaptIntegrate(integrand, 0, pi)

I could just call this R function from Rcpp following this recipe from the gallery, but there would be some performance penalty in switching back and forth from c/c++ to R. It seems more sensible to directly call the C function from C++.

The C routine adapt_integrate is exported from cubature with

 // R_RegisterCCallable("cubature", "adapt_integrate", (DL_FUNC) adapt_integrate);

I don't understand how to call it from c++, however. Here's my lame attempt,

sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double integrand(double x){
 return(sin(x));
}

// [[Rcpp::depends(cubature)]]
// [[Rcpp::export]]
Rcpp::List integratecpp(double llim, double ulim)
{
  Rcpp::Function p_cubature = R_GetCCallable("cubature", "adapt_integrate");

  Rcpp::List result = p_cubature(integrand, llim, ulim);
  return(result);
}
'
)

integratecpp(0, pi)

This fails to compile; clearly I'm doing something very silly and missing some important steps to convert the output of R_GetCCallable into an Rcpp::Function (or call it directly?). I've read several related posts dealing with function pointers, but haven't seen an example using an external C function.

5
votes
2k
views

Call a function from c++ via environment Rcpp

[see original]
rrcpp
skyindeer asked Apr 13 '16 at 09:48
Click to expand question

I am considering calling a R function from c++ via environment, but I got an error, here is what I did

#include <Rcpp.h>
using namespace Rcpp;



// [[Rcpp::export]]
NumericVector call(NumericVector x){
  Environment env = Environment::global_env();
  Function f = env["fivenum"];
  NumericVector res = f(x);
  return res;
}

Type call(x), this is what I got,

Error: cannot convert to function

I know I can do it right in another way,

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector callFunction(NumericVector x, Function f) {
    NumericVector res = f(x);
    return res;
}

and type

callFunction(x,fivenum)

But still wondering why first method failed.

10
votes
digEmAll replied Apr 13 '16 at 10:13
Click to expand answer

I am considering calling a R function from c++ via environment, but I got an error, here is what I did

#include <Rcpp.h>
using namespace Rcpp;



// [[Rcpp::export]]
NumericVector call(NumericVector x){
  Environment env = Environment::global_env();
  Function f = env["fivenum"];
  NumericVector res = f(x);
  return res;
}

Type call(x), this is what I got,

Error: cannot convert to function

I know I can do it right in another way,

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector callFunction(NumericVector x, Function f) {
    NumericVector res = f(x);
    return res;
}

and type

callFunction(x,fivenum)

But still wondering why first method failed.

3
votes
1.9k
views

Using Rcpp functions inside of R's par*apply functions from the parallel package

[see original]
rparallel-processingrcpp
Scott White asked Jul 22 '16 at 04:50
Click to expand question

I'm trying to understand what is happening behind the Rcpp::sourceCpp() call on a parallelized environment. Recently, this was partially addressed in the question: Using Rcpp function in parLapply on Windows.

Within this post, Dirk said,

"You need to run the sourceCpp() call in each spawned process, or else get them your code."

This was in response to questioner's use of distributing the Rcpp function to the worker processes. The questioner was sending the Rcpp function via:

clusterExport(cl = cl, varlist = "payoff")

I'm confused as to why this doesn't work. My thoughts are that this was what the objective of the clusterExport() is for.

8
votes
coatless replied Jul 22 '16 at 05:11
Click to expand answer

I'm trying to understand what is happening behind the Rcpp::sourceCpp() call on a parallelized environment. Recently, this was partially addressed in the question: Using Rcpp function in parLapply on Windows.

Within this post, Dirk said,

"You need to run the sourceCpp() call in each spawned process, or else get them your code."

This was in response to questioner's use of distributing the Rcpp function to the worker processes. The questioner was sending the Rcpp function via:

clusterExport(cl = cl, varlist = "payoff")

I'm confused as to why this doesn't work. My thoughts are that this was what the objective of the clusterExport() is for.

14
votes
2.2k
views

how can I handle vectors without knowing the type in Rcpp

[see original]
rrcpp
eddi asked Nov 06 '13 at 22:12
Click to expand question

I want to replicate the following R function in Rcpp:

fR = function(x) x[1:2]

fR(c(1,2,3))
#[1] 1 2
fR(c('a','b','c'))
#[1] "a" "b"

I can do it for a fixed output type like so:

library(inline)
library(Rcpp)

fint = cxxfunction(signature(x = "SEXP"), '
          List xin(x);
          IntegerVector xout;

          for (int i = 0; i < 2; ++i) xout.push_back(xin[i]);

          return xout;', plugin = "Rcpp")

But this will only work for integers, and if I try replacing the xout type with List (or GenericVector, which are the same) - it works with any input type, but I get back a list instead of a vector.

What's the correct Rcpp way of doing this?

13
votes
Romain Francois replied Nov 07 '13 at 06:33
Click to expand answer

I want to replicate the following R function in Rcpp:

fR = function(x) x[1:2]

fR(c(1,2,3))
#[1] 1 2
fR(c('a','b','c'))
#[1] "a" "b"

I can do it for a fixed output type like so:

library(inline)
library(Rcpp)

fint = cxxfunction(signature(x = "SEXP"), '
          List xin(x);
          IntegerVector xout;

          for (int i = 0; i < 2; ++i) xout.push_back(xin[i]);

          return xout;', plugin = "Rcpp")

But this will only work for integers, and if I try replacing the xout type with List (or GenericVector, which are the same) - it works with any input type, but I get back a list instead of a vector.

What's the correct Rcpp way of doing this?

10
votes
7.4k
views

calling a user-defined R function from C++ using Rcpp

[see original]
rrcpp
Sam asked Jan 20 '14 at 03:11
Click to expand question

I have an R code with a bunch of user-defined R functions. I'm trying to make the code run faster and of course the best option is to use Rcpp. My code involves functions that call each other. Therefore, If I write some functions in C++, I should be able to call and to run some of my R functions in my c++ code. In a simple example consider the code below in R:

mySum <- function(x, y){
 return(2*x + 3*y)
}
x <<- 1
y <<- 1

Now consider the C++ code in which I'm trying to access the function above:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int mySuminC(){
 Environment myEnv = Environment::global_env();
 Function mySum = myEnv["mySum"];
 int x = myEnv["x"];
 int y = myEnv["y"];
 return wrap(mySum(Rcpp::Named("x", x), Rcpp::Named("y", y)));
 }

When I source the file in R with the inline function sourceCpp(), I get the error:

 "invalid conversion from 'SEXPREC*' to int

Could anyone help me on debugging the code? Is my code efficient? Can it be summarized? Is there any better idea to use mySum function than what I did in my code?

Thanks very much for your help.

14
votes
Kevin Ushey replied Jan 20 '14 at 03:42
Click to expand answer

I have an R code with a bunch of user-defined R functions. I'm trying to make the code run faster and of course the best option is to use Rcpp. My code involves functions that call each other. Therefore, If I write some functions in C++, I should be able to call and to run some of my R functions in my c++ code. In a simple example consider the code below in R:

mySum <- function(x, y){
 return(2*x + 3*y)
}
x <<- 1
y <<- 1

Now consider the C++ code in which I'm trying to access the function above:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int mySuminC(){
 Environment myEnv = Environment::global_env();
 Function mySum = myEnv["mySum"];
 int x = myEnv["x"];
 int y = myEnv["y"];
 return wrap(mySum(Rcpp::Named("x", x), Rcpp::Named("y", y)));
 }

When I source the file in R with the inline function sourceCpp(), I get the error:

 "invalid conversion from 'SEXPREC*' to int

Could anyone help me on debugging the code? Is my code efficient? Can it be summarized? Is there any better idea to use mySum function than what I did in my code?

Thanks very much for your help.

9
votes
10k
views

Call R functions in Rcpp

[see original]
rrcpp
Yuge Hao asked Jun 24 '16 at 15:20
Click to expand question

I was trying to call the sd(x), which is a R function, in Rcpp. I've seen an example of calling the R function dbate(x) in Rcpp and it works perfectly.

// dens calls the pdf of beta distribution in R
//[[Rcpp::export]]
double dens(double x, double a, double b)
{
    return R::dbeta(x,a,b,false);
}

But when I tired to apply this method to sd(x) as following, it went wrong.

// std calls the sd function in R
//[[Rcpp::export]]
double std(NumericVector x)
{
  return R::sd(x);
}

Does anyone know why this doesn't work?

16
votes
coatless replied Jun 24 '16 at 16:03
Click to expand answer

I was trying to call the sd(x), which is a R function, in Rcpp. I've seen an example of calling the R function dbate(x) in Rcpp and it works perfectly.

// dens calls the pdf of beta distribution in R
//[[Rcpp::export]]
double dens(double x, double a, double b)
{
    return R::dbeta(x,a,b,false);
}

But when I tired to apply this method to sd(x) as following, it went wrong.

// std calls the sd function in R
//[[Rcpp::export]]
double std(NumericVector x)
{
  return R::sd(x);
}

Does anyone know why this doesn't work?

9
votes
1.4k
views

In Rcpp, how to get a user-defined structure from C into R

[see original]
rcpp
Mark Bower asked Jun 29 '18 at 22:45
Click to expand question

Am using Rcpp packages and can get my C function to compile and run in R, but now I want to return a large, user-defined data structure to R. The fields in the structure are either numbers or strings - no new or odd types within the structure. The example below is simplified and doesn't compile, but it conveys the idea of my problem.

typedef struct {
  char*    firstname[128];
  char*    lastname[128];
  int      nbrOfSamples;
} HEADER_INFO;

// [[Rcpp::export]]
HEADER_INFO* read_header(Rcpp::StringVector strings) {
  FILE *fp;
  MEF_HEADER_INFO *header;
    
  char * filename = (char*)(strings(0));
  char * password = (char*)(strings(1));
    
  header = (HEADER_INFO*)malloc(sizeof(HEADER_INFO));
  memset(header, 0, sizeof(HEADER_INFO));
    
  fp = fopen(filename, "r");
  (void)read_header(header, password);
  return header;
}

I'm pretty sure that I could package the entries in the header back into a StringVector, but that seems like a brute-force approach. My question is whether a more elegant solution exists. It is not clear to me what form such a structure would even have in R: a named List?

Thanks!

12
votes
Ralf Stubner replied Jul 01 '18 at 08:58
Click to expand answer

Am using Rcpp packages and can get my C function to compile and run in R, but now I want to return a large, user-defined data structure to R. The fields in the structure are either numbers or strings - no new or odd types within the structure. The example below is simplified and doesn't compile, but it conveys the idea of my problem.

typedef struct {
  char*    firstname[128];
  char*    lastname[128];
  int      nbrOfSamples;
} HEADER_INFO;

// [[Rcpp::export]]
HEADER_INFO* read_header(Rcpp::StringVector strings) {
  FILE *fp;
  MEF_HEADER_INFO *header;
    
  char * filename = (char*)(strings(0));
  char * password = (char*)(strings(1));
    
  header = (HEADER_INFO*)malloc(sizeof(HEADER_INFO));
  memset(header, 0, sizeof(HEADER_INFO));
    
  fp = fopen(filename, "r");
  (void)read_header(header, password);
  return header;
}

I'm pretty sure that I could package the entries in the header back into a StringVector, but that seems like a brute-force approach. My question is whether a more elegant solution exists. It is not clear to me what form such a structure would even have in R: a named List?

Thanks!

6
votes
3.6k
views

Rcpp function to select (and to return) a sub-dataframe

[see original]
rrcpp
Sam asked Apr 03 '14 at 05:11
Click to expand question

Is it possible to write a C++ function that gets an R dataFrame as input, then modifies the dataFrame (in our case taking a subset) and returns the new data frame (in this question, returning a sub-dataframe) ? My code below may make my question more clear:

code:

# Suppose I have the data frame below created in R:
myDF = data.frame(id = rep(c(1,2), each = 5), alph = letters[1:10], mess = rnorm(10))

# Suppose I want to write a C++ function that gets id as inout and returns 
# a sub-dataframe corresponding to that id (**If it's possible to return 
# DataFrame in C++**)

# Auxiliary function --> helps get a sub vector:
arma::vec myVecSubset(arma::vec vecMain, arma::vec IDVec, int ID){
  arma::uvec AuxVec = find(IDVec == ID);
  arma::vec rslt = arma::vec(AuxVec.size());
  for (int i = 0; i < AuxVec.size(); i++){
    rslt[i] = vecMain[AuxVec[i]];
  }
  return rslt;
}

# Here is my C++ function:
Rcpp::DataFrame myVecSubset(Rcpp::DataFrame myDF, int ID){
  arma::vec id = Rcpp::as<arma::vec>(myDF["id"]);
  arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]);
  arma::vec mess = Rcpp::as<arma::vec>(myDF["mess"]);

  // here I take a sub-vector:
  arma::vec id_sub = myVecSubset(id, id, int ID);
  arma::vec alph_sub = myVecSubset(alph, id, int ID);
  arma::vec mess_sub = myVecSubset(mess, id, int ID);

  // here is the CHALLENGE: How to combine these vectors into a new data frame???
  ???
}

In summary, there are actually two main question: 1) Is there any better way to take the sub-dataframe above in C++? (wish I could simple say myDF[myDF$id == ID,]!!!)

2) Is there anyway that I can combine id_sub, alpha_sub, and mess_sub into an R data frame and return it?

I really appreciate your help.

10
votes
Kevin Ushey replied Apr 03 '14 at 09:28
Click to expand answer

Is it possible to write a C++ function that gets an R dataFrame as input, then modifies the dataFrame (in our case taking a subset) and returns the new data frame (in this question, returning a sub-dataframe) ? My code below may make my question more clear:

code:

# Suppose I have the data frame below created in R:
myDF = data.frame(id = rep(c(1,2), each = 5), alph = letters[1:10], mess = rnorm(10))

# Suppose I want to write a C++ function that gets id as inout and returns 
# a sub-dataframe corresponding to that id (**If it's possible to return 
# DataFrame in C++**)

# Auxiliary function --> helps get a sub vector:
arma::vec myVecSubset(arma::vec vecMain, arma::vec IDVec, int ID){
  arma::uvec AuxVec = find(IDVec == ID);
  arma::vec rslt = arma::vec(AuxVec.size());
  for (int i = 0; i < AuxVec.size(); i++){
    rslt[i] = vecMain[AuxVec[i]];
  }
  return rslt;
}

# Here is my C++ function:
Rcpp::DataFrame myVecSubset(Rcpp::DataFrame myDF, int ID){
  arma::vec id = Rcpp::as<arma::vec>(myDF["id"]);
  arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]);
  arma::vec mess = Rcpp::as<arma::vec>(myDF["mess"]);

  // here I take a sub-vector:
  arma::vec id_sub = myVecSubset(id, id, int ID);
  arma::vec alph_sub = myVecSubset(alph, id, int ID);
  arma::vec mess_sub = myVecSubset(mess, id, int ID);

  // here is the CHALLENGE: How to combine these vectors into a new data frame???
  ???
}

In summary, there are actually two main question: 1) Is there any better way to take the sub-dataframe above in C++? (wish I could simple say myDF[myDF$id == ID,]!!!)

2) Is there anyway that I can combine id_sub, alpha_sub, and mess_sub into an R data frame and return it?

I really appreciate your help.

4
votes
2.4k
views

Changing R's Seed from Rcpp to Guarantee Reproducibility

[see original]
c++rrcpp
Yuge Hao asked Apr 05 '17 at 04:13
Click to expand question

I am trying to write a function r(d, n) in rcpp. The function returns n random draws from normal distribution N(0, d). This function should be well defined, therefore the function should return the same draws whenever the d and n do not change their value.

This won't be a problem if d is restricted to be integer, in which case I can set seed and do the job

// set seed
// [[Rcpp::export]]
void set_seed(unsigned int seed) {
  Rcpp::Environment base_env("package:base");
  Rcpp::Function set_seed_r = base_env["set.seed"];
  set_seed_r(seed);  
}

// function r(d, n)
// [[Rcpp::export]]
vec randdraw(int d, int n){
  set_seed(d);
  vec out = randn(n);
  return out;
}

But clearly I don't want to restrict d to be integer. Ideally d should be double. Any thoughts? Thank you!

7
votes
coatless replied Apr 05 '17 at 05:58
Click to expand answer

I am trying to write a function r(d, n) in rcpp. The function returns n random draws from normal distribution N(0, d). This function should be well defined, therefore the function should return the same draws whenever the d and n do not change their value.

This won't be a problem if d is restricted to be integer, in which case I can set seed and do the job

// set seed
// [[Rcpp::export]]
void set_seed(unsigned int seed) {
  Rcpp::Environment base_env("package:base");
  Rcpp::Function set_seed_r = base_env["set.seed"];
  set_seed_r(seed);  
}

// function r(d, n)
// [[Rcpp::export]]
vec randdraw(int d, int n){
  set_seed(d);
  vec out = randn(n);
  return out;
}

But clearly I don't want to restrict d to be integer. Ideally d should be double. Any thoughts? Thank you!

23
votes
11.2k
views

Should I prefer Rcpp::NumericVector over std::vector?

[see original]
c++rcpp
Tim asked Jan 11 '17 at 22:38
Click to expand question

Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?

For example, the two functions below

// [[Rcpp::export]]
Rcpp::NumericVector foo(const Rcpp::NumericVector& x) {
  Rcpp::NumericVector tmp(x.length());
  for (int i = 0; i < x.length(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

// [[Rcpp::export]]
std::vector<double> bar(const std::vector<double>& x) {
  std::vector<double> tmp(x.size());
  for (int i = 0; i < x.size(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

Are equivalent when considering their working and benchmarked performance. I understand that Rcpp offers sugar and vectorized operations, but if it is only about taking R's vector as input and returning vector as output, then would there be any difference which one of those I use? Can using std::vector<double> lead to any possible problems when interacting with R?

31
votes
coatless replied Jan 11 '17 at 22:59
Click to expand answer

Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?

For example, the two functions below

// [[Rcpp::export]]
Rcpp::NumericVector foo(const Rcpp::NumericVector& x) {
  Rcpp::NumericVector tmp(x.length());
  for (int i = 0; i < x.length(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

// [[Rcpp::export]]
std::vector<double> bar(const std::vector<double>& x) {
  std::vector<double> tmp(x.size());
  for (int i = 0; i < x.size(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

Are equivalent when considering their working and benchmarked performance. I understand that Rcpp offers sugar and vectorized operations, but if it is only about taking R's vector as input and returning vector as output, then would there be any difference which one of those I use? Can using std::vector<double> lead to any possible problems when interacting with R?

14
votes
6.1k
views

1-dimensional Matrix is changed to a vector in R

[see original]
rvectormatrixrcpp
Jona asked Mar 30 '12 at 19:19
Click to expand question
> a<-matrix(c(1:9),3,3)
> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> a[3,]*a[,3]  # I expect 1x1 matrix as result of this.
[1] 21 48 81
> class(a)
[1] "matrix"
> class(a[3,])
[1] "integer"

In R, 1-dimensional matrix is changed to a vector. Can I avoid this? I would like to keep 1-D matrix as a matrix. Actually, I need to throw many kind of matrix to RcppArmadillo, even zero-D matrix. Changing matrix to vector by itself is my problem.

19
votes
joran replied Mar 30 '12 at 19:21
Click to expand answer
> a<-matrix(c(1:9),3,3)
> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> a[3,]*a[,3]  # I expect 1x1 matrix as result of this.
[1] 21 48 81
> class(a)
[1] "matrix"
> class(a[3,])
[1] "integer"

In R, 1-dimensional matrix is changed to a vector. Can I avoid this? I would like to keep 1-D matrix as a matrix. Actually, I need to throw many kind of matrix to RcppArmadillo, even zero-D matrix. Changing matrix to vector by itself is my problem.

13
votes
3.6k
views

efficiently locf by groups in a single R data.table

[see original]
rdataframedata.tabledplyrrcpp
carl.anderson asked May 05 '16 at 20:59
Click to expand question

I have a large, wide data.table (20m rows) keyed by a person ID but with lots of columns (~150) that have lots of null values. Each column is a recorded state / attribute that I wish to carry forward for each person. Each person may have anywhere from 10 to 10,000 observations and there are about 500,000 people in the set. Values from one person can not 'bleed' into the following person, so my solution must respect the person ID column and group appropriately.

For demonstration purposes - here's a very small sample input:

DT = data.table(
  id=c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  aa=c("A", NA, "B", "C", NA, NA, "D", "E", "F", NA, NA, NA),
  bb=c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  cc=c(1, NA, NA, NA, NA, 4, NA, 5, 6, NA, 7, NA)
)

It looks like this:

    id aa bb cc
 1:  1  A NA  1
 2:  1 NA NA NA
 3:  1  B NA NA
 4:  1  C NA NA
 5:  2 NA NA NA
 6:  2 NA NA  4
 7:  2  D NA NA
 8:  2  E NA  5
 9:  3  F NA  6
10:  3 NA NA NA
11:  3 NA NA  7
12:  3 NA NA NA

My expected output looks like this:

    id aa bb cc
 1:  1  A NA  1
 2:  1  A NA  1
 3:  1  B NA  1
 4:  1  C NA  1
 5:  2 NA NA NA
 6:  2 NA NA  4
 7:  2  D NA  4
 8:  2  E NA  5
 9:  3  F NA  6
10:  3  F NA  6
11:  3  F NA  7
12:  3  F NA  7

I've found a data.table solution that works, but it's terribly slow on my large data sets:

DT[, na.locf(.SD, na.rm=FALSE), by=id]

I've found equivalent solutions using dplyr that are equally slow.

GRP = DT %>% group_by(id)
data.table(GRP %>% mutate_each(funs(blah=na.locf(., na.rm=FALSE))))

I was hopeful that I could come up with a rolling 'self' join using the data.table functionality, but I just can't seem to get it right (I suspect I would need to use .N but I just haven't figured it out).

At this point I'm thinking I'll have to write something in Rcpp to efficiently apply the grouped locf.

I'm new to R, but I'm not new to C++ - so I'm confident I can do it. I just feel like there should be an efficient way to do this in R using data.table.

28
votes
alexis_laz replied May 06 '16 at 09:16
Click to expand answer

I have a large, wide data.table (20m rows) keyed by a person ID but with lots of columns (~150) that have lots of null values. Each column is a recorded state / attribute that I wish to carry forward for each person. Each person may have anywhere from 10 to 10,000 observations and there are about 500,000 people in the set. Values from one person can not 'bleed' into the following person, so my solution must respect the person ID column and group appropriately.

For demonstration purposes - here's a very small sample input:

DT = data.table(
  id=c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  aa=c("A", NA, "B", "C", NA, NA, "D", "E", "F", NA, NA, NA),
  bb=c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  cc=c(1, NA, NA, NA, NA, 4, NA, 5, 6, NA, 7, NA)
)

It looks like this:

    id aa bb cc
 1:  1  A NA  1
 2:  1 NA NA NA
 3:  1  B NA NA
 4:  1  C NA NA
 5:  2 NA NA NA
 6:  2 NA NA  4
 7:  2  D NA NA
 8:  2  E NA  5
 9:  3  F NA  6
10:  3 NA NA NA
11:  3 NA NA  7
12:  3 NA NA NA

My expected output looks like this:

    id aa bb cc
 1:  1  A NA  1
 2:  1  A NA  1
 3:  1  B NA  1
 4:  1  C NA  1
 5:  2 NA NA NA
 6:  2 NA NA  4
 7:  2  D NA  4
 8:  2  E NA  5
 9:  3  F NA  6
10:  3  F NA  6
11:  3  F NA  7
12:  3  F NA  7

I've found a data.table solution that works, but it's terribly slow on my large data sets:

DT[, na.locf(.SD, na.rm=FALSE), by=id]

I've found equivalent solutions using dplyr that are equally slow.

GRP = DT %>% group_by(id)
data.table(GRP %>% mutate_each(funs(blah=na.locf(., na.rm=FALSE))))

I was hopeful that I could come up with a rolling 'self' join using the data.table functionality, but I just can't seem to get it right (I suspect I would need to use .N but I just haven't figured it out).

At this point I'm thinking I'll have to write something in Rcpp to efficiently apply the grouped locf.

I'm new to R, but I'm not new to C++ - so I'm confident I can do it. I just feel like there should be an efficient way to do this in R using data.table.

13
votes
6.9k
views

How to build a R package which use Rcpp with external c++ libraries?

[see original]
rcpp
Xiaobo Gu asked Sep 02 '13 at 10:04
Click to expand question

Such as boost, where can I specify the following:

1.External c++ header file include path 
2.External c++ source file 
3.External c++ link library file path
15
votes
Dirk is no longer here replied Sep 02 '13 at 18:01
Click to expand answer

Such as boost, where can I specify the following:

1.External c++ header file include path 
2.External c++ source file 
3.External c++ link library file path
9
votes
493
views

Group vector on conditional sum

[see original]
rperformancercpp
Sotos asked Aug 07 '17 at 14:58
Click to expand question

I want to group a vector based on the sum of the elements being less than or equal to n. Assume the following,

set.seed(1)
x <- sample(10, 20, replace = TRUE)
#[1]  3  4  6 10  3  9 10  7  7  1  3  2  7  4  8  5  8 10  4  8

#Where,
n = 15

The expected output would be to group values while their sum is <= 15, i.e.

y <- c(1, 1, 1, 2, 2, 3, 4, 5 ,5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10)

As you can see the sum is never greater than 15,

sapply(split(x, y), sum)
# 1  2  3  4  5  6  7  8  9 10 
#13 13  9 10 15 12 12 13 14  8 

NOTE: I will be running this on huge datasets (usually > 150 - 200GB) so efficiency is a must.

A method that I tried and comes close but fails is,

as.integer(cut(cumsum(x), breaks = seq(0, max(cumsum(x)) + 15, 15)))
#[1] 1 1 1 2 2 3 3 4 4 4 5 5 5 6 6 6 7 8 8 8
4
votes
David replied Aug 07 '17 at 16:23
Click to expand answer

I want to group a vector based on the sum of the elements being less than or equal to n. Assume the following,

set.seed(1)
x <- sample(10, 20, replace = TRUE)
#[1]  3  4  6 10  3  9 10  7  7  1  3  2  7  4  8  5  8 10  4  8

#Where,
n = 15

The expected output would be to group values while their sum is <= 15, i.e.

y <- c(1, 1, 1, 2, 2, 3, 4, 5 ,5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10)

As you can see the sum is never greater than 15,

sapply(split(x, y), sum)
# 1  2  3  4  5  6  7  8  9 10 
#13 13  9 10 15 12 12 13 14  8 

NOTE: I will be running this on huge datasets (usually > 150 - 200GB) so efficiency is a must.

A method that I tried and comes close but fails is,

as.integer(cut(cumsum(x), breaks = seq(0, max(cumsum(x)) + 15, 15)))
#[1] 1 1 1 2 2 3 3 4 4 4 5 5 5 6 6 6 7 8 8 8
8
votes
1.7k
views

Renaming and Hiding an Exported Rcpp function in an R Package

[see original]
rrcppr-packageroxygen2
Cello asked Sep 04 '17 at 14:24
Click to expand question

Writing an R package, I have a R function that calls a specific Rcpp function. The Rcpp function just serves as a helper function and I do not want to creat a .Rd file for it. My solution so far is to export both functions in the Namespace file which causes the warning to create an .Rd file for the Rcpp function as soon as I run the check command. If I remove the helper function in the Namespace file, I get rid of this warning causing now the problem that the R function is not able to find it anymore.

Is there a way to solve this problem. That means to make the Rcpp function still visible for the R function and at the same time to get rid of the warning that there exists no .Rd file for the Rcpp function?

Thanks a lot :-)

10
votes
coatless replied Sep 04 '17 at 15:16
Click to expand answer

Writing an R package, I have a R function that calls a specific Rcpp function. The Rcpp function just serves as a helper function and I do not want to creat a .Rd file for it. My solution so far is to export both functions in the Namespace file which causes the warning to create an .Rd file for the Rcpp function as soon as I run the check command. If I remove the helper function in the Namespace file, I get rid of this warning causing now the problem that the R function is not able to find it anymore.

Is there a way to solve this problem. That means to make the Rcpp function still visible for the R function and at the same time to get rid of the warning that there exists no .Rd file for the Rcpp function?

Thanks a lot :-)

3
votes
601
views

how many vectors can be added in DataFrame::create( vec1, vec2 ... )?

[see original]
rcpp
Chris Snow asked Dec 09 '14 at 04:40
Click to expand question

I am creating a DataFrame to hold a parsed haproxy http log files which has quite a few fields (25+).

If I add more than 20 vectors (one for each field), I get the compilation error:

no matching function call to 'create'

The create method:

    return DataFrame::create(
      _["clientIp"]     = clientIp,
      _["clientPort"]   = clientPort,
      _["acceptDate"]   = acceptDate,
      _["frontendName"] = frontendName,
      _["backendName"]  = backendName,
      _["serverName"]   = serverName,
      _["tq"]           = tq,
      _["tw"]           = tw,
      _["tc"]           = tc,
      _["tr"]           = tr,
      _["tt"]           = tt,
      _["status_code"]  = statusCode,
      _["bytes_read"]   = bytesRead,

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedRequestCookie"]   = capturedRequestCookie,
#endif     

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedResponseCookie"]   = capturedResponseCookie,
#endif    

      _["terminationState"] = terminationState,
      _["actconn"]          = actconn,
      _["feconn"]           = feconn,
      _["beconn"]           = beconn,
      _["srv_conn"]         = srvConn,
      _["retries"]          = retries,
      _["serverQueue"]      = serverQueue,
      _["backendQueue"]     = backendQueue 
    );

Questions:

  1. Have I hit a hard limit?
  2. Is there a workaround to allow me to add more than 20 vectors to a data frame?
6
votes
Kevin Ushey replied Dec 09 '14 at 05:03
Click to expand answer

I am creating a DataFrame to hold a parsed haproxy http log files which has quite a few fields (25+).

If I add more than 20 vectors (one for each field), I get the compilation error:

no matching function call to 'create'

The create method:

    return DataFrame::create(
      _["clientIp"]     = clientIp,
      _["clientPort"]   = clientPort,
      _["acceptDate"]   = acceptDate,
      _["frontendName"] = frontendName,
      _["backendName"]  = backendName,
      _["serverName"]   = serverName,
      _["tq"]           = tq,
      _["tw"]           = tw,
      _["tc"]           = tc,
      _["tr"]           = tr,
      _["tt"]           = tt,
      _["status_code"]  = statusCode,
      _["bytes_read"]   = bytesRead,

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedRequestCookie"]   = capturedRequestCookie,
#endif     

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedResponseCookie"]   = capturedResponseCookie,
#endif    

      _["terminationState"] = terminationState,
      _["actconn"]          = actconn,
      _["feconn"]           = feconn,
      _["beconn"]           = beconn,
      _["srv_conn"]         = srvConn,
      _["retries"]          = retries,
      _["serverQueue"]      = serverQueue,
      _["backendQueue"]     = backendQueue 
    );

Questions:

  1. Have I hit a hard limit?
  2. Is there a workaround to allow me to add more than 20 vectors to a data frame?
2
votes
1.1k
views

Calling R's optim function from within C++ using Rcpp

[see original]
c++rrcpp
Siqi.Xiang asked Sep 28 '17 at 16:18
Click to expand question

I'm new for Rcpp, and in my code, I must call the R function "optim" from C++. I referred to many examples, but there is still a mistake: "static assertion failed: cannot convert type to SEXP". Here is my code, and the problem is the last function:

#include <RcppArmadillo.h>

using namespace Rcpp;
using namespace RcppArmadillo;
using namespace arma;
using namespace std;
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
double fr(arma::colvec x){
  double result = 100 * (x(2) - x(1) * x(1)) * (x(2) - x(1) * x(1)) + (1 -     x(1)) * (1 - x(1));
  return result;  
} 


typedef double (*funcPtr)(arma::colvec x);

// [[Rcpp::export]]
XPtr<funcPtr> putFunPtrInXPtr(){
  return(XPtr<funcPtr>(new funcPtr(&fr)));
}


// [[Rcpp::export]]
arma::colvec callOptimFun(SEXP x) {

  RNGScope scope;

  Rcpp::Environment stats("package:stats");
  Rcpp::Function optim = stats["optim"];
  XPtr<funcPtr> xpfun = putFunPtrInXPtr();
  funcPtr fun = *xpfun;
  Rcpp::List y = optim(x, fun);
  arma::colvec r = y["par"];
  return r; 
}

Unfortunately, I have tried lots of methods for my last function, and all methods have the same error. These are my tries: 1.

// [[Rcpp::export]]
Rcpp::List callOptimFun(arma::colvec x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

2.

// [[Rcpp::export]]
Rcpp::List callOptimFun(arma::colvec x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

3.

// [[Rcpp::export]]
Rcpp::List callOptimFun(SEXP x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

I am not familiar with C++. What may be the problem? Thanks!

4
votes
coatless replied Sep 28 '17 at 17:35
Click to expand answer

I'm new for Rcpp, and in my code, I must call the R function "optim" from C++. I referred to many examples, but there is still a mistake: "static assertion failed: cannot convert type to SEXP". Here is my code, and the problem is the last function:

#include <RcppArmadillo.h>

using namespace Rcpp;
using namespace RcppArmadillo;
using namespace arma;
using namespace std;
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
double fr(arma::colvec x){
  double result = 100 * (x(2) - x(1) * x(1)) * (x(2) - x(1) * x(1)) + (1 -     x(1)) * (1 - x(1));
  return result;  
} 


typedef double (*funcPtr)(arma::colvec x);

// [[Rcpp::export]]
XPtr<funcPtr> putFunPtrInXPtr(){
  return(XPtr<funcPtr>(new funcPtr(&fr)));
}


// [[Rcpp::export]]
arma::colvec callOptimFun(SEXP x) {

  RNGScope scope;

  Rcpp::Environment stats("package:stats");
  Rcpp::Function optim = stats["optim"];
  XPtr<funcPtr> xpfun = putFunPtrInXPtr();
  funcPtr fun = *xpfun;
  Rcpp::List y = optim(x, fun);
  arma::colvec r = y["par"];
  return r; 
}

Unfortunately, I have tried lots of methods for my last function, and all methods have the same error. These are my tries: 1.

// [[Rcpp::export]]
Rcpp::List callOptimFun(arma::colvec x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

2.

// [[Rcpp::export]]
Rcpp::List callOptimFun(arma::colvec x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

3.

// [[Rcpp::export]]
Rcpp::List callOptimFun(SEXP x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

I am not familiar with C++. What may be the problem? Thanks!

44
votes
12.8k
views

Using 3rd party header files with Rcpp

[see original]
c++rrcpp
politicalEconomist asked Dec 21 '12 at 18:23
Click to expand question

I have a header file called coolStuff.h that contains a function awesomeSauce(arg1) that I would like to use in my cpp source file.

Directory Structure:

  • RworkingDirectory
    • sourceCpp
      • theCppFile.cpp
    • cppHeaders
      • coolStuff.h

The Code:

#include <Rcpp.h>
#include <cppHeaders/coolStuff.h>
using namespace Rcpp;

// [[Rcpp::export]]
double someFunctionCpp(double someInput){

 double someOutput = awesomeSauce(someInput);

return someOutput;
}

I get the error:

 theCppFile.cpp:2:31: error: cppHeaders/coolStuff.h: No such file or directory

I have moved the file and directory all over the place and can't seem to get this to work. I see examples all over the place of using 3rd party headers that say just do this:

#include <boost/array.hpp>

(Thats from Hadley/devtools)

https://github.com/hadley/devtools/wiki/Rcpp

So what gives? I have been searching all morning and can't find an answer to what seems to me like a simple thing.

UPDATE 01.11.12

Ok now that I have figured out how to build packages that use Rcpp in Rstudio let me rephrase the question. I have a stand alone header file coolStuff.h that contains a function I want to use in my cpp code.

1) Where should I place coolStuff.h in the package directory structure so the function it contains can be used by theCppFile.cpp?

2) How do I call coolStuff.h in the cpp files? Thanks again for your help. I learned a lot from the last conversation.

Note: I read the vignette "Writing a package that uses Rcpp" and it does not explain how to do this.

The Answer:

Ok let me summarize the answer to my question since it is scattered across this page. If I get a detail wrong feel free to edit this or let me know and I will edit it:

So you found a .h or .cpp file that contains a function or some other bit of code you want to use in a .cpp file you are writing to use with Rcpp.

Lets keep calling this found code coolStuff.h and call the function you want to use awesomeSauce(). Lets call the file you are writing theCppFile.cpp.

(I should note here that the code in .h files and in .cpp files is all C++ code and the difference between them is for the C++ programer to keep things organized in the proper way. I will leave a discussion of the difference out here, but a simple search here on SO will lead you to discussion of the difference. For you the R programer needing to use a bit o' code you found, there is no real difference.)

IN SHORT: You can use a file like coolStuff.h provided it calls no other libraries, by either cut-and-pasteing into theCppFile.cpp, or if you create a package you can place the file in the \src directory with the theCppFile.cpp file and use #include "coolStuff.h" at the top of the file you are writing. The latter is more flexible and allows you to use functions in coolStuff.h in other .cpp files.

DETAILS:

1) coolStuff.h must not call other libraries. So that means it cannot have any include statements at the top. If it does, what I detail below probably will not work, and the use of found code that calls other libraries is beyond the scope of this answer.

2) If you want to compile the file with sourceCpp() you need to cut and paste coolStuff.h into theCppFile.cpp. I am told there are exceptions, but sourceCpp() is designed to compile one .cpp file, so thats the best route to take.

(NOTE: I make no guarantees that a simple cut and paste will work out of the box. You may have to rename variables, or more likely switch the data types being used to be consistent with those you are using in theCppFile.cpp. But so far, cut-and-paste has worked with minimal fuss for me with 6 different simple .h files)

3) If you only need to use code from coolStuff.h in theCppFile.cpp and nowhere else, then you should cut and paste it into theCppFile.cpp.

(Again I make no guarantees see the note above about cut-and-paste)

4) If you want to use code contained in coolStuff.h in theCppFile.cpp AND other .cpp files, you need to look into building a package. This is not hard, but can be a bit tricky, because the information out there about building packages with Rcpp ranges from the exhaustive thorough documentation you want with any R package (but that is above your head as a newbie), and the newbie sensitive introductions (that may leave out a detail you happen to need).

Here is what I suggest:

A) First get a version of theCppFile.cpp with the code from coolStuff.h cut-and-paste into theCppFile.cpp that compiles with sourceCpp() and works as you expect it to. This is not a must, but if you are new to Rcpp OR packages, it is nice to make sure your code works in this simple situation before you move to the more complicated case below.

B) Now build your package using Rcpp.package.skeleton() or use the Build functionality in RStudio (HIGHLY recommended). You can find details about using Rcpp.package.skeleton() in hadley/devtools or Rcpp Attributes Vignette. The full documentation for writing packages with Rcpp is in the Writing a package that uses Rcpp, however this one assumes you know your way around C++ fairly well, and does not use the new "Attributes" way of doing Rcpp.

Don't forget to "Build & Reload" if using RStudio or compileAttributes() if you are not in RStudio.

C) Now you should see in your \R directory a file called RcppExports.R. Open it and check it out. In RcppExports.R you should see the R wrapper functions for all the .cpp files you have in your \src directory. Pretty sweet.

D) Try out the R function that corresponds to the function you wrote in theCppFile.cpp. Does it work? If so move on.

E) With your package built you can move coolStuff.h into the src folder with theCppFile.cpp.

F) Now you can remove the cut-and-paste code from theCppFile.cpp and at the top of theCppFile.cpp (and any other .cpp file you want to use code from coolStuff.h) put #include "coolStuff.h" just after #include <Rcpp.h>. Note that there are no brackets around ranker.h, rather there are "". This is a C++ convention when including local files provided by the user rather than a library file like Rcpp or STL etc...

G) Now you have to rebuild the package. In RStudio this is just "Build & Reload" in the Build menu. If you are not using RStudio you should run compileAttributes()

H) Now try the R function again just as you did in step D), hopefully it works.

25
votes
jjallaire replied Dec 21 '12 at 19:34
Click to expand answer

I have a header file called coolStuff.h that contains a function awesomeSauce(arg1) that I would like to use in my cpp source file.

Directory Structure:

  • RworkingDirectory
    • sourceCpp
      • theCppFile.cpp
    • cppHeaders
      • coolStuff.h

The Code:

#include <Rcpp.h>
#include <cppHeaders/coolStuff.h>
using namespace Rcpp;

// [[Rcpp::export]]
double someFunctionCpp(double someInput){

 double someOutput = awesomeSauce(someInput);

return someOutput;
}

I get the error:

 theCppFile.cpp:2:31: error: cppHeaders/coolStuff.h: No such file or directory

I have moved the file and directory all over the place and can't seem to get this to work. I see examples all over the place of using 3rd party headers that say just do this:

#include <boost/array.hpp>

(Thats from Hadley/devtools)

https://github.com/hadley/devtools/wiki/Rcpp

So what gives? I have been searching all morning and can't find an answer to what seems to me like a simple thing.

UPDATE 01.11.12

Ok now that I have figured out how to build packages that use Rcpp in Rstudio let me rephrase the question. I have a stand alone header file coolStuff.h that contains a function I want to use in my cpp code.

1) Where should I place coolStuff.h in the package directory structure so the function it contains can be used by theCppFile.cpp?

2) How do I call coolStuff.h in the cpp files? Thanks again for your help. I learned a lot from the last conversation.

Note: I read the vignette "Writing a package that uses Rcpp" and it does not explain how to do this.

The Answer:

Ok let me summarize the answer to my question since it is scattered across this page. If I get a detail wrong feel free to edit this or let me know and I will edit it:

So you found a .h or .cpp file that contains a function or some other bit of code you want to use in a .cpp file you are writing to use with Rcpp.

Lets keep calling this found code coolStuff.h and call the function you want to use awesomeSauce(). Lets call the file you are writing theCppFile.cpp.

(I should note here that the code in .h files and in .cpp files is all C++ code and the difference between them is for the C++ programer to keep things organized in the proper way. I will leave a discussion of the difference out here, but a simple search here on SO will lead you to discussion of the difference. For you the R programer needing to use a bit o' code you found, there is no real difference.)

IN SHORT: You can use a file like coolStuff.h provided it calls no other libraries, by either cut-and-pasteing into theCppFile.cpp, or if you create a package you can place the file in the \src directory with the theCppFile.cpp file and use #include "coolStuff.h" at the top of the file you are writing. The latter is more flexible and allows you to use functions in coolStuff.h in other .cpp files.

DETAILS:

1) coolStuff.h must not call other libraries. So that means it cannot have any include statements at the top. If it does, what I detail below probably will not work, and the use of found code that calls other libraries is beyond the scope of this answer.

2) If you want to compile the file with sourceCpp() you need to cut and paste coolStuff.h into theCppFile.cpp. I am told there are exceptions, but sourceCpp() is designed to compile one .cpp file, so thats the best route to take.

(NOTE: I make no guarantees that a simple cut and paste will work out of the box. You may have to rename variables, or more likely switch the data types being used to be consistent with those you are using in theCppFile.cpp. But so far, cut-and-paste has worked with minimal fuss for me with 6 different simple .h files)

3) If you only need to use code from coolStuff.h in theCppFile.cpp and nowhere else, then you should cut and paste it into theCppFile.cpp.

(Again I make no guarantees see the note above about cut-and-paste)

4) If you want to use code contained in coolStuff.h in theCppFile.cpp AND other .cpp files, you need to look into building a package. This is not hard, but can be a bit tricky, because the information out there about building packages with Rcpp ranges from the exhaustive thorough documentation you want with any R package (but that is above your head as a newbie), and the newbie sensitive introductions (that may leave out a detail you happen to need).

Here is what I suggest:

A) First get a version of theCppFile.cpp with the code from coolStuff.h cut-and-paste into theCppFile.cpp that compiles with sourceCpp() and works as you expect it to. This is not a must, but if you are new to Rcpp OR packages, it is nice to make sure your code works in this simple situation before you move to the more complicated case below.

B) Now build your package using Rcpp.package.skeleton() or use the Build functionality in RStudio (HIGHLY recommended). You can find details about using Rcpp.package.skeleton() in hadley/devtools or Rcpp Attributes Vignette. The full documentation for writing packages with Rcpp is in the Writing a package that uses Rcpp, however this one assumes you know your way around C++ fairly well, and does not use the new "Attributes" way of doing Rcpp.

Don't forget to "Build & Reload" if using RStudio or compileAttributes() if you are not in RStudio.

C) Now you should see in your \R directory a file called RcppExports.R. Open it and check it out. In RcppExports.R you should see the R wrapper functions for all the .cpp files you have in your \src directory. Pretty sweet.

D) Try out the R function that corresponds to the function you wrote in theCppFile.cpp. Does it work? If so move on.

E) With your package built you can move coolStuff.h into the src folder with theCppFile.cpp.

F) Now you can remove the cut-and-paste code from theCppFile.cpp and at the top of theCppFile.cpp (and any other .cpp file you want to use code from coolStuff.h) put #include "coolStuff.h" just after #include <Rcpp.h>. Note that there are no brackets around ranker.h, rather there are "". This is a C++ convention when including local files provided by the user rather than a library file like Rcpp or STL etc...

G) Now you have to rebuild the package. In RStudio this is just "Build & Reload" in the Build menu. If you are not using RStudio you should run compileAttributes()

H) Now try the R function again just as you did in step D), hopefully it works.

35
votes
20.7k
views

How do I create a list of vectors in Rcpp?

[see original]
rrcpp
Jonathan Chang asked Jun 21 '10 at 21:30
Click to expand question

I'm writing an Rcpp module an would like to return as one element of the RcppResultSet list a list whose elements are vectors. E.g., .Call("myfunc")$foo should be something like:

[[1]]
[1] 1

[[2]]
[1] 1 1

[[3]]
[1] 1 1 1

(the exact numbers are not important here). The issue is that I don't know the right Rcpp way of doing this. I tried passing a vector<vector<int> > but this constructs a matrix by silently taking the length of the first vector as the width (even if the matrix is ragged!). I've tried constructing an RcppList but have a hard time casting various objects (like RcppVector) safely into SEXPs.

Anyone have tips on best practices for dealing with complicated structures such as lists of vectors in Rcpp?

51
votes
Dirk is no longer here replied Jun 21 '10 at 21:50
Click to expand answer

I'm writing an Rcpp module an would like to return as one element of the RcppResultSet list a list whose elements are vectors. E.g., .Call("myfunc")$foo should be something like:

[[1]]
[1] 1

[[2]]
[1] 1 1

[[3]]
[1] 1 1 1

(the exact numbers are not important here). The issue is that I don't know the right Rcpp way of doing this. I tried passing a vector<vector<int> > but this constructs a matrix by silently taking the length of the first vector as the width (even if the matrix is ragged!). I've tried constructing an RcppList but have a hard time casting various objects (like RcppVector) safely into SEXPs.

Anyone have tips on best practices for dealing with complicated structures such as lists of vectors in Rcpp?

33
votes
4k
views

Why is this naive matrix multiplication faster than base R's?

[see original]
rperformancercppmatrix-multiplication
Cliff AB asked Jun 27 '18 at 03:41
Click to expand question

In R, matrix multiplication is very optimized, i.e. is really just a call to BLAS/LAPACK. However, I'm surprised this very naive C++ code for matrix-vector multiplication seems reliably 30% faster.

 library(Rcpp)

 # Simple C++ code for matrix multiplication
 mm_code = 
 "NumericVector my_mm(NumericMatrix m, NumericVector v){
   int nRow = m.rows();
   int nCol = m.cols();
   NumericVector ans(nRow);
   double v_j;
   for(int j = 0; j < nCol; j++){
     v_j = v[j];
     for(int i = 0; i < nRow; i++){
       ans[i] += m(i,j) * v_j;
     }
   }
   return(ans);
 }
 "
 # Compiling
 my_mm = cppFunction(code = mm_code)

 # Simulating data to use
 nRow = 10^4
 nCol = 10^4

 m = matrix(rnorm(nRow * nCol), nrow = nRow)
 v = rnorm(nCol)

 system.time(my_ans <- my_mm(m, v))
#>    user  system elapsed 
#>   0.103   0.001   0.103 
 system.time(r_ans <- m %*% v)
#>   user  system elapsed 
#>  0.154   0.001   0.154 

 # Double checking answer is correct
 max(abs(my_ans - r_ans))
 #> [1] 0

Does base R's %*% perform some type of data check that I'm skipping over?

EDIT:

After understanding what's going on (thanks SO!), it's worth noting that this is a worst case scenario for R's %*%, i.e. matrix by vector. For example, @RalfStubner pointed out that using an RcppArmadillo implementation of a matrix-vector multiply is even faster than the naive implementation that I demonstrated, implying considerable faster than base R, but is virtually identical to base R's %*% for matrix-matrix multiply (when both matrices are large and square):

 arma_code <- 
   "arma::mat arma_mm(const arma::mat& m, const arma::mat& m2) {
 return m * m2;
 };"
 arma_mm = cppFunction(code = arma_code, depends = "RcppArmadillo")

 nRow = 10^3 
 nCol = 10^3

 mat1 = matrix(rnorm(nRow * nCol), 
               nrow = nRow)
 mat2 = matrix(rnorm(nRow * nCol), 
               nrow = nRow)

 system.time(arma_mm(mat1, mat2))
#>   user  system elapsed 
#>   0.798   0.008   0.814 
 system.time(mat1 %*% mat2)
#>   user  system elapsed 
#>   0.807   0.005   0.822  

So R's current (v3.5.0) %*% is near optimal for matrix-matrix, but could be significantly sped up for matrix-vector if you're okay skipping the checking.

34
votes
Josh O'Brien replied Jun 27 '18 at 04:38
Click to expand answer

In R, matrix multiplication is very optimized, i.e. is really just a call to BLAS/LAPACK. However, I'm surprised this very naive C++ code for matrix-vector multiplication seems reliably 30% faster.

 library(Rcpp)

 # Simple C++ code for matrix multiplication
 mm_code = 
 "NumericVector my_mm(NumericMatrix m, NumericVector v){
   int nRow = m.rows();
   int nCol = m.cols();
   NumericVector ans(nRow);
   double v_j;
   for(int j = 0; j < nCol; j++){
     v_j = v[j];
     for(int i = 0; i < nRow; i++){
       ans[i] += m(i,j) * v_j;
     }
   }
   return(ans);
 }
 "
 # Compiling
 my_mm = cppFunction(code = mm_code)

 # Simulating data to use
 nRow = 10^4
 nCol = 10^4

 m = matrix(rnorm(nRow * nCol), nrow = nRow)
 v = rnorm(nCol)

 system.time(my_ans <- my_mm(m, v))
#>    user  system elapsed 
#>   0.103   0.001   0.103 
 system.time(r_ans <- m %*% v)
#>   user  system elapsed 
#>  0.154   0.001   0.154 

 # Double checking answer is correct
 max(abs(my_ans - r_ans))
 #> [1] 0

Does base R's %*% perform some type of data check that I'm skipping over?

EDIT:

After understanding what's going on (thanks SO!), it's worth noting that this is a worst case scenario for R's %*%, i.e. matrix by vector. For example, @RalfStubner pointed out that using an RcppArmadillo implementation of a matrix-vector multiply is even faster than the naive implementation that I demonstrated, implying considerable faster than base R, but is virtually identical to base R's %*% for matrix-matrix multiply (when both matrices are large and square):

 arma_code <- 
   "arma::mat arma_mm(const arma::mat& m, const arma::mat& m2) {
 return m * m2;
 };"
 arma_mm = cppFunction(code = arma_code, depends = "RcppArmadillo")

 nRow = 10^3 
 nCol = 10^3

 mat1 = matrix(rnorm(nRow * nCol), 
               nrow = nRow)
 mat2 = matrix(rnorm(nRow * nCol), 
               nrow = nRow)

 system.time(arma_mm(mat1, mat2))
#>   user  system elapsed 
#>   0.798   0.008   0.814 
 system.time(mat1 %*% mat2)
#>   user  system elapsed 
#>   0.807   0.005   0.822  

So R's current (v3.5.0) %*% is near optimal for matrix-matrix, but could be significantly sped up for matrix-vector if you're okay skipping the checking.

31
votes
14k
views

Can't run Rcpp function in foreach - "NULL value passed as symbol address"

[see original]
rrcpp
Patrick McCarthy asked Jul 31 '14 at 15:02
Click to expand question

Let me say first that I've read Writing R Extensions, the Rcpp package vignette, and that I've built a package from Rcpp.package.skeleton().

Since building my package, I added a function, multiGenerateCSVrow(), and then ran compileAttributes() on the package directory before R CMD build/R CMD install. After I load my package, I can run my function either directly or via foreach() with the %do% method.

When I try to run in parallel however, I get an error:

cl <- makePSOCKcluster(8)                                                                                     
registerDoParallel(cl)                                                                                        
rows <- foreach(i=1:8,.combine=rbind,.packages="myPackage") %dopar% multiGenerateCSVrow(scoreMatrix=NIsample,   
                                                                   validMatrix = matrix(1,nrow=10,ncol=10),   
                                                                   cutoffVector = rep(0,10),                  
                                                                   factorVector = randomsCutPlus1[i,],        
                                                                   actualVector = rep(1,10),                  
                                                                   scaleSample = 1)                           
stopCluster(cl)                                                                                               
~                                                                                                             

Error in multiGenerateCSVrow(scoreMatrix = NIsample, validMatrix = matrix(1,  : 
  task 1 failed - "NULL value passed as symbol address"

Here's the package NAMESPACE:

# Generated by roxygen2 (4.0.1): do not edit by hand 
useDynLib(myPackage)                                   
exportPattern("^[[:alpha:]]+")                       
importFrom(Rcpp, evalCpp) 

Here's the relevant chunk of RcppExports.cpp:

// multiGenerateCSVrow
SEXP multiGenerateCSVrow(SEXP scoreMatrix, SEXP validMatrix, SEXP cutoffVector, SEXP factorVector, SEXP actualVector, SEXP scaleSample);
RcppExport SEXP myPackage_multiGenerateCSVrow(SEXP scoreMatrixSEXP, SEXP validMatrixSEXP, SEXP cutoffVectorSEXP, SEXP factorVectorSEXP, SEXP actualVectorSEXP, SEXP scaleSampleSEXP) {
BEGIN_RCPP
    SEXP __sexp_result;
    {
        Rcpp::RNGScope __rngScope;
        Rcpp::traits::input_parameter< SEXP >::type scoreMatrix(scoreMatrixSEXP );
        Rcpp::traits::input_parameter< SEXP >::type validMatrix(validMatrixSEXP );
        Rcpp::traits::input_parameter< SEXP >::type cutoffVector(cutoffVectorSEXP );
        Rcpp::traits::input_parameter< SEXP >::type factorVector(factorVectorSEXP );
        Rcpp::traits::input_parameter< SEXP >::type actualVector(actualVectorSEXP );
        Rcpp::traits::input_parameter< SEXP >::type scaleSample(scaleSampleSEXP );
        SEXP __result = multiGenerateCSVrow(scoreMatrix, validMatrix, cutoffVector, factorVector, actualVector, scaleSample);
        PROTECT(__sexp_result = Rcpp::wrap(__result));
    }
    UNPROTECT(1);
    return __sexp_result;
END_RCPP
}

And RcppExports.R:

multiGenerateCSVrow <- function(scoreMatrix, validMatrix, cutoffVector, factorVector, actualVector, scaleSample) {
    .Call('myPackage_multiGenerateCSVrow', PACKAGE = 'myPackage', scoreMatrix, validMatrix, cutoffVector, factorVector, actualVector, scaleSample)
}   

What could it be looking for?

16
votes
henine replied Jun 23 '17 at 06:02
Click to expand answer

Let me say first that I've read Writing R Extensions, the Rcpp package vignette, and that I've built a package from Rcpp.package.skeleton().

Since building my package, I added a function, multiGenerateCSVrow(), and then ran compileAttributes() on the package directory before R CMD build/R CMD install. After I load my package, I can run my function either directly or via foreach() with the %do% method.

When I try to run in parallel however, I get an error:

cl <- makePSOCKcluster(8)                                                                                     
registerDoParallel(cl)                                                                                        
rows <- foreach(i=1:8,.combine=rbind,.packages="myPackage") %dopar% multiGenerateCSVrow(scoreMatrix=NIsample,   
                                                                   validMatrix = matrix(1,nrow=10,ncol=10),   
                                                                   cutoffVector = rep(0,10),                  
                                                                   factorVector = randomsCutPlus1[i,],        
                                                                   actualVector = rep(1,10),                  
                                                                   scaleSample = 1)                           
stopCluster(cl)                                                                                               
~                                                                                                             

Error in multiGenerateCSVrow(scoreMatrix = NIsample, validMatrix = matrix(1,  : 
  task 1 failed - "NULL value passed as symbol address"

Here's the package NAMESPACE:

# Generated by roxygen2 (4.0.1): do not edit by hand 
useDynLib(myPackage)                                   
exportPattern("^[[:alpha:]]+")                       
importFrom(Rcpp, evalCpp) 

Here's the relevant chunk of RcppExports.cpp:

// multiGenerateCSVrow
SEXP multiGenerateCSVrow(SEXP scoreMatrix, SEXP validMatrix, SEXP cutoffVector, SEXP factorVector, SEXP actualVector, SEXP scaleSample);
RcppExport SEXP myPackage_multiGenerateCSVrow(SEXP scoreMatrixSEXP, SEXP validMatrixSEXP, SEXP cutoffVectorSEXP, SEXP factorVectorSEXP, SEXP actualVectorSEXP, SEXP scaleSampleSEXP) {
BEGIN_RCPP
    SEXP __sexp_result;
    {
        Rcpp::RNGScope __rngScope;
        Rcpp::traits::input_parameter< SEXP >::type scoreMatrix(scoreMatrixSEXP );
        Rcpp::traits::input_parameter< SEXP >::type validMatrix(validMatrixSEXP );
        Rcpp::traits::input_parameter< SEXP >::type cutoffVector(cutoffVectorSEXP );
        Rcpp::traits::input_parameter< SEXP >::type factorVector(factorVectorSEXP );
        Rcpp::traits::input_parameter< SEXP >::type actualVector(actualVectorSEXP );
        Rcpp::traits::input_parameter< SEXP >::type scaleSample(scaleSampleSEXP );
        SEXP __result = multiGenerateCSVrow(scoreMatrix, validMatrix, cutoffVector, factorVector, actualVector, scaleSample);
        PROTECT(__sexp_result = Rcpp::wrap(__result));
    }
    UNPROTECT(1);
    return __sexp_result;
END_RCPP
}

And RcppExports.R:

multiGenerateCSVrow <- function(scoreMatrix, validMatrix, cutoffVector, factorVector, actualVector, scaleSample) {
    .Call('myPackage_multiGenerateCSVrow', PACKAGE = 'myPackage', scoreMatrix, validMatrix, cutoffVector, factorVector, actualVector, scaleSample)
}   

What could it be looking for?