Rcpp Top StackOverflow Q&A

Question 1

211

votes

115.3k

views

Speed up the loop operation in R

[see original]

performancerloopsrcppr-faq

Kay asked May 25 '10 at 21:55

Click to expand question

I have a big performance problem in R. I wrote a function that iterates over a data.frame object. It simply adds a new column to a data.frame and accumulates something. (simple operation). The data.frame has roughly 850K rows. My PC is still working (about 10h now) and I have no idea about the runtime.

dayloop2 <- function(temp){
    for (i in 1:nrow(temp)){    
        temp[i,10] <- i
        if (i > 1) {             
            if ((temp[i,6] == temp[i-1,6]) & (temp[i,3] == temp[i-1,3])) { 
                temp[i,10] <- temp[i,9] + temp[i-1,10]                    
            } else {
                temp[i,10] <- temp[i,9]                                    
            }
        } else {
            temp[i,10] <- temp[i,9]
        }
    }
    names(temp)[names(temp) == "V10"] <- "Kumm."
    return(temp)
}

Any ideas how to speed up this operation?

Answer

466

votes

Marek replied Jun 03 '10 at 22:34

Click to expand answer

I have a big performance problem in R. I wrote a function that iterates over a data.frame object. It simply adds a new column to a data.frame and accumulates something. (simple operation). The data.frame has roughly 850K rows. My PC is still working (about 10h now) and I have no idea about the runtime.

dayloop2 <- function(temp){
    for (i in 1:nrow(temp)){    
        temp[i,10] <- i
        if (i > 1) {             
            if ((temp[i,6] == temp[i-1,6]) & (temp[i,3] == temp[i-1,3])) { 
                temp[i,10] <- temp[i,9] + temp[i-1,10]                    
            } else {
                temp[i,10] <- temp[i,9]                                    
            }
        } else {
            temp[i,10] <- temp[i,9]
        }
    }
    names(temp)[names(temp) == "V10"] <- "Kumm."
    return(temp)
}

Any ideas how to speed up this operation?

Question 2

72

votes

100.5k

views

Rcpp package doesn't include Rcpp_precious_remove

[see original]

rrcpp

Laxmi Agarwal asked Jul 17 '21 at 00:14

Click to expand question

I have been trying to create a database and installed the "DBI" package, but I am still facing this error. I reinstalled DBI and RSQLite package, but they don’t seem to work.

library("DBI")
con <- dbConnect
(RSQLite::SQLite(), dbname = ":memory:")
dbListTables(con)

Error:

Error in connection_connect(dbname, loadable.extensions, flags, vfs, extended_types) : function 'Rcpp_precious_remove' not provided by package 'Rcpp'

Answer

71

votes

Daniela replied Jul 18 '21 at 12:46

Click to expand answer

I have been trying to create a database and installed the "DBI" package, but I am still facing this error. I reinstalled DBI and RSQLite package, but they don’t seem to work.

library("DBI")
con <- dbConnect
(RSQLite::SQLite(), dbname = ":memory:")
dbListTables(con)

Error:

Error in connection_connect(dbname, loadable.extensions, flags, vfs, extended_types) : function 'Rcpp_precious_remove' not provided by package 'Rcpp'

Question 3

15

votes

5.9k

views

Rcpp pass by reference vs. by value

[see original]

rrcpp

Ari B. Friedman asked Jul 02 '12 at 19:36

Click to expand question

I made a first stab at an Rcpp function via inline and it solved my speed problem (thanks Dirk!): Replace negative values by zero

The initial version looked like this:

library(inline)
cpp_if_src <- '
  Rcpp::NumericVector xa(a);
  int n_xa = xa.size();
  for(int i=0; i < n_xa; i++) {
    if(xa[i]<0) xa[i] = 0;
  }
  return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")

But when called cpp_if(p), it overwrote p with the output, which was not as intended. So I assumed it was passing by reference.

So I fixed it with the following version:

library(inline)
cpp_if_src <- '
  Rcpp::NumericVector xa(a);
  int n_xa = xa.size();
  Rcpp::NumericVector xr(a);
  for(int i=0; i < n_xa; i++) {
    if(xr[i]<0) xr[i] = 0;
  }
  return xr;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")

Which seemed to work. But now the original version doesn't overwrite its input anymore when I re-load it into R (i.e. the same exact code now doesn't overwrite its input):

> cpp_if_src <- '
+   Rcpp::NumericVector xa(a);
+   int n_xa = xa.size();
+   for(int i=0; i < n_xa; i++) {
+     if(xa[i]<0) xa[i] = 0;
+   }
+   return xa;
+ '
> cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
> 
> p
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
> cpp_if(p)
 [1] 0 0 0 0 0 0 1 2 3 4 5
> p
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

I'm not the only one who has tried to replicate this behavior and found inconsistent results:

https://chat.stackoverflow.com/transcript/message/4357344#4357344

What's going on here?

Answer

22

votes

Dirk is no longer here replied Jul 02 '12 at 20:28

Click to expand answer

I made a first stab at an Rcpp function via inline and it solved my speed problem (thanks Dirk!): Replace negative values by zero

The initial version looked like this:

library(inline)
cpp_if_src <- '
  Rcpp::NumericVector xa(a);
  int n_xa = xa.size();
  for(int i=0; i < n_xa; i++) {
    if(xa[i]<0) xa[i] = 0;
  }
  return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")

But when called cpp_if(p), it overwrote p with the output, which was not as intended. So I assumed it was passing by reference.

So I fixed it with the following version:

library(inline)
cpp_if_src <- '
  Rcpp::NumericVector xa(a);
  int n_xa = xa.size();
  Rcpp::NumericVector xr(a);
  for(int i=0; i < n_xa; i++) {
    if(xr[i]<0) xr[i] = 0;
  }
  return xr;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")

Which seemed to work. But now the original version doesn't overwrite its input anymore when I re-load it into R (i.e. the same exact code now doesn't overwrite its input):

> cpp_if_src <- '
+   Rcpp::NumericVector xa(a);
+   int n_xa = xa.size();
+   for(int i=0; i < n_xa; i++) {
+     if(xa[i]<0) xa[i] = 0;
+   }
+   return xa;
+ '
> cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
> 
> p
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
> cpp_if(p)
 [1] 0 0 0 0 0 0 1 2 3 4 5
> p
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

I'm not the only one who has tried to replicate this behavior and found inconsistent results:

https://chat.stackoverflow.com/transcript/message/4357344#4357344

What's going on here?

Question 4

34

votes

102.6k

views

Replace negative values by zero

[see original]

rif-statementfor-loopconditional-statementsrcpp

Fabian Stolz asked Jun 30 '12 at 15:09

Click to expand question

We want to set all values in an array zero that are negative.

I tried out a a lot of stuff but did not yet achieve a working solution. I thought about a for loop with condition, however this seems not to work.

#pred_precipitation is our array
pred_precipitation <-rnorm(25,2,4)     

for (i in nrow(pred_precipitation))
{
  if (pred_precipitation[i]<0) {pred_precipitation[i] = 0}
  else{pred_precipitation[i] = pred_precipitation[i]}
}

Answer

73

votes

Ari B. Friedman replied Jun 30 '12 at 15:15

Click to expand answer

We want to set all values in an array zero that are negative.

I tried out a a lot of stuff but did not yet achieve a working solution. I thought about a for loop with condition, however this seems not to work.

#pred_precipitation is our array
pred_precipitation <-rnorm(25,2,4)     

for (i in nrow(pred_precipitation))
{
  if (pred_precipitation[i]<0) {pred_precipitation[i] = 0}
  else{pred_precipitation[i] = pred_precipitation[i]}
}

Question 5

16

votes

4.3k

views

Using Rcpp within parallel code via snow to make a cluster

[see original]

rrcppsnow

Vincent asked May 20 '11 at 15:40

Click to expand question

I've written a function in Rcpp and compiled it with inline. Now, I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example, where the function funCPP1 can be compiled and runs well by itself, but cannot be called by snow's clusterCall function. The function runs well as a single process, but gives the following error when ran in parallel:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: NULL value passed as symbol address

And here is some code:

## Load and compile
library(inline)
library(Rcpp)
library(snow)
src1 <- '
     Rcpp::NumericMatrix xbem(xbe);
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g="numeric"),body = src1, plugin="Rcpp")

## Single process
A <- matrix(rnorm(400), 20,20)
funCPP1(A, 0.5)

## Parallel
cl <- makeCluster(2, type = "SOCK") 
clusterExport(cl, 'funCPP1') 
clusterCall(cl, funCPP1, A, 0.5)

Answer

19

votes

Dirk is no longer here replied May 20 '11 at 15:47

Click to expand answer

I've written a function in Rcpp and compiled it with inline. Now, I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example, where the function funCPP1 can be compiled and runs well by itself, but cannot be called by snow's clusterCall function. The function runs well as a single process, but gives the following error when ran in parallel:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: NULL value passed as symbol address

And here is some code:

## Load and compile
library(inline)
library(Rcpp)
library(snow)
src1 <- '
     Rcpp::NumericMatrix xbem(xbe);
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g="numeric"),body = src1, plugin="Rcpp")

## Single process
A <- matrix(rnorm(400), 20,20)
funCPP1(A, 0.5)

## Parallel
cl <- makeCluster(2, type = "SOCK") 
clusterExport(cl, 'funCPP1') 
clusterCall(cl, funCPP1, A, 0.5)

Question 6

87

votes

35.3k

views

Understanding the contents of the Makevars file in R (macros, variables, ~/.R/Makevars and pkg/src/Makevars)

[see original]

rrcpp

NoBackingDown asked Apr 24 '17 at 20:59

Click to expand question

I am trying to understand the role and relationship of the macros/variables set in ~/.R/Makevars and package_directory/src/Makevars when installing/building own R packages. Suppose these files look like

~/.R/Makevars

CXX = g++
CXXSTD = -std=c++11
CXXFLAGS = -fsanitize=undefined,address -fno-omit-frame-pointer

CXX98 = g++
CXX98STD = -std=c++98

CXX11 = g++
CXX11STD = -std=c++11

CXX14 = g++
CXX14STD = -std=c++14

package_directory/src/Makevars

PKG_CPPFLAGS = -I../inst/include
CXX_STD = CXX11

As I understand it, with CXX we can select the compiler for C++ when building R packages, with CXXSTD we chose the standard and with CXXFLAGS we add compiler flags. With PKG_CPPFLAGS we add flags for the C++ preprocessor and with CXX_STD we tell that our packages uses C++11.

I have the following questions:

What is the relationship between CXX and CXX98, CXX11 and CXX14?
What is the meaning of, e.g., CXX11STD = -std=c++11 if C++11 is already implied? Is it between choosing -std=c++11 and -std=gnu++11? Should -std=gnu++11 generally be avoided for portability reasons?
Could the flags for CXXSTD and CXXFLAGS not just be added to CXX, such that the first three lines reduce to CXX = g++ -std=c++11 -fsanitize=undefined,address -fno-omit-frame-pointer. What is the advantage in explicity specifying CXXSTD and CXXFLAGS?
How does CXX_STD = CXX11 work? How is CXX11 here related to CXX11 in ~/.R/Makevars?
What is the relationship between CXXFLAGS and PKG_CXXFLAGS(not included in my example)?

I am aware of the information contained in Writing R Extensions and R Installation and Administration, but I am not able to extract more information beyond my current level of understanding to answer the above questions.

I am adding a Rcpp tag because I suppose that answers to these questions will be most relevant to users of Rcpp, but I am aware that this is probably not directly related to Rcpp, so the tag might be removed if deemed appropriate.

Answer

129

votes

coatless replied Apr 24 '17 at 23:30

Click to expand answer

I am trying to understand the role and relationship of the macros/variables set in ~/.R/Makevars and package_directory/src/Makevars when installing/building own R packages. Suppose these files look like

~/.R/Makevars

CXX = g++
CXXSTD = -std=c++11
CXXFLAGS = -fsanitize=undefined,address -fno-omit-frame-pointer

CXX98 = g++
CXX98STD = -std=c++98

CXX11 = g++
CXX11STD = -std=c++11

CXX14 = g++
CXX14STD = -std=c++14

package_directory/src/Makevars

PKG_CPPFLAGS = -I../inst/include
CXX_STD = CXX11

As I understand it, with CXX we can select the compiler for C++ when building R packages, with CXXSTD we chose the standard and with CXXFLAGS we add compiler flags. With PKG_CPPFLAGS we add flags for the C++ preprocessor and with CXX_STD we tell that our packages uses C++11.

I have the following questions:

What is the relationship between CXX and CXX98, CXX11 and CXX14?
What is the meaning of, e.g., CXX11STD = -std=c++11 if C++11 is already implied? Is it between choosing -std=c++11 and -std=gnu++11? Should -std=gnu++11 generally be avoided for portability reasons?
Could the flags for CXXSTD and CXXFLAGS not just be added to CXX, such that the first three lines reduce to CXX = g++ -std=c++11 -fsanitize=undefined,address -fno-omit-frame-pointer. What is the advantage in explicity specifying CXXSTD and CXXFLAGS?
How does CXX_STD = CXX11 work? How is CXX11 here related to CXX11 in ~/.R/Makevars?
What is the relationship between CXXFLAGS and PKG_CXXFLAGS(not included in my example)?

I am aware of the information contained in Writing R Extensions and R Installation and Administration, but I am not able to extract more information beyond my current level of understanding to answer the above questions.

I am adding a Rcpp tag because I suppose that answers to these questions will be most relevant to users of Rcpp, but I am aware that this is probably not directly related to Rcpp, so the tag might be removed if deemed appropriate.

Question 7

8

votes

1.7k

views

Multithreaded & SIMD vectorized Mandelbrot in R using Rcpp & OpenMP

[see original]

multithreadingopenmprcppsimdmandelbrot

Tom Wenseleers asked Jan 03 '18 at 01:19

Click to expand question

As an OpenMP & Rcpp performance test I wanted to check how fast I could calculate the Mandelbrot set in R using the most straightforward and simple Rcpp+OpenMP implementation. Currently what I did was:

#include <Rcpp.h>
#include <omp.h>
// [[Rcpp::plugins(openmp)]]

using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericMatrix mandelRcpp(const double x_min, const double x_max, const double y_min, const double y_max,
                         const int res_x, const int res_y, const int nb_iter) {
  Rcpp::NumericMatrix ret(res_x, res_y);
  double x_step = (x_max - x_min) / res_x;
  double y_step = (y_max - y_min) / res_y;
  int r,c;
#pragma omp parallel for default(shared) private(c) schedule(dynamic,1) collapse(2)
  for (r = 0; r < res_y; r++) {
    for (c = 0; c < res_x; c++) {
      double zx = 0.0, zy = 0.0, new_zx;
      double cx = x_min + c*x_step, cy = y_min + r*y_step;
      int n = 0;
      for (n=0;  (zx*zx + zy*zy < 4.0 ) && ( n < nb_iter ); n++ ) {
        new_zx = zx*zx - zy*zy + cx;
        zy = 2.0*zx*zy + cy;
        zx = new_zx;
      }
      ret(c,r) = n;
    }
  }
  return ret;
}

And then in R:

library(Rcpp)
sourceCpp("mandelRcpp.cpp")
xlims=c(-0.74877,-0.74872);
ylims=c(0.065053,0.065103);
x_res=y_res=1080L; nb_iter=10000L;
system.time(m <- mandelRcpp(xlims[[1]], xlims[[2]], ylims[[1]], ylims[[2]], x_res, y_res, nb_iter)) 
# 0.92s
rainbow=c(rgb(0.47,0.11,0.53),rgb(0.27,0.18,0.73),rgb(0.25,0.39,0.81),rgb(0.30,0.57,0.75),rgb(0.39,0.67,0.60),rgb(0.51,0.73,0.44),rgb(0.67,0.74,0.32),rgb(0.81,0.71,0.26),rgb(0.89,0.60,0.22),rgb(0.89,0.39,0.18),rgb(0.86,0.13,0.13))
    cols=c(colorRampPalette(rainbow)(100),rev(colorRampPalette(rainbow)(100)),"black") # palette
par(mar=c(0, 0, 0, 0))
system.time(image(m^(1/7), col=cols, asp=diff(ylims)/diff(xlims), axes=F, useRaster=T)) 
# 0.5s

I was unsure though if there is any other obvious speed improvements I could take advantage of aside from OpenMP multithreading, e.g. via simd vectorization? (using simd options in the openmp #pragma didn't seem to do anything)

PS at first my code was crashing but I later found this was solved by replacing ret[r,c] = n; with ret(r,c) = n; Using Armadillo classes as suggested in the answer below make things very slightly faster, though the timings are almost the same. Also flipped around x and y so it comes out in the right orientation when plotted with image(). Using 8 threads speed is ca. 350 times faster than the vectorized plain R Mandelbrot version here and also about 7.3 times faster than the (non-multithreaded) Python/Numba version here (similar to PyCUDA or PyOpenCL speeds), so quite happy with that... Rasterizing/display now seems the bottleneck in R....

Answer

6

votes

coatless replied Jan 03 '18 at 02:04

Click to expand answer

As an OpenMP & Rcpp performance test I wanted to check how fast I could calculate the Mandelbrot set in R using the most straightforward and simple Rcpp+OpenMP implementation. Currently what I did was:

#include <Rcpp.h>
#include <omp.h>
// [[Rcpp::plugins(openmp)]]

using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::NumericMatrix mandelRcpp(const double x_min, const double x_max, const double y_min, const double y_max,
                         const int res_x, const int res_y, const int nb_iter) {
  Rcpp::NumericMatrix ret(res_x, res_y);
  double x_step = (x_max - x_min) / res_x;
  double y_step = (y_max - y_min) / res_y;
  int r,c;
#pragma omp parallel for default(shared) private(c) schedule(dynamic,1) collapse(2)
  for (r = 0; r < res_y; r++) {
    for (c = 0; c < res_x; c++) {
      double zx = 0.0, zy = 0.0, new_zx;
      double cx = x_min + c*x_step, cy = y_min + r*y_step;
      int n = 0;
      for (n=0;  (zx*zx + zy*zy < 4.0 ) && ( n < nb_iter ); n++ ) {
        new_zx = zx*zx - zy*zy + cx;
        zy = 2.0*zx*zy + cy;
        zx = new_zx;
      }
      ret(c,r) = n;
    }
  }
  return ret;
}

And then in R:

library(Rcpp)
sourceCpp("mandelRcpp.cpp")
xlims=c(-0.74877,-0.74872);
ylims=c(0.065053,0.065103);
x_res=y_res=1080L; nb_iter=10000L;
system.time(m <- mandelRcpp(xlims[[1]], xlims[[2]], ylims[[1]], ylims[[2]], x_res, y_res, nb_iter)) 
# 0.92s
rainbow=c(rgb(0.47,0.11,0.53),rgb(0.27,0.18,0.73),rgb(0.25,0.39,0.81),rgb(0.30,0.57,0.75),rgb(0.39,0.67,0.60),rgb(0.51,0.73,0.44),rgb(0.67,0.74,0.32),rgb(0.81,0.71,0.26),rgb(0.89,0.60,0.22),rgb(0.89,0.39,0.18),rgb(0.86,0.13,0.13))
    cols=c(colorRampPalette(rainbow)(100),rev(colorRampPalette(rainbow)(100)),"black") # palette
par(mar=c(0, 0, 0, 0))
system.time(image(m^(1/7), col=cols, asp=diff(ylims)/diff(xlims), axes=F, useRaster=T)) 
# 0.5s

I was unsure though if there is any other obvious speed improvements I could take advantage of aside from OpenMP multithreading, e.g. via simd vectorization? (using simd options in the openmp #pragma didn't seem to do anything)

PS at first my code was crashing but I later found this was solved by replacing ret[r,c] = n; with ret(r,c) = n; Using Armadillo classes as suggested in the answer below make things very slightly faster, though the timings are almost the same. Also flipped around x and y so it comes out in the right orientation when plotted with image(). Using 8 threads speed is ca. 350 times faster than the vectorized plain R Mandelbrot version here and also about 7.3 times faster than the (non-multithreaded) Python/Numba version here (similar to PyCUDA or PyOpenCL speeds), so quite happy with that... Rasterizing/display now seems the bottleneck in R....

Question 8

62

votes

91.4k

views

Practical limits of R data frame

[see original]

rperformancedataframercpp

Egon asked Mar 08 '11 at 14:24

Click to expand question

I have been reading about how read.table is not efficient for large data files. Also how R is not suited for large data sets. So I was wondering where I can find what the practical limits are and any performance charts for (1) Reading in data of various sizes (2) working with data of varying sizes.

In effect, I want to know when the performance deteriorates and when I hit a road block. Also any comparison against C++/MATLAB or other languages would be really helpful. finally if there is any special performance comparison for Rcpp and RInside, that would be great!

Answer

61

votes

Allan Engelhardt replied Mar 08 '11 at 15:07

Click to expand answer

I have been reading about how read.table is not efficient for large data files. Also how R is not suited for large data sets. So I was wondering where I can find what the practical limits are and any performance charts for (1) Reading in data of various sizes (2) working with data of varying sizes.

In effect, I want to know when the performance deteriorates and when I hit a road block. Also any comparison against C++/MATLAB or other languages would be really helpful. finally if there is any special performance comparison for Rcpp and RInside, that would be great!

Question 9

16

votes

4.8k

views

Rcpp function check if missing value

[see original]

rrcpp

Giorgio Spedicato asked Oct 07 '14 at 16:45

Click to expand question

I'm converting R based code into Rcpp based code. The head of my function is:

NumericMatrix createMatrixOfLinkRatiosC(NumericMatrix matr, double threshold4Clean) {
int i,j; 
NumericMatrix myMatr(matr.nrow(),matr.ncol());
myMatr=matr;
....;

}

I want to handle call to the function where threshold4Clean is missing but I'm not finding how to do... Any help will be greatly appreciated.

Answer

33

votes

Kevin Ushey replied Oct 08 '14 at 17:35

Click to expand answer

I'm converting R based code into Rcpp based code. The head of my function is:

NumericMatrix createMatrixOfLinkRatiosC(NumericMatrix matr, double threshold4Clean) {
int i,j; 
NumericMatrix myMatr(matr.nrow(),matr.ncol());
myMatr=matr;
....;

}

I want to handle call to the function where threshold4Clean is missing but I'm not finding how to do... Any help will be greatly appreciated.

Question 10

19

votes

3.4k

views

using C function from other package in Rcpp

[see original]

c++crrcpp

baptiste asked Dec 09 '13 at 15:28

Click to expand question

I'm trying to call a C routine from the cubature package in a c++ function to perform multidimensional integration.

The basic R example I'm trying to reproduce is

library(cubature)
integrand <- function(x) sin(x)
adaptIntegrate(integrand, 0, pi)

I could just call this R function from Rcpp following this recipe from the gallery, but there would be some performance penalty in switching back and forth from c/c++ to R. It seems more sensible to directly call the C function from C++.

The C routine adapt_integrate is exported from cubature with

 // R_RegisterCCallable("cubature", "adapt_integrate", (DL_FUNC) adapt_integrate);

I don't understand how to call it from c++, however. Here's my lame attempt,

sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double integrand(double x){
 return(sin(x));
}

// [[Rcpp::depends(cubature)]]
// [[Rcpp::export]]
Rcpp::List integratecpp(double llim, double ulim)
{
  Rcpp::Function p_cubature = R_GetCCallable("cubature", "adapt_integrate");

  Rcpp::List result = p_cubature(integrand, llim, ulim);
  return(result);
}
'
)

integratecpp(0, pi)

This fails to compile; clearly I'm doing something very silly and missing some important steps to convert the output of R_GetCCallable into an Rcpp::Function (or call it directly?). I've read several related posts dealing with function pointers, but haven't seen an example using an external C function.

Answer

11

votes

Romain Francois replied Dec 09 '13 at 16:51

Click to expand answer

I'm trying to call a C routine from the cubature package in a c++ function to perform multidimensional integration.

The basic R example I'm trying to reproduce is

library(cubature)
integrand <- function(x) sin(x)
adaptIntegrate(integrand, 0, pi)

I could just call this R function from Rcpp following this recipe from the gallery, but there would be some performance penalty in switching back and forth from c/c++ to R. It seems more sensible to directly call the C function from C++.

The C routine adapt_integrate is exported from cubature with

 // R_RegisterCCallable("cubature", "adapt_integrate", (DL_FUNC) adapt_integrate);

I don't understand how to call it from c++, however. Here's my lame attempt,

sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double integrand(double x){
 return(sin(x));
}

// [[Rcpp::depends(cubature)]]
// [[Rcpp::export]]
Rcpp::List integratecpp(double llim, double ulim)
{
  Rcpp::Function p_cubature = R_GetCCallable("cubature", "adapt_integrate");

  Rcpp::List result = p_cubature(integrand, llim, ulim);
  return(result);
}
'
)

integratecpp(0, pi)

This fails to compile; clearly I'm doing something very silly and missing some important steps to convert the output of R_GetCCallable into an Rcpp::Function (or call it directly?). I've read several related posts dealing with function pointers, but haven't seen an example using an external C function.

Question 11

5

votes

2k

views

Call a function from c++ via environment Rcpp

[see original]

rrcpp

skyindeer asked Apr 13 '16 at 09:48

Click to expand question

I am considering calling a R function from c++ via environment, but I got an error, here is what I did

#include <Rcpp.h>
using namespace Rcpp;



// [[Rcpp::export]]
NumericVector call(NumericVector x){
  Environment env = Environment::global_env();
  Function f = env["fivenum"];
  NumericVector res = f(x);
  return res;
}

Type call(x), this is what I got,

Error: cannot convert to function

I know I can do it right in another way,

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector callFunction(NumericVector x, Function f) {
    NumericVector res = f(x);
    return res;
}

and type

callFunction(x,fivenum)

But still wondering why first method failed.

Answer

10

votes

digEmAll replied Apr 13 '16 at 10:13

Click to expand answer

I am considering calling a R function from c++ via environment, but I got an error, here is what I did

#include <Rcpp.h>
using namespace Rcpp;



// [[Rcpp::export]]
NumericVector call(NumericVector x){
  Environment env = Environment::global_env();
  Function f = env["fivenum"];
  NumericVector res = f(x);
  return res;
}

Type call(x), this is what I got,

Error: cannot convert to function

I know I can do it right in another way,

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector callFunction(NumericVector x, Function f) {
    NumericVector res = f(x);
    return res;
}

and type

callFunction(x,fivenum)

But still wondering why first method failed.

Question 12

3

votes

2k

views

Using Rcpp functions inside of R's par*apply functions from the parallel package

[see original]

rparallel-processingrcpp

Scott White asked Jul 22 '16 at 04:50

Click to expand question

I'm trying to understand what is happening behind the Rcpp::sourceCpp() call on a parallelized environment. Recently, this was partially addressed in the question: Using Rcpp function in parLapply on Windows.

Within this post, Dirk said,

"You need to run the sourceCpp() call in each spawned process, or else get them your code."

This was in response to questioner's use of distributing the Rcpp function to the worker processes. The questioner was sending the Rcpp function via:

clusterExport(cl = cl, varlist = "payoff")

I'm confused as to why this doesn't work. My thoughts are that this was what the objective of the clusterExport() is for.

Answer

8

votes

coatless replied Jul 22 '16 at 05:11

Click to expand answer

I'm trying to understand what is happening behind the Rcpp::sourceCpp() call on a parallelized environment. Recently, this was partially addressed in the question: Using Rcpp function in parLapply on Windows.

Within this post, Dirk said,

"You need to run the sourceCpp() call in each spawned process, or else get them your code."

This was in response to questioner's use of distributing the Rcpp function to the worker processes. The questioner was sending the Rcpp function via:

clusterExport(cl = cl, varlist = "payoff")

I'm confused as to why this doesn't work. My thoughts are that this was what the objective of the clusterExport() is for.

Question 13

14

votes

2.3k

views

how can I handle vectors without knowing the type in Rcpp

[see original]

rrcpp

eddi asked Nov 06 '13 at 22:12

Click to expand question

I want to replicate the following R function in Rcpp:

fR = function(x) x[1:2]

fR(c(1,2,3))
#[1] 1 2
fR(c('a','b','c'))
#[1] "a" "b"

I can do it for a fixed output type like so:

library(inline)
library(Rcpp)

fint = cxxfunction(signature(x = "SEXP"), '
          List xin(x);
          IntegerVector xout;

          for (int i = 0; i < 2; ++i) xout.push_back(xin[i]);

          return xout;', plugin = "Rcpp")

But this will only work for integers, and if I try replacing the xout type with List (or GenericVector, which are the same) - it works with any input type, but I get back a list instead of a vector.

What's the correct Rcpp way of doing this?

Answer

13

votes

Romain Francois replied Nov 07 '13 at 06:33

Click to expand answer

I want to replicate the following R function in Rcpp:

fR = function(x) x[1:2]

fR(c(1,2,3))
#[1] 1 2
fR(c('a','b','c'))
#[1] "a" "b"

I can do it for a fixed output type like so:

library(inline)
library(Rcpp)

fint = cxxfunction(signature(x = "SEXP"), '
          List xin(x);
          IntegerVector xout;

          for (int i = 0; i < 2; ++i) xout.push_back(xin[i]);

          return xout;', plugin = "Rcpp")

But this will only work for integers, and if I try replacing the xout type with List (or GenericVector, which are the same) - it works with any input type, but I get back a list instead of a vector.

What's the correct Rcpp way of doing this?

Question 14

10

votes

7.4k

views

calling a user-defined R function from C++ using Rcpp

[see original]

rrcpp

Sam asked Jan 20 '14 at 03:11

Click to expand question

I have an R code with a bunch of user-defined R functions. I'm trying to make the code run faster and of course the best option is to use Rcpp. My code involves functions that call each other. Therefore, If I write some functions in C++, I should be able to call and to run some of my R functions in my c++ code. In a simple example consider the code below in R:

mySum <- function(x, y){
 return(2*x + 3*y)
}
x <<- 1
y <<- 1

Now consider the C++ code in which I'm trying to access the function above:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int mySuminC(){
 Environment myEnv = Environment::global_env();
 Function mySum = myEnv["mySum"];
 int x = myEnv["x"];
 int y = myEnv["y"];
 return wrap(mySum(Rcpp::Named("x", x), Rcpp::Named("y", y)));
 }

When I source the file in R with the inline function sourceCpp(), I get the error:

 "invalid conversion from 'SEXPREC*' to int

Could anyone help me on debugging the code? Is my code efficient? Can it be summarized? Is there any better idea to use mySum function than what I did in my code?

Thanks very much for your help.

Answer

14

votes

Kevin Ushey replied Jan 20 '14 at 03:42

Click to expand answer

I have an R code with a bunch of user-defined R functions. I'm trying to make the code run faster and of course the best option is to use Rcpp. My code involves functions that call each other. Therefore, If I write some functions in C++, I should be able to call and to run some of my R functions in my c++ code. In a simple example consider the code below in R:

mySum <- function(x, y){
 return(2*x + 3*y)
}
x <<- 1
y <<- 1

Now consider the C++ code in which I'm trying to access the function above:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int mySuminC(){
 Environment myEnv = Environment::global_env();
 Function mySum = myEnv["mySum"];
 int x = myEnv["x"];
 int y = myEnv["y"];
 return wrap(mySum(Rcpp::Named("x", x), Rcpp::Named("y", y)));
 }

When I source the file in R with the inline function sourceCpp(), I get the error:

 "invalid conversion from 'SEXPREC*' to int

Could anyone help me on debugging the code? Is my code efficient? Can it be summarized? Is there any better idea to use mySum function than what I did in my code?

Thanks very much for your help.

Question 15

9

votes

10.1k

views

Call R functions in Rcpp

[see original]

rrcpp

Yuge Hao asked Jun 24 '16 at 15:20

Click to expand question

I was trying to call the sd(x), which is a R function, in Rcpp. I've seen an example of calling the R function dbate(x) in Rcpp and it works perfectly.

// dens calls the pdf of beta distribution in R
//[[Rcpp::export]]
double dens(double x, double a, double b)
{
    return R::dbeta(x,a,b,false);
}

But when I tired to apply this method to sd(x) as following, it went wrong.

// std calls the sd function in R
//[[Rcpp::export]]
double std(NumericVector x)
{
  return R::sd(x);
}

Does anyone know why this doesn't work?

Answer

16

votes

coatless replied Jun 24 '16 at 16:03

Click to expand answer

I was trying to call the sd(x), which is a R function, in Rcpp. I've seen an example of calling the R function dbate(x) in Rcpp and it works perfectly.

// dens calls the pdf of beta distribution in R
//[[Rcpp::export]]
double dens(double x, double a, double b)
{
    return R::dbeta(x,a,b,false);
}

But when I tired to apply this method to sd(x) as following, it went wrong.

// std calls the sd function in R
//[[Rcpp::export]]
double std(NumericVector x)
{
  return R::sd(x);
}

Does anyone know why this doesn't work?

Question 16

9

votes

1.4k

views

In Rcpp, how to get a user-defined structure from C into R

[see original]

rcpp

Mark Bower asked Jun 29 '18 at 22:45

Click to expand question

Am using Rcpp packages and can get my C function to compile and run in R, but now I want to return a large, user-defined data structure to R. The fields in the structure are either numbers or strings - no new or odd types within the structure. The example below is simplified and doesn't compile, but it conveys the idea of my problem.

typedef struct {
  char*    firstname[128];
  char*    lastname[128];
  int      nbrOfSamples;
} HEADER_INFO;

// [[Rcpp::export]]
HEADER_INFO* read_header(Rcpp::StringVector strings) {
  FILE *fp;
  MEF_HEADER_INFO *header;
    
  char * filename = (char*)(strings(0));
  char * password = (char*)(strings(1));
    
  header = (HEADER_INFO*)malloc(sizeof(HEADER_INFO));
  memset(header, 0, sizeof(HEADER_INFO));
    
  fp = fopen(filename, "r");
  (void)read_header(header, password);
  return header;
}

I'm pretty sure that I could package the entries in the header back into a StringVector, but that seems like a brute-force approach. My question is whether a more elegant solution exists. It is not clear to me what form such a structure would even have in R: a named List?

Thanks!

Answer

12

votes

Ralf Stubner replied Jul 01 '18 at 08:58

Click to expand answer

Am using Rcpp packages and can get my C function to compile and run in R, but now I want to return a large, user-defined data structure to R. The fields in the structure are either numbers or strings - no new or odd types within the structure. The example below is simplified and doesn't compile, but it conveys the idea of my problem.

typedef struct {
  char*    firstname[128];
  char*    lastname[128];
  int      nbrOfSamples;
} HEADER_INFO;

// [[Rcpp::export]]
HEADER_INFO* read_header(Rcpp::StringVector strings) {
  FILE *fp;
  MEF_HEADER_INFO *header;
    
  char * filename = (char*)(strings(0));
  char * password = (char*)(strings(1));
    
  header = (HEADER_INFO*)malloc(sizeof(HEADER_INFO));
  memset(header, 0, sizeof(HEADER_INFO));
    
  fp = fopen(filename, "r");
  (void)read_header(header, password);
  return header;
}

I'm pretty sure that I could package the entries in the header back into a StringVector, but that seems like a brute-force approach. My question is whether a more elegant solution exists. It is not clear to me what form such a structure would even have in R: a named List?

Thanks!

Question 17

6

votes

3.6k

views

Rcpp function to select (and to return) a sub-dataframe

[see original]

rrcpp

Sam asked Apr 03 '14 at 05:11

Click to expand question

Is it possible to write a C++ function that gets an R dataFrame as input, then modifies the dataFrame (in our case taking a subset) and returns the new data frame (in this question, returning a sub-dataframe) ? My code below may make my question more clear:

code:

# Suppose I have the data frame below created in R:
myDF = data.frame(id = rep(c(1,2), each = 5), alph = letters[1:10], mess = rnorm(10))

# Suppose I want to write a C++ function that gets id as inout and returns 
# a sub-dataframe corresponding to that id (**If it's possible to return 
# DataFrame in C++**)

# Auxiliary function --> helps get a sub vector:
arma::vec myVecSubset(arma::vec vecMain, arma::vec IDVec, int ID){
  arma::uvec AuxVec = find(IDVec == ID);
  arma::vec rslt = arma::vec(AuxVec.size());
  for (int i = 0; i < AuxVec.size(); i++){
    rslt[i] = vecMain[AuxVec[i]];
  }
  return rslt;
}

# Here is my C++ function:
Rcpp::DataFrame myVecSubset(Rcpp::DataFrame myDF, int ID){
  arma::vec id = Rcpp::as<arma::vec>(myDF["id"]);
  arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]);
  arma::vec mess = Rcpp::as<arma::vec>(myDF["mess"]);

  // here I take a sub-vector:
  arma::vec id_sub = myVecSubset(id, id, int ID);
  arma::vec alph_sub = myVecSubset(alph, id, int ID);
  arma::vec mess_sub = myVecSubset(mess, id, int ID);

  // here is the CHALLENGE: How to combine these vectors into a new data frame???
  ???
}

In summary, there are actually two main question: 1) Is there any better way to take the sub-dataframe above in C++? (wish I could simple say myDF[myDF$id == ID,]!!!)

2) Is there anyway that I can combine id_sub, alpha_sub, and mess_sub into an R data frame and return it?

I really appreciate your help.

Answer

10

votes

Kevin Ushey replied Apr 03 '14 at 09:28

Click to expand answer

Is it possible to write a C++ function that gets an R dataFrame as input, then modifies the dataFrame (in our case taking a subset) and returns the new data frame (in this question, returning a sub-dataframe) ? My code below may make my question more clear:

code:

# Suppose I have the data frame below created in R:
myDF = data.frame(id = rep(c(1,2), each = 5), alph = letters[1:10], mess = rnorm(10))

# Suppose I want to write a C++ function that gets id as inout and returns 
# a sub-dataframe corresponding to that id (**If it's possible to return 
# DataFrame in C++**)

# Auxiliary function --> helps get a sub vector:
arma::vec myVecSubset(arma::vec vecMain, arma::vec IDVec, int ID){
  arma::uvec AuxVec = find(IDVec == ID);
  arma::vec rslt = arma::vec(AuxVec.size());
  for (int i = 0; i < AuxVec.size(); i++){
    rslt[i] = vecMain[AuxVec[i]];
  }
  return rslt;
}

# Here is my C++ function:
Rcpp::DataFrame myVecSubset(Rcpp::DataFrame myDF, int ID){
  arma::vec id = Rcpp::as<arma::vec>(myDF["id"]);
  arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]);
  arma::vec mess = Rcpp::as<arma::vec>(myDF["mess"]);

  // here I take a sub-vector:
  arma::vec id_sub = myVecSubset(id, id, int ID);
  arma::vec alph_sub = myVecSubset(alph, id, int ID);
  arma::vec mess_sub = myVecSubset(mess, id, int ID);

  // here is the CHALLENGE: How to combine these vectors into a new data frame???
  ???
}

In summary, there are actually two main question: 1) Is there any better way to take the sub-dataframe above in C++? (wish I could simple say myDF[myDF$id == ID,]!!!)

2) Is there anyway that I can combine id_sub, alpha_sub, and mess_sub into an R data frame and return it?

I really appreciate your help.

Question 18

4

votes

2.5k

views

Changing R's Seed from Rcpp to Guarantee Reproducibility

[see original]

c++rrcpp

Yuge Hao asked Apr 05 '17 at 04:13

Click to expand question

I am trying to write a function r(d, n) in rcpp. The function returns n random draws from normal distribution N(0, d). This function should be well defined, therefore the function should return the same draws whenever the d and n do not change their value.

This won't be a problem if d is restricted to be integer, in which case I can set seed and do the job

// set seed
// [[Rcpp::export]]
void set_seed(unsigned int seed) {
  Rcpp::Environment base_env("package:base");
  Rcpp::Function set_seed_r = base_env["set.seed"];
  set_seed_r(seed);  
}

// function r(d, n)
// [[Rcpp::export]]
vec randdraw(int d, int n){
  set_seed(d);
  vec out = randn(n);
  return out;
}

But clearly I don't want to restrict d to be integer. Ideally d should be double. Any thoughts? Thank you!

Answer

7

votes

coatless replied Apr 05 '17 at 05:58

Click to expand answer

I am trying to write a function r(d, n) in rcpp. The function returns n random draws from normal distribution N(0, d). This function should be well defined, therefore the function should return the same draws whenever the d and n do not change their value.

This won't be a problem if d is restricted to be integer, in which case I can set seed and do the job

// set seed
// [[Rcpp::export]]
void set_seed(unsigned int seed) {
  Rcpp::Environment base_env("package:base");
  Rcpp::Function set_seed_r = base_env["set.seed"];
  set_seed_r(seed);  
}

// function r(d, n)
// [[Rcpp::export]]
vec randdraw(int d, int n){
  set_seed(d);
  vec out = randn(n);
  return out;
}

But clearly I don't want to restrict d to be integer. Ideally d should be double. Any thoughts? Thank you!

Question 19

25

votes

11.4k

views

Should I prefer Rcpp::NumericVector over std::vector?

[see original]

c++rcpp

Tim asked Jan 11 '17 at 22:38

Click to expand question

Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?

For example, the two functions below

// [[Rcpp::export]]
Rcpp::NumericVector foo(const Rcpp::NumericVector& x) {
  Rcpp::NumericVector tmp(x.length());
  for (int i = 0; i < x.length(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

// [[Rcpp::export]]
std::vector<double> bar(const std::vector<double>& x) {
  std::vector<double> tmp(x.size());
  for (int i = 0; i < x.size(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

Are equivalent when considering their working and benchmarked performance. I understand that Rcpp offers sugar and vectorized operations, but if it is only about taking R's vector as input and returning vector as output, then would there be any difference which one of those I use? Can using std::vector<double> lead to any possible problems when interacting with R?

Answer

33

votes

coatless replied Jan 11 '17 at 22:59

Click to expand answer

Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?

For example, the two functions below

// [[Rcpp::export]]
Rcpp::NumericVector foo(const Rcpp::NumericVector& x) {
  Rcpp::NumericVector tmp(x.length());
  for (int i = 0; i < x.length(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

// [[Rcpp::export]]
std::vector<double> bar(const std::vector<double>& x) {
  std::vector<double> tmp(x.size());
  for (int i = 0; i < x.size(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

Are equivalent when considering their working and benchmarked performance. I understand that Rcpp offers sugar and vectorized operations, but if it is only about taking R's vector as input and returning vector as output, then would there be any difference which one of those I use? Can using std::vector<double> lead to any possible problems when interacting with R?

Question 20

14

votes

6.1k

views

1-dimensional Matrix is changed to a vector in R

[see original]

rvectormatrixrcpp

Jona asked Mar 30 '12 at 19:19

Click to expand question

> a<-matrix(c(1:9),3,3)
> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> a[3,]*a[,3]  # I expect 1x1 matrix as result of this.
[1] 21 48 81
> class(a)
[1] "matrix"
> class(a[3,])
[1] "integer"

In R, 1-dimensional matrix is changed to a vector. Can I avoid this? I would like to keep 1-D matrix as a matrix. Actually, I need to throw many kind of matrix to RcppArmadillo, even zero-D matrix. Changing matrix to vector by itself is my problem.

Answer

19

votes

joran replied Mar 30 '12 at 19:21

Click to expand answer

> a<-matrix(c(1:9),3,3)
> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> a[3,]*a[,3]  # I expect 1x1 matrix as result of this.
[1] 21 48 81
> class(a)
[1] "matrix"
> class(a[3,])
[1] "integer"

In R, 1-dimensional matrix is changed to a vector. Can I avoid this? I would like to keep 1-D matrix as a matrix. Actually, I need to throw many kind of matrix to RcppArmadillo, even zero-D matrix. Changing matrix to vector by itself is my problem.

Question 21

13

votes

7k

views

How to build a R package which use Rcpp with external c++ libraries?

[see original]

rcpp

Xiaobo Gu asked Sep 02 '13 at 10:04

Click to expand question

Such as boost, where can I specify the following:

1.External c++ header file include path 
2.External c++ source file 
3.External c++ link library file path

Answer

15

votes

Dirk is no longer here replied Sep 02 '13 at 18:01

Click to expand answer

Such as boost, where can I specify the following:

1.External c++ header file include path 
2.External c++ source file 
3.External c++ link library file path

Question 22

13

votes

3.7k

views

efficiently locf by groups in a single R data.table

[see original]

rdataframedata.tabledplyrrcpp

carl.anderson asked May 05 '16 at 20:59

Click to expand question

I have a large, wide data.table (20m rows) keyed by a person ID but with lots of columns (~150) that have lots of null values. Each column is a recorded state / attribute that I wish to carry forward for each person. Each person may have anywhere from 10 to 10,000 observations and there are about 500,000 people in the set. Values from one person can not 'bleed' into the following person, so my solution must respect the person ID column and group appropriately.

For demonstration purposes - here's a very small sample input:

DT = data.table(
  id=c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  aa=c("A", NA, "B", "C", NA, NA, "D", "E", "F", NA, NA, NA),
  bb=c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  cc=c(1, NA, NA, NA, NA, 4, NA, 5, 6, NA, 7, NA)
)

It looks like this:

    id aa bb cc
 1:  1  A NA  1
 2:  1 NA NA NA
 3:  1  B NA NA
 4:  1  C NA NA
 5:  2 NA NA NA
 6:  2 NA NA  4
 7:  2  D NA NA
 8:  2  E NA  5
 9:  3  F NA  6
10:  3 NA NA NA
11:  3 NA NA  7
12:  3 NA NA NA

My expected output looks like this:

    id aa bb cc
 1:  1  A NA  1
 2:  1  A NA  1
 3:  1  B NA  1
 4:  1  C NA  1
 5:  2 NA NA NA
 6:  2 NA NA  4
 7:  2  D NA  4
 8:  2  E NA  5
 9:  3  F NA  6
10:  3  F NA  6
11:  3  F NA  7
12:  3  F NA  7

I've found a data.table solution that works, but it's terribly slow on my large data sets:

DT[, na.locf(.SD, na.rm=FALSE), by=id]

I've found equivalent solutions using dplyr that are equally slow.

GRP = DT %>% group_by(id)
data.table(GRP %>% mutate_each(funs(blah=na.locf(., na.rm=FALSE))))

I was hopeful that I could come up with a rolling 'self' join using the data.table functionality, but I just can't seem to get it right (I suspect I would need to use .N but I just haven't figured it out).

At this point I'm thinking I'll have to write something in Rcpp to efficiently apply the grouped locf.

I'm new to R, but I'm not new to C++ - so I'm confident I can do it. I just feel like there should be an efficient way to do this in R using data.table.

Answer

28

votes

alexis_laz replied May 06 '16 at 09:16

Click to expand answer

I have a large, wide data.table (20m rows) keyed by a person ID but with lots of columns (~150) that have lots of null values. Each column is a recorded state / attribute that I wish to carry forward for each person. Each person may have anywhere from 10 to 10,000 observations and there are about 500,000 people in the set. Values from one person can not 'bleed' into the following person, so my solution must respect the person ID column and group appropriately.

For demonstration purposes - here's a very small sample input:

DT = data.table(
  id=c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  aa=c("A", NA, "B", "C", NA, NA, "D", "E", "F", NA, NA, NA),
  bb=c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  cc=c(1, NA, NA, NA, NA, 4, NA, 5, 6, NA, 7, NA)
)

It looks like this:

    id aa bb cc
 1:  1  A NA  1
 2:  1 NA NA NA
 3:  1  B NA NA
 4:  1  C NA NA
 5:  2 NA NA NA
 6:  2 NA NA  4
 7:  2  D NA NA
 8:  2  E NA  5
 9:  3  F NA  6
10:  3 NA NA NA
11:  3 NA NA  7
12:  3 NA NA NA

My expected output looks like this:

    id aa bb cc
 1:  1  A NA  1
 2:  1  A NA  1
 3:  1  B NA  1
 4:  1  C NA  1
 5:  2 NA NA NA
 6:  2 NA NA  4
 7:  2  D NA  4
 8:  2  E NA  5
 9:  3  F NA  6
10:  3  F NA  6
11:  3  F NA  7
12:  3  F NA  7

I've found a data.table solution that works, but it's terribly slow on my large data sets:

DT[, na.locf(.SD, na.rm=FALSE), by=id]

I've found equivalent solutions using dplyr that are equally slow.

GRP = DT %>% group_by(id)
data.table(GRP %>% mutate_each(funs(blah=na.locf(., na.rm=FALSE))))

I was hopeful that I could come up with a rolling 'self' join using the data.table functionality, but I just can't seem to get it right (I suspect I would need to use .N but I just haven't figured it out).

At this point I'm thinking I'll have to write something in Rcpp to efficiently apply the grouped locf.

I'm new to R, but I'm not new to C++ - so I'm confident I can do it. I just feel like there should be an efficient way to do this in R using data.table.

Question 23

13

votes

4k

views

Building a tiny R package with CUDA and Rcpp

[see original]

rcudarcpp

landau asked Jun 03 '15 at 18:52

Click to expand question

I'm working on a tiny R package that uses CUDA and Rcpp, adapted from the output of Rcpp.package.skeleton(). I will first describe what happens on the master branch for the commit entitled "fixed namespace". The package installs successfully if I forget CUDA (i.e., if I remove the src/Makefile, change src/rcppcuda.cu to src/rcppcuda.cpp, and comment out the code that defines and calls kernels). But as is, the compilation fails.

I also would like to know how to compile with a Makevars or Makevars.in instead of a Makefile, and in general, try to make this as platform independent as is realistic. I've read about Makevars in the R extensions manual, but I still haven't been able to make it work.

Some of you may suggest rCUDA, but what I'm really after here is improving a big package I've already been developing for some time, and I'm not sure that switching is worth starting again from scratch.

Anyway, here's what happens when I do an R CMD build and R CMD INSTALL on this one (master branch, commit entitled "fixed namespace").

* installing to library ‘/home/landau/.R/library’
* installing *source* package ‘rcppcuda’ ...
** libs
** arch - 
/usr/local/cuda/bin/nvcc -c rcppcuda.cu -o rcppcuda.o --shared -Xcompiler "-fPIC" -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -I/apps/R-3.2.0/include -I/usr/local/cuda/include 
rcppcuda.cu:1:18: error: Rcpp.h: No such file or directory
make: *** [rcppcuda.o] Error 1
ERROR: compilation failed for package ‘rcppcuda’
* removing ‘/home/landau/.R/library/rcppcuda’

...which is strange, because I do include Rcpp.h, and Rcpp is installed.

$ R

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

...

> library(Rcpp)
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_0.11.6
>

I'm using CentOS,

$ cat /etc/*-release
CentOS release 6.6 (Final)
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
CentOS release 6.6 (Final)
CentOS release 6.6 (Final)

CUDA version 6,

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Thu_Mar_13_11:58:58_PDT_2014
Cuda compilation tools, release 6.0, V6.0.1

and I have access to 4 GPUs of the same make and model.

$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery 
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 4 CUDA Capable device(s)

Device 0: "Tesla M2070"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5375 MBytes (5636554752 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1566 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           11 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

...

> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU2) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 4, Device0 = Tesla M2070, Device1 = Tesla M2070, Device2 = Tesla M2070, Device3 = Tesla M2070
Result = PASS

Edit: it compiles for any commit after "fixed namespace" on either branch, but there are still problems with combining Rcpp and CUDA

To make the package compile, it turns out that I just needed to separate my C++ and CUDA code into separate *.cpp and *.cu files. However, when I try the "compiling cpp and cu separately" commit on the master branch, I get

> library(rcppcuda)
> hello()
An object of class "MyClass"
Slot "x":
 [1]  1  2  3  4  5  6  7  8  9 10

Slot "y":
 [1]  1  2  3  4  5  6  7  8  9 10

Error in .Call("someCPPcode", r) : 
  "someCPPcode" not resolved from current namespace (rcppcuda)
>

The error goes away in the withoutCUDA branch in the commit entitled "adding branch withoutCUDA".

> library(rcppcuda)
> hello()
An object of class "MyClass"
Slot "x":
 [1]  1  2  3  4  5  6  7  8  9 10

Slot "y":
 [1]  1  2  3  4  5  6  7  8  9 10

[1] "Object changed."
An object of class "MyClass"
Slot "x":
 [1] 500   2   3   4   5   6   7   8   9  10

Slot "y":
 [1]    1 1000    3    4    5    6    7    8    9   10

>

The only differences between the "compiling cpp and cu separately" commit on master and the "adding branch withoutCUDA" commit on withoutCUDA are

The Makefile and someCUDAcode.cu are gone from withoutCUDA.
In withoutCUDA, all references to someCUDAcode() are gone from someCPPcode.cpp.

Also, it would still be convenient be able to use CUDA and Rcpp in the same *.cu file. I would really like to know how to fix the "fixed namespace" commit on the master branch.

Answer

13

votes

cdeterman replied Jun 08 '15 at 14:20

Click to expand answer

I'm working on a tiny R package that uses CUDA and Rcpp, adapted from the output of Rcpp.package.skeleton(). I will first describe what happens on the master branch for the commit entitled "fixed namespace". The package installs successfully if I forget CUDA (i.e., if I remove the src/Makefile, change src/rcppcuda.cu to src/rcppcuda.cpp, and comment out the code that defines and calls kernels). But as is, the compilation fails.

I also would like to know how to compile with a Makevars or Makevars.in instead of a Makefile, and in general, try to make this as platform independent as is realistic. I've read about Makevars in the R extensions manual, but I still haven't been able to make it work.

Some of you may suggest rCUDA, but what I'm really after here is improving a big package I've already been developing for some time, and I'm not sure that switching is worth starting again from scratch.

Anyway, here's what happens when I do an R CMD build and R CMD INSTALL on this one (master branch, commit entitled "fixed namespace").

* installing to library ‘/home/landau/.R/library’
* installing *source* package ‘rcppcuda’ ...
** libs
** arch - 
/usr/local/cuda/bin/nvcc -c rcppcuda.cu -o rcppcuda.o --shared -Xcompiler "-fPIC" -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -I/apps/R-3.2.0/include -I/usr/local/cuda/include 
rcppcuda.cu:1:18: error: Rcpp.h: No such file or directory
make: *** [rcppcuda.o] Error 1
ERROR: compilation failed for package ‘rcppcuda’
* removing ‘/home/landau/.R/library/rcppcuda’

...which is strange, because I do include Rcpp.h, and Rcpp is installed.

$ R

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

...

> library(Rcpp)
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_0.11.6
>

I'm using CentOS,

$ cat /etc/*-release
CentOS release 6.6 (Final)
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
CentOS release 6.6 (Final)
CentOS release 6.6 (Final)

CUDA version 6,

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Thu_Mar_13_11:58:58_PDT_2014
Cuda compilation tools, release 6.0, V6.0.1

and I have access to 4 GPUs of the same make and model.

$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery 
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 4 CUDA Capable device(s)

Device 0: "Tesla M2070"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5375 MBytes (5636554752 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1566 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           11 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

...

> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU2) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 4, Device0 = Tesla M2070, Device1 = Tesla M2070, Device2 = Tesla M2070, Device3 = Tesla M2070
Result = PASS

Edit: it compiles for any commit after "fixed namespace" on either branch, but there are still problems with combining Rcpp and CUDA

To make the package compile, it turns out that I just needed to separate my C++ and CUDA code into separate *.cpp and *.cu files. However, when I try the "compiling cpp and cu separately" commit on the master branch, I get

> library(rcppcuda)
> hello()
An object of class "MyClass"
Slot "x":
 [1]  1  2  3  4  5  6  7  8  9 10

Slot "y":
 [1]  1  2  3  4  5  6  7  8  9 10

Error in .Call("someCPPcode", r) : 
  "someCPPcode" not resolved from current namespace (rcppcuda)
>

The error goes away in the withoutCUDA branch in the commit entitled "adding branch withoutCUDA".

> library(rcppcuda)
> hello()
An object of class "MyClass"
Slot "x":
 [1]  1  2  3  4  5  6  7  8  9 10

Slot "y":
 [1]  1  2  3  4  5  6  7  8  9 10

[1] "Object changed."
An object of class "MyClass"
Slot "x":
 [1] 500   2   3   4   5   6   7   8   9  10

Slot "y":
 [1]    1 1000    3    4    5    6    7    8    9   10

>

The only differences between the "compiling cpp and cu separately" commit on master and the "adding branch withoutCUDA" commit on withoutCUDA are

The Makefile and someCUDAcode.cu are gone from withoutCUDA.
In withoutCUDA, all references to someCUDAcode() are gone from someCPPcode.cpp.

Also, it would still be convenient be able to use CUDA and Rcpp in the same *.cu file. I would really like to know how to fix the "fixed namespace" commit on the master branch.

Question 24

9

votes

496

views

Group vector on conditional sum

[see original]

rperformancercpp

Sotos asked Aug 07 '17 at 14:58

Click to expand question

I want to group a vector based on the sum of the elements being less than or equal to n. Assume the following,

set.seed(1)
x <- sample(10, 20, replace = TRUE)
#[1]  3  4  6 10  3  9 10  7  7  1  3  2  7  4  8  5  8 10  4  8

#Where,
n = 15

The expected output would be to group values while their sum is <= 15, i.e.

y <- c(1, 1, 1, 2, 2, 3, 4, 5 ,5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10)

As you can see the sum is never greater than 15,

sapply(split(x, y), sum)
# 1  2  3  4  5  6  7  8  9 10 
#13 13  9 10 15 12 12 13 14  8

NOTE: I will be running this on huge datasets (usually > 150 - 200GB) so efficiency is a must.

A method that I tried and comes close but fails is,

as.integer(cut(cumsum(x), breaks = seq(0, max(cumsum(x)) + 15, 15)))
#[1] 1 1 1 2 2 3 3 4 4 4 5 5 5 6 6 6 7 8 8 8

Answer

4

votes

David replied Aug 07 '17 at 16:23

Click to expand answer

I want to group a vector based on the sum of the elements being less than or equal to n. Assume the following,

set.seed(1)
x <- sample(10, 20, replace = TRUE)
#[1]  3  4  6 10  3  9 10  7  7  1  3  2  7  4  8  5  8 10  4  8

#Where,
n = 15

The expected output would be to group values while their sum is <= 15, i.e.

y <- c(1, 1, 1, 2, 2, 3, 4, 5 ,5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10)

As you can see the sum is never greater than 15,

sapply(split(x, y), sum)
# 1  2  3  4  5  6  7  8  9 10 
#13 13  9 10 15 12 12 13 14  8

NOTE: I will be running this on huge datasets (usually > 150 - 200GB) so efficiency is a must.

A method that I tried and comes close but fails is,

as.integer(cut(cumsum(x), breaks = seq(0, max(cumsum(x)) + 15, 15)))
#[1] 1 1 1 2 2 3 3 4 4 4 5 5 5 6 6 6 7 8 8 8

Question 25

8

votes

1.8k

views

Renaming and Hiding an Exported Rcpp function in an R Package

[see original]

rrcppr-packageroxygen2

Cello asked Sep 04 '17 at 14:24

Click to expand question

Writing an R package, I have a R function that calls a specific Rcpp function. The Rcpp function just serves as a helper function and I do not want to creat a .Rd file for it. My solution so far is to export both functions in the Namespace file which causes the warning to create an .Rd file for the Rcpp function as soon as I run the check command. If I remove the helper function in the Namespace file, I get rid of this warning causing now the problem that the R function is not able to find it anymore.

Is there a way to solve this problem. That means to make the Rcpp function still visible for the R function and at the same time to get rid of the warning that there exists no .Rd file for the Rcpp function?

Thanks a lot :-)

Answer

10

votes

coatless replied Sep 04 '17 at 15:16

Click to expand answer

Writing an R package, I have a R function that calls a specific Rcpp function. The Rcpp function just serves as a helper function and I do not want to creat a .Rd file for it. My solution so far is to export both functions in the Namespace file which causes the warning to create an .Rd file for the Rcpp function as soon as I run the check command. If I remove the helper function in the Namespace file, I get rid of this warning causing now the problem that the R function is not able to find it anymore.

Is there a way to solve this problem. That means to make the Rcpp function still visible for the R function and at the same time to get rid of the warning that there exists no .Rd file for the Rcpp function?

Thanks a lot :-)

Question 26

3

votes

608

views

how many vectors can be added in DataFrame::create( vec1, vec2 ... )?

[see original]

rcpp

Chris Snow asked Dec 09 '14 at 04:40

Click to expand question

I am creating a DataFrame to hold a parsed haproxy http log files which has quite a few fields (25+).

If I add more than 20 vectors (one for each field), I get the compilation error:

no matching function call to 'create'

The create method:

    return DataFrame::create(
      _["clientIp"]     = clientIp,
      _["clientPort"]   = clientPort,
      _["acceptDate"]   = acceptDate,
      _["frontendName"] = frontendName,
      _["backendName"]  = backendName,
      _["serverName"]   = serverName,
      _["tq"]           = tq,
      _["tw"]           = tw,
      _["tc"]           = tc,
      _["tr"]           = tr,
      _["tt"]           = tt,
      _["status_code"]  = statusCode,
      _["bytes_read"]   = bytesRead,

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedRequestCookie"]   = capturedRequestCookie,
#endif     

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedResponseCookie"]   = capturedResponseCookie,
#endif    

      _["terminationState"] = terminationState,
      _["actconn"]          = actconn,
      _["feconn"]           = feconn,
      _["beconn"]           = beconn,
      _["srv_conn"]         = srvConn,
      _["retries"]          = retries,
      _["serverQueue"]      = serverQueue,
      _["backendQueue"]     = backendQueue 
    );

Questions:

Have I hit a hard limit?
Is there a workaround to allow me to add more than 20 vectors to a data frame?

Answer

6

votes

Kevin Ushey replied Dec 09 '14 at 05:03

Click to expand answer

I am creating a DataFrame to hold a parsed haproxy http log files which has quite a few fields (25+).

If I add more than 20 vectors (one for each field), I get the compilation error:

no matching function call to 'create'

The create method:

    return DataFrame::create(
      _["clientIp"]     = clientIp,
      _["clientPort"]   = clientPort,
      _["acceptDate"]   = acceptDate,
      _["frontendName"] = frontendName,
      _["backendName"]  = backendName,
      _["serverName"]   = serverName,
      _["tq"]           = tq,
      _["tw"]           = tw,
      _["tc"]           = tc,
      _["tr"]           = tr,
      _["tt"]           = tt,
      _["status_code"]  = statusCode,
      _["bytes_read"]   = bytesRead,

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedRequestCookie"]   = capturedRequestCookie,
#endif     

#if CAPTURED_REQUEST_COOKIE_FIELD == 1
      _["capturedResponseCookie"]   = capturedResponseCookie,
#endif    

      _["terminationState"] = terminationState,
      _["actconn"]          = actconn,
      _["feconn"]           = feconn,
      _["beconn"]           = beconn,
      _["srv_conn"]         = srvConn,
      _["retries"]          = retries,
      _["serverQueue"]      = serverQueue,
      _["backendQueue"]     = backendQueue 
    );

Questions:

Have I hit a hard limit?
Is there a workaround to allow me to add more than 20 vectors to a data frame?

Question 27

2

votes

1.1k

views

Calling R's optim function from within C++ using Rcpp

[see original]

c++rrcpp

Siqi.Xiang asked Sep 28 '17 at 16:18

Click to expand question

I'm new for Rcpp, and in my code, I must call the R function "optim" from C++. I referred to many examples, but there is still a mistake: "static assertion failed: cannot convert type to SEXP". Here is my code, and the problem is the last function:

#include <RcppArmadillo.h>

using namespace Rcpp;
using namespace RcppArmadillo;
using namespace arma;
using namespace std;
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
double fr(arma::colvec x){
  double result = 100 * (x(2) - x(1) * x(1)) * (x(2) - x(1) * x(1)) + (1 -     x(1)) * (1 - x(1));
  return result;  
} 


typedef double (*funcPtr)(arma::colvec x);

// [[Rcpp::export]]
XPtr<funcPtr> putFunPtrInXPtr(){
  return(XPtr<funcPtr>(new funcPtr(&fr)));
}


// [[Rcpp::export]]
arma::colvec callOptimFun(SEXP x) {

  RNGScope scope;

  Rcpp::Environment stats("package:stats");
  Rcpp::Function optim = stats["optim"];
  XPtr<funcPtr> xpfun = putFunPtrInXPtr();
  funcPtr fun = *xpfun;
  Rcpp::List y = optim(x, fun);
  arma::colvec r = y["par"];
  return r; 
}

Unfortunately, I have tried lots of methods for my last function, and all methods have the same error. These are my tries: 1.

// [[Rcpp::export]]
Rcpp::List callOptimFun(arma::colvec x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

2.

// [[Rcpp::export]]
Rcpp::List callOptimFun(arma::colvec x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

3.

// [[Rcpp::export]]
Rcpp::List callOptimFun(SEXP x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

I am not familiar with C++. What may be the problem? Thanks!

Answer

4

votes

coatless replied Sep 28 '17 at 17:35

Click to expand answer

I'm new for Rcpp, and in my code, I must call the R function "optim" from C++. I referred to many examples, but there is still a mistake: "static assertion failed: cannot convert type to SEXP". Here is my code, and the problem is the last function:

#include <RcppArmadillo.h>

using namespace Rcpp;
using namespace RcppArmadillo;
using namespace arma;
using namespace std;
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
double fr(arma::colvec x){
  double result = 100 * (x(2) - x(1) * x(1)) * (x(2) - x(1) * x(1)) + (1 -     x(1)) * (1 - x(1));
  return result;  
} 


typedef double (*funcPtr)(arma::colvec x);

// [[Rcpp::export]]
XPtr<funcPtr> putFunPtrInXPtr(){
  return(XPtr<funcPtr>(new funcPtr(&fr)));
}


// [[Rcpp::export]]
arma::colvec callOptimFun(SEXP x) {

  RNGScope scope;

  Rcpp::Environment stats("package:stats");
  Rcpp::Function optim = stats["optim"];
  XPtr<funcPtr> xpfun = putFunPtrInXPtr();
  funcPtr fun = *xpfun;
  Rcpp::List y = optim(x, fun);
  arma::colvec r = y["par"];
  return r; 
}

Unfortunately, I have tried lots of methods for my last function, and all methods have the same error. These are my tries: 1.

// [[Rcpp::export]]
Rcpp::List callOptimFun(arma::colvec x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

2.

// [[Rcpp::export]]
Rcpp::List callOptimFun(arma::colvec x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

3.

// [[Rcpp::export]]
Rcpp::List callOptimFun(SEXP x) {

  \\....
  Rcpp::List y = optim(x, fun);
  return y; 
}

I am not familiar with C++. What may be the problem? Thanks!

Question 28

2

votes

1.8k

views

Cannot use 'sourceCpp' from a file

[see original]

c++rrcpp

Oriol asked Jan 31 '18 at 17:11

Click to expand question

I am trying to play with the Rcpp package from the R statistical software. Unfortunately, I cannot compile a basic example with the sourceCpp function. My toy example is the following:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int hello(int a)
{
    return a;
}

Saved into the file "helloworldR.cpp.

Then when I call sourceCpp:

library(Rcpp)
sourceCpp("helloworldR.cpp")

I get the following error:

In file included from /home/userD/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/RcppCommon.h:65:0,
                  from /home/master/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp.h:27,
                  from helloworldR.cpp:1:
 /home/userD/map:1:1: error: stray ‘\177’ in program
  ^?ELF^B^A^A         ^B > ^A    ^K@     @       Xw          @ 8   @   ^] ^F   ^E
  ^
 /home/userD/map:1:8: warning: null character(s) ignored

make: *** [helloworldR.o] Error 1
g++  -I/usr/local/lib/R/include -DNDEBUG   -I"/home/master/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include" -I"/home/userD" -I/usr/local/include   -fpic  -g -O2  -c helloworldR.cpp -o helloworldR.o
 /usr/local/lib/R/etc/Makeconf:168: recipe for target 'helloworldR.o' failed
 Error in sourceCpp("helloworldR.cpp") :
   Error 1 occurred building shared library.

But if I put the code as a string it works:

sourceCpp(code = "#include <Rcpp.h>
 + using namespace Rcpp;
 +
 + // [[Rcpp::export]]
 + int hello(int a)
 + {
 +   return a;
 + }
 + ")
 > hello(2)
 [1] 2

What do you think it can be? Here is my session information:

sessionInfo()
 R version 3.4.3 (2017-11-30)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 16.04.3 LTS

 Matrix products: default
 BLAS: /usr/local/lib/R/lib/libRblas.so
 LAPACK: /usr/local/lib/R/lib/libRlapack.so

 locale:
  [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=ca_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8
  [5] LC_MONETARY=ca_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8
  [7] LC_PAPER=ca_ES.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] ggplot2_2.2.1 Rcpp_0.12.15

 loaded via a namespace (and not attached):
  [1] colorspace_1.3-2 scales_0.5.0     compiler_3.4.3   lazyeval_0.2.1
  [5] plyr_1.8.4       tools_3.4.3      pillar_1.1.0     gtable_0.2.0
  [9] tibble_1.4.2     grid_3.4.3       rlang_0.1.6      munsell_0.4.3

Answer

0

votes

Dirk is no longer here replied Jan 31 '18 at 17:27

Click to expand answer

I am trying to play with the Rcpp package from the R statistical software. Unfortunately, I cannot compile a basic example with the sourceCpp function. My toy example is the following:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int hello(int a)
{
    return a;
}

Saved into the file "helloworldR.cpp.

Then when I call sourceCpp:

library(Rcpp)
sourceCpp("helloworldR.cpp")

I get the following error:

In file included from /home/userD/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/RcppCommon.h:65:0,
                  from /home/master/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp.h:27,
                  from helloworldR.cpp:1:
 /home/userD/map:1:1: error: stray ‘\177’ in program
  ^?ELF^B^A^A         ^B > ^A    ^K@     @       Xw          @ 8   @   ^] ^F   ^E
  ^
 /home/userD/map:1:8: warning: null character(s) ignored

make: *** [helloworldR.o] Error 1
g++  -I/usr/local/lib/R/include -DNDEBUG   -I"/home/master/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include" -I"/home/userD" -I/usr/local/include   -fpic  -g -O2  -c helloworldR.cpp -o helloworldR.o
 /usr/local/lib/R/etc/Makeconf:168: recipe for target 'helloworldR.o' failed
 Error in sourceCpp("helloworldR.cpp") :
   Error 1 occurred building shared library.

But if I put the code as a string it works:

sourceCpp(code = "#include <Rcpp.h>
 + using namespace Rcpp;
 +
 + // [[Rcpp::export]]
 + int hello(int a)
 + {
 +   return a;
 + }
 + ")
 > hello(2)
 [1] 2

What do you think it can be? Here is my session information:

sessionInfo()
 R version 3.4.3 (2017-11-30)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 16.04.3 LTS

 Matrix products: default
 BLAS: /usr/local/lib/R/lib/libRblas.so
 LAPACK: /usr/local/lib/R/lib/libRlapack.so

 locale:
  [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=ca_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8
  [5] LC_MONETARY=ca_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8
  [7] LC_PAPER=ca_ES.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] ggplot2_2.2.1 Rcpp_0.12.15

 loaded via a namespace (and not attached):
  [1] colorspace_1.3-2 scales_0.5.0     compiler_3.4.3   lazyeval_0.2.1
  [5] plyr_1.8.4       tools_3.4.3      pillar_1.1.0     gtable_0.2.0
  [9] tibble_1.4.2     grid_3.4.3       rlang_0.1.6      munsell_0.4.3

Question 29

44

votes

12.9k

views

Using 3rd party header files with Rcpp

[see original]

c++rrcpp

politicalEconomist asked Dec 21 '12 at 18:23

Click to expand question

I have a header file called coolStuff.h that contains a function awesomeSauce(arg1) that I would like to use in my cpp source file.

Directory Structure:

RworkingDirectory
- sourceCpp
  - theCppFile.cpp
- cppHeaders
  - coolStuff.h

The Code:

#include <Rcpp.h>
#include <cppHeaders/coolStuff.h>
using namespace Rcpp;

// [[Rcpp::export]]
double someFunctionCpp(double someInput){

 double someOutput = awesomeSauce(someInput);

return someOutput;
}

I get the error:

 theCppFile.cpp:2:31: error: cppHeaders/coolStuff.h: No such file or directory

I have moved the file and directory all over the place and can't seem to get this to work. I see examples all over the place of using 3rd party headers that say just do this:

#include <boost/array.hpp>

(Thats from Hadley/devtools)

https://github.com/hadley/devtools/wiki/Rcpp

So what gives? I have been searching all morning and can't find an answer to what seems to me like a simple thing.

UPDATE 01.11.12

Ok now that I have figured out how to build packages that use Rcpp in Rstudio let me rephrase the question. I have a stand alone header file coolStuff.h that contains a function I want to use in my cpp code.

1) Where should I place coolStuff.h in the package directory structure so the function it contains can be used by theCppFile.cpp?

2) How do I call coolStuff.h in the cpp files? Thanks again for your help. I learned a lot from the last conversation.

Note: I read the vignette "Writing a package that uses Rcpp" and it does not explain how to do this.

The Answer:

Ok let me summarize the answer to my question since it is scattered across this page. If I get a detail wrong feel free to edit this or let me know and I will edit it:

So you found a .h or .cpp file that contains a function or some other bit of code you want to use in a .cpp file you are writing to use with Rcpp.

Lets keep calling this found code coolStuff.h and call the function you want to use awesomeSauce(). Lets call the file you are writing theCppFile.cpp.

(I should note here that the code in .h files and in .cpp files is all C++ code and the difference between them is for the C++ programer to keep things organized in the proper way. I will leave a discussion of the difference out here, but a simple search here on SO will lead you to discussion of the difference. For you the R programer needing to use a bit o' code you found, there is no real difference.)

IN SHORT: You can use a file like coolStuff.h provided it calls no other libraries, by either cut-and-pasteing into theCppFile.cpp, or if you create a package you can place the file in the \src directory with the theCppFile.cpp file and use #include "coolStuff.h" at the top of the file you are writing. The latter is more flexible and allows you to use functions in coolStuff.h in other .cpp files.

DETAILS:

1) coolStuff.h must not call other libraries. So that means it cannot have any include statements at the top. If it does, what I detail below probably will not work, and the use of found code that calls other libraries is beyond the scope of this answer.

2) If you want to compile the file with sourceCpp() you need to cut and paste coolStuff.h into theCppFile.cpp. I am told there are exceptions, but sourceCpp() is designed to compile one .cpp file, so thats the best route to take.

(NOTE: I make no guarantees that a simple cut and paste will work out of the box. You may have to rename variables, or more likely switch the data types being used to be consistent with those you are using in theCppFile.cpp. But so far, cut-and-paste has worked with minimal fuss for me with 6 different simple .h files)

3) If you only need to use code from coolStuff.h in theCppFile.cpp and nowhere else, then you should cut and paste it into theCppFile.cpp.

(Again I make no guarantees see the note above about cut-and-paste)

4) If you want to use code contained in coolStuff.h in theCppFile.cpp AND other .cpp files, you need to look into building a package. This is not hard, but can be a bit tricky, because the information out there about building packages with Rcpp ranges from the exhaustive thorough documentation you want with any R package (but that is above your head as a newbie), and the newbie sensitive introductions (that may leave out a detail you happen to need).

Here is what I suggest:

A) First get a version of theCppFile.cpp with the code from coolStuff.h cut-and-paste into theCppFile.cpp that compiles with sourceCpp() and works as you expect it to. This is not a must, but if you are new to Rcpp OR packages, it is nice to make sure your code works in this simple situation before you move to the more complicated case below.

B) Now build your package using Rcpp.package.skeleton() or use the Build functionality in RStudio (HIGHLY recommended). You can find details about using Rcpp.package.skeleton() in hadley/devtools or Rcpp Attributes Vignette. The full documentation for writing packages with Rcpp is in the Writing a package that uses Rcpp, however this one assumes you know your way around C++ fairly well, and does not use the new "Attributes" way of doing Rcpp.

Don't forget to "Build & Reload" if using RStudio or compileAttributes() if you are not in RStudio.

C) Now you should see in your \R directory a file called RcppExports.R. Open it and check it out. In RcppExports.R you should see the R wrapper functions for all the .cpp files you have in your \src directory. Pretty sweet.

D) Try out the R function that corresponds to the function you wrote in theCppFile.cpp. Does it work? If so move on.

E) With your package built you can move coolStuff.h into the src folder with theCppFile.cpp.

F) Now you can remove the cut-and-paste code from theCppFile.cpp and at the top of theCppFile.cpp (and any other .cpp file you want to use code from coolStuff.h) put #include "coolStuff.h" just after #include <Rcpp.h>. Note that there are no brackets around ranker.h, rather there are "". This is a C++ convention when including local files provided by the user rather than a library file like Rcpp or STL etc...

G) Now you have to rebuild the package. In RStudio this is just "Build & Reload" in the Build menu. If you are not using RStudio you should run compileAttributes()

H) Now try the R function again just as you did in step D), hopefully it works.

Answer

25

votes

jjallaire replied Dec 21 '12 at 19:34

Click to expand answer

I have a header file called coolStuff.h that contains a function awesomeSauce(arg1) that I would like to use in my cpp source file.

Directory Structure:

RworkingDirectory
- sourceCpp
  - theCppFile.cpp
- cppHeaders
  - coolStuff.h

The Code:

#include <Rcpp.h>
#include <cppHeaders/coolStuff.h>
using namespace Rcpp;

// [[Rcpp::export]]
double someFunctionCpp(double someInput){

 double someOutput = awesomeSauce(someInput);

return someOutput;
}

I get the error:

 theCppFile.cpp:2:31: error: cppHeaders/coolStuff.h: No such file or directory

I have moved the file and directory all over the place and can't seem to get this to work. I see examples all over the place of using 3rd party headers that say just do this:

#include <boost/array.hpp>