Creating DataFrame In Memory

Mar 22, 2014 at 10:54 PM
Hi,

I have a .NET class with tabular data. I want to convert it to a DataFrame, so that I can pass it in to R.
I noticed that there is no public constructor on the DataFrame class, and I wasn't able to find another way to create an instance.
What is the most efficient way to create a DataFrame? I don't want to save the data to a file and load it inside R. I want to do everything in memory.

Thank you,
Yoav.
Developer
Mar 23, 2014 at 10:38 AM
If you are using R.NET 1.5.5 I think you can do the following (caveat: I did not run a test)
REngine e;
int[] yourIntegers;
double[] yourDoubles;
// etc.
e.SetSymbol("a", e.CreateIntegerVector(yourIntegers));
e.SetSymbol("b", e.CreateNumericVector(yourDoubles));

var df = e.Evaluate("data.frame(nameForA=a, nameForB=b)").AsDataFrame();
I am currently looking at adding an extension method to make creating data frames easier. It should be part of what may be called version 1.6, in a few weeks.
Mar 23, 2014 at 7:34 PM
Thank you for your reply.

Is there a way to handle missing values?
For example, if my column is int?[], how can I create an integer vector to represent this?

Thanks,
Yoav.

Looking forward to v1.6!
Mar 23, 2014 at 8:46 PM
Ok, I think I figured it out.
For integers I need to pass in Int32.MinValue and for doubles I need to pass in double.NaN.
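A minimal sketch of the mapping described above, assuming the sentinels stated in this thread (Int32.MinValue for a missing integer, double.NaN for a missing double). The class and method names (NaSentinels, ToRIntegers, ToRDoubles) are hypothetical and not part of R.NET; the resulting arrays would then be passed to e.CreateIntegerVector and e.CreateNumericVector as in the earlier reply. Note that R treats a plain NaN as missing under is.na() but prints it as NaN rather than NA.

```csharp
using System;
using System.Linq;

public static class NaSentinels
{
    // R stores NA_integer_ as the smallest 32-bit integer.
    public const int IntegerNa = int.MinValue;

    // Hypothetical helper: replace each null with the integer NA sentinel.
    public static int[] ToRIntegers(int?[] values)
    {
        return values.Select(v => v ?? IntegerNa).ToArray();
    }

    // Hypothetical helper: replace each null with NaN, which R reports
    // as missing under is.na() (assumption taken from the thread).
    public static double[] ToRDoubles(double?[] values)
    {
        return values.Select(v => v ?? double.NaN).ToArray();
    }
}
```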
Mar 26, 2014 at 12:27 AM
Hi again.

It seems like there is a problem with logical arrays with missing values.
Doing this: e.Evaluate("a <- c(0,1,NA,1)").AsLogical();
results in "false, true, true, true".
And I can't find a clear way to pass a bool?[] into R.NET.

How do I handle bool?[] arrays?
Developer
Mar 26, 2014 at 1:33 AM
This is not supported via nullable booleans; however you can do the following to identify NAs
    var boolVals = e.Evaluate("a <- c(0,1,NA,1)").AsLogical().ToArray();
    var missingValuesAt = e.Evaluate("which(is.na(a))").AsInteger().ToArray();
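The two arrays above can be combined into a single bool?[] on the .NET side. This is a sketch only: LogicalNa.Merge is a hypothetical helper, not an R.NET method, and it assumes the positions come from R's which(), which counts from 1.

```csharp
using System;

public static class LogicalNa
{
    // Hypothetical helper: combine the values from AsLogical() with the
    // 1-based NA positions returned by R's which(is.na(a)).
    public static bool?[] Merge(bool[] values, int[] naPositionsOneBased)
    {
        var result = new bool?[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            result[i] = values[i];
        }
        foreach (int position in naPositionsOneBased)
        {
            result[position - 1] = null; // R indices start at 1
        }
        return result;
    }
}
```

With the values reported earlier in this thread (false, true, true, true) and the single NA position 3, Merge yields false, true, null, true.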
Mar 26, 2014 at 1:47 AM
Is there a plan to change this in the future and support nullable bool arrays (both as inputs and as outputs)?
Shouldn't this be relatively simple? Booleans are represented as integers in R and the NA value is int.MinValue, so in theory int.MinValue can represent NA when converting an array of nullable booleans to an array of integers (and back).
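The conversion described above could be sketched as follows. NullableLogicalCodec and its methods are hypothetical names, not part of R.NET, and the sketch only covers the .NET side. It assumes the encoded array is passed as an integer vector and converted on the R side, for example with as.logical().

```csharp
using System;
using System.Linq;

public static class NullableLogicalCodec
{
    // Sentinel used for NA, as stated in this thread.
    private const int Na = int.MinValue;

    // true -> 1, false -> 0, null -> NA sentinel
    public static int[] Encode(bool?[] values)
    {
        return values.Select(v => v.HasValue ? (v.Value ? 1 : 0) : Na).ToArray();
    }

    // NA sentinel -> null, 0 -> false, anything else -> true
    public static bool?[] Decode(int[] values)
    {
        return values.Select(v => v == Na ? (bool?)null : v != 0).ToArray();
    }
}
```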
Mar 26, 2014 at 2:07 AM
Quick question about nullable boolean arrays.
Is it possible to use integer vectors instead of logical vectors when passing nullable boolean arrays in to (and out of) R using R.NET?
From the R documentation (https://stat.ethz.ch/R-manual/R-devel/library/base/html/logical.html) it seems like they are interchangeable: "Logical vectors are coerced to integer vectors in contexts where a numerical value is required, with TRUE being mapped to 1L, FALSE to 0L and NA to NA_integer_.", but I wanted to understand whether logical vectors are "marked" in a special way so that R knows they are logical.

Will this work?
Mar 26, 2014 at 3:25 AM
Hi J-M,

I got an error when creating a DataFrame in memory:
                e.SetSymbol("dData", e.CreateNumericVector(dftData));
                e.Evaluate("print(dData)");
                var df = e.Evaluate("data.frame(X=dData").AsDataFrame();
dData prints fine, but the next statement causes the error:
[1]  5  8  9 10 11 13 13 13 13 15 15 16 16 17 17 18 18 19 19 22 22 25 25 26 28
[26] 28 29 32 32 34 38 44
A first chance exception of type 'System.ArgumentNullException' occurred in RDotNet.dll

Additional information: Value cannot be null.
Did I miss something here?
Developer
Mar 26, 2014 at 3:32 AM
Not sure this is it, but you are missing a closing bracket in the string.
var df = e.Evaluate("data.frame(X=dData").AsDataFrame()
should be
var df = e.Evaluate("data.frame(X=dData)").AsDataFrame()
Mar 26, 2014 at 3:50 AM
Yes, that is what caused the null problem :D

But the next line:
var df = e.Evaluate("data.frame(X=dData)").AsDataFrame();
e.Evaluate("print(X)");
generates:
Error in print(X) : object 'X' not found
Mar 27, 2014 at 1:56 AM
Shifting the discussion back to nullable boolean arrays.
I think it should be easy enough to create my own class inheriting from Vector<bool?>. The only problem is that the ProtectedPointer class is internal and can't be reused in custom classes.
Is it possible to make it public in future releases?
Or even better, implement the ProtectedPointer call in the base Vector<T> class (and also do the index range check there).

Does it make sense?
Developer
Mar 27, 2014 at 2:12 AM
Yes, I think I see what you mean.

If you feel up for it (there is no harm trying), you can create a fork of the repository for your own needs, with the possibility to submit a pull request.
I am not sure whether this applies to Mercurial, but with GitHub the model is to fork, create a branch on your forked repo, make your changes on this branch, then submit the pull request.