Reading R data frames with strings

Nov 7, 2013 at 9:02 PM
Edited Nov 7, 2013 at 9:03 PM
I'm trying to import R data frames into Deedle data frames - using the AsDataFrame extension method and then converting the data appropriately.

However, it looks like the AsDataFrame extension method does not work properly when the data frame contains strings (it returns all data as integers). For example:
open RDotNet

let re = REngine.CreateInstance("x")
re.Initialize()
// Load the datasets package; the returned value is not needed
re.Evaluate("library(datasets)") |> ignore
re.Evaluate("esoph").AsDataFrame()
Here, I get something like this:
val it : DataFrame =
  seq
    [seq [1; 1; 1; 1; ...]; seq [1; 1; 1; 1; ...]; seq [1; 2; 3; 4; ...];
     seq [0.0; 0.0; 0.0; 0.0; ...]; ...]
If I explore the data set in R, it clearly contains strings:
> esoph
   agegp     alcgp    tobgp ncases ncontrols
1  25-34 0-39g/day 0-9g/day      0        40
2  25-34 0-39g/day    10-19      0        10
3  25-34 0-39g/day    20-29      0         6
Coordinator
Nov 7, 2013 at 10:42 PM
In R, a string vector is often converted into a factor when it is placed in a data frame. A factor is not a string vector: internally it is an integer vector of codes, with the strings stored separately as its levels. So if you retrieve the underlying values of a factor column in a data frame, you get integers. To get the string representations, you can match the column with the Factor active pattern and call GetFactors() on the resulting Factor:
match column with
   | Factor (f) -> f.GetFactors ()
   | CharacterVector (c) -> c.ToArray ()
   ...
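To illustrate why the integers appear, here is a minimal sketch of the idea behind a factor, written in plain F# without R.NET. The FactorSketch type and the decode function are hypothetical names made up for this example; they are not part of R.NET. The levels listed are only the ones visible in the esoph excerpt above, and missing values (NA) are not handled.

```fsharp
// Hypothetical stand-in for an R factor (not an R.NET type):
// integer codes plus the string levels they refer to.
type FactorSketch =
    { Codes: int[]       // 1-based codes, as R stores them
      Levels: string[] } // the string label for each code

// Map each code to its label. R codes start at 1, so subtract 1
// to index into the zero-based F# array.
let decode (f: FactorSketch) : string[] =
    f.Codes |> Array.map (fun code -> f.Levels.[code - 1])

// First three rows of the tobgp column from the esoph excerpt above.
// Only the levels visible in that excerpt are listed here.
let tobgp =
    { Codes = [| 1; 2; 3 |]
      Levels = [| "0-9g/day"; "10-19"; "20-29" |] }

let labels = decode tobgp
// labels = [| "0-9g/day"; "10-19"; "20-29" |]
```

The Codes array corresponds to the integers that AsDataFrame() shows, and decoding them against the levels is roughly what GetFactors() does for you.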