Reading a UTF-8 encoded flat file using read.csv - 'µ' char not read properly

Oct 25, 2012 at 2:45 PM

UTF-8 encoded csv file not read 

Hi,

I am running into an issue while reading a UTF-8 encoded flat file. The code is given below.

REngine.SetDllDirectory(@"C:\Software\R\R-2.13.0\bin\i386");
REngine.CreateInstance("RDotNet", new[] { "-q" });
REngine engine = REngine.GetInstanceFromID("RDotNet");

String filePath = "C://Data//Demo.csv";

engine.EagerEvaluate("data<-read.csv('" + filePath + "', encoding='UTF-8', header=TRUE, sep='|')");

CharacterVector Result = engine.EagerEvaluate("(toString(sort(unique(data$details))))").AsCharacter();
MessageBox.Show(Result[0].ToString());

 

Demo.csv

sno|details

10|2:0 μ Sample Data

The file is in UTF-8 format. When read through the code above, the value is shown as '2:0 Âμ Sample Data' instead of '2:0 μ Sample Data'.
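An extra 'Â' in front of the character is the usual sign of UTF-8 bytes being decoded as a single-byte ANSI/Latin-1 code page somewhere along the way. The R sketch below reproduces that effect in isolation. It assumes the character is the micro sign U+00B5, as written in the thread title; the variable names are only illustrative.

```r
# UTF-8 encodes U+00B5 (micro sign) as the two bytes C2 B5.
utf8_bytes <- as.raw(c(0xC2, 0xB5))

# Misread those two bytes as Latin-1 (one byte per character), then
# convert to UTF-8 so the result can be inspected byte by byte.
misread <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")

# C3 82 is the UTF-8 form of the stray accented A, and C2 B5 is the
# micro sign itself: one character has turned into two.
misread_bytes <- charToRaw(misread)
print(misread_bytes)
```

If this is what happens here, the file is being read correctly and the damage occurs when the string is decoded on the way out of R.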

When we open the R console and read the file using the commands below, the data is read properly:

data<-read.csv("C:/Data/Demo.csv",encoding = "UTF-8",header=TRUE,sep="|")

toString(sort(unique(data$details)))

 

Can anyone please suggest how to read the file properly?

Nov 14, 2012 at 6:16 PM

As far as I know, R on Windows does not treat input as UTF-8 by default, so one workaround is to save the file in ANSI (the Windows code page) instead.

Note that "μ" is a Greek letter and is not part of the ASCII table, only of Unicode and some extended code pages, so it survives only if the encoding used to save the file and the one used to read it can both represent it.

 

If someone knows a way of making R read UTF-8 files, please tell me as well, because I've had similar problems.
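One possible approach, sketched in R under stated assumptions: read the file as UTF-8, then re-encode the column to a single-byte encoding before handing it to a caller that decodes strings as ANSI. Whether R.NET decodes strings that way is an assumption this thread does not confirm. The sketch rebuilds Demo.csv in a temporary file, assumes the character is the micro sign U+00B5, and uses "latin1" as a stand-in for the Windows code page. In a Windows session, passing fileEncoding = "UTF-8" to read.csv is the more usual way to have the input re-encoded at read time; the explicit iconv call is used here so the example does not depend on the session locale.

```r
# Recreate Demo.csv from the question as raw UTF-8 bytes in a temp file.
# C2 B5 is the UTF-8 encoding of U+00B5 (micro sign); this code point is
# an assumption based on the thread title.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("sno|details\n10|2:0 "),
           as.raw(c(0xC2, 0xB5)),
           charToRaw(" Sample Data\n")),
         path)

# encoding = "UTF-8" only marks the strings as UTF-8; it does not re-encode.
data <- read.csv(path, encoding = "UTF-8", header = TRUE, sep = "|",
                 stringsAsFactors = FALSE)

# Re-encode to a single-byte encoding that an ANSI-based caller can decode.
# "latin1" stands in for the Windows code page here.
details_native <- iconv(data$details, from = "UTF-8", to = "latin1")

# The micro sign is now the single byte B5 instead of the pair C2 B5.
print(charToRaw(details_native))
```

From C#, the same iconv call could be placed inside the string passed to EagerEvaluate, wrapped around data$details.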