Memory leak issue, please help!

Feb 13, 2014 at 12:47 AM
Edited Feb 13, 2014 at 1:43 AM
Hello All:

I've been busting my guts trying to get this function to work, but it runs out of memory after x = 3 or 4. As far as I can tell I am clearing everything, and I have tried every last trick I know.

What have I missed? Is there a way I can investigate further?
I've run the code without all the R.NET stuff and it runs fine, so I believe the problem must lie on the R side of things.

See below:
 public static void ExampleFunction()
  {
        var oldPath = System.Environment.GetEnvironmentVariable("PATH");
        var rPath = ALLRG___.RFOLDER();

        var newPath = System.String.Format("{0}{1}{2}", rPath, System.IO.Path.PathSeparator, oldPath);
        System.Environment.SetEnvironmentVariable("PATH", newPath);

        REngine rEngine = REngine.CreateInstance("RDotNet");
        rEngine.Initialize();
        
        for (int x = 0; x < 20; x++)
        {
            int startPoint = 30;
            
            for (int c = 30; c < 600; c++)
            {
                sdTable.Clear();
                sdTable.Dispose();
                sdTable = new DataTable();
                sqlquery = " SELECT * FROM TEMPT_MAINTABLE " +
                           " WHERE SET_DAT_ID = " + c +
                           " AND POINT_DAT_ID >= 30 " +
                           " AND MODEL_ID = " + x +
                           " ORDER BY PRD_ID ASC, POINT_DAT_ID ASC ";
                myCommand = new SqlCommand(sqlquery, bthConn);
                sdReader = myCommand.ExecuteReader();
                sdTable.Load(sdReader);
                
                foreach (int sid in lsAllPrds)
                {
                    prdTable.Clear();
                    prdTable.Dispose();
                    prdTable = new DataTable();
                    prdTable =
                        (
                            from dbl in sdTable.AsEnumerable()
                            where
                                dbl.Field<int>("PRD_ID") == sid &&
                                dbl.ItemArray.All(v => v != null && 
                                v != DBNull.Value)
                            orderby
                                dbl.Field<int>("POINT_DAT_ID") ascending
                            select dbl
                        ).CopyToDataTable();

                    int use_length = prdTable.Rows.Count;

                    for (int i = -1; i < 15; i++)
                    {
                        string varName = "";
                        if (i >= 0) varName = indFldNames[i];
                        if (i == -1) varName = "SCORE_FIELD";

                        double[] tempDouble = new double[use_length];
                        for (int j = 0; j < use_length; j++)
                            tempDouble[j] = 
                                Convert.ToDouble(prdTable.Rows[j][varName]);

                        using (RDotNet.NumericVector prdPxVector = 
                                rEngine.CreateNumericVector(tempDouble))
                        {
                            rEngine.SetSymbol(varName, prdPxVector);

                            prdPxVector.Close();
                            prdPxVector.SetHandleAsInvalid();
                            prdPxVector.Dispose();

                        }
                    }

                    GC.Collect();
                    rEngine.ForceGarbageCollection();
                    
                    rEngine.Evaluate("model_dataframe <- " +
                                     "data.frame(" + indFldNames.InnerJoin(",") + ", SCORE_FIELD )");
                                     
                    rEngine.Evaluate("rm(" + indFldNames.InnerJoin(",") + ", SCORE_FIELD)");
                    
                    rEngine.Evaluate("trans_model_dataframe <-" +
                                     "    ((model_dataframe[75:2,]) - " +
                                     "    (model_dataframe[74:1,])) / " +
                                     "    (model_dataframe[74:1,]) ");
                                      
                    
                    rquery = "m <- lm(SCORE_FIELD ~ " + 
                              indFldNames.InnerJoin(model_jchar) + "," +
                             " data= trans_model_dataframe)";
                             
                    rEngine.Evaluate(rquery);

                    rEngine.Evaluate("m.RSQ <- summary(m)[['r.squared']]");
                    DynamicVector mRSQ = rEngine.GetSymbol("m.RSQ").AsVector();
                    double rSqOut = Convert.ToDouble(mRSQ[0]);
                                        
                    DynamicVector testvar5 = 
                        rEngine.GetSymbol("mNames").AsVector();
                        
                    NumericMatrix testvar14t = 
                        rEngine.GetSymbol("mCoef").AsNumericMatrix();

                    for (int i = 0; i < testvar14t.RowCount; i++)
                    {
                        string tempIvN0 = testvar5[i].ToString();

                        string tempIvN = testvar5[i].ToString();

                        Single coeff = (Single)testvar14t[i, 0];
                        Single pvalue = (Single)testvar14t[i, 3];

                        if (Single.IsInfinity(coeff)) coeff = 0;
                        if (Single.IsNaN(coeff)) coeff = 0;

                        if (Single.IsInfinity(pvalue)) pvalue = 1;
                        if (Single.IsNaN(pvalue)) pvalue = 1;

                        DataRow oprNewRow = regnTable.NewRow();
                        oprNewRow["PRD_ID"] = sid;
                        oprNewRow["DAT_ID"] = c;
                        oprNewRow["LINDEX_ID"] = x;
                        oprNewRow["IVR_INDEX_ID"] = indVarCodes[tempIvN];
                        oprNewRow["IVR_INDEX_COEFF"] = coeff;
                        oprNewRow["IVR_INDEX_PVALUE"] = pvalue;
                        regnTable.Rows.Add(oprNewRow);

                    }

                    testvar5.Dispose();
                    testvar5 = null;

                    testvar14t.Dispose();
                    testvar14t = null;
                    
                    mRSQ.Dispose();
                    mRSQ = null;
                    
                    rEngine.Evaluate("rm(m)");
                    rEngine.Evaluate("rm(SCORE_FIELD)");
                    rEngine.Evaluate("rm(trans_model_dataframe)");
                    rEngine.Evaluate("rm(model_dataframe)");
                    rEngine.Evaluate("rm(m.RSQ)");
                    rEngine.Evaluate("rm(mNames)");
                    rEngine.Evaluate("rm(mCoef)");
                    rEngine.Evaluate("gc()");

                    rEngine.Evaluate("rm( list = ls( all = TRUE ) )");
                    rEngine.Evaluate("gc()");                           
                    
                }
                
                GC.Collect();
                rEngine.ForceGarbageCollection();
                
            }
        
        }
        
        rEngine.Close();
        rEngine.Dispose();
  }
Feb 13, 2014 at 12:55 AM
gc() should print out memory usage after each call. Call it at the start and end of the loop and verify that the memory is being cleared. You can also do memory profiling in R; see http://stackoverflow.com/questions/7856306/monitor-memory-usage-in-r.

You seem to be holding two copies of the data, one in C#/.NET and the other in R, which alone will double the memory use. It would be useful to know where the out-of-memory (OOM) error occurs, and how much memory is used when the loop runs with R versus as a pure .NET loop.
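A minimal sketch of that check from the C# side, assuming the R.NET 1.5.x API used in the question (REngine.Evaluate plus the AsNumeric() extension); the RMemoryProbe class and its method names are hypothetical. It sums the "(Mb)" column next to "used" in the matrix that gc() returns (calling gc() also triggers an R collection) and prints that next to the .NET managed heap size from GC.GetTotalMemory. R's memory lives outside the managed heap, which is why both numbers are reported.

```csharp
using System;
using RDotNet;

// Hypothetical helper (not part of R.NET) for watching memory across loop iterations.
public static class RMemoryProbe
{
    // Memory R reports as "used", in Mb: column 2 of the gc() matrix,
    // summed over the Ncells and Vcells rows.
    public static double UsedMb(REngine engine)
    {
        // Dispose both wrappers so the probe itself does not keep R objects referenced.
        using (SymbolicExpression result = engine.Evaluate("sum(gc()[, 2])"))
        using (NumericVector value = result.AsNumeric())
        {
            return value[0];
        }
    }

    // Prints R-side and .NET-side memory under a caller-supplied label.
    public static void Log(REngine engine, string label)
    {
        double rMb = UsedMb(engine);
        double netMb = GC.GetTotalMemory(true) / (1024.0 * 1024.0);
        Console.WriteLine($"{label}: R used = {rMb:F1} Mb, .NET managed heap = {netMb:F1} Mb");
    }
}
```

For example, calling RMemoryProbe.Log(rEngine, "x=" + x + ", c=" + c) at the top and bottom of the c loop should show roughly flat R usage if everything is being released; steady growth points at objects R still considers referenced.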
Feb 13, 2014 at 1:44 AM
I don't know why my plusses are coming out as &#43 ??
Feb 13, 2014 at 1:52 AM
Edited Feb 13, 2014 at 1:53 AM
Replying to evolvedmicrobe:

Thanks for the tip on using gc() to see how much memory R is using at the time. I'll experiment with that.

I did a similar experiment to the one you suggest: I removed all the R (R.NET) stuff but left my loops, ADO.NET code and SQL queries the same, and ran that. It ran fine, so the problem really appears to be that the R memory is not being cleared (despite all my gc() calls).

Here is my bugbear:

The bloody thing runs out of memory on the 9000th iteration, when I want the R data completely cleared after each one! I.e., I should be using virtually no more memory on the millionth iteration than on the first!

Since restarting rEngine is forbidden, I am trying to do the "clear all R memory" stuff with this:
    rEngine.Evaluate("rm(m)");
    rEngine.Evaluate("rm(SCORE_FIELD)");
    rEngine.Evaluate("rm(trans_model_dataframe)");
    rEngine.Evaluate("rm(model_dataframe)");
    rEngine.Evaluate("rm(m.RSQ)");
    rEngine.Evaluate("rm(mNames)");
    rEngine.Evaluate("rm(mCoef)");
    rEngine.Evaluate("gc()");

    rEngine.Evaluate("rm( list = ls( all = TRUE ) )");
    rEngine.Evaluate("gc()");

And also:
    GC.Collect();
    rEngine.ForceGarbageCollection();
as you can see in the code above. But no luck so far.
Developer
Feb 13, 2014 at 10:08 AM
There is a lot to get one's head around, and I would not bet that I have found the only issue, but the following lines do not look right:
     prdPxVector.SetHandleAsInvalid();
     prdPxVector.Dispose();
First, though not all that important: prdPxVector is created in the "using" statement, so there is no need to call Dispose; it is called anyway when the block exits.
Second, and importantly, calling SetHandleAsInvalid, presumably on the assumption that it will help release memory, likely has the exact opposite effect. See the documentation for the SafeHandle.SetHandleAsInvalid method: it marks the handle as closed without calling ReleaseHandle, so no attempt is made to free the underlying resource.

What I think happens is that, because you call SetHandleAsInvalid, disposing of prdPxVector in C# does not trigger the decrement of the reference count in the R world. So when you call e.g. "rm(SCORE_FIELD)", the R garbage collector still sees at least one reference to SCORE_FIELD. All the SCORE_FIELD vectors created over the loop, and probably other variables too, still cannot be garbage collected, hence the memory blow-out.
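A small self-contained C# sketch of the SafeHandle behaviour described here. No R is involved: the hypothetical DemoRObjectHandle class stands in for an R.NET vector, and its ReleaseHandle() only counts calls, standing in for the point where the R-side reference would be dropped. Disposing normally runs ReleaseHandle once; calling SetHandleAsInvalid first means it never runs.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for an R.NET object wrapper (both are SafeHandles).
public sealed class DemoRObjectHandle : SafeHandle
{
    // Counts how many times ReleaseHandle actually ran.
    public static int ReleasedCount;

    public DemoRObjectHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(new IntPtr(1)); // any non-zero value is treated as a valid handle
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        // In the R.NET case, this is where the R reference would be released.
        ReleasedCount++;
        return true;
    }
}

public static class SetHandleAsInvalidDemo
{
    // Returns the release counts for a normal dispose and for an
    // invalidate-then-dispose sequence.
    public static (int normal, int invalidated) Run()
    {
        DemoRObjectHandle.ReleasedCount = 0;
        using (new DemoRObjectHandle())
        {
            // Dispose at the end of the using block calls ReleaseHandle.
        }
        int normal = DemoRObjectHandle.ReleasedCount;

        DemoRObjectHandle.ReleasedCount = 0;
        using (var h = new DemoRObjectHandle())
        {
            // Marks the handle closed; ReleaseHandle will never be called.
            h.SetHandleAsInvalid();
        }
        int invalidated = DemoRObjectHandle.ReleasedCount;

        return (normal, invalidated);
    }
}
```

SetHandleAsInvalidDemo.Run() returns (normal: 1, invalidated: 0). Applied to the original loop, this suggests the body of the using block only needs rEngine.SetSymbol(varName, prdPxVector); the using statement then disposes the vector and releases its R reference.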

Hope this helps to alleviate the issue.

I'll look into whether it is feasible and desirable to hide these SafeHandle methods from users.
Feb 14, 2014 at 7:07 AM
jperraud, if that works you may have saved me yet again! Excited. Will post back once I have tried.
Feb 14, 2014 at 11:34 PM
Edited Feb 14, 2014 at 11:34 PM
My (very slow) function is running now; I will report back.

Just a quick suggestion for the R.NET guys: it would be a great little addition to the library if rEngine had a really simple method, rEngine.ClearAllMemory(), that would run all the rm()s, gc()s, etc.

Since it seems to be really difficult to close and restart the R engine, that might be a simple way to fill the gap?
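A sketch of what such a helper could look like as a C# extension method. It is hypothetical (ClearAllMemory is not an R.NET method) and only uses REngine.Evaluate and REngine.ForceGarbageCollection, both of which appear in the code above. It cannot free anything R still counts as referenced, for example objects whose R.NET wrappers are still alive.

```csharp
using System;
using RDotNet;

// Hypothetical helper, NOT part of R.NET.
public static class REngineCleanupExtensions
{
    public static void ClearAllMemory(this REngine engine)
    {
        // Remove every object in R's global environment, including hidden
        // ones whose names start with '.'. The returned wrapper is disposed
        // immediately so it does not hold an R reference itself.
        using (engine.Evaluate("rm(list = ls(all.names = TRUE))"))
        {
        }

        // Collect and finalize any R.NET wrappers that were never disposed.
        // A SafeHandle's finalizer runs ReleaseHandle, which is presumably
        // where R.NET drops its reference to the R object.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // Finally let R reclaim everything that is no longer referenced.
        engine.ForceGarbageCollection();
    }
}
```

Usage would be a single rEngine.ClearAllMemory() call at the end of each iteration, in place of the list of rm() and gc() calls.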
Developer
Feb 15, 2014 at 9:37 PM
billybond,

Regarding the slow runtime, I'd be interested to hear where the hotspot is, if you can identify it.

If you are using version 1.5.5 or the latest 'default' branch, there are known runtime performance issues when converting large R vectors to C#, but creating R vectors from large C# numeric arrays should already be as efficient as it can be. The former issue will be resolved in the next release (the fix is currently on a branch).
Feb 16, 2014 at 2:42 AM
jperraud,

Firstly, THANK YOU: your hunch that SetHandleAsInvalid was the problem was absolutely correct! That line did seem to be stopping my ForceGarbageCollection and gc() calls from freeing everything up.

Secondly, the function takes so long simply because of the number of OLS models being run (>100k). I've tested performance against native .NET libraries such as Extreme Optimization, and while R + R.NET is a bit slower, it is certainly in the same order of magnitude. I am using the branch you made after you fixed the array-loading performance hit; it did in fact improve my performance by close to two orders of magnitude, and is why I am enjoying the strong performance I have today.

Thanks again,
Donald