Show memory usage of R objects

July 18, 2011

To show the sizes of objects in memory (in decreasing order of size):

# sizes of all objects in the current workspace, largest first
objectsizes <- sapply(ls(), function(x) object.size(get(x)))
sort(objectsizes, decreasing = TRUE)
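If you want the sizes in a human-readable unit instead of raw bytes, the object.size values can be passed through format with its units argument:

# same listing, but each size formatted in megabytes
sapply(ls(), function(x) format(object.size(get(x)), units = "Mb"))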

R by command for multiple tables

June 21, 2011

If you want to summarize a breakdown of counts by group in a single table:

x <- as.integer(runif(1000)*5)
y <- rep(c("A", "B", "C", "D"), 250)
do.call(rbind, by(x, list(y), function(g) {table(g)}))

results in:
   0  1  2  3  4
A 54 45 45 51 55
B 52 51 56 42 49
C 60 50 49 43 48
D 51 51 51 50 47
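For a plain cross-tabulation like this one, table gets there directly (the by form is the one that generalizes to arbitrary per-group functions):

table(y, x)  # same counts as the rbind/by result above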

R Optimization

October 26, 2010

Check the eigenvalues of the Hessian of the optimized sum of squares to detect a singular gradient matrix.  A singular Hessian means there are infinitely many solutions, so the best you can do is the optimized set of values plus a linearly scaled vector.  That vector is the eigenvector associated with the eigenvalue that is numerically indistinguishable from zero.

So if you have

# generic sum-of-squares objective for optim():
# sevcalcfunc computes fitted values from parameters x and data
optimfunc <- function(x, data, sevcalcfunc, optimgoal, ...) {
  calcsev <- sevcalcfunc(x, data, ...)
  sumsqerr <- sum((optimgoal - calcsev)^2)
  return(sumsqerr)
}

and some severity-calculation function such as calclosssimple below, then you can optimize:

system.time(optimsimple <- optim(c(.3, -12614.716, .1), optimfunc, hessian = TRUE, data = reg.data, sevcalcfunc = calclosssimple, optimgoal = reg.data$maxloss))

and then inspect the Hessian's eigenvalues:
eigen(optimsimple$hessian)
$values
[1] 1.498617e+16 1.868481e+14 8.412726e+04

Since the 3rd eigenvalue is very small compared to the first two, adding any constant z times the 3rd eigenvector to the optimized solution doesn't really change the value of the optimized sum of squares, and can therefore be considered a solution as well.

eigen(optimsimple$hessian)$vectors[,3]
[1] 1.956755e-06 1.000000e+00 4.573782e-06

> optimfunc(optimsimple$par, reg.data, calclosssimple, reg.data$maxloss)
[1] 3.793891e+14
> optimfunc(optimsimple$par + 1000 * eigen(optimsimple$hessian)$vectors[,3], reg.data, calclosssimple, reg.data$maxloss)
[1] 3.794212e+14
> optimfunc(optimsimple$par - 1000 * eigen(optimsimple$hessian)$vectors[,3], reg.data, calclosssimple, reg.data$maxloss)
[1] 3.794211e+14
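The reg.data and calclosssimple objects above come from a specific analysis. Here is a self-contained toy sketch (hypothetical model, not the data above) showing the same diagnosis: the model is deliberately over-parameterized, so only the sum of the two parameters is identified and the Hessian has a numerically zero eigenvalue.

set.seed(1)
xdat <- seq(0, 1, length.out = 50)
ydat <- 2 * xdat + rnorm(50, sd = 0.1)
# only p[1] + p[2] enters the model, so the direction (1, -1) is free
ssq <- function(p) sum((ydat - (p[1] + p[2]) * xdat)^2)
fit <- optim(c(1, 1), ssq, hessian = TRUE)
eigen(fit$hessian)$values              # second eigenvalue is numerically zero
v <- eigen(fit$hessian)$vectors[, 2]   # the flat direction, ~ (1, -1)/sqrt(2)
ssq(fit$par)
ssq(fit$par + 10 * v)                  # essentially unchanged sum of squares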

Clear Memory Cache

June 28, 2010

Clear the Linux page cache, dentries and inodes (run as root):

sync; echo 3 > /proc/sys/vm/drop_caches

Thanks to this site

Compile R with ATLAS

June 17, 2010

First, install yum-utils so you can run yum-builddep:

yum install yum-utils

Then install all of R's build dependencies with:
yum-builddep R

Add EPEL to your repository list, install ATLAS, and download the latest R source:
rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum install atlas-devel.x86_64
wget http://cran.r-project.org/src/base/R-2/R-2.11.1.tar.gz

Unpack the tarball into /usr/local/lib64, then configure and compile from inside the R-2.11.1/ source directory:

./configure --with-blas="-L/usr/lib64/atlas -lptf77blas -lpthread -latlas" --enable-R-shlib --enable-BLAS-shlib
make
make check
make install

And test:
set.seed(10)
x <- matrix(runif(1000000), 1000,1000)
y <- matrix(runif(1000000), 1000,1000)
system.time(z <- x %*% y)
#base
# user system elapsed
# 4.036 0.008 4.039
#atlas
# user system elapsed
# 0.428 0.020 0.457
set.seed(10)
x <- matrix(runif(10000000), 1000,10000)
y <- matrix(runif(10000000), 10000,1000)
system.time(z <- x %*% y)
#base
# user system elapsed
# 39.302 0.016 39.309
#atlas
# user system elapsed
# 2.840 0.028 2.873
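The timing difference above is the practical check on R 2.x that ATLAS is actually being used; on newer R (3.4 or later, an assumption beyond this build) sessionInfo() reports the linked BLAS/LAPACK paths directly:

sessionInfo()  # look for the "BLAS:" and "LAPACK:" lines (R >= 3.4)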

R match

June 2, 2010

match(a, b) returns, for each element of a, its position in b (or NA), so indexing b with the result keeps only the values of a that occur in b:

> a <- 1:10
> b <- 4:6
> b[match(a, b)]
 [1] NA NA NA  4  5  6 NA NA NA NA
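A related idiom: %in% is a wrapper around match that returns logicals rather than positions:

> a %in% b
 [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE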

R ggplot2 change scale to log

June 2, 2010

Use the “trans” argument in the scale_[type]_gradient functions.

E.g., for scale_fill_gradient:

ggplot(data = cleandatazip) +
  geom_boxplot(aes(x = State, y = losssevresid, color = beltindicator, fill = liqinstate)) +
  scale_fill_gradient(low = 1, high = 0, trans = "log10")
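A self-contained sketch of the same idea on a built-in dataset (assumes ggplot2 is installed; low/high are given as color names here):

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = disp, fill = hp)) +
  geom_point(shape = 21, size = 3) +    # shape 21 has a fill aesthetic
  scale_fill_gradient(low = "blue", high = "red", trans = "log10")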

R Cointegration Johansen Method

May 26, 2010

Johansen’s method of ML estimation of cointegrated variables:

Assume the rank model (slightly different from the Error Correction Model):

y_t = \Pi y_{t-1} + \Gamma \Delta y_{t-1} + e_t

From package urca (adf.test below is from package tseries):

x <- ts(diffinv(matrix(rnorm(2000), 1000, 2))) # no cointegration
adf.test(x[,1])
Augmented Dickey-Fuller Test
data:  x[, 1]
Dickey-Fuller = -2.8452, Lag order = 9, p-value = 0.2205 # fail to reject null hypothesis that x[,1] has a unit root; i.e. x[,1] is non-stationary
alternative hypothesis: stationary
adf.test(x[,2])
Augmented Dickey-Fuller Test
data:  x[, 2]
Dickey-Fuller = -2.3174, Lag order = 9, p-value = 0.444 # likewise non-stationary
alternative hypothesis: stationary

So x[,1] and x[,2] both have unit roots

summary(lm(x[,1] ~ x[,2]))
Call:
lm(formula = x[, 1] ~ x[, 2])
Residuals:
    Min      1Q  Median      3Q     Max
-33.113  -9.985   1.864   9.154  26.183
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.91962    0.89654  -9.949  < 2e-16 ***
x[, 2]       0.25553    0.03756   6.803 1.76e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.54 on 999 degrees of freedom
Multiple R-squared: 0.04428,    Adjusted R-squared: 0.04332
F-statistic: 46.28 on 1 and 999 DF,  p-value: 1.759e-11

The regression looks highly significant even though the two series are independent random walks; this is a spurious regression.
summary(ca.jo(x))
######################
# Johansen-Procedure #
######################
Test type: maximal eigenvalue statistic (lambda max) , with linear trend
Eigenvalues (lambda):
[1] 0.012383727 0.002369522
Values of teststatistic and critical values of test:
         test 10pct  5pct  1pct
r = 0  | 12.45 12.91 14.90 19.19 <- fail to reject r = 0: no cointegration, as constructed
Eigenvectors, normalised to first column:
(These are the cointegration relations)
            Series.1.l2 Series.2.l2
Series.1.l2   1.0000000   1.0000000
Series.2.l2  -0.4589628   0.4131471
Weights W:
(This is the loading matrix)
            Series.1.l2  Series.2.l2
Series.1.d -0.007543197 -0.004558537
Series.2.d  0.010349953 -0.003509216

Analysis on cointegrated data series

> x <- diffinv(rnorm(1000))
> y <- 2.0 - 3.0*x + rnorm(x, sd = 5)
> z <- ts(cbind(x, y))  # cointegrated
adf.test(z[,1])
Augmented Dickey-Fuller Test
data:  z[, 1]
Dickey-Fuller = -2.278, Lag order = 9, p-value = 0.4606 <- Fail to reject null hypothesis that z[,1] has a unit root
alternative hypothesis: stationary
adf.test(z[,2])
Augmented Dickey-Fuller Test
data:  z[, 2]
Dickey-Fuller = -2.1756, Lag order = 9, p-value = 0.504 <- Fail to reject null hypothesis that z[,2] has a unit root
alternative hypothesis: stationary
summary(lm(z[,2]~z[,1]))
Call:
lm(formula = z[, 2] ~ z[, 1])
Residuals:
Min       1Q   Median       3Q      Max
-14.7541  -3.6144   0.2952   3.3238  15.9594
Coefficients:
Estimate Std. Error  t value Pr(>|t|)
(Intercept)  1.49289    0.28898    5.166 2.88e-07 ***
z[, 1]      -2.95955    0.02649 -111.707  < 2e-16 *** <- this regression is correct, but by itself it could just as well be spurious, like the one above!
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.912 on 999 degrees of freedom
Multiple R-squared: 0.9259,    Adjusted R-squared: 0.9258
F-statistic: 1.248e+04 on 1 and 999 DF,  p-value: < 2.2e-16
> testyes <- ca.jo(z)
> summary(testyes)
######################
# Johansen-Procedure #
######################
Test type: maximal eigenvalue statistic (lambda max) , with linear trend
Eigenvalues (lambda):
[1] 0.316635270 0.004815587
Values of teststatistic and critical values of test:
test 10pct  5pct  1pct
r <= 1 |   4.82 6.50  8.18 11.65 <- Fail to reject null hypothesis that r <= 1; since r = 0 is rejected below, r = 1
r = 0  | 380.35 12.91 14.90 19.19 <- Definitely reject null hypothesis that r = 0
Eigenvectors, normalised to first column:
(These are the cointegration relations)
x.l2         y.l2
x.l2 1.000000  1.000000000
y.l2 0.338318 -0.002723256
Weights W:
(This is the loading matrix)
x.l2        y.l2
x.d -0.00931742 -0.01223872
y.d -2.79267081  0.03848098
# t(Beta) %*% z is trend stationary:
kpss.test(z %*% testyes@V[,1], "Trend")
KPSS Test for Trend Stationarity
data:  z %*% testyes@V[, 1]
KPSS Trend = 0.0499, Truncation lag parameter = 7, p-value = 0.1 <- Fail to reject null hypothesis that z %*% testyes@V[, 1] is trend stationary
Warning message:
In kpss.test(z %*% testyes@V[, 1], "Trend") :
p-value greater than printed p-value
# PI using just 1st eigenvector (only first is significant):
testyes@W[,1] %*% t(testyes@V[,1])
x.l2         y.l2
[1,] -0.00931742 -0.003152251
[2,] -2.79267081 -0.944810826
# looking at the first eigenvector (the cointegrating relation, since r = 1), we see the roughly 3:1 ratio between x and y:
> testyes@V[,1]
x.l2     y.l2
1.000000 0.338318

so 1.000000 * x_t + 0.338318 * y_t = u_t

where u_t is trend stationary

In this case, solving for y_t, we get

y_t = -(1/0.338318) * x_t + (1/0.338318) * u_t

y_t = -2.955799 * x_t + (a rescaled trend-stationary term),

which is very close to the generating formula:

y_t = -3.0 * x_t + 2 + e_t

And looking at GAMMA we see the intercepts:

testyes@GAMMA
constant       x.dl1        y.dl1
x.d 0.1105044 -0.06585214 -0.004777821
y.d 1.0494639 -2.97460892 -0.973192833

Note that the y intercept should be 2; when we rerun with a larger number of data points (10,000), the estimate improves:

constant       x.dl1        y.dl1
x.d -0.006742906  0.01035171  0.001922519
y.d  2.076755784 -3.12882828 -1.023600506
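Putting the whole workflow together, a compact sketch (assumes packages urca and tseries are installed):

library(urca); library(tseries)
set.seed(42)
x <- diffinv(rnorm(1000))                  # an I(1) series
y <- 2 - 3 * x + rnorm(length(x), sd = 5)  # cointegrated with x
z <- ts(cbind(x, y))
jo <- ca.jo(z)
summary(jo)                       # expect to reject r = 0 but not r <= 1
coint <- z %*% jo@V[, 1]          # the stationary linear combination
kpss.test(coint, null = "Trend")  # should fail to reject trend stationarity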

R Cointegration

May 26, 2010

From package tseries:

from po.test help:

x <- ts(diffinv(matrix(rnorm(2000),1000,2)))
po.test(x)
Phillips-Ouliaris Cointegration Test
data: x
Phillips-Ouliaris demeaned = -13.5288, Truncation lag parameter = 10,
p-value = 0.15
Warning message:
In po.test(x) : p-value greater than printed p-value <- Fail to reject null hypothesis that there is no cointegration, i.e., not cointegrated

x <- diffinv(rnorm(1000))
y <- 2.0 - 3.0*x + rnorm(x, sd = 5)
z <- ts(cbind(x, y))  # cointegrated
po.test(z)
Phillips-Ouliaris Cointegration Test
data: z
Phillips-Ouliaris demeaned = -933.6705, Truncation lag parameter = 10,
p-value = 0.01
Warning message:
In po.test(z) : p-value smaller than printed p-value <- Reject null hypothesis that there is no cointegration, i.e., cointegrated

R Stationarity Test

May 26, 2010

From kpss.test help

package:tseries

Level Stationary Example
x <- rnorm(1000) # is level stationary
kpss.test(x)
KPSS Test for Level Stationarity
data: x
KPSS Level = 0.0319, Truncation lag parameter = 7, p-value = 0.1 <- fail to reject null hypothesis that x is stationary
Warning message:
In kpss.test(x) : p-value greater than printed p-value

Not Stationary Example
y <- cumsum(x) # has unit root
kpss.test(y)
KPSS Test for Level Stationarity
data: y
KPSS Level = 9.4899, Truncation lag parameter = 7, p-value = 0.01 <- Reject null hypothesis that y is level stationary
Warning message:
In kpss.test(y) : p-value smaller than printed p-value

Trend Stationary Example
x <- 0.3*(1:1000)+rnorm(1000) # is trend stationary
kpss.test(x, null = "Trend")
KPSS Test for Trend Stationarity
data: x
KPSS Trend = 0.1068, Truncation lag parameter = 7, p-value = 0.1 <- Fail to reject null hypothesis that x is trend stationary
Warning message:
In kpss.test(x, null = "Trend") : p-value greater than printed p-value
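Note that kpss.test and adf.test (same package) have complementary null hypotheses: KPSS assumes stationarity under the null, while ADF assumes a unit root. Running both on the same series is a useful cross-check:

library(tseries)
set.seed(7)
x <- rnorm(1000)  # level stationary by construction
kpss.test(x)      # expect p > 0.05: fail to reject stationarity
adf.test(x)       # expect p < 0.05: reject the unit root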
