Is there a simple way to transform this data frame into the form below? I thought I could just take the desired column and cast it to a matrix, but that didn't work.
set.seed(1)
data1<-data.frame(dv=rep(c("low","high"),3),iv1=rep(c("A","B","C"),2),freq=runif(6))
as.matrix(data1[,3], ncol=3) # this didn't work; as.matrix() on a vector gives a one-column matrix and ignores ncol
GOAL:
# A B C
#high .28 .32 .39
#low .31 .36 .31
We can try
xtabs(freq~dv+iv1, data1)
Or
library(reshape2)
acast(data1, dv~iv1, value.var='freq')
Or
with(data1, tapply(freq, list(dv, iv1), FUN=I))
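As a quick sanity check, the xtabs and tapply approaches should agree on both cell values and dimnames. This sketch uses only base R (so it runs without reshape2 installed) and swaps FUN=I for FUN=sum, which is a no-op here since each dv/iv1 cell holds exactly one value:

```r
set.seed(1)
data1 <- data.frame(dv = rep(c("low", "high"), 3),
                    iv1 = rep(c("A", "B", "C"), 2),
                    freq = runif(6))

# 2 x 3 table of freq by dv (rows) and iv1 (columns)
m1 <- xtabs(freq ~ dv + iv1, data1)

# the same reshape via tapply; sum is a no-op with one value per cell
m2 <- with(data1, tapply(freq, list(dv, iv1), FUN = sum))

all.equal(as.vector(m1), as.vector(m2))  # TRUE
```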
Related
This might be very simple, but I can't figure out how to fix this problem. Basically, I need to calculate growth for multiple columns, and when I divide by a column that contains a 0, the result is Inf.
Let me take an example data set:
a <- c(1,0,3,4,5)
b <- c(1,4,2,0,4)
c <- data.frame(a,b)
c$growth <- b/a-1
As you can see, in the 2nd row the growth is Inf because a is 0. It should display 4 instead.
My original data is in data.table so any solution in data.table would help.
How can we fix this?
I don't know why you want to turn Inf into 4. In my opinion it doesn't make much sense, as the growth is not 4 but Inf. However, if you still want to do that, here's some code:
a <- c(1,0,3,4,5)
b <- c(1,4,2,0,4)
data <- data.frame(a,b)
data$growth <- b/a-1
data[data$growth == Inf,3] <- data[data$growth == Inf,2]
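A slightly more defensive variant of the same fix, sketched in base R: is.finite() flags Inf, -Inf and NaN (e.g. from 0/0) in one go, which is a bit more robust than comparing against Inf directly:

```r
a <- c(1, 0, 3, 4, 5)
b <- c(1, 4, 2, 0, 4)
data <- data.frame(a, b)
data$growth <- data$b / data$a - 1

# is.finite() catches Inf, -Inf and NaN in one test
bad <- !is.finite(data$growth)
data$growth[bad] <- data$b[bad]
data$growth  # 0, 4, -1/3, -1, -0.2
```

Since the asker's original data is a data.table, the same idea there would be `setDT(data)[!is.finite(growth), growth := b]`.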
I have been getting more familiar with R and learning about long and wide data frames. I am getting decent at using dcast (and ddply), but as far as I can tell, they rely on my data being numerical. In the following example, I have:
data.frame(color=c("red","orange","blue","white"),safe=c("N","N","Y","Y"))
Basically, the old assumption that insurance companies penalized "risky colors" of cars as being less safe. I'd like a command to turn this into a wide table. Is there a flavor or syntax of dcast I'm missing that would turn the above table into
red | orange | blue | white
N | N | Y | Y
Thanks for any help.
Maybe the transpose function?
a <- data.frame(color=c("red","orange","blue","white"),safe=c("N","N","Y","Y"))
# transpose and make it a dataframe.
new_a <- data.frame(t(a), stringsAsFactors=FALSE)
# makes the column names the first row of the new dataframe
names(new_a) <- new_a[1,]
# now you can get rid of the first row.
new_a <- new_a[-1,]
> new_a
red orange blue white
safe N N Y Y
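A base-R alternative that skips the row-name bookkeeping (a sketch): build a named character vector, then transpose it, so the colors become column names directly:

```r
a <- data.frame(color = c("red", "orange", "blue", "white"),
                safe = c("N", "N", "Y", "Y"))

# named character vector: values are safe, names are color
v <- setNames(as.character(a$safe), as.character(a$color))

# t() turns the named vector into a 1 x 4 matrix whose colnames are the colors
wide <- as.data.frame(t(v), stringsAsFactors = FALSE)
wide
#   red orange blue white
# 1   N      N    Y    Y
```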
I am trying to use approx() to predict points on curves inside of ddply, but it does not seem to be working as I expect it to once it is handed to ddply.
This all works:
#Fake Data, V3 is my index variable
df<-data.frame(V1=rep(0:10,3), V2=c(exp(0:10), 2*exp(0:10), 3*exp(0:10)), V3=rep(1:3,each=11))
approxy <- function(i){
  estim <- approx(x=i$V1, y=i$V2, xout=c(1.1, 5.1, 9.1))$y
  return(data.frame(ex1=estim[1], ex5=estim[2], ex9=estim[3]))
}
approxy(df[df$V3==1,])
This does not:
ddply(df, c("V3"), fun=approxy)
It just spits the original dataframe back out. Any thoughts on this problem would be appreciated.
Your syntax is incorrect: the argument name is .fun, not fun. With fun=approxy, plyr absorbs your function into ..., so .fun stays at its default and ddply returns the data unchanged, which is exactly the behavior you saw.
ddply(df, c("V3"), .fun=approxy)
gives
V3 ex1 ex5 ex9
1 1 3.185359 173.9147 9495.422
2 2 6.370719 347.8294 18990.844
3 3 9.556078 521.7442 28486.266
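If you want to double-check the result without plyr, the same split-apply-combine step can be sketched in base R, reusing the df and approxy from the question:

```r
# same data and function as in the question
df <- data.frame(V1 = rep(0:10, 3),
                 V2 = c(exp(0:10), 2 * exp(0:10), 3 * exp(0:10)),
                 V3 = rep(1:3, each = 11))
approxy <- function(i) {
  estim <- approx(x = i$V1, y = i$V2, xout = c(1.1, 5.1, 9.1))$y
  data.frame(ex1 = estim[1], ex5 = estim[2], ex9 = estim[3])
}

# split by V3, apply approxy to each piece, stack the results
res <- do.call(rbind, lapply(split(df, df$V3), approxy))
res
```

Because the three groups are exact multiples of each other and approx() is linear, ex1 for group 2 is exactly twice ex1 for group 1, matching the ddply output above.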
I want to apply a percentage calculation to certain rows (according to a column criterion) of my data set. Normally I would (1) subset the data, (2) calculate the percentage, (3) delete the old (previously subsetted) rows from my original data, and (4) finally stack everything back together via rbind().
My question: is there a better/faster/shorter way to do this calculation? Here is some example data:
df <- data.frame(object = c("apples", "tomatoes", "apples", "pears"),
                 Value = c(50, 10, 30, 40))
The percentage calculation (50%) I would like to use for the subset on e.g. apples:
sub[,2] <- sub$Value * 50 /100
And the result should look like this:
object Value
1 apples 25
2 tomatoes 10
3 apples 15
4 pears 40
Thank you. There is probably an easy way, but I haven't found a solution online so far.
Create a logical index for rows where 'object' is 'apples' and do the calculation only on that subset of 'Value':
i1 <- df$object=='apples'
df$Value[i1] <- df$Value[i1]*50/100
Or you can use ifelse
df$Value <- with(df, ifelse(object=='apples', Value*50/100, Value))
Or a faster approach would be data.table:
library(data.table)
setDT(df)[object=='apples', Value := Value*0.5]
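Running the logical-index version on the example data reproduces exactly the result the question asks for:

```r
df <- data.frame(object = c("apples", "tomatoes", "apples", "pears"),
                 Value = c(50, 10, 30, 40))

# halve Value only where object is 'apples'
i1 <- df$object == "apples"
df$Value[i1] <- df$Value[i1] * 50 / 100
df
#     object Value
# 1   apples    25
# 2 tomatoes    10
# 3   apples    15
# 4    pears    40
```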
I have two data frames with 5 columns and 100 rows each.
id price1 price2 price3 price4 price5
1 11.22 25.33 66.47 53.76 77.42
2 33.56 33.77 44.77 34.55 57.42
...
I would like to get the correlation of the corresponding rows, basically
for(i in 1:100){
cor(df1[i, 1:5], df2[i, 1:5])
}
but without using a for-loop. I'm assuming there's some way to use plyr to do it, but I can't seem to get it right. Any suggestions?
Depending on whether you want a cool or fast solution you can use either
diag(cor(t(df1), t(df2)))
which is cool but wasteful (it computes the correlations between all pairs of rows, most of which you don't need and are discarded), or
A <- as.matrix(df1)
B <- as.matrix(df2)
sapply(seq.int(dim(A)[1]), function(i) cor(A[i,], B[i,]))
which does only what you want but is a bit more to type.
Note that for columns, as.matrix is not required. For rows, however, a one-row data frame slice should first be flattened to a numeric vector (e.g. with unlist), since cor() treats a data frame as a set of column variables.
Correlations of corresponding rows of data frames df1 and df2:
sapply(1:nrow(df1), function(i) cor(unlist(df1[i,]), unlist(df2[i,])))
and columns:
sapply(1:ncol(df1), function(i) cor(df1[,i], df2[,i]))
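As a quick sanity check on random data (a sketch): the diag(cor(t(.), t(.))) one-liner and the row-wise sapply should agree, provided each one-row data frame slice is flattened with unlist() so cor() sees plain numeric vectors:

```r
set.seed(42)
df1 <- as.data.frame(matrix(runif(20), nrow = 4))
df2 <- as.data.frame(matrix(runif(20), nrow = 4))

# cool but wasteful: all pairwise row correlations, keep only the diagonal
r1 <- diag(cor(t(df1), t(df2)))

# direct: one correlation per pair of corresponding rows
r2 <- sapply(seq_len(nrow(df1)),
             function(i) cor(unlist(df1[i, ]), unlist(df2[i, ])))

all.equal(unname(r1), r2)  # TRUE
```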