This question already has answers here:
What is the right way to multiply data frame by vector?
(6 answers)
Most efficient way to multiply a data frame by a vector
(4 answers)
Closed 1 year ago.
I'd like to multiply each column of a data frame by its own coefficient. Here is some example data:
set.seed(123)
df <- data.frame(var1=sample(0:1,10,TRUE),var2=sample(0:1,10,TRUE),var3=sample(0:1,10,TRUE) )
coef <- c(1,2,3)
df holds the variables and coef holds the coefficients. I tried multiplying them as follows, but I get this:
> df*coef
var1 var2 var3
1 0 2 0
2 0 3 1
3 0 1 0
4 1 0 0
5 0 3 0
6 3 0 0
7 1 2 3
8 2 0 1
9 0 0 0
10 0 0 3
I would have expected column one to be multiplied by 1, column two by 2, and so on.
Is there a way to do this multiplication? Any help greatly appreciated. Thanks
You could do:
df * coef[col(df)]
or even
data.frame(t(t(df) * coef))
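Note that "df .* coef" will not work here: .* is the element-wise operator in MATLAB, not R, and R's * is already element-wise. The problem with df * coef is that coef is recycled down each column rather than being matched to the columns.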
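For completeness, here is a minimal sketch of two other base R ways to pair coef[j] with column j, using the example data from the question. The names res_sweep and res_map are only illustrative.

```r
set.seed(123)
df <- data.frame(var1 = sample(0:1, 10, TRUE),
                 var2 = sample(0:1, 10, TRUE),
                 var3 = sample(0:1, 10, TRUE))
coef <- c(1, 2, 3)

# sweep() applies `*` along margin 2 (columns), so coef[j] meets column j
res_sweep <- sweep(df, 2, coef, `*`)

# Map() walks over the columns and the coefficients in parallel
res_map <- as.data.frame(Map(`*`, df, coef))
```

Both should agree with df * coef[col(df)]; sweep() tends to read most clearly when the intent is "one value per column".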
I am trying to generate all possible combinations of the rows of the following data frame, taken n at a time.
test <- expand.grid(rep(list(0:1),3))
For example, now the test is a data frame of 3 columns and 8 rows as follows:
Var1 Var2 Var3
1 0 0 0
2 1 0 0
3 0 1 0
4 1 1 0
5 0 0 1
6 1 0 1
7 0 1 1
8 1 1 1
For example, combinations with n=2 would then provide a data frame of 6 columns and 64 rows. It is also acceptable if the result is in a list of 64 main elements where each element returns a combination of the two data frames.
I feel that I can still use expand.grid() but did not manage to use it correctly, I guess.
I figured it out as soon as I posted the question.
I can reuse the code that builds test, with the number of columns multiplied by n:
expand.grid(rep(list(0:1),3*n))
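As a quick check of the claim above, this sketch builds the n = 2 case and confirms the expected shape. It also shows a second route that combines row indices instead, which is an assumption about what one might want when the rows of test are not themselves a full grid. The names direct, idx and by_rows are illustrative.

```r
test <- expand.grid(rep(list(0:1), 3))
n <- 2

# Direct route: one 0/1 factor per output column
direct <- expand.grid(rep(list(0:1), 3 * n))
dim(direct)

# Row-index route: enumerate every n-tuple of row numbers of `test`,
# then bind the selected rows side by side
idx <- expand.grid(rep(list(seq_len(nrow(test))), n))
by_rows <- do.call(cbind, lapply(idx, function(i) test[i, ]))
dim(by_rows)
```

The row-index route works for any data frame, while the direct route relies on test being the complete grid of 0/1 values.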
This question already has answers here:
How to convert a table to a data frame
(5 answers)
Closed 4 years ago.
I have data like dataframe df_a, and want to have it converted to the format as in dataframe df_b.
xtabs() gives a similar result, but I did not find a way to access its elements as in the example code below. Accessing by position, as in xa[1,1], does not help, because positions do not map reliably onto names such as "A". Note also that xtabs() sorts the rows alphabetically, so xa[2,2] is 2 and not 0 as in the df_b listing.
> df_a
ItemName Feature Amount
1 First A 2
2 First B 3
3 First A 4
4 Second C 3
5 Second C 2
6 Third D 1
7 Fourth B 2
8 Fourth D 3
9 Fourth D 2
> df_b
ItemName A B C D
1 First 6 3 0 0
2 Second 0 0 5 0
3 Third 0 0 0 1
4 Fourth 0 2 0 5
> df_b$A
[1] 6 0 0 0
> xa<-xtabs(df_a$Amount~df_a$ItemName+df_a$Feature)
> xa
df_a$Feature
df_a$ItemName A B C D
First 6 3 0 0
Fourth 0 2 0 5
Second 0 0 5 0
Third 0 0 0 1
> xa$A
Error in xa$A : $ operator is invalid for atomic vectors
The conversion could be done iteratively with for() loops, but that is far too inefficient in my case because my data has millions of records.
For further processing, the output needs to be a data frame.
If anyone has solved a similar problem, please share.
You can just use as.data.frame.matrix(xa)
# output
A B C D
First 6 3 0 0
Fourth 0 2 0 5
Second 0 0 5 0
Third 0 0 0 1
## or
df_b <- as.data.frame.matrix(xa)[unique(df_a$ItemName), ]
data.frame(ItemName = row.names(df_b), df_b, row.names = NULL)
# output
ItemName A B C D
1 First 6 3 0 0
2 Second 0 0 5 0
3 Third 0 0 0 1
4 Fourth 0 2 0 5
Without using xtabs you can do something like this:
library(dplyr)  # provides the %>% pipe
df_a %>%
  dplyr::group_by(ItemName, Feature) %>%
  dplyr::summarise(Sum = sum(Amount, na.rm = TRUE)) %>%  # total Amount per pair
  tidyr::spread(Feature, Sum, fill = 0) %>%              # one column per Feature
  as.data.frame()
This will transform as you require and it stays as a data.frame
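Or you can call as.data.frame(your_xtabs_result), but note that this returns the table in long format (one row per ItemName/Feature pair, with the sums in a Freq column), not the wide layout of df_b.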
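To make the difference between the two conversions concrete, here is a small sketch that rebuilds df_a from the listing in the question. Passing data = df_a to xtabs() is a choice made here to get clean dimension names; the names wide and long are illustrative.

```r
df_a <- data.frame(
  ItemName = c("First", "First", "First", "Second", "Second",
               "Third", "Fourth", "Fourth", "Fourth"),
  Feature  = c("A", "B", "A", "C", "C", "D", "B", "D", "D"),
  Amount   = c(2, 3, 4, 3, 2, 1, 2, 3, 2)
)

xa <- xtabs(Amount ~ ItemName + Feature, data = df_a)

# Wide layout: one row per ItemName, one column per Feature
wide <- as.data.frame.matrix(xa)
wide$A               # column access by name now works
wide["Fourth", "D"]  # and so does access by row and column name

# Long layout: one row per ItemName/Feature pair
long <- as.data.frame(xa)
names(long)
```

Indexing by name, as in wide["Fourth", "D"], sidesteps the alphabetical row ordering that makes positional indexing unreliable.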
This question already has an answer here:
Get a square matrix out of a non symetric data frame
(1 answer)
Closed 5 years ago.
I have a data frame looking like this:
ID cat1 cat2 cat3
1 cat1_A cat2_A cat3_A
2 cat1_B cat2_A cat3_B
3 cat1_B cat2_B cat3_A
I would now like to convert this to a kind of transposed table using all values in each column as new column names, and a 0/1 (presence/absence) call for the respective column name as new value:
ID cat1_A cat1_B cat2_A cat2_B cat3_A cat3_B
1 1 0 1 0 1 0
2 0 1 1 0 0 1
3 0 1 0 1 1 0
I hope it's clear what I'd like to do, not sure how to explain it in a better way. Any help would be greatly appreciated!
Thanks!
We can use mtabulate from qdapTools
res <- cbind(df1[1], mtabulate(as.data.frame(t(df1[-1]))))
row.names(res) <- NULL
res
# ID cat1_A cat2_A cat3_A cat1_B cat3_B cat2_B
#1 1 1 1 1 0 0 0
#2 2 0 1 0 1 1 0
#3 3 0 0 1 1 0 1
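If installing qdapTools is not an option, a base R sketch along the following lines should give the same presence/absence table. It assumes df1 looks like the data frame in the question; the intermediate name long is illustrative.

```r
df1 <- data.frame(
  ID   = 1:3,
  cat1 = c("cat1_A", "cat1_B", "cat1_B"),
  cat2 = c("cat2_A", "cat2_A", "cat2_B"),
  cat3 = c("cat3_A", "cat3_B", "cat3_A")
)

# Stack all category columns into one vector, repeating ID once per column
long <- data.frame(
  ID    = rep(df1$ID, times = ncol(df1) - 1),
  value = unlist(df1[-1], use.names = FALSE)
)

# Cross-tabulate ID against value, then convert the table to a data frame
res <- as.data.frame.matrix(table(long$ID, long$value))
res <- cbind(ID = df1$ID, res)
row.names(res) <- NULL
res
```

Here the columns come out in alphabetical order. If the same value could appear in more than one column of a row, the cells would hold counts rather than strict 0/1 flags.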
This question already has answers here:
Alternate, interweave or interlace two vectors
(2 answers)
Closed 10 years ago.
I am new to the R language. I have a situation where I need to insert a zero at every alternate position in a vector. For example:
v<-c(1,2,3,4,5,6,7,8,9,10)
I need the new vector like
0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10
I tried to fill in the zeros with a for loop, but I could not get it to work.
I am sure there are plenty of solutions, but
as.vector(rbind(0,v))
[1] 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10
will do it.
A couple of approaches
Create a vector of zeros and then replace the correct indices
x1 <- rep(0, length(v)*2)
x1[seq(2, length(x1), by=2)] <- v
or use Map and c
unlist(Map(c,0,v))
Just for kicks:
v <- c(1,2,3,4,5,6,7,8,9,10)
c(mapply(c, 0, v))
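The rbind() answer works because R stores matrices column by column. A short sketch that spells out the intermediate step (the names m and out are illustrative):

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# 2 x 10 matrix: the scalar 0 is recycled across the first row, v fills the second
m <- rbind(0, v)

# Flattening reads down each column in turn: 0, v[1], 0, v[2], ...
out <- as.vector(m)
out
```

Unlike the index-replacement approach, this needs no hard-coded length, so it keeps working if v changes size.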
I have a table of several columns, with values from 1 to 8. The columns have different lengths, so I have padded them with NAs at the end. I would like to transform each column of the data to get something like this:
1 2 3 4 5 6 7 8
0-25 1 0 0 0 0 1 0 2
25-50 5 1 2 0 0 0 0 1
50-75 12 2 2 3 0 1 1 1
75-100 3 25 1 1 1 0 0 0
where the row names are percentage ranges of the actual length of the original column (i.e. without the NAs), the column names are the original values 1 to 8, and the new values are the number of occurrences of each original value within each percentage range. Any ideas will be appreciated.
Best,
Lince
PS/ I realize that my original message was very confusing. The data I want to transform contain a number of columns from time series like this:
1
1
8
1
3
4
1
5
1
6
2
7
1
NA
NA
and I need to calculate the frequency of occurrence of each value (1 to 8) within the 0-25%, 25-50%, etc. of the series. Joris' answer is very useful. I can work on it. Thanks!
Given that some information is missing, I can offer you this:
Say 0 is no occurrence and 1 is an occurrence. Then you can use the following little script to get the results for one column. Wrap it in a function, apply it over the columns, and you get what you need.
x <- c(1,0,0,1,1,0,1,0,0,0,1,0,1,1,1,NA,NA,NA,NA,NA,NA)
prop <- which(x==1) / sum(!is.na(x))*100
result <- cut(prop,breaks=c(0,25,50,75,100))
table(result)
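Building on that idea, here is one possible sketch for the clarified data, where each column holds values 1 to 8 followed by NA padding. The function name quartile_counts and its arguments are hypothetical, and it assumes the NAs are padding only, so dropping them does not shift any positions.

```r
quartile_counts <- function(col, values = 1:8) {
  x <- col[!is.na(col)]                   # drop the NA padding
  pos <- seq_along(x) / length(x) * 100   # position of each entry, in percent
  bin <- cut(pos, breaks = c(0, 25, 50, 75, 100),
             labels = c("0-25", "25-50", "50-75", "75-100"))
  # rows: position range, columns: original value (zero counts are kept)
  table(bin, factor(x, levels = values))
}

series <- c(1, 1, 8, 1, 3, 4, 1, 5, 1, 6, 2, 7, 1, NA, NA)
res <- quartile_counts(series)
res
```

For a whole table of such columns, something like lapply(your_data, quartile_counts) should return one count table per column, where your_data stands for the padded data frame.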