Is there a functional form of the assignment operator? I would like to be able to call assignment with lapply, and if that's a bad idea I'm curious anyways.
Edit:
This is a toy example, and obviously there are better ways to go about doing this:
Let's say I have a list of data.frames, dat, each corresponding to one run of an experiment. I would like to be able to add a new column, "subject", and give it a sham name. The way I was thinking of it was something like
lapply(1:3, function(x) assign(data.frame = dat[[x]], column="subject", value=x))
The output could either be a list of modified data frames, or the modification could be purely a side effect.
dput of the starting list:
list(structure(list(V1 = c(-1.16664504687199, -0.429499924318301, 2.15470735901367, -0.287839633854442, -0.850578353982526, 0.211636723222015, -0.184714165752958, -0.773553182015158, 0.801811848828454, 1.39420292299319 ), V2 = c(-0.00828185523886259, -0.0215669898046275, 0.743065397283645, -0.0268464140141802, 0.168027242784788, -0.602901928341917, 0.0740511186398372, 0.180307494696194, 0.131160421341309, -0.924995634374182)), .Names = c("V1", "V2"), row.names = c(NA, -10L), class = "data.frame"), structure(list( V1 = c(1.81912921386885, 1.17011641727415, 0.692247839769473, 0.0323050362633069, 1.35816977313292, -0.437475434344363, -0.270255715332778, 0.96140963297774, 0.914691132220417, -1.8014509598977), V2 = c(1.45082316226241, 2.05135744606495, -0.787250759618171, 0.288104852581324, -0.376868533959846, 0.531872044490353, -0.750375220117567, -0.459592764008714, 0.991667163481123, 1.31280356980115)), .Names = c("V1", "V2" ), row.names = c(NA, -10L), class = "data.frame"), structure(list( V1 = c(0.528912899341174, 0.464615157920766, -0.184211714281637, 0.526909095449027, -0.371529800682086, -0.483772861751781, -2.02134822661341, -1.30841566046747, -0.738493559993166, -0.221463545903242), V2 = c(-1.44732101816006, -0.161730785376045, 1.06294520132753, 1.22680614207705, -0.721565979363022, -0.438309438404104, -0.0243401435910825, 0.624227513999603, 0.276605218579759, -0.965640602482051)), .Names = c("V1", "V2"), row.names = c(NA, -10L), class = "data.frame"))
Maybe I don't get it but as stated in "The Art of R programming":
Any assignment statement in which the left side is not just an
identifier (meaning a variable name) is considered a replacement
function.
and so in fact you can always translate this:
names(x) <- c("a","b","ab")
to this:
x <- "names<-"(x,value=c("a","b","ab"))
the general rule is just "function_name<-"(<object>, value = c(...))
Edit to the comment:
It works with the " too:
> x <- c(1:3)
> x
[1] 1 2 3
> names(x) <- c("a","b","ab")
> x
a b ab
1 2 3
> x
a b ab
1 2 3
> x <- c(1:3)
> x
[1] 1 2 3
> x <- "names<-"(x,value=c("a","b","ab"))
> x
a b ab
1 2 3
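Because a replacement function is an ordinary function, it can be called inside lapply, which is what the question asks for. The following is a minimal sketch, not the asker's data: the three small data frames in dat are made up for illustration, and `[[<-` is used as the functional form of dat[[i]][["subject"]] <- i.

```r
# Toy stand-in for the list of data frames (hypothetical values)
dat <- list(
  data.frame(V1 = c(1, 2, 3)),
  data.frame(V1 = c(4, 5, 6)),
  data.frame(V1 = c(7, 8, 9))
)

# Call the replacement function `[[<-` directly; it returns the modified copy
with_subject <- lapply(seq_along(dat), function(i) {
  `[[<-`(dat[[i]], "subject", value = i)
})

with_subject[[2]]
```

The result is a new list of modified data frames; dat itself is left unchanged, because replacement functions return a copy rather than modifying in place.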
There is the assign function. I don't see any problems with using it but you have to be aware of what environment you want to assign to. See the help ?assign for syntax.
Read this chapter carefully to understand the ins and outs of environments in detail. http://adv-r.had.co.nz/Environments.html
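As a rough sketch of what being aware of the environment means in practice, the example below assigns into an explicitly created environment rather than whatever environment the call happens to run in. The environment name results and the variable names run1 to run3 are made up for illustration.

```r
# Hypothetical environment used only for this illustration
results <- new.env()

for (i in 1:3) {
  # envir = tells assign() exactly where the variable should be created
  assign(paste0("run", i), i * 10, envir = results)
}

sort(ls(results))
get("run2", envir = results)
```

Passing envir explicitly avoids surprises when assign is called inside a function such as the one given to lapply, where the default target is that function's own local environment.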
Related
I have a dataframe that I have to sort in decreasing order of absolute row value without changing the actual values (some of which are negative).
To give you an example, e.g. for the 1st row, I would like to go from
-0.01189179 0.03687456 -0.12202753 to
-0.12202753 0.03687456 -0.01189179.
For the 2nd row from
-0.04220260 0.04129326 -0.07178175 to
-0.07178175 -0.04220260 0.04129326 etc.
How can I do this in R?
Many thanks!
Try this
lst <- lapply(df , \(x) order(-abs(x)))
ans <- data.frame(Map(\(x,y) x[y] , df ,lst))
output
a b
1 -0.12202753 -0.07178175
2 0.03687456 -0.04220260
3 -0.01189179 0.04129326
data
df <- structure(list(a = c(-0.12202753, 0.03687456, -0.01189179), b = c(-0.0422026,
0.04129326, -0.07178175)), row.names = c(NA, -3L), class = "data.frame")
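The code above orders the values within each column. If the values to be sorted sit in rows, as the question's wording suggests, one possible sketch is to apply the same ordering row by row. The matrix m below is built from the two example rows quoted in the question; treating the data as a numeric matrix is an assumption.

```r
# The two example rows from the question, stored as a numeric matrix
m <- rbind(
  c(-0.01189179, 0.03687456, -0.12202753),
  c(-0.04220260, 0.04129326, -0.07178175)
)

# Reorder each row by decreasing absolute value, keeping the signs.
# apply() returns one column per input row, so transpose the result back.
sorted <- t(apply(m, 1, function(r) r[order(abs(r), decreasing = TRUE)]))
sorted
```

For a data frame with only numeric columns, as.matrix(df) could be passed in place of m.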
Here is a simple approach (using @Mohamed Desouky's data)
df <- df[nrow(df):1,]
> df
a b
3 -0.01189179 -0.07178175
2 0.03687456 0.04129326
1 -0.12202753 -0.04220260
I have a dataframe where I want to delete all rows with specific pattern. I am confused with compiling a regular expression.
Data:
structure(list(id = 1:5, email = c("1#gmail.com", "2#gmail.com",
"3#gmail.com", "4#pattern.com", "5#pattern.com")), class = "data.frame", row.names = c(NA,
-5L))
What I am trying to do is:
data <- data %>%
filter(email != ".+#pattern.com")
But something is wrong with my regex. What is the most effective way to compose a regular expression for such patterns? What is the proper regex pattern for my sample case?
This uses grepl to perform a regex comparison
library(dplyr)
data %>%
filter(!grepl("#pattern.com$", email))
id email
1 1 1#gmail.com
2 2 2#gmail.com
3 3 3#gmail.com
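To illustrate why the != comparison in the question keeps every row while grepl filters as intended, here is a small base R sketch on the sample data. Escaping the dot is an addition of this sketch: an unescaped dot matches any character, so the looser pattern would also match a string such as "#patternXcom".

```r
data <- data.frame(
  id = 1:5,
  email = c("1#gmail.com", "2#gmail.com", "3#gmail.com",
            "4#pattern.com", "5#pattern.com"),
  stringsAsFactors = FALSE
)

# != compares whole strings literally, so no email equals the regex text
literal_test <- data$email != ".+#pattern.com"

# grepl() interprets its first argument as a regular expression
keep <- !grepl("#pattern\\.com$", data$email)

data[keep, ]
```

Here literal_test is TRUE for all five rows, which is why the original filter removed nothing.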
In base R you can remove the rows in which the pattern #pattern.com is detected by the function grepl in the email column:
data[!grepl("#pattern.com", data$email),]
id email
1 1 1#gmail.com
2 2 2#gmail.com
3 3 3#gmail.com
Data:
data <- structure(list(id = 1:5, email = c("1#gmail.com", "2#gmail.com",
"3#gmail.com", "4#pattern.com", "5#pattern.com")), class = "data.frame", row.names = c(NA,
-5L))
I need to take the following data frame and create a 3x3 matrix with all pairwise products of the prop variable. Here is the data I am starting with...
> example
Parasite prop
1 Hel_1.1 0.06818182
2 Hel_11 0.18181818
3 Hel_13 0.02272727
> dput(example)
structure(list(Parasite = structure(1:3, .Label = c("Hel_1.1",
"Hel_11", "Hel_13", "Hel_14", "Hel_2", "Hel_3", "Hel_4", "Hel_4.5",
"Hel_5", "Hel_6", "Hel_7", "Hel_9", "Pro_1", "Pro_2", "Hel_1.4"
), class = "factor"), prop = c(0.0681818181818182, 0.181818181818182,
0.0227272727272727)), .Names = c("Parasite", "prop"), row.names = c(NA,
3L), class = "data.frame")
I would like to obtain a matrix that looks like this (The pairwise product values are a little off because I computed them by hand without rounding uniformly)
Hel_1.1 Hel_11 Hel_13
Hel_1.1 .0046 .0122 .0015
Hel_11 .0122 .0324 .0039
Hel_13 .0015 .0039 .0004
I would appreciate any help.
You can try this:
prop <- example$prop
names(prop) <- example$Parasite
prop %o% prop
# Hel_1.1 Hel_11 Hel_13
#Hel_1.1 0.004648760 0.012396694 0.0015495868
#Hel_11 0.012396694 0.033057851 0.0041322314
#Hel_13 0.001549587 0.004132231 0.0005165289
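For reference, %o% is shorthand for outer() with its default multiplication, so the same matrix can be written with an explicit function call. The sketch below rebuilds the named vector from the dput values in the question.

```r
# Named vector of proportions, values taken from the question's dput output
prop <- c(Hel_1.1 = 0.0681818181818182,
          Hel_11  = 0.181818181818182,
          Hel_13  = 0.0227272727272727)

# outer() multiplies every pair of elements; names become the dimnames
pairwise <- outer(prop, prop)
pairwise
```

Spelling it as outer() also makes it easy to swap in another function through the FUN argument, for example outer(prop, prop, "+") for pairwise sums.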
I have two CSV data frames data1 and data2 of dimension/size n1*n2 and m1*m2. I would like to create a new data frame consisting of differences: If (and only if)
data1[i,1] = data2[j,1] & data1[i,3] = data2[j,3]
then I want to consider
difference[i,z] <- abs(data1[i,x]-data2[i,y])
Is it possible to do this in a simple manner, for instance using for/if?
difference <- matrix(nrow = max(n1, m1), ncol = 3)
for (i in 1:n1) {
  for (j in 1:m1) {
    # rows match when column 1 and column 3 agree in both frames
    if (data1[i,1] == data2[j,1] && data1[i,3] == data2[j,3]) {
      difference[i,1] = data1[i,1]
      difference[i,2] = data1[i,3]
      difference[i,3] = data1[i,6]-data2[j,7]
    }
  }
}
This code is obviously far from being complete and I have several issues:
(1) I don't know if it is realizable using for loops/if conditional. If yes, being unfamiliar with R, I'm not sure if I need to put a 'print(something)' at the end of the loops.
(2) data1/2[i,1] is of type character. Hence I'm not sure if
data1[i,1] == data2[j,1] & data1[i,3] == data2[j,3]
is well-defined.
(3) The 'difference' matrix/frame should have as many rows as the number of i's and j's where
data1[i,1] = data2[j,1] & data1[i,3] = data2[j,3]
I do not know what this number is. Therefore I cannot really specify the size of 'difference'.
EDIT:
data1 = read.csv("path/to/data1.csv") ## Prices of 157 products each at
## 122 time points; (column1=Product, column3=date, column7=price)
data2 = read.csv("path/to/data2.csv") ## Prices of 118 products each at
## 122 time points; (column1=Product, column3=date, column6=price)
## the 122 time points are the same for both frames
## But: data1 contains some products data2 doesn't and vice versa
## I want to compare prices of the same products at the same time
So far, I've done it manually for product X1:
priceX1 = as.data.frame(data1[1:122,7])
priceX2 = as.data.frame(data2[5:126,6]) ## Product X2 starts at row 5
differenceX1 <- abs(priceX1 - priceX2)
The problem is I'd have to repeat this for all products contained in both data1 and data2.
RE-EDIT: dput(data1) returns
...), class = "factor"),
COMMENT = c(NA, ..., NA)), .Names = c("PRODUCT", "QUALIFIER_I",
"DATE", "QUALIFIER_II", "QUOTATION_DATE", "PROD_DATE", "PRICE",
"TYPE", "ID", "COMMENT"), row.names = c(NA, 14400L), class
= "data.frame")
"..." stands for me omitting a long list of products that couldn't fit here.
dput(data2) returns
..., NA, NA, NA)), .Names = c("PRODUCT", "QUALIFIER_II",
"DATE", "QUALIFIER_I", "Data2_source", "PRICE"), row.names = c(NA,
19161L), class = "data.frame")
"..." stand for me omitting a huge list of prices that couldn't fit in here.
You can find all pairs (i,j) which satisfy your condition by merging the two data.frames:
differences = merge(data1, data2, by=c('PRODUCT','DATE'))
This avoids for-loops entirely, and you can easily define the new column:
differences$Diff = abs(differences$PRICE.x - differences$PRICE.y)
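To make the behaviour of merge concrete, here is a small sketch on made-up data. The products, dates, and prices are hypothetical; only the column names PRODUCT, DATE, and PRICE come from the question. Because both inputs have a PRICE column that is not a merge key, merge renames them PRICE.x and PRICE.y.

```r
# Hypothetical miniature versions of data1 and data2
data1 <- data.frame(
  PRODUCT = c("A", "A", "B"),
  DATE    = c("2020-01-01", "2020-01-02", "2020-01-01"),
  PRICE   = c(10, 11, 20),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  PRODUCT = c("A", "B", "C"),
  DATE    = c("2020-01-01", "2020-01-01", "2020-01-01"),
  PRICE   = c(9, 23, 5),
  stringsAsFactors = FALSE
)

# Inner join: only PRODUCT/DATE pairs present in both frames survive
differences <- merge(data1, data2, by = c("PRODUCT", "DATE"))
differences$Diff <- abs(differences$PRICE.x - differences$PRICE.y)
differences
```

Product C and the second date for product A drop out because they have no partner in the other frame, which also answers the concern about not knowing the size of the result in advance.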
I have two data frames.
First one called : sentence
structure(list(Text = c("This is a pen", "this is a sword", "pen is mightier than a sword"
)), .Names = "Text", row.names = c(NA, -3L), class = "data.frame")
which looks like:
Text
1 This is a pen
2 this is a sword
3 pen is mightier than a sword
Second one called : words
structure(list(wordvec = c("pen", "sword"), value = c(1, 2)), .Names = c("wordvec",
"value"), row.names = c(NA, -2L), class = "data.frame")
which looks like:
wordvec value
1 pen 1
2 sword 2
I have to search for words present in wordvec in sentence, and if they are present I have to return the sum of their values.
Desired output is as follows:
Text Value
1 This is a pen 1
2 this is a sword 2
3 pen is mightier than a sword 3
I first tried extracting the words present in sentence$Text matching with words$wordvec and made a vector. This I successfully did.
library(stringi)
sentence$words <- sapply(stri_extract_all(sentence[[1]],regex='(#?)\\w+'),function(x) paste(x[x %in% words[[1]]],collapse=','))
As a next step I tried getting the sum of the words present and creating a vector sentence$value. I tried the following code
sentence$value <- sum(words$value)[match(sentence$words, words$wordvec)]
We paste the 'wordvec' entries into a single regex pattern, extract the words in the 'Text' column that match it (as a list), match those against the 'wordvec' vector to get their positions, look up the corresponding 'value' from 'words', and then take the sum.
library(stringr)
sapply(str_extract_all(sentence$Text,
paste0('\\b(',paste(words$wordvec, collapse='|'), ')\\b')),
function(x) sum(words$value[match(x, words$wordvec)]))
#[1] 1 2 3
Another option is using strsplit after converting the 'sentence' data.frame to data.table (setDT(sentence,..)), match the vector of split words with 'wordvec', get the corresponding 'value' and do the sum.
library(data.table)
setDT(sentence, keep.rownames=TRUE)[,
sum(words$value[match(strsplit(Text, '\\s')[[1]],
words$wordvec, nomatch=0)]), by = rn]$V1
#[1] 1 2 3
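The same lookup can also be sketched in base R without extra packages. This is only an illustration under the assumption that words are separated by whitespace; the helper name lookup is made up here.

```r
sentence <- data.frame(
  Text = c("This is a pen", "this is a sword", "pen is mightier than a sword"),
  stringsAsFactors = FALSE
)
words <- data.frame(wordvec = c("pen", "sword"), value = c(1, 2),
                    stringsAsFactors = FALSE)

# Named vector: word -> value, so words can be looked up by name
lookup <- setNames(words$value, words$wordvec)

# Split each sentence on whitespace; unknown words index to NA and are dropped
sentence$Value <- sapply(strsplit(sentence$Text, "\\s+"), function(tokens) {
  sum(lookup[tokens], na.rm = TRUE)
})
sentence
```

Because whole tokens are compared, a word such as "pencil" would not be counted as "pen", and a sentence with no listed words gets 0.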
Here is another simple solution using the for loop. However performance might be an issue. Your dataframe:
sentence<-structure(list(Text = c("This is a pen", "this is a sword", "pen is mightier than a sword"
)), .Names = "Text", row.names = c(NA, -3L), class = "data.frame")
words<-structure(list(wordvec = c("pen", "sword"), value = c(1, 2)), .Names = c("wordvec",
"value"), row.names = c(NA, -2L), class = "data.frame")
Create a dataframe of zeros with one row per sentence; the per-word counts will be appended to it as columns.
a<-data.frame(matrix(0, ncol=1, nrow=nrow(sentence)))
Now, using the for loop, go through every word in words and count how often it occurs in each sentence with str_count from stringr (load it first with library(stringr)). Each count is multiplied by the word's value and appended with cbind to the dataframe a for future reference.
for (i in 1:nrow(words))
a<-cbind(a,data.frame(count=str_count(sentence$Text,words$wordvec[i]))*words$value[i])
Now just add the sum of the rows by using rowSums
data.frame(Text=sentence$Text,Value=rowSums(a))
and you will get:
Text Value
1 This is a pen 1
2 this is a sword 2
3 pen is mightier than a sword 3
Try it :)