Copying data from one data frame to another using a variable in R

I am trying to transfer data from one data frame to another. I want to copy 8 columns from a huge data frame to a smaller one and name the columns n1, n2, etc.
First I find the column number from which I need to copy by using this:
x=as.numeric(which(colnames(old_df)=='N1_data'))
Then I am pasting it in new data frame this way
new_df[paste('N',1:8,'new',sep='')]=old_df[x:x+7]
However, when I run this, all 8 new columns have exactly the same data. If I instead use the value of x directly, I get what I want:
new_df[paste('N',1:8,'new',sep='')]=old_df[10:17]
So my questions are
Why am I not able to use the variable x? I added as.numeric just to make sure it is a number, not a list, but that does not seem to help.
Is there any better or more efficient way to achieve this?

If I'm understanding your question correctly, you may be overthinking the problem.
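library(dplyr)
# Select the eight source columns by name
new_df <- select(old_df, N1_data, N2_data, N3_data, N4_data,
                 N5_data, N6_data, N7_data, N8_data)
# Rename N1_data..N8_data to n1..n8 ("\\1" is the captured digit)
colnames(new_df) <- sub("N(\\d)_data", "n\\1", colnames(new_df))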
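As for why the variable x did not work: the likely cause is operator precedence. In R, `:` binds more tightly than `+`, so `x:x+7` is parsed as `(x:x)+7`, which is a single index rather than a range of eight. That would be consistent with all eight new columns receiving the same data. A minimal sketch, assuming a toy `old_df` (the 20-column layout and the `id` column are made up for illustration):

```r
# Toy data standing in for old_df (hypothetical: 20 numeric columns)
old_df <- as.data.frame(matrix(1:60, nrow = 3))
colnames(old_df)[10:17] <- paste0("N", 1:8, "_data")

# which() already returns a number, so as.numeric() is not needed
x <- which(colnames(old_df) == "N1_data")

wrong <- x:x + 7     # parsed as (x:x) + 7: one index only
right <- x:(x + 7)   # eight consecutive indices

new_df <- data.frame(id = 1:3)   # hypothetical smaller data frame
new_df[paste0("N", 1:8, "new")] <- old_df[right]
```

Wrapping the upper bound in parentheses, or using `seq(x, length.out = 8)`, avoids the problem.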

Related

R function for identifying values from one column in another?

I have two different data frames, each of them consisting of a list of "genes" and a list of "interactors" (other genes). Is it possible with R to check whether any "genes" from one list are also present in any of the columns of "interactors" from the other data frame, and vice versa?
I am quite new in R, so perhaps there is an easy way to perform this, but I don't even know how to look for it.
Thanks in advance!
Guillermo.
Could you please show a sample of your data?
In any case, I guess the following is what you need:
df_common <- df[df$genes %in% df$interactors, ]
It checks which elements of the column "genes" in the data frame df are also present (%in%) in the column "interactors" of the same data frame.
Is this what you are looking for? If not, please paste your input and desired output.
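Since the question mentions two data frames and possibly several interactor columns, the same `%in%` idea can be extended. This is only a sketch under assumptions: the names `df1`, `df2`, `interactor1`, `interactor2` and the helper `all_interactors` are hypothetical, since no sample data was posted.

```r
# Hypothetical toy data; the column names are assumptions, not from the question
df1 <- data.frame(genes = c("A", "B", "C"),
                  interactor1 = c("X", "Y", "Z"),
                  interactor2 = c("D", "E", "F"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(genes = c("D", "X", "Q"),
                  interactor1 = c("A", "M", "N"),
                  interactor2 = c("O", "C", "P"),
                  stringsAsFactors = FALSE)

# Pool every non-"genes" column of a data frame into one character vector
all_interactors <- function(d) {
  unique(unlist(d[setdiff(names(d), "genes")], use.names = FALSE))
}

# Rows of df1 whose gene appears among df2's interactors, and vice versa
df1_hits <- df1[df1$genes %in% all_interactors(df2), ]
df2_hits <- df2[df2$genes %in% all_interactors(df1), ]
```

If the columns were read in as factors, converting them with `as.character()` first keeps the comparison on the labels rather than the internal codes.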

R empty data frame after subsetting by factor

I need to subset my data depending on the content of one factor variable.
I tried to do it with subset:
new <- subset(data, original$Group1=="SALAD")
data is already a subset from a bigger data frame, in original I have the factor variable which should identify the wanted rows.
This works perfectly for one level of the factor variable, but (and I really don't understand why!) when I do it with the other factor level "BREAD" it creates the data frame but says "no data available", so it is empty. I imported the data from SPSS, if this matters. I've already checked the factor levels, and the naming should be right!
I would be really grateful for help; I spent 3 hours on this problem and wasn't able to find a solution.
I've also tried other ways to subset my data (e.g. split), but I want a data frame as output.
Do you have general advice on the best way to subset a data frame if I want, e.g., 3 columns of it, with rows extracted depending on the level of a factor? (Most code examples are only for one or all columns.)
The entire point of the subset function (as I understand it) is to look inside the data frame for the right variable - so you can type
subset(data, var1 == "value")
instead of
data[data$var1 == "value", ]
Anyone, please correct me if that is incorrect.
Now, in your case, you are explicitly taking Group1 from the data frame original and using that to subset data, which you say is a subset of original. Based on this, I see no reason to believe (and every reason not to believe) that the elements of original$Group1 will align with the rows of data. If Group1 is defined within data, why not just use the copy defined there, which is aligned correctly? If not, you need to be very explicit about what you are trying to accomplish, so that you can ensure that things are aligned correctly.
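To make the alignment point concrete, here is a small sketch. It assumes Group1 is present in data itself; the `id` and `value` columns and the filter used to build `data` are made up for illustration. The `select` argument of `subset` also covers the part of the question about keeping only a few columns.

```r
# Hypothetical toy data; 'id' and 'value' are made-up columns
original <- data.frame(id = 1:6,
                       Group1 = factor(c("SALAD", "SALAD", "SALAD",
                                         "BREAD", "BREAD", "BREAD")),
                       value = c(10, 20, 30, 40, 50, 60))

# data keeps only some rows, so it no longer lines up with original$Group1
data <- original[original$value > 25, ]

# Refer to the Group1 column inside data, and pick the columns to keep
bread <- subset(data, Group1 == "BREAD", select = c(id, Group1, value))
salad <- subset(data, Group1 == "SALAD", select = c(id, Group1, value))
```

Because the condition is evaluated inside `data`, each row is tested against its own Group1 value.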

Subset variables by name in R

I know that there are many threads called this but either the advice within hasn't worked or I haven't understood it.
I have read what was an SPSS file into R.
I cleaned some variables and added new ones.
By this point the file size is 1,000 MB.
I wanted to write it into a CSV to look at it more easily but it just stops responding - file too big I guess.
So instead I want to create a subset of only the variables I need. I tried a couple of things
(besb <- bes[, c(1, 7, 8)])
data1 <- bes[,1:8]
I also tried referring to variables by name:
nf <- c(bes$approveGov, bes$politmoney)
All these attempts return errors with number of dimensions.
Therefore could somebody please explain to me how to create a reduced subset of variables preferably using variable names?
An easy way to subset variables from a data.frame is with the dplyr package. You can select variables with their bare names. For example:
library(dplyr)
nf <- select(bes, approveGov, politmoney)
It's fast for large data frames too.
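For comparison, a base R sketch of the same selection by name. The toy `bes` below is a hypothetical stand-in (only the two column names come from the question), and the output file name is made up. Note that `c(bes$approveGov, bes$politmoney)` concatenates the two columns into one long vector, which is why it does not give a two-column result.

```r
# Hypothetical stand-in for bes; only the two column names come from the question
bes <- data.frame(id = 1:3,
                  approveGov = c(1, 0, 1),
                  politmoney = c(5, 3, 4),
                  other = c("a", "b", "c"))

keep <- c("approveGov", "politmoney")
nf <- bes[, keep]   # data frame containing just these two columns

# Writing the reduced data out (hypothetical file name):
# write.csv(nf, "bes_subset.csv", row.names = FALSE)
```

This assumes bes is a data frame; `class(bes)` is a quick way to check that before subsetting.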

r create and address variable in for loop

I have multiple csv-files in one folder. I want to load each csv-file in this folder into a separate data frame. Next, I want to extract certain elements from each data frame into a matrix and calculate the mean of all these matrices.
setwd("D:\\data")
group_1<-list.files()
a<-length(group_1)
mferg_mean<-data.frame
for(i in 1:a)
{
assign(paste0("mferg_",i),read.csv(group_1[i],header=FALSE,sep=";",quote="",dec=",",col.names=1:90))
}
As there are 11 csv-files in the folder I now have the data frames
mferg_1
to
mferg_11
How can I address each data frame in this loop? As mentioned, I want to extract certain elements from each data frame to a matrix. I would imagine it something like this:
assign(paste0("mferg_matrix_",i),mferg_i[1:5,1:10])
But this obviously does not work because R does not recognize mferg_i in the loop. How can I address this data frame?
This is probably not something you should be using assign for in the first place. Working with a bunch of separate data.frames in R is a mess, but working with a list of data.frames is much easier. Try reading your data with
group_1 <- list.files()
mferg <- lapply(group_1, function(filename) {
  read.csv(filename, header=FALSE, sep=";", quote="", dec=",", col.names=1:90)
})
and you get each value with mferg[[1]], mferg[[2]], etc. And then you can create a list of extractions with
mferg_matrix <- lapply(mferg, function(x) x[1:5, 1:10])
This is the more R-like way to do things.
But technically you can use get to retrieve values like you use assign to create them. For example
assign(paste0("mferg_matrix_",i),get(paste0("mferg_",i))[1:5,1:10])
but again, this is probably not a smart strategy in the long run.
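For the last step in the question (the mean of all the matrices), one possible sketch with the list approach is below. The toy data frames stand in for the csv contents and are purely hypothetical; the sketch also assumes the extracted cells are all numeric.

```r
# Hypothetical stand-ins for the data frames read from the csv files
mferg <- lapply(1:3, function(i) as.data.frame(matrix(i, nrow = 6, ncol = 12)))

# Extract the same block from each data frame as a numeric matrix
mferg_matrix <- lapply(mferg, function(x) as.matrix(x[1:5, 1:10]))

# Element-wise mean across all matrices: sum them, then divide by the count
mferg_mean <- Reduce(`+`, mferg_matrix) / length(mferg_matrix)
```

`Reduce` adds the matrices pairwise, so this works for any number of files as long as every extracted block has the same dimensions.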

Replacing Rows in a large data frame in R

I have to collect some rows manually, and the R Cookbook recommends pre-allocating memory for a large data frame. Say my code is
dataSize <- 500000;
shoesRead <- read.csv(file="someShoeCsv.csv", head=TRUE, sep=",");
shoes <- data.frame(size=integer(dataSize), price=double(dataSize),
cost=double(dataSize), retail=double(dataSize));
So now, I have some data about shoes which I imported via csv, and then I perform some calculation and want to insert into the data frame shoes. Let's say the someShoeCsv.csv has a column called ukSize and so
usSize <- ukSize * 1.05 #for example
My question is: how do I do so? I now have a usSize variable, transformed from the ukSize column read from the csv file, and running this code:
shoes <- rbind(shoes,
data.frame("size"=usSize, "price"=price,
"cost"=cost, "retail"=retail));
adds to the already large data frame.
I have experimented with building a list and then calling rbind, but I understand that this is tedious, so I would like to use the pre-allocation method instead. So far I have had no luck.
I'm not quite sure what you're trying to do, but if you're trying to replace some of the pre-allocated rows with new data, you could do so like this:
Nreplace = length(usSize)
shoes$size[1:Nreplace] <- usSize
shoes$price[1:Nreplace] <- shoesRead$price
And so on, for the rest of the columns.
Here's some unsolicited advice. Looking at the code you've included, you reference ukSize and price etc without referencing the data frame, which makes it appear like you've done attach(shoesRead). Definitely never use attach(). If you want the price vector, for example, just do shoesRead$price. It's just a little bit more typing for the sake of much more readable code.
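Putting the pieces together, a self-contained sketch of filling the pre-allocated rows might look like this. The small `shoesRead` is a hypothetical stand-in for the csv, and `size` is pre-allocated as double here (an assumption that differs from the question's integer column) because the converted sizes are fractional.

```r
# Hypothetical stand-in for the csv contents; column names follow the question
shoesRead <- data.frame(ukSize = c(6, 7, 8),
                        price  = c(50, 60, 70),
                        cost   = c(20, 25, 30),
                        retail = c(80, 90, 100))

dataSize <- 10   # small here; 500000 in the question
shoes <- data.frame(size = double(dataSize), price = double(dataSize),
                    cost = double(dataSize), retail = double(dataSize))

# Conversion factor taken from the question's example
usSize <- shoesRead$ukSize * 1.05

# Overwrite the first rows in place instead of growing the frame with rbind()
rows <- seq_len(nrow(shoesRead))
shoes[rows, ] <- data.frame(size = usSize,
                            price = shoesRead$price,
                            cost = shoesRead$cost,
                            retail = shoesRead$retail)
```

The remaining pre-allocated rows keep their initial zeros, and `nrow(shoes)` stays at `dataSize`.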
