Repeat function in R - r

I have a 320 x 64 matrix and I want to modify the 64 variables so that the first 8 are equal to 0 and the last 56 are equal to 1.
I tried the rep function:
pen.vect<-(rep(0,8),rep(1,56))
penalty.factor<-pen.vect
but it's not working
Thank you :)

You can convert between matrices and data frames easily. Working with a data frame lets you do this with bracket notation:
bm <- as.data.frame(B) # assuming your matrix is called "B"
bm[,1:8] <- 0
bm[,9:64] <- 1
B2 <- as.matrix(bm)
Here's a full, working example with dummy data:
B <- matrix(2:65, nrow=320, ncol=64) # Create a matrix with dummy data
bm <- as.data.frame(B) # Change it to a data frame
bm[,1:8] <- 0 # Set every row of the first 8 columns to 0
bm[,9:64] <- 1 # Set every row of the remaining 56 columns to 1
B2 <- as.matrix(bm) # Change the data back to a matrix
Also, take a look at this post for how to properly post an R question. I'm honestly shocked your question hasn't been deleted or flagged yet. R on SO can be brutal.
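For what it's worth, the round trip through a data frame is probably not required, since bracket assignment also works on a matrix. The sketch below uses a dummy matrix B as a stand-in for the real data (an assumption) and also shows a version of the vector from the question with the missing c() added; pen.vect2 is a hypothetical name for an equivalent one-call form.

```r
# Dummy 320 x 64 matrix standing in for the real data (assumption)
B <- matrix(2:65, nrow = 320, ncol = 64)

# Assign directly on the matrix: columns 1-8 become 0, columns 9-64 become 1
B[, 1:8] <- 0
B[, 9:64] <- 1

# The vector from the question needs c() to combine the two rep() calls
pen.vect <- c(rep(0, 8), rep(1, 56))

# Equivalent single call: repeat 0 eight times, then 1 fifty-six times
pen.vect2 <- rep(c(0, 1), times = c(8, 56))
```

If only a length-64 vector is needed (as the name penalty.factor suggests), the last two lines are enough and the matrix does not have to be touched.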

Related

Compare 2 vectors and add missing values from target vector in R

I am using R and I have a correct vector whose names contain all the target names (names(correct) <- c("A","B","C","D","E")) such as:
correct <- c("a","b","c","d","e")
names(correct) <- c("A","B","C","D","E")
The vector I have to modify, tofix, has names that may be missing some values compared to correct above. In the example below, "C" and "E" are missing.
tofix <- c(2,5,4)
names(tofix) <- c("A","B","D")
So I want to fix it in a way that the resulting vector, fixed, contains the same names as in correct and with the same order, and when the name is missing adds 0 as a value, like the below:
fixed <- c(2,5,0,4,0)
names(fixed) <- names(correct)
Any idea how to do this in R? I tried with multiple if statements and for loops, but time complexity was far from ideal.
Many thanks in advance.
You may try
fixed <- rep(0, length(correct))
fixed[match(names(tofix), names(correct))] <- tofix
names(fixed) <- names(correct)
fixed
A B C D E
2 5 0 4 0
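Another option (note that table() sorts the names alphabetically, which matches the order of correct in this example):
unlist(modifyList(as.list(table(names(correct))*0), as.list(tofix)))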
A B C D E
2 5 0 4 0
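A further variant, sketched here under the assumption that every name in tofix also appears in correct, is to build a named vector of zeros and assign by name. Assigning with a name that does not exist in the target would append a new element, so that assumption matters.

```r
correct <- c("a", "b", "c", "d", "e")
names(correct) <- c("A", "B", "C", "D", "E")

tofix <- c(2, 5, 4)
names(tofix) <- c("A", "B", "D")

# Start from zeros carrying the target names, in the target order
fixed <- setNames(rep(0, length(correct)), names(correct))

# Overwrite the positions whose names are present in tofix
fixed[names(tofix)] <- tofix
fixed
```

This is vectorised, so it avoids the loops and if statements mentioned in the question.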

How to define default input value when merging two datasets on one column of different lengths?

Assume I have an original dataset containing a complete set of "texts" (a string variable), and a second dataset that only contains those "texts" for which the new variable "value" has been coded (0, 1, or NA).
Now I would like to merge them back together so that the resulting dataset contains the full range of "texts" from the first dataset but also includes "value", which should be 0 if it was coded 0 or if the text is only present in the original dataset.
dat1<-data.frame(text=c("a","b","c","d","e","f","g","h")) # original dataset
dat2<-data.frame(text=c("e","f","g","h"), value=c(0,NA,1,1)) # second version
The final dataset should look like this:
> dat3
text value
1 a 0
2 b 0
3 c 0
4 d 0
5 e 0
6 f NA
7 g 1
8 h 1
However, what Base-R's merge() does is to introduce NAs where I want 0s instead:
dat3<-merge(dat1, dat2, by=c("text"), all=T)
Is there a way to define a default input for when the variable by which datasets are merged is only present in one but not the other dataset? In other words, how can I define 0 as standard input value instead of NA?
I am aware of the fact that I could temporarily change the coded NAs in the second dataset to something else to distinguish later on between "real" NAs and NAs that just get introduced, but I would really like to refrain from doing so, if there's another, cleaner way. Ideally, I would like to use merge() or plyr::join() for that purpose but couldn't find anything in the manual(s).
I know this is not ideal either, but it is something to consider:
library(dplyr)
dat3 <- dplyr::left_join(dat1, dat2, by = "text")
dat3$value[!dat3$text %in% dat2$text] <- 0 # texts absent from dat2 get 0; NAs coded in dat2 stay NA
Or wrap it in a function so the call is a one-liner:
merge_NA <- function(dat1, dat2) {
  dat3 <- dplyr::left_join(dat1, dat2, by = "text")
  dat3$value[!dat3$text %in% dat2$text] <- 0
  return(dat3)
}
Now, you only call:
merge_NA(dat1,dat2)
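Since the question prefers merge(), here is a minimal base R sketch of the same idea. As far as I know, merge() has no argument for a default fill value, so the zeros are assigned afterwards, using %in% to tell "text was never in dat2" apart from "value was coded NA in dat2".

```r
dat1 <- data.frame(text = c("a", "b", "c", "d", "e", "f", "g", "h")) # original dataset
dat2 <- data.frame(text = c("e", "f", "g", "h"), value = c(0, NA, 1, 1)) # second version

# Left join: keep every row of dat1; unmatched rows get NA in value
dat3 <- merge(dat1, dat2, by = "text", all.x = TRUE)

# Only rows whose text never appeared in dat2 are set to 0;
# the NA that was coded for "f" is left untouched
dat3$value[!dat3$text %in% dat2$text] <- 0
dat3
```

No placeholder value for the "real" NAs is needed, because the decision is based on membership in dat2$text rather than on the value itself.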

Data handling: 2 independent factors, which decide the position of a numeric value in a new data frame

I am new to Stackoverflow and to R, so I hope you can be a bit patient and excuse any formatting mistakes.
I am trying to write an R-script, which allows me to automatically analyze the raw data of a qPCR machine.
I was quite successful in cleaning up the data, but at some point I run into trouble. My goal is to consolidate the data into a comprehensive table.
The initial data frame (DF) looks something like this:
Sample Detector Value
1 A 1
1 B 2
2 A 3
3 A 2
3 B 3
3 C 1
My goal is to have a dataframe with the Sample-names as row names and Detector as column names.
A B C
1 1 2 NA
2 3 NA NA
3 2 3 1
My approach
First I took out the names of samples and detectors and saved them in vectors as factors.
detectors = summary(DF$Detector)
detectors = names(detectors)
samples = summary(DF$Sample)
samples = names(samples)
Then I subsetted the detectors into a new dataframe based on the name of the detector in the dataframe.
for (i in 1:length(detectors)){
assign(detectors[i], DF[which(DF$Detector == detectors[i]),])
}
Then I initialize an empty dataframe with the right column and row names:
result = data.frame(matrix(NA, nrow = length(samples), ncol = length(detectors)))
colnames(result) = detectors
rownames(result) = samples
Now the problem: I have to get the values from the detector subsets into the result dataframe. It is important that each value ends up in the right position in the dataframe. The issue is that the subsets do not all have the same number of values, since some samples lack some detectors.
I tried the following: iterate through the detector subsets, compare the row name (= sample name) of the result with the sample name in the subset, and if they match, write the value into the new dataframe. If they do not match, it should write an NA.
for (i in 1:length(detectors)){
for (j in 1:length(get(detectors[i])$Sample)){
result[j,i] = ifelse(get(detectors[i])$Sample[j] == rownames(result[j,]), get(detectors[i])$Ct.Mean[j], NA)
}
}
The trouble is that once a sample is missing, the iteration through the detector's Sample column gets out of step with the rows of the result. My understanding is that the samples being compared get out of sync, so every following ifelse yields NA.
I tried to work around this by putting j=j+1 into the "no" branch of ifelse(test, yes, no) to get it back in sync, but unfortunately this didn't work.
I hope I have made my problem understandable!
I look forward to hearing any suggestions or comments (including how to improve my code in general ;)
We can use acast from library(reshape2) to convert from 'long' to 'wide' format.
library(reshape2)
acast(DF, Sample~Detector, value.var='Value') #returns a matrix output
# A B C
#1 1 2 NA
#2 3 NA NA
#3 2 3 1
If we need a data.frame output, use dcast.
Or use spread from library(tidyr), which will also keep 'Sample' as an additional column (in tidyr 1.0.0 and later, spread is superseded by pivot_wider).
library(tidyr)
spread(DF, Detector, Value)
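If installing a package is not an option, a base R sketch with tapply gives the same long-to-wide layout. The DF below is rebuilt from the example table in the question. Using mean as the aggregation function is an assumption: it makes no difference here because each Sample/Detector pair occurs once, but it would average replicates if there were any.

```r
# Example data rebuilt from the table in the question
DF <- data.frame(
  Sample   = c(1, 1, 2, 3, 3, 3),
  Detector = c("A", "B", "A", "A", "B", "C"),
  Value    = c(1, 2, 3, 2, 3, 1)
)

# Rows are samples, columns are detectors; combinations that never occur stay NA
wide <- tapply(DF$Value, list(DF$Sample, DF$Detector), mean)

# tapply returns a matrix; convert it if a data.frame is needed
result <- as.data.frame(wide)
result
```

The sample names end up as row names and the detector names as column names, which is the shape asked for in the question.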

How to run a for-loop through a string vector of a data frame in R?

I'm trying to do something very simple: to run a loop through a vector of names and use those names in my code.
geo = c(rep("AT",3),rep("BE",3))
time = c(rep(c("1990Q1","1990Q2","1990Q3"),2))
value = c(1:6)
Data <- data.frame(geo,time,value)
My real dataset has 14 countries and 75 time periods. I would like to find a function which for example loops through the countries, then subsets them so I have the single datasets such as:
data_AT <- subset(Data, (Data$geo=="AT"))
data_BE <- subset(Data, (Data$geo=="BE"))
but with a loop and ideally with a solution I can apply to other functions as well :-)
In my mind, this should look something like this:
codes <- unique(Data$geo)
for (i in 1:length(codes))
{k <- codes[i]
data_(k) <- subset(Data, (Data$geo==k))}
however subset doesn't work like this, neither do other functions. I think my problem is that I don't know how to address the respective name which "k" has taken (e.g. "AT") as part of my code. If at all possible, I would very much appreciate an answer with a general solution of how I can run a function through a vector containing text and use each element of that vector in my code. Maybe in the direction of the apply functions? Though I'm not getting very far with that either...
Any help would be very much appreciated!
I use loops for similar purposes too, for example when saving plots for different subsets. Maybe it's not the fastest way, but at least I understand it.
There is no need to loop over the length of the vector; you can loop over the vector itself. To turn a string into a variable name, you can use assign.
geo = c(rep("AT",3),rep("BE",3))
time = c(rep(c("1990Q1","1990Q2","1990Q3"),2))
value = c(1:6)
Data <- data.frame(geo,time,value)
codes <- sort(unique(Data$geo))
for (k in codes) {
  name <- paste("data", k, sep="_")
  assign(name, subset(Data, (Data$geo==k)))
}
BTW, filter from the dplyr package is an alternative to subset and may be faster on large data.
In R, you would typically do this with a list of data.frames instead of several separate data.frames:
lst <- split(Data, Data$geo)
lst
#$AT
# geo time value
#1 AT 1990Q1 1
#2 AT 1990Q2 2
#3 AT 1990Q3 3
#
#$BE
# geo time value
#4 BE 1990Q1 4
#5 BE 1990Q2 5
#6 BE 1990Q3 6
Now you can access each element (which is a data.frame) by typing:
lst[["AT"]]
# geo time value
#1 AT 1990Q1 1
#2 AT 1990Q2 2
#3 AT 1990Q3 3
If you have a vector of country names for which you want to add +1 to the value column, you can do it like this:
cntrs <- c("BE", "AT")
lst[cntrs] <- lapply(lst[cntrs], function(x) {x$value <- x$value + 1; return(x)} )
#$BE
# geo time value
#4 BE 1990Q1 5
#5 BE 1990Q2 6
#6 BE 1990Q3 7
#
#$AT
# geo time value
#1 AT 1990Q1 2
#2 AT 1990Q2 3
#3 AT 1990Q3 4
Edit: if you really want to stick with a for loop, I recommend not splitting the data into several separate data.frames but running the loop on the whole data set, for example:
cntrs <- "BE"
for(i in cntrs){
  Data$value[Data$geo == i] <- Data$value[Data$geo == i] + 1
}
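As a sketch of the "general solution" the question asks about: once the data is split into a named list, any function can be run per country with lapply or sapply, and the country code is available as the element name. The summary chosen here (mean of value, number of rows) is only an illustration, and the names means and n_rows are hypothetical.

```r
geo <- c(rep("AT", 3), rep("BE", 3))
time <- c(rep(c("1990Q1", "1990Q2", "1990Q3"), 2))
value <- c(1:6)
Data <- data.frame(geo, time, value)

# One data.frame per country, named by country code
lst <- split(Data, Data$geo)

# Apply a function to every country at once; the result is a named vector
means <- sapply(lst, function(d) mean(d$value))

# Looping over the names gives access to both the code and its data
n_rows <- integer(0)
for (k in names(lst)) {
  n_rows[k] <- nrow(lst[[k]])
}
```

Here means["AT"] and n_rows["BE"] play the role that separate data_AT and data_BE objects would otherwise play, without creating one variable per country.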

Filling matrix with data from two columns of dataframe

I am desperate, and even though I may lose some more rep points, I have to ask this.
(Yes, I read some threads about it).
I created a data frame with only the 2 columns I want to put into the matrix (I didn't know how to pick just 2 columns from the whole data):
tbl_corel <- tbl_end[,c("diff", "abund_mean")]
In the next step I created an empty matrix:
## Creating an empty matrix to check the correlation between diff and abund_mean
mat_corel <- matrix(0, ncol = 2)
colnames(mat_corel) <- c("diff", "abund_mean")
I tried to use that function to fill the matrix with the data:
mat_corel <- matrix(tbl_corel), nrow = 676,ncol = 2)
Of course, I had to check manually how many rows my data frame has...
It doesn't work.
I tried this as well:
mat_corel[ as.matrix(tbl_corel) ] <- 1
It doesn't work either. I'd be very grateful for any help. Here are the first rows of my data:
diff abund_mean
1 0 3444804.80
2 0 847887.02
3 0 93654.19
4 0 721692.76
5 0 382711.04
6 1 428656.66
If you want to create a matrix from your two-column data frame, there is a simpler and more direct way: convert the data frame to a matrix:
mat_corel <- as.matrix(tbl_corel)
But if you just want to compute a correlation coefficient, you can do it directly from your data frame:
cor(tbl_end$diff, tbl_end$abund_mean)
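To illustrate with the six rows shown in the question (used here as a small stand-in for the full 676-row data, which is an assumption), the sketch below converts the data frame and computes the correlation both ways. No pre-allocated matrix and no manual row count are needed, because as.matrix keeps the dimensions and column names of the data frame.

```r
# The six rows shown in the question, as a stand-in for the full data
tbl_corel <- data.frame(
  diff       = c(0, 0, 0, 0, 0, 1),
  abund_mean = c(3444804.80, 847887.02, 93654.19, 721692.76, 382711.04, 428656.66)
)

# Convert directly; dimensions and column names carry over
mat_corel <- as.matrix(tbl_corel)
dim(mat_corel)

# Correlation from the two columns of the data frame
r_df <- cor(tbl_corel$diff, tbl_corel$abund_mean)

# cor() on the matrix returns the full 2 x 2 correlation matrix
r_mat <- cor(mat_corel)
```

The off-diagonal entry r_mat["diff", "abund_mean"] is the same coefficient as r_df.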
