matriisi <- matrix(c(1.3,1.4,1.2,1.9,1.9,2.9,3.0,4.2,5.1,5.5), nrow = 5, byrow = FALSE)
colnames(matriisi) <- c("y","x")
matriisi
datat <- data.frame(matriisi)
havainnot <- datat[which(datat$x>3.0),]
havainnot
I got the following results:
y x
3 1.2 4.2
4 1.9 5.1
5 1.9 5.5
How can I get only the values from the second column, which in this case is x? Why does it show both columns?
You can extract the values of the second column if you reference by name:
matriisi[, "x"]
2.9 3.0 4.2 5.1 5.5
havainnot[, "x"]
4.2 5.1 5.5
If you want a whole row, you can use
matriisi[1,]
y x
1.3 2.9
havainnot[1,]
y x
3 1.2 4.2
Or you can select a given row and column.
havainnot[1,"x"]
4.2
havainnot[2,"x"]
5.1
Then, once you know which values you want, you can create a matrix using the following syntax:
havainnot <- matrix(havainnot[2,"x"])
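As for why both columns appear: in datat[rows, ] the column index after the comma is empty, which R reads as "all columns". A minimal sketch of filtering the rows and picking the column in one step, reusing the question's data (the name x_only is made up here):

```r
matriisi <- matrix(c(1.3, 1.4, 1.2, 1.9, 1.9, 2.9, 3.0, 4.2, 5.1, 5.5), nrow = 5)
colnames(matriisi) <- c("y", "x")
datat <- data.frame(matriisi)

# Row filter before the comma, column name after it: returns a plain numeric vector
x_only <- datat[datat$x > 3.0, "x"]
x_only
#> [1] 4.2 5.1 5.5

# drop = FALSE keeps the result as a one-column data frame instead
datat[datat$x > 3.0, "x", drop = FALSE]
```

The which() call in the question is optional here, since a logical vector can be used as a row index directly.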
I have a task that I need to complete in RStudio using R. I'm new to this.
I have a CSV file containing a table with 80 columns and 568 rows, after sampling 80% of the original data file. Now I need to add a column to the table that holds the (max - min) of each row.
data <- read.csv(file.choose(), header=T)
data
data$maxSubMin <- for(i in 1:568){
max(data[i,1:78]) - min(data[i,1:78])
}
No errors are shown in the log, but there is no new column...
Does anybody know the reason?
You can use row-wise apply :
data$maxSubMin <- apply(data[,1:78], 1, function(x) max(x) - min(x))
You can also take diff of range
data$maxSubMin <- apply(data[,1:78], 1, function(x) diff(range(x)))
Using rowMaxs and rowMins from matrixStats :
library(matrixStats)
data$maxSubMin <- rowMaxs(as.matrix(data[,1:78]))- rowMins(as.matrix(data[,1:78]))
The issue is that the for loop returns NULL. Hence you don't get a new column. To make your for loop work you have to do the assignment inside the loop, i.e.
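cols <- 1:78  # fixed set of columns, so the new column is never included in max/min
data$maxSubMin <- NA_real_  # create the column before filling it row by row
for(i in seq_len(nrow(data))){
data$maxSubMin[i] <- max(data[i,cols]) - min(data[i,cols])
}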
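A quick way to convince yourself of the NULL behaviour, sketched on a made-up three-row data frame (toy, res and rng are hypothetical names, not part of the question):

```r
toy <- data.frame(a = c(1, 5, 2), b = c(4, 3, 9), c = c(2, 8, 6))

# A for loop evaluates to NULL, so there is nothing to store in a column
res <- for (i in seq_len(nrow(toy))) max(toy[i, ]) - min(toy[i, ])
is.null(res)
#> [1] TRUE

# Assigning inside the loop, into a pre-allocated vector, keeps each value
rng <- numeric(nrow(toy))
for (i in seq_len(nrow(toy))) {
  rng[i] <- max(toy[i, ]) - min(toy[i, ])
}
rng
#> [1] 3 5 7
```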
Nonetheless, the preferred approach would be to use apply, as already suggested by @RonakShah. Using the iris dataset as example data:
#data <- read.csv(file.choose(), header=T)
data <- iris[,-5]
data$maxSubMin <- apply(data, 1, function(x) max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
head(data)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width maxSubMin
#> 1 5.1 3.5 1.4 0.2 4.9
#> 2 4.9 3.0 1.4 0.2 4.7
#> 3 4.7 3.2 1.3 0.2 4.5
#> 4 4.6 3.1 1.5 0.2 4.4
#> 5 5.0 3.6 1.4 0.2 4.8
#> 6 5.4 3.9 1.7 0.4 5.0
I am a beginner in R. I want to write vectors of different sizes into a CSV file. Here is my code:
library(igraph)
library(DirectedClustering)
my_list = readLines("F://RR//listtest.csv")
eigen <-c()
for(i in 1:length(my_list))
{
my_data <- read.csv(my_list[i],head=TRUE, row.names =1 )
my_matrix <-as.matrix(my_data)
g1 <- graph_from_adjacency_matrix(my_matrix, weighted=TRUE,diag = FALSE)
e1 <- eigen_centrality(g1,directed = TRUE)
eigen[[i]] <-e1[["vector"]]
}
df = data.frame(eigenvalue,eigen)
df
write.csv(df, "F://RR//outtest.csv")
The first problem is that, because the vectors have different sizes (the maximum is 14), data.frame cannot be used.
The second problem is that when I write vectors of the same size to a CSV file, the output looks like this:
  Vec1 Vec2 Vec3
1  2.5  3.5  4.5
2  1.8  1.6  1.4
3  1.3  5.8  9.9
but I want it displayed row by row, something like:
1 2.5
2 3.5
3 4.5
4 1.8
5 1.6
6 1.4
7 1.3
8 5.8
9 9.9
I really need your help, thanks a lot.
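One possible approach, sketched in base R: a list can hold vectors of different lengths, and unlist() flattens them one after another into a single column. Everything named here is an assumption for illustration: vecs is a made-up stand-in for the eigen list from the question, and the output goes to a temporary file rather than the real path.

```r
# Hypothetical stand-in for the centrality vectors, with unequal lengths
vecs <- list(c(2.5, 3.5, 4.5), c(1.8, 1.6), c(1.3, 5.8, 9.9, 0.4))

# Long format: one value per row, plus the index of the vector it came from
long <- data.frame(
  source = rep(seq_along(vecs), lengths(vecs)),
  value  = unlist(vecs)
)
long

# Written to a temporary file here; replace with the real output path
out_file <- tempfile(fileext = ".csv")
write.csv(long, out_file, row.names = FALSE)

# For equal-length columns already in a data frame, t() gives row-by-row order
df <- data.frame(Vec1 = c(2.5, 1.8, 1.3), Vec2 = c(3.5, 1.6, 5.8), Vec3 = c(4.5, 1.4, 9.9))
flat <- as.vector(t(as.matrix(df)))
flat
#> [1] 2.5 3.5 4.5 1.8 1.6 1.4 1.3 5.8 9.9
```

The long format sidesteps the length mismatch entirely, because data.frame() only ever sees two columns of equal length.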
I'm trying to scale() only numeric columns IF a data.frame contains a mix of numeric and non-numeric columns of data. (Initially, I am wondering if there could be an if statement showing if a data.frame contains non-numeric data?)
Note that I want to keep the original data.frame variables, and only add the new, scaled variables with the suffix ".s" to the original data.frame.
I have tried the following. But it looks like it also populates the non-numeric column Loc in the below example?
stan <- function(data, scale = TRUE, center = TRUE, na.rm = TRUE){
data <- if(na.rm) data[complete.cases(data), ]
ind <- sapply(data, is.numeric)
data[paste0(names(data), ".s")] <- lapply(data[ind], scale)
return(as.data.frame(data))
}
# EXAMPLE:
stan(iris)
RE: your question on how to test whether your data frame has any non-numeric columns, you have a couple of ways to do this. Here's one:
all(sapply(iris, class) == "numeric")
# [1] FALSE
You can use that as your test in the if statement. It should be true exactly when scale() can produce a result.
Alternatively, you could try the offending colMeans, but that ends up being more complicated.
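Note that class(x) == "numeric" is FALSE for integer columns, even though scale() handles them. A slightly more permissive sketch of the same test uses is.numeric(), which is TRUE for both doubles and integers (all_numeric is a made-up name):

```r
# TRUE only when every column is numeric (double or integer)
all(sapply(iris, is.numeric))
#> [1] FALSE

# Names of the columns that would make scale() fail
names(iris)[!sapply(iris, is.numeric)]
#> [1] "Species"

# With the factor column removed, the check passes
all_numeric <- all(sapply(iris[, -5], is.numeric))
all_numeric
#> [1] TRUE
```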
EDIT: since the OP accepted this as the answer, I'll add @Frank's comment, which answers the first part:
f = function(d) {
  ind <- sapply(d, is.numeric)
  d[paste0(names(d)[ind], ".s")] <- lapply(d[ind], scale)
  d
}
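The function in the question misbehaves because paste0(names(data), ".s") builds a new name for every column, while lapply(data[ind], scale) returns values for the numeric columns only, so the shorter list is recycled into the extra name. A sketch of the full function with that fixed, keeping the question's argument names; the as.numeric() call, which flattens the one-column matrix that scale() returns, and the explicit base::scale are additions made here, not part of the original:

```r
stan <- function(data, scale = TRUE, center = TRUE, na.rm = TRUE) {
  # Drop incomplete rows only when asked; otherwise keep the data unchanged
  if (na.rm) data <- data[complete.cases(data), , drop = FALSE]
  ind <- sapply(data, is.numeric)
  # Build ".s" names for the numeric columns only, so names and values line up
  data[paste0(names(data)[ind], ".s")] <- lapply(
    data[ind],
    # base::scale is spelled out because the argument `scale` shadows the name
    function(x) as.numeric(base::scale(x, center = center, scale = scale))
  )
  data
}

out <- stan(iris)
names(out)
```

The original columns are kept, and Species gets no scaled counterpart because it is not numeric.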
Using dplyr, you can do:
library(dplyr)
iris %>%
mutate_if(is.numeric, funs(s = scale))
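which will create the scaled columns with the suffix _s (as far as I know, mutate_if() cannot change this to .s, although you can always add a renaming step). In dplyr 1.0.0 and later, where funs() is deprecated, mutate(across(where(is.numeric), scale, .names = "{.col}.s")) produces the .s suffix directly.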
Alternative solution:
data <- data.frame(iris, scale(Filter(is.numeric, setNames(iris, paste0(names(iris), ".s")))))
Returns:
> head(data)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length.s Sepal.Width.s Petal.Length.s Petal.Width.s
1 5.1 3.5 1.4 0.2 setosa -0.8976739 1.01560199 -1.335752 -1.311052
2 4.9 3.0 1.4 0.2 setosa -1.1392005 -0.13153881 -1.335752 -1.311052
3 4.7 3.2 1.3 0.2 setosa -1.3807271 0.32731751 -1.392399 -1.311052
4 4.6 3.1 1.5 0.2 setosa -1.5014904 0.09788935 -1.279104 -1.311052
5 5.0 3.6 1.4 0.2 setosa -1.0184372 1.24503015 -1.335752 -1.311052
6 5.4 3.9 1.7 0.4 setosa -0.5353840 1.93331463 -1.165809 -1.048667
I want to combine two matrices with partly overlapping rownames in R. When the rownames match, values from the two matrices should end up as adjacent columns. When the rownames only occur in one matrix, empty space should be inserted for the other matrix.
Data set:
testm1 <- cbind("est"=c(1.5,1.2,0.7,4.0), "lci"=c(1.1,0.9,0.5,0.9), "hci"=c(2.0,1.7,0.8,9.0))
rownames(testm1) <- c("BadFood","NoActivity","NoSunlight","NoWater")
testm1 #Factors associated with becoming sick
testm2 <- cbind("est"=c(3.0,2.0,0.9,7.0), "lci"=c(1.3,1.2,0.2,2.0), "hci"=c(5.0,3.1,1.7,9.0))
rownames(testm2) <- c("BadFood","NoActivity","Genetics","Age")
testm2 #Factors associated with dying
Desired output:
Sick Dying
est lci hci est lci hci
BadFood 1.5 1.1 2.0 3.0 1.3 5.0
NoActivity 1.2 0.9 1.7 2.0 1.2 3.1
NoSunlight 0.7 0.5 0.8 - - -
NoWater 4.0 0.9 9.0 - - -
Genetics - - - 0.9 0.2 1.7
Age - - - 7.0 2.0 9.0
Is there a simple way to do this that would work for all matrices?
Here is a base R method that keeps everything in matrix form:
# get rownames of new matrix
newNames <- union(rownames(testm1), rownames(testm2))
# construct new matrix
newMat <- matrix(NA, length(newNames), 2*ncol(testm2),
dimnames=list(c(newNames), rep(colnames(testm1), 2)))
# fill in new matrix
newMat[match(rownames(testm1), newNames), 1:ncol(testm1)] <- testm1
newMat[match(rownames(testm2), newNames), (ncol(testm1)+1):ncol(newMat)] <- testm2
In the final two lines, match is used to find the proper row indices by row name.
This returns
newMat
est lci hci est lci hci
BadFood 1.5 1.1 2.0 3.0 1.3 5.0
NoActivity 1.2 0.9 1.7 2.0 1.2 3.1
NoSunlight 0.7 0.5 0.8 NA NA NA
NoWater 4.0 0.9 9.0 NA NA NA
Genetics NA NA NA 0.9 0.2 1.7
Age NA NA NA 7.0 2.0 9.0
I think this does what you are after, though it's not that pretty and requires the data to be a data.frame rather than a matrix. Hope it helps at least!
(Code was adapted from this question & answer: https://stackoverflow.com/a/34530141/4651564)
library(dplyr)
dat1 <- as.data.frame(testm1)
dat2 <- as.data.frame(testm2)
full_join( dat1 %>% mutate(Symbol = rownames(dat1) ),
dat2 %>% mutate(Symbol = rownames(dat2) ),
by = 'Symbol')
You can do it using the merge() function.
First, convert your test matrices to data frames, then use merge on the data frames, and finally convert the result to a matrix (but do you necessarily need a matrix?).
Here's some example code:
testm1 <- as.data.frame(testm1)
testm2 <- as.data.frame(testm2)
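result <- merge(testm1, testm2, by='row.names', all=TRUE)
# all=TRUE keeps the rows that appear in only one of the two inputs
# move the merged key back into the row names so the matrix stays numeric
rownames(result) <- result$Row.names
result <- as.matrix(result[, -1])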
If you want to obtain a data frame, simply omit the last line of code. Hope this helps.
I have a list consisting of 23 elements, each with 69 rows and 13 columns. I need to apply a calculation to multiple columns of each element of the list.
As a simple example, my list looks like this:
> list
$`1`
    a   b   c
1 2.1 1.4 3.4
2 4.4 2.6 5.5
3 2.6 0.4 3.0
...
$`2`
     a   b   c
70 5.1 4.9 5.1
71 4.4 7.6 8.5
72 2.8 3.5 6.8
...
What I wish to do is something like z = (a-b) / c
for each element ($1, $2, ..., $23).
I tried the following code:
for( i in 1:23) {
z = (list[[i]]$a - list[[i]]$b) / list[[i]]$c
}
which gave me only 49 values, rather than 1566 values.
Does anyone have any idea what is wrong with my code and how to correct it?
Thank you very much!
You can do it with the lapply() function. Here is an example, assuming that the columns have the same names in all data frames.
ll<-list(data.frame(a=1:3,b=4:6,c=7:9),
data.frame(a=1:3,b=6:4,c=7:9),
data.frame(a=1:3,b=4:6,c=9:7))
lapply(ll, with, (a-b)/c)
Building on @DidzisElferts's answer, you can use this
lapply(ll, within, z <- (a-b)/c)
to add z as a new column on each dataframe.
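The loop in the question keeps only one set of values because z is overwritten on every pass, so after the loop it holds the result for the last element alone. A sketch of collecting everything instead, using the same example list (z_list and z_all are made-up names):

```r
ll <- list(data.frame(a = 1:3, b = 4:6, c = 7:9),
           data.frame(a = 1:3, b = 6:4, c = 7:9),
           data.frame(a = 1:3, b = 4:6, c = 9:7))

# One result vector per list element
z_list <- lapply(ll, function(d) (d$a - d$b) / d$c)

# All values in a single vector: 3 elements x 3 rows = 9 values
z_all <- unlist(z_list)
length(z_all)
#> [1] 9
```

If the loop form is preferred, storing each result as z[[i]] in a list created before the loop has the same effect.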