Create matrix from dataframe in R


I have the following dataset:
> iris
X5.1 X3.3 X1.7 X0.5 X.1
1 6.1 3.0 4.6 1.4 1
2 4.8 3.1 1.6 0.2 -1
3 5.0 3.4 1.5 0.2 -1
4 4.5 2.3 1.3 0.3 -1
5 5.4 3.4 1.7 0.2 -1
6 5.1 2.5 3.0 1.1 1
7 5.5 2.6 4.4 1.2 1
8 4.8 3.4 1.9 0.2 -1
9 6.5 2.8 4.6 1.5 1
10 5.4 3.0 4.5 1.5 1
11 5.8 4.0 1.2 0.2 -1
12 5.0 3.3 1.4 0.2 -1
13 7.0 3.2 4.7 1.4 1
14 5.0 3.4 1.6 0.4 -1
15 4.7 3.2 1.6 0.2 -1
16 5.0 2.3 3.3 1.0 1
17 4.4 3.0 1.3 0.2 -1
18 5.0 3.0 1.6 0.2 -1
19 4.9 3.0 1.4 0.2 -1
Now I want to create a matrix called "train.x" that stores 10 rows and 4 columns from this dataset. How would I do that? My attempt so far is:
train.x<-matrix(iris[1:70,1:4])
but it doesn't work. Any help would be appreciated, thanks!

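matrix() does not turn a data frame into a numeric matrix. Use as.matrix() instead, and index rows 1:10 (not 1:70) to get 10 rows:
train.x <- as.matrix(iris[1:10, 1:4])
train.x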
# X5.1 X3.3 X1.7 X0.5
#1 6.1 3.0 4.6 1.4
#2 4.8 3.1 1.6 0.2
#3 5.0 3.4 1.5 0.2
#4 4.5 2.3 1.3 0.3
#5 5.4 3.4 1.7 0.2
#6 5.1 2.5 3.0 1.1
#7 5.5 2.6 4.4 1.2
#8 4.8 3.4 1.9 0.2
#9 6.5 2.8 4.6 1.5
#10 5.4 3.0 4.5 1.5
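A small sketch contrasting the two calls. It uses R's built-in iris data rather than the poster's renamed copy (an assumption: the column names differ, but the indexing is the same). matrix() treats the data frame as a list and returns a list matrix with one cell per column, whereas as.matrix() returns a 10 x 4 numeric matrix:

```r
# Built-in iris data (not the poster's renamed copy): first 10 rows, 4 numeric columns
subset_df <- iris[1:10, 1:4]

# matrix() wraps the data frame's columns as list elements, giving a list matrix
m <- matrix(subset_df)
is.list(m)          # TRUE: each cell holds a whole column

# as.matrix() converts the data frame into a numeric matrix
train.x <- as.matrix(subset_df)
dim(train.x)        # 10 4
is.numeric(train.x) # TRUE
```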

Related

How to drop a column from a list?

Good afternoon,
Suppose I have the following list of data frames:
[[4]]
[[4]]$L.1
Sepal.Length Sepal.Width Petal.Length Petal.Width v
1 5.1 3.5 1.4 0.2 1
5 5.0 3.6 1.4 0.2 1
6 5.4 3.9 1.7 0.4 1
11 5.4 3.7 1.5 0.2 1
16 5.7 4.4 1.5 0.4 1
19 5.7 3.8 1.7 0.3 1
20 5.1 3.8 1.5 0.3 1
21 5.4 3.4 1.7 0.2 1
[[4]]$L.2
Sepal.Length Sepal.Width Petal.Length Petal.Width v
2 4.9 3.0 1.4 0.2 2
3 4.7 3.2 1.3 0.2 2
4 4.6 3.1 1.5 0.2 2
7 4.6 3.4 1.4 0.3 2
8 5.0 3.4 1.5 0.2 2
9 4.4 2.9 1.4 0.2 2
10 4.9 3.1 1.5 0.1 2
12 4.8 3.4 1.6 0.2 2
13 4.8 3.0 1.4 0.1 2
[[4]]$L.3
Sepal.Length Sepal.Width Petal.Length Petal.Width v
15 5.8 4.0 1.2 0.2 3
17 5.4 3.9 1.3 0.4 3
136 7.7 3.0 6.1 2.3 3
My question is: how do I drop the column v?
I tried the following, without success:
lapply(L, "[", -v)
Thank you in advance for your help!

Try this approach:
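# Base R: assigning NULL removes the v column from each data frame
L <- lapply(L, function(x){x$v <- NULL; x})
Or with dplyr:
# dplyr: keep every column except v (dplyr:: prefix, so the pipe is not needed)
L <- lapply(L, function(x) dplyr::select(x, -v))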

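L[,-5] on the list itself fails, because a list has no columns. Drop the column by position inside lapply() instead, where 5 is the column number of v:
L <- lapply(L, function(x) x[, -5])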

In base R, we can also use setdiff() to keep every column except v:
L1 <- lapply(L, function(x) x[setdiff(names(x), 'v')])
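The printed output ([[4]]$L.1, [[4]]$L.2, ...) suggests that each element of L may itself be a list of data frames. If so (an assumption), the answers above need one more level of lapply(). A minimal sketch with a hypothetical nested list built from iris:

```r
# Hypothetical nested list: outer list -> inner list of data frames L.1, L.2, L.3,
# each carrying an extra grouping column v (built from iris for illustration)
iris_v <- iris[1:9, 1:4]
iris_v$v <- rep(1:3, each = 3)
inner <- split(iris_v, iris_v$v)
names(inner) <- paste0("L.", names(inner))
L <- list(inner)

# The outer lapply walks the outer list; the inner lapply walks the data frames
L_clean <- lapply(L, function(group) {
  lapply(group, function(x) x[setdiff(names(x), "v")])
})

names(L_clean[[1]])      # "L.1" "L.2" "L.3"
names(L_clean[[1]]$L.1)  # the four Sepal/Petal columns, without v
```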

filter list with nested dataframes in R

I have a list of dataframes. The list is created by a function that I cannot control, so each dataframe holds more information than I need. Every dataframe in the list has the same structure. I need to filter out rows by the values of one column and write the result to a new list. The list contains over 1000 dataframes.
historical_file[1]
$daily_kl_historical_tageswerte_KL_00001_19370101_19860630_hist
STATIONS_ID MESS_DATUM QN_3 FX FM QN_4 RSK RSKF SDK SHK_TAG NM VPM PM TMK UPM TXK TNK TGK eor
1 1 1937-01-01 NA NA NA 5 0.0 0 NA 0 6.3 NA NA -0.5 NA 2.5 -1.6 NA eor
2 1 1937-01-02 NA NA NA 5 0.0 0 NA 0 3.0 NA NA 0.3 NA 5.0 -4.0 NA eor
3 1 1937-01-03 NA NA NA 5 0.0 0 NA 0 4.3 NA NA 3.2 NA 5.0 -0.2 NA eor
4 1 1937-01-04 NA NA NA 5 0.0 0 NA 0 8.0 NA NA 0.2 NA 3.8 -0.2 NA eor
5 1 1937-01-05 NA NA NA 5 0.0 0 NA 0 8.0 NA NA 1.4 NA 4.5 -0.7 NA eor
6 1 1937-01-06 NA NA NA 5 5.2 7 NA 0 6.0 NA NA 0.2 NA 2.0 -2.4 NA eor
[ reached 'max' / getOption("max.print") -- omitted 17296 rows ]
$daily_kl_historical_tageswerte_KL_00003_18910101_20110331_hist
STATIONS_ID MESS_DATUM QN_3 FX FM QN_4 RSK RSKF SDK SHK_TAG NM VPM PM TMK UPM TXK TNK TGK eor
1 3 1891-01-01 NA NA NA 5 0.0 0 NA NA 0.0 4.3 NA -3.6 88 0.5 -5.9 NA eor
2 3 1891-01-02 NA NA NA 5 0.0 0 NA NA 2.7 4.1 NA -2.8 84 0.0 -5.8 NA eor
3 3 1891-01-03 NA NA NA 5 2.5 1 NA NA 3.7 3.9 NA -0.2 69 2.1 -6.2 NA eor
4 3 1891-01-04 NA NA NA 5 8.2 1 NA NA 8.0 6.4 NA 1.8 90 3.7 0.6 NA eor
5 3 1891-01-05 NA NA NA 5 1.9 1 NA NA 7.7 4.7 NA -2.5 87 1.5 -4.2 NA eor
6 3 1891-01-06 NA NA NA 5 2.5 1 NA NA 8.0 3.5 NA -5.8 88 -4.0 -6.9 NA eor
I would like to filter every dataframe by MESS_DATUM. On an individual dataframe I would do:
historical_file_new<-historical_file%>%filter(MESS_DATUM>'2000-07-01')
How can I do that for every dataframe in the list?

You pass your filter into lapply(), which applies it to each element of the list:
library(dplyr)
l <- list(iris,iris)
lapply(l,function(x) filter(x,Species=="setosa"))
#> [[1]]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5.0 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> 11 5.4 3.7 1.5 0.2 setosa
#> 12 4.8 3.4 1.6 0.2 setosa
#> 13 4.8 3.0 1.4 0.1 setosa
#> 14 4.3 3.0 1.1 0.1 setosa
#> 15 5.8 4.0 1.2 0.2 setosa
#> 16 5.7 4.4 1.5 0.4 setosa
#> 17 5.4 3.9 1.3 0.4 setosa
#> 18 5.1 3.5 1.4 0.3 setosa
#> 19 5.7 3.8 1.7 0.3 setosa
#> 20 5.1 3.8 1.5 0.3 setosa
#> 21 5.4 3.4 1.7 0.2 setosa
#> 22 5.1 3.7 1.5 0.4 setosa
#> 23 4.6 3.6 1.0 0.2 setosa
#> 24 5.1 3.3 1.7 0.5 setosa
#> 25 4.8 3.4 1.9 0.2 setosa
#> 26 5.0 3.0 1.6 0.2 setosa
#> 27 5.0 3.4 1.6 0.4 setosa
#> 28 5.2 3.5 1.5 0.2 setosa
#> 29 5.2 3.4 1.4 0.2 setosa
#> 30 4.7 3.2 1.6 0.2 setosa
#> 31 4.8 3.1 1.6 0.2 setosa
#> 32 5.4 3.4 1.5 0.4 setosa
#> 33 5.2 4.1 1.5 0.1 setosa
#> 34 5.5 4.2 1.4 0.2 setosa
#> 35 4.9 3.1 1.5 0.2 setosa
#> 36 5.0 3.2 1.2 0.2 setosa
#> 37 5.5 3.5 1.3 0.2 setosa
#> 38 4.9 3.6 1.4 0.1 setosa
#> 39 4.4 3.0 1.3 0.2 setosa
#> 40 5.1 3.4 1.5 0.2 setosa
#> 41 5.0 3.5 1.3 0.3 setosa
#> 42 4.5 2.3 1.3 0.3 setosa
#> 43 4.4 3.2 1.3 0.2 setosa
#> 44 5.0 3.5 1.6 0.6 setosa
#> 45 5.1 3.8 1.9 0.4 setosa
#> 46 4.8 3.0 1.4 0.3 setosa
#> 47 5.1 3.8 1.6 0.2 setosa
#> 48 4.6 3.2 1.4 0.2 setosa
#> 49 5.3 3.7 1.5 0.2 setosa
#> 50 5.0 3.3 1.4 0.2 setosa
#>
#> [[2]]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5.0 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> 11 5.4 3.7 1.5 0.2 setosa
#> 12 4.8 3.4 1.6 0.2 setosa
#> 13 4.8 3.0 1.4 0.1 setosa
#> 14 4.3 3.0 1.1 0.1 setosa
#> 15 5.8 4.0 1.2 0.2 setosa
#> 16 5.7 4.4 1.5 0.4 setosa
#> 17 5.4 3.9 1.3 0.4 setosa
#> 18 5.1 3.5 1.4 0.3 setosa
#> 19 5.7 3.8 1.7 0.3 setosa
#> 20 5.1 3.8 1.5 0.3 setosa
#> 21 5.4 3.4 1.7 0.2 setosa
#> 22 5.1 3.7 1.5 0.4 setosa
#> 23 4.6 3.6 1.0 0.2 setosa
#> 24 5.1 3.3 1.7 0.5 setosa
#> 25 4.8 3.4 1.9 0.2 setosa
#> 26 5.0 3.0 1.6 0.2 setosa
#> 27 5.0 3.4 1.6 0.4 setosa
#> 28 5.2 3.5 1.5 0.2 setosa
#> 29 5.2 3.4 1.4 0.2 setosa
#> 30 4.7 3.2 1.6 0.2 setosa
#> 31 4.8 3.1 1.6 0.2 setosa
#> 32 5.4 3.4 1.5 0.4 setosa
#> 33 5.2 4.1 1.5 0.1 setosa
#> 34 5.5 4.2 1.4 0.2 setosa
#> 35 4.9 3.1 1.5 0.2 setosa
#> 36 5.0 3.2 1.2 0.2 setosa
#> 37 5.5 3.5 1.3 0.2 setosa
#> 38 4.9 3.6 1.4 0.1 setosa
#> 39 4.4 3.0 1.3 0.2 setosa
#> 40 5.1 3.4 1.5 0.2 setosa
#> 41 5.0 3.5 1.3 0.3 setosa
#> 42 4.5 2.3 1.3 0.3 setosa
#> 43 4.4 3.2 1.3 0.2 setosa
#> 44 5.0 3.5 1.6 0.6 setosa
#> 45 5.1 3.8 1.9 0.4 setosa
#> 46 4.8 3.0 1.4 0.3 setosa
#> 47 5.1 3.8 1.6 0.2 setosa
#> 48 4.6 3.2 1.4 0.2 setosa
#> 49 5.3 3.7 1.5 0.2 setosa
#> 50 5.0 3.3 1.4 0.2 setosa
Created on 2020-04-20 by the reprex package (v0.3.0)
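Applied to the question's own case, the same lapply() pattern filters each station table by date. A minimal sketch, assuming MESS_DATUM is stored as a Date (as the printed output suggests). The two-station list and its values are made up for illustration, and base R's subset() stands in for dplyr::filter(); like filter(), it drops rows where the condition is NA:

```r
# Made-up two-station list standing in for historical_file
historical_file <- list(
  station_a = data.frame(STATIONS_ID = 1,
                         MESS_DATUM = as.Date(c("2000-06-30", "2000-07-01", "2000-07-02")),
                         TMK = c(1, 2, 3)),
  station_b = data.frame(STATIONS_ID = 3,
                         MESS_DATUM = as.Date(c("1999-12-31", "2001-01-01")),
                         TMK = c(4, 5))
)

cutoff <- as.Date("2000-07-01")

# Keep only rows strictly after the cutoff in every data frame; list names are kept
historical_file_new <- lapply(historical_file, function(x) subset(x, MESS_DATUM > cutoff))

# With dplyr installed, the equivalent call is:
# lapply(historical_file, function(x) dplyr::filter(x, MESS_DATUM > cutoff))
```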

Using certain plyr functions to calculate more than one thing

Let's say I have the following:
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5.0 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
Is it possible to calculate several statistics for the first column, such as min, max and mean, with a single call to a plyr function?
Thanks!
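No answer is given here. plyr does provide summarise() for one-call summaries, but as a dependency-free sketch (a swapped-in base R technique, not plyr), several statistics can be computed in one call by applying a named list of functions to the column. The values shown above match the first ten rows of R's built-in iris data, which the sketch uses:

```r
# First ten rows of the built-in iris data, matching the table in the question
df <- iris[1:10, 1:4]

# Apply each function in the named list to the first column; names carry through
stats <- sapply(list(min = min, max = max, mean = mean),
                function(f) f(df$Sepal.Length))
stats
```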

Moving data in dataframe

I'm trying to shift the data in a data frame. For each row, I want to move the first value that is not 0 to height1, with the later values following it.
Example data looks as follows:
Tree <- c(1:10)
height0 <- c(0,0,0,0,0,0,0,0,0,0)
height1 <- c(1.5,2.0,0.0,1.2,1.3,0.9,0.0,0.0,1.8,0.0)
height2 <- c(2.4,2.2,1.1,1.9,1.4,1.7,0.0,0.0,2.7,0.0)
height3 <- c(3.1,2.9,2.1,2.6,2.2,2.4,0.0,0.6,3.6,0.0)
height4 <- c(3.8,3.4,2.9,3.0,2.9,3.1,0.0,1.1,4.1,0.0)
height5 <- c(4.2,3.7,3.6,3.7,3.5,3.8,0.7,1.9,4.6,0.0)
height6 <- c(4.4,4.1,4.1,4.2,4.0,4.5,1.6,2.6,4.9,1.2)
height7 <- c(4.7,4.4,4.3,4.6,4.2,4.9,2.2,3.0,5.1,2.0)
df <- data.frame(Tree, height0, height1, height2, height3, height4, height5, height6, height7)
So the data frame df looks like this:
df
Tree height0 height1 height2 height3 height4 height5 height6 height7
1 1 0 1.5 2.4 3.1 3.8 4.2 4.4 4.7
2 2 0 2.0 2.2 2.9 3.4 3.7 4.1 4.4
3 3 0 0.0 1.1 2.1 2.9 3.6 4.1 4.3
4 4 0 1.2 1.9 2.6 3.0 3.7 4.2 4.6
5 5 0 1.3 1.4 2.2 2.9 3.5 4.0 4.2
6 6 0 0.9 1.7 2.4 3.1 3.8 4.5 4.9
7 7 0 0.0 0.0 0.0 0.0 0.7 1.6 2.2
8 8 0 0.0 0.0 0.6 1.1 1.9 2.6 3.0
9 9 0 1.8 2.7 3.6 4.1 4.6 4.9 5.1
10 10 0 0.0 0.0 0.0 0.0 0.0 1.2 2.0
I want to move each tree's first non-zero height to height1, because not all the trees germinated at the same time, and I only want to compare growth speed without false results caused by differences in germination.
So afterwards my data should look like this:
df
Tree height0 height1 height2 height3 height4 height5 height6 height7
1 1 0 1.5 2.4 3.1 3.8 4.2 4.4 4.7
2 2 0 2.0 2.2 2.9 3.4 3.7 4.1 4.4
3 3 0 1.1 2.1 2.9 3.6 4.1 4.3
4 4 0 1.2 1.9 2.6 3.0 3.7 4.2 4.6
5 5 0 1.3 1.4 2.2 2.9 3.5 4.0 4.2
6 6 0 0.9 1.7 2.4 3.1 3.8 4.5 4.9
7 7 0 0.7 1.6 2.2
8 8 0 0.6 1.1 1.9 2.6 3.0
9 9 0 1.8 2.7 3.6 4.1 4.6 4.9 5.1
10 10 0 1.2 2.0
Is there a way to do this?
I have over 3000 trees, each measured 40 times, so doing this manually would take too long.
Thank you

One option is to loop through the rows with apply() (MARGIN = 1), extract the non-zero elements, pad the rest with NA using `length<-`, then transpose the output (apply() returns each row's result as a column) and assign it back.
df[-(1:2)] <- t(apply(df[-(1:2)], 1, function(x) `length<-`(x[x!=0], ncol(df)-2)))
df
# Tree height0 height1 height2 height3 height4 height5 height6 height7
#1 1 0 1.5 2.4 3.1 3.8 4.2 4.4 4.7
#2 2 0 2.0 2.2 2.9 3.4 3.7 4.1 4.4
#3 3 0 1.1 2.1 2.9 3.6 4.1 4.3 NA
#4 4 0 1.2 1.9 2.6 3.0 3.7 4.2 4.6
#5 5 0 1.3 1.4 2.2 2.9 3.5 4.0 4.2
#6 6 0 0.9 1.7 2.4 3.1 3.8 4.5 4.9
#7 7 0 0.7 1.6 2.2 NA NA NA NA
#8 8 0 0.6 1.1 1.9 2.6 3.0 NA NA
#9 9 0 1.8 2.7 3.6 4.1 4.6 4.9 5.1
#10 10 0 1.2 2.0 NA NA NA NA NA
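One caveat: x[x != 0] removes every zero in a row, not only the leading ones, so a genuine 0 recorded after germination would also be shifted out. If that can happen in the full data set (an assumption about the data), a variant that drops only the leading zeros is sketched below; drop_leading_zeros is a hypothetical helper name:

```r
# Example data from the question
Tree    <- 1:10
height0 <- c(0,0,0,0,0,0,0,0,0,0)
height1 <- c(1.5,2.0,0.0,1.2,1.3,0.9,0.0,0.0,1.8,0.0)
height2 <- c(2.4,2.2,1.1,1.9,1.4,1.7,0.0,0.0,2.7,0.0)
height3 <- c(3.1,2.9,2.1,2.6,2.2,2.4,0.0,0.6,3.6,0.0)
height4 <- c(3.8,3.4,2.9,3.0,2.9,3.1,0.0,1.1,4.1,0.0)
height5 <- c(4.2,3.7,3.6,3.7,3.5,3.8,0.7,1.9,4.6,0.0)
height6 <- c(4.4,4.1,4.1,4.2,4.0,4.5,1.6,2.6,4.9,1.2)
height7 <- c(4.7,4.4,4.3,4.6,4.2,4.9,2.2,3.0,5.1,2.0)
df <- data.frame(Tree, height0, height1, height2, height3, height4, height5, height6, height7)

# Hypothetical helper: drop only the zeros before the first non-zero value,
# then pad the end of the row with NA back to its original length
drop_leading_zeros <- function(x) {
  kept <- x[cumsum(x != 0) > 0]
  `length<-`(kept, length(x))
}

# Shift every column except Tree and height0, row by row
df[-(1:2)] <- t(apply(df[-(1:2)], 1, drop_leading_zeros))
df
```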

How to subset a dataframe using R function and use that dataframe later?

I want to subset a data frame using a function as follows.
calcScore <- function(y){
t <- iris[iris$Species == y,]
return(t)
}
When I call calcScore('setosa'), it returns the output below.
> calcScore('setosa')
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3.0 1.4 0.1 setosa
14 4.3 3.0 1.1 0.1 setosa
15 5.8 4.0 1.2 0.2 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
21 5.4 3.4 1.7 0.2 setosa
22 5.1 3.7 1.5 0.4 setosa
23 4.6 3.6 1.0 0.2 setosa
24 5.1 3.3 1.7 0.5 setosa
25 4.8 3.4 1.9 0.2 setosa
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
29 5.2 3.4 1.4 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
31 4.8 3.1 1.6 0.2 setosa
32 5.4 3.4 1.5 0.4 setosa
33 5.2 4.1 1.5 0.1 setosa
34 5.5 4.2 1.4 0.2 setosa
35 4.9 3.1 1.5 0.2 setosa
36 5.0 3.2 1.2 0.2 setosa
37 5.5 3.5 1.3 0.2 setosa
38 4.9 3.6 1.4 0.1 setosa
39 4.4 3.0 1.3 0.2 setosa
40 5.1 3.4 1.5 0.2 setosa
41 5.0 3.5 1.3 0.3 setosa
42 4.5 2.3 1.3 0.3 setosa
43 4.4 3.2 1.3 0.2 setosa
44 5.0 3.5 1.6 0.6 setosa
45 5.1 3.8 1.9 0.4 setosa
46 4.8 3.0 1.4 0.3 setosa
47 5.1 3.8 1.6 0.2 setosa
48 4.6 3.2 1.4 0.2 setosa
49 5.3 3.7 1.5 0.2 setosa
50 5.0 3.3 1.4 0.2 setosa
But I cannot get the data frame t afterwards. Instead, typing t prints the following:
> t
standardGeneric for "t" defined from package "base"
function (x)
standardGeneric("t")
<environment: 0x11be807c>
Methods may be defined for arguments: x
Use showMethods("t") for currently available ones.
How can I write a function that subsets the dataframe so that the result is saved and I can access it later?

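You haven't assigned the output to anything. The t inside calcScore() is local to the function and disappears once it returns, so typing t at the console shows R's transpose function t() instead. Assign the result to a variable, for example: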
mynewdf <- calcScore('setosa')
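A minimal sketch of the same fix with the data frame passed in as an argument instead of hard-coding iris (calc_subset, setosa_df and local_demo are illustrative names). The value of a function's last expression is returned, and it only persists if you assign it:

```r
# Return the rows of `data` whose Species equals `species`
calc_subset <- function(data, species) {
  data[data$Species == species, ]
}

# Assign the returned data frame so it can be used later
setosa_df <- calc_subset(iris, "setosa")
nrow(setosa_df)     # 50

# Variables created inside a function are local to that call
local_demo <- function() {
  inner_df <- head(iris)
  nrow(inner_df)
}
local_demo()
exists("inner_df")  # FALSE: inner_df vanished when local_demo() returned
```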
