I have a character vector dat which I want to convert to a data frame df, but it is not working:
head(dat)
[1] " 1931 1 5.0 0.6 11 78.4 43.4"
[2] " 1931 2 6.7 0.7 7 48.9 63.6"
[3] " 1931 4 10.4 3.1 3 44.6 110.1"
[4] " 1931 5 13.2 6.1 1 63.7 167.4"
[5] " 1931 6 15.4 8.0 0 87.8 150.3"
[6] " 1931 7 17.3 10.6 0 121.4 111.2"
> df<-as.data.frame(dat)
> head(df)
dat
1 1931 1 5.0 0.6 11 78.4 43.4
2 1931 2 6.7 0.7 7 48.9 63.6
3 1931 4 10.4 3.1 3 44.6 110.1
4 1931 5 13.2 6.1 1 63.7 167.4
5 1931 6 15.4 8.0 0 87.8 150.3
6 1931 7 17.3 10.6 0 121.4 111.2
df[,c(3)]
Error in [.data.frame(df, , c(3)) : undefined columns selected
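Read the printed output with read.table; you can rename the columns as desired. Note that the pasted text includes the printed row numbers as its first field, so V3 below holds the second value of each original string, not the third.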
df<-read.table(text = " dat
1 1931 1 5.0 0.6 11 78.4 43.4
2 1931 2 6.7 0.7 7 48.9 63.6
3 1931 4 10.4 3.1 3 44.6 110.1
4 1931 5 13.2 6.1 1 63.7 167.4
5 1931 6 15.4 8.0 0 87.8 150.3
6 1931 7 17.3 10.6 0 121.4 111.2",
header=F,fill=T,as.is=T,skip = 1)
df[3]
V3
1 1
2 2
3 4
4 5
5 6
6 7
If dat is as shown reproducibly in the Note at the end, then as.data.frame(dat) creates a data frame with a single column called dat. Any attempt to take the 3rd column therefore fails, because only one column exists.
Instead, use read.table and get the third column like this. Omit the comma if you want a data frame result.
read.table(text = dat)[, 3]
## [1] 5.0 6.7 10.4 13.2 15.4 17.3
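A minimal sketch of the same read.table idea with named columns. The names year and v1 to v6 are placeholders (the original data has no header), so substitute whatever the fields actually mean.

```r
dat <- c(" 1931 1 5.0 0.6 11 78.4 43.4",
         " 1931 2 6.7 0.7 7 48.9 63.6",
         " 1931 4 10.4 3.1 3 44.6 110.1",
         " 1931 5 13.2 6.1 1 63.7 167.4",
         " 1931 6 15.4 8.0 0 87.8 150.3",
         " 1931 7 17.3 10.6 0 121.4 111.2")

# Each element of dat is parsed as one whitespace-separated line.
# The column names are placeholders, not taken from the data.
df <- read.table(text = dat, col.names = c("year", paste0("v", 1:6)))

str(df)   # seven numeric columns instead of one character column
df$v2     # the third field of every line
```

Because read.table guesses the column types, the values come back as numbers and can be used in arithmetic directly.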
Note
dat <- c(" 1931 1 5.0 0.6 11 78.4 43.4",
" 1931 2 6.7 0.7 7 48.9 63.6",
" 1931 4 10.4 3.1 3 44.6 110.1",
" 1931 5 13.2 6.1 1 63.7 167.4",
" 1931 6 15.4 8.0 0 87.8 150.3",
" 1931 7 17.3 10.6 0 121.4 111.2")
Here's a tidyverse approach:
dat <- c(" 1931 1 5.0 0.6 11 78.4 43.4",
" 1931 2 6.7 0.7 7 48.9 63.6",
" 1931 4 10.4 3.1 3 44.6 110.1",
" 1931 5 13.2 6.1 1 63.7 167.4",
" 1931 6 15.4 8.0 0 87.8 150.3",
" 1931 7 17.3 10.6 0 121.4 111.2")
library(tidyverse)
str_trim(dat) %>% # trim leading space
tibble(x = .) %>% # put into tibble (data.frame)
separate(x, # separate x into 7 columns, named below
into = c("year","v1","v2","v3","v4","v5","v6"),
sep = "[ ]{1,}") # separate by one or more spaces ("[ ]{1,}")
That leads to:
# A tibble: 6 x 7
year v1 v2 v3 v4 v5 v6
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1931 1 5.0 0.6 11 78.4 43.4
2 1931 2 6.7 0.7 7 48.9 63.6
3 1931 4 10.4 3.1 3 44.6 110.1
4 1931 5 13.2 6.1 1 63.7 167.4
5 1931 6 15.4 8.0 0 87.8 150.3
6 1931 7 17.3 10.6 0 121.4 111.2
I am trying to merge 2 data frames.
The main dataset, df1, contains numerical data in wide format - each row represents a date, each column contains the value for that date in a given city.
df2 contains metadata for each city: latitude, longitude, and elevation.
What I wish to do is add the metadata for each city to df1, but I was unsuccessful in doing so as the data frames don't match up in structure.
df1
Date Machrihanish High_Wycombe Camborne Dun_Fell Plymouth
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 20200101 8.5 6.9 9.6 3.3 9.9
2 20200102 11.7 9.1 11.2 5 10.9
3 20200103 9.1 9.9 11.2 5.1 11.1
4 20200104 9.2 8.1 9.4 2.2 9.4
5 20200105 11.7 7.6 9 4.3 9.3
6 20200106 10.8 8 11.6 3.7 10.6
7 20200107 14.7 11.7 12 6.7 11.5
8 20200108 11.2 11.8 11.6 6.2 11.3
9 20200109 7 12 11.6 -0.2 11.5
10 20200110 9.3 7.4 10 0 10.1
df2
Location Longitude Latitude Elevation
<chr> <dbl> <dbl> <dbl>
1 Machrihanish -5.70 55.4 10
2 High_Wycombe -0.807 51.7 204
3 Camborne -5.33 50.2 87
4 Dun_Fell -2.45 54.7 847
5 Plymouth -4.12 50.4 50
Here is a solution that tidies the data to long format by location and day, and merges the lat / long information.
Using data provided in the original post, we read it into two data frames.
tempText <- "rowId Date Machrihanish High_Wycombe Camborne Dun_Fell Plymouth
1 20200101 8.5 6.9 9.6 3.3 9.9
2 20200102 11.7 9.1 11.2 5 10.9
3 20200103 9.1 9.9 11.2 5.1 11.1
4 20200104 9.2 8.1 9.4 2.2 9.4
5 20200105 11.7 7.6 9 4.3 9.3
6 20200106 10.8 8 11.6 3.7 10.6
7 20200107 14.7 11.7 12 6.7 11.5
8 20200108 11.2 11.8 11.6 6.2 11.3
9 20200109 7 12 11.6 -0.2 11.5
10 20200110 9.3 7.4 10 0 10.1"
library(tidyr)
library(dplyr)
temps <- read.table(text = tempText,header = TRUE)
latLongs <-"rowId Location Longitude Latitude Elevation
1 Machrihanish -5.70 55.4 10
2 High_Wycombe -0.807 51.7 204
3 Camborne -5.33 50.2 87
4 Dun_Fell -2.45 54.7 847
5 Plymouth -4.12 50.4 50"
latLongs <- read.table(text = latLongs,header = TRUE)
Next, we use tidyr::pivot_longer() to generate long format data, and then merge it with the lat long data via dplyr::full_join(). Note that we set the name of the column where the wide format column names are stored with names_to = "Location" so that full_join() uses Location to join the two data frames.
temps %>%
select(-rowId) %>%
pivot_longer(.,Machrihanish:Plymouth,names_to = "Location", values_to="MaxTemp") %>%
full_join(.,latLongs) %>% select(-rowId) -> joinedData
head(joinedData)
...and the first few rows of the joined output look like this:
> head(joinedData)
# A tibble: 6 × 6
Date Location MaxTemp Longitude Latitude Elevation
<int> <chr> <dbl> <dbl> <dbl> <int>
1 20200101 Machrihanish 8.5 -5.7 55.4 10
2 20200101 High_Wycombe 6.9 -0.807 51.7 204
3 20200101 Camborne 9.6 -5.33 50.2 87
4 20200101 Dun_Fell 3.3 -2.45 54.7 847
5 20200101 Plymouth 9.9 -4.12 50.4 50
6 20200102 Machrihanish 11.7 -5.7 55.4 10
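For comparison, here is a rough base R sketch of the same reshape-then-join idea, using stack() and merge() instead of pivot_longer() and full_join(). It uses a two-city, two-day subset of the data above; the object names temps, meta, long and joined are only illustrative.

```r
# Small subset of the wide temperature table and the city metadata
temps <- data.frame(Date = c(20200101, 20200102),
                    Machrihanish = c(8.5, 11.7),
                    Plymouth = c(9.9, 10.9))
meta <- data.frame(Location = c("Machrihanish", "Plymouth"),
                   Longitude = c(-5.70, -4.12),
                   Latitude = c(55.4, 50.4),
                   Elevation = c(10, 50))

# stack() concatenates the city columns into 'values' and records the
# originating column name in 'ind'
stacked <- stack(temps[-1])
long <- data.frame(Date = rep(temps$Date, times = ncol(temps) - 1),
                   Location = as.character(stacked$ind),
                   MaxTemp = stacked$values)

# Join the metadata on the city name, then restore date order
joined <- merge(long, meta, by = "Location")
joined <- joined[order(joined$Date, joined$Location), ]
joined
```

Naming the key explicitly with by = "Location" avoids relying on whichever columns the two tables happen to share.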
>
I have a dataframe (df) with 3 columns (V1, V2 and V3). I would like to add a column (V4), in which I enter (for each row) the value of V3 from another row in which the value of V1+0.5 equals V1 of row i AND in which the value of V2+0.5 equals V2 of row i.
For the rows where this condition is not met, I want an NA in the column of V4.
V1 <- c(-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,3,3.5)
V2 <- c(14,14.5,15,15.5,1,1.5,2,2.5,8,8.5,9,9.5,10)
V3 <- c(42,42.1,42.2,42.3,42.4,42.5,42.6,42.7,42.8,42.9,43,43.1,43.2)
df <- data.frame(V1,V2,V3)
For this input data:
V1 V2 V3
-2.5 14 42
-2 14.5 42.1
-1.5 15 42.2
-1 15.5 42.3
-0.5 1 42.4
0 1.5 42.5
0.5 2 42.6
1 2.5 42.7
1.5 8 42.8
2 8.5 42.9
2.5 9 43
3 9.5 43.1
3.5 10 43.2
My desired result would be:
V1 V2 V3 V4
-2.5 14 42 NA
-2 14.5 42.1 42
-1.5 15 42.2 42.1
-1 15.5 42.3 42.2
-0.5 1 42.4 NA
0 1.5 42.5 42.4
0.5 2 42.6 42.5
1 2.5 42.7 42.6
1.5 8 42.8 NA
2 8.5 42.9 42.8
2.5 9 43 42.9
3 9.5 43.1 43
3.5 10 43.2 43.1
I figured I could use a for-loop and an ifelse statement (using is.na for the NA values), but I do not know how to refer to the rows using something that looks like df$V1(of row i) == df$V1+0.5 (of row x) (and the same for V2).
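Another dplyr solution uses ifelse together with lag. It assumes that the matching row, when there is one, is the row immediately above, which holds for the sample data: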
library(dplyr)
df1 %>%
mutate(V4 = ifelse(V1 == lag(V1) + 0.5 & V2 == lag(V2) + 0.5, lag(V3), NA))
#> V1 V2 V3 V4
#> 1 -2.5 14.0 42.0 NA
#> 2 -2.0 14.5 42.1 42.0
#> 3 -1.5 15.0 42.2 42.1
#> 4 -1.0 15.5 42.3 42.2
#> 5 -0.5 1.0 42.4 NA
#> 6 0.0 1.5 42.5 42.4
#> 7 0.5 2.0 42.6 42.5
#> 8 1.0 2.5 42.7 42.6
#> 9 1.5 8.0 42.8 NA
#> 10 2.0 8.5 42.9 42.8
#> 11 2.5 9.0 43.0 42.9
#> 12 3.0 9.5 43.1 43.0
#> 13 3.5 10.0 43.2 43.1
Data:
df1 <- data.frame(V1 = c(-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,3,3.5),
V2 = c(14,14.5,15,15.5,1,1.5,2,2.5,8,8.5,9,9.5,10),
V3 = c(42,42.1,42.2,42.3,42.4,42.5,42.6,42.7,42.8,42.9,43,43.1,43.2))
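If the matching row is not guaranteed to be the one directly above, a lookup with match() may be a safer option. The sketch below builds a text key from V1 and V2; this is an illustrative approach and relies on the values being exact multiples of 0.5, so for arbitrary decimals you would probably want to round before pasting.

```r
df1 <- data.frame(V1 = c(-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,3,3.5),
                  V2 = c(14,14.5,15,15.5,1,1.5,2,2.5,8,8.5,9,9.5,10),
                  V3 = c(42,42.1,42.2,42.3,42.4,42.5,42.6,42.7,42.8,42.9,43,43.1,43.2))

# Key identifying each row, and the key of the row we are looking for
# (the one whose V1 + 0.5 and V2 + 0.5 equal the current row's values)
key    <- paste(df1$V1, df1$V2)
wanted <- paste(df1$V1 - 0.5, df1$V2 - 0.5)

# match() returns the position of the partner row, or NA if none exists
idx <- match(wanted, key)
df1$V4 <- df1$V3[idx]
df1
```

Indexing V3 with an NA position yields NA, so rows without a partner get NA in V4 without any explicit ifelse.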
I'm using qtgrace for MacOS, and when I plotted two data sets in qtgrace I got something like this:
[Image: Overlapping data sets]
However, I would like to plot something like this:
[Image: Non-overlapping data sets]
My data 1:
0 14
0.1 6
0.2 14
0.3 14
0.4 14
0.5 14
0.6 14
0.7 14
0.8 6
0.9 6
1 6
1.1 6
1.2 6
1.3 6
1.4 6
1.5 6
1.6 6
1.7 6
1.8 6
1.9 6
2 6
2.1 6
2.2 6
2.3 6
2.4 6
2.5 6
2.6 6
2.7 6
2.8 6
2.9 6
3 6
3.1 6
3.2 6
3.3 6
3.4 6
3.5 6
3.6 6
3.7 6
3.8 6
3.9 6
4 6
4.1 6
4.2 6
4.3 6
4.4 6
4.5 6
4.6 6
4.7 6
4.8 6
4.9 6
5 6
5.1 6
5.2 6
5.3 6
5.4 6
5.5 6
5.6 6
5.7 6
5.8 6
5.9 6
6 6
6.1 6
6.2 6
6.3 6
6.4 6
6.5 6
6.6 6
6.7 6
6.8 6
6.9 6
7 6
7.1 6
7.2 6
7.3 2
7.4 6
7.5 2
7.6 2
7.7 2
7.8 2
7.9 6
8 2
8.1 6
8.2 2
8.3 2
8.4 6
8.5 6
8.6 6
8.7 2
8.8 6
8.9 19
9 19
9.1 6
9.2 6
9.3 6
9.4 2
9.5 2
9.6 2
9.7 2
9.8 2
9.9 2
10 2
10.1 2
10.2 2
10.3 2
10.4 2
10.5 2
10.6 2
10.7 2
10.8 2
10.9 2
11 2
11.1 2
11.2 2
11.3 2
11.4 2
11.5 2
11.6 2
11.7 2
11.8 2
11.9 2
12 2
12.1 2
12.2 2
12.3 2
12.4 2
12.5 2
12.6 2
12.7 2
12.8 2
12.9 2
13 2
13.1 2
13.2 2
13.3 2
13.4 2
13.5 2
13.6 2
13.7 2
13.8 2
13.9 2
14 2
14.1 2
14.2 2
14.3 2
14.4 2
14.5 2
14.6 2
14.7 2
14.8 2
14.9 2
15 2
15.1 2
15.2 2
15.3 2
15.4 2
15.5 2
15.6 2
15.7 2
15.8 2
15.9 2
16 2
16.1 2
16.2 2
16.3 2
16.4 2
16.5 2
16.6 2
16.7 2
16.8 2
16.9 2
17 2
17.1 2
17.2 2
17.3 2
17.4 2
17.5 2
17.6 2
17.7 2
17.8 2
17.9 2
18 2
18.1 2
18.2 2
18.3 2
18.4 2
18.5 2
18.6 2
18.7 2
18.8 2
18.9 2
19 2
19.1 2
19.2 2
19.3 2
19.4 2
19.5 2
19.6 2
19.7 2
19.8 2
19.9 2
20 2
20.1 2
20.2 2
20.3 2
20.4 2
20.5 2
20.6 2
20.7 2
20.8 2
20.9 2
21 2
21.1 2
21.2 2
21.3 2
21.4 2
21.5 2
21.6 2
21.7 2
21.8 7
21.9 2
22 2
22.1 2
22.2 2
22.3 7
22.4 7
22.5 7
22.6 7
22.7 7
22.8 2
22.9 2
23 7
23.1 7
23.2 7
23.3 7
23.4 7
23.5 2
23.6 2
23.7 2
23.8 2
23.9 2
24 2
24.1 2
24.2 2
24.3 2
24.4 2
24.5 2
24.6 2
24.7 2
24.8 2
24.9 2
25 2
. .
. .
. .
Data 2:
0 4
0.1 4
0.2 4
0.3 4
0.4 4
0.5 4
0.6 4
0.7 4
0.8 4
0.9 4
1 2
1.1 4
1.2 4
1.3 4
1.4 4
1.5 4
1.6 4
1.7 4
1.8 4
1.9 4
2 4
2.1 4
2.2 4
2.3 4
2.4 4
2.5 4
2.6 4
2.7 4
2.8 4
2.9 4
3 4
3.1 4
3.2 4
3.3 4
3.4 4
3.5 4
3.6 4
3.7 4
3.8 4
3.9 4
4 4
4.1 4
4.2 4
4.3 4
4.4 4
4.5 4
4.6 4
4.7 4
4.8 4
4.9 4
5 4
5.1 4
5.2 4
5.3 4
5.4 4
5.5 4
5.6 4
5.7 4
5.8 4
5.9 4
6 4
6.1 4
6.2 4
6.3 4
6.4 4
6.5 4
6.6 4
6.7 4
6.8 4
6.9 4
7 4
7.1 4
7.2 4
7.3 4
7.4 4
7.5 4
7.6 4
7.7 4
7.8 4
7.9 4
8 4
8.1 4
8.2 4
8.3 4
8.4 2
8.5 4
8.6 4
8.7 4
8.8 4
8.9 4
9 4
9.1 4
9.2 4
9.3 4
9.4 4
9.5 4
9.6 4
9.7 4
9.8 4
9.9 4
10 4
10.1 4
10.2 4
10.3 4
10.4 4
10.5 2
10.6 2
10.7 4
10.8 2
10.9 2
11 2
11.1 2
11.2 4
11.3 4
11.4 2
11.5 2
11.6 2
11.7 2
11.8 2
11.9 2
12 2
12.1 2
12.2 2
12.3 2
12.4 4
12.5 4
12.6 2
12.7 2
12.8 4
12.9 2
13 2
13.1 4
13.2 4
13.3 4
13.4 4
13.5 10
13.6 2
13.7 2
13.8 2
13.9 2
14 2
14.1 2
14.2 2
14.3 10
14.4 2
14.5 2
14.6 4
14.7 2
14.8 2
14.9 4
15 2
15.1 10
15.2 2
15.3 2
15.4 2
15.5 2
15.6 2
15.7 2
15.8 2
15.9 2
16 2
16.1 2
16.2 2
16.3 2
16.4 2
16.5 2
16.6 2
16.7 2
16.8 2
16.9 2
17 2
17.1 2
17.2 2
17.3 2
17.4 2
17.5 2
17.6 2
17.7 2
17.8 2
17.9 2
18 2
18.1 2
18.2 2
18.3 2
18.4 2
18.5 2
18.6 2
18.7 2
18.8 2
18.9 2
19 2
19.1 2
19.2 2
19.3 2
19.4 2
19.5 2
19.6 2
19.7 2
19.8 2
19.9 2
20 2
20.1 2
20.2 2
20.3 2
20.4 2
20.5 2
20.6 2
20.7 2
20.8 2
20.9 2
21 2
21.1 2
21.2 2
21.3 2
21.4 2
21.5 2
21.6 2
21.7 2
21.8 2
21.9 2
22 2
22.1 2
22.2 2
22.3 2
22.4 2
22.5 2
22.6 2
22.7 2
22.8 2
22.9 2
23 2
23.1 2
23.2 2
23.3 2
23.4 2
23.5 2
23.6 2
23.7 2
23.8 2
23.9 2
24 2
24.1 2
24.2 2
24.3 2
24.4 2
24.5 2
24.6 2
24.7 2
24.8 2
24.9 2
25 2
. .
. .
. .
The data are in two separate xvg files from a GROMACS cluster analysis. I want to plot five different sets in such a way that I can see all the data without the sets overlapping.
Thank you!
I think the best approach would be to write a script that takes the original files and spits out new files with shifted y values. However, since you have asked for a qt/xmgrace solution, here is how you do it:
1. Load all the datasets into qtgrace.
2. Open the "Data -> Transformations -> Evaluate expression..." dialog.
3. Select a dataset in the left and right columns, and in the textbox below enter the formula y = y + 0.1. Click "Apply". This shifts the dataset up by 0.1.
4. Select the next dataset in the same way and use the formula y = y + 0.2. Click "Apply".
5. Rinse and repeat for all the datasets (changing the shift accordingly).
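The script route mentioned at the start could look roughly like the R sketch below. The function name shift_xvg and its arguments are made up for illustration, and it assumes a two-column xvg file in which header lines start with # or @; those lines are copied to the top of the output unchanged.

```r
# Hypothetical helper: add a constant offset to the y column of an xvg file
shift_xvg <- function(infile, outfile, offset) {
  lines <- readLines(infile)
  is_meta <- grepl("^\\s*[#@]", lines)          # comment / directive lines
  xy <- read.table(text = lines[!is_meta], col.names = c("x", "y"))
  xy$y <- xy$y + offset
  # Header lines first, then the shifted data
  writeLines(c(lines[is_meta], paste(xy$x, xy$y)), outfile)
  invisible(xy)
}

# Demo on a tiny temporary file
tmp_in  <- tempfile(fileext = ".xvg")
tmp_out <- tempfile(fileext = ".xvg")
writeLines(c("# demo", "@TYPE xy", "0 14", "0.1 6", "0.2 14"), tmp_in)
shifted <- shift_xvg(tmp_in, tmp_out, offset = 0.1)
readLines(tmp_out)
```

Calling the function once per file with offsets 0.1, 0.2, and so on gives files that can be loaded into qtgrace without further transformation.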
I am trying to work with a data set that has information about electricity prices. The price is recorded every 5 minutes. My objective is to replace the negative values with the mean of the day.
year month day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
2009 7 1 1 16.9 17.6 16.7 15.7 15.5
2009 7 1 2 17.7 18.8 17.8 -16.1 15.5
2009 7 1 3 -17.7 18.6 18.1 15.9 15.4
2009 7 1 4 16.7 18.6 -17.6 14.3 12.8
2009 7 2 1 -15.6 17.6 16.3 13.2 11.8
2009 7 2 2 13.7 15.7 12.0 -11.1 -12.9
2009 7 2 3 13.7 15.8 11.9 11.1 12.9
2009 7 2 4 -13.9 16.1 -12.1 11.2 12.9
2009 8 1 1 13.8 16.0 12.2 11.2 12.8
2009 8 1 2 -13.7 16.3 11.6 10.6 12.6
2009 8 1 3 13.7 -15.8 11.9 11.0 12.7
2009 8 1 4 13.8 16.0 12.1 11.2 12.9
2009 8 2 1 17.6 -17.6 17.3 16.5 17.1
2009 8 2 2 17.7 17.6 17.3 16.8 17.4
2009 8 2 3 15.8 16.0 15.1 15.0 15.5
2009 8 2 4 -15.4 15.6 14.5 14.6 15.1
2009 9 1 1 14.7 15.0 13.8 14.0 14.5
2009 9 1 2 15.3 15.4 14.3 14.6 15.0
2009 9 1 3 15.3 15.6 14.4 14.5 15.0
2009 9 1 4 14.9 15.7 13.7 13.8 14.5
In order to obtain the mean of each day I use the following code
Daily_mean<-Base %>%
arrange(year, month, day, fivemin) %>% #we are ordering the data
group_by(year, month, day)%>%
summarise_at(
vars(c(rrp_nsw, rrp_qld, rrp_sa, rrp_tas, rrp_vic)),
.funs = mean) # funs() is deprecated; pass the function directly
Once I have the daily mean, I want to replace each negative value with the mean of that day. For example, the 16th observation would become:
2009 8 2 4 "8.925" 15.6 14.5 14.6 15.1
If someone can help me, I would be grateful.
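We can use replace() inside mutate_at() to change the negative values to the mean of that column, after grouping by the relevant columns (year, month, day). The mean is taken over all values of the day, negatives included, which reproduces the 8.925 in the example.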
library(dplyr)
Base %>%
arrange(year, month, day, fivemin) %>%
group_by(year, month, day) %>%
mutate_at(vars(rrp_nsw, rrp_qld, rrp_sa, rrp_tas, rrp_vic),
~ replace(., . < 0, mean(.)))
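The same idea can be sketched in base R with ave(), which returns the group mean aligned with the original rows. The example below uses only the rrp_nsw and rrp_qld values for 1 and 2 August from the data above; the loop and variable names are illustrative.

```r
Base <- data.frame(year = 2009, month = 8,
                   day = rep(1:2, each = 4), fivemin = rep(1:4, times = 2),
                   rrp_nsw = c(13.8, -13.7, 13.7, 13.8, 17.6, 17.7, 15.8, -15.4),
                   rrp_qld = c(16.0, 16.3, -15.8, 16.0, -17.6, 17.6, 16.0, 15.6))

price_cols <- c("rrp_nsw", "rrp_qld")
for (col in price_cols) {
  # Daily mean (negatives included), repeated for every row of that day
  day_mean <- ave(Base[[col]], Base$year, Base$month, Base$day, FUN = mean)
  # Keep non-negative prices, substitute the daily mean for negative ones
  Base[[col]] <- ifelse(Base[[col]] < 0, day_mean, Base[[col]])
}
Base
```

If the mean should instead be computed from the non-negative prices only, the FUN argument would need to filter the values before averaging.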
I'm calling height, diameter and age from a csv file. I'm trying to calculate the volume of the tree using pi x h x r^2. In order to calculate the radius, I'm taking dbh and dividing it by 2. Then I get this error.
Error in dbh/2 : non-numeric argument to binary operator
setwd("/Users/user/Desktop/")
treeg <- read.csv("treeg.csv",row.names=1)
head(treeg)
heights <- tapply(treeg$height.ft,treeg$forest, identity)
ages <- tapply(treeg$age,treeg$forest, identity)
dbh <- tapply(treeg$dbh.in,treeg$forest, identity)
radius <- dbh / 2
The object dbh stores the diameters from the csv file grouped by forest, which is the ID.
How can I divide dbh by 2 while keeping each value stored under its respective ID (the forest, i.e. treeg$forest)? treeg is the data frame read from the csv file.
> head(treeg)
tree.ID forest habitat dbh.in height.ft age
1 1 4 5 14.6 71.4 55
2 1 4 5 12.4 61.4 45
3 1 4 5 8.8 40.1 35
4 1 4 5 7.0 28.6 25
5 1 4 5 4.0 19.6 15
6 2 4 5 20.0 103.4 107
str(dbh)
List of 9
$ 1: num [1:36] 19.9 18.6 16.2 14.2 12.3 9.4 6.8 4.9 2.6 22 ...
$ 2: num [1:60] 16.5 15.5 14.5 13.7 12.7 11.4 9.5 8 5.9 4.1 ...
$ 3: num [1:50] 18.4 17.2 15.6 13.7 11.6 8.5 5.3 2.8 13.3 10.6 ...
$ 4: num [1:81] 14.6 12.4 8.8 7 4 20 18.8 17 15.9 14 ...
$ 5: num [1:153] 28 27.2 26.1 25 23.7 21.3 19 16.7 12.2 9.8 ...
$ 6: num [1:22] 21.3 20.2 19.1 18 16.9 15.6 14.8 13.3 11.3 9.2 ...
$ 7: num [1:63] 13.9 12.4 10.6 8.1 5.8 3.4 27 25.6 23 20.2 ...
$ 8: num [1:27] 20.8 17.7 15.6 13.2 10.5 7.5 4.8 2.9 12.9 11.3 ...
$ 9: num [1:50] 23.6 20.5 16.9 14.1 11.1 8 5.1 2.9 24.1 20.9 ...
- attr(*, "dim")= int 9
- attr(*, "dimnames")=List of 1
..$ : chr [1:9] "1" "2" "3" "4" ...
Are you just trying to create a radius column that is dbh.in divided by two?
treeg <- read.table(textConnection("tree.ID forest habitat dbh.in height.ft age
1 1 4 5 14.6 71.4 55
2 1 4 5 12.4 61.4 45
3 1 4 5 8.8 40.1 35
4 1 4 5 7.0 28.6 25
5 1 4 5 4.0 19.6 15
6 2 4 5 20.0 103.4 107"), header=TRUE)
treeg$radius <- treeg$dbh.in / 2
Or do you need that dbh list for something...
dbh <- tapply(treeg$dbh.in,treeg$forest, identity)
> dbh
$`4`
[1] 14.6 12.4 8.8 7.0 4.0 20.0
str(lapply(dbh, function(x) x / 2))
List of 1
$ 4: num [1:6] 7.3 6.2 4.4 3.5 2 10
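To connect this to the error and to the volume formula from the question: tapply() with identity returns a list, and a list cannot be divided directly. The sketch below shows both routes on the six sample rows. Converting the radius from inches to feet (so the volume comes out in cubic feet) is an assumption about the intended units, and the column name volume.ft3 is made up here.

```r
treeg <- data.frame(forest    = c(4, 4, 4, 4, 4, 4),
                    dbh.in    = c(14.6, 12.4, 8.8, 7.0, 4.0, 20.0),
                    height.ft = c(71.4, 61.4, 40.1, 28.6, 19.6, 103.4))

# tapply() with identity gives a list, so dbh / 2 fails with
# "non-numeric argument to binary operator"
dbh <- tapply(treeg$dbh.in, treeg$forest, identity)
is.list(dbh)

# Route 1: divide inside each list element
radius <- lapply(dbh, function(x) x / 2)

# Route 2: work on the data frame columns, then group afterwards
radius_ft <- treeg$dbh.in / 2 / 12            # assumed unit conversion: in -> ft
treeg$volume.ft3 <- pi * treeg$height.ft * radius_ft^2
volume_by_forest <- split(treeg$volume.ft3, treeg$forest)
str(volume_by_forest)
```

Keeping the calculation in the data frame and splitting at the end means heights and radii stay aligned row by row.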