Merging columns from different data frames - r

I have a problem.
I have two data frames:
>anna1
name from to result
11 66607 66841 0
11 66846 67048 0
11 67053 67404 0
11 67409 68216 0
11 68221 68786 0
11 68791 69020 0
11 69025 69289 0
11 69294 70167 0
11 70172 70560 0
and the second data frame is
>anna2
name from to result
11 66607 66841 5
11 66846 67048 6
11 67409 68216 7
11 69025 69289 12
11 70172 70560 45
What I want is to create a new data frame similar to anna1, where all the 0 values are replaced by the correct results from the matching rows of anna2.
You will notice that the from and to columns of anna2 contain only some of the values found in the respective columns of anna1; the intermediate rows are missing.
So I somehow need to take the numbers from the result column of anna2 and put them in the correct rows of anna1.
Thank you in advance.
Best regards
Anna

A simpler merge:
anna3 <- merge(anna2, anna1[, 1:3], all.y = TRUE)  # join on name, from, to; keep every anna1 row
anna3[is.na(anna3)] <- 0                           # rows with no match in anna2 get result 0
Gives:
> anna3
name from to result
1 11 66607 66841 5
2 11 66846 67048 6
3 11 67053 67404 0
4 11 67409 68216 7
5 11 68221 68786 0
6 11 68791 69020 0
7 11 69025 69289 12
8 11 69294 70167 0
9 11 70172 70560 45

If the "from" column is guaranteed to be unique in both anna1 and anna2, AND every row in anna2 has a matching row in anna1 (though not vice versa), a simple solution is
row.index = function(d) which(anna1$from == d)[1]
indices = sapply(anna2$from, row.index)
anna1$result[indices] = anna2$result
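The sapply/which lookup above can also be written in one vectorized step with base R's match(), which returns, for each element of its first argument, the position of its first match in the second. A minimal sketch under the same assumption (every anna2$from value also appears in anna1), rebuilding both data frames from the question; idx is just an illustrative name:

```r
# Rebuild the two data frames from the question
anna1 <- data.frame(
  name   = 11,
  from   = c(66607, 66846, 67053, 67409, 68221, 68791, 69025, 69294, 70172),
  to     = c(66841, 67048, 67404, 68216, 68786, 69020, 69289, 70167, 70560),
  result = 0
)
anna2 <- data.frame(
  name   = 11,
  from   = c(66607, 66846, 67409, 69025, 70172),
  to     = c(66841, 67048, 68216, 69289, 70560),
  result = c(5, 6, 7, 12, 45)
)

# Row of anna1 holding each anna2$from value (NA if it is absent)
idx <- match(anna2$from, anna1$from)

# Copy the results into those rows; all other rows keep their 0
anna1$result[idx] <- anna2$result
anna1
```

If some anna2 rows might have no counterpart in anna1, idx will contain NA and the assignment fails, so those rows would need to be dropped first (for example with ok <- !is.na(idx)).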

Another approach
require(plyr)
anna <- rbind(anna1, anna2)
ddply(anna, .(name, from, to), summarize, result = sum(result))
EDIT: If the data frames are large and speed is an issue, consider using data.table:
require(data.table)
data.table(anna)[, list(result = sum(result)), by = c("name", "from", "to")]
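For comparison, the same rbind-and-sum idea also works in base R with the formula interface of aggregate(), without plyr or data.table. A minimal sketch, again rebuilding anna1 and anna2 from the question (out is an illustrative name):

```r
anna1 <- data.frame(
  name   = 11,
  from   = c(66607, 66846, 67053, 67409, 68221, 68791, 69025, 69294, 70172),
  to     = c(66841, 67048, 67404, 68216, 68786, 69020, 69289, 70167, 70560),
  result = 0
)
anna2 <- data.frame(
  name   = 11,
  from   = c(66607, 66846, 67409, 69025, 70172),
  to     = c(66841, 67048, 68216, 69289, 70560),
  result = c(5, 6, 7, 12, 45)
)

# Stack both frames; anna1 contributes a 0 for every interval
anna <- rbind(anna1, anna2)

# Sum result within each (name, from, to) group
out <- aggregate(result ~ name + from + to, data = anna, FUN = sum)

# Put the rows back in the order of the from column
out <- out[order(out$from), ]
out
```

This relies on every anna1 result being 0; if anna1 could already hold non-zero values, summing would add them to the anna2 values instead of replacing them.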

You can use merge, but you have to explicitly specify what should be done with the two result columns.
d <- merge(anna1, anna2, by=c("name", "from", "to"), all=TRUE)
d$result <- ifelse(d$result.x == 0 & !is.na( d$result.y ), d$result.y, d$result.x)
d <- d[,c("name", "from", "to", "result")]

Related

Multiply various subsets of a data frame by different elements of a vector R

I have a data frame:
df<-data.frame(id=rep(1:10,each=10),
Room1=rnorm(100,0.4,0.5),
Room2=rnorm(100,0.3,0.5),
Room3=rnorm(100,0.7,0.5))
And a vector:
vals <- sample(7:100, 10)
I want to multiply the columns Room1, Room2 and Room3 by a different element of the vector for each unique id, and output a new data frame (df2).
I managed to multiply each column per id by EVERY element of the vector using the following:
samp_func <- function(x) {
x*vals[i]
}
for (i in vals) {
df2 <- df %>% mutate_at(c("Room1", "Room2", "Room3"), samp_func)
}
But in the resulting data frame (df2), each Room column is multiplied by the same element of vals for all of the different ids, whereas what I want is each Room column (per id) multiplied by a different element of vals. Sorry in advance if this is not clear; I am a beginner and still getting to grips with the terminology.
Thanks!
EDIT: The desired output should look like the below, where the columns for each ID have been multiplied by a different element of the vector vals.
id Room1 Room2 Room3
1 1 24.674826880 60.1942571 46.81276141
2 1 21.970270107 46.0461779 35.09928150
3 1 26.282357614 -3.5098880 38.68400541
4 1 29.614182061 -39.3025587 25.09146592
5 1 33.030886472 46.0354881 42.68209027
6 1 41.362699668 -23.6624632 26.93845129
7 1 5.429031042 26.7657577 37.49086963
8 1 18.733422977 -42.0620572 23.48992138
9 1 -17.144070723 9.9627315 55.43999326
10 1 45.392182468 20.3959968 -16.52166621
11 2 30.687978299 -11.7194020 27.67351631
12 2 -4.559185345 94.9256561 9.26738357
13 2 86.165076849 -1.2821515 29.36949423
14 2 -12.546711562 47.1763755 152.67588456
15 2 18.285856423 60.5679496 113.85971720
16 2 72.074929648 47.6509398 139.69051486
17 2 -12.332519694 67.8890324 20.73189965
18 2 80.889634991 69.5703581 98.84404415
19 2 87.991093995 -20.7918559 106.13610773
20 2 -2.685594148 71.0611693 47.40278949
21 3 4.764445589 -7.6155681 12.56546664
22 3 -1.293867841 -1.1092243 13.30775785
23 3 16.114831628 -5.4750642 8.58762550
24 3 -0.309470950 7.0656088 10.07624289
25 3 11.225609780 4.2121241 16.59168866
26 3 -3.762529113 6.4369973 15.82362705
27 3 -5.103277731 0.9215625 18.20823042
28 3 -10.623165177 -5.2896293 33.13656839
29 3 -0.002517872 5.0861361 -0.01966699
30 3 -2.183752881 24.4644310 13.55572730
This should solve your problem. Build a lookup data frame that pairs each id with its value from vals, join it to df on id, and then use mutate to create the new Room columns.
Also, in the future I'd recommend setting a seed when asking questions with random data as it's easier for someone to replicate your output.
library(dplyr)
set.seed(0)
df<-data.frame(id=rep(1:10,each=10),
Room1=rnorm(100,0.4,0.5),
Room2=rnorm(100,0.3,0.5),
Room3=rnorm(100,0.7,0.5))
vals <- sample(7:100, 10)
other_df <- data.frame(id = 1:10, val = vals)  # one row per id: id i is paired with vals[i]
df2 <- inner_join(other_df, df, by = "id")
df2 <- df2 %>%
mutate(Room1 = Room1*val,
Room2 = Room2*val,
Room3 = Room3*val)
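The join above is one option. Since each id needs only a single multiplier, a base-R alternative is to look the multiplier up per row with match() and scale each Room column in a loop. A minimal sketch, assuming (as in the question) that the i-th unique id should get the i-th element of vals; mult is an illustrative name:

```r
set.seed(0)
df <- data.frame(id = rep(1:10, each = 10),
                 Room1 = rnorm(100, 0.4, 0.5),
                 Room2 = rnorm(100, 0.3, 0.5),
                 Room3 = rnorm(100, 0.7, 0.5))
vals <- sample(7:100, 10)

# One multiplier per row: the i-th unique id maps to vals[i]
mult <- vals[match(df$id, unique(df$id))]

# Scale each Room column; the id column is left unchanged
df2 <- df
for (col in c("Room1", "Room2", "Room3")) {
  df2[[col]] <- df[[col]] * mult
}
head(df2)
```

Unlike the join, this keeps the original row order and never changes the number of rows.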

Change or keep the value of a specific column considering the value in the same row and another column

I want to know how to change or keep the value of a specific column considering the value in the same row and another column.
Here is my dataset, named df:
BLUP_pop BLUPISM_rate
1 1.94693747 1.00000000
2 1.33774978 0.68710465
3 1.04724481 0.78284058
4 0.95897119 0.91570871
5 0.75524367 0.78755616
6 0.44728346 0.59223728
7 0.35502008 0.79372504
8 0.29392675 0.82791585
9 0.26649710 0.90667862
10 0.15827465 0.59390759
11 -0.00630328 -0.03982495
12 -0.21526737 34.15164327
I'd like to apply the following rule:
If df$BLUP_pop <= 0, then set df$BLUPISM_rate to 0;
If df$BLUP_pop > 0, then keep the value.
i.e.
BLUP_pop BLUPISM_rate
1 1.94693747 1.00000000
2 1.33774978 0.68710465
3 1.04724481 0.78284058
4 0.95897119 0.91570871
5 0.75524367 0.78755616
6 0.44728346 0.59223728
7 0.35502008 0.79372504
8 0.29392675 0.82791585
9 0.26649710 0.90667862
10 0.15827465 0.59390759
11 -0.00630328 0
12 -0.21526737 0
Thanks.
BLUPISM_rate is an existing column in the data frame df, so it can be modified with ifelse, based on a condition on the other column, BLUP_pop.
mutate is a function in the dplyr package for modifying existing columns or creating new ones in a given data frame.
# ifelse(condition,TRUE,FALSE)
library(dplyr)
df <- df %>%
mutate(
BLUPISM_rate = ifelse(BLUP_pop <= 0 , 0 , BLUPISM_rate)
)
print(df)
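The same rule can also be applied in base R without dplyr, by assigning 0 only to the rows where the condition holds (logical indexing). A small sketch that uses just the last three rows of the question's data:

```r
df <- data.frame(
  BLUP_pop     = c(0.15827465, -0.00630328, -0.21526737),
  BLUPISM_rate = c(0.59390759, -0.03982495, 34.15164327)
)

# Overwrite the rate only where BLUP_pop <= 0; other rows are untouched
df$BLUPISM_rate[df$BLUP_pop <= 0] <- 0
df
```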

Creating a subset of unique entries for a recursive list in R

I have the following data set df
name draught nav_status date
A 22 0 24/12/2014
A 22 0 25/12/2014
A 11 5 26/12/2014
A 11 1 27/12/2014
B 22 0 24/12/2014
B 22 0 25/12/2014
B 22 0 26/12/2014
B 22 5 27/12/2014
B 9 0 28/12/2014
B 22 0 29/12/2014
From this data set, I need to extract the unique draught values for each object (name).
I am fairly new to R and have made the following attempts
y <- subset(df,!duplicated(df[,draught]),)
and
Dup <- function(x){
x <- x[!duplicated[x$draught],]
y <- lapply(df, Dup)
But this deletes the duplicated draught entries across the entire data set rather than per name. I went through some literature on split-apply-combine techniques and also tried those options.
Please provide some guidance or literature to help solve this problem.
The result should be
name draught nav_status date
A 22 0 24/12/2014
A 11 5 26/12/2014
A 11 1 27/12/2014
B 22 0 25/12/2014
B 9 0 28/12/2014
I even tried to subset the data based on the first and last entries by arranging them sequentially and deleting the duplicate entries, but some data was lost. Thank you!!
Using the data.table library, you can arrive at the result with:
library(data.table)
dt <- as.data.table(df)
unique(dt, by = c('name', 'draught'))
One thing, though: why do you have two entries for the pair A 11 in your desired result?
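Base R can do the same with duplicated() on the two key columns, keeping the first row of each name/draught pair. A minimal sketch that rebuilds the question's data (the date column is kept as plain text for simplicity; dup is an illustrative name):

```r
df <- data.frame(
  name       = c(rep("A", 4), rep("B", 6)),
  draught    = c(22, 22, 11, 11, 22, 22, 22, 22, 9, 22),
  nav_status = c(0, 0, 5, 1, 0, 0, 0, 5, 0, 0),
  date       = c("24/12/2014", "25/12/2014", "26/12/2014", "27/12/2014",
                 "24/12/2014", "25/12/2014", "26/12/2014", "27/12/2014",
                 "28/12/2014", "29/12/2014")
)

# TRUE for every row whose (name, draught) pair has already appeared
dup <- duplicated(df[c("name", "draught")])

# Keep only the first occurrence of each pair
df[!dup, ]
```

Like unique(dt, by = ...), this keeps one row per pair, so it does not reproduce the second A 11 row in the desired output.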

looping over the name of the columns in R for creating new columns

I am trying to loop over the column names of an existing data frame and create new columns based on the old ones. Here is my sample data:
sample<-list(c(10,12,17,7,9,10),c(NA,NA,NA,10,12,13),c(1,1,1,0,0,0))
sample<-as.data.frame(sample)
colnames(sample)<-c("x1","x2","D")
>sample
x1 x2 D
10 NA 1
12 NA 1
17 NA 1
7 10 0
9 20 0
10 13 0
Now I am trying to use a for loop to generate two variables, x1.imp and x2.imp, that take the values from the D=0 rows when D=1 and the values from the D=1 rows when D=0. (Here I don't actually need a for loop, but for my original dataset with many columns (variables) I really need one.) My attempt, based on that condition:
for (i in names(sample[,1:2])){
sample$i.imp<-with (sample, ifelse (D==1, i[D==0],i[D==1]))
i=i+1
return(sample)
}
Error in i + 1 : non-numeric argument to binary operator
The following works, but it doesn't name the new columns x1.imp and x2.imp:
for(i in sample[,1:2]){
impt.i<-with(sample,ifelse(D==1,i[D==0],i[D==1]))
i=i+1
print(as.data.frame(impt.i))
}
impt.i
1 7
2 9
3 10
4 10
5 12
6 17
impt.i
1 10
2 12
3 13
4 NA
5 NA
6 NA
Note that I already know the solution without a loop [here]. I want one with a loop.
Expected output:
x1 x2 D x1.impt x2.imp
10 NA 1 7 10
12 NA 1 9 20
17 NA 1 10 13
7 10 0 10 NA
9 20 0 12 NA
10 13 0 17 NA
I would greatly appreciate your valuable input in this regard.
This is nuts, but since you are asking for it... Your code with minimum changes would be:
for (i in colnames(sample)[1:2]){
sample[[paste0(i, '.impt')]] <- with(sample, ifelse(D==1, get(i)[D==0],get(i)[D==1]))
}
A few comments:
replaced names(sample[,1:2]) with the more elegant colnames(sample)[1:2]
the $ is for interactive usage. Instead, when programming, i.e. when the column name is to be interpreted, you need to use [ or [[, hence I replaced sample$i.imp with sample[[paste0(i, '.impt')]]
inside with, i[D==0] will not give you x1[D==0] when i is "x1", hence the need to dereference it using get.
you should not name your data.frame sample as it is also the name of a pretty common function
This should work:
test <- sample[,"D"] == 1
for (.name in names(sample)[1:2]){
newvar <- paste(.name, "impt", sep=".")
sample[[newvar]] <- ifelse(test, sample[!test, .name],
sample[test, .name])
}
sample
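Following the comments above (avoid get(), and do not name the data frame sample), here is a self-contained sketch of the same loop using a hypothetical data frame name, dat, with dat[[i]] pulling each column by its name:

```r
dat <- data.frame(x1 = c(10, 12, 17, 7, 9, 10),
                  x2 = c(NA, NA, NA, 10, 12, 13),
                  D  = c(1, 1, 1, 0, 0, 0))

for (i in colnames(dat)[1:2]) {
  x <- dat[[i]]  # the column whose name is stored in i
  # D == 1 rows take the D == 0 values and vice versa; this relies on
  # ifelse() recycling, so both groups must have the same number of rows
  dat[[paste0(i, ".impt")]] <- ifelse(dat$D == 1, x[dat$D == 0], x[dat$D == 1])
}
dat
```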

Count of element in data.frame

I have data that illustrates hurricane tracks crossing through a series of "gates". How would I code it to output the GateID, and the count of times that each GateID occurs in the total data frame?
track_id day hour month year rate gate_id pres_inter vmax_inter
9 10 0 7 1 9.6451E-06 2 97809 23.545
9 10 0 7 1 9.6451E-06 17 100170 13.843
10 3 6 7 1 9.6451E-06 2 96662 31.568
13 22 12 8 1 9.6451E-06 1 94449 48.466
13 22 12 8 1 9.6451E-06 17 96749 30.55
16 13 0 8 1 9.6451E-06 4 98702 19.205
16 13 0 8 1 9.6451E-06 16 98585 18.143
19 27 6 9 1 9.6451E-06 9 98838 20.053
header <- read.table(fname_in, nrows=1)
track <- read.table(fname_in, sep=',', skip=1)
colnames(track) <- c("ID", "day", "hour", "month", "year", "rate", "gate_id", "pres_inter", "vmax_inter")
I think I would like to count the occurrence of each gate_id, and also perhaps output the maximum wind per gate (vmax_inter), etc....
Totally reading your mind, since you provide nothing concrete to go on. But if gate_id is one of your data frame columns (the data frame is called mydf here), you can get the count for each unique gate_id using count from the plyr package.
install.packages("plyr")
library("plyr")
count(mydf, vars = "gate_id")
See ?count after installing for further details.
For the second part of your question, see ?aggregate and consider the formula interface. For example, for the maximum wind per gate:
aggregate(vmax_inter ~ gate_id, data = mydf, FUN = max)
or something similar. By the way, you can replace your two read.table steps with a single read.csv call, which reads the header row by default.
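Without any extra package, table() gives the number of rows per gate and aggregate() gives the maximum wind per gate. A minimal sketch that keeps only the track_id, gate_id and vmax_inter values from the question's sample rows (counts and agg are illustrative names):

```r
track <- data.frame(
  track_id   = c(9, 9, 10, 13, 13, 16, 16, 19),
  gate_id    = c(2, 17, 2, 1, 17, 4, 16, 9),
  vmax_inter = c(23.545, 13.843, 31.568, 48.466, 30.55, 19.205, 18.143, 20.053)
)

# Number of rows (gate crossings) for each gate_id
counts <- table(track$gate_id)
counts

# Maximum wind speed for each gate_id
agg <- aggregate(vmax_inter ~ gate_id, data = track, FUN = max)
agg
```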
