Linking two datasets - r

I have a dataset called "J_BL5H1", which contains:
Var1 Freq
4 10
8 10
10 13
11 7
13 3
17 10
19 10
25 1
26 4
27 8
53 13
From this dataset, I want to extract the Freq for each Var1 separately and store each one in a new object named J_BL5H1JNVar1Number, where Var1Number stands for the specific Var1 value, e.g. 4, 8, or 10.
I would use something like this:
J_BL5H1JNVar1Number <- J_BL5H1$Freq[1]
Here, I want Var1Number to be replaced by the matching "Var1" value from the original data.
For example, for "Freq[4]" the new object should be called "J_BL5H1JN11", because the Var1 that belongs to Freq[4] is 11.
I hope I have stated my problem clearly. Thanks.

First use paste to create the names of the data sets:
data.string <- "J_BL5H1JN"
split.var <- "Var1"
data.sets <- paste(data.string, J_BL5H1[, split.var], sep = "")
Then use a loop to assign the corresponding values to the data sets:
for( i in seq_along(data.sets) ) assign(data.sets[i], J_BL5H1[i, "Freq"])
Now you have the data sets in your workspace:
ls()
By the way, if you want to access all of these data sets without typing each name, you can retrieve them by name using the get function:
sapply(data.sets, get)
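If separate objects are not strictly required, a named vector is usually easier to handle than many variables created with assign. A minimal sketch, assuming the data frame below matches the table in the question (freq_by_var1 is a name made up for this example):

```r
# Example data reconstructed from the table in the question
J_BL5H1 <- data.frame(
  Var1 = c(4, 8, 10, 11, 13, 17, 19, 25, 26, 27, 53),
  Freq = c(10, 10, 13, 7, 3, 10, 10, 1, 4, 8, 13)
)

# One named vector instead of eleven separate objects:
# each Freq is labelled with the name the question asks for
freq_by_var1 <- setNames(J_BL5H1$Freq, paste0("J_BL5H1JN", J_BL5H1$Var1))

# Look up a single value by its constructed name
freq_by_var1[["J_BL5H1JN11"]]
# [1] 7
```

Everything stays in one place, so nothing has to be looked up with get afterwards.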

Related

Replacing values based on multiple columns in R

I have raw data with multiple observations, and a cleaning log that contains new values for specific columns of the raw data. I want to replace the old values with these new ones.
My raw data is:
raw_df<- data.frame(
id=c(1,2,3,4),
name=c("a","b","c","d"),
age=c(15,16,20,22),
add=c("xyz","bc","no","da")
)
My cleaning log is:
cleaning_log <- data.frame(
id=c(2,4),
question=c("name","age"),
old_value=c("b",22),
new_value=c("bob",25)
)
And my expected result is :
result<-data.frame(
id=c(1,2,3,4),
name=c("a","bob","c","d"),
age=c(15,16,20,25),
add=c("xyz","bc","no","da")
)
Note: at the end, how can I check whether the new values were applied correctly?
In addition, the question column of the cleaning log may refer to more than two columns (10 to 20 of them may receive new values); I only give two column names here as an example.
Thanks in advance for your help.
Use match to find the row and column index of each cell to change in raw_df, then replace those cells with cleaning_log$new_value:
row_inds <- match(cleaning_log$id, raw_df$id)
col_inds <- match(cleaning_log$question, names(raw_df))
raw_df[cbind(row_inds, col_inds)] <- cleaning_log$new_value
raw_df
# id name age add
#1 1 a 15 xyz
#2 2 bob 16 bc
#3 3 c 20 no
#4 4 d 25 da
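To check that the replacements were applied, one option is to read the changed cells back and compare them with the log. The sketch below is only one way to do it; cleaned_df, changed and was_numeric are names made up for this example, and stringsAsFactors = FALSE is set explicitly so it behaves the same on older R versions. Note that new_value is a character vector, so a numeric column that receives a replacement (age here) becomes character and has to be converted back.

```r
raw_df <- data.frame(
  id = c(1, 2, 3, 4),
  name = c("a", "b", "c", "d"),
  age = c(15, 16, 20, 22),
  add = c("xyz", "bc", "no", "da"),
  stringsAsFactors = FALSE
)
cleaning_log <- data.frame(
  id = c(2, 4),
  question = c("name", "age"),
  old_value = c("b", "22"),
  new_value = c("bob", "25"),
  stringsAsFactors = FALSE
)

# Apply the log to a copy so the raw data stays available for comparison
cleaned_df <- raw_df
row_inds <- match(cleaning_log$id, cleaned_df$id)
col_inds <- match(cleaning_log$question, names(cleaned_df))
cleaned_df[cbind(row_inds, col_inds)] <- cleaning_log$new_value

# Read every changed cell back and compare it with the log
changed <- mapply(
  function(r, q) as.character(cleaned_df[r, q]),
  row_inds, cleaning_log$question
)
all(changed == cleaning_log$new_value)

# Restore the numeric type of columns that were numeric in the raw data
was_numeric <- vapply(raw_df, is.numeric, logical(1))
cleaned_df[was_numeric] <- lapply(cleaned_df[was_numeric], as.numeric)
str(cleaned_df)
```

The same check works unchanged when the log refers to 10 or 20 different columns, because the columns are looked up by name.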

data extraction from object list frq

The task is simple, but I am doing something wrong. I use the package sjmisc and its function frq (frequency table). I would like to access the column valid.prc and store it as a variable. Storing it is easy, but the extraction is the problem: a$valid.prc doesn't work and the result is NULL.
Sample data:
library(sjmisc)
a <- sample(seq(from = 1, to = 7), size = 100, replace = TRUE)
frequencytable <- frq(a)
How can I extract the data from column valid.prc? Many thanks for your help.
frequencytable is a list, so use [[ to pull out its first element, which is a data frame, and then extract the column valid.prc as usual:
class(frequencytable)
#[1] "sjmisc_frq" "list"
frequencytable[[1]]$valid.prc
#[1] 17 11 14 19 15 11 13 NA
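The difference between the two ways of subsetting can be seen without sjmisc. The sketch below uses a hand-built stand-in (freq_list, with invented numbers), not the real frq result, which has more columns and an extra class:

```r
# Hypothetical stand-in for the result of frq(): a list holding one data frame
freq_list <- list(
  data.frame(val = c(1, 2, 3), frq = c(5, 3, 2), valid.prc = c(50, 30, 20))
)

# $ on the list looks for a list element named "valid.prc"; there is none
is.null(freq_list$valid.prc)

# [[1]] returns the data frame itself, and $ then finds the column
valid_prc <- freq_list[[1]]$valid.prc
valid_prc
```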

Insert i th vector number into data frame column name - R

This is likely a quick fix! I am trying to place the ith element of my vector into a data frame column name, using paste0 to insert the number.
sma <- 2:20
> sma
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# Place i number from sma vector to data frame column name
spx.sma <- df$close.sma.paste0("n", sma[i])
Column name should read:
"close.sma.n2"
If I print
paste0("n", sma[i])
I obtain:
> paste0("n", sma[i])
[1] "n2"
So really, if I paste this into my data frame column name, it should read:
close.sma.n2
What is the correct method to achieve this?
I get the error:
> spx.sma <- df$close.sma.paste0(".n", sma[i])
Error: attempt to apply non-function
Treat the data frame as a list. The $ operator takes the name after it literally and does not evaluate it, so avoid $ and use [[ ]] with the constructed name instead:
spx.sma <- df[[paste0("close.sma.n", sma[i])]]
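The same [[ ]] indexing also works on the left-hand side, which helps when the columns are created in a loop over sma. A minimal sketch, assuming a data frame with a close column (the prices are made up) and using stats::filter as one possible way to compute a simple moving average:

```r
sma <- 2:4
df <- data.frame(close = c(10, 11, 12, 13, 14))

for (n in sma) {
  col_name <- paste0("close.sma.n", n)
  # Trailing n-period mean; the first n - 1 rows are NA
  df[[col_name]] <- as.numeric(stats::filter(df$close, rep(1 / n, n), sides = 1))
}

names(df)
# Read one of the generated columns back by its constructed name
spx.sma <- df[[paste0("close.sma.n", sma[1])]]
```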

Calling & creating new columns based on string

I have searched quite a bit and have not found a question that addresses this issue. If it has been answered, forgive me; I am still quite green when it comes to coding in general. I have a data frame with a large number of variables, and I would like to combine them into new variables in a loop, based on names I've put in a second data frame. The data frame formulas should name the columns to create and to read from the main data frame data:
USDb = c(1,2,3)
USDc = c(4,5,6)
EURb = c(7,8,9)
EURc = c(10,11,12)
data = data.frame(USDb, USDc, EURb, EURc)
Now I'd like to create a new column data$USDa as defined by
data$USDa = data$USDb - data$USDc
and so on for EUR and other variables. This is easy enough to do manually, but I'd like to create a loop that pulls the names from formulas, something like this:
a = c("USDa", "EURa")
b = c("USDb", "EURb")
c = c("USDc", "EURc")
formulas = data.frame(a,b,c)
for (i in 1:length(formulas[,a])){
data$formulas[i,a] = data$formulas[i,b] - data$formulas[i,c]
}
Obviously data$formulas[i,a] returns NULL, so I tried data$paste0(formulas[i,a]), and that returns Error: attempt to apply non-function
How can I get these strings to be recognized as variables in this way? Thanks.
There are simpler ways to do this, but I'll stick to most of your code as a means of explanation. Your code should work so long as you edit your for loop to the following:
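for (i in seq_len(nrow(formulas))){
# as.character() guards against factor columns (the default before R 4.0)
data[as.character(formulas[i,"a"])] = data[as.character(formulas[i,"b"])] - data[as.character(formulas[i,"c"])]
}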
formulas[,a] won't work because a is already defined as a character vector, and its values are not valid column names of formulas. Use formulas[, "a"] instead if you want all rows from column "a" of the data.frame formulas.
data$formulas literally looks for a column called "formulas" in the data.frame data. Instead you want to write data[formulas[i, "a"]], indexing formulas so that it yields a single column name as a string.
Logic: iterate over the rows of formulas with apply (which is a loop internally) and do the calculation each row describes, b minus c as in the question.
x = apply(formulas, 1, function(x) data[[x[["b"]]]] - data[[x[["c"]]]])
colnames(x) = formulas$a
x
# USDa EURa
#[1,] -3 -3
#[2,] -3 -3
#[3,] -3 -3
cbind(data, x)
# USDb USDc EURb EURc USDa EURa
#1 1 4 7 10 -3 -3
#2 2 5 8 11 -3 -3
#3 3 6 9 12 -3 -3
Another option is split.default with sapply, again computing b minus c:
sapply(setNames(split.default(as.matrix(formulas[-1]),
row(formulas[-1])), formulas$a), function(x) Reduce(`-`, data[x]))
# USDa EURa
#[1,] -3 -3
#[2,] -3 -3
#[3,] -3 -3
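Since the subtraction is the same for every row of formulas, it can also be done without any loop by indexing data with whole columns of names. A minimal sketch, assuming the names are stored as character (stringsAsFactors = FALSE) and that every new column is b minus c as in the question:

```r
data <- data.frame(
  USDb = c(1, 2, 3), USDc = c(4, 5, 6),
  EURb = c(7, 8, 9), EURc = c(10, 11, 12)
)
formulas <- data.frame(
  a = c("USDa", "EURa"),
  b = c("USDb", "EURb"),
  c = c("USDc", "EURc"),
  stringsAsFactors = FALSE
)

# data[formulas$b] and data[formulas$c] are data frames with matching shapes,
# so the subtraction is done column by column in one step
data[formulas$a] <- data[formulas$b] - data[formulas$c]
data
```

This creates USDa and EURa in one assignment, and it scales to any number of rows in formulas.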

Change multiple dataframes in a loop

I have, for example, these three datasets (in my case there are many more, with a lot of variables):
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
On each data frame I want to add a variable resulting from a transformation of an existing variable in that data frame. I would like to do this in a loop. For example:
datasets <- c("data_frame1","data_frame2","data_frame3")
vars <- c("a","b","c")
for (i in datasets){
for (j in vars){
# here I need code that creates a new variable with transformed values
# I thought this would work, but it didn't...
get(i)$new_var <- log(get(i)[,j])
}
}
Do you have any suggestions?
Moreover, it would be great if the new column names (in this case new_var) could also be assigned from a character string, so that I could create the new variables in another for loop nested in the other two.
I hope my explanation of the problem is not too tangled.
Thanks in advance.
You can put your data frames in a list and use lapply to process them one by one, so there is no need for an explicit loop in this case.
For example, you can do this:
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
ll <- list(data_frame1,data_frame2,data_frame3)
lapply(ll,function(df){
df$log_a <- log(df$a) ## new column with the log a
df$tans_col <- df$a+df$b+df$c ## new column with sums of some columns or any other
## transformation
### .....
df
})
The first data frame (data_frame1) becomes:
[[1]]
a b c log_a tans_col
1 1 3 4 0.0000000 8
2 5 6 4 1.6094379 15
3 3 1 1 1.0986123 5
4 3 5 9 1.0986123 17
5 2 5 2 0.6931472 9
I had the same need and also wanted the changes applied to the actual data frames, not just to the copies in the list.
I found a great method here (the purrr::map2 method in the question works for dataframes with different columns), followed by
list2env(list_of_dataframes ,.GlobalEnv)
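Putting the pieces together, the sketch below builds the new column names from strings and then writes the modified data frames back to the workspace. It assumes the list is named (list2env needs names to know which objects to create); frames and the log_ prefix are choices made for this example:

```r
# A named list, so that list2env knows which object each element belongs to
frames <- list(
  data_frame1 = data.frame(a = c(1, 5, 3, 3, 2), b = c(3, 6, 1, 5, 5), c = c(4, 4, 1, 9, 2)),
  data_frame2 = data.frame(a = c(6, 0, 9, 1, 2), b = c(2, 7, 2, 2, 1), c = c(8, 4, 1, 9, 2)),
  data_frame3 = data.frame(a = c(0, 0, 1, 5, 1), b = c(4, 1, 9, 2, 3), c = c(2, 9, 7, 1, 1))
)
vars <- c("a", "b", "c")

frames <- lapply(frames, function(df) {
  for (j in vars) {
    # [[ ]] accepts a constructed string as the name of the new column
    df[[paste0("log_", j)]] <- log(df[[j]])
  }
  df
})

# Create or overwrite data_frame1, data_frame2 and data_frame3 in the workspace
list2env(frames, envir = .GlobalEnv)
names(data_frame1)
```

Note that log(0) is -Inf, so the zeros in the second and third data frames produce -Inf in the new columns.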
