My problem is that I want to use a function to change a random value to NA in a global data frame.
df is a dataframe with 230 rows and 2 columns.
abstract code:
emptychange<- function(x){
placenumber <- round(runif(1,min= min(1),max=max(nrow(x))))
x[placenumber,2] <<- NA
}
emptychange(df)
The error is: "Error in x[placenumber, 2] <<- NA : object 'x' not found".
I think the mistake is that R searches for a global variable 'x' and doesn't use the function's x value (in this case df). How can I fix this? Thanks!
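This works. The problem was this: <<- NA. The double arrow is used when you want to assign a value to an object outside the function, so R looks for an x in the enclosing environments and finds none. In your case, x is the function's own argument, so a plain <- is what you want.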
df1 <- data.frame(x = 1, y = 1:10)
emptychange <- function(x) {
  # pick a row index between 1 and nrow(x)
  placenumber <- round(runif(1, min = 1, max = nrow(x)))
  x[placenumber, 2] <- NA  # plain <- changes the function's local copy
  return(x)
}
emptychange(df1)
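To see why returning x matters, here is a minimal sketch (blank_one is a hypothetical stand-in for emptychange, and the seed is only there to make the run repeatable). R hands the function a copy, so the caller's data frame stays unchanged until the result is assigned back:

```r
set.seed(42)  # assumption: fixed seed only so the run is repeatable
df1 <- data.frame(x = 1, y = 1:10)

# Hypothetical helper mirroring emptychange(): modify the local copy, return it
blank_one <- function(d) {
  d[sample(nrow(d), 1), 2] <- NA
  d
}

res <- blank_one(df1)
sum(is.na(df1$y))  # 0: the original is untouched
sum(is.na(res$y))  # 1: only the returned copy has the NA
```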
If you want this done at the console, you can just sample from the row count inside the [<- function:
> df1 <-data.frame(x = 1, y = 1:10)
> df1[sample(nrow(df1), 1) , 2] <- NA
> df1
   x  y
1  1  1
2  1  2
3  1  3
4  1  4
5  1  5
6  1 NA
7  1  7
8  1  8
9  1  9
10 1 10
If you want to destructively change the dataframe argument given to a function you should instead assign the value which is returned back to the original name:
> randNA.secCol <- function(df) {df[sample(nrow(df), 1) , 2] <- NA; df}
> df1 <-data.frame(x = 1, y = 1:10)
> df1 <- randNA.secCol(df1)
Best practice in R is to avoid using <<-.
I have data as follows:
dataset = list()
a <- c(1,2,3)
b <- c(1,2,3)
country <- c("A","B","C")
source_country <- c("D","D","D")
dataset[[1]] <- data.frame(a,b,country, source_country)
a <- c(NA)
b <- c(NA)
country <- c(NA)
source_country <- c(NA)
dataset[[2]] <- data.frame(a,b,country, source_country)
I want to rename each list item with the source_country from the data frame of the same list item. I tried the following:
for (i in 1:length(dataset)) {
if (!is.null(dataset[[i]])) {
print ("no data")
} else if (nrow(dataset[[i]]) > 1) {
names(dataset)[i] <- dataset[[i]][["source_country"]][[1]]
}
}
But it does not seem to work.
Desired Outcome:
names(dataset)[1] <- "D"
names(dataset)[2] <- "NA"
A purrr option -
library(purrr)
set_names(dataset, map_chr(dataset, pluck, "source_country", 1))
#$D
# a b country source_country
#1 1 1 A D
#2 2 2 B D
#3 3 3 C D
#$<NA>
# a b country source_country
#1 NA NA NA NA
If your R version is less than 4.1.0 then replace \(x) with function(x):
names(dataset) <- sapply(dataset, \(x) x$source_country[1])
This will give your second element a name of NA. If you want that to be a character you can wrap with the function as.character.
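One caveat, sketched below under the assumption that the goal is the literal string "NA" from the desired outcome: as.character() keeps a missing value missing, whereas paste() turns it into text. The dataset here is a trimmed copy of the one in the question.

```r
dataset <- list(
  data.frame(a = 1:3, country = c("A", "B", "C"), source_country = "D"),
  data.frame(a = NA, country = NA, source_country = NA)
)

# as.character() inside the function guards against factor columns (R < 4.0)
nm <- sapply(dataset, function(x) as.character(x$source_country[1]))
is.na(nm)                    # FALSE TRUE: the second name is still missing

names(dataset) <- paste(nm)  # paste() converts NA to the string "NA"
names(dataset)               # "D" "NA"
```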
The problem with your loop is the first test: is.null checks whether an element is NULL, and !is.null inverts that. Since every element of your list is a data frame, none of them is NULL, so the if branch always runs and the else if clause is never reached. The only thing the if branch does is print, so nothing is renamed. Note also that nrow(...) > 1 would skip your second data frame, which has exactly one row.
You could do something like:
for (i in 1:length(dataset)) {
  if (nrow(dataset[[i]]) == 0) {
    print("no data")
  } else if (nrow(dataset[[i]]) >= 1) {
    names(dataset)[i] <- dataset[[i]][["source_country"]][1]
  }
}
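A quick check, using a trimmed copy of the question's data, makes the diagnosis concrete: is.null() is FALSE for every data frame, so row counts are the thing to test. seq_along() is used here instead of 1:length() only as a defensive habit for empty lists.

```r
dataset <- list(
  data.frame(a = 1:3, source_country = "D"),
  data.frame(a = NA, source_country = NA)
)

sapply(dataset, is.null)  # FALSE FALSE: a data frame is never NULL
sapply(dataset, nrow)     # 3 1: the second frame has exactly one row

for (i in seq_along(dataset)) {
  if (nrow(dataset[[i]]) >= 1) {
    # as.character() guards against factor columns (R < 4.0)
    names(dataset)[i] <- as.character(dataset[[i]][["source_country"]][1])
  }
}
names(dataset)            # "D" NA
```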
Using base R
setNames(dataset, unlist(sapply(dataset, subset,
subset = seq_along(source_country) == 1, select = source_country)))
-output
$D
a b country source_country
1 1 1 A D
2 2 2 B D
3 3 3 C D
$<NA>
a b country source_country
1 NA NA NA NA
I'm trying to loop this sequence of steps in R for a data frame.
Here is my data:
ID Height Weight
a 100 80
b 80 90
c na 70
d 120 na
....
Here is my code so far
winsorize2 <- function(x) {
Min <- which(x == min(x))
Max <- which(x == max(x))
ord <- order(x)
x[Min] <- x[ord][length(Min)+1]
x[Max] <- x[ord][length(x)-length(Max)]
x}
df<-read.csv("data.csv")
df2 <- scale(df[,-1], center = TRUE, scale = TRUE)
id<-df$Type
full<-data.frame(id,df2)
full[is.na(full)] <- 0
full[, -1] <- sapply(full[,-1], winsorize2)
What I'm trying to do is this: standardize a data frame, then winsorize the standardized data frame using the function winsorize2, i.e. replace the most extreme values with the next most extreme value. This is then repeated 10 times. How do I write a loop for this? I'm confused because in the sequence I've already replaced the NAs with 0s, so should I remove this step from the loop too?
edit: After discussion with #ekstroem, we decided to change the code to introduce the boundaries
df<-read.csv("data.csv")
id<-df$Type
df2<- scale(df[,-1], center = TRUE, scale = TRUE)
df2[is.na(df2)] <- 0
df2[df2<=-3] = -3
df2[df2>=3] = 3
df3<-df2 #trying to loop again
df3<- scale(df3, center = TRUE, scale = TRUE)
df3[is.na(df3)] <- 0
df3[df3<=-3] = -3
df3[df3>=3] = 3
There are some boundary issues that are not fully specified in your code, but maybe the following can be used (using base R and not super efficient)
wins2 <- function(x, n=1) {
  xx <- sort(unique(x))                        # distinct values, ascending
  x[x<=xx[n]] <- xx[n+1]                       # pull up the n smallest values
  x[x>=xx[length(xx)-n]] <- xx[length(xx)-n]   # pull down the n largest values
  x
}
This yields:
x <- 1:11
wins2(x,1)
[1] 2 2 3 4 5 6 7 8 9 10 10
wins2(x,3)
[1] 4 4 4 4 5 6 7 8 8 8 8
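For the looping part of the question, a minimal sketch follows. It assumes the edited approach (standardize, zero-fill NAs, clip at ±3) and uses a small made-up data frame in place of data.csv. Zero-filling stays inside the loop; in this example there are no NAs left after the first pass, so from then on that step simply does nothing.

```r
# Toy stand-in for data.csv (values are made up for illustration)
df <- data.frame(ID = letters[1:6],
                 Height = c(100, 80, NA, 120, 95, 300),
                 Weight = c(80, 90, 70, NA, 85, 60))

m <- as.matrix(df[, -1])
for (i in 1:10) {
  m <- scale(m, center = TRUE, scale = TRUE)  # standardize each column
  m[is.na(m)] <- 0                            # missing values become 0
  m[m < -3] <- -3                             # clip at the assumed bounds
  m[m > 3] <- 3
}
full <- data.frame(id = df$ID, m)
```

If the winsorizing step should use wins2 instead of fixed bounds, the two clipping lines could be swapped for an apply(m, 2, wins2) call.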
df<-data.frame(w=c("r","q"), x=c("a","b"))
y=c(1,2)
How do I combine df and y into a new data frame that has all combinations of rows from df with elements from y? In this example, the output should be
data.frame(w=c("r","r","q","q"), x=c("a","a","b","b"),y=c(1,2,1,2))
w x y
1 r a 1
2 r a 2
3 q b 1
4 q b 2
This should do what you're trying to do, and without too much work.
dl <- unclass(df)
dl$y <- y
merge(df, expand.grid(dl))
# w x y
# 1 q b 1
# 2 q b 2
# 3 r a 1
# 4 r a 2
data.frame(lapply(df, rep, each = length(y)), y = y)
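This should work for this example, where y has as many elements as df has rows. It relies on permutations from the combinat package, so with more than two elements it produces duplicate rows: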
library(combinat)
df<-data.frame(w=c("r","q"), x=c("a","b"))
y=c("one", "two") #for generality
indices <- permn(seq_along(y))
combined <- NULL
for(i in indices){
  current <- cbind(df, y=y[unlist(i)])
  if(is.null(combined)){
    combined <- current
  } else {
    combined <- rbind(combined, current)
  }
}
print(combined)
Here is the output:
w x y
1 r a one
2 q b two
3 r a two
4 q b one
... or to make it shorter (and less obvious):
combined <- do.call(rbind, lapply(indices, function(i){cbind(df, y=y[unlist(i)])}))
First, convert class of columns from factor to character:
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
Then, use expand.grid to get an index matrix for all combinations of rows of df and elements of y:
ind.mat = expand.grid(1:length(y), 1:nrow(df))
Finally, loop through the rows of ind.mat to get the result:
data.frame(t(apply(ind.mat, 1, function(x){c(as.character(df[x[2], ]), y[x[1]])})))
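As one more base R option, offered as a sketch rather than as any of the answers' methods: merge() with by = NULL forms the Cartesian product of two data frames, which is the all-combinations result asked for. Row order may differ from the example output, so the result is sorted here.

```r
df <- data.frame(w = c("r", "q"), x = c("a", "b"))
y <- c(1, 2)

# by = NULL means no key columns, so every row of df meets every value of y
combined <- merge(df, data.frame(y = y), by = NULL)
combined <- combined[order(combined$x, combined$y), ]
nrow(combined)  # 4 = nrow(df) * length(y)
```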
Given this data.frame
x y z
1 1 3 5
2 2 4 6
I'd like to add the values of columns x and z plus a coefficient of 10, for every row in dat.
The intended result is this
x y z result
1 1 3 5 16 #(1+5+10)
2 2 4 6 18 #(2+6+10)
But why doesn't this code produce the desired result?
dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
Coeff <- 10
# Function
process.xz <- function(v1,v2,cf) {
return(v1+v2+cf)
}
# It breaks here
sm <- apply(dat[,c('x','z')], 1, process.xz(dat$x,dat$y,Coeff ))
# Later I'd do this:
# cbind(dat,sm);
I wouldn't use an apply here. Since the addition + operator is vectorized, you can get the sum using
> process.xz(dat$x, dat$z, Coeff)
[1] 16 18
To write this in your data.frame, don't use cbind, just assign it directly:
dat$result <- process.xz(dat$x, dat$z, Coeff)
The reason it fails is that apply doesn't work like that: you must pass the function itself (not the result of calling it) plus any additional parameters. Each row of the data frame is then passed, as a single vector, as the first argument to that function.
dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
Coeff <- 10
# Function
process.xz <- function(x,cf) {
return(x[1]+x[2]+cf)
}
sm <- apply(dat[,c('x','z')], 1, process.xz,cf=Coeff)
I completely agree that there's no point in using apply here though - but it's good to understand anyway.
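For completeness, two alternatives that avoid apply, sketched with the three-argument process.xz from the question: mapply() walks x and z in parallel and passes cf through MoreArgs, while rowSums() covers the case where the operation really is just a sum.

```r
dat <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
Coeff <- 10

process.xz <- function(v1, v2, cf) v1 + v2 + cf

# mapply() calls process.xz once per pair (x[i], z[i])
sm1 <- mapply(process.xz, dat$x, dat$z, MoreArgs = list(cf = Coeff))

# rowSums() adds the selected columns row by row
sm2 <- unname(rowSums(dat[, c("x", "z")])) + Coeff

sm1  # 16 18
sm2  # 16 18
```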
So I have a bunch of data frames in a list object. Frames are organised such as
ID Category Value
2323 Friend 23.40
3434 Foe -4.00
And I got them into a list by following this topic. I can also run simple functions on them as shown in this topic.
Now I am trying to run a conditional function with lapply, and I'm running into trouble. In some tables the 'ID' column has a different name (say, 'recnum'), and I need to tell lapply to go through each data frame, check if there is a column named 'recnum', and change its name to 'ID', as in
colnr <- which(names(x) == "recnum")
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
But I'm running into trouble with local scope and who knows what. Any ideas?
Use the rename function from plyr; it renames by name, not position:
x <- data.frame(ID = 1:2,z=1:2)
y <- data.frame('recnum' = 1:2,z=3:4)
.list <- list(x,y)
library(plyr)
lapply(.list, rename, replace = c('recnum' = 'ID'))
[[1]]
ID z
1 1 1
2 2 2
[[2]]
ID z
1 1 3
2 2 4
Your original code works fine:
foo <- function(x){
colnr <- which(names(x) == "recnum")
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
x
}
.list <- list(x,y)
lapply(.list, foo)
Not sure what your problem was.
If you look at the second part of mnel's answer, you can see that the function foo evaluates x as its last expression. Without that, if you try to change the names of the data.frames in your list directly from within the anonymous function passed to lapply, it will likely not work.
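A small sketch of that point, using the same x and y as in the answers above (the names bad and good are only illustrative). Without a trailing x, lapply() collects whatever the if() expression evaluated to instead of the data frame:

```r
x <- data.frame(ID = 1:2, z = 1:2)
y <- data.frame(recnum = 1:2, z = 3:4)
.list <- list(x, y)

# No trailing `x`: the function returns whatever if() evaluated to
bad <- lapply(.list, function(x) {
  if ("recnum" %in% names(x)) names(x)[names(x) == "recnum"] <- "ID"
})
bad[[1]]  # NULL: the condition was FALSE
bad[[2]]  # "ID": the value of the assignment, not the data frame

# Trailing `x`: the modified data frame is what lapply() collects
good <- lapply(.list, function(x) {
  if ("recnum" %in% names(x)) names(x)[names(x) == "recnum"] <- "ID"
  x
})
sapply(good, function(d) names(d)[1])  # "ID" "ID"
```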
Just as an alternative, you could use gsub and avoid loading an additional package (although plyr is a nice package):
xx <- list(data.frame("recnum" = 1:3, "recnum2" = 1:3),
data.frame("ID" = 4:6, "hat" = 4:6))
lapply(xx, function(x){
names(x) <- gsub("^recnum$", "ID", names(x))
return(x)
})
# [[1]]
# ID recnum2
# 1 1 1
# 2 2 2
# 3 3 3
# [[2]]
# ID hat
# 1 4 4
# 2 5 5
# 3 6 6