Change data frame values in R using column names provided at run time?

I have the following data frame:
Column1 Default_Val
1 A 2
2 B 2
3 C 2
4 D 2
5 E 2
...
colnames: "Column1" "Default_Val"
rownames: "1" "2" "3" "4" "5"
This data frame is part of my function, and the function changes the default values depending on some if conditions.
I want to generalize the assignment because I want to support different column names for this data frame.
How can I change the default value without depending on specific column names?
Here is what I did so far:
df[df$Column1 == "A", "Default_Val"]
[1] 2
df[df$Column1 == "A", "Default_Val"] = 1
df[df$Column1 == "A", "Default_Val"]
[1] 1
I want something generalized like:
t <- colnames(df)
df[t[1] == "A", t[2]] = 7
For some reason it doesn't work (each time this happens I love Python more :)).
Please advise.

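I think it must be straightforward. In your attempt, t[1] == "A" compares the string "Column1" with "A", which is always FALSE; index the columns themselves instead, for example by position. Please check if this solves your problem.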
> df
Column1 Default_val
1 A 1
2 B 3
3 A 4
4 C 1
5 D 4
> df[2][df[1] == 'A'] = 3
> df
Column1 Default_val
1 A 3
2 B 3
3 A 3
4 C 1
5 D 4
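A minimal sketch of the generalized version the question asks for, assuming the column names are only known at run time and are held in a character vector (here `t`, as in the question). The key change is `df[[t[1]]]`, which extracts the column whose name is stored in `t[1]`. The helper `set_default()` is hypothetical, written only for illustration.

```r
# Example data in the shape shown in the question
df <- data.frame(Column1 = c("A", "B", "C", "D", "E"),
                 Default_Val = 2,
                 stringsAsFactors = FALSE)

# Column names known only at run time
t <- colnames(df)

# df[[t[1]]] is the column named by t[1]; t[2] names the column to change
df[df[[t[1]]] == "A", t[2]] <- 7

# Hypothetical helper: the same assignment with the column names as arguments.
# which() drops NA comparisons, which would otherwise make the assignment fail.
set_default <- function(data, key_col, val_col, key, value) {
  rows <- which(data[[key_col]] == key)
  data[rows, val_col] <- value
  data
}

df <- set_default(df, t[1], t[2], "B", 5)
print(df)
```

Because the helper only receives strings, the same call works for any data frame whose key and value columns are named differently.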

Related

Subset a dataset to leave the largest 2 values

I have a data set:
col1 col2
A 3
A 3
B 2
C 1
B 2
A 3
D 5
B 2
D 5
B 2
F 0
F 0
A 3
C 1
C 1
How can I subset it so as to keep only the rows with the two largest col2 values? My expected output is:
col1 col2
A 3
A 3
A 3
D 5
A 3
I have looked at a similar question, but it didn't answer mine.
Try this (though I'm not sure why your expected output has only one D):
#Code
newdf <- df[df$col2 %in% sort(unique(df$col2), decreasing = TRUE)[1:2], ]
I assume that your data is in a data.frame.
First, you need the top 2 values of col2: take its unique values, sort them in decreasing order, and take the first two elements:
col2Values <- unique(df$col2)
top2Elements <- sort(col2Values,decreasing = TRUE)[c(1,2)]
Now that you know the top 2 values, you just need to check where they appear in col2:
df[df$col2 %in% top2Elements,]
Update: It should work now; I had some typos in there.
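A self-contained run of the approach above on the data from the question, as a sketch. It swaps `[c(1,2)]` for `head(..., 2)`, an assumption on my part that you would rather get a single value than an `NA` when col2 has only one distinct value. Note that it keeps both D rows, so it differs from the expected output in the question, as the first answer pointed out.

```r
# The data from the question
df <- data.frame(
  col1 = c("A", "A", "B", "C", "B", "A", "D", "B", "D", "B", "F", "F", "A", "C", "C"),
  col2 = c(3, 3, 2, 1, 2, 3, 5, 2, 5, 2, 0, 0, 3, 1, 1),
  stringsAsFactors = FALSE
)

# The two largest distinct values of col2; head() returns fewer values
# instead of NA when col2 has only one distinct value
top2 <- head(sort(unique(df$col2), decreasing = TRUE), 2)

# Keep the rows whose col2 is one of those values
newdf <- df[df$col2 %in% top2, ]
print(newdf)
```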

R create group variable based on row order and condition

I have a data frame containing multiple groups that are not explicitly labeled. Instead, a new group always starts when type == 1 and continues through the following rows with type == 2. The number of rows per group can vary.
How can I explicitly create a new variable based on the order of another column? The groups, of course, should be mutually exclusive.
My data:
df <- data.frame(type = c(1,2,2,1,2,1,2,2,2,1),
stand = 1:10)
Expected output with new group myGroup:
type stand myGroup
1 1 1 a
2 2 2 a
3 2 3 a
4 1 4 b
5 2 5 b
6 1 6 c
7 2 7 c
8 2 8 c
9 2 9 c
10 1 10 d
One option could be:
with(df, letters[cumsum(type == 1)])
[1] "a" "a" "a" "b" "b" "c" "c" "c" "c" "d"
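Why this works: type == 1 is TRUE (counted as 1) exactly on the first row of each group, so cumsum() increases by one at each group start and stays constant on the following type == 2 rows. The sketch below stores the result in the data frame. It assumes the data start with a type == 1 row and, for the letter labels, that there are at most 26 groups, since letters has only 26 entries; the integer column `groupNum` (a name chosen here for illustration) works for any number of groups.

```r
df <- data.frame(type = c(1, 2, 2, 1, 2, 1, 2, 2, 2, 1),
                 stand = 1:10)

# TRUE (= 1) on each row that starts a new group, FALSE (= 0) elsewhere;
# the running sum is therefore constant within a group
group_id <- cumsum(df$type == 1)

# Letter labels (at most 26 groups) and a plain integer id for larger data
df$myGroup <- letters[group_id]
df$groupNum <- group_id
print(df)
```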
Here is another option using rep() + diff(), though not as simple as the approach by @tmfmnk:
idx <- which(df$type == 1)
v <- diff(idx)
df$myGroup <- rep(letters[seq_along(idx)], c(v, nrow(df) - sum(v)))
which gives:
> df
type stand myGroup
1 1 1 a
2 2 2 a
3 2 3 a
4 1 4 b
5 2 5 b
6 1 6 c
7 2 7 c
8 2 8 c
9 2 9 c
10 1 10 d
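The group lengths for rep() can also be computed in one step by appending nrow(df) + 1 as a sentinel "next group start" before calling diff(), which removes the separate nrow(df) - sum(v) term. A sketch, under the same assumption that the first row has type == 1:

```r
df <- data.frame(type = c(1, 2, 2, 1, 2, 1, 2, 2, 2, 1),
                 stand = 1:10)

# Row numbers where a new group starts
idx <- which(df$type == 1)

# Distance to the next group start; nrow(df) + 1 closes the last group
group_len <- diff(c(idx, nrow(df) + 1L))

# Repeat each label as many times as its group has rows
df$myGroup <- rep(letters[seq_along(idx)], group_len)
print(df)
```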

Sort data frame by column of numbers

I am trying to sort a data frame by a column of numbers and I get an alphanumeric sorting of the digits instead. If the data frame is converted to a matrix, the sorting works.
df[order(as.numeric(df[,2])),]
V1 V2
1 a 1
3 c 10
2 b 2
4 d 3
> m <- as.matrix(df)
> m[order(as.numeric(m[,2])),]
V1 V2
[1,] "a" "1"
[2,] "b" "2"
[3,] "d" "3"
[4,] "c" "10"
V1 <- letters[1:4]
V2 <- as.character(c(1,10,2,3))
df <- data.frame(V1,V2, stringsAsFactors=FALSE)
df[order(as.numeric(df[,2])),]
gives
V1 V2
1 a 1
3 c 2
4 d 3
2 b 10
But
V1 <- letters[1:4]
V2 <- as.character(c(1,10,2,3))
df <- data.frame(V1,V2)
df[order(as.numeric(df[,2])),]
gives
V1 V2
1 a 1
2 b 10
3 c 2
4 d 3
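which is due to factors: before R 4.0.0, data.frame() converted character columns to factors by default (stringsAsFactors = TRUE), and as.numeric() on a factor returns the internal level codes, not the printed values.
Thanks to the commenters akrun and Imo. Inspect each of the two data frames with str(df).
The factor() help page (?factor) gives more detail; see its 'Warning' section for this exact issue.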
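A short sketch of the pitfall and of the conversion that the ?factor 'Warning' section recommends, as.numeric(levels(f))[f]. The data frame is built with an explicit factor() call so that the example behaves the same in R versions before and after 4.0.0; as.numeric(as.character(f)) gives the same numbers.

```r
# V2 is explicitly a factor; its levels sort as text: "1" "10" "2" "3"
df <- data.frame(V1 = letters[1:4],
                 V2 = factor(c("1", "10", "2", "3")),
                 stringsAsFactors = FALSE)

levels(df$V2)      # "1" "10" "2" "3"
as.numeric(df$V2)  # 1 2 3 4: the level codes, not the printed values

# Convert the levels to numbers, then index them by each element's code
v2_num <- as.numeric(levels(df$V2))[df$V2]

sorted <- df[order(v2_num), ]
print(sorted)
```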
Could you be a little more specific about what your initial data frame is?
When I run this code:
df<-data.frame(c("a","b","c","d"),c(1,2,10,3))
colnames(df)<-c("V1","V2")
#print(df)
df.order<-df[order(as.numeric(df[,2])),]
print(df.order)
I get the right answer:
V1 V2
1 a 1
2 b 2
4 d 3
3 c 10
Edit:
The column values are probably being treated as factors.
Try converting them to character first and then to integer.
Example copied and pasted from the console:
> Foo <- data.frame('ABC' = c('a','b','c','d'),'123' = c('1','2','10','3'))
> Foo[order(as.integer(as.character(Foo[,2]))),]
ABC X123
1 a 1
2 b 2
4 d 3
3 c 10

access data frame column using variable

Consider the following code
a = "col1"
b = "col2"
d = data.frame(a=c(1,2,3),b=c(4,5,6))
This code produces the following data frame
a b
1 1 4
2 2 5
3 3 6
However the desired data frame is
col1 col2
1 1 4
2 2 5
3 3 6
Further, I'd like to be able to do something like d$a, which would then get d$col1, since a = "col1".
How can I tell R that "a" is a variable and not a name of a column?
After creating your data frame, you can rename its columns with colnames() (see ?colnames). For example:
d = data.frame(a=c(1,2,3), b=c(4,5,6))
colnames(d) <- c("col1", "col2")
You can also name your variables when you create the data frame. For example:
d = data.frame(col1=c(1,2,3), col2=c(4,5,6))
Further, if you have the names of columns stored in variables, as in
a <- "col1"
you can't use $ to select a column via d$a. R will look for a column whose name is a. Instead, you can do either d[[a]] or d[,a].
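A small sketch combining both points, assuming the column names are already stored in a and b. setNames() (from the stats package, attached by default) names the columns while the data frame is created, and the selections show what each indexing form returns.

```r
a <- "col1"
b <- "col2"

# setNames() assigns the names held in a and b at creation time
d <- setNames(data.frame(c(1, 2, 3), c(4, 5, 6)), c(a, b))

# Selecting a column whose name is stored in a variable
d[[a]]   # a numeric vector
d[, a]   # also a vector: a single selected column is simplified
d[a]     # a one-column data frame

# d$a looks for a column literally named "a", so it returns NULL here
is.null(d$a)
```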
You can do it this way:
a = "col1"
b = "col2"
d = data.frame(a=c(1,2,3),b=c(4,5,6))
> d
a b
1 1 4
2 2 5
3 3 6
#Renaming the columns
names(d) <- c(a,b)
> d
col1 col2
1 1 4
2 2 5
3 3 6
#Calling by names
d[,a]

Check for unique elements

Just a simple question.
I have a data frame (only one column is shown) that looks like:
cln1
A
b
A
A
c
d
A
....
I would like the following output:
cln1
b
c
d
In other words, I would like to remove every item that appears more than once. The functions unique() and duplicated() both leave one copy of each replicated element; I want to remove such elements entirely.
You can use setdiff() for that:
R> v <- c(1,1,2,2,3,4,5)
R> setdiff(v, v[duplicated(v)])
[1] 3 4 5
You could use count() from the plyr package to count the occurrences of each item, and delete all items that occur more than once.
library(plyr)
l = c(1,2,3,3,4,5,6,6,7)
count_l = count(l)
x freq
1 1 1
2 2 1
3 3 2
4 4 1
5 5 1
6 6 2
7 7 1
l[!l %in% with(count_l, x[freq > 1])]
[1] 1 2 4 5 7
Note the !, which means NOT. You can of course put this in a one-liner:
l[!l %in% with(count(l), x[freq > 1])]
Another way using table:
With @juba's data:
as.numeric(names(which(table(v) == 1)))
# [1] 3 4 5
For the OP's data, which is character, as.numeric() is not required:
names(which(table(v) == 1))
# [1] "b" "c" "d"
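A base R alternative that keeps the original order and type and works directly on a data frame column: duplicated(x) flags the second and later copies, and duplicated(x, fromLast = TRUE) flags every copy except the last, so their OR marks every element that occurs more than once. A sketch with the OP's data:

```r
df <- data.frame(cln1 = c("A", "b", "A", "A", "c", "d", "A"),
                 stringsAsFactors = FALSE)

# TRUE for every value that occurs more than once, including its first copy
dup_any <- duplicated(df$cln1) | duplicated(df$cln1, fromLast = TRUE)

# drop = FALSE keeps the result a data frame even with a single column
result <- df[!dup_any, , drop = FALSE]
print(result)
```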
