I am working on a data frame and trying to find the index of the nth maximum value (n varies in a loop). However, the columns contain tied values, and the program throws an error. Below is a sample dataset. I am basically trying to generate a similar data frame that holds, for each column, only the index values of that column's elements in decreasing order of value.
Column 1 of the output DF will hold the index values of the elements of Refer_1, so Output_DF[1,1] will hold the index of the highest value, while Output_DF[10,1] will hold the index of the lowest value. Below is the input DF.
Input
1 17
2 21
3 13
4 26
5 204
6 36
7 14
8 25
9 45
10 37
Output (index values)
5
9
10
6
4
8
2
1
7
3
I am currently using which, unlist and sort with partial together to get the indexes, but I am unable to fix the error. Note that ties can occur at any nth maximum value (not necessarily at the column maximum).
which(Consolidated_data_new[,i]==unlist(sort(Consolidated_data_new[,i],partial=j)[j]))
Please note that I want the code to return only one value at a time, and handle the 2nd tied value in the next loop iteration.
Please help solve this.
library(data.table)
# Sample data; the .internal.selfref pointer from the dput() output is dropped because it cannot be parsed
DT <- data.table(Refer_1 = c(11L, 15L, 7L, 19L, 104L, 24L, 11L,
22L, 39L, 19L), Refer_2 = c(17L, 21L, 13L, 25L, 204L, 36L, 14L,
25L, 45L, 37L))
# order() returns the row indices sorted by value, one column at a time
DT[, lapply(.SD, order, decreasing = TRUE)]
Refer_1 Refer_2
1: 5 5
2: 9 9
3: 6 10
4: 8 6
5: 4 4
6: 10 8
7: 2 2
8: 1 1
9: 7 7
10: 3 3
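For the loop described in the question, where one index is needed per iteration and a tied value should be picked up in the next iteration, one possible sketch is to compute the ordering once and then read off position j. The vector x below reuses the Refer_1 values; the names x, ord and idx are illustrative only, and method = "radix" is requested explicitly on the assumption that tied values should keep their original order.

```r
# Refer_1 values from the sample data (contains ties: 11 twice, 19 twice)
x <- c(11L, 15L, 7L, 19L, 104L, 24L, 11L, 22L, 39L, 19L)

# Compute the full ordering once; radix sorting is stable, so ties keep input order
ord <- order(x, decreasing = TRUE, method = "radix")

# Each iteration yields exactly one index; the second of two tied values
# simply appears in the following iteration
for (j in seq_along(x)) {
  idx <- ord[j]
  cat("rank", j, "-> index", idx, "value", x[idx], "\n")
}
```

Because the ordering is computed up front, no equality test against the nth value is needed, which is where which() returns more than one index when values are tied.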
Your comments suggest you are working with a dataframe that has more than one column and that you want an output dataframe that has the results of order with decreasing=TRUE applied to every column:
> DF[2] <- sample(1:300, 10)
> DF[3] <- sample(1:300, 10)
> DF
Input V2 V3
1 17 210 3
2 21 72 4
3 13 263 1
4 26 249 6
5 204 223 10
6 36 83 7
7 14 107 2
8 25 295 5
9 45 198 9
10 37 112 8
> ordDF <- as.data.frame(lapply(DF, order, decreasing=TRUE))
> names(ordDF) <- paste0("res", 1:length(DF) )
> ordDF
res1 res2 res3
1 5 8 4
2 9 3 9
3 10 4 2
4 6 5 7
5 4 1 10
6 8 9 8
7 2 10 1
8 1 7 6
9 7 6 3
10 3 2 5
> dput(ordDF)
structure(list(res1 = c(5L, 9L, 10L, 6L, 4L, 8L, 2L, 1L, 7L,
3L), res2 = c(8L, 3L, 4L, 5L, 1L, 9L, 10L, 7L, 6L, 2L), res3 = c(4L,
9L, 2L, 7L, 10L, 8L, 1L, 6L, 3L, 5L)), .Names = c("res1", "res2",
"res3"), row.names = c(NA, -10L), class = "data.frame")
I have data that looks like this, but much bigger:
df<- structure(list(names = c("bests-1", "trible-1", "crazy-1", "cool-1",
"nonsense-1", "Mean-1", "Lose-1", "Trye-1", "Trified-1"), Col = c(1L,
2L, NA, 4L, 47L, 294L, 2L, 1L, 3L), col2 = c(2L, 4L, 5L, 7L,
9L, 9L, 0L, 2L, 3L)), class = "data.frame", row.names = c(NA,
-9L))
As an example, I am trying to remove -1 from all strings in the first column.
I can do this with
as.data.frame(str_remove_all(df$names, "-1"))
but the problem is that the result drops all the other columns.
I don't want to split the data and merge it again, because I am afraid of introducing a mismatch.
Is there a way to get rid of specific strings in place, without disturbing the rest of the data?
For instance, the output should look like this:
names Col col2
bests 1 2
trible 2 4
crazy NA 5
cool 4 7
nonsense 47 9
Mean 294 9
Lose 2 0
Trye 1 2
Trified 3 3
Using gsub, with $ anchoring the match to the end of the string (the hyphen is escaped here, although it is only special inside a character class).
transform(df, names=gsub('\\-1$', '', names))
# names Col col2
# 1 bests 1 2
# 2 trible 2 4
# 3 crazy NA 5
# 4 cool 4 7
# 5 nonsense 47 9
# 6 Mean 294 9
# 7 Lose 2 0
# 8 Trye 1 2
# 9 Trified 3 3
Data:
df <- structure(list(names = c("bests-1", "trible-1", "crazy-1", "cool-1",
"nonsense-1", "Mean-1", "Lose-1", "Trye-1", "Trified-1"), Col = c(1L,
2L, NA, 4L, 47L, 294L, 2L, 1L, 3L), col2 = c(2L, 4L, 5L, 7L,
9L, 9L, 0L, 2L, 3L)), class = "data.frame", row.names = c(NA,
-9L))
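To illustrate why the $ anchor can matter, here is a small sketch comparing an anchored and an unanchored removal. The second test string, "v-10-1", is a hypothetical name that is not in the original data; it is included only because it contains -1 in the middle as well as at the end.

```r
# "v-10-1" is a made-up name used only to show the difference
x <- c("bests-1", "v-10-1")

# Anchored: removes a single trailing "-1" and nothing else
anchored <- sub("-1$", "", x)

# Unanchored, literal match: removes every "-1", including the one inside "-10"
unanchored <- gsub("-1", "", x, fixed = TRUE)

print(anchored)
print(unanchored)
```

For the sample data in the question both variants give the same result, since every name contains -1 exactly once, at the end.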
Using the stringr package:
library(stringr)
df$names <- str_remove_all(df$names, '-1')
names Col col2
1 bests 1 2
2 trible 2 4
3 crazy NA 5
4 cool 4 7
5 nonsense 47 9
6 Mean 294 9
7 Lose 2 0
8 Trye 1 2
9 Trified 3 3
We could use trimws from base R (the whitespace argument requires R >= 3.6.0)
df$names <- trimws(df$names, whitespace = "-\\d+")
-output
> df
names Col col2
1 bests 1 2
2 trible 2 4
3 crazy NA 5
4 cool 4 7
5 nonsense 47 9
6 Mean 294 9
7 Lose 2 0
8 Trye 1 2
9 Trified 3 3
I have a set of values
col1|col2|col3|col4
5 10 15 20
2 4 6 8
3 6 9 12
4 3 7 15
I would like to replace row 4 with a vector
c(4,8,12,16)
I would like to insert the vector into row 4 and replace the original values. I tried this, but it replaces column 4 instead:
df[[4]]<- vector_name
I expect the result
col1|col2|col3|col4
5 10 15 20
2 4 6 8
3 6 9 12
4 8 12 16
We can use replace with a matrix of (row, column) indices that addresses every cell of the last row:
replace(df1, cbind(nrow(df1), seq_along(df1)), v1)
data
df1 <- structure(list(col1 = c(5L, 2L, 3L, 4L), col2 = c(10L, 4L, 6L,
3L), col3 = c(15L, 6L, 9L, 7L), col4 = c(20L, 8L, 12L, 15L)),
class = "data.frame", row.names = c(NA,
-4L))
v1 <- c(4, 8, 12, 16)
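As a possibly simpler alternative to the matrix-index approach, a row can also be assigned directly with [<-. This is a minimal sketch using the same df1 and v1 as above; it assumes v1 has exactly one element per column.

```r
df1 <- data.frame(col1 = c(5L, 2L, 3L, 4L),
                  col2 = c(10L, 4L, 6L, 3L),
                  col3 = c(15L, 6L, 9L, 7L),
                  col4 = c(20L, 8L, 12L, 15L))
v1 <- c(4, 8, 12, 16)

# df[i, ] addresses a row; df[[j]] (as tried in the question) addresses a column
df1[nrow(df1), ] <- v1

print(df1)
```

Unlike replace, which returns a modified copy, this assignment changes df1 in place.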
I have a data frame/tibble that includes a factor variable of bins. There are missing bins because the original data did not include an observation in those 5-year ranges. Is there a way to easily complete the series without having to deconstruct the interval?
Here's a sample df.
library(tibble)
df <- structure(list(bin = structure(c(1L, 3L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L), .Label = c("[1940,1945]",
"(1945,1950]", "(1950,1955]", "(1955,1960]", "(1960,1965]", "(1965,1970]",
"(1970,1975]", "(1975,1980]", "(1980,1985]", "(1985,1990]", "(1990,1995]",
"(1995,2000]", "(2000,2005]", "(2005,2010]", "(2010,2015]", "(2015,2020]",
"(2020,2025]"), class = "factor"), Values = c(2L, 4L, 14L, 11L,
8L, 26L, 30L, 87L, 107L, 290L, 526L, 299L, 166L, 502L, 8L)), row.names = c(NA,
-15L), class = c("tbl_df", "tbl", "data.frame"))
df
# A tibble: 15 x 2
bin Values
<fct> <int>
1 [1940,1945] 2
2 (1950,1955] 4
3 (1960,1965] 14
4 (1965,1970] 11
5 (1970,1975] 8
6 (1975,1980] 26
7 (1980,1985] 30
8 (1985,1990] 87
9 (1990,1995] 107
10 (1995,2000] 290
11 (2000,2005] 526
12 (2005,2010] 299
13 (2010,2015] 166
14 (2015,2020] 502
15 (2020,2025] 8
I would like to add the missing (1945,1950] and (1955,1960] bins.
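bin already has the levels that you want, so you can use complete on your df as shown here. Note that levels(bin) is a character vector, so bin comes back as character and the rows are sorted alphabetically: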
tidyr::complete(df, bin = levels(bin), fill = list(Values = 0))
# A tibble: 17 x 2
# bin Values
# <chr> <dbl>
# 1 (1945,1950] 0
# 2 (1950,1955] 4
# 3 (1955,1960] 0
# 4 (1960,1965] 14
# 5 (1965,1970] 11
# 6 (1970,1975] 8
# 7 (1975,1980] 26
# 8 (1980,1985] 30
# 9 (1985,1990] 87
#10 (1990,1995] 107
#11 (1995,2000] 290
#12 (2000,2005] 526
#13 (2005,2010] 299
#14 (2010,2015] 166
#15 (2015,2020] 502
#16 (2020,2025] 8
#17 [1940,1945] 2
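# Alternatively, starting from the raw data (orig_df, with a Year column, is not shown here):
library(dplyr)
library(tidyr)
library(ggplot2) # for cut_width
df <- orig_df %>%
mutate(bin = cut_width(Year, width = 5, center = 2.5))
# count the observations per bin
df2 <- df %>%
group_by(bin) %>%
summarize(Values = n()) %>%
ungroup()
# join the counts onto the full set of levels and fill the missing bins with 0
tibble(bin = levels(df$bin)) %>%
left_join(df2, by = "bin") %>%
replace_na(list(Values = 0))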
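If keeping bin as a factor in its original level order matters, a base R sketch along the following lines may help. It uses a reduced, made-up subset of the bins (five levels, three observed) rather than the full data, and the names lv and out are illustrative.

```r
# Reduced sample: five levels, of which the 2nd and 4th have no observations
lv <- c("[1940,1945]", "(1945,1950]", "(1950,1955]", "(1955,1960]", "(1960,1965]")
df <- data.frame(bin = factor(lv[c(1, 3, 5)], levels = lv),
                 Values = c(2L, 4L, 14L))

# One row per level, keeping the factor and its level order
out <- data.frame(bin = factor(lv, levels = lv))

# Look up each level in the observed data; unobserved levels give NA
out$Values <- df$Values[match(out$bin, df$bin)]

# Fill the bins that had no observations
out$Values[is.na(out$Values)] <- 0L

print(out)
```

Here the rows follow the level order of the factor, so [1940,1945] stays first instead of being sorted to the end.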
I have the following data frame:
Step 1 2 3
1 5 10 6
2 5 11 5
3 5 13 9
4 5 15 10
5 13 18 10
6 15 20 10
7 17 23 10
8 19 25 10
9 21 27 13
10 23 30 7
I would like to retrieve the columns that satisfy at least one of the following conditions: the value at step 1 equals the value at step 4, or the value at step 4 equals the value at step 8. In this case, columns 1 and 3 should be retrieved: column 1 because the value at step 1 equals the value at step 4 (i.e., 5), and column 3 because the value at step 4 equals the value at step 8 (i.e., 10).
I don't know how to do that in R. Can someone help me, please?
The following code returns a logical row that flags the columns satisfying either condition:
df[1, -1] == df[4, -1] | df[4, -1] == df[8, -1]
# X1 X2 X3
# 1 TRUE FALSE TRUE
# data
df <- structure(list(Step = 1:10, X1 = c(5L, 5L, 5L, 5L, 13L, 15L,
17L, 19L, 21L, 23L), X2 = c(10L, 11L, 13L, 15L, 18L, 20L, 23L,
25L, 27L, 30L), X3 = c(6L, 5L, 9L, 10L, 10L, 10L, 10L, 10L, 13L,
7L)), class = "data.frame", row.names = c(NA, -10L))
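To go from the logical flags to the columns themselves, one option is to test each column separately and subset by name. This is a sketch under the assumption that row i of df corresponds to Step i, as in the sample data; keep and sel are illustrative names.

```r
df <- data.frame(Step = 1:10,
                 X1 = c(5L, 5L, 5L, 5L, 13L, 15L, 17L, 19L, 21L, 23L),
                 X2 = c(10L, 11L, 13L, 15L, 18L, 20L, 23L, 25L, 27L, 30L),
                 X3 = c(6L, 5L, 9L, 10L, 10L, 10L, 10L, 10L, 13L, 7L))

# For every column except Step, check step 1 vs step 4 and step 4 vs step 8
keep <- sapply(df[-1], function(col) col[1] == col[4] || col[4] == col[8])

# Names of the columns that satisfy at least one condition
sel <- names(keep)[keep]

# Retrieve those columns
print(df[sel])
```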
I struggle a bit with following problem:
I have table A (below), and I would like to convert the intervals defined in it into individual positions, as in table B. For each position, the value should be the sum of the values of all intervals in table A that cover it (each interval runs from Start to End), the single interval's value if only one covers it, or 0 if no interval covers that position. I would prefer a solution in R and would really appreciate your help.
Table A
ID Start End Value
1 1 5 9
2 3 7 5
3 5 9 13
4 11 15 1
5 12 16 18
6 14 18 21
Convert to this Table B
Position Value
1 9
2 9
3 14
4 14
5 27
6 18
7 18
8 13
9 13
10 0
11 15
12 33
13 33
14 54
15 54
16 39
17 21
18 21
Not a very straightforward way, but it gets the job done:
df<-structure(list(ID = 1:6, Start = c(1L, 3L, 5L, 11L, 12L, 14L),
End = c(5L, 7L, 9L, 15L, 16L, 18L),
Value = c(9L, 5L, 13L, 1L, 18L, 21L)), .Names = c("ID", "Start", "End", "Value"),
class = "data.frame", row.names = c(NA,
-6L))
# build one matrix per interval: column 1 = positions, column 2 = the interval's value
s1<-lapply(seq_len(nrow(df)), function(i) {matrix(c(df[i,2]:df[i,3], rep(df[i,4], (df[i,3]-df[i,2]+1))), nrow = (df[i,3]-df[i,2])+1)})
s2<-as.data.frame(do.call(rbind, s1))
# sum the values of all intervals covering the same position
library(dplyr)
wgaps<-summarise(group_by(s2, V1), sum(V2))
# create a sequence with no gaps in it and match
nogaps<-data.frame(Position=seq(min(wgaps$V1), max(wgaps$V1)))
nogaps<-left_join(nogaps, wgaps, by=c("Position"= "V1"))
names(nogaps)<-c("Position", "value") #rename
nogaps$value[is.na(nogaps$value)]<-0 # positions covered by no interval get 0
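A shorter base R sketch of the same idea, without dplyr: for every position, sum the values of the intervals whose Start and End enclose it. It uses the interval values exactly as given in the dput above; pos and res are illustrative names.

```r
df <- data.frame(ID = 1:6,
                 Start = c(1L, 3L, 5L, 11L, 12L, 14L),
                 End = c(5L, 7L, 9L, 15L, 16L, 18L),
                 Value = c(9L, 5L, 13L, 1L, 18L, 21L))

# Every position from the first Start to the last End, gaps included
pos <- seq(min(df$Start), max(df$End))

# Sum the values of all intervals covering position p;
# sum() over an empty selection is 0, which handles uncovered positions
res <- data.frame(
  Position = pos,
  Value = sapply(pos, function(p) sum(df$Value[df$Start <= p & p <= df$End]))
)

print(res)
```

This checks every interval for every position, which is fine for small tables but may be slow when there are many intervals or a long position range.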