I'm trying to create a scatter plot of two data, but I don't know how to specify my sorted result to the plot.
The procedure is like this:
Read "data_to_be_chosen.csv"
Read "data_to_be_plotted.csv"
Sort "data_to_be_chosen.csv" to find the two topmost values (and their names)
Find the corresponding names/columns in "data_to_be_plotted.csv"
Show the scatter plot of the two
I have a problem at the 5th step.
Let's assume that Column C and Column A have the two topmost values.
If I manually set the data to the plot, it would be:
plot(plotted$C, plotted$A)
However, I'd like to have it done automatically, depending on the sorted order.
I thought the following code would work:
plot(plotted[names(sort(chosen_list$chosen[1,], decreasing=TRUE)[1])],
plotted[names(sort(chosen_list$chosen[1,], decreasing=TRUE)[2])])
But, this gives me an error:
Error in stripchart.default(x1, ...) : invalid plotting method
I also tried these, but they don't work either:
plot(names(sort(chosen_list$chosen[1,], decreasing=TRUE)[1]),
names(sort(chosen_list$chosen[1,], decreasing=TRUE)[2]))
plot(colnames(sort(chosen_list$chosen[1,], decreasing=TRUE)[1]),
colnames(sort(chosen_list$chosen[1,], decreasing=TRUE)[2]))
Is there any way to set the sorted result to this plot?
I have no more ideas.
My R version is 4.1.2 (The latest version).
Here's my data:
data_to_be_chosen.csv
A,B,C
2.044281,0.757232,2.188617
data_to_be_plotted.csv
A,B,C
0.34503,-0.38781,-0.3506
0.351566,-0.3901,-0.35244
0.351817,-0.39144,-0.35435
0.351222,-0.39138,-0.35394
0.351222,-0.39113,-0.35366
0.350753,-0.39088,-0.35291
0.350628,-0.39041,-0.3531
0.349127,-0.3881,-0.3511
0.346125,-0.38675,-0.34969
0.346594,-0.38719,-0.34963
Here's my code:
plotted <- read.csv("data_to_be_plotted.csv")
chosen <- read.csv("data_to_be_chosen.csv")
chosen_list <- list(chosen=chosen)
sort(chosen_list$chosen[1,], decreasing=TRUE)[1:2]
names(sort(chosen_list$chosen[1,], decreasing=TRUE)[1:2])
plotted[names(sort(chosen_list$chosen[1,], decreasing=TRUE)[1:2])]
# Correlation can be calculated with the above data frame
cor(plotted[names(sort(chosen_list$chosen[1,], decreasing=TRUE)[1])],
plotted[names(sort(chosen_list$chosen[1,], decreasing=TRUE)[2])])
# What I want is this plot ... except manually specifying C or A
plot(plotted$C, plotted$A)
# The above data frame can NOT be used to plot / Issues "invalid plotting method"
plot(plotted[names(sort(chosen_list$chosen[1,], decreasing=TRUE)[1])],
plotted[names(sort(chosen_list$chosen[1,], decreasing=TRUE)[2])])
# I also tried, but no luck:
plot(names(sort(chosen_list$chosen[1,], decreasing=TRUE)[1]),
names(sort(chosen_list$chosen[1,], decreasing=TRUE)[2]))
plot(colnames(sort(chosen_list$chosen[1,], decreasing=TRUE)[1]),
colnames(sort(chosen_list$chosen[1,], decreasing=TRUE)[2]))
The problem, is just that subsetting one row gives a "data.frame" object instead of a "numeric" vector, what you probably expect. You can check that with class().
class(chosen_list$chosen[1,])
# [1] "data.frame"
The solution is to unlist it.
class(unlist(chosen_list$chosen[1,]))
# [1] "numeric"
In the following make use of creating objects instead of repeating code.
(x <- sort(unlist(chosen_list$chosen[1,]), decreasing=TRUE)[1:2])
# C A
# 2.188617 2.044281
(nx <- names(x))
# [1] "C" "A"
(p_df <- plotted[nx])
# C A
# 1 -0.35060 0.345030
# 2 -0.35244 0.351566
# 3 -0.35435 0.351817
# 4 -0.35394 0.351222
# 5 -0.35366 0.351222
# 6 -0.35291 0.350753
# 7 -0.35310 0.350628
# 8 -0.35110 0.349127
# 9 -0.34969 0.346125
# 10 -0.34963 0.346594
To get the correlation of two vectors of a data frame as a single value, we probably want [, j], since a data frame has two dimensions.
cor(p_df[, 1], p_df[, 2])
# [1] -0.9029339
Check (similar to above):
class(cor(p_df[1], p_df[2]))
# [1] "matrix" "array"
class(cor(p_df[, 1], p_df[, 2]))
# [1] "numeric"
I recommend you to update your knowledge about indexing using brackets.
Or just get the correlation matrix
cor(p_df)
# C A
# C 1.0000000 -0.9029339
# A -0.9029339 1.0000000
Finally use one of those:
plot(plotted[, nx[1]], plotted[, nx[2]])
plot(p_df[1:2])
plot(p_df) ## works in this special case, because we only have two columns
Data:
plotted <- structure(list(A = c(0.34503, 0.351566, 0.351817, 0.351222, 0.351222,
0.350753, 0.350628, 0.349127, 0.346125, 0.346594), B = c(-0.38781,
-0.3901, -0.39144, -0.39138, -0.39113, -0.39088, -0.39041, -0.3881,
-0.38675, -0.38719), C = c(-0.3506, -0.35244, -0.35435, -0.35394,
-0.35366, -0.35291, -0.3531, -0.3511, -0.34969, -0.34963)), class = "data.frame", row.names = c(NA,
-10L))
chosen <- structure(list(A = 2.044281, B = 0.757232, C = 2.188617), class = "data.frame", row.names = c(NA,
-1L))
chosen_list <- list(chosen=chosen)
You can coerce the single-row “data_to_be_chosen” df to a named vector using unlist; then sort, get the first two names, and use these to index into “data_to_be_plotted”:
chosen_vec <- unlist(chosen)
plot(plotted[names(sort(chosen_vec, decreasing = TRUE)[1:2])])
Related
HEADLINE: Is there a way to get R to recognize data.frame column names contained within lists in the same way that it can recognize free-floating vectors?
SETUP: Say I have a vector named varA:
(varA <- 1:6)
# [1] 1 2 3 4 5 6
To get the length of varA, I could do:
length(varA)
#[1] 6
and if the variable was contained within a larger list, the variable and its length could still be found by doing:
list <- list(vars = "varA")
length(get(list$vars[1]))
#[1] 6
PROBLEM:
This is not the case when I substitute the vector for a dataframe column and I don't know how to work around this:
rows <- 1:6
cols <- c("colA")
(df <- data.frame(matrix(NA,
nrow = length(rows),
ncol = length(cols),
dimnames = list(rows, cols))))
# colA
# 1 NA
# 2 NA
# 3 NA
# 4 NA
# 5 NA
# 6 NA
list <- list(vars = "varA",
cols = "df$colA")
length(get(list$vars[1]))
#[1] 6
length(get(list$cols[1]))
#Error in get(list$cols[1]) : object 'df$colA' not found
Though this contrived example seems inane, because I could always use the simple length(variable) approach, I'm actually interested in writing data from hundreds of variables varying in lengths onto respective dataframe columns, and so keeping them in a list that I could iterate through would be very helpful. I've tried everything I could think of, but it may be the case that it's just not possible in R, especially given that I cannot find any posts with solutions to the issue.
You could try:
> length(eval(parse(text = list$cols[1])))
[1] 6
Or:
list <- list(vars = "varA",
cols = "colA")
length(df[, list$cols[1]])
[1] 6
Or with regex:
list <- list(vars = "varA",
cols = "df$colA")
length(df[, sub(".*\\$", "", list$cols[1])])
[1] 6
If you are truly working with a data frame d, then nrow(d) is the length of all of the variables in d. There should be no reason to use length in this case.
If you are actually working with a list x containing variables of potentially different lengths, then you should use the [[ operator to extract those variables by name (see ?Extract):
x <- list(a = 1:10, b = rnorm(20L))
l <- list(vars = "a")
length(d[[l$vars[1L]]]) # 10
If you insist on using get (you shouldn't), then you need to supply a second argument telling it where to look for the variable (see ?get):
length(get(l$vars[1L], x)) # 10
I think I'm missing something super simple, but I seem to be unable to find a solution directly relating to what I need: I've got a data frame that has a letter as the row name and a two columns of numerical values. As part of a loop I'm running I create a new vector (from an index) that has both a letter and number (e.g. "f2") which I then need to be the name of a new row, then add two numbers next to it (based on some other section of code, but I'm fine with that). What I get instead is the name of the vector/index as the title of the row name, and I'm not sure if I'm missing a function of rbind or something else to make it easy.
Example code:
#Data frame and vector creation
row.names <- letters[1:5]
vector.1 <- c(1:5)
vector.2 <- c(2:6)
vector.3 <- letters[6:10]
data.frame <- data.frame(vector.1,vector.2)
rownames(data.frame) <- row.names
data.frame
index.vector <- "f2"
#what I want the data frame to look like with the new row
data.frame <- rbind(data.frame, "f2" = c(6,11))
data.frame
#what the data frame looks like when I attempt to use a vector as a row name
data.frame <- rbind(data.frame, index.vector = c(6,11))
data.frame
#"why" I can't just type "f" every time
index.vector2 = paste(index.vector, "2", sep="")
data.frame <- rbind(data.frame, index.vector2 = c(6,11))
data.frame
In my loop the "index.vector" is a random sample, hence where I can't just write the letter/number in as a row name, so need to be able to create the row name from a vector or from the index of the sample.
The loop runs and a random number of new rows will be created, so I can't specify what number the row is that needs a new name - unless there's a way to just do it for the newest or bottom row every time.
Any help would be appreciated!
Not elegant, but works:
new_row <- data.frame(setNames(list(6, 11), colnames(data.frame)), row.names = paste(index.vector, "2", sep=""))
data.frame <- rbind(data.frame, new_row)
data.frame
# vector.1 vector.2
# a 1 2
# b 2 3
# c 3 4
# d 4 5
# e 5 6
# f22 6 11
I Understood the problem , but not able to resolve the issue. Hence, suggesting an alternative way to achieve the same
Alternate solution: append your row labels after the data binding in your loop and then assign the row names to your dataframe at the end .
#Data frame and vector creation
row.names <- letters[1:5]
vector.1 <- c(1:5)
vector.2 <- c(2:6)
vector.3 <- letters[6:10]
data.frame <- data.frame(vector.1,vector.2)
#loop starts
index.vector <- "f2"
data.frame <- rbind(data.frame,c(6,11))
row.names<-append(row.names,index.vector)
#loop ends
rownames(data.frame) <- row.names
data.frame
output:
vector.1 vector.2
a 1 2
b 2 3
c 3 4
d 4 5
e 5 6
f2 6 11
Hope this would be helpful.
If you manipulate the data frame with rbind, then the newest elements will always be at the "bottom" of your data frame. Hence you could also set a single row name by
rownnames(data.frame)[nrow(data.frame)] = "new_name"
I want to have the intersection of all groups of a data table. So for the given data:
data.table(a=c(1,2,3, 2, 3,2), myGroup=c("x","x","x", "y", "z","z"))
I want to have the result:
2
I know that
Reduce(intersect, list(c(1,2,3), c(2), c(3,2)))
will give me the desired result but I didn't figure out how to produce a list of groups of a data.table query.
I would try using Reduce in the following way (assuming dt is your data)
Reduce(intersect, dt[, .(list(unique(a))), myGroup]$V1)
## [1] 2
Here's one approach.
nGroups <- length(unique(dt[,myGroup]))
dt[, if(length(unique(myGroup))==nGroups) .BY else NULL, by="a"][[1]]
# [1] 2
And here it is with some explanatory comments.
## Mark down the number of groups in your data set
nGroups <- length(unique(dt[,myGroup]))
## Then, use `by="a"` to examine in turn subsets formed by each value of "a".
## For subsets having the full complement of groups
## (i.e. those for which `length(unique(myGroup))==nGroups)`,
## return the value of "a" (stored in .BY).
## For the other subsets, return NULL.
dt[, if(length(unique(myGroup))==nGroups) .BY else NULL, by="a"][[1]]
# [1] 2
If that code and the comments aren't clear on their own, a quick glance at the following might help. Basically, the approach above is just looking for and reporting the value of a for those groups that return x,y,z in column V1 below.
dt[,list(list(unique(myGroup))), by="a"]
# a V1
# 1: 1 x
# 2: 2 x,y,z
# 3: 3 x,z
I defined the following function, which takes two DataFrames, DF_TAGS_LIST and DF_epc_list. Both data frames have a column with a different number of rows. I want to search each value DF_TAGS_LIST in DF_epc_list, and if found, store it in another dataframe
One example of DF_TAGS_LIST:
TAGS_LIST
3036029B539869100000000B
3036029B537663000000002A
3036029B5398694000000009
3036029B539869400000000C
3036029B5398690000000006
3036029B5398692000000007
And one example of DF_epc_list:
EPC
3036029B539869100000000B
3036029B537663000000002A
3036029B5398690000000006
3036029B5398692000000007
3036029B5398691000000006
3036029B5376630000000034
3036029B53986940000000WF
3036029B5398694000000454
3036029B5398690000000234
3036029B53986920000000FG
In this case, I would like one dataframe output that had the following values:
FOUND_TAGS
3036029B5398690000000006
3036029B5398692000000007
3036029B539869100000000B
3036029B537663000000002A
My function is:
FOUND_COMPARE_TAGS<-function(DF_TAGS_LIST, DF_epc_list){
DF_epc_list<-toString(DF_epc_list)
DF_TAGS_LIST<-toString(DF_TAGS_LIST)
DF_found_epc_tags <- data.frame(DF_found_epc_tags=intersect(DF_TAGS_LIST$DF_TAGS_LIST, DF_epc_list$DF_epc_list)); setdiff(union(DF_TAGS_LIST$DF_TAGS_LIST, DF_epc_list$DF_epc_list), DF_found_epc_tags$DF_found_epc_tags)
#DF_found_epc_tags <- data.frame(DF_found_epc_tags = DF_TAGS_LIST[unique(na.omit(match(DF_epc_list$DF_epc_list, DF_TAGS_LIST$DF_TAGS_LIST))),])
return(DF_found_epc_tags)
}
I now returns an empty data frame with two columns. Only recently programmed in R
You can use %in% or (as I mentioned in my comment) intersect:
DF_TAGS_LIST[DF_TAGS_LIST$TAGS_LIST %in% DF_epc_list$EPC, , drop = FALSE]
# TAGS_LIST
# 1 3036029B539869100000000B
# 2 3036029B537663000000002A
# 5 3036029B5398690000000006
# 6 3036029B5398692000000007
intersect(DF_TAGS_LIST$TAGS_LIST, DF_epc_list$EPC)
# [1] "3036029B539869100000000B" "3036029B537663000000002A"
# [3] "3036029B5398690000000006" "3036029B5398692000000007"
FOUND_TAGS <- rbind(TAGS_LIST, EPC)
FOUND_TAGS <- FOUND_TAGS[duplicated(FOUND_TAGS), , drop = FALSE]
Suppose that I have a vector x whose elements I want to use to extract columns from a matrix or data frame M.
If x[1] = "A", I cannot use M$x[1] to extract the column with header name A, because M$A is recognized while M$"A" is not. How can I remove the quotes so that M$x[1] is M$A rather than M$"A" in this instance?
Don't use $ in this case; use [ instead. Here's a minimal example (if I understand what you're trying to do).
mydf <- data.frame(A = 1:2, B = 3:4)
mydf
# A B
# 1 1 3
# 2 2 4
x <- c("A", "B")
x
# [1] "A" "B"
mydf[, x[1]] ## As a vector
# [1] 1 2
mydf[, x[1], drop = FALSE] ## As a single column `data.frame`
# A
# 1 1
# 2 2
I think you would find your answer in the R Inferno. Start around Circle 8: "Believing it does as intended", one of the "string not the name" sub-sections.... You might also find some explanation in the line The main difference is that $ does not allow computed indices, whereas [[ does. from the help page at ?Extract.
Note that this approach is taken because the question specified using the approach to extract columns from a matrix or data frame, in which case, the [row, column] mode of extraction is really the way to go anyway (and the $ approach would not work with a matrix).