I am trying to plot every column of a data frame against one of its columns. The data frame looks like this:
  iters    a    b    c
1     1 0.92 0.83 0.97
2     2 0.12 0.93 0.76
3     3 0.55 0.41 0.87
4     4 0.43 0.55 0.49
So far I have tried this code:
df <- melt(acc_s1, id.vars = 'iter', variable.name = 'letter')
ggplot(df, aes(iter,value)) + geom_line(aes(colour = letter))
Unfortunately, my result looks like this (don't mind the slightly different names):
Any idea where this comes from?
Thanks
Imagine 4 cards on a desk, laid out in several rows (5 rows in the demo). The value of each card is listed in the demo data frame, but the actual position of each card is given by the pos columns; see the demo data generated below.
To restore each card's value to its original position, I reorder the cards row by row with [ indexing. The code below already does this. To avoid an explicit loop, I wonder whether the same can be done with a vectorized function from the tidyverse family, e.g. pmap or a related function in purrr?
# 1. data generation ------------------------------------------------------
rm(list=ls())
vect<-matrix(round(runif(20),2),nrow=5)
colnames(vect)<-paste0('card',1:4)
order<-rbind(c(2,3,4,1),c(3,4,1,2),c(1,2,3,4),c(4,3,2,1),c(3,4,2,1))
colnames(order)=paste0('pos',1:4)
dat<-data.frame(vect,order,stringsAsFactors = F)
# 2. data swap ------------------------------------------------------------
for (i in 1:dim(dat)[1]){
  orders=dat[i,paste0('pos',1:4)]
  card=dat[i,paste0('card',1:4)]
  vec<-card[order(unlist(orders))]
  names(vec)=paste0('deck',1:4)
  dat[i,paste0('deck',1:4)]<-vec
}
dat
You could use pmap_dfr:
card_cols <- grep('card', names(dat))
pos_cols <- grep('pos', names(dat))
dat[paste0('deck', seq_along(card_cols))] <- purrr::pmap_dfr(dat, ~{
  x <- c(...)
  as.data.frame(t(unname(x[card_cols][order(x[pos_cols])])))
})
dat
# card1 card2 card3 card4 pos1 pos2 pos3 pos4 deck1 deck2 deck3 deck4
#1 0.05 0.07 0.16 0.86 2 3 4 1 0.86 0.05 0.07 0.16
#2 0.20 0.98 0.79 0.72 3 4 1 2 0.79 0.72 0.20 0.98
#3 0.50 0.79 0.72 0.10 1 2 3 4 0.50 0.79 0.72 0.10
#4 0.03 0.98 0.48 0.06 4 3 2 1 0.06 0.48 0.98 0.03
#5 0.41 0.72 0.91 0.84 3 4 2 1 0.84 0.91 0.41 0.72
Make sure the output of the function passed to pmap_dfr does not carry the original column names. pmap_dfr binds the rows by name, so if each row kept names like card1 to card4, the values would be matched back to those names and end up in the original order instead of the reordered one. That is why I call unname here.
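For comparison, here is a minimal sketch of a different technique: base R matrix indexing instead of pmap_dfr. It is not the method from the answer above. It assumes exactly four card columns and four pos columns, and it hard-codes the five rows printed in the output above rather than regenerating them with runif. The names card_mat, pos_mat, ord and idx are made up for this illustration.

```r
# Demo data: the five rows shown in the printed output above
cards <- rbind(c(0.05, 0.07, 0.16, 0.86),
               c(0.20, 0.98, 0.79, 0.72),
               c(0.50, 0.79, 0.72, 0.10),
               c(0.03, 0.98, 0.48, 0.06),
               c(0.41, 0.72, 0.91, 0.84))
colnames(cards) <- paste0("card", 1:4)
pos <- rbind(c(2, 3, 4, 1), c(3, 4, 1, 2), c(1, 2, 3, 4),
             c(4, 3, 2, 1), c(3, 4, 2, 1))
colnames(pos) <- paste0("pos", 1:4)
dat <- data.frame(cards, pos)

card_mat <- as.matrix(dat[paste0("card", 1:4)])
pos_mat  <- as.matrix(dat[paste0("pos", 1:4)])

# Row i of ord is order(pos_mat[i, ]); apply() returns it transposed
ord <- t(apply(pos_mat, 1, order))

# Two-column index matrix of (row, column) pairs, in column-major order
idx <- cbind(as.vector(row(ord)), as.vector(ord))

# deck[i, j] is card_mat[i, ord[i, j]]
deck <- matrix(card_mat[idx], nrow = nrow(card_mat))
dat[paste0("deck", 1:4)] <- as.data.frame(deck)
dat
```

The only per-row work left is the call to order inside apply; the lookup itself is a single indexing operation, and no column names are involved, so the reshuffling issue does not arise.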
This is an example of the variables that I would like to visualize:
id post.test.score pre.test.score messages forum.posts av.assignment.score
 1            0.37           0.48       68           7                0.19
 2            0.52           0.37       83          22                0.28
 3            0.42           0.37       81           7                0.25
 4            0.56           0.34       94          14                0.27
 5            0.25           0.39       42          11                0.07
I've copied the data from your post above so you can skip the variable assignment
library("tidyverse")
df <- read.table(file = "clipboard", header = TRUE) %>%
  as_tibble()
You need to modify your data structure slightly before you pass it to ggplot. Get each of your test names into a single variable with tidyr::gather. Then pipe to ggplot:
df %>%
  gather(test, value, -id) %>%
  ggplot(aes(x = value)) +
  geom_histogram() +
  facet_grid(~test)
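In current tidyr, gather is superseded by pivot_longer, so the same reshaping might be sketched as below. This is a variant, not the original answer's code: the data frame is typed in from the question instead of read from the clipboard, and facet_wrap with free x scales plus bins = 10 are my own choices, made because messages and the test scores sit on very different ranges.

```r
library(tidyr)
library(ggplot2)

# Data typed in from the question
df <- data.frame(
  id = 1:5,
  post.test.score = c(0.37, 0.52, 0.42, 0.56, 0.25),
  pre.test.score = c(0.48, 0.37, 0.37, 0.34, 0.39),
  messages = c(68, 83, 81, 94, 42),
  forum.posts = c(7, 22, 7, 14, 11),
  av.assignment.score = c(0.19, 0.28, 0.25, 0.27, 0.07)
)

# One row per (id, variable) pair: variable names go to "test", values to "value"
long <- pivot_longer(df, cols = -id, names_to = "test", values_to = "value")

# One histogram per variable, each with its own x axis
p <- ggplot(long, aes(x = value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~test, scales = "free_x")
print(p)
```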
How can I compute the mean of R, R1, R2 and R3 over the rows that share the same lon and lat? I'm sure this question has been asked multiple times, but I could not easily find it.
       lon       lat length depth    R   R1   R2   R3
1 147.5348 -35.32395  13709     1 0.67 0.80 0.84 0.83
2 147.5348 -35.32395  13709     2 0.47 0.48 0.56 0.54
3 147.5348 -35.32395  13709     3 0.43 0.29 0.36 0.34
4 147.4290 -35.27202  12652     1 0.46 0.61 0.60 0.58
5 147.4290 -35.27202  12652     2 0.73 0.96 0.95 0.95
6 147.4290 -35.27202  12652     3 0.77 0.92 0.92 0.91
I'd recommend the split-apply-combine strategy: split by BOTH lon and lat, apply mean to each group, then recombine the results into a single data frame.
dplyr does this with group_by and summarize:
library(dplyr)
mydata %>%
  group_by(lon, lat) %>%
  summarize(
    mean_r = mean(R)
    , mean_r1 = mean(R1)
    , mean_r2 = mean(R2)
    , mean_r3 = mean(R3)
  )
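The same split-apply-combine idea can also be sketched in base R with aggregate, which avoids the dplyr dependency. This is an alternative to the answer above, not a replacement for it; mydata is typed in from the six rows in the question, and means is a name made up for this example.

```r
# Data typed in from the question
mydata <- data.frame(
  lon    = c(rep(147.5348, 3), rep(147.4290, 3)),
  lat    = c(rep(-35.32395, 3), rep(-35.27202, 3)),
  length = c(rep(13709, 3), rep(12652, 3)),
  depth  = rep(1:3, 2),
  R      = c(0.67, 0.47, 0.43, 0.46, 0.73, 0.77),
  R1     = c(0.80, 0.48, 0.29, 0.61, 0.96, 0.92),
  R2     = c(0.84, 0.56, 0.36, 0.60, 0.95, 0.92),
  R3     = c(0.83, 0.54, 0.34, 0.58, 0.95, 0.91)
)

# Left of ~: columns to average; right of ~: grouping columns
means <- aggregate(cbind(R, R1, R2, R3) ~ lon + lat, data = mydata, FUN = mean)
print(means)
```

The result has one row per distinct lon/lat pair, with the four mean columns keeping their original names.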
I need to randomly sample a dataset that is arranged in long format. Each subject has 4 observations, so if I sample rows at random I lose one or more observations per subject.
This is simulated data for illustration purposes; my real data is much bigger.
sub sex group dv1 dv2
P1 m A 0.66 0.94
P1 m B 0.98 0.26
P1 m C 0.02 0.03
P1 m D 0.60 0.30
P2 m A 0.92 0.99
P2 m B 0.82 0.09
P2 m C 0.44 0.67
P2 m D 0.53 0.80
P3 f A 0.29 0.22
P3 f B 0.46 0.20
P3 f C 0.37 0.77
P3 f D 0.76 0.54
P4 m A 0.28 0.99
P4 m B 0.16 0.57
P4 m C 0.46 0.75
P4 m D 0.28 0.21
In this example, I need to randomly select 2 males. I tried the dplyr package (see below), but with a sample size of 2 it gives me 2 rows for sex = "m" and 2 rows for sex = "f", 4 randomly chosen rows in total. What I need is 8 rows, 4 from one male and 4 from another. Changing the grouping variable to sub doesn't work: it complains that there are only 2 levels in the group (it would actually work in this toy example, since each sub has 4 rows, but I am drawing something like 50 samples from a bigger dataset). It would also just give me 2 random rows for each sub, which is not what I need.
library(dplyr)
subset <- data %>%
  group_by(sex) %>%
  sample_n(2)
Please do not suggest reshaping the data to wide format and sampling there; I know I can do that. I am sure there must be a way to sample in long format.
I would sample from the patient names and then filter by those sampled names:
Look at all males
male_subset <- data %>% filter(sex == "m")
Look for unique male ID
male_IDs <- unique(male_subset$sub)
Sample from the unique IDs
sampled_IDs <- sample(male_IDs, 2)
Now you subset your data based on these sampled IDs:
data %>% filter(sub %in% sampled_IDs)
This should return all four rows for each of the 2 sampled individuals.
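The steps above could be wrapped into one small function. This is only a sketch under assumptions: sample_subjects is a hypothetical helper name, the toy data frame is typed in from the question, and the stopifnot guard is my addition so that asking for more subjects than exist fails with a clear error.

```r
library(dplyr)

# Toy data typed in from the question
data <- data.frame(
  sub   = rep(paste0("P", 1:4), each = 4),
  sex   = rep(c("m", "m", "f", "m"), each = 4),
  group = rep(c("A", "B", "C", "D"), times = 4),
  dv1   = c(0.66, 0.98, 0.02, 0.60, 0.92, 0.82, 0.44, 0.53,
            0.29, 0.46, 0.37, 0.76, 0.28, 0.16, 0.46, 0.28),
  dv2   = c(0.94, 0.26, 0.03, 0.30, 0.99, 0.09, 0.67, 0.80,
            0.22, 0.20, 0.77, 0.54, 0.99, 0.57, 0.75, 0.21)
)

# Hypothetical helper: keep all rows of n randomly chosen subjects of one sex
sample_subjects <- function(df, n, which_sex) {
  # Unique subject IDs of the requested sex
  ids <- df %>% filter(sex == which_sex) %>% distinct(sub) %>% pull(sub)
  stopifnot(n <= length(ids))
  # Sample IDs, not rows, then keep every row of the sampled subjects
  df %>% filter(sub %in% sample(ids, n))
}

set.seed(1)
picked <- sample_subjects(data, 2, "m")
picked
```

Because the sampling happens on subject IDs, the number of rows per subject never enters the call, so the same function should work when subjects have different numbers of observations.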
I'm not sure if I've quite understood what you want. Would this do it?
data %>% filter(sex == 'm') %>% filter(sub %in% sample(unique(sub), 2))
Sampling from unique(sub) after the first filter draws only from the male subject IDs, so nothing needs to be adapted for your real data.
In base R,
set.seed(1)
# unique() so that the same subject cannot be drawn twice
subset <- sample(unique(data[data$sex == "m", ]$sub), 2)
data_subset <- data[data$sub %in% subset, ]
nrow(data_subset)
# [1] 8
Works, but not flashy.
I have a data frame in the following form:
Data <- data.frame(X = sample(1:10), Y = sample(1:10))
I would like to color the dots obtained with
plot(Data$X,Data$Y)
using the values from another data frame:
      X1   X2    X3   X4    X5
1   0.57 0.40  0.64 0.07  0.57
2   0.40 0.45  0.49 0.21  0.39
3   0.72 0.65  0.74 0.61  0.71
4   0.73 0.54  0.76 0.39  0.64
5   0.88 0.81  0.89 0.75  0.64
6   0.70 0.65  0.78 0.51  0.66
7   0.84 0.91  0.89 0.86  0.83
8  -0.07 0.39 -0.02 0.12 -0.01
9   0.82 0.83  0.84 0.81  0.79
10  0.82 0.55  0.84 0.51  0.59
The goal is five different graphs, each using one of the five columns of the second data frame to color the dots. I managed to do this by looking here (Colour points in a plot differently depending on a vector of values), but I can't figure out how to use the same color scale for all five plots.
The columns in the second data frame can have different minima and maxima, so if I generate the colors with the cut function on the first column, the resulting factor levels, and therefore the colors, are relative to that column only.
Hope this is clear,
Thanks.
Your color ramp needs to cover all the values, so you likely want them in a single vector. I would melt the data, build the color ramp, and then use a facet function in ggplot to get multiple plots. Alternatively, if you don't want to use ggplot, you could cast the data back to multiple columns, with 5 extra columns holding the colors.
require(reshape2)
require(ggplot2)
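# Y must be quoted; unquoted, R looks for an object named Y
Data.m <- melt(Data, id.vars = "Y")
rbPal <- colorRampPalette(c('red','blue'))
# One set of breaks over all values, so colors are comparable across facets
Data.m$Col <- rbPal(10)[as.numeric(cut(Data.m$value, breaks = 10))]
ggplot(Data.m, aes(value, Y, col = Col)) +
  geom_point() +
  scale_colour_identity() +  # use the hex codes in Col as the actual colors
  facet_grid(variable ~ .)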
Your Data object has two variables, X and Y, but then you talk about making 5 graphs, so that part is a little unclear, but I think the melt function will help getting a comprehensive color ramp and the facet_grid function may make it easier to do 5 graphs at once if that is what you want.
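If the five base graphics plots from the question are what is wanted, a shared scale could be sketched as follows: compute one set of breaks from the range of all five columns, then cut every column with those same breaks. This is an assumption about the intended result, not the answer's code. Vals is a hypothetical stand-in for the second data frame, filled with random numbers rather than the values from the question, and n_bins, breaks, palette_cols and bin_of are names made up for this example.

```r
set.seed(42)
Data <- data.frame(X = sample(1:10), Y = sample(1:10))

# Hypothetical stand-in for the second data frame (random values, 10 x 5)
Vals <- as.data.frame(matrix(runif(50, min = -0.1, max = 1), nrow = 10,
                             dimnames = list(NULL, paste0("X", 1:5))))

# One set of breaks spanning every value in every column
n_bins <- 10
all_vals <- unlist(Vals)
breaks <- seq(min(all_vals), max(all_vals), length.out = n_bins + 1)

rbPal <- colorRampPalette(c("red", "blue"))
palette_cols <- rbPal(n_bins)

# Map a numeric vector to bin numbers 1..n_bins using the shared breaks
bin_of <- function(v) cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Five plots side by side; the same value gets the same color in each
op <- par(mfrow = c(1, 5))
for (nm in names(Vals)) {
  plot(Data$X, Data$Y, col = palette_cols[bin_of(Vals[[nm]])],
       pch = 19, main = nm, xlab = "X", ylab = "Y")
}
par(op)
```

Calling cut separately on each column with breaks = 10 would rescale every column to its own range, which is exactly the problem described in the question; passing explicit shared breaks avoids that.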