Importing data from Excel to R via psych::read.clipboard

I am trying to streamline a process in which I select and copy two columns from an Excel worksheet and import them into R, where I subset them further. Here is my issue:
The Excel data has multiple sets of data in the same column. For example, column 1 is [V,1,2,3,4,V,1,2,3,4] and column 2 is [A,2,4,6,10,A,3,6,9,12], where V and A are the column headers. I tried copying the two relevant columns, then running the following code in R:
testing<-read.clipboard(header=TRUE, sep=" ")
testinga<-testing[1:4,]
The resulting table looks fine, but when I plot it with ggplot
ggplot(testing, aes(V,A))+geom_point()
the resulting graph orders my data points by their first digit (i.e. the 10 is plotted as if it were a 1).
This is NOT an issue if I simply copy the first data set on its own and import it using read.clipboard.
What is going on here, and how do I get around it?
Edit:
# from dput()
testing <- structure(list(V = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L), .Label = c("1", "2", "3", "4", "V"), class = "factor"), A = structure(c(3L, 5L, 6L, 1L, 8L, 4L, 6L, 7L, 2L), .Label = c("10", "12", "2", "3", "4", "6", "9", "A"), class = "factor")), .Names = c("V", "A"), class = "data.frame", row.names = c(NA, -9L))

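Your problem is that the columns of the full data.frame are read in as factors (not numerics) because they contain something other than numbers, here the repeated column headers. You just need to convert them back to numeric, going through as.character() so that you get the values rather than the factor's internal level codes.
testinga <- testing[1:4, ]
# Assign into testinga[] so the result stays a data.frame; sapply() would return a matrix, which ggplot() does not accept
testinga[] <- lapply(testinga, function(x) as.numeric(as.character(x)))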
Then you should be able to plot just fine.
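To make the difference between the two conversions concrete, here is a minimal sketch that rebuilds the data from the question's dput() and also splits the stacked sets at the repeated header rows. The names header_rows, set_id, clean and sets are illustrative only and not part of the original post.

```r
# Rebuild the data as it arrives from the clipboard: everything is a factor
testing <- data.frame(
  V = factor(c("1", "2", "3", "4", "V", "1", "2", "3", "4")),
  A = factor(c("2", "4", "6", "10", "A", "3", "6", "9", "12"))
)

first <- testing[1:4, ]
codes  <- as.numeric(first$A)                # internal level codes, not the values
values <- as.numeric(as.character(first$A))  # the numbers actually shown in the table

# Rows that merely repeat the header mark the start of a new data set
header_rows <- testing$V == "V"
set_id <- cumsum(header_rows) + 1            # 1 for the first set, 2 for the second

# Drop the header rows, convert every column, then split by data set
clean <- testing[!header_rows, ]
clean[] <- lapply(clean, function(x) as.numeric(as.character(x)))
sets <- split(clean, set_id[!header_rows])
```

Each element of sets is then an ordinary numeric data.frame that can be passed to ggplot() directly.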


Bars of a ggplot disappear when geom_errorbar is added

I'm building a bar plot with ggplot2 and the code works fine until I add error bars with geom_errorbar. My dataset consists of two factors [Sex (two levels) and Time (seven levels)] and several continuous dependent variables. ABA.mean is the mean and ABA.se is the standard error.
[Image: data structure]
Here's the code for the plot (I made sure Sex and Time were factors).
p<- ggplot(data=sex.data1, aes(x=Time, y=ABA.mean, ymin=ABA.mean-ABA.se, ymax=ABA.mean+ABA.se))
p1<-p + geom_bar(aes(fill=Sex), stat="identity",
position="dodge")+ geom_errorbar(aes(color=Sex), position="dodge")
And here's the plot:
[Image: output of bar plot with error bars]
Here's also some data (not showing all data to facilitate comprehension)
dput(sex.data1)
structure(list(Sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("female", "male"), class = "factor"),Time = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,4L, 5L, 6L, 7L), .Label = c("1", "2", "3", "4", "5", "6",
"7"), class = "factor"), RWC.mean = c(46.87233333, 56.971,
5.884, 6.562666667, 10.30466667, 80.95266667, 79.22333333,
72.04366667, 80.87166667, 77.15266667, 6.962, 8.733, 86.051,
84.586), ABA.mean = c(9.532666667, 322.969, 28.4, 30.15066667,
45.529, 46.298, 18.60933333, 13.838, 46.31466667, 202.3803333,
10.5005, 16.637, 17.64466667, 6.595333333),RWC.se = c(6.428766324,19.39234553, 2.152576673, 0.328793924, 1.972588936, 1.542849888,4.434089322, 8.443211501, 3.087210679, 5.593021853, 0.574815043,NA, 9.684611522, 1.546559515), ABA.se = c(2.654699878, 89.919,11.59730729, 10.52325178, 24.42691451, 29.76969347, 8.154232119,4.295445767, 21.57449026, 132.4679665, 1.1755, NA, 9.29181176,3.315272605)
However, when I build the plot without geom_errorbar, the bars appear.
p<-ggplot(sex.data1, aes(x=Time, y=ABA.mean, fill=Sex))
p+geom_bar(stat="identity", position=position_dodge())
I'm guessing there's something wrong with the code of geom_errorbar.
Many thanks in advance!
Your plotting code looks fine to me, but your dput formatting was a bit strange. I had to fix the syntax in the data, so that might have been the issue (i.e. the format/syntax of your input data). The code you posted produces the plot just fine:
library(ggplot2)
ggplot(data=sex.data1, aes(x=Time, y=ABA.mean, ymin=ABA.mean-ABA.se, ymax=ABA.mean+ABA.se)) +
geom_bar(aes(fill=Sex), stat="identity", position="dodge") +
geom_errorbar(aes(color=Sex), position="dodge")
data:
sex.data1 <- data.frame(
Sex = c("F", "F", "F", "F", "F", "F", "F", "M", "M", 'M', "M", "M", 'M', "M"),
Time = c("1", "2", "3", "4", "5", "6", "7"),
RWC.mean = c(46.87233333, 56.971, 5.884, 6.562666667, 10.30466667, 80.95266667, 79.22333333, 72.04366667, 80.87166667, 77.15266667, 6.962, 8.733, 86.051, 84.586),
ABA.mean = c(9.532666667, 322.969, 28.4, 30.15066667, 45.529, 46.298, 18.60933333, 13.838, 46.31466667, 202.3803333, 10.5005, 16.637, 17.64466667, 6.595333333),
RWC.se = c(6.428766324,19.39234553, 2.152576673, 0.328793924, 1.972588936, 1.542849888,4.434089322, 8.443211501, 3.087210679, 5.593021853, 0.574815043,NA, 9.684611522, 1.546559515),
ABA.se = c(2.654699878, 89.919,11.59730729, 10.52325178, 24.42691451, 29.76969347, 8.154232119,4.295445767, 21.57449026, 132.4679665, 1.1755, NA, 9.29181176,3.315272605))

Make output of two rows into columns R

I am currently working with behavioural data in R from video analyses in BORIS. Every observation is 15 seconds and during this observation I noted the subject, its behaviour but also some background information such as the date, time of day, temperature, etc. However, the program has put this background information under the column "Behaviour" (so one of the behaviours is now "date") and its output under the column "Modifier" (which now says "15-10-2020" for example).
What I want is to make separate columns for date, time, etc. (taken from the column "Behaviour") and put the corresponding values (from the column "Modifier") in these columns, so that every behaviour has a subject, date, time, temperature, and so forth. However, I have no idea how to do this.
I thought about using the function aggregate, but this gives me lots of extra rows with mainly NA's. I also looked into the package "tibble" but can't really make that work either.
Any suggestions would be greatly appreciated!
Some example rows (from dput()):
structure(list(Subject = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 7L), .Label = c("fallow deer female", "fallow deer female + calf",
"red deer female + calf", "roe deer male", "wild boar + young",
"wild boar male", "wild boar unknown sex"), class = "factor"),
Behavior = structure(c(1L, 2L, 8L, 7L, 12L, 3L, 5L, 10L,
6L, 4L), .Label = c("auditory vigilant", "date", "day/night",
"foraging", "nr. of individuals", "running", "temperature",
"time of day", "unknown behaviour", "walking", "walking while vigilant",
"weather"), class = "factor"), Behavioral.category = structure(c(4L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 3L), .Label = c("", "Background information",
"Non-vigilant", "Vigilant"), class = "factor"), Modifiers = structure(c(1L,
4L, 21L, 27L, 35L, 36L, 32L, 1L, 1L, 1L), .Label = c("",
"0346", "0347", "07172020", "07182020", "07212020", "07242020",
"07262020", "07272020", "08032020", "08052020", "1", "12",
"1307", "1327", "1342", "1343", "1430", "1528", "16", "1604",
"17", "1744", "21", "2119", "2120", "22", "23", "25", "26",
"3", "4", "7", "Clear", "Cloudy", "Day", "Night"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")
The output that I'd like to have would give as column names: Subject; Behavior; Date; Time of Day; Temperature. The modifier output would be the values of the columns "Date", "Time of Day", "Temperature". When this works, I could delete the column Modifiers (since all its values are already in assigned columns).
Split the data frame into the actual behaviours and the background information. Then run this code on the background information:
tidyr::pivot_wider(your_data, names_from = Behavior, values_from = Modifiers)
Finally, merge the two data frames back together.
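The split, pivot and merge steps could look roughly like the sketch below. It uses a small made-up data set, and it assumes there is some column identifying each 15-second observation; obs_id is a hypothetical name, since the sample rows in the question do not show such a column. Without a key like this, pivot_wider() has nothing to tell the observations apart.

```r
library(tidyr)

# Made-up rows for illustration; obs_id is an assumed observation identifier
obs <- data.frame(
  obs_id    = c(1, 1, 1, 1, 2, 2, 2),
  Subject   = c(rep("fallow deer female + calf", 4),
                rep("wild boar unknown sex", 3)),
  Behavior  = c("auditory vigilant", "date", "time of day", "temperature",
                "foraging", "date", "temperature"),
  Behavioral.category = c("Vigilant", rep("Background information", 3),
                          "Non-vigilant", rep("Background information", 2)),
  Modifiers = c("", "07172020", "1604", "22", "", "07182020", "17"),
  stringsAsFactors = FALSE
)

# 1. Split into background information and actual behaviours
is_bg <- obs$Behavioral.category == "Background information"
background <- obs[is_bg, c("obs_id", "Behavior", "Modifiers")]
behaviours <- obs[!is_bg, c("obs_id", "Subject", "Behavior")]

# 2. One column per kind of background information, one row per observation
bg_wide <- pivot_wider(background, names_from = Behavior, values_from = Modifiers)

# 3. Attach the background columns to every behaviour of the same observation
result <- merge(behaviours, as.data.frame(bg_wide), by = "obs_id")
```

Background items that were not recorded for an observation (here "time of day" for the second one) simply come out as NA.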

How can I create a function that creates a matrix using values from my dataset in R?

I have a dataset containing 120 observations of 6 variables. Five variables are factors, 1 variable is my target variable.
I need to write a function that creates a matrix (for each factor) which contains each level of the factor as columns, the maximum value of the target variable as the first row, and the minimum value of the target variable as the second row.
I know how to create a matrix, however I am lost when I need to make it through a function.
Is there someone who can help?
Here is a simple example of what I want to reach with a fictive easy dataset.
[Image: example]
As you can see, for each level of the factor (on the picture factor 1), I want to indicate the highest value of the target, and the lowest value of the target.
Here is a subset of my own data:
> dput(data_plu[1:4, ])
structure(list(NaNO3 = structure(c(2L, 8L, 8L, 3L), .Label = c("10",
"14", "18", "2", "22", "26", "30", "6"), class = "factor"),
CaCl2 = structure(c(4L,
8L, 8L, 8L), .Label = c("0.1", "0.28", "0.46", "0.64", "0.82",
"1", "1.19", "1.37"), class = "factor"), PO4 = structure(c(1L,
5L, 5L, 6L), .Label = c("0.1", "0.8", "1.5", "2.2", "2.9", "3.6",
"4.3", "5"), class = "factor"), NH4Cl = structure(c(5L, 3L, 3L,
6L), .Label = c("0.5", "10.86", "12.93", "15", "2.58", "4.65",
"6.72", "8.79"), class = "factor"), MgSO4 = structure(c(4L, 7L,
1L, 7L), .Label = c("0.21", "0.35", "0.5", "0.64", "0.79", "0.93",
"1.08", "1.22"), class = "factor"), DC = c(15000L, 707500L, 720000L,
872500L)), row.names = c(NA, 4L), class = "data.frame")
You may be able to modify this to meet your needs. I wrote a function to handle one factor and then used lapply to handle them all. I've called your sample data dta:
stats <- function(x, y) {
minmax <- aggregate(y, list(x), range)
cols <- minmax[, 1]
result <- as.matrix(t(minmax[, -1]))
dimnames(result) <- list(c("Min", "Max"), Levels=as.character(cols))
return(result)
}
out <- lapply(dta[, -6], function(x) stats(x, dta$DC))
head(out, 1)
# $NaNO3
# Levels
# 14 18 6
# Min 15000 872500 707500
# Max 15000 872500 720000
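If the row order from the question matters (maximum first, minimum second), a variant built on tapply() is one possible alternative. This is only a sketch on a fictive data set; the function name max_min and the columns f1, f2 and target are made up for illustration.

```r
# Fictive data, for illustration only
dta <- data.frame(
  f1     = factor(c("a", "a", "b", "b", "c")),
  f2     = factor(c("x", "y", "x", "y", "x")),
  target = c(10, 30, 5, 20, 7)
)

# One matrix per factor: levels as columns, Max in row 1, Min in row 2
max_min <- function(x, y) {
  x <- droplevels(x)  # ignore levels that do not occur in the data
  rbind(Max = tapply(y, x, max), Min = tapply(y, x, min))
}

out <- lapply(dta[, c("f1", "f2")], max_min, y = dta$target)
out$f1
```

The droplevels() call avoids NA columns for factor levels that are defined but unused, which is likely in a small subset such as the four rows shown in the question.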

How to replicate random effects in lme4 from SAS?

I want to run a linear mixed model on a dependent variable DV that is collected under two different levels of Condition at three different levels of Timepoint. The data is structured as follows:
## dput(head(RawData,5))
structure(list(Participant = structure(c(2L, 2L, 2L, 2L, 4L),
.Label = c("Jessie", "James", "Gus", "Hudson", "Flossy",
"Bobby", "Thomas", "Alfie", "Charles", "Will", "Mat", "Paul", "Tim",
"John", "Toby", "Blair"), class = "factor"),
xVarCondition = c(1, 1, 0, 0, 1),
Measure = structure(c(1L, 2L, 3L, 4L, 1L),
.Label = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12"), class = "factor"),
Sample = structure(c(1L, 2L, 1L, 2L, 1L),
.Label = c("1", "2"), class = "factor"),
Condition = structure(c(2L, 2L, 1L, 1L, 2L),
.Label = c("AM", "PM"), class = "factor"),
Timepoint = structure(c(2L, 2L, 2L, 2L, 1L),
.Label = c("Baseline", "Mid", "Post"), class = "factor"),
DV = c(83.6381348645853, 86.9813802115179, 69.2691666620429,
71.3949807856125, 87.8931998204771)),
.Names = c("Participant", "xVarCondition", "Measure",
"Sample", "Condition", "Timepoint", "DV"),
row.names = c(NA, 5L), class = "data.frame")
Each Participant performs two trials per Condition across three Timepoints as depicted by Measure; however, there are missing data so not necessarily 12 levels per participant. The column xVarCondition is simply a dummy variable that includes a 1 for each entry of AM in Condition. The column Sample refers to the 2 trials for each Condition at each Timepoint.
I am an R user but the statistician is a SAS user who believes the code for the model should be:
proc mixed data=RawData covtest cl alpha=&alpha;
class Participant Condition Timepoint Measure Sample;
model &dep=Condition Timepoint/s ddfm=sat outp=pred residual noint;
random int xVarCondition xVarCondition*TimePoint*Sample
TimePoint/subject=Participant s;
The above SAS code gives sensible answers and is working perfectly. We believe the resulting lme4 syntax for the above model to be:
TestModel = lmer(DV ~ Condition + Timepoint +
(1 | Participant/Timepoint) +
(0 + xVarCondition | Participant) +
(1 | Participant:xVarCondition:Measure), data = RawData)
However, I get the following error when running this model:
Error: number of levels of each grouping factor must be < number of observations
Are the random effects specified correctly?
I can't quite tell from your description, but most likely your Participant:xVarCondition:Measure term constructs a grouping variable that has no more than one observation in each level. That makes the (1|Participant:xVarCondition:Measure) term redundant with the residual error term, which is always included in an lmer model. You can override the error if you really want to by including
control=lmerControl(check.nobs.vs.nlev = "ignore")
in your function call, but (if I've diagnosed the problem correctly) this will lead to the residual variance and the Participant:xVarCondition:Measure variance being jointly unidentifiable. Such unidentifiability usually doesn't cause any problems with the rest of the model, but I am more comfortable with an identifiable model (there's always the possibility that such unidentifiability will lead to numerical issues).
There's a similar example here.
You can check my conjecture as follows:
ifac <- with(RawData,
interaction(Participant,xVarCondition,Measure,drop=TRUE))
length(levels(ifac)) == nrow(RawData)
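For a sense of what that check returns when the conjecture holds, here is a small sketch on hypothetical toy data (two participants, four measures each, one row per measure). The toy values are invented and only mimic the structure described in the question.

```r
# Hypothetical toy data: every Participant x xVarCondition x Measure cell has one row
toy <- data.frame(
  Participant   = factor(rep(c("P1", "P2"), each = 4)),
  xVarCondition = rep(c(1, 1, 0, 0), times = 2),
  Measure       = factor(rep(1:4, times = 2))
)

ifac <- with(toy, interaction(Participant, xVarCondition, Measure, drop = TRUE))

# Number of observations falling into each level of the constructed grouping factor
obs_per_level <- table(ifac)

# TRUE means as many levels as rows, i.e. the term duplicates the residual
one_per_level <- nlevels(ifac) == nrow(toy)
max(obs_per_level)
```

If max(obs_per_level) is 1 on the real data, every level of the grouping factor holds a single observation, which is exactly the situation lmer refuses by default.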

Extracting data frames from a list based on column names in r

I am looking at extracting df's from within a list of multiple df's into separate data frames based on a condition (if the column names of a df within the list contains the name I am looking for).
For illustration purposes I have created an example which resembles the situation I am in.
I have a list with multiple data frames, and the dput of that list is given below:
structure(list(V1 = structure(list(lvef = c(0.965686195194885,
0.0806777632648268, -0.531729196500083, -0.511913109608259, -0.413670941196816,
-0.0501899795864357, -0.337583918771946, 1.16086745780346, -0.478358865835724,
-1.95009138673888), hbc = c(-0.389950511350405, -0.904388183933348,
0.811821977223064, -0.868381700124344, -0.637307418402866, -1.04703715824204,
-0.394340445217658, -0.194653869597247, 0.00822402232044511,
-0.145032587618231), id = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "NA", class = "factor")), .Names = c("lvef",
"hbc", "id"), row.names = c(NA, -10L), class = "data.frame"),
V2 = structure(list(ersta = c(-0.254360310986174, 0.3859806928747,
-0.135741797055127, 1.03929145413636, -0.484219739337178,
0.255476285148917, 1.0479422937128, 0.146613094683722, -0.914377222535014,
1.75052418161618, -0.275059500684816, 2.34861397588234, 0.00183723766664941,
0.97612891408903, 0.278868537504227, 0.456979477254684, 1.46323739326792,
0.664511602217853, 0.870420202897545, 1.38228375734407),
pgrsta = c(-1.49129812271989, 0.820330747101906, -0.0469488167129374,
0.471549380446308, -1.71312120132398, 0.0578140025416816,
1.67016363826724, 0.226180835709491, -2.00294530465909,
-0.0464857361954717, 0.306942902768782, -0.785096914460742,
0.283822632249141, -0.260774679911329, -1.2865970194309,
0.307972619170242, 0.223715024597144, -1.01642533651475,
-0.12229427204957, 0.223326519096996), id = structure(c(7L,
7L, 7L, 7L, 4L, 1L, 3L, 5L, 6L, 2L, 7L, 7L, 7L, 7L, 4L,
1L, 3L, 5L, 6L, 2L), class = "factor", .Label = c("-0.10863576856322",
"-0.317324527228699", "-0.422764348315332", "0.285132258310185",
"1.23305496219042", "1.39326602279981", "NA"))), .Names = c("ersta",
"pgrsta", "id"), row.names = c(NA, -20L), class = "data.frame"),
V3 = structure(list(hormrec = 1:15, event = structure(c(10L,
10L, 10L, 10L, 10L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"NA"), class = "factor")), .Names = c("hormrec", "event"), row.names = c(NA,
-15L), class = "data.frame"), V4 = structure(list(asat = c(-0.321423784000631,
0.181345361079582, 0.389158724418319, -1.15251833725336,
-0.351981383678293, -0.506888212379408, 0.870705917350059,
-0.626883041051641, -0.321843006223371, -0.674564527029912,
-0.609383943267379, -0.181661119817784, -1.63676077872658
), lab = structure(c(1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 2L), .Label = c("btest", "NA", "rtest"), class = "factor")), .Names = c("asat",
"lab"), row.names = c(NA, -13L), class = "data.frame")), .Names = c("V1",
"V2", "V3", "V4"))
I am trying to extract data frames from the list based on the condition that if a data frame within the list contains the column name/s required then that data frame from the list should go into a separate data frame. So far, I have been able to extract the data frames into a list using the following code:
# function to extract required df's
trial <- function(x)
{
reqname <- c("hbc","ersta") # column names to check for
data <- x
lapply(seq(data), function(i){ # loop through all the data frames in the list
y <- data.frame(data[[i]]) # extract df in y
names <- names(y) # extract names of df
for(a in 1:length(reqname)) # loop through the length of reqname
{
if(reqname[a]%in%names) # check if column name/s present in current df
{
z <- y # extract df into another df
return(z) # return df
}
}
}
)
}
The above function returns a list of matching df's along with nulls where there was not a match. I am looking for a modification so that the selected data frame comes out separately. If there are two df's matching the requirement then the output should be two separate data frames.
I will appreciate all and any help in finding a solution.
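You can use vapply() (a type-checked relative of lapply()) plus a small custom function to identify the wanted data frames. For instance, if x is your list,
trial <- function(x)
{
reqnames <- c("hbc","ersta") # column names to check for
# TRUE for each data frame that contains at least one of the required names
keep <- vapply(x, function(df) any(names(df) %in% reqnames), logical(1))
x[keep]
}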
This outputs a list with only the dataframes containing at least one of the names in reqnames.
We can remove the NULL elements with Filter
lst1 <- Filter(length, trial(lst))
If we need multiple data.frame objects in the global environment, use list2env after renaming the list elements with the object names
names(lst1) <- paste0('dat', seq_along(lst1))
list2env(lst1, envir = .GlobalEnv)
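Put together, the filter-then-export idea might look like the sketch below. It runs on a small stand-in list rather than the dput from the question, and it exports into a fresh environment (env, a name chosen here) instead of .GlobalEnv so that nothing in the workspace is overwritten by accident.

```r
# Small stand-in for the list of data frames in the question
lst <- list(
  V1 = data.frame(lvef = 1:3, hbc = 4:6),
  V2 = data.frame(ersta = 1:2, pgrsta = 3:4),
  V3 = data.frame(hormrec = 1:2)
)
reqnames <- c("hbc", "ersta")

# Keep only the data frames that contain at least one required column name
keep <- vapply(lst, function(df) any(names(df) %in% reqnames), logical(1))
matched <- lst[keep]

# Rename the elements to the object names they should get, then export them
names(matched) <- paste0("dat", seq_along(matched))
env <- new.env()
list2env(matched, envir = env)
ls(env)
```

Replacing env with .GlobalEnv gives the behaviour described above, with dat1 and dat2 available as separate data frames.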
