I am new to R programming. I was trying to visualize a dataset with googleVis in R, but the chart could not be created.
The error I got was:
Error: Length of logical index vector must be 1 or 8, got: 14835
Can someone help?
The dataset is here:
https://www.kaggle.com/c/predict-west-nile-virus/data
The code is below:
# Read competition data files:
library(readr)
data_dir <- "C:/Users/Wesley/Desktop/input"
train <- read_csv(file.path(data_dir, "train.csv"))
spray <- read_csv(file.path(data_dir, "spray.csv"))
# Generate output files with write_csv(), plot() or ggplot()
# Any files you write to the current directory get shown as outputs
# Install and read packages
library(lubridate)
library(googleVis)
# Create useful date columns
spray$Date <- as.Date(as.character(spray$Date),format="%Y-%m-%d")
spray$Week <- isoweek(spray$Date)
spray$Year <- year(spray$Date)
# Create a total count of measurements
spray$Total <- 1
for(i in 1:nrow(spray)) {
  spray$Total[i] = i
}
# Aggregate data by Year, Week, Trap and order by old-new
spray_agg <- aggregate(cbind(Total)~Year+Week+Latitude+Longitude,data=spray,sum)
spray_agg <- spray[order(spray$Year,spray$Week),]
# Create a misc format for Week for Google Vis Motion Chart
spray_agg$Week_Format <- paste(spray_agg$Year,"W",spray_agg$Week,sep="")
# Function to create a motion chart together with a overview table
# It takes the aggregated data as input as well as a year of choice (2007,2009,2011,2013)
# It filters out "no presence" weeks since they distort the graphical view
# Next to that it creates an overview table of that year
# With gvisMerge you can merge the 3 html outputs into 1
create_motion <- function(data=spray_agg,year=2011){
  data_motion <- data[data$Year==year]
  motion <- gvisMotionChart(data=data_motion,idvar="Total",timevar="Week_Format",xvar="Longitude",yvar="Latitude"
                            ,sizevar=0.1,colorvar="Blue",options=list(width="600"))
  return(motion)
}
# Get the per year motion charts
#motion1 <- create_motion(spray_agg,2007)
#motion2 <- create_motion(spray_agg,2009)
motion3 <- create_motion(spray_agg,2011) # Error: Length of logical index vector must be 1 or 8, got: 14835
motion4 <- create_motion(spray_agg,2013) # Error: Length of logical index vector must be 1 or 8, got: 14835
# Merge them together into 1 dashboard
output <- gvisMerge(gvisMerge(motion1,motion2,horizontal=TRUE),gvisMerge(motion3,motion4,horizontal=TRUE),horizontal=FALSE)
plot(output)
# Plot the output in your browser
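The error message suggests that data[data$Year==year] is being read as a column selection. With a single index and no comma, a data frame (or the tibble that read_csv() returns) selects columns, so a logical vector with one entry per row (14835) does not fit the 8 columns. The following is only a minimal sketch of that difference in base R. The toy data frame and its values are made up for illustration and are not the Kaggle data, and a base data frame words its error differently from a tibble.

```r
# Hypothetical toy data standing in for spray_agg (not the Kaggle data)
toy <- data.frame(Year  = c(2007, 2007, 2009, 2011, 2011),
                  Week  = c(30, 31, 29, 30, 31),
                  Total = 1:5)

# With a comma, the logical vector selects ROWS and all columns are kept
rows_2011 <- toy[toy$Year == 2011, ]

# Without the comma, the same vector is treated as a COLUMN index.
# It has 5 entries (one per row) but there are only 3 columns, so R stops.
no_comma <- tryCatch(toy[toy$Year == 2011],
                     error = function(e) conditionMessage(e))

print(rows_2011)
print(no_comma)   # the error message, captured as a string
```

If this is indeed the cause, writing data[data$Year==year, ] inside create_motion() would be the corresponding change.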
I am looking to achieve something along the lines of what is done here, with the intention of creating a drug-target interaction network.
I have downloaded data from here and I would like to reproduce that network.
My data has the below form:
#Drug Gene
DB00357 P05108
DB02721 P00325
DB00773 P23219
DB07138 Q16539
DB08136 P24941
DB01242 P23975
DB01238 P08173
DB00186 P48169
DB00338 P10635
DB01151 P08913
DB01244 P05023
DB01745 P07477
DB01996 P08254
I consulted this previous post as a first step in order to create the similarity matrix. The resulting matrix on the entire data set is large, so I tried recreating the procedure on a smaller data frame, as shown below.
# packages used
library("qgraph")
library("dplyr")
drugs <- c("DB00357","DB02721","DB00773",
"DB07138","DB08136",
"DB01242","DB01238",
"DB00186","DB00338",
"DB01151","DB01244",
"DB01745","DB01996")
genes <- c("P05108", "P00325","P23219",
"Q16539","P24941",
"P23975","P08173",
"P48169","P10635",
"P08913","P05023",
"P07477","P08254")
# Dataframe with a small subset of observations
df <- data.frame(drugs, genes)
# Consulting the other post
b <- df %>% full_join(df, by = "genes")
tb <- table(b$drugs.x, b$genes)
I believe my next step is to create the correlation matrix and the network, as in the guide I'm trying to replicate. This is where I run into issues; my attempts are documented below:
# Follow guide trying to replicate correlation matrix
cormatrix <- cor_auto(tb)
### Error ###
"Removing factor variables: Var1; Var2
Error in data[, sapply(data, function(x) mean(is.na(x))) != 1] :
incorrect number of dimensions"
So I tried using cor() instead, and this works. However, when I apply it to the entire data frame, it just keeps running and never produces any output.
# Second way, using cor() instead to replicate correlation matrix
cormatrix <- cor(tb)
graph1 <- qgraph(tb, verbose = FALSE)
Therefore, I wonder if anyone has any ideas on how to get it to run properly and produce the network as intended?
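As an alternative to correlations, and offered only as a sketch rather than as the method of the guide being followed, the drug-by-gene table can be turned directly into a drug-by-drug matrix that counts shared targets. The IDs below (D1, G1, ...) are hypothetical and chosen so that some drugs share a gene; in the 13-row sample above every drug has its own gene, so the result would contain no edges.

```r
# Hypothetical toy edge list: one row per drug-gene interaction
# (IDs are made up for illustration, not taken from the real data)
edges <- data.frame(drug = c("D1", "D1", "D2", "D2", "D3"),
                    gene = c("G1", "G2", "G2", "G3", "G3"))

# Incidence matrix: rows = drugs, columns = genes, entries = 0/1
inc <- unclass(table(edges$drug, edges$gene))

# Drug-by-drug matrix: entry [i, j] counts the genes shared by drugs i and j
shared <- tcrossprod(inc)
diag(shared) <- 0   # a drug sharing targets with itself is not an edge

print(shared)
```

A symmetric weight matrix of this kind is the sort of input that network-plotting functions generally accept; whether it suits the intended drug-target figure is a judgment call.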
I'm trying to visualize monthly averages of RegCM output that I joined using CDO, but I haven't been able to do it.
To do that, I was looking for a way to plot the monthly averages of my variable "pr", as you can in GrADS.
I found that one way to do this is with the brick function from the raster library, so I tried the code suggested in another question to convert my netCDF file into a raster brick:
NetCDF to Raster Brick "Unable to find inherited method for function 'brick' for 'ncdf4'"
# load package
library(sp)
library(raster)
library(ncdf4)
# read ncdf file
nc<-nc_open('dat.nc')
# extract variable name, size and dimension
v <- nc$var[[1]]
size <- v$varsize
dims <- v$ndims
nt <- size[dims] # length of time dimension
lat <- nc$dim$xlat$vals # latitude position
lon <- nc$dim$xlong$vals # longitude position
# read pr variable
r<-list()
for (i in 1:nt) {
  start <- rep(1,dims) # begin with start=(1,1,...,1)
  start[dims] <- i # change to start=(1,1,...,i) to read timestep i
  count <- size # begin with count=(nx,ny,...,nt), reads entire var
  count[dims] <- 1 # change to count=(nx,ny,...,1) to read 1 tstep
  dt<-ncvar_get(nc, varid = 'pr', start = start, count = count)
  # convert to raster
  r[i]<-raster(dt)
}
# create layer stack with time dimension
r<-stack(r)
# transpose the raster to have correct orientation
rt<-t(r)
extent(rt)<-extent(c(range(lon), range(lat)))
# plot the result
spplot(rt)
But when I run the for loop in the code, I get the following error:
Error in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, :
Error: variable has 3 dims, but start has 2 entries. They must match!
The file I'm trying to visualize can be found at the following link:
https://drive.google.com/file/d/13KsOpnt-Wk2v93WwGcOU6AHw8KGOFlai/view?usp=sharing
I would really appreciate any insights with this problem!
Summary: Despite a complicated lead-up, the solution was very simple: In order to plot a row of a dataframe as a line instead of a lattice, I needed to transpose the data in order to invert from x obs of y variables to y obs of x variables.
I am using RStudio on a Windows 10 computer.
I am using scientific equipment to write measurements to a csv file. Then I ZIP several files and read them into R using read.csv. However, the data frame behaves strangely: the commands "length" and "dim" disagree, and the "plot" function throws errors. Because I can create simulated data that doesn't throw the errors, I think the problem is either in how the machine wrote the data or in my loading and processing of the data.
Two ZIP files are located in my stackoverflow repository (with "Monterey Jack" in the name):
https://github.com/baprisbrey/stackoverflow
Here is my code for reading and processing them:
# Packages used below
library(stringr)  # for str_remove()
library(magrittr) # for %>%
# Unzip the folders
unZIP <- function(folder){
  orig.directory <- getwd()
  setwd(folder)
  zipped.folders <- list.files(pattern = ".*zip")
  for (i in zipped.folders){
    unzip(i)}
  setwd(orig.directory)
}
folder <- "C:/Users/user/Documents/StackOverflow"
unZIP(folder)
# Load the data into a list of lists
pullData <- function(folder){
  orig.directory <- getwd()
  setwd(folder)
  #zipped.folders <- list.files(pattern = ".*zip")
  #unzipped.folders <- list.files(folder)[!(list.files(folder) %in% zipped.folders)]
  unzipped.folders <- list.dirs(folder)[-1] # Removing itself as the first directory.
  oData <- vector(mode = "list", length = length(unzipped.folders))
  names(oData) <- str_remove(unzipped.folders, paste(folder,"/",sep=""))
  for (i in unzipped.folders) {
    filenames <- list.files(i, pattern = "*.csv")
    #setwd(paste(folder, i, sep="/"))
    setwd(i)
    files <- lapply(filenames, read.csv, skip = 5, header = TRUE, fileEncoding = "UTF-16LE") #Note unusual encoding
    oData[[str_remove(i, paste(folder,"/",sep=""))]] <- vector(mode="list", length = length(files))
    oData[[str_remove(i, paste(folder,"/",sep=""))]] <- files
  }
  setwd(orig.directory)
  return(oData)
}
theData <- pullData(folder) #Load the data into a list of lists
# Process the data into frames
bigFrame <- function(bigList) {
  # where bigList is theData, i.e. the result of pullData
  # initialize the holding list of frames per set
  preList <- vector(mode="list", length = length(bigList))
  names(preList) <- names(bigList)
  # process the data
  for (i in 1:length(bigList)){
    step1 <- lapply(bigList[[i]], t) # transpose each data
    step2 <- do.call(rbind, step1) # roll it up into its own matrix #original error that wasn't reproduced: It showed length(step2) = 24048 when i = 1 and dim(step2) = 48 501. Any comments on why?
    firstRow <- step2[1,] # holding onto the first row to become the names
    step3 <- as.data.frame(step2) # turn it into a frame
    step4 <- step3[grepl("µA", rownames(step3)),] # Get rid of all those excess name rows
    rownames(step4) <- 1:(nrow(step4)) # change the row names to rowID's
    colnames(step4) <- firstRow # change the column names to the first row steps
    step4$ID <- rep(names(bigList[i]),nrow(step4)) # Add an I.D. column
    step4$Class[grepl("pos",tolower(step4$ID))] <- "Yes" # Add "Yes" class
    step4$Class[grepl("neg",tolower(step4$ID))] <- "No" # Add "No" class
    preList[[i]] <- step4
  }
  # bigFrame <- do.call(rbind, preList) #Failed due to different number of measurements (rows that become columns) across all the data sets
  # return(bigFrame)
  return(preList) # Works!
}
frameList <- bigFrame(theData)
monterey <- rbind(frameList[[1]],frameList[[2]])
# Odd behaviors
dim(monterey) #48 503
length(monterey) #503 #This is not reproducing my original error of length = 24048
rowOne <- monterey[1,1:(ncol(monterey)-2)]
plot(rowOne) #Error in plot.new() : figure margins too large
#describe the data
quantile(rowOne, seq(0, 1, length.out = 11) )
quantile(rowOne, seq(0, 1, length.out = 11) ) %>% plot #produces undesired lattice plot
# simulate the data
doppelganger <- sample(1:20461,501,replace = TRUE)
names(doppelganger) <- names(rowOne)
# describe the data
plot(doppelganger) #Successful scatterplot. (With my non-random data, I want a line where the numbers in colnames are along the x-axis)
quantile(doppelganger, seq(0, 1, length.out = 11) ) #the random distribution is mildly different
quantile(doppelganger, seq(0, 1, length.out = 11) ) %>% plot # a simple line of dots as desired
# investigating structure
str(rowOne) # results in a dataframe of 1 observation of 501 variables. This is a correct interpretation.
str(as.data.frame(doppelganger)) # results in 501 observations of 1 variable. This is not a correct interpretation but creates the plot that I want.
How do I convert the rowOne to plot like doppelganger?
It looks like one of my errors is not reproducing, where calls to "dim" and "length" apparently disagree.
However, I'm confused as to why the "plot" function is producing a lattice plot on my processed data and a line of dots on my simulated data.
What I would like is to plot each row of data as a line. (Next, and out of the scope of this question, is I would like to classify the data with adaboost. My concern is that if "plot" behaves strangely then the classifier won't work.)
Any tips or suggestions or explanations or advice would be greatly appreciated.
Edit: Investigating the structure with ("str") of the two examples explains the difference between plots. I guess my modified question is, how do I switch between the two structures to enable plotting a line (like doppelganger) instead of a lattice (like rowOne)?
I am answering my own question.
I am leaving aside the part about the discrepancy between "length" and "dim" since I can't provide a reproducible example. However, I'm happy to leave it up for comment.
The answer is that in order to produce my plot, I simply have to transpose the row as follows:
rowOne %>% t() %>% as.data.frame() %>% plot
This inverts the structure from one observation of 501 variables to 501 obs of one variable as follows:
rowOne %>% t() %>% as.data.frame() %>% str()
#'data.frame': 501 obs. of 1 variable:
# $ 1: num 8712 8712 8712 8712 8712 ...
Because of the unusual encoding I used, and the strange "length" result, I failed to see a simple solution to my "plot" problem.
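For the line plot asked about in the question (numbers from the column names along the x-axis), a variant of the same idea is to flatten the row into a named vector and plot it with type = "l". This is a minimal sketch only: row_one and its values are a hypothetical stand-in for rowOne, and it assumes the column names can be converted to numbers and the cell values are numeric.

```r
# Hypothetical stand-in for rowOne: 1 observation of 5 variables,
# with column names that look like numeric steps
row_one <- as.data.frame(matrix(c(8712, 8712, 8650, 8600, 8590), nrow = 1))
names(row_one) <- c("0", "0.5", "1", "1.5", "2")

# unlist() flattens the single row into a named numeric vector
y <- unlist(row_one)
x <- as.numeric(names(y))   # column names become the x-axis values

# Draw to a null PDF device so the sketch also runs without a display;
# drop the pdf()/dev.off() lines to see the plot interactively
pdf(NULL)
plot(x, y, type = "l", xlab = "step", ylab = "measurement")
dev.off()
```

Like the t() approach, this turns one observation of many variables into many observations of one variable; the difference is only that the result is a plain vector rather than a data frame.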
Hi all, I am a novice in R and would appreciate your hints on this case.
I've been struggling to convert the variables (objects) in my dataframe to strings and plot them using a for loop, as detailed below.
COUNTRY: China Belgium ...
COMPANY: XXX Inc. YYY Inc. ...
Here, COUNTRY and COMPANY are categorical variables.
I've used toString() as well as as.character() to convert the variable name to a string so I can specify the plot name, but I can't seem to get it to work. I need the 4 variables listed in the code below inside the for loop for 2 purposes:
1. as a string for naming the plot
2. for use in barplot()
but neither the string conversion nor the for loop works the way I intended.
Could somebody assist me with the proper command for this purpose?
Your help is greatly appreciated...
Kind regards,
CODE
Frequency_COUNTRY <- table(COUNTRY)#Get Frequency for COUNTRY
Relative_Frequency_COUNTRY <- table(COUNTRY) / length(COUNTRY)#Get Relative
#Frequency (Percentage %) for Variable COUNTRY
Frequency_COMPANY <- table(COMPANY) #Get Frequency and Relative Frequency for COMPANY
Relative_Frequency_COMPANY <- table(COMPANY) / length(COMPANY)
Categorical_Variable_List = c(Frequency_COUNTRY,
                              Relative_Frequency_COUNTRY ,
                              Frequency_COMPANY,
                              Relative_Frequency_COMPANY) # Get list of 4 variables above
for (Categorical_Variable in Categorical_Variable_List){ #Plot 4 variables using a for loop
  A = toString(Categorical_Variable) #Trying to convert non-string variable name to string
  plotName <- paste("BarChart_", A, sep = "_") # Specify plot name, e.g. BarChart_Frequency_COUNTRY
  png(file = plotName) #Create png file
  barplot(Categorical_Variable) #use barplot() to make graph
  dev.off() # Switch off dev
}
Your code is treating Categorical_Variable_List as if it were a named list of categorical variables. It is neither.
The following code corrects those errors and plots a graph of 4 barplots. In your code, remove the two calls to par, one before and the other after the for loop.
I will make up a dataset, to test the code.
set.seed(1234)
n <- 20
COUNTRY <- sample(LETTERS[1:5], n, TRUE)
COMPANY <- sample(letters[1:4], n, TRUE)
Frequency_COUNTRY <- table(COUNTRY) # Get Frequency for COUNTRY
Relative_Frequency_COUNTRY <- table(COUNTRY) / length(COUNTRY)#Get Relative
# Frequency (Percentage %) for Variable COUNTRY
Frequency_COMPANY <- table(COMPANY) # Get Frequency and Relative Frequency for COMPANY
Relative_Frequency_COMPANY <- table(COMPANY) / length(COMPANY)
Variable_List <- list(Frequency_COUNTRY = Frequency_COUNTRY,
Relative_Frequency_COUNTRY = Relative_Frequency_COUNTRY,
Frequency_COMPANY = Frequency_COMPANY,
Relative_Frequency_COMPANY = Relative_Frequency_COMPANY) # Get list of 4 variables above
Variable_Name <- names(Variable_List)
old_par <- par(mfrow = c(2, 2))
for (i in seq_along(Variable_List)){ # Plot 4 variables using a for loop
  plotName <- paste("BarChart", Variable_Name[[i]], sep = "_") # Specify plot name
  print(plotName) # for debugging only
  #png(file = plotName) # Create png file
  barplot(Variable_List[[i]]) # use barplot() to make graph
  #dev.off() # Switch off dev
}
par(old_par)
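To see why the original c() call did not produce a named list, it may help to compare c() and list() on two small tables. This is only an illustrative sketch with made-up data; the shortened names freq_country and freq_company are not from the question.

```r
# Small made-up vectors for illustration
COUNTRY <- c("A", "B", "A", "C", "B", "A")
COMPANY <- c("x", "y", "x", "x", "y", "z")

freq_country <- table(COUNTRY)
freq_company <- table(COMPANY)

# c() flattens the two tables into a single vector of counts,
# so a for loop over it visits one number at a time
flat <- c(freq_country, freq_company)

# list() keeps each table intact and lets each element carry a name
kept <- list(Frequency_COUNTRY = freq_country,
             Frequency_COMPANY = freq_company)

print(length(flat))   # one entry per category, across both tables
print(names(kept))    # names that can be pasted into a plot file name
```

This is why the corrected code loops over seq_along(Variable_List) and looks up both the table and its name by position.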
I have some hierarchical data, e.g.,
> library(dplyr)
> df <- data_frame(id = 1:6, parent_id = c(NA, 1, 1, 2, 2, 5))
> df
Source: local data frame [6 x 2]

     id parent_id
  (int)     (dbl)
1     1        NA
2     2         1
3     3         1
4     4         2
5     5         2
6     6         5
I would like to plot the tree in a "top down" view through a circle packing plot:
http://bl.ocks.org/mbostock/4063530
The above link is for a d3 library. Is there an equivalent that allows me to make such a plot in ggplot2?
(I want this plot in a shiny app, which does support d3, but I haven't used d3 before and am unsure about the learning curve. If d3 is the obvious choice, I will try to get that working instead. Thanks.)
There were two steps: (1) aggregate the data, then (2) convert to json. After that, all the javascript has been written in that example page, so you can just plug in the resulting json data.
Since the aggregated data should have a similar structure to a treemap, we can use the treemap package to do the aggregation (could also use a loop with successive aggregation). Then, d3treeR (from github) is used to convert the treemap data to a nested list, and jsonlite to convert the list to json.
I'm using some example data GNI2010, found in the d3treeR package. You can see all of the source files on plunker.
library(treemap)
library(d3treeR) # devtools::install_github("timelyportfolio/d3treeR")
library(data.tree)
library(jsonlite)
## Get treemap data using package treemap
## Using example data GNI2010 from d3treeR package
data(GNI2010)
## aggregate by these: continent, iso3,
## size by population, and color by GNI
indexList <- c('continent', 'iso3')
treedat <- treemap(GNI2010, index=indexList, vSize='population', vColor='GNI',
type="value", fun.aggregate = "sum",
palette = 'RdYlBu')
treedat <- treedat$tm # pull out the data
## Use d3treeR to convert to nested list structure
## Call the root node 'flare' so we can just plug it into the example
res <- d3treeR:::convert_treemap(treedat, rootname="flare")
## Convert to JSON using jsonlite::toJSON
json <- toJSON(res, auto_unbox = TRUE)
## Save the json to a directory with the example index.html
writeLines(json, "d3circle/flare.json")
I also replaced the source line in the example index.html with
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js"></script>
Then fire up the index.html and you should see the circle packing plot.
Creating the shiny bindings should be doable using htmlwidgets and following some examples (the d3treeR source has some). Note that certain things aren't working, like the coloring. The json that gets stored here actually contains a lot of information about the nodes (all the data aggregated using the treemap) that you could leverage in the figure.
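For data that is already in id/parent_id form, as in the question, the nested list could also be built by hand instead of going through treemap. The following is a rough base-R sketch, not part of the d3treeR workflow above: build_node is a hypothetical helper, and it assumes a single root (the row whose parent_id is NA) and the name/children layout used by the example's flare.json.

```r
# The id/parent_id table from the question (NA marks the root)
df <- data.frame(id = 1:6, parent_id = c(NA, 1, 1, 2, 2, 5))

# Hypothetical helper: recursively build list(name, children) for one node
build_node <- function(node_id, data) {
  child_ids <- data$id[!is.na(data$parent_id) & data$parent_id == node_id]
  node <- list(name = as.character(node_id))
  if (length(child_ids) > 0) {
    node$children <- lapply(child_ids, build_node, data = data)
  }
  node
}

root_id <- df$id[is.na(df$parent_id)]
tree <- build_node(root_id, df)

str(tree, max.level = 2)
```

The resulting list can be passed to toJSON(tree, auto_unbox = TRUE) as in the code above. The leaves would probably also need a numeric size field for the circles to be drawn, which this table does not contain.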