This is what the chart currently looks like:
This is all the data in the database that the chart is currently using:
Id (Key)   Confidence   Love   Stress   Date/Time
193        0            0      0        12/3/2010 11:33:47 PM
194        55           55     55       12/3/2010 11:34:04 PM
195        30           40     20       12/3/2010 11:34:11 PM
196        40           50     30       12/3/2010 11:34:20 PM
197        50           60     40       12/3/2010 11:34:28 PM
198        60           70     50       12/3/2010 11:34:45 PM
199        70           80     60       12/3/2010 11:34:53 PM
200        80           90     70       12/3/2010 11:34:59 PM
201        20           3      11       12/3/2010 11:36:42 PM
202        20           3      11       12/3/2010 11:37:08 PM
203        76           34     34       12/3/2010 11:37:41 PM
204        3            4      2        12/4/2010 12:14:15 AM
205        5            100    8        12/4/2010 12:17:57 AM
206        77           89     3        12/12/2010 8:08:49 PM
This is the SQL statement the chart is configured to use:
SELECT [ConfidenceLevel], [LoveLevel], [DateTime], [StressLevel]
FROM [UserData]
My issue is that in cases like this example, the data recorded around 12/4 loses its fidelity and becomes unreadable; it all blends together.
How can I configure the chart so that the last 20 days are always readable and don't blur together?
Thank You.
Here are some thoughts, although please be forgiving as it's hard to get to the root of your challenge without a little more context for the data and the purpose of the graph.
First, the data sample itself seems less than adequate. Indeed, it has gaping holes: no data from 12/5 to 12/11? With data, it's garbage in, garbage out. From that perspective, one of your challenges would be to start getting more consistent data.
The line graph falsely implies there is data for the 12/5 to 12/11 date range. Was the "Confidence" level on 12/8 really 38? Maybe it was 0? Maybe it was 140? The point is, we don't know, yet the connected line graph (by its very nature) is faking us into thinking we do know.
Also, if you're going to use a scale of days, then I don't understand how you can have multiple values for a single day and plot all of them. (Maybe that's not what you're doing.) I would think you would want to take the average of each category grouped by day and use those averages as your daily data values, as in the sketch below. And if you don't have data for a day, then refrain from plotting data you don't have.
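For instance, assuming the chart is fed from SQL Server (the bracketed identifiers suggest T-SQL), a daily-average query over the table from your question might look roughly like this:

-- Rough sketch, assuming SQL Server 2008+ and the table/columns from the question.
-- One averaged row per calendar day; days with no rows simply produce no point.
SELECT CONVERT(date, [DateTime]) AS [Day],
       AVG([ConfidenceLevel])    AS [AvgConfidence],
       AVG([LoveLevel])          AS [AvgLove],
       AVG([StressLevel])        AS [AvgStress]
FROM [UserData]
GROUP BY CONVERT(date, [DateTime])
ORDER BY [Day];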
I think the issue is with the chart type you selected, as compared to the density/sparsity of your data.
This reminds me a bit of a box-and-whisker type chart; also consider a bubble chart.
I have the following dataframe. I would like to know which bacteria contribute more when comparing the location of the bacteria (categorical) and its temperature (numeric).
For instance, at the end I would like to be able to say that a certain bacterial type is more frequently found in a certain location when looking at the temperature.
Bacillus Lactobacillus Janibacter Brevibacterium Lawsonella Location temperature
Sample1 2 30 164 8 21 48 bedroom 27
Sample2 0 211 0 996 195 108 bedroom 35
Sample3 1 938 1 21 38 43 pool 45
Sample4 0 95 17 1 4 334 pool 10
Sample5 0 192 91 25 1207 1659 soil 14
Sample6 0 12 33 6 12 119 soil 21
Sample7 0 16 3 0 0 805 soil 12
The idea is to run randomforest to select those features (bacteria) that are more important when looking at both the location and the temperature.
Is randomForest suitable for this? When I run the following command I get this error:
randomForest(Location+Temperature ~.,data=mydf)
Error in Location + Temperature : non-numeric argument to binary operator.
From the error it looks like I cannot use a continuous and a categorical variable together. How can I fix this?
Would converting the numeric temperature variable into ranges of temperature, so that it becomes a categorical variable, be a solution?
In fact, I have tried it and it worked: I converted the numeric temperature into ranges and pasted that onto the location, so that I have a combined location/temperature variable.
randomForest(Location_temperature ~.,data=dat)
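To make the preprocessing concrete, here is a rough sketch of the conversion I described; the bin edges, object names, and the fit on only the seven sample rows shown above are purely illustrative:

# Rough sketch, assuming 'mydf' holds the bacteria counts plus Location and temperature.
# Bin the numeric temperature, paste it onto Location, then fit the forest.
library(randomForest)
mydf$temp_range <- cut(mydf$temperature, breaks = c(0, 15, 30, 45), include.lowest = TRUE)
mydf$Location_temperature <- factor(paste(mydf$Location, mydf$temp_range, sep = "_"))
dat <- mydf[, setdiff(names(mydf), c("Location", "temperature", "temp_range"))]
fit <- randomForest(Location_temperature ~ ., data = dat, importance = TRUE)
importance(fit)   # which bacteria separate the location/temperature groups best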
I get the list of important bacteria, which is what I was looking for. Now, how can I know which bacteria contributes more to one location or another, given that my model was using all sites? For example, how can I check that an important variable (say Bacillus, the most important in the randomForest model) is important in the pool location, i.e. how much of the variation in the pool it explains?
I hope this is clear.
Given the following list of region coordinates:
x y Width Height
1 1 65 62
1 59 66 87
1 139 78 114
1 218 100 122
1 311 126 84
1 366 99 67
1 402 102 99
7 110 145 99
I wish to identify all possible rectangle combinations that can be formed by combining any two or more of the above rectangles. For instance, one rectangle could be
1 1 66 146 by combining 1 1 65 62 and 1 59 66 87
What would be the most efficient way to find all possible combinations of rectangles using this list?
Okay. I'm sorry for not being specific about the problem.
I have been working on an algorithm for object detection that identifies different windows across the image that might have the object. But sometimes, the object gets divided into several windows. So, when looking for objects, I want to use all the windows that I identified as well as try their different combinations for object detection.
So far I have tried using two nested loops and going across all the windows one by one: if the x coordinate of the inner-loop window lies within the outer-loop window, I merge the two by taking the left-most, top-most, right-most and bottom-most coordinates.
However, this has been taking a lot of time, and there are a lot of duplicates in the final output. I would like to know whether there is a more efficient and easier way to do this.
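To make the merging step concrete, here is a rough sketch of what the two loops currently do, written in R only for illustration (the question doesn't specify a language); the coordinates are the ones listed above, and inclusive-vs-exclusive edge conventions are left aside:

# The eight regions from the list above (x, y, width, height).
rects <- data.frame(
  x = c(1, 1, 1, 1, 1, 1, 1, 7),
  y = c(1, 59, 139, 218, 311, 366, 402, 110),
  w = c(65, 66, 78, 100, 126, 99, 102, 145),
  h = c(62, 87, 114, 122, 84, 67, 99, 99)
)

# Merge two windows by taking the left-most, top-most, right-most and bottom-most edges.
merge_pair <- function(a, b) {
  left   <- min(a$x, b$x)
  top    <- min(a$y, b$y)
  right  <- max(a$x + a$w, b$x + b$w)
  bottom <- max(a$y + a$h, b$y + b$h)
  data.frame(x = left, y = top, w = right - left, h = bottom - top)
}

# Crude overlap test on the x ranges only, as described in the question.
overlaps_x <- function(a, b) a$x <= (b$x + b$w) && b$x <= (a$x + a$w)

merged <- list()
for (i in 1:(nrow(rects) - 1)) {
  for (j in (i + 1):nrow(rects)) {
    if (overlaps_x(rects[i, ], rects[j, ])) {
      merged[[length(merged) + 1]] <- merge_pair(rects[i, ], rects[j, ])
    }
  }
}
merged <- unique(do.call(rbind, merged))   # drop duplicate merged rectangles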
I hope this information helps.
Morning Community,
I wanted to ask a quick question regarding rCharts graph outputs compared to native R.
Question 1: Why are graphs from rCharts displayed in my browser rather than the viewer in R?
Question 2: How can I force (or choose to use) the graphing function in native R instead?
See these two screen shots:
Code for native R:
# Simple Scatterplot
attach(mtcars)
plot(wt, mpg, main="Scatterplot Example",
xlab="Car Weight ", ylab="Miles Per Gallon ", pch=19)
Code for rChart:
library(rCharts)
myData <- read.csv("myData.csv")   # the data shown below; the file name is assumed
plot<-Highcharts$new()
plot$chart(polar = TRUE, type = "line",height=NULL)
plot$xAxis(categories=myData$Subject.ID, tickmarkPlacement= 'on', lineWidth=1)
plot$yAxis(gridLineInterpolation= 'circle', lineWidth=1, min=NULL,max=NULL,endOnTick=T,tickInterval=10)
plot$series(data = myData[,"A"],name = "A", pointPlacement="on")
plot
rChart Data used
Subject.ID A B C
1 1 65 29 60
2 2 87 67 59
3 3 98 54 24
4 4 67 44 23
5 5 54 50 4
6 6 83 60 54
7 7 82 55 27
8 8 80 48 32
9 9 88 56 44
10 10 68 68 56
11 11 90 76 69
12 12 41 47 45
13 13 NA 82 NA
14 14 NA 55 NA
PS: As an aside, I understand that I am graphing two different things, a scatterplot vs. a radar plot. My goal is to understand whether or not native R can display (or render, perhaps) the graph output from rCharts, even if I lose interactivity.
I reached out to the developer of rCharts and he replied:
"The native viewer that comes with the R GUI is NOT capable of displaying html. So, the only way to view html output like what rCharts generates is to use the browser. The RStudio viewer on the other hand is capable of displaying html and so rCharts takes advantage of that."
I am using a Lenovo laptop: CPU @ 2.20 GHz, 7.86 GB of usable memory, 64-bit Windows 8.
I am analyzing datasets in RStudio, usually with over 250,000 rows. The function reads a table (called ppt), goes through all of its rows, and makes decisions through the statements in the body of the while loop:
while (i < (length(ppt[,1]) - 192)) {
print(i)
.
.
.
.
i = i+1
}
After the code had run for some hours without finishing, I inserted the print(i) into the function to trace it.
For a table with 294,991 rows (size = 6.17 MB), i goes from 20 to 270,781 in about 14 seconds, then the output stops and no more values of i are printed, which I take to mean the code is no longer making progress even though it is still running. In fact, I have to hit STOP in order to continue working with RStudio.
Then I deleted some rows of this dataset, reducing it to 147,635 rows. Same thing, but now i goes from 20 to 147,400 (in about 8 seconds) and then appears to keep running while printing no more i's.
I made the data shorter still, at 37,000 rows. Now it goes all the way to the last row and finishes running.
Sample data:
> ppt<- read.csv("Flow_pptJoint - Copy - Copy.csv")
> ppt[60:70,]
date precip flow NA.
60 12/1/2003 14:45 NA 85 NA
61 12/1/2003 15:00 NA 85 NA
62 12/1/2003 15:15 NA 85 NA
63 12/1/2003 15:30 NA 85 NA
64 12/1/2003 15:45 NA 85 NA
65 12/1/2003 16:00 NA 83 NA
66 12/1/2003 16:15 NA 83 NA
67 12/1/2003 16:30 NA 83 NA
68 12/1/2003 16:45 NA 83 NA
69 12/1/2003 17:00 NA 83 NA
70 12/1/2003 17:15 NA 83 NA
I was wondering whether this could be a memory problem, and if so, how I could approach the issue.
Given your hardware, it seems unlikely that you are facing a memory issue (by the way, it generally helps to state the number of columns as well as rows, to give a more accurate idea of the size of the data). Also, memory issues generally end with an error along the lines of "cannot allocate vector of size ..." or "bad_alloc", or something of the sort.
This seems rather like an endless loop. Check your while statements and the specific rows of data that they get stuck on.
One option for doing this is to put a browser() statement in the iteration of the loop that gets stuck.
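For example, a rough sketch; the iteration number below is just the point where your printout stopped, so adjust it to whatever iteration you want to inspect:

# Drop into the interactive debugger at the iteration where printing stopped,
# then inspect the row(s) the loop is working on (e.g. ppt[i, ]).
i <- 20
while (i < (length(ppt[, 1]) - 192)) {
  if (i == 270781) browser()   # assumed stuck iteration, taken from the question
  # ... the original decision statements on ppt[i, ] go here ...
  i <- i + 1
}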
Also, in general, loops are quite inefficient in R. When possible, consider other approaches (maybe plyr's ddply with a custom function that computes your statements?).
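For instance, if the per-row decisions can be expressed as a per-day (or per-group) summary, something along these lines might replace the loop entirely; the column handling below is only a guess based on the sample data shown above:

# Rough sketch, assuming the goal is some per-day summary of the precip/flow columns.
library(plyr)
ppt$day <- as.Date(ppt$date, format = "%m/%d/%Y %H:%M")
daily <- ddply(ppt, .(day), summarise,
               mean_flow    = mean(flow, na.rm = TRUE),
               total_precip = sum(precip, na.rm = TRUE))
head(daily)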
I have been using Google Analytics for one week on a new website. I didn't change anything: I got the tracking code and placed it on the website just as I have done with other websites before. The problem is that the dashboard shows 1,200 users for yesterday. I was surprised, because I had more. So I changed the graph to show the number of users per hour for yesterday only and exported the data to CSV. The Excel file says I had 2,100 users, which is right, and that's what the graph shows over the 24 hours (I added them up too).
So my question is: why does the dashboard show the wrong number of users? (It's not only the daily view; if you select weekly, the number is wrong too.)
Attached are the screenshot and the CSV file from yesterday.
What can I do in this situation?
Thank you!
Hour Index Users
0 77
1 52
2 39
3 24
4 14
5 10
6 15
7 27
8 51
9 71
10 98
11 142
12 123
13 138
14 133
15 121
16 141
17 142
18 130
19 125
20 118
21 122
22 103
23 108
Total: 2,124
The users per hour do not sum to the users for the day. For each hour, Google Analytics tells you how many users were on the site during that hour. If Tom visited your site at 9:00 AM and also at 11:00 AM, he counts as 1 user in both the 9 and 11 rows of your spreadsheet, but he only counts as 1 user for the day.
It works the same way for days and weeks: if you add up the users for each day, the total will typically be more than the users reported for the week, because some users visited your site on more than one day.
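To make the arithmetic concrete, here is a toy illustration (made-up visitors, not your data) of why the per-hour unique users sum to more than the day's unique users:

# Toy data: Tom is active in two different hours of the same day.
visits <- data.frame(user = c("Tom", "Tom", "Ann", "Bob"),
                     hour = c(9, 11, 9, 14))
# Sum of per-hour unique users (what the CSV export adds up to): 4
sum(tapply(visits$user, visits$hour, function(u) length(unique(u))))
# Unique users for the whole day (what the dashboard reports): 3
length(unique(visits$user))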