Python/jupyter: suggestions on data structure and visualization for a diary - graph

My child's pediatrician asked us to keep a diary of bedwetting. He draws a sun or a raining cloud on our wall calendar on the corresponding day every morning when he wakes up.
He thinks it would be fun to do "science" with this data. So we would want to show calendars with "green" (dry) and "red" (wet) nights, and of course do the customary pie charts, descriptive statistics (including longest dry streaks), etc.
What kind of data structure would you suggest to record each dry and wet night in the simplest ongoing way (he could maybe fill a spreadsheet with a date column and a dry/wet column?), and is there a "calendar" type of graphic we could use?
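One possible shape for this, sketched under assumptions: the spreadsheet is exported as CSV with a `date` column (ISO format) and a `status` column holding `dry` or `wet`. The column names, the sample rows, and the helper names `load_diary`, `longest_dry_streak` and `month_grid` are all made up for illustration. The sketch uses only the standard library and prints a text calendar rather than a colored one.

```python
import calendar
import csv
import io
from datetime import date

# Hypothetical diary contents: one row per night, ISO date plus "dry" or "wet".
DIARY_CSV = """date,status
2023-03-01,dry
2023-03-02,dry
2023-03-03,wet
2023-03-04,dry
2023-03-05,dry
2023-03-06,dry
2023-03-07,wet
"""

def load_diary(text):
    """Return {date: True if dry, False if wet} from CSV text."""
    rows = csv.DictReader(io.StringIO(text))
    return {date.fromisoformat(r["date"]): r["status"].strip().lower() == "dry"
            for r in rows}

def longest_dry_streak(diary):
    """Length of the longest run of consecutive calendar days marked dry."""
    best = current = 0
    previous = None
    for day in sorted(diary):
        consecutive = previous is not None and (day - previous).days == 1
        if diary[day]:
            # Extend the run only if yesterday was recorded and also dry.
            current = current + 1 if consecutive and current else 1
        else:
            current = 0
        best = max(best, current)
        previous = day
    return best

def month_grid(diary, year, month):
    """Text calendar: 'D' for dry, 'W' for wet, '.' for days with no entry."""
    lines = ["Mo Tu We Th Fr Sa Su"]
    for week in calendar.Calendar().monthdatescalendar(year, month):
        cells = []
        for day in week:
            if day.month != month:
                cells.append("  ")  # padding day from a neighbouring month
            else:
                cells.append({True: " D", False: " W"}.get(diary.get(day), " ."))
        lines.append(" ".join(cells))
    return "\n".join(lines)

diary = load_diary(DIARY_CSV)
print("dry nights:", sum(diary.values()), "of", len(diary))
print("longest dry streak:", longest_dry_streak(diary))
print(month_grid(diary, 2023, 3))
```

A gap in the diary (a night with no row) ends the current streak here; that is a design choice, not a requirement. The same dictionary could also feed a colored grid in a plotting library instead of the text calendar.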

Related

Organize groups with timestamps in NoSQL

I have a question regarding the formatting of data in a nosql database
I have the following use case:
n users can be in n groups, where each user has
1 time goal per day
1 time he is currently tracking (which increments during the day)
The goal: Each user works toward a goal (in minutes) per day and can see each other's progress.
The hardest part for me is the time goal. I have no idea how to structure that.
My thinking right now:
Users
  unique_id:
    name: user1
    timeTracked: {
      {11.02.2023: 45}
    }
    groups: [group1, group2, group3]
groups
  groups1:
    members: {user1, user2...}
    time: {
      22.02.22: {
        user1: 43
        user2: 60
      }....
    goals: {
      22.02.22: {
        user1: 80
Would that be a sensible way of structuring the data? That would make it quite hard to track if users have achieved their goals in the past: If I only have a timestamp from 05.02.22 because the user didn't change their current goal, I would have to extrapolate it for the days up until 11.02.22
I would suggest making a separate collection for each user's time goals.
The separate collection will store each user's time goal for a specific day.
user_id: "user1"
date: "2020-02-15"
goal: 54
You will be able to track each user's progress towards their goal for each day easily, even if they don't update their current time tracking.
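As a rough illustration of how such a collection could be queried, here is a plain Python sketch (no database driver involved). It assumes, going beyond the answer above, that a record is only written on days when the goal is set or changed, and that the most recent earlier record applies to the days in between. The sample documents and the helper names `build_goal_index` and `goal_on` are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical documents from a separate "goals" collection:
# one record per user per day on which the goal was set or changed.
goal_docs = [
    {"user_id": "user1", "date": "2022-02-05", "goal": 80},
    {"user_id": "user1", "date": "2022-02-10", "goal": 54},
    {"user_id": "user2", "date": "2022-02-01", "goal": 60},
]

def build_goal_index(docs):
    """Group goal changes per user, sorted by date."""
    index = {}
    for doc in docs:
        entry = (date.fromisoformat(doc["date"]), doc["goal"])
        index.setdefault(doc["user_id"], []).append(entry)
    for changes in index.values():
        changes.sort()
    return index

def goal_on(index, user_id, day):
    """Return the most recent goal set on or before `day`, or None."""
    changes = index.get(user_id, [])
    dates = [d for d, _ in changes]
    pos = bisect_right(dates, day)
    return changes[pos - 1][1] if pos else None

index = build_goal_index(goal_docs)
print(goal_on(index, "user1", date(2022, 2, 8)))   # goal carried forward from 2022-02-05
```

Storing dates in ISO order (`YYYY-MM-DD`) rather than `DD.MM.YY` keeps string sorting and date sorting in agreement, which makes this kind of lookup simpler.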

Using OptaPlanner to create school time tables with some tricky constraints

I'm going to use OptaPlanner to lay out time tables for a school.
We're laying out the time tables for a full semester and every week could, if necessary, be slightly different.
There are some tricky constraints to take into account:
1. Weekly schedules
The lectures in one subject should be spread out somewhat evenly over the semester.
We can't for example put 20 math lectures the first week and "be done" with math for this semester.
In fact, it's nice to have some weekly predictability
"Science year 2 have biology on Tuesday mornings"
This constraint must not be carved in stone however. Some weeks have to include work experience sessions, PE excursions, etc, in which case they must deviate from other weeks.
Problem
If I create a constraint that, say, gives -1soft for not scheduling a subject at the same time as the previous week, then OptaPlanner will waste a lot of time before it "accidentally" finds a good placement for a lecture. Even if it manages to converge so that each subject is scheduled at the same time every week, it will never manage to move the entire series of lectures by moving them one by one. (That local optimum will never be escaped.)
2. Cross student group subjects
There's a large correlation between student groups and courses. For example, all students in Science year 2 mostly read the same courses: Chemistry for Science year 2, Biology for Science year 2, ...
The exception being language courses.
Each student can choose to study French, German or Spanish. So Spanish for year 2 is studied by a cross section of Science year 2 students, and Social Studies year 2 students, etc.
From the experience of previous (manual) scheduling, the optimal solution is almost guaranteed to schedule all language classes in the same time slots. (If French is scheduled at 9 on Thursdays, then German and Spanish can be scheduled "for free" at 9 on Thursdays.)
Problem
There are many time slots in one semester, and the chances that OptaPlanner will discover a solution where all language lectures are scheduled at the same time by randomly moving individual lectures is small.
Also, similarly to problem 1: If OptaPlanner does manage to schedule French, German and Spanish at the same time, these "blocks" will never be moved elsewhere, since they are individual lectures, and the chances that all lectures will "randomly" move to the same new slot is tiny. Even with a large Tabu history length and so on.
My thoughts so far
As for problem 1 ("Weekly predictability") I'm thinking of doing the following:
In the construction phase for the full-semester-schedule I create a reduced version of the problem, that schedules (a reduced set of lectures) into a single "template week". Let's call it a "single-week-pre-scheduling". This template week is then repeated in the construction of the initial solution of the full semester which is the "real" planning entity.
The local search steps will then only focus on inserting PE excursions etc, and adjusting the schedule for the affected weeks.
As for problem 2, I'm thinking that the solution to problem 1 might solve this. In a 1 week schedule, it seems reasonable to assume that OptaPlanner will realize that language classes should be scheduled at the same time.
Regarding the local optimum settled by the single-week-pre-scheduling ("Biology is scheduled on Tuesday mornings"), I imagine that I could create a custom move operation that "bundles" these lectures into a single move. I have no idea how simple this is. I would really like to keep the code as simple as possible.
Questions
Are my thoughts reasonable? Is there a more clever way to approach these problems? If I have to create custom moves anyways, perhaps I don't need to construct a template-week?
Is there a way to assign hints or weights to moves? If so, I could perhaps generate moves with slightly larger weight that adjusts scheduling to adhere to predictable weeks and language scheduled in the same time slots.
A question well asked!
With regards to your first problem, I suggest you take a look at OptaWeb Employee Rostering and the concept of rotations. A rotation is "how things generally are" and then Planner has the freedom to diverge from the rotation at a penalty. Once you understand the concept of the rotation from the UI, take a look at the planning entity Shift and how the rotation is implemented with the use of employee and rotationEmployee variables. Note that only the employee is an actual @PlanningVariable, with the rotationEmployee being fixed.
That means that you have to define your rotations manually, therefore doing the work of the solver yourself. However, since this operation is only done once a semester I assume, maybe the solution could be to have a simpler solver generate a reasonable general rotation first, and then a second solver would take it and figure out the specific necessary adjustments?
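To make the rotation idea concrete for lectures, here is a minimal sketch in plain Python. It is not OptaPlanner code and does not use its API; the `Lecture` class, the `rotation_slot` and `slot` fields, and the `rotation_penalty` function are hypothetical names that mirror the fixed rotationEmployee and the changeable employee described above.

```python
from dataclasses import dataclass

@dataclass
class Lecture:
    subject: str
    week: int
    rotation_slot: str  # fixed "how things generally are" slot, e.g. "Tue-09"
    slot: str           # the slot a solver would be free to change

def rotation_penalty(lectures, weight=1):
    """Soft score: -weight for every lecture that deviates from its rotation slot."""
    deviations = sum(1 for lec in lectures if lec.slot != lec.rotation_slot)
    return -weight * deviations

lectures = [
    Lecture("Biology", week=1, rotation_slot="Tue-09", slot="Tue-09"),
    Lecture("Biology", week=2, rotation_slot="Tue-09", slot="Tue-09"),
    # Hypothetical week with a PE excursion on Tuesday morning.
    Lecture("Biology", week=3, rotation_slot="Tue-09", slot="Wed-13"),
]
print(rotation_penalty(lectures))
```

Because each lecture is compared against a fixed reference rather than against the previous week, a single deviating week costs one penalty point and does not pull the neighbouring weeks along with it.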
With regards to your second problem, rotations could help there too. But I'm thinking maybe some move filtering and custom moves to help OptaPlanner to either move all language classes, or none? Writing efficient custom moves is not easy, and filtering stock moves is cumbersome. So I would only do it when the potential of other options is exhausted. If you end up doing this, look for MoveIteratorFactory.
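The "move all language classes, or none" idea can be illustrated independently of the solver. The following is a plain Python sketch of a compound move, not an OptaPlanner custom move; the dictionary layout and the name `move_bundle` are assumptions made for the example.

```python
def move_bundle(assignment, bundle, new_slot):
    """Return a copy of `assignment` with every lecture in `bundle` moved to `new_slot`."""
    moved = dict(assignment)
    for lecture in bundle:
        moved[lecture] = new_slot
    return moved

# Hypothetical current schedule: lecture name -> time slot.
assignment = {
    "French year 2": "Thu-09",
    "German year 2": "Thu-09",
    "Spanish year 2": "Thu-09",
    "Biology Science year 2": "Tue-09",
}
language_bundle = ["French year 2", "German year 2", "Spanish year 2"]

candidate = move_bundle(assignment, language_bundle, "Mon-13")
print(candidate)
```

The move returns a new dictionary and leaves the original untouched, so a search loop could score the candidate and discard it cheaply if it is worse.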
My answer is a little vague, as we do not get into the specifics of the domain model, but for the purposes of designing the overall solution, it hopefully gives enough clues.

Plot movement over time in (preferably) Google Maps

I have a spreadsheet with columns for person, date, event, place name, latitude, and longitude. This is the result of many years of genealogical research that shows the birth, marriage, and death locations for several hundred of my direct ancestors as they migrated across the world and finally converged in South Africa for the last few generations.
I'd very much like to create an animation or video showing their movements over time, preferably with a marker flashing at the location, then fading away, with or without lines linking the markers for the duration of the person's life. At 9 generations ago this would then show 512 births happening at roughly the same time, moving on to them converging into 256 places as couples got married, then between those 256 marriages and the original 512 deaths, the 256 births of people of the next generation would flash on, and so on, finally converging on just my birth. I believe such an animation would be an excellent way to make the vast family tree accessible in a visual way, and other genealogical researchers would probably also enjoy doing this. The ability to automatically zoom in on the bounding box of the locations at any given time would be needed to show movements within a smaller geographic location, but first and foremost I simply want to plot points over time.
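Whatever tool ends up drawing the map, the per-frame data preparation could look roughly like the Python sketch below. The rows are invented placeholders in the spreadsheet's column order (person, date, event, place name, latitude, longitude), and the helper names `events_between` and `bounding_box` are hypothetical. The bounding box is a naive min/max and does not handle locations spread across the 180° meridian.

```python
from datetime import date

# Hypothetical rows: person, date, event, place name, latitude, longitude.
events = [
    ("Person A", date(1750, 3, 1), "birth", "Amsterdam", 52.37, 4.90),
    ("Person B", date(1752, 7, 9), "birth", "Hamburg", 53.55, 9.99),
    ("Person A", date(1775, 5, 2), "marriage", "Cape Town", -33.92, 18.42),
    ("Person A", date(1810, 1, 20), "death", "Cape Town", -33.92, 18.42),
]

def events_between(rows, start, end):
    """Rows whose date falls in the half-open interval [start, end)."""
    return [r for r in rows if start <= r[1] < end]

def bounding_box(rows):
    """(min_lat, min_lon, max_lat, max_lon) of the given rows, or None if empty."""
    if not rows:
        return None
    lats = [r[4] for r in rows]
    lons = [r[5] for r in rows]
    return (min(lats), min(lons), max(lats), max(lons))

# One animation frame per decade: which markers flash, and where to zoom.
for decade in range(1750, 1820, 10):
    frame = events_between(events, date(decade, 1, 1), date(decade + 10, 1, 1))
    print(decade, [(r[0], r[2]) for r in frame], bounding_box(frame))
```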
Does anyone know of a free or commercial tool that would allow doing this? I have explored this in most genealogical software solutions but they provide very limited tools showing one person or one couple at a time, so I suspect I'm going to have to plug this into a generic 'plot movement over time' tool in a good map service.
I have used GraphXR for plotting family tree members linked to one of their several maps, with the edges being either a birth, marriage or death date. The data is queried from Neo4j which has a seamless interface with GraphXR.
I'm now working on a Neo4j PlugIn for genealogy and collaborating with GraphXR developers to make such visualizations easier for end users.
It's not exactly what you are looking for, but it may be helpful?
http://gfg.md/blogpost/7

How to represent days in timeline tree for Neo4j/graphDB

In reading this blog, this reference, and reviewing the answer to this question, I'm confused as to how one can represent distinct days in a timeline tree. In both cases they show a limited number of days on the example database and my thought is that this model cannot hold if you wanted to model an entire year or an unbounded temporal calendar period.
I am reading these examples such that the 'day' nodes are merely just the number of the day '1', '2', .. '31'. As every month has a day labeled '1', '2', etc, how do you traverse the path when you connect all days to months?
For example, in the attached modified drawing, month 12 and month 1 BOTH have days 1, 2 and 31 in them. When I look at event 2, how do I know if this took place on 12/31 or 1/31? I'd like to model all days for all months and using the template (as I understand it) creates ambiguous paths on the graph that do not allow for discrete temporal queries.
Or should a 'days' entity be a more unique number that represents the number of the day for that YEAR where the attribute for that number is the number of the day in the month?
As it stands I do not understand how you can create a complete timeline tree for an entire year as it is modeled in the above links.
The modified drawing shares the day nodes between months, and that's why it's difficult to know whether event 2 took place on 12/31/2010 or 1/1/2011.
The timetree for a single year, with a resolution of Day, will have 365/366 day nodes. So the Day node with value 31 is not shared by both January and December, but January and December have their own Day 31 nodes. In other words a day node relates to exactly one month, and a month node relates to exactly one year.
Then you can follow the path from the event back up to the root without having it diverge at month.
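The structure described above can be mimicked with nested dictionaries, as in this small Python sketch. It is not Neo4j or GraphAware code; the `TimeTree` class and its methods are hypothetical and only illustrate that each day node hangs under exactly one month, and each month under exactly one year.

```python
from datetime import date

class TimeTree:
    """Year -> month -> day tree in which no day node is shared between months."""

    def __init__(self):
        self.years = {}  # year -> {month -> {day -> [events]}}

    def attach(self, when, event):
        """Attach an event to its own day node, creating the path if needed."""
        months = self.years.setdefault(when.year, {})
        days = months.setdefault(when.month, {})
        days.setdefault(when.day, []).append(event)

    def path_to_root(self, event):
        """Walk the tree and return the unique (year, month, day) for an event."""
        for year, months in self.years.items():
            for month, days in months.items():
                for day, events in days.items():
                    if event in events:
                        return (year, month, day)
        return None

tree = TimeTree()
tree.attach(date(2010, 12, 31), "event 1")
tree.attach(date(2011, 1, 31), "event 2")
print(tree.path_to_root("event 2"))
```

December 2010 and January 2011 each get their own day 31 node here, so the path from an event back to the root never diverges at the month level.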
Peter's blog post referenced above shows the crossing of December into January but does not share the Day node 31; you can see that he's able to answer all your queries above.
To create the timeline, GraphAware has a module that maintains the timeline for you and helps you attach events as well: http://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html
Disclaimer: I work at GraphAware

Confusion over Google Analytics (GA) Absolute Unique Visitors data

GA Unique Visitors data isn't making sense to me. From the GA FAQ we get the following definition for 'Visits vs. Visitors'
"The initial session by a user during any given date range is considered to be an additional visit and an additional visitor. Any future sessions from the same user during the selected time period are counted as additional visits, but not as additional visitors. "
The part that I can't resolve with the GA graph is "Any future sessions from the same user during the selected time period are counted as additional visits, but not as additional visitors". For the graph below covering a 30-day period, I would understand the GA definition to mean that the data represents uniqueness across all 30 days, right? But if you look at the screen shot below, you see a regular pattern for each week over the 30-day period the report covers. From that, it seems the numbers we are seeing associated with each of the days of the graph (e.g. 3.92% (4142) for Tuesday, September 8) is a count of unique visitors just in the context of that one day - i.e. without correlating their uniqueness to the rest of the days in the 30-day period. If the graph actually showed uniqueness across the 30-day period, I would expect the daily numbers to start high in the early days of the period and decrease over the 30-day period as the number of already-seen visitors (i.e. returning visitors) increases, no?
What am I missing here?
UPDATE
Helpful clue from Jonathan S. below got me on the right track.
I think I understand now what the daily bar graph values mean, but it's a little counter-intuitive, and I'd bet it is not what some others are assuming either. The report states "39,822 Absolute Unique Visitors" at the top, which means just that: over the 30-day period we saw this many uniques. Fair enough. The confusing part is that the daily (or weekly) bar values in the graph below are not mutually exclusive uniques as I had assumed, but are values relative only to the 39,822 total - i.e. there is overlap between the unique visitor counts across any group of days. This means the sum of the daily % values > 100% and the sum of the daily count values > 39,822. The algorithm is: when you visit for the first time in the 30-day period, call that "today", you add 1 to the total (39,822) and 1 to the "today" bar value. When you show up again "tomorrow", you are NOT counted again in the total, but ARE counted as 1 in the "tomorrow" bar value.
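The counting rule described above can be reproduced with sets. This is a small Python sketch of that interpretation, not Google Analytics' actual implementation; the session log and the function name `unique_visitor_report` are invented for the example.

```python
from collections import defaultdict

# Hypothetical visit log: (day, visitor_id), one entry per session.
sessions = [
    ("Mon", "a"), ("Mon", "b"), ("Mon", "a"),
    ("Tue", "a"), ("Tue", "c"),
]

def unique_visitor_report(sessions):
    """Return (uniques over the whole range, {day: uniques within that day})."""
    per_day = defaultdict(set)
    for day, visitor in sessions:
        per_day[day].add(visitor)
    total = len(set().union(*per_day.values()))
    daily = {day: len(visitors) for day, visitors in per_day.items()}
    return total, daily

total, daily = unique_visitor_report(sessions)
print(total, daily, sum(daily.values()))
```

Visitor "a" appears on both days, so it is counted once in the total but once in each daily bar, which is why the daily values add up to more than the total.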
Screenshot: http://img.skitch.com/20090922-djti81ejj5gqn575ibf8cj1e8x.jpg
I believe it's just an issue of grouping. The top right of the graph has 3 icons to group by day, week, or month. It's currently grouping by day. So if I visit your site today and come back tomorrow, I'll be counted once for each day.
I tried looking at the month view for one of my sites but it didn't give me much meaningful data. I believe the above should answer your original confusion though.
Is it possible that you're searching for something that no longer exists? Unique Visitors/Visits is old terminology. Check: https://www.seroundtable.com/google-analytics-sessions-users-18424.html
Then check how sessions and users are defined:
Sessions ("ex-visits", it's very detailed): https://support.google.com/analytics/answer/2731565?hl=en&ref_topic=1012046
Users in Google Analytics reporting are defined as "Users who have initiated at least one session during the date range". So IMHO it's not about 30 days, it's about the SELECTED date range.
I hope this helps.
