While looking through the Rserve logs, I found that we sometimes attempt to graph data that cannot be graphed in a reasonable way. Examples:
Data from the CSV file ends on 2018-02-13. We try to graph 2018-02-13 to 2018-02-14. A line needs at least 2 points, so we're showing an almost empty graph.
Data from the CSV file ends on 2018-02-13. We try to graph 2018-02-14 to 2018-02-15. We don't have a single data point to graph, so we're not showing anything at all.
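Both examples come down to counting how many data points fall into the requested range. A minimal sketch of that check, assuming the dates of the available data points are already known on the Java side as a list of LocalDate values (PlotCheck, Outcome, and check are hypothetical names, not part of the existing code):

```java
import java.time.LocalDate;
import java.util.List;

/** Hypothetical helper: decides whether a requested date range can be drawn as a line. */
final class PlotCheck {

  /** Possible outcomes of checking the requested range against the data. */
  enum Outcome { NO_DATA, SINGLE_POINT, PLOTTABLE }

  /**
   * Counts data points whose date lies within [start, end] (inclusive) and
   * maps that count to an outcome; a line needs at least two points.
   */
  static Outcome check(List<LocalDate> dataDates, LocalDate start, LocalDate end) {
    long inRange = dataDates.stream()
        .filter(d -> !d.isBefore(start) && !d.isAfter(end))
        .count();
    if (inRange == 0) {
      return Outcome.NO_DATA;
    } else if (inRange == 1) {
      return Outcome.SINGLE_POINT;
    }
    return Outcome.PLOTTABLE;
  }
}
```

With data ending on 2018-02-13, the first example above yields SINGLE_POINT and the second yields NO_DATA.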
If we can't display a single data point or line segment, I could imagine doing one of the following:
Display a "No Data Available" placeholder, like Relay Search does.
Note that we might need this for other cases, too, when users pass parameters that we cannot process.
Alternatively, we could display nothing at all, which is what we do right now. That might be less usable, though.
Display an empty graph with all requested dates on the x axis and no data points.
Note that if we want to do this, we need to decide when to "trim" the graph to the available dates and when to show all requested dates from start to end. Example: Try to draw a graph from 2000-01-01 to 2001-01-01, and then try to draw another graph from 2000-01-01 to 2018-01-01. Should both graphs display the full requested time period, or should just the second graph be trimmed to available data?
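If we go with trimming, one possible policy (a sketch only; the question above is still open) is to clamp the requested range to the range for which data exists and to treat an empty overlap as the no-data case. RangeTrimmer and DateRange are hypothetical names:

```java
import java.time.LocalDate;
import java.util.Optional;

/** Hypothetical helper illustrating one possible "trim" policy for the x axis. */
final class RangeTrimmer {

  /** A closed date interval [start, end]. */
  static final class DateRange {
    final LocalDate start;
    final LocalDate end;

    DateRange(LocalDate start, LocalDate end) {
      this.start = start;
      this.end = end;
    }
  }

  /**
   * Clamps the requested range to the range for which data exists.
   * Returns an empty Optional if the two ranges do not overlap at all.
   */
  static Optional<DateRange> trim(DateRange requested, DateRange available) {
    // Later of the two start dates.
    LocalDate start = requested.start.isAfter(available.start)
        ? requested.start : available.start;
    // Earlier of the two end dates.
    LocalDate end = requested.end.isBefore(available.end)
        ? requested.end : available.end;
    if (start.isAfter(end)) {
      return Optional.empty();
    }
    return Optional.of(new DateRange(start, end));
  }
}
```

Under this policy, a request lying entirely before the first available date produces no graph, while a request overlapping the data is trimmed on both ends. Always showing the full requested range would simply skip the clamping.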
Thoughts?
To avoid the 500 server error when accessing the PNG (as reported in detail in legacy/trac#25468), an empty graph/PNG stating 'no data available for this parameter choice' should be generated. Currently, the corresponding CSV only contains the header; maybe we should also add a comment there in case no data is available. Calling R should be avoided if there is no data available.
To avoid the 500 server error when accessing the PNG (as reported in detail in legacy/trac#25468), an empty graph/PNG stating 'no data available for this parameter choice' should be generated.
Sounds good. We'll also need such a static file in the PDF format. I'll create placeholders using R/ggplot2 in a bit. We just need to write the code to put them in.
Currently, the corresponding CSV only contains the header; maybe we should also add a comment there in case no data is available.
Sounds good, too. Want to suggest a text? I can put it in then.
Calling R should be avoided if there is no data available.
Well, we need to call R to find out whether there's data available for the requested period of time. Or what did you mean?
Attached are the two placeholder files (PNG and PDF). I generated them with the following code:
The Java code currently returns SC_INTERNAL_SERVER_ERROR, based on the properties of the received RObject. As RObject is our own code, an instance should always be available, so that we can avoid the null check.
If the R code doesn't return a result or fails, the RObject should 'know' this. That is, a method needs to be added, e.g., boolean error(), which signals any unforeseeable error where no file is returned (this triggers the error message). Otherwise (error() == false), the RObject returns either the requested file or the 'no-data' version.
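The proposed contract can be sketched as a small value class. This is a hypothetical stand-in (RResultSketch and its factory methods are invented for illustration), not the actual RObject: an instance is always returned, error() is true only when no file at all came back, and only then does the caller send a 500.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical stand-in for the proposed RObject contract; never null. */
final class RResultSketch {

  // Same values as HttpServletResponse.SC_OK and SC_INTERNAL_SERVER_ERROR.
  static final int SC_OK = 200;
  static final int SC_INTERNAL_SERVER_ERROR = 500;

  private final byte[] bytes;   // requested file or "no data" placeholder; null on error
  private final boolean noData; // true if bytes hold the placeholder

  private RResultSketch(byte[] bytes, boolean noData) {
    this.bytes = bytes;
    this.noData = noData;
  }

  /** R produced the requested file. */
  static RResultSketch ofFile(byte[] bytes) {
    return new RResultSketch(Objects.requireNonNull(bytes), false);
  }

  /** R ran, but there was no data; serve the placeholder instead. */
  static RResultSketch ofNoData(byte[] placeholder) {
    return new RResultSketch(Objects.requireNonNull(placeholder), true);
  }

  /** Unforeseeable failure: no file of any kind is available. */
  static RResultSketch ofError() {
    return new RResultSketch(null, false);
  }

  /** True only if no file at all came back. */
  boolean error() {
    return bytes == null;
  }

  boolean isNoData() {
    return noData;
  }

  /** The caller falls back to a 500 only when error() is true. */
  int statusCode() {
    return error() ? SC_INTERNAL_SERVER_ERROR : SC_OK;
  }

  /** Defensive copy of the file contents, or null on error. */
  byte[] bytes() {
    return bytes == null ? null : Arrays.copyOf(bytes, bytes.length);
  }
}
```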
Regarding the text:
Unfortunately there is no data available for the chosen parameters.
This text works for all formats, and we should include the chosen parameters in the response, e.g., in the PDF, CSV, and PNG.
Huh, I'm afraid I don't follow. Do you mind writing this patch?
I'm also not clear on where you'd want to include parameters in the PNG or PDF. Do you mean we should generate a custom "No Data Available" graph that displays the chosen parameters? If so, why? (Maybe this will be clearer with a patch, too.)
The use case I'm thinking of:
Some people retrieve only the PDF/PNG/CSV using the links provided. If such a file is downloaded, a URL in it with all parameters, plus the no-data info, makes clear what happened.
On the web page, we wouldn't need to display a graph without data points, just the text, because the parameters are still visible.
Other than that, code might be easier to discuss. I'll provide a patch.
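The idea of putting the request URL into a downloaded no-data file could look roughly like this. It is only a sketch of the proposal (NoDataNote, requestUrl, and note are hypothetical names), assuming the servlet has the base URL and the request parameters at hand:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds a no-data note that names the request. */
final class NoDataNote {

  static final String MESSAGE =
      "Unfortunately there is no data available for the chosen parameters.";

  /** Rebuilds the request URL from a base URL and its query parameters. */
  static String requestUrl(String base, Map<String, String> params) {
    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> e : params.entrySet()) {
      // Encode keys and values so the URL stays valid.
      query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
          + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
    }
    return query.length() == 0 ? base : base + "?" + query;
  }

  /** Message plus the URL that produced it, one per line. */
  static String note(String base, Map<String, String> params) {
    return MESSAGE + "\n" + requestUrl(base, params) + "\n";
  }
}
```

Passing a LinkedHashMap keeps the parameters in the order they were given.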
Trac: Status: new to accepted Owner: metrics-team to iwakeh
I see. It's a possible use case. But I'd say let's focus on our main use case first, which is users looking at the website. And if somebody links to our PDF/PNG/CSV directly, and these return "No Data Available", it should be up to them to also provide details to find out why that is the case.
https://styleguide.torproject.org/visuals/#colors has the approved colors. I could also make SVG, PDF, etc. versions if that is useful. If we do include a new image to be used, we should also update Relay Search at the same time, I can make a patch for that quite easily and then both could be deployed together.
Please review this patch branch, which contains one commit fixing the usage of StringBuilder.append and a second one implementing the requested functionality.
The implementation catches R errors and makes the respective no-data file available instead. Java can only guess whether an R problem occurred, so every call for R graph generation is now wrapped in an R function, robust_call, which creates the no-data objects in case of an error. (String function = ... was moved down to avoid a checkstyle complaint.)
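robust_call itself is R code that isn't shown here. As a rough Java analogue of the same pattern (RobustCall and robustCall are hypothetical names), the idea is to run the generation step and substitute the no-data result on any failure:

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical Java analogue of wrapping each graph call in robust_call. */
final class RobustCall {

  private static final Logger LOG = Logger.getLogger(RobustCall.class.getName());

  /**
   * Runs the given generation step; if it throws, returns the no-data
   * fallback instead of propagating the error.
   */
  static <T> T robustCall(Supplier<T> generate, Supplier<T> noDataFallback) {
    try {
      return generate.get();
    } catch (RuntimeException e) {
      // Log the cause: otherwise a genuine bug is indistinguishable from "no data".
      LOG.log(Level.WARNING, "Generation failed; serving no-data result", e);
      return noDataFallback.get();
    }
  }
}
```

The trade-off of catching everything is that a real bug gets reported as "no data", which is why logging the caught exception matters.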
Hmm, looks like we broke the CSV links. Any link that I open now says "No data available for the given parameters." The code looks right to me. Any idea what's wrong?
Trac: Resolution: fixed to N/A Status: closed to reopened Priority: Low to High
There are errors in rserve.log, unfortunately without timestamps. It is not clear from the log whether and when R was restarted successfully.
Maybe try restarting, making sure all R servers are down first?
That was a strange error situation. The call list created in Java was malformed: it happened to work for graphs but failed for CSVs. That's why CSV files only contained the no-data message: R raised an error.
Now, graphs behave the same as before, but CSV files always contain their column headers, so the no-data message no longer appears for empty results. It only shows up when there is an error.
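The CSV behavior described above can be sketched as follows. CsvSketch and render are hypothetical names, and the message text is the one quoted earlier in this thread:

```java
import java.util.List;

/** Hypothetical sketch of the CSV behavior: header always, message only on error. */
final class CsvSketch {

  static final String NO_DATA_MESSAGE = "No data available for the given parameters.";

  /**
   * Builds the CSV body. An empty result still yields the header line;
   * the no-data message is only used when generation failed.
   */
  static String render(String header, List<String> rows, boolean failed) {
    if (failed) {
      return NO_DATA_MESSAGE + "\n";
    }
    StringBuilder sb = new StringBuilder(header).append('\n');
    for (String row : rows) {
      sb.append(row).append('\n');
    }
    return sb.toString();
  }
}
```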
Could you remove the cached CSVs from the server? metrics.torproject.org/userstats-relay-country.csv?start=2018-01-04&end=2018-04-04&country=all&events=off still returns 'no-data'.