
We had a short meeting today. Our next meeting is on 25 January; Tema will lead it.

Here is a list of our current issues and action items:

1. CloudFoundry. IBM is terminating support for CloudFoundry. Rick, what will replace CloudFoundry, and what do we need to change in our system?

1. Ground Truth Clogging Dataset. We want to create a labeled dataset that assigns a label of "good", "clogged", or "other-failure" to each rain gauge on a daily basis. We will use this dataset to develop and test new QC methods for precipitation.
   * Rick: can you export the quality objects in JSON format for the period from January 2017 through October 2022?
   * Austin and Victoria will analyze the exported JSON for each station and day. For each block of time where a station has been flagged, they will look at the maximum precipitation that was reported. If this maximum is less than a threshold epsilon (maybe 5 mm?) for the entire block of time, they will mark the rain gauge as "clogged" during that time. If the maximum reported precipitation is greater than epsilon, they will mark the rain gauge as "other-failure". Finally, for all days where the QC object is ok, they will mark the rain gauge as "good".

     The output should be a CSV file with columns for station id, sensor id (to handle stations with multiple rain gauges), date, reported precipitation, and the assigned class label {good, clogged, other-failure}. JSON format would be fine also; whichever you prefer. I assume we define a day as running from 00:00:00 to 23:59:59 GMT.

1. Hand-off of the job manager. Michael will meet with Austin and Tema to finalize the hand-off of responsibility for monitoring the job manager. This will be delayed until the floods in California are over.

1. Collective Classification. Our current approach flags each station individually. A flaw in this approach is that if a station is bad, we tend to flag not only the bad station but also many of its neighbors.
   The idea of collective classification is to jointly analyze the stations. We start by obtaining the anomaly scores for all of the stations, computed in the usual way. Let Open := the set of all stations for which we have scores, and Done := {}.
   * Repeat
      i. Let s be the station in Open with the highest anomaly score.
      ii. If the score is below the anomaly threshold, then mark all remaining stations in Open as "ok" and exit.
      iii. Else flag s as "broken", remove it from Open, and add it to Done.
      iv. Recompute the anomaly scores of all neighbors of s (i.e., all stations that use s as a neighbor). In the current system, we have no way of doing this, so we can just flag all of them as "ok", remove them from Open, and add them to Done.
   * The action item is to implement collective classification. This is not currently assigned to anyone. Any volunteers to implement this?

1. GP Clogging Model. Implement and test the algorithm proposed by Ciira and Tom for detecting clogged sensors. This item is assigned to Ciira and Tom.

1. Multi-day Detection. Implement and test a version of the current algorithm that sums the precipitation for K consecutive days. We suspect that K of 5-8 days might give much more accurate results. This will involve modifying the training code and the scoring code. Every day, this code will score the sum of the precipitation for the K most recent days. Unfortunately, this will also increase the problems caused by missing data, because if any data is missing from any of those K days (for the target station or any of its neighbors), then we will not be able to score the station. This item is not assigned to anyone. Any volunteers to implement this?

I will be traveling in India from January 22 through February 19. I will therefore miss the next two meetings (25 January and 8 February). Tema will run the meetings in my absence.

--Tom

Thomas G. Dietterich, Distinguished Professor   Voice: 541-737-5559
School of Electrical Engineering                FAX: 541-737-1300
and Computer Science                            URL: eecs.oregonstate.edu/~tgd
US Mail: 1148 Kelley Engineering Center         Office: 2067 Kelley Engineering Center
Oregon State Univ., Corvallis, OR 97331-5501
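
For whoever volunteers for the Collective Classification item, here is a minimal Python sketch of the loop described there, using the fallback for step iv (stations that use s as a neighbor are marked "ok" rather than rescored). The function name, the `dependents` mapping, and breaking ties by station id are all assumptions made for illustration; none of them come from the current system.

```python
from typing import Dict, Iterable, Mapping, Set


def collective_classification(
    scores: Mapping[str, float],
    dependents: Mapping[str, Iterable[str]],
    threshold: float,
) -> Dict[str, str]:
    """Greedy collective classification with the step-iv fallback.

    scores:     anomaly score per station id (higher = more anomalous).
    dependents: hypothetical mapping from station s to the stations that
                use s as a neighbor; the real system may store this differently.
    threshold:  anomaly threshold; a top score below it ends the loop.
    """
    open_set: Set[str] = set(scores)
    labels: Dict[str, str] = {}  # plays the role of Done, with the labels attached

    while open_set:
        # i. Station in Open with the highest score (ties broken by id: assumption).
        s = max(open_set, key=lambda st: (scores[st], st))

        # ii. Below threshold: everything left in Open is "ok"; stop.
        if scores[s] < threshold:
            for st in open_set:
                labels[st] = "ok"
            break

        # iii. Flag s as broken and move it from Open to Done.
        labels[s] = "broken"
        open_set.remove(s)

        # iv. Fallback: we cannot rescore the stations that use s as a
        #     neighbor, so mark those still in Open as "ok" and close them.
        for n in dependents.get(s, ()):
            if n in open_set:
                labels[n] = "ok"
                open_set.remove(n)

    return labels


if __name__ == "__main__":
    # Made-up scores, purely illustrative.
    demo_scores = {"A": 9.0, "B": 7.5, "C": 6.0, "D": 1.0, "E": 8.0}
    demo_dependents = {"A": ["B", "C"], "E": ["D"]}
    print(collective_classification(demo_scores, demo_dependents, threshold=5.0))
```

In this made-up example, B and C score above the threshold but end up "ok" because they use the broken station A as a neighbor, which is the behavior the fallback is meant to give. If we later gain the ability to rescore, step iv would update `scores` for those stations instead of closing them.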