
Hi Rick,

Thank you for the updates.

Happy New Year,

--Tom

Thomas G. Dietterich, Distinguished Professor   Voice: 541-737-5559
School of Electrical Engineering                FAX: 541-737-1300
and Computer Science                            URL: eecs.oregonstate.edu/~tgd
US Mail: 1148 Kelley Engineering Center         Office: 2067 Kelley Engineering Center
Oregon State Univ., Corvallis, OR 97331-5501

From: Rainqc-jobman <rainqc-jobman-bounces@engr.oregonstate.edu> On Behalf Of Rick Hagenaars - CITG
Sent: Thursday, January 12, 2023 03:17
To: RainQC Job Manager <rainqc-jobman@engr.oregonstate.edu>
Subject: Re: [Rainqc-jobman] Action Items and Next Steps for the RainQC system

[This email originated from outside of OSU. Use caution with links and attachments.]

Hello all,

Sorry about missing the meeting; I got COVID this week, so my schedule got a bit messed up. Thanks for the written overview of the points discussed.

On the CloudFoundry issue: During November/December we had a lot of discussion with IBM about keeping CF active within the academic license until it is fully deprecated from their platforms. In the end they approved this, since they could not yet say whether we would get sufficient resources on the new IBM Cloud Code Engine. Unfortunately, the question about the amount of resources for Cloud Code Engine still hasn't been answered, so as of now we still don't know whether we need to migrate there or to Kubernetes/plain VPS instances before June this year. This affects not just the SensorDX systems but all ~20 TAHMO applications as well. If we don't have a decisive answer by the end of February, we will need to start migrating to a system that is not vendor-locked to IBM. Until then I suggest we just wait, although planning-wise we do need to reserve time for the migration, since that will definitely need to happen in May.

On the "ground truth dataset": I'll provide the dataset by Wednesday of next week.

Kind regards,

Rick Hagenaars
TAHMO Project, Faculty CiTG, TU Delft, Stevinweg 1, 2628CN Room 4.72, Delft The Netherlands
T (M) +31(0)645833496
E-mail: h.f.hagenaars@tudelft.nl or rhagenaars@tahmo.org

________________________________
From: Rainqc-jobman <rainqc-jobman-bounces@engr.oregonstate.edu> on behalf of Dietterich, Thomas <tgd@oregonstate.edu>
Sent: Wednesday, January 11, 2023 5:06:51 PM
To: RainQC Job Manager
Subject: [Rainqc-jobman] Action Items and Next Steps for the RainQC system

We had a short meeting today. Our next meeting is on 25 January; Tema will lead the meeting. Here is a list of our current issues and action items:

1. CloudFoundry. IBM is terminating support for CloudFoundry. Rick, what will replace CloudFoundry and what do we need to change in our system?

2. Ground Truth Clogging Dataset. We want to create a labeled dataset that assigns a label of "good", "clogged", or "other-failure" to each rain gauge on a daily basis. We will use this dataset to develop and test new QC methods for precipitation.
   * Rick: can you export the quality objects in JSON format for the period from January 2017 through October 2022?
   * Austin and Victoria will analyze the exported JSON for each station and day. For each block of time where a station has been flagged, they will look at the maximum precipitation that was reported. If this is less than a threshold epsilon (maybe 5 mm?) for the entire block of time, they will mark the rain gauge as "clogged" during that time. If the maximum reported precipitation is greater than epsilon, they will mark the rain gauge as "other-failure". Finally, for all days where the QC object is ok, they will mark the rain gauge as "good".
     The output should be a CSV file with columns for station id, sensor id (to handle stations with multiple rain gauges), date, reported precipitation, and the assigned class label {good, clogged, other-failure}. JSON format would be fine also; whichever you prefer. I assume we define a day as running from 0:00-23:59:59 GMT.

3. Hand-off of the job manager. Michael will meet with Austin and Tema to finalize the hand-off of responsibility for monitoring the job manager. This will be delayed until the floods in California are over.

4. Collective Classification. Our current approach flags each station individually. A flaw in this approach is that if a station is bad, we tend to flag not only the bad station but also many of its neighbors. The idea of collective classification is to analyze the stations jointly. We start by obtaining the anomaly scores for all of the stations, computed in the usual way. Let Open := the set of all stations for which we have scores, and Done := {}.
   * Repeat
      i. Let s be the station in Open with the highest anomaly score.
      ii. If the score is below the anomaly threshold, then mark all remaining stations in Open as "ok" and exit.
      iii. Else flag s as "broken", remove it from Open, and add it to Done.
      iv. Recompute the anomaly scores of all neighbors of s (i.e., all stations that use s as a neighbor). In the current system, we have no way of doing this, so we can just flag all of them as "ok", remove them from Open, and add them to Done.
   * The action item is to implement collective classification. This is not currently assigned to anyone. Any volunteers to implement this?

5. GP Clogging Model. Implement and test the algorithm proposed by Ciira and Tom for detecting clogged sensors. This item is assigned to Ciira and Tom.

6. Multi-day Detection. Implement and test a version of the current algorithm that sums the precipitation for K consecutive days. We suspect that K of 5-8 days might give much more accurate results.
   This will involve modifying the training code and the scoring code. Every day, this code will score the sum of the precipitation for the K most recent days. Unfortunately, this will also increase the problems caused by missing data, because if any data is missing from any of those K days (for the target station or any of its neighbors), then we will not be able to score the station. This item is not assigned to anyone. Any volunteers to implement this?

I will be traveling in India from January 22 through February 19. I will therefore miss the next two meetings (25 January and 8 February). Tema will run the meetings in my absence.

--Tom

Thomas G. Dietterich, Distinguished Professor   Voice: 541-737-5559
School of Electrical Engineering                FAX: 541-737-1300
and Computer Science                            URL: eecs.oregonstate.edu/~tgd
US Mail: 1148 Kelley Engineering Center         Office: 2067 Kelley Engineering Center
Oregon State Univ., Corvallis, OR 97331-5501
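
For anyone considering the Collective Classification item above, the Open/Done loop could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the RainQC implementation: the function name, the `dependents` mapping (station -> stations that use it as a neighbor), and the optional `rescore` callback are all hypothetical. When `rescore` is not supplied, the sketch falls back to the workaround described in step iv and marks the dependents as "ok".

```python
from typing import Callable, Dict, Mapping, Optional, Set


def collective_classification(
    scores: Mapping[str, float],
    dependents: Mapping[str, Set[str]],
    threshold: float,
    rescore: Optional[Callable[[str, Set[str]], float]] = None,
) -> Dict[str, str]:
    """Greedy labeling of stations as "broken" or "ok".

    scores:     anomaly score per station (higher = more anomalous).
    dependents: hypothetical mapping from a station s to the stations
                that use s as a neighbor.
    rescore:    hypothetical callback (station, broken_stations) -> new
                score; if None, dependents of a broken station are "ok".
    """
    open_scores = dict(scores)      # "Open": stations still undecided
    labels: Dict[str, str] = {}     # "Done": station -> assigned label
    broken: Set[str] = set()

    while open_scores:
        # Step i: station in Open with the highest anomaly score.
        s = max(open_scores, key=open_scores.get)

        # Step ii: below the threshold, so everything left is "ok".
        if open_scores[s] < threshold:
            for station in open_scores:
                labels[station] = "ok"
            break

        # Step iii: flag s as broken and move it from Open to Done.
        labels[s] = "broken"
        broken.add(s)
        del open_scores[s]

        # Step iv: handle the stations that use s as a neighbor.
        for n in dependents.get(s, set()):
            if n not in open_scores:
                continue  # already decided
            if rescore is None:
                # Workaround from the email: no rescoring available.
                labels[n] = "ok"
                del open_scores[n]
            else:
                # Rescore n, ignoring the stations flagged so far.
                open_scores[n] = rescore(n, broken)

    return labels
```

With scores {"A": 0.9, "B": 0.8, "C": 0.2, "D": 0.85}, a threshold of 0.5, and B using A as a neighbor, the fallback version flags A and D as "broken" and leaves B and C as "ok", so B is no longer flagged merely because its neighbor A is bad.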