[This email originated from outside of OSU. Use caution with links and attachments.]

Hi all,


After spending 50 hours trying to migrate RabbitMQ to the new "Messages for RabbitMQ" service on IBM Cloud, we decided to use an externally managed RabbitMQ cluster instead, and everything was working within an hour. Systems should now be operational again, and the regular QA/QC is catching up with the data from the last couple of days.


I expect everything to be running normally again from tomorrow on (at least until we need to start migrating the CF applications at the end of May).



Kind regards,

 

Rick Hagenaars
TAHMO Project, Faculty CiTG, TU Delft, Stevinweg 1, 2628 CN Delft, The Netherlands
T (M) +31(0)645833496
E-mail: h.f.hagenaars@tudelft.nl or rhagenaars@tahmo.org


From: Slater, Michael <slater@oregonstate.edu>
Sent: Tuesday, March 14, 2023 3:50:04 AM
To: Rick Hagenaars - CITG
Subject: RE: [Rainqc-jobman] RainQC Job Manager daily report 2023-03-11
 

Hi Rick,

 

Thanks for letting me know about the outages. At worst, we’ll just re-add some scoring jobs and run them again. The last few days have been rather puzzling to me: my own data completeness estimates showed that station data was in the system, but RainQC seemed to have trouble reading any data after a certain date. That might be the difference between reading raw data and quality-controlled data, though.

 

Best wishes for the migrations and other updates!

 

Cheers,

Michael

 

From: Rainqc-jobman <rainqc-jobman-bounces@engr.oregonstate.edu> On Behalf Of Rick Hagenaars - CITG
Sent: Sunday, March 12, 2023 1:03 AM
To: RainQC Job Manager <rainqc-jobman@engr.oregonstate.edu>
Subject: Re: [Rainqc-jobman] RainQC Job Manager daily report 2023-03-11

 

Hi Michael,

 

There have indeed been some substantial outages: last week due to METER, and this week because we needed to migrate the databases. The Compose services on IBM Cloud will no longer be operational after next Wednesday. For sensordx we already switched the Postgres database last November, so it shouldn’t be affected.

 

There might be some more downtime on Monday, since the RabbitMQ migration wasn’t successful last week.

 

Kind regards,

 

Rick Hagenaars



On 11 Mar 2023, at 10:28, Slater, Michael <slater@oregonstate.edu> wrote:



I haven’t made the RainQC call in a while, but what appear to be large-scale data layer failures seem to be popping up more often than usual.

 

When I pulled the daily station data to compute the data completeness estimate, more than 60% of the models were at 100% data completeness. For some unknown reason, a few minutes later the RainQC server could not pull enough data to compute scores for even a single model, although it DID compute scores for 11 models from an earlier date (2023-03-09). So there was some sort of lapse in the availability of yesterday’s data (for today’s jobs).

 

--ms

 

From: slater@oregonstate.edu <slater@oregonstate.edu>
Sent: Saturday, March 11, 2023 1:06 AM
To: rainqc-jobman@ENGR.ORST.EDU
Cc: Slater, Michael <slater@oregonstate.edu>
Subject: RainQC Job Manager daily report 2023-03-11

 

Current UTC date: 2023-03-11 -> scoring models for previous day: 2023-03-10
---------------------------------------------------------------------------------
Daily Model Data Completeness Check:
data completeness    50% | complete models: 180 of 273 (65.93%)
data completeness    60% | complete models: 179 of 273 (65.57%)
data completeness    70% | complete models: 179 of 273 (65.57%)
data completeness    75% | complete models: 179 of 273 (65.57%)
data completeness    80% | complete models: 179 of 273 (65.57%)
data completeness    85% | complete models: 178 of 273 (65.20%)
data completeness    90% | complete models: 178 of 273 (65.20%)
data completeness    95% | complete models: 178 of 273 (65.20%)
data completeness   100% | complete models: 168 of 273 (61.54%)
----------------------------
station status | total: 313, delayed: 71, offline 24h: 48, offline week: 42
 | battery, min: 0, max: 100, mean: 63.81, std dev: 31.58
 | battery, common values: [(100, 221), (0, 50), (85, 2), (72, 2), (83, 2)]
 | battery <= mean, common countries: [('GH', 18), ('UG', 13), ('KE', 7), ('ZM', 4), ('ML', 4)]
----------------------------
63 LOW DATA (< 0.9) and 55 NO DATA weather stations impacted 95 RainQC models
LOW/NO data station impact on models: [('TA00076', 6), ('TA00231', 4), ('TA00530', 4), ('TA00700', 4), ('TA00011', 3), ('TA00173', 3), ('TA00032', 3), ('TA00035', 3), ('TA00036', 3), ('TA00267', 3), ('TA00066', 3), ('TA00126', 3), ('TA00217', 3), ('TA00482', 3), ('TA00542', 3), ('TA00308', 2), ('TA00043', 2), ('TA00050', 2), ('TA00102', 2), ('TA00165', 2), ('TA00487', 2), ('TA00684', 2), ('TA00210', 2), ('TA00223', 2), ('TA00232', 2), ('TA00262', 2), ('TA00271', 2), ('TA00314', 2), ('TA00691', 2), ('TA00339', 2), ('TA00373', 2), ('TA00398', 2), ('TA00451', 2), ('TA00462', 2), ('TA00471', 2), ('TA00278', 1), ('TA00014', 1), ('TA00031', 1), ('TA00044', 1), ('TA00095', 1), ('TA00157', 1), ('TA00201', 1), ('TA00212', 1), ('TA00219', 1), ('TA00229', 1), ('TA00237', 1), ('TA00260', 1), ('TA00276', 1), ('TA00287', 1), ('TA00290', 1), ('TA00336', 1), ('TA00350', 1), ('TA00362', 1), ('TA00369', 1), ('TA00422', 1), ('TA00432', 1), ('TA00493', 1), ('TA00524', 1), ('TA00533', 1), ('TA00652', 1), ('TA00655', 1), ('TA00677', 1), ('TA00702', 1)]
-----------------------------------------------------------
Processed daily jobs for UTC date: 2023-03-10
Start time: 2023-03-11T08:22:49+00:00
End time  : 2023-03-11T09:05:45+00:00
Elapsed time HH:MM:SS: 0:42:56
---------------------
Before job processing job table stats:
Total 'success' count: 122
Total 'failure' count: 137
Total record count: 1093
Job history table record count: 45128
Scoring job record table record count: 182
---------------------
After job processing job table stats:
Total 'success' count:               11 (flag=2 count:   0) (flag 2->1 downgrades:   0)
 | 'success' count for 2023-03-10:    0 (flag=2 count:   0)
 | 'success' count for 2023-03-09:   11 (flag=2 count:   0)
 | 'success' count for 2023-03-08:    0 (flag=2 count:   0)
 | 'success' count for 2023-03-07:    0 (flag=2 count:   0)
 | 'success' count for 2023-03-06:    0 (flag=2 count:   0)
 | 'success' count for 2023-03-05:    0 (flag=2 count:   0)
 | 'success' count for 2023-03-04:    0 (flag=2 count:   0)
Anomalies (flag=2):
--------
Total 'failure' count: 134
Total record count: 1107
Job history table record count: 45387
Scoring job record table record count: 183
-----------------------------------------------------------

--
Rainqc-jobman mailing list
Rainqc-jobman@ENGR.ORST.EDU
https://it.engineering.oregonstate.edu/mailman/listinfo/rainqc-jobman