Abstract: Prudent and meaningful performance evaluation of algorithms is essential for the progression of any research field. In the field of Non-Intrusive Load Monitoring (NILM), performance evaluation can be conducted either on real-world aggregate signals provided by smart energy meters or on artificial superpositions of individual load signals (i.e., denoised aggregates). It has long been suspected that testing on these denoised aggregates yields better evaluation results, mainly because the signal is less complex; the complexity of real-world aggregate signals increases with the number of unknown/untracked loads. Although this is a known performance-reporting problem, an investigation into the actual performance gap between real and denoised testing is still pending. In this paper, we examine the performance gap between testing on real-world and denoised aggregates with the aim of bringing clarity to this matter. Starting with an assessment of noise levels in datasets, we find significant differences between test cases. We give broad insights into our evaluation setup, which comprises three load disaggregation algorithms, two of them relying on neural network architectures. The results presented in this paper, based on studies covering three scenarios with ascending noise levels, show a strong tendency for load disaggregation algorithms to perform significantly better on denoised aggregate signals. A closer look at the outcome of our studies reveals that all appliance types could be subject to this phenomenon. We conclude the paper by discussing aspects that could be causing these considerable gaps between real and denoised testing in NILM.
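To make the distinction between the two evaluation settings concrete, the following minimal Python sketch contrasts a denoised aggregate (the artificial superposition of submetered appliance signals) with a real-world aggregate that additionally contains unknown/untracked loads. The appliance names, signal shapes, and the noise-level measure are illustrative assumptions, not the paper's actual data or metric.

```python
import numpy as np

# Illustrative sketch (assumed, not from the paper): contrasting a real-world
# aggregate with a "denoised" aggregate built as the sum of submetered loads.

rng = np.random.default_rng(0)
T = 1_000  # number of samples

# Hypothetical submetered appliance signals in watts
fridge = 80 * (rng.random(T) > 0.5)
kettle = 2000 * (rng.random(T) > 0.97)
washer = 500 * (rng.random(T) > 0.9)
submetered = fridge + kettle + washer

# Unknown/untracked loads ("noise") present only in the real-world aggregate
unknown_loads = 150 + 100 * rng.random(T)

denoised_aggregate = submetered                  # artificial superposition
real_aggregate = submetered + unknown_loads      # what a smart meter would report

# One possible noise-level measure: the share of aggregate energy not explained
# by the submetered appliances (an assumption, not necessarily the paper's metric).
noise_level = 1.0 - submetered.sum() / real_aggregate.sum()
print(f"noise level: {noise_level:.2%}")
```

Under this reading, testing on `denoised_aggregate` removes the unexplained component entirely, which is one plausible way to interpret why evaluation on denoised signals tends to look more favorable.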