Publications
Here you can find my machine learning research papers, MSc thesis, and EngD thesis.
2023
- Probabilistic Demand Forecasting with Graph Neural Networks. Kozodoi, Nikita, Zinovyeva, Liza, Valentin, Simon, Pereira, João, and Agundez, Rodrigo. In ECML-PKDD 2023 International Workshop on Machine Learning for Irregular Time Series, 2023.
@inproceedings{Kozodoi2023, author = {Kozodoi, Nikita and Zinovyeva, Liza and Valentin, Simon and Pereira, João and Agundez, Rodrigo}, title = {Probabilistic Demand Forecasting with Graph Neural Networks}, year = {2023}, url = {https://www.amazon.science/publications/probabilistic-demand-forecasting-with-graph-neural-networks}, booktitle = {ECML-PKDD 2023 International Workshop on Machine Learning for Irregular Time Series}, citation_count_index = {6} }
2021
- EngD Thesis. FIOD Image Intelligence: An Application for Large-Scale Object Detection and Analysis. Pereira, João. 2021.
@phdthesis{Pereira2021EngDThesis, title = {FIOD Image Intelligence: An Application for Large-Scale Object Detection and Analysis}, institution = {Eindhoven University of Technology}, author = {Pereira, João}, year = {2021}, type = {{EngD} thesis}, note = {Permanently Confidential.}, }
2019
- Learning Representations from Healthcare Time Series Data for Unsupervised Anomaly Detection (65 citations). Pereira, João, and Silveira, Margarida. In 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), Feb 2019.
The amount of time series data generated in Healthcare is growing very fast and so is the need for methods that can analyse these data, detect anomalies and provide meaningful insights. However, most of the data available is unlabelled and, therefore, anomaly detection in this scenario has been a great challenge for researchers and practitioners. Recently, unsupervised representation learning with deep generative models has been applied to find representations of data, without the need for big labelled datasets. Motivated by their success, we propose an unsupervised framework for anomaly detection in time series data. In our method, both representation learning and anomaly detection are fully unsupervised. In addition, the training data may contain anomalous data. We first learn representations of time series using a Variational Recurrent Autoencoder. Afterwards, based on those representations, we detect anomalous time series using Clustering and the Wasserstein distance. Our results on the publicly available ECG5000 electrocardiogram dataset show the ability of the proposed approach to detect anomalous heartbeats in a fully unsupervised fashion, while providing structured and expressive data representations. Furthermore, our approach outperforms previous supervised and unsupervised methods on this dataset.
@inproceedings{Pereira2019BigComp, author = {Pereira, João and Silveira, Margarida}, citation_count_index = {1}, booktitle = {2019 IEEE International Conference on Big Data and Smart Computing (BigComp)}, title = {Learning Representations from Healthcare Time Series Data for Unsupervised Anomaly Detection}, year = {2019}, month = feb, publisher = {IEEE}, volume = {}, number = {}, pages = {1-7}, doi = {10.1109/BIGCOMP.2019.8679157}, url = {https://ieeexplore.ieee.org/document/8679157}, }
- Unsupervised Representation Learning and Anomaly Detection in ECG Sequences (26 citations). Pereira, João, and Silveira, Margarida. International Journal of Data Mining and Bioinformatics, Aug 2019.
While the big data revolution takes place, large amounts of electronic health records, such as electrocardiograms (ECGs) and vital signs data, have become available. These signals are often recorded as time series of observations and are now easier to obtain. In particular, with the rise of smart devices that can perform ECG, there is a need for novel approaches that monitor these signals efficiently and detect anomalies quickly. However, since most data generated remains unlabelled, the task of anomaly detection is still very challenging. Unsupervised representation learning using deep generative models (e.g., variational autoencoders) has been used to learn expressive feature representations of sequences that can make downstream tasks, such as anomaly detection, easier to execute and more accurate. We propose an approach for unsupervised representation learning of ECG sequences using a variational autoencoder parameterised by recurrent neural networks, and use the learned representations for anomaly detection using multiple detection strategies. We tested our approach on the ECG5000 electrocardiogram dataset of the UCR time series classification archive. Our results show that the proposed approach is able to learn expressive representations of ECG sequences, and to detect anomalies with scores that outperform other methods, both supervised and unsupervised.
@article{pereiraIJDMB, title = {Unsupervised Representation Learning and Anomaly Detection in ECG Sequences}, citation_count_index = {2}, journal = {International Journal of Data Mining and Bioinformatics}, author = {Pereira, João and Silveira, Margarida}, volume = {22}, number = {4}, pages = {389-407}, numpages = {}, year = {2019}, month = aug, issn = {1748-5673}, publisher = {Inderscience Publishers}, }
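The BigComp 2019 abstract above describes detecting anomalous time series from learned representations using the Wasserstein distance. The sketch below is one hypothetical way to score sequences along those lines, not the papers' implementation. It assumes each sequence has already been encoded as a diagonal Gaussian (mean `mu`, standard deviation `sigma`), uses the closed-form 2-Wasserstein distance between diagonal Gaussians, and takes the median distance to all other sequences as the anomaly score. The function names, the median rule, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def w2_squared_diag_gaussians(mu, sigma):
    """Pairwise squared 2-Wasserstein distance between diagonal Gaussians.

    mu, sigma: arrays of shape (n, d) holding per-sequence means and
    standard deviations. Returns an (n, n) matrix.
    For diagonal covariances: W2^2 = ||mu_i - mu_j||^2 + ||sigma_i - sigma_j||^2.
    """
    mu_diff = mu[:, None, :] - mu[None, :, :]
    sd_diff = sigma[:, None, :] - sigma[None, :, :]
    return (mu_diff ** 2).sum(axis=-1) + (sd_diff ** 2).sum(axis=-1)


def anomaly_scores(mu, sigma):
    """Hypothetical score: median W2 distance from each sequence to all others."""
    distances = np.sqrt(w2_squared_diag_gaussians(mu, sigma))
    return np.median(distances, axis=1)


def demo():
    # Synthetic latent codes (assumption): 95 "normal" and 5 "anomalous" sequences.
    rng = np.random.default_rng(0)
    normal_mu = rng.normal(0.0, 0.1, size=(95, 2))
    anomalous_mu = rng.normal(3.0, 0.1, size=(5, 2))
    mu = np.vstack([normal_mu, anomalous_mu])
    sigma = np.full_like(mu, 0.2)
    scores = anomaly_scores(mu, sigma)
    flagged = np.argsort(scores)[-5:]  # indices of the 5 highest scores
    return scores, flagged
```

Calling `demo()` returns the per-sequence scores and the indices of the five most isolated sequences. No labels are used at any point, which is consistent with the fully unsupervised setting the abstract describes.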
2018
- Unsupervised Anomaly Detection in Energy Time Series Data Using Variational Recurrent Autoencoders with Attention (147 citations). Pereira, João, and Silveira, Margarida. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Dec 2018.
In the age of big data, time series are being generated in massive amounts. In the energy field, smart grids are enabling unprecedented data acquisition with the integration of sensors and smart devices. In the context of renewable energies, there has been an increasing interest in solar photovoltaic energy generation. These installations are often integrated with smart sensors that measure the energy production. The amount of data collected creates a need for smart monitoring systems that can detect anomalous behaviour in these installations, trigger alerts and enable maintenance operations. In this paper, we propose a generic, unsupervised and scalable framework for anomaly detection in time series data, based on a variational recurrent autoencoder. Furthermore, we introduce attention in the model, by means of a variational self-attention mechanism (VSAM), to improve the performance of the encoding-decoding process. Afterwards, we perform anomaly detection based on the probabilistic reconstruction scores provided by our model. Our results on solar energy generation time series show the ability of the proposed approach to detect anomalous behaviour in time series data, while providing structured and expressive representations. Since it does not need labels to be trained, our methodology enables new applications for anomaly detection in energy time series data and beyond.
@inproceedings{Pereira2018ICMLA, author = {Pereira, João and Silveira, Margarida}, title = {Unsupervised Anomaly Detection in Energy Time Series Data Using Variational Recurrent Autoencoders with Attention}, booktitle = {2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)}, year = {2018}, month = dec, publisher = {IEEE}, volume = {}, number = {}, pages = {1275-1282}, doi = {10.1109/ICMLA.2018.00207}, url = {https://ieeexplore.ieee.org/document/8614232}, citation_count_index = {0} }
- MSc Thesis. Unsupervised Anomaly Detection in Time Series Data Using Deep Learning (7 citations). Pereira, João. Nov 2018.
Detecting anomalies in time series data is an important task in areas such as energy, healthcare and security. The progress made in anomaly detection has been mostly based on approaches using supervised machine learning algorithms that require big labelled datasets to be trained. However, in the context of applications, collecting and annotating such large-scale datasets is difficult, time-consuming or even too expensive, and requires domain knowledge from experts in the field. Therefore, anomaly detection has been a great challenge for researchers and practitioners. This thesis proposes a generic, unsupervised and scalable framework for anomaly detection in time series data. The proposed approach is based on a variational autoencoder, a deep generative model that combines variational inference with deep learning. Moreover, the architecture integrates recurrent neural networks to capture the sequential nature of time series data and its temporal dependencies. Furthermore, an attention mechanism is introduced to improve the performance of the encoding-decoding process. The results on solar energy generation and electrocardiogram time series data show the ability of the proposed model to detect anomalous patterns in time series from different fields of application, while providing structured and expressive data representations.
@mastersthesis{Pereira2018MScThesis, title = {Unsupervised Anomaly Detection in Time Series Data Using Deep Learning}, citation_count_index = {3}, author = {Pereira, João}, year = {2018}, month = nov, institution = {Instituto Superior Técnico, University of Lisbon}, }
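The ICMLA 2018 abstract above mentions anomaly detection based on probabilistic reconstruction scores. A minimal sketch of that idea follows, assuming the decoder outputs a Gaussian mean and standard deviation for every time step; the abstract does not specify the output distribution, so the Gaussian choice, the function names, and the synthetic data are assumptions for illustration rather than the published method.

```python
import numpy as np


def gaussian_nll(x, mean, std):
    """Element-wise negative log-likelihood of x under N(mean, std^2)."""
    var = std ** 2
    return 0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def reconstruction_scores(x, mean, std):
    """Per-sequence anomaly score: mean negative log-likelihood over time steps.

    x, mean, std: arrays of shape (n_sequences, n_timesteps). In a real
    pipeline `mean` and `std` would come from a trained decoder; here they
    are supplied directly (assumption).
    """
    return gaussian_nll(x, mean, std).mean(axis=1)


def demo():
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 2.0 * np.pi, 50)
    clean = np.sin(t)
    # Four noisy copies of the same signal; sequence 2 gets an injected spike.
    x = clean + rng.normal(0.0, 0.05, size=(4, 50))
    x[2, 10:15] += 2.0
    # Stand-in for decoder outputs: the clean signal with a fixed std.
    mean = np.broadcast_to(clean, x.shape)
    std = np.full(x.shape, 0.1)
    return reconstruction_scores(x, mean, std)
```

A higher score means the observed sequence is less likely under the reconstruction, so in `demo()` the sequence with the injected spike receives the largest score. Thresholding these scores, for example at a chosen percentile, is one possible way to turn them into alerts.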