Building analytical platform with Big Data solutions for log files of PanDA infrastructure

Bibliographic information
Parent link:Journal of Physics: Conference Series
Vol. 1015 : Information Technologies in Business and Industry (ITBI2018).— 2018.— [032003, 6 p.]
Institutional author: National Research Tomsk Polytechnic University, School of Non-Destructive Testing and Security, Centre of Industrial Tomography, Research and Production Laboratory "Betatron Tomography of Large-Scale Objects"
Other authors: Alekseev A. A. (Aleksandr Aleksandrovich), Barreiro Megino F. G., Klimentov A. A., Korchuganova T. A., Maeno T., Padolski S. V.
Abstract (from title screen):
The paper describes the implementation of a high-performance system for the processing and analysis of log files for the PanDA infrastructure of the ATLAS experiment at the Large Hadron Collider (LHC), which is responsible for the workload management of about 2 million daily jobs across the Worldwide LHC Computing Grid. The solution is based on the ELK technology stack, which comprises several components: Filebeat, Logstash, Elasticsearch (ES), and Kibana. Filebeat collects data from the logs, Logstash processes the data and exports it to Elasticsearch, ES provides centralized data storage, and the accumulated data in ES can be explored with Kibana. These components were integrated with the PanDA infrastructure and replaced the previous log processing systems, increasing scalability and usability. The authors describe all the components and their configuration tuning for the current tasks, report the scale of the production system, and give several real-life examples of how this centralized log processing and storage service is used, showcasing its advantages for daily operations.
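The Filebeat → Logstash → Elasticsearch → Kibana pipeline summarized above can be sketched as a minimal Logstash pipeline configuration. This is an illustrative fragment only: the port, log format, grok pattern, Elasticsearch host, and index name are assumptions for the sketch, not the settings used by the PanDA team.

```conf
# Hypothetical Logstash pipeline: receive log lines shipped by Filebeat,
# parse them into structured fields, and index them into Elasticsearch.
input {
  beats {
    port => 5044            # Filebeat's default Logstash output port
  }
}
filter {
  grok {
    # Assumed log line shape: "<ISO timestamp> <level> <message>"
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
output {
  elasticsearch {
    hosts => ["http://es.example.org:9200"]   # placeholder ES endpoint
    index => "panda-logs-%{+YYYY.MM.dd}"      # one index per day, a common rollover scheme
  }
}
```

With a daily index pattern like this, Kibana can be pointed at `panda-logs-*` to browse and visualize the accumulated log data.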
Language: English
Published: 2018
Series: Mathematical simulation and data processing
Links: http://dx.doi.org/10.1088/1742-6596/1015/3/032003
http://earchive.tpu.ru/handle/11683/52921
Material type: Electronic book section
KOHA link:https://koha.lib.tpu.ru/cgi-bin/koha/opac-detail.pl?biblionumber=659478
Description
DOI:10.1088/1742-6596/1015/3/032003