2024
- Getting practical with GeoSPARQL and Apache Jena
  In: Timo Homburg, Beyza Yaman, Mohamed Ahmed Sherif and Axel-Cyrille Ngonga Ngomo (eds.): Proceedings of the 6th International Workshop on Geospatial Linked Data 2024 co-located with 21st Extended Semantic Web Conference (ESWC 2024), CEUR Workshop Proceedings, vol. 3743. Hersonissos, Greece.
  Simon Bin, Claus Stadler, Lorenz Bühmann and Michael Martin
  Abstract: This paper explores the integration of geo-spatial data into RDF (Resource Description Framework) using Apache Jena, a popular Java-based framework for building Semantic Web applications. We explain the basic representation of geo-spatial data in RDF with a focus on both the new GeoSPARQL 1.1 standard and Apache Jena. Our investigation covers advanced techniques, such as transformation of coordinate reference systems, aggregation of geo-spatial data, creation of new geo-objects, and simplification of polygons. Additionally, we discuss the usage of the H3 Grid as a Discrete Global Grid System (DGGS) for geo-spatial conversion. Furthermore, we present performance optimisations specific to Apache Jena, including per-graph geo-indexing, improved geo-index serialization for faster startup times, and manual optimisation of geo-spatial queries. We conclude with a comparison of different geo-functions and outline future directions for enhancing geo-spatial data management in RDF.
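A minimal Java sketch of the kind of GeoSPARQL query the paper discusses, using Apache Jena with its jena-geosparql module. The input file places.ttl is hypothetical; geof:distance and the unit IRI come from the GeoSPARQL standard, and GeoSPARQLConfig.setupMemoryIndex() is assumed to be available in a recent Jena release.

```java
import org.apache.jena.geosparql.configuration.GeoSPARQLConfig;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class GeoDistanceDemo {
    public static void main(String[] args) {
        // Register the GeoSPARQL function library and in-memory spatial indexing.
        GeoSPARQLConfig.setupMemoryIndex();

        Model model = ModelFactory.createDefaultModel();
        model.read("places.ttl"); // hypothetical file with geo:asWKT geometry literals

        String query = """
            PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
            PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
            PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
            SELECT ?a ?b ?dist WHERE {
              ?a geo:hasGeometry/geo:asWKT ?wktA .
              ?b geo:hasGeometry/geo:asWKT ?wktB .
              FILTER(str(?a) < str(?b))  # report each pair once
              BIND(geof:distance(?wktA, ?wktB, uom:metre) AS ?dist)
            }
            """;
        try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
            qe.execSelect().forEachRemaining(row ->
                System.out.println(row.get("a") + " <-> " + row.get("b") + ": " + row.get("dist")));
        }
    }
}
```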
@inproceedings{bin2024geosparql,
abstract = {This paper explores the integration of geo-spatial data into RDF (Resource Description Framework) using Apache Jena, a popular Java-based framework for building Semantic Web applications. We explain the basic representation of geo-spatial data in RDF with a focus on both the new GeoSPARQL 1.1 standard and Apache Jena. Our investigation covers advanced techniques, such as transformation of coordinate reference systems, aggregation of geo-spatial data, creation of new geo-objects, and simplification of polygons. Additionally, we discuss the usage of the H3 Grid as a Discrete Global Grid System (DGGS) for geo-spatial conversion. Furthermore, we present performance optimisations specific to Apache Jena, including per-graph geo-indexing, improved geo-index serialization for faster startup times, and manual optimisation of geo-spatial queries. We conclude with a comparison of different geo-functions and outline future directions for enhancing geo-spatial data management in RDF.},
address = {Hersonissos, Greece},
author = {Bin, Simon and Stadler, Claus and Bühmann, Lorenz and Martin, Michael},
booktitle = {Proceedings of the 6th International Workshop on Geospatial Linked Data 2024 co-located with 21st Extended Semantic Web Conference (ESWC 2024)},
editor = {Homburg, Timo and Yaman, Beyza and Sherif, Mohamed Ahmed and Ngomo, Axel-Cyrille Ngonga},
keywords = {sys:relevantFor:infai},
month = {05},
series = {{CEUR} Workshop Proceedings},
title = {Getting practical with {G}eo{SPARQL} and {A}pache {J}ena},
volume = 3743,
year = 2024
}
%0 Conference Paper
%1 bin2024geosparql
%A Bin, Simon
%A Stadler, Claus
%A Bühmann, Lorenz
%A Martin, Michael
%B Proceedings of the 6th International Workshop on Geospatial Linked Data 2024 co-located with 21st Extended Semantic Web Conference (ESWC 2024)
%C Hersonissos, Greece
%D 2024
%E Homburg, Timo
%E Yaman, Beyza
%E Sherif, Mohamed Ahmed
%E Ngomo, Axel-Cyrille Ngonga
%T Getting practical with GeoSPARQL and Apache Jena
%U https://ceur-ws.org/Vol-3743/paper2.pdf
%V 3743
%X This paper explores the integration of geo-spatial data into RDF (Resource Description Framework) using Apache Jena, a popular Java-based framework for building Semantic Web applications. We explain the basic representation of geo-spatial data in RDF with a focus on both the new GeoSPARQL 1.1 standard and Apache Jena. Our investigation covers advanced techniques, such as transformation of coordinate reference systems, aggregation of geo-spatial data, creation of new geo-objects, and simplification of polygons. Additionally, we discuss the usage of the H3 Grid as a Discrete Global Grid System (DGGS) for geo-spatial conversion. Furthermore, we present performance optimisations specific to Apache Jena, including per-graph geo-indexing, improved geo-index serialization for faster startup times, and manual optimisation of geo-spatial queries. We conclude with a comparison of different geo-functions and outline future directions for enhancing geo-spatial data management in RDF.
- KGCW2024 Challenge Report: RDFProcessingToolkit
  In: David Chaves-Fraga, Anastasia Dimou, Ana Iglesias-Molina, Umutcan Serles and Dylan Van Assche (eds.): Proceedings of the 5th International Workshop on Knowledge Graph Construction co-located with 21st Extended Semantic Web Conference (ESWC 2024), CEUR Workshop Proceedings, vol. 3718. Hersonissos, Greece.
  Claus Stadler and Simon Bin
  Abstract: This is the report of the participation of the RDFProcessingToolkit (RPT) in the KGCW2024 Challenge at ESWC 2024. The RPT system processes RML specifications by translating them into a series of extended SPARQL CONSTRUCT queries. The necessary SPARQL extensions are provided as plugins for the Apache Jena framework. This year’s challenge comprises a performance and a conformance track. For the performance track, a homogeneous environment was kindly provided by the workshop organizers in order to facilitate comparability of measurements. In this track, we mainly adapted the setup from our last year’s participation. For the conformance track, we updated our system with support for the rml-core module of the upcoming RML revision. We also report on the issues and shortcomings we encountered as a base for future improvements.
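The core idea named in the abstract, treating an RML triples map as a SPARQL CONSTRUCT query, can be sketched with plain Apache Jena. The rows.ttl file and the ex: properties standing in for an RML logical source are hypothetical, as is the IRI template; RPT's actual SPARQL extensions for reading non-RDF sources are not shown.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class RmlAsConstructDemo {
    public static void main(String[] args) {
        // Stand-in for an RML logical source: each row is a resource carrying column values.
        Model rows = ModelFactory.createDefaultModel();
        rows.read("rows.ttl"); // hypothetical row data, e.g. [ ex:id "1" ; ex:name "Alice" ]

        // A subject template like rr:template "http://example.org/person/{id}"
        // corresponds to IRI(CONCAT(...)) in the CONSTRUCT query.
        String construct = """
            PREFIX ex:   <http://example.org/>
            PREFIX foaf: <http://xmlns.com/foaf/0.1/>
            CONSTRUCT { ?iri a foaf:Person ; foaf:name ?name }
            WHERE {
              ?row ex:id ?id ; ex:name ?name .
              BIND(IRI(CONCAT("http://example.org/person/", ?id)) AS ?iri)
            }
            """;
        try (QueryExecution qe = QueryExecutionFactory.create(construct, rows)) {
            qe.execConstruct().write(System.out, "TTL");
        }
    }
}
```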
@inproceedings{DBLP:conf/kgcw/StadlerB24,
abstract = {This is the report of the participation of the RDFProcessingToolkit (RPT) in the KGCW2024 Challenge at ESWC 2024. The RPT system processes RML specifications by translating them into a series of extended SPARQL CONSTRUCT queries. The necessary SPARQL extensions are provided as plugins for the Apache Jena framework. This year’s challenge comprises a performance and a conformance track. For the performance track, a homogeneous environment was kindly provided by the workshop organizers in order to facilitate comparability of measurements. In this track, we mainly adapted the setup from our last year’s participation. For the conformance track, we updated our system with support for the rml-core module of the upcoming RML revision. We also report on the issues and shortcomings we encountered as a base for future improvements.},
address = {Hersonissos, Greece},
author = {Stadler, Claus and Bin, Simon},
booktitle = {Proceedings of the 5th International Workshop on Knowledge Graph Construction co-located with 21st Extended Semantic Web Conference ({ESWC} 2024)},
editor = {Chaves{-}Fraga, David and Dimou, Anastasia and Iglesias{-}Molina, Ana and Serles, Umutcan and Assche, Dylan Van},
keywords = {sys:relevantFor:infai},
month = {05},
series = {{CEUR} Workshop Proceedings},
title = {{KGCW2024} Challenge Report: {RDF}{P}rocessing{T}oolkit},
volume = 3718,
year = 2024
}
%0 Conference Paper
%1 DBLP:conf/kgcw/StadlerB24
%A Stadler, Claus
%A Bin, Simon
%B Proceedings of the 5th International Workshop on Knowledge Graph Construction co-located with 21st Extended Semantic Web Conference (ESWC 2024)
%C Hersonissos, Greece
%D 2024
%E Chaves{-}Fraga, David
%E Dimou, Anastasia
%E Iglesias{-}Molina, Ana
%E Serles, Umutcan
%E Assche, Dylan Van
%T KGCW2024 Challenge Report: RDFProcessingToolkit
%U https://ceur-ws.org/Vol-3718/paper13.pdf
%V 3718
%X This is the report of the participation of the RDFProcessingToolkit (RPT) in the KGCW2024 Challenge at ESWC 2024. The RPT system processes RML specifications by translating them into a series of extended SPARQL CONSTRUCT queries. The necessary SPARQL extensions are provided as plugins for the Apache Jena framework. This year’s challenge comprises a performance and a conformance track. For the performance track, a homogeneous environment was kindly provided by the workshop organizers in order to facilitate comparability of measurements. In this track, we mainly adapted the setup from our last year’s participation. For the conformance track, we updated our system with support for the rml-core module of the upcoming RML revision. We also report on the issues and shortcomings we encountered as a base for future improvements.
- FAIR Data Publishing with Apache Maven
  In: Leyla Jael Castro, Dietrich Rebholz-Schuhmann, Danilo Dessì and Sonja Schimmler (eds.): Proceedings of the Fourth Workshop on Metadata and Research (objects) Management for Linked Open Science — DaMaLOS 2024 co-located with Extended Semantic Web Conference (ESWC). Hersonissos, Greece: PUBLISSO.
  Claus Stadler, Lorenz Bühmann and Simon Bin
@inproceedings{Stadler2024fair,
address = {Hersonissos, Greece},
author = {Stadler, Claus and Bühmann, Lorenz and Bin, Simon},
booktitle = {Proceedings of the Fourth Workshop on Metadata and Research (objects) Management for Linked Open Science — DaMaLOS 2024 co-located with Extended Semantic Web Conference (ESWC)},
editor = {Castro, Leyla Jael and Rebholz-Schuhmann, Dietrich and Dessì, Danilo and Schimmler, Sonja},
keywords = {sys:relevantFor:infai},
month = {05},
publisher = {PUBLISSO},
title = {{FAIR} Data Publishing with Apache Maven},
year = 2024
}
%0 Conference Paper
%1 Stadler2024fair
%A Stadler, Claus
%A Bühmann, Lorenz
%A Bin, Simon
%B Proceedings of the Fourth Workshop on Metadata and Research (objects) Management for Linked Open Science — DaMaLOS 2024 co-located with Extended Semantic Web Conference (ESWC)
%C Hersonissos, Greece
%D 2024
%E Castro, Leyla Jael
%E Rebholz-Schuhmann, Dietrich
%E Dessì, Danilo
%E Schimmler, Sonja
%I PUBLISSO
%R 10.4126/FRL01-006474023
%T FAIR Data Publishing with Apache Maven
%U https://repository.publisso.de/resource/frl:6483281/data
- LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT
  In: Christian Zinke-Wehlmann and Julia Friedrich (eds.): First Working Conference on Artificial Intelligence Development for a Resilient and Sustainable Tomorrow (AITomorrow) 2023, Informatik aktuell. Wiesbaden: Springer Fachmedien Wiesbaden — ISBN 978-3-658-43705-3, pp. 103–115.
  Lars-Peter Meyer, Claus Stadler, Johannes Frey, Norman Radtke, Kurt Junghanns, Roy Meissner, Gordian Dziwis, Kirill Bulert, Michael Martin
  Abstract: Knowledge Graphs (KG) provide us with a structured, flexible, transparent, cross-system, and collaborative way of organizing our knowledge and data across various domains in society and industrial as well as scientific disciplines. KGs surpass any other form of representation in terms of effectiveness. However, Knowledge Graph Engineering (KGE) requires in-depth experiences of graph structures, web technologies, existing models and vocabularies, rule sets, logic, as well as best practices. It also demands a significant amount of work. Considering the advancements in large language models (LLMs) and their interfaces and applications in recent years, we have conducted comprehensive experiments with ChatGPT to explore its potential in supporting KGE. In this paper, we present a selection of these experiments and their results to demonstrate how ChatGPT can assist us in the development and management of KGs.
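A sketch of the kind of interaction behind such experiments: asking a chat model to produce a knowledge-graph artifact. The request shape follows OpenAI's public chat completions API, but the model name, the prompt, and the OPENAI_API_KEY environment variable are assumptions, not details taken from the paper.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class KgePromptDemo {
    public static void main(String[] args) throws Exception {
        // A KGE-style task: ask the model to produce RDF Turtle (prompt is illustrative).
        String prompt = "Model the statement 'Leipzig is a city in Saxony' as RDF Turtle "
            + "using schema.org vocabulary. Return only the Turtle.";

        // Request shape follows OpenAI's public chat completions API;
        // the model name and API key variable are assumptions.
        String body = """
            {"model": "gpt-4", "messages": [{"role": "user", "content": "%s"}]}
            """.formatted(prompt);

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://api.openai.com/v1/chat/completions"))
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // raw JSON; the Turtle sits in choices[0].message.content
    }
}
```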
@inproceedings{Meyer2023LLMassistedKnowledge,
abstract = {Knowledge Graphs (KG) provide us with a structured, flexible, transparent, cross-system, and collaborative way of organizing our knowledge and data across various domains in society and industrial as well as scientific disciplines. KGs surpass any other form of representation in terms of effectiveness. However, Knowledge Graph Engineering (KGE) requires in-depth experiences of graph structures, web technologies, existing models and vocabularies, rule sets, logic, as well as best practices. It also demands a significant amount of work. Considering the advancements in large language models (LLMs) and their interfaces and applications in recent years, we have conducted comprehensive experiments with ChatGPT to explore its potential in supporting KGE. In this paper, we present a selection of these experiments and their results to demonstrate how ChatGPT can assist us in the development and management of KGs.},
address = {Wiesbaden},
author = {Meyer, Lars-Peter and Stadler, Claus and Frey, Johannes and Radtke, Norman and Junghanns, Kurt and Meissner, Roy and Dziwis, Gordian and Bulert, Kirill and Martin, Michael},
booktitle = {First Working Conference on Artificial Intelligence Development for a Resilient and Sustainable Tomorrow (AITomorrow) 2023},
editor = {Zinke-Wehlmann, Christian and Friedrich, Julia},
keywords = {sys:relevantFor:infai},
month = {04},
pages = {103–115},
publisher = {Springer Fachmedien Wiesbaden},
series = {Informatik aktuell},
title = {LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT},
year = 2024
}
%0 Conference Paper
%1 Meyer2023LLMassistedKnowledge
%A Meyer, Lars-Peter
%A Stadler, Claus
%A Frey, Johannes
%A Radtke, Norman
%A Junghanns, Kurt
%A Meissner, Roy
%A Dziwis, Gordian
%A Bulert, Kirill
%A Martin, Michael
%B First Working Conference on Artificial Intelligence Development for a Resilient and Sustainable Tomorrow (AITomorrow) 2023
%C Wiesbaden
%D 2024
%E Zinke-Wehlmann, Christian
%E Friedrich, Julia
%I Springer Fachmedien Wiesbaden
%P 103–115
%R 10.1007/978-3-658-43705-3_8
%T LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT
%U https://link.springer.com/chapter/10.1007/978-3-658-43705-3_8
%X Knowledge Graphs (KG) provide us with a structured, flexible, transparent, cross-system, and collaborative way of organizing our knowledge and data across various domains in society and industrial as well as scientific disciplines. KGs surpass any other form of representation in terms of effectiveness. However, Knowledge Graph Engineering (KGE) requires in-depth experiences of graph structures, web technologies, existing models and vocabularies, rule sets, logic, as well as best practices. It also demands a significant amount of work. Considering the advancements in large language models (LLMs) and their interfaces and applications in recent years, we have conducted comprehensive experiments with ChatGPT to explore its potential in supporting KGE. In this paper, we present a selection of these experiments and their results to demonstrate how ChatGPT can assist us in the development and management of KGs.
%@ 978-3-658-43705-3
2023
- Base Platform for Knowledge Graphs with Free Software
  In: Sebastian Tramp, Ricardo Usbeck, Natanael Arndt, Julia Holze and Sören Auer (eds.): Proceedings of the International Workshop on Linked Data-driven Resilience Research 2023, CEUR Workshop Proceedings, vol. 3401. Hersonissos, Greece.
  Simon Bin, Claus Stadler, Norman Radtke, Kurt Junghanns, Sabine Gründer-Fahrer and Michael Martin
  Abstract: We present an Open Source base platform for the CoyPu knowledge graph project in the resilience domain. We report on our experiences with several tools which are used to create, maintain, serve, view and explore a modular large-scale knowledge graph, as well as the adaptions that were necessary to enable frictionless interaction from both performance and usability perspectives. For this purpose, several adjustments had to be made. We provide a broad view of different programs which are of relevance to this domain. We demonstrate that while it is already possible to achieve good results with free software, there are still several pain points that need to be addressed. Resolution of these issues is often not only a matter of configuration but requires modification of the source code as well.
@inproceedings{bin-2023-base-platform,
abstract = {We present an Open Source base platform for the CoyPu knowledge graph project in the resilience domain. We report on our experiences with several tools which are used to create, maintain, serve, view and explore a modular large-scale knowledge graph, as well as the adaptions that were necessary to enable frictionless interaction from both performance and usability perspectives. For this purpose, several adjustments had to be made. We provide a broad view of different programs which are of relevance to this domain. We demonstrate that while it is already possible to achieve good results with free software, there are still several pain points that need to be addressed. Resolution of these issues is often not only a matter of configuration but requires modification of the source code as well.},
address = {Hersonissos, Greece},
author = {Bin, Simon and Stadler, Claus and Radtke, Norman and Junghanns, Kurt and Gründer-Fahrer, Sabine and Martin, Michael},
booktitle = {Proceedings of the International Workshop on Linked Data-driven Resilience Research 2023},
editor = {Tramp, Sebastian and Usbeck, Ricardo and Arndt, Natanael and Holze, Julia and Auer, Sören},
keywords = {sys:relevantFor:infai},
month = {05},
series = {{CEUR} Workshop Proceedings},
title = {Base Platform for Knowledge Graphs with Free Software},
volume = 3401,
year = 2023
}
%0 Conference Paper
%1 bin-2023-base-platform
%A Bin, Simon
%A Stadler, Claus
%A Radtke, Norman
%A Junghanns, Kurt
%A Gründer-Fahrer, Sabine
%A Martin, Michael
%B Proceedings of the International Workshop on Linked Data-driven Resilience Research 2023
%C Hersonissos, Greece
%D 2023
%E Tramp, Sebastian
%E Usbeck, Ricardo
%E Arndt, Natanael
%E Holze, Julia
%E Auer, Sören
%T Base Platform for Knowledge Graphs with Free Software
%U https://ceur-ws.org/Vol-3401/paper6.pdf
%V 3401
%X We present an Open Source base platform for the CoyPu knowledge graph project in the resilience domain. We report on our experiences with several tools which are used to create, maintain, serve, view and explore a modular large-scale knowledge graph, as well as the adaptions that were necessary to enable frictionless interaction from both performance and usability perspectives. For this purpose, several adjustments had to be made. We provide a broad view of different programs which are of relevance to this domain. We demonstrate that while it is already possible to achieve good results with free software, there are still several pain points that need to be addressed. Resolution of these issues is often not only a matter of configuration but requires modification of the source code as well.
- KGCW2023 Challenge Report RDFProcessingToolkit / Sansa
  In: 4th International Workshop on Knowledge Graph Construction @ ESWC 2023, CEUR Workshop Proceedings, vol. 3471. Hersonissos, Greece.
  Simon Bin, Claus Stadler and Lorenz Bühmann
  Abstract: This is the report of our participation in the KGCW2023 Challenge @ ESWC 2023 with our RDFProcessingToolkit/Sansa system which won the “fastest” tool award. The challenge was about the construction of RDF knowledge graphs from RML specifications with varying complexity in regard to the mix of input formats, characteristics of the data and the needed join operations. We detail how we integrated our tool into the provided benchmark framework. Thereby we also report on the issues and shortcomings we encountered as a base for future improvements. Furthermore, we provide an analysis of the data measured with the benchmark framework.
@inproceedings{stadler2023-kgcw-challenge,
abstract = {This is the report of our participation in the KGCW2023 Challenge @ ESWC 2023 with our RDFProcessingToolkit/Sansa system which won the “fastest” tool award. The challenge was about the construction of RDF knowledge graphs from RML specifications with varying complexity in regard to the mix of input formats, characteristics of the data and the needed join operations. We detail how we integrated our tool into the provided benchmark framework. Thereby we also report on the issues and shortcomings we encountered as a base for future improvements. Furthermore, we provide an analysis of the data measured with the benchmark framework.},
address = {Hersonissos, Greece},
author = {Bin, Simon and Stadler, Claus and Bühmann, Lorenz},
booktitle = {4th International Workshop on Knowledge Graph Construction @ ESWC 2023},
keywords = {sys:relevantFor:infai},
volume = 3471,
series = {CEUR workshop proceedings},
title = {{KGCW}2023 Challenge Report {RDF}{P}rocessing{T}oolkit / Sansa},
year = 2023
}
%0 Conference Paper
%1 stadler2023-kgcw-challenge
%A Bin, Simon
%A Stadler, Claus
%A Bühmann, Lorenz
%B 4th International Workshop on Knowledge Graph Construction @ ESWC 2023
%C Hersonissos, Greece
%D 2023
%V 3471
%T KGCW2023 Challenge Report RDFProcessingToolkit / Sansa
%U https://ceur-ws.org/Vol-3471/paper12.pdf
%X This is the report of our participation in the KGCW2023 Challenge @ ESWC 2023 with our RDFProcessingToolkit/Sansa system which won the “fastest” tool award. The challenge was about the construction of RDF knowledge graphs from RML specifications with varying complexity in regard to the mix of input formats, characteristics of the data and the needed join operations. We detail how we integrated our tool into the provided benchmark framework. Thereby we also report on the issues and shortcomings we encountered as a base for future improvements. Furthermore, we provide an analysis of the data measured with the benchmark framework.
- Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark
  In: 4th International Workshop on Knowledge Graph Construction @ ESWC 2023, CEUR Workshop Proceedings, vol. 3471. Hersonissos, Greece.
  Claus Stadler, Lorenz Bühmann, Lars-Peter Meyer and Michael Martin
  Abstract: Approaches for the construction of knowledge graphs from heterogeneous data sources range from ad-hoc scripts to dedicated mapping languages. Two common foundations are thereby RML and SPARQL. So far, both approaches are treated as different: On the one hand there are tools specifically for processing RML whereas on the other hand there are tools that extend SPARQL in order to incorporate additional data sources. In this work, we first show how this gap can be bridged by translating RML to a sequence of SPARQL CONSTRUCT queries and introduce the necessary SPARQL extensions. In a subsequent step, we employ techniques to optimize SPARQL query workloads as well as individual query execution times in order to obtain an optimized sequence of queries with respect to the order and uniqueness of the generated triples. Finally, we present a corresponding SPARQL query execution engine based on the Apache Spark Big Data framework. In our evaluation on benchmarks we show that our approach is capable of achieving RML mapping execution performance that surpasses the current state of the art.
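A single-machine sketch of the query-sequence idea from the abstract, using plain Jena instead of the Spark engine the paper presents: each mapping becomes one CONSTRUCT query, and collecting all results in a single Jena Model yields set semantics, so triples derived by several queries appear only once. The input data and the queries are hypothetical placeholders.

```java
import java.util.List;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class QuerySequenceDemo {
    public static void main(String[] args) {
        Model source = ModelFactory.createDefaultModel();
        source.read("source.ttl"); // hypothetical input data

        // One CONSTRUCT query per mapping; the queries below are placeholders.
        List<String> mappingQueries = List.of(
            "CONSTRUCT { ?s a <http://example.org/Thing> } WHERE { ?s ?p ?o }",
            "CONSTRUCT { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o } "
                + "WHERE { ?s <http://example.org/name> ?o }");

        Model target = ModelFactory.createDefaultModel();
        for (String q : mappingQueries) {
            try (QueryExecution qe = QueryExecutionFactory.create(q, source)) {
                // A Jena Model is a set of triples, so re-derived triples collapse to one.
                target.add(qe.execConstruct());
            }
        }
        target.write(System.out, "TTL");
    }
}
```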
@inproceedings{stadler2023-scaling-rml,
abstract = {Approaches for the construction of knowledge graphs from heterogeneous data sources range from ad-hoc scripts to dedicated mapping languages. Two common foundations are thereby RML and SPARQL. So far, both approaches are treated as different: On the one hand there are tools specifically for processing RML whereas on the other hand there are tools that extend SPARQL in order to incorporate additional data sources. In this work, we first show how this gap can be bridged by translating RML to a sequence of SPARQL CONSTRUCT queries and introduce the necessary SPARQL extensions. In a subsequent step, we employ techniques to optimize SPARQL query workloads as well as individual query execution times in order to obtain an optimized sequence of queries with respect to the order and uniqueness of the generated triples. Finally, we present a corresponding SPARQL query execution engine based on the Apache Spark Big Data framework. In our evaluation on benchmarks we show that our approach is capable of achieving RML mapping execution performance that surpasses the current state of the art.},
address = {Hersonissos, Greece},
author = {Stadler, Claus and Bühmann, Lorenz and Meyer, Lars-Peter and Martin, Michael},
booktitle = {4th International Workshop on Knowledge Graph Construction @ ESWC 2023},
keywords = {sys:relevantFor:infai},
series = {CEUR workshop proceedings},
title = {Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark},
volume = 3471,
year = 2023
}
%0 Conference Paper
%1 stadler2023-scaling-rml
%A Stadler, Claus
%A Bühmann, Lorenz
%A Meyer, Lars-Peter
%A Martin, Michael
%B 4th International Workshop on Knowledge Graph Construction @ ESWC 2023
%C Hersonissos, Greece
%D 2023
%T Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark
%U https://ceur-ws.org/Vol-3471/paper8.pdf
%V 3471
%X Approaches for the construction of knowledge graphs from heterogeneous data sources range from ad-hoc scripts to dedicated mapping languages. Two common foundations are thereby RML and SPARQL. So far, both approaches are treated as different: On the one hand there are tools specifically for processing RML whereas on the other hand there are tools that extend SPARQL in order to incorporate additional data sources. In this work, we first show how this gap can be bridged by translating RML to a sequence of SPARQL CONSTRUCT queries and introduce the necessary SPARQL extensions. In a subsequent step, we employ techniques to optimize SPARQL query workloads as well as individual query execution times in order to obtain an optimized sequence of queries with respect to the order and uniqueness of the generated triples. Finally, we present a corresponding SPARQL query execution engine based on the Apache Spark Big Data framework. In our evaluation on benchmarks we show that our approach is capable of achieving RML mapping execution performance that surpasses the current state of the art.
- QROWD—A Platform for Integrating Citizens in Smart City Data Analytics
  In: Pradeep Kumar Singh, Marcin Paprzycki, Mohamad Essaaidi and Shahram Rahimi (eds.): Sustainable Smart Cities: Theoretical Foundations and Practical Considerations. Cham: Springer International Publishing — ISBN 978-3-031-08815-5, pp. 285–321.
  Luis-Daniel Ibáñez, Eddy Maddalena, Richard Gomer, Elena Simperl, Mattia Zeni, Enrico Bignotti, Ronald Chenu-Abente, Fausto Giunchiglia, Patrick Westphal, et al.
  Abstract: Optimizing mobility services is one of the greatest challenges Smart Cities face in their efforts to improve residents’ wellbeing and reduce CO2 emissions. The advent of IoT has created unparalleled opportunities to collect large amounts of data about how people use transportation. This data could be used to ascertain the quality and reach of the services offered and to inform future policy—provided cities have the capabilities to process, curate, integrate and analyse the data effectively. At the same time, to be truly ‘Smart’, cities need to ensure that the data-driven decisions they make reflect the needs of their citizens, create feedback loops, and widen participation. In this chapter, we introduce QROWD, a data integration and analytics platform that seamlessly integrates multiple data sources alongside human, social and computational intelligence to build hybrid, automated data-centric workflows. By doing so, QROWD applications can take advantage of the best of both worlds: the accuracy and scale of machine computation, and the skills, knowledge and expertise of people. We present the architecture and main components of the platform, as well as its usage to realise two mobility use cases: estimating the modal split, which refers to trips people take that involve more than one type of transport, and urban auditing.
@inbook{Ibanez2023-qrowd,
abstract = {Optimizing mobility services is one of the greatest challenges Smart Cities face in their efforts to improve residents’ wellbeing and reduce CO$_2$ emissions. The advent of IoT has created unparalleled opportunities to collect large amounts of data about how people use transportation. This data could be used to ascertain the quality and reach of the services offered and to inform future policy—provided cities have the capabilities to process, curate, integrate and analyse the data effectively. At the same time, to be truly ‘Smart’, cities need to ensure that the data-driven decisions they make reflect the needs of their citizens, create feedback loops, and widen participation. In this chapter, we introduce QROWD, a data integration and analytics platform that seamlessly integrates multiple data sources alongside human, social and computational intelligence to build hybrid, automated data-centric workflows. By doing so, QROWD applications can take advantage of the best of both worlds: the accuracy and scale of machine computation, and the skills, knowledge and expertise of people. We present the architecture and main components of the platform, as well as its usage to realise two mobility use cases: estimating the modal split, which refers to trips people take that involve more than one type of transport, and urban auditing.},
address = {Cham},
author = {Ibáñez, Luis-Daniel and Maddalena, Eddy and Gomer, Richard and Simperl, Elena and Zeni, Mattia and Bignotti, Enrico and Chenu-Abente, Ronald and Giunchiglia, Fausto and Westphal, Patrick and Stadler, Claus and Dziwis, Gordian and Lehmann, Jens and Yumusak, Semih and Voigt, Martin and Sanguino, Maria-Angeles and Villazán, Javier and Ruiz, Ricardo and Pariente-Lobo, Tomas},
booktitle = {Sustainable Smart Cities: Theoretical Foundations and Practical Considerations},
editor = {Singh, Pradeep Kumar and Paprzycki, Marcin and Essaaidi, Mohamad and Rahimi, Shahram},
keywords = {sys:relevantFor:infai},
pages = {285–321},
publisher = {Springer International Publishing},
title = {QROWD—A Platform for Integrating Citizens in Smart City Data Analytics},
year = 2023
}
%0 Book Section
%1 Ibanez2023-qrowd
%A Ibáñez, Luis-Daniel
%A Maddalena, Eddy
%A Gomer, Richard
%A Simperl, Elena
%A Zeni, Mattia
%A Bignotti, Enrico
%A Chenu-Abente, Ronald
%A Giunchiglia, Fausto
%A Westphal, Patrick
%A Stadler, Claus
%A Dziwis, Gordian
%A Lehmann, Jens
%A Yumusak, Semih
%A Voigt, Martin
%A Sanguino, Maria-Angeles
%A Villazán, Javier
%A Ruiz, Ricardo
%A Pariente-Lobo, Tomas
%B Sustainable Smart Cities: Theoretical Foundations and Practical Considerations
%C Cham
%D 2023
%E Singh, Pradeep Kumar
%E Paprzycki, Marcin
%E Essaaidi, Mohamad
%E Rahimi, Shahram
%I Springer International Publishing
%P 285–321
%R 10.1007/978-3-031-08815-5_16
%T QROWD—A Platform for Integrating Citizens in Smart City Data Analytics
%U https://svn.aksw.org/papers/2022/SSC_qrowd/public.pdf
%X Optimizing mobility services is one of the greatest challenges Smart Cities face in their efforts to improve residents’ wellbeing and reduce CO2 emissions. The advent of IoT has created unparalleled opportunities to collect large amounts of data about how people use transportation. This data could be used to ascertain the quality and reach of the services offered and to inform future policy—provided cities have the capabilities to process, curate, integrate and analyse the data effectively. At the same time, to be truly ‘Smart’, cities need to ensure that the data-driven decisions they make reflect the needs of their citizens, create feedback loops, and widen participation. In this chapter, we introduce QROWD, a data integration and analytics platform that seamlessly integrates multiple data sources alongside human, social and computational intelligence to build hybrid, automated data-centric workflows. By doing so, QROWD applications can take advantage of the best of both worlds: the accuracy and scale of machine computation, and the skills, knowledge and expertise of people. We present the architecture and main components of the platform, as well as its usage to realise two mobility use cases: estimating the modal split, which refers to trips people take that involve more than one type of transport, and urban auditing.
%@ 978-3-031-08815-5
2022
- LSQ 2.0: A linked dataset of SPARQL query logs
  In: Philippe Cudré-Mauroux (ed.): Semantic Web, IOS Press, pp. 1–23.
  Claus Stadler, Muhammad Saleem, Qaiser Mehmood, Carlos Buil-Aranda, Michel Dumontier, Aidan Hogan and Axel-Cyrille Ngonga Ngomo
  Abstract: We present the Linked SPARQL Queries (LSQ) dataset, which currently describes 43.95 million executions of 11.56 million unique SPARQL queries extracted from the logs of 27 different endpoints. The LSQ dataset provides RDF descriptions of each such query, which are indexed in a public LSQ endpoint, allowing interested parties to find queries with the characteristics they require. We begin by describing the use cases envisaged for the LSQ dataset, which include applications for research on common features of queries, for building custom benchmarks, and for designing user interfaces. We then discuss how LSQ has been used in practice since the release of four initial SPARQL logs in 2015. We discuss the model and vocabulary that we use to represent these queries in RDF. We then provide a brief overview of the 27 endpoints from which we extracted queries in terms of the domain to which they pertain and the data they contain. We provide statistics on the queries included from each log, including the number of query executions, unique queries, as well as distributions of queries for a variety of selected characteristics. We finally discuss how the LSQ dataset is hosted and how it can be accessed and leveraged by interested parties for their use cases.
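A sketch of the envisaged use case of finding queries with desired characteristics. Both the endpoint URL and the lsq:text property are written from memory of the LSQ vocabulary and should be verified against the dataset documentation before use.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;

public class LsqLookupDemo {
    public static void main(String[] args) {
        // Hypothetical service URL; check the LSQ project pages for the live endpoint.
        String endpoint = "http://lsq.aksw.org/sparql";
        String query = """
            PREFIX lsq: <http://lsq.aksw.org/vocab#>
            SELECT ?q ?text WHERE {
              ?q lsq:text ?text .
              FILTER(CONTAINS(?text, "OPTIONAL"))
            } LIMIT 10
            """;
        try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query)) {
            qe.execSelect().forEachRemaining(row -> System.out.println(row.get("q")));
        }
    }
}
```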
@article{stadler2022-lsq20,
abstract = {We present the Linked SPARQL Queries (LSQ) dataset, which currently describes 43.95 million executions of 11.56 million unique SPARQL queries extracted from the logs of 27 different endpoints. The LSQ dataset provides RDF descriptions of each such query, which are indexed in a public LSQ endpoint, allowing interested parties to find queries with the characteristics they require. We begin by describing the use cases envisaged for the LSQ dataset, which include applications for research on common features of queries, for building custom benchmarks, and for designing user interfaces. We then discuss how LSQ has been used in practice since the release of four initial SPARQL logs in 2015. We discuss the model and vocabulary that we use to represent these queries in RDF. We then provide a brief overview of the 27 endpoints from which we extracted queries in terms of the domain to which they pertain and the data they contain. We provide statistics on the queries included from each log, including the number of query executions, unique queries, as well as distributions of queries for a variety of selected characteristics. We finally discuss how the LSQ dataset is hosted and how it can be accessed and leveraged by interested parties for their use cases.},
author = {Stadler, Claus and Saleem, Muhammad and Mehmood, Qaiser and Buil-Aranda, Carlos and Dumontier, Michel and Hogan, Aidan and Ngonga Ngomo, Axel-Cyrille},
editor = {Cudré-Mauroux, Philippe},
journal = {Semantic Web},
keywords = {sys:relevantFor:infai},
month = 11,
pages = {1–23},
publisher = {IOS Press},
title = {{LSQ} 2.0: A linked dataset of {SPARQL} query logs},
year = 2022
}
%0 Journal Article
%1 stadler2022-lsq20
%A Stadler, Claus
%A Saleem, Muhammad
%A Mehmood, Qaiser
%A Buil-Aranda, Carlos
%A Dumontier, Michel
%A Hogan, Aidan
%A Ngonga Ngomo, Axel-Cyrille
%D 2022
%E Cudré-Mauroux, Philippe
%I IOS Press
%J Semantic Web
%P 1–23
%R 10.3233/SW-223015
%T LSQ 2.0: A linked dataset of SPARQL query logs
%U https://www.semantic-web-journal.net/system/files/swj3015.pdf
%X We present the Linked SPARQL Queries (LSQ) dataset, which currently describes 43.95 million executions of 11.56 million unique SPARQL queries extracted from the logs of 27 different endpoints. The LSQ dataset provides RDF descriptions of each such query, which are indexed in a public LSQ endpoint, allowing interested parties to find queries with the characteristics they require. We begin by describing the use cases envisaged for the LSQ dataset, which include applications for research on common features of queries, for building custom benchmarks, and for designing user interfaces. We then discuss how LSQ has been used in practice since the release of four initial SPARQL logs in 2015. We discuss the model and vocabulary that we use to represent these queries in RDF. We then provide a brief overview of the 27 endpoints from which we extracted queries in terms of the domain to which they pertain and the data they contain. We provide statistics on the queries included from each log, including the number of query executions, unique queries, as well as distributions of queries for a variety of selected characteristics. We finally discuss how the LSQ dataset is hosted and how it can be accessed and leveraged by interested parties for their use cases.
- Semantification of Geospatial Information for Enriched Knowledge Representation in Context of Crisis Informatics
  In: Natanael Arndt, Sabine Gründer-Fahrer, Julia Holze, Michael Martin and Sebastian Tramp (eds.): Proceedings of the International Workshop on Data-driven Resilience Research 2022, CEUR Workshop Proceedings, vol. 3376. Leipzig, Germany.
  Claus Stadler, Simon Bin, Lorenz Bühmann, Norman Radtke, Kurt Junghanns, Sabine Gründer-Fahrer and Michael Martin
  Abstract: In the context of crisis informatics, the integration and exploitation of high volumes of heterogeneous data from multiple sources is one of the big chances as well as challenges up to now. Semantic Web technologies have proven a valuable means to integrate and represent knowledge on the basis of domain concepts which improves interoperability and interpretability of information resources and allows deriving more knowledge via semantic relations and reasoning. In this paper, we investigate the potential of representing and processing geospatial information within the semantic paradigm. We show, on the technical level, how existing open source means can be used and supplemented as to efficiently handle geographic information and to convey exemplary results highly relevant in context of crisis management applications. When given semantic resources get enriched with geospatial information, new information can be retrieved combining the concepts of multi-polygons and geo-coordinates and using the power of GeoSPARQL queries. Custom SPARQL extension functions and data types for JSON, XML and CSV as well as for dialects such as GeoJSON and GML allow for succinct integration of heterogeneous data. We implemented these features for the Apache Jena Semantic Web framework by leveraging its plugin systems. Furthermore, significant improvements w.r.t. GeoSPARQL query performance have been contributed to the framework.
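The custom SPARQL extension functions mentioned in the abstract rely on Jena's function plugin mechanism, sketched below with a toy function. The strlen function and its IRI are hypothetical; the paper's JSON/CSV/GeoJSON functions follow the same FunctionRegistry pattern.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.sparql.expr.NodeValue;
import org.apache.jena.sparql.function.FunctionBase1;
import org.apache.jena.sparql.function.FunctionRegistry;

public class CustomFunctionDemo {
    /** Toy extension function: string length. Real JSON/CSV/GeoJSON functions plug in the same way. */
    public static class StrLen extends FunctionBase1 {
        @Override
        public NodeValue exec(NodeValue v) {
            return NodeValue.makeInteger(v.getString().length());
        }
    }

    public static void main(String[] args) {
        String fnUri = "http://example.org/fn#strlen"; // hypothetical function IRI
        FunctionRegistry.get().put(fnUri, StrLen.class);

        String query = "SELECT ?n { BIND(<" + fnUri + ">(\"GeoSPARQL\") AS ?n) }";
        try (QueryExecution qe =
                 QueryExecutionFactory.create(query, ModelFactory.createDefaultModel())) {
            qe.execSelect().forEachRemaining(row -> System.out.println(row.get("n"))); // 9
        }
    }
}
```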
@inproceedings{stadler-c-2022-geospacial,
abstract = {In the context of crisis informatics, the integration and exploitation of high volumes of heterogeneous data from multiple sources is one of the big chances as well as challenges up to now. Semantic Web technologies have proven a valuable means to integrate and represent knowledge on the basis of domain concepts which improves interoperability and interpretability of information resources and allows deriving more knowledge via semantic relations and reasoning. In this paper, we investigate the potential of representing and processing geospatial information within the semantic paradigm. We show, on the technical level, how existing open source means can be used and supplemented as to efficiently handle geographic information and to convey exemplary results highly relevant in context of crisis management applications. When given semantic resources get enriched with geospatial information, new information can be retrieved combining the concepts of multi-polygons and geo-coordinates and using the power of GeoSPARQL queries. Custom SPARQL extension functions and data types for JSON, XML and CSV as well as for dialects such as GeoJSON and GML allow for succinct integration of heterogeneous data. We implemented these features for the Apache Jena Semantic Web framework by leveraging its plugin systems. Furthermore, significant improvements w.r.t. GeoSPARQL query performance have been contributed to the framework.},
address = {Leipzig, Germany},
author = {Stadler, Claus and Bin, Simon and Bühmann, Lorenz and Radtke, Norman and Junghanns, Kurt and Gründer-Fahrer, Sabine and Martin, Michael},
booktitle = {Proceedings of the International Workshop on Data-driven Resilience Research 2022},
editor = {Arndt, Natanael and Gründer-Fahrer, Sabine and Holze, Julia and Martin, Michael and Tramp, Sebastian},
keywords = {sys:relevantFor:infai},
month = {07},
series = {{CEUR} Workshop Proceedings},
title = {Semantification of Geospatial Information for Enriched Knowledge Representation in Context of Crisis Informatics},
volume = 3376,
year = 2022
}
%0 Conference Paper
%1 stadler-c-2022-geospacial
%A Stadler, Claus
%A Bin, Simon
%A Bühmann, Lorenz
%A Radtke, Norman
%A Junghanns, Kurt
%A Gründer-Fahrer, Sabine
%A Martin, Michael
%B Proceedings of the International Workshop on Data-driven Resilience Research 2022
%C Leipzig, Germany
%D 2022
%E Arndt, Natanael
%E Gründer-Fahrer, Sabine
%E Holze, Julia
%E Martin, Michael
%E Tramp, Sebastian
%T Semantification of Geospatial Information for Enriched Knowledge Representation in Context of Crisis Informatics
%U https://ceur-ws.org/Vol-3376/paper03.pdf
%V 3376
%X In the context of crisis informatics, the integration and exploitation of high volumes of heterogeneous data from multiple sources is one of the big chances as well as challenges up to now. Semantic Web technologies have proven a valuable means to integrate and represent knowledge on the basis of domain concepts which improves interoperability and interpretability of information resources and allows deriving more knowledge via semantic relations and reasoning. In this paper, we investigate the potential of representing and processing geospatial information within the semantic paradigm. We show, on the technical level, how existing open source means can be used and supplemented as to efficiently handle geographic information and to convey exemplary results highly relevant in context of crisis management applications. When given semantic resources get enriched with geospatial information, new information can be retrieved combining the concepts of multi-polygons and geo-coordinates and using the power of GeoSPARQL queries. Custom SPARQL extension functions and data types for JSON, XML and CSV as well as for dialects such as GeoJSON and GML allow for succinct integration of heterogeneous data. We implemented these features for the Apache Jena Semantic Web framework by leveraging its plugin systems. Furthermore, significant improvements w.r.t. GeoSPARQL query performance have been contributed to the framework.
- LSQ Framework: The LSQ Framework for SPARQL Query Log Processing
  In: 6th Workshop on Storing, Querying and Benchmarking Knowledge Graphs @ ISWC 2022, CEUR Workshop Proceedings, vol. 3279.
  Claus Stadler, Muhammad Saleem and Axel-Cyrille Ngonga Ngomo
  Abstract: The Linked SPARQL Queries (LSQ) datasets contain real-world SPARQL queries collected from the query logs of the publicly available SPARQL endpoints. In LSQ, each SPARQL query is represented as RDF with various structural and data-driven features attached. In this paper, we present the LSQ Java framework for creating rich knowledge graphs from SPARQL query logs. The framework is able to RDFize SPARQL query logs, which are available in different formats, in a scalable way. Furthermore, the framework offers a set of static and dynamic enrichers. Static enrichers derive information from the queries, such as their number of basic graph patterns and projected variables or even a full SPIN model. Dynamic enrichment involves additional resources. For instance, the benchmark enricher executes queries against a SPARQL endpoint and collects query execution times and result set sizes. This framework has already been used to convert query logs of 27 public SPARQL endpoints, representing 43.95 million executions of 11.56 million unique SPARQL queries. The LSQ queries have been used in many use cases such as benchmarking based on real-world SPARQL queries, SPARQL adoption, caching, query optimization, useability analysis, and meta-querying. Realization of LSQ required devising novel software components to (a) improve scalability of RDF data processing with the Apache Spark Big Data framework and (b) ease operations of complex RDF data models such as controlled skolemization. Following the spirit of OpenSource software development and the “don’t repeat yourself” (DRY) paradigm, the work on the LSQ framework also resulted in contributions to Apache Jena in order to make these improvements readily available outside of the LSQ context.
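What a static enricher computes can be approximated with Jena's query syntax API: parse a query, then count projected variables and triple patterns. The sample query is arbitrary, and the real framework derives far more features, including a full SPIN model.

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.syntax.ElementPathBlock;
import org.apache.jena.sparql.syntax.ElementVisitorBase;
import org.apache.jena.sparql.syntax.ElementWalker;

public class QueryFeatureDemo {
    public static void main(String[] args) {
        Query query = QueryFactory.create(
            "SELECT ?s ?o WHERE { ?s a ?type ; <http://example.org/p> ?o }");

        // Structural features of the kind an LSQ static enricher attaches to a query.
        System.out.println("Projected variables: " + query.getProjectVars().size());

        final int[] triplePatterns = {0};
        ElementWalker.walk(query.getQueryPattern(), new ElementVisitorBase() {
            @Override
            public void visit(ElementPathBlock el) {
                el.patternElts().forEachRemaining(tp -> triplePatterns[0]++);
            }
        });
        System.out.println("Triple patterns: " + triplePatterns[0]);
    }
}
```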
@inproceedings{stadler2022-lsq-framework,
abstract = {The Linked SPARQL Queries (LSQ) datasets contain real-world SPARQL queries collected from the query logs of the publicly available SPARQL endpoints. In LSQ, each SPARQL query is represented as RDF with various structural and data-driven features attached. In this paper, we present the LSQ Java framework for creating rich knowledge graphs from SPARQL query logs. The framework is able to RDFize SPARQL query logs, which are available in different formats, in a scalable way. Furthermore, the framework offers a set of static and dynamic enrichers. Static enrichers derive information from the queries, such as their number of basic graph patterns and projected variables or even a full SPIN model. Dynamic enrichment involves additional resources. For instance, the benchmark enricher executes queries against a SPARQL endpoint and collects query execution times and result set sizes. This framework has already been used to convert query logs of 27 public SPARQL endpoints, representing 43.95 million executions of 11.56 million unique SPARQL queries. The LSQ queries have been used in many use cases such as benchmarking based on real-world SPARQL queries, SPARQL adoption, caching, query optimization, useability analysis, and meta-querying. Realization of LSQ required devising novel software components to (a) improve scalability of RDF data processing with the Apache Spark Big Data framework and (b) ease operations of complex RDF data models such as controlled skolemization. Following the spirit of OpenSource software development and the “don’t repeat yourself” (DRY) paradigm, the work on the LSQ framework also resulted in contributions to Apache Jena in order to make these improvements readily available outside of the LSQ context.},
author = {Stadler, Claus and Saleem, Muhammad and Ngomo, Axel-Cyrille Ngonga},
booktitle = {6th Workshop on Storing, Querying and Benchmarking Knowledge Graphs @ ISWC 2022},
keywords = {sys:relevantFor:infai},
series = {{CEUR} Workshop Proceedings},
title = {{LSQ} Framework: The {LSQ} Framework for {SPARQL} Query Log Processing},
volume = 3279,
year = 2022
}
%0 Conference Paper
%1 stadler2022-lsq-framework
%A Stadler, Claus
%A Saleem, Muhammad
%A Ngomo, Axel-Cyrille Ngonga
%B 6th Workshop on Storing, Querying and Benchmarking Knowledge Graphs @ ISWC 2022
%D 2022
%T LSQ Framework: The LSQ Framework for SPARQL Query Log Processing
%U https://ceur-ws.org/Vol-3279/paper4.pdf
%V 3279
%X The Linked SPARQL Queries (LSQ) datasets contain real-world SPARQL queries collected from the query logs of the publicly available SPARQL endpoints. In LSQ, each SPARQL query is represented as RDF with various structural and data-driven features attached. In this paper, we present the LSQ Java framework for creating rich knowledge graphs from SPARQL query logs. The framework is able to RDFize SPARQL query logs, which are available in different formats, in a scalable way. Furthermore, the framework offers a set of static and dynamic enrichers. Static enrichers derive information from the queries, such as their number of basic graph patterns and projected variables or even a full SPIN model. Dynamic enrichment involves additional resources. For instance, the benchmark enricher executes queries against a SPARQL endpoint and collects query execution times and result set sizes. This framework has already been used to convert query logs of 27 public SPARQL endpoints, representing 43.95 million executions of 11.56 million unique SPARQL queries. The LSQ queries have been used in many use cases such as benchmarking based on real-world SPARQL queries, SPARQL adoption, caching, query optimization, useability analysis, and meta-querying. Realization of LSQ required devising novel software components to (a) improve scalability of RDF data processing with the Apache Spark Big Data framework and (b) ease operations of complex RDF data models such as controlled skolemization. Following the spirit of OpenSource software development and the “don’t repeat yourself” (DRY) paradigm, the work on the LSQ framework also resulted in contributions to Apache Jena in order to make these improvements readily available outside of the LSQ context.
2021
- Open Data and the Status Quo — A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany
  Lisa Wenige, Claus Stadler, Michael Martin, Richard Figura, Robert Sauter and Christopher W. Frank
@inproceedings{wenige2021open,
author = {Wenige, Lisa and Stadler, Claus and Martin, Michael and Figura, Richard and Sauter, Robert and Frank, Christopher W.},
keywords = {sys:relevantFor:infai},
title = {Open Data and the Status Quo — A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany},
year = 2021
}
%0 Conference Paper
%1 wenige2021open
%A Wenige, Lisa
%A Stadler, Claus
%A Martin, Michael
%A Figura, Richard
%A Sauter, Robert
%A Frank, Christopher W.
%D 2021
%T Open Data and the Status Quo — A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany
%U https://arxiv.org/pdf/2106.09590.pdf
- Towards the next generation of the LinkedGeoData project using virtual knowledge graphs
  In: Journal of Web Semantics, vol. 71, p. 100662.
  Linfang Ding, Guohui Xiao, Albulen Pano, Claus Stadler and Diego Calvanese
  Abstract: With the advancement of Semantic Technologies, large geospatial data sources have been increasingly published as Linked data on the Web. The LinkedGeoData project is one of the most prominent such projects to create a large knowledge graph from OpenStreetMap (OSM) with global coverage and interlinking of other data sources. In this paper, we report on the ongoing effort of exposing the relational database in LinkedGeoData as a SPARQL endpoint using Virtual Knowledge Graph (VKG) technology. Specifically, we present two realizations of VKGs, using the two systems Sparqlify and Ontop. In order to improve compliance with the OGC GeoSPARQL standard, we have implemented GeoSPARQL support in Ontop v4. Moreover, we have evaluated the VKG-powered LinkedGeoData in the test areas of Italy and Germany. Our experiments demonstrate that such system supports complex GeoSPARQL queries, which confirms that query answering in the VKG approach is efficient.
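A sketch of the kind of GeoSPARQL query such a VKG endpoint is meant to answer. The endpoint URL and the lgdo:Cafe class are assumptions based on the LinkedGeoData ontology and may not match the deployed service; the polygon roughly covers one of the Italian test areas.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;

public class LgdVkgDemo {
    public static void main(String[] args) {
        // Hypothetical endpoint URL; LinkedGeoData service locations have changed over time.
        String endpoint = "http://linkedgeodata.org/sparql";
        String query = """
            PREFIX lgdo: <http://linkedgeodata.org/ontology/>
            PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
            PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
            SELECT ?cafe ?wkt WHERE {
              ?cafe a lgdo:Cafe ;
                    geo:hasGeometry/geo:asWKT ?wkt .
              FILTER(geof:sfWithin(?wkt,
                "POLYGON((11.3 46.4, 11.4 46.4, 11.4 46.5, 11.3 46.5, 11.3 46.4))"^^geo:wktLiteral))
            } LIMIT 10
            """;
        try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query)) {
            qe.execSelect().forEachRemaining(System.out::println);
        }
    }
}
```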
@article{Ding2021,
abstract = {With the advancement of Semantic Technologies, large geospatial data sources have been increasingly published as Linked data on the Web. The LinkedGeoData project is one of the most prominent such projects to create a large knowledge graph from OpenStreetMap (OSM) with global coverage and interlinking of other data sources. In this paper, we report on the ongoing effort of exposing the relational database in LinkedGeoData as a SPARQL endpoint using Virtual Knowledge Graph (VKG) technology. Specifically, we present two realizations of VKGs, using the two systems Sparqlify and Ontop. In order to improve compliance with the OGC GeoSPARQL standard, we have implemented GeoSPARQL support in Ontop v4. Moreover, we have evaluated the VKG-powered LinkedGeoData in the test areas of Italy and Germany. Our experiments demonstrate that such system supports complex GeoSPARQL queries, which confirms that query answering in the VKG approach is efficient.},
author = {Ding, Linfang and Xiao, Guohui and Pano, Albulen and Stadler, Claus and Calvanese, Diego},
journal = {Journal of Web Semantics},
keywords = {sys:relevantFor:infai},
pages = 100662,
title = {Towards the next generation of the LinkedGeoData project using virtual knowledge graphs},
volume = 71,
year = 2021
}
%0 Journal Article
%1 Ding2021
%A Ding, Linfang
%A Xiao, Guohui
%A Pano, Albulen
%A Stadler, Claus
%A Calvanese, Diego
%D 2021
%J Journal of Web Semantics
%P 100662
%R 10.1016/j.websem.2021.100662
%T Towards the next generation of the LinkedGeoData project using virtual knowledge graphs
%U https://svn.aksw.org/papers/2023/JWS_LinkedGeoData_Ontop/public.pdf
%V 71
%X With the advancement of Semantic Technologies, large geospatial data sources have been increasingly published as Linked data on the Web. The LinkedGeoData project is one of the most prominent such projects to create a large knowledge graph from OpenStreetMap (OSM) with global coverage and interlinking of other data sources. In this paper, we report on the ongoing effort of exposing the relational database in LinkedGeoData as a SPARQL endpoint using Virtual Knowledge Graph (VKG) technology. Specifically, we present two realizations of VKGs, using the two systems Sparqlify and Ontop. In order to improve compliance with the OGC GeoSPARQL standard, we have implemented GeoSPARQL support in Ontop v4. Moreover, we have evaluated the VKG-powered LinkedGeoData in the test areas of Italy and Germany. Our experiments demonstrate that such system supports complex GeoSPARQL queries, which confirms that query answering in the VKG approach is efficient.
- DistRDF2ML — Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs
  In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, CIKM ’21. Virtual Event, Queensland, Australia: Association for Computing Machinery — ISBN 9781450384469, pp. 4465–4474.
  Carsten Felix Draschner, Claus Stadler, Farshad Bakhshandegan Moghaddam, Jens Lehmann and Hajira Jabeen
  Abstract: This paper presents DistRDF2ML, the generic, scalable, and distributed framework for creating in-memory data preprocessing pipelines for Spark-based machine learning on RDF knowledge graphs. This framework introduces software modules that transform large-scale RDF data into ML-ready fixed-length numeric feature vectors. The developed modules are optimized to the multi-modal nature of knowledge graphs. DistRDF2ML provides aligned software design and usage principles as common data science stacks that offer an easy-to-use package for creating machine learning pipelines. The modules used in the pipeline, the hyper-parameters and the results are exported as a semantic structure that can be used to enrich the original knowledge graph. The semantic representation of metadata and machine learning results offers the advantage of increasing the machine learning pipelines’ reusability, explainability, and reproducibility. The entire framework of DistRDF2ML is open source, integrated into the holistic SANSA stack, documented in scala-docs, and covered by unit tests. DistRDF2ML demonstrates its scalable design across different processing power configurations and (hyper-)parameter setups within various experiments. The framework brings the three worlds of knowledge graph engineers, distributed computation developers, and data scientists closer together and offers all of them the creation of explainable ML pipelines using a few lines of code.
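The feature-extraction step that DistRDF2ML automates, turning entities into fixed-length numeric vectors, illustrated on a single machine with plain Jena; the actual framework performs this distributed on Spark within the SANSA stack. The data file and property list are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class FeatureVectorDemo {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.read("entities.ttl"); // hypothetical data with numeric properties

        // Fixed feature schema: one column per property (the framework derives this automatically).
        List<String> props = List.of("http://example.org/age", "http://example.org/height");
        String query = "SELECT ?s ?p ?v WHERE { ?s ?p ?v . FILTER(isNumeric(?v)) }";

        Map<String, double[]> vectors = new LinkedHashMap<>();
        try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
            qe.execSelect().forEachRemaining(row -> {
                int col = props.indexOf(row.getResource("p").getURI());
                if (col >= 0) {
                    vectors.computeIfAbsent(row.get("s").toString(),
                        k -> new double[props.size()])[col] = row.getLiteral("v").getDouble();
                }
            });
        }
        vectors.forEach((s, v) -> System.out.println(s + " -> " + Arrays.toString(v)));
    }
}
```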
@inproceedings{Draschner2021,
abstract = {This paper presents DistRDF2ML, the generic, scalable, and distributed framework for creating in-memory data preprocessing pipelines for Spark-based machine learning on RDF knowledge graphs. This framework introduces software modules that transform large-scale RDF data into ML-ready fixed-length numeric feature vectors. The developed modules are optimized to the multi-modal nature of knowledge graphs. DistRDF2ML provides aligned software design and usage principles as common data science stacks that offer an easy-to-use package for creating machine learning pipelines. The modules used in the pipeline, the hyper-parameters and the results are exported as a semantic structure that can be used to enrich the original knowledge graph. The semantic representation of metadata and machine learning results offers the advantage of increasing the machine learning pipelines’ reusability, explainability, and reproducibility. The entire framework of DistRDF2ML is open source, integrated into the holistic SANSA stack, documented in scala-docs, and covered by unit tests. DistRDF2ML demonstrates its scalable design across different processing power configurations and (hyper-)parameter setups within various experiments. The framework brings the three worlds of knowledge graph engineers, distributed computation developers, and data scientists closer together and offers all of them the creation of explainable ML pipelines using a few lines of code.},
address = {New York, NY, USA},
author = {Draschner, Carsten Felix and Stadler, Claus and Bakhshandegan Moghaddam, Farshad and Lehmann, Jens and Jabeen, Hajira},
booktitle = {Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
keywords = {sys:relevantFor:infai},
pages = {4465–4474},
publisher = {Association for Computing Machinery},
series = {CIKM ’21},
title = {{DistRDF2ML} — Scalable Distributed In-Memory Machine Learning Pipelines for {RDF} Knowledge Graphs},
year = 2021
}%0 Conference Paper
%1 Draschner2021
%A Draschner, Carsten Felix
%A Stadler, Claus
%A Bakhshandegan Moghaddam, Farshad
%A Lehmann, Jens
%A Jabeen, Hajira
%B Proceedings of the 30th ACM International Conference on Information \& Knowledge Management
%C New York, NY, USA
%D 2021
%I Association for Computing Machinery
%P 4465–4474
%R 10.1145/3459637.3481999
%T {DistRDF2ML} — Scalable Distributed In-Memory Machine Learning Pipelines for {RDF} Knowledge Graphs
%U https://svn.aksw.org/papers/2021/cikm-distrdf2ml/public.pdf
%X This paper presents DistRDF2ML, the generic, scalable, and distributed framework for creating in-memory data preprocessing pipelines for Spark-based machine learning on RDF knowledge graphs. This framework introduces software modules that transform large-scale RDF data into ML-ready fixed-length numeric feature vectors. The developed modules are optimized to the multi-modal nature of knowledge graphs. DistRDF2ML provides aligned software design and usage principles as common data science stacks that offer an easy-to-use package for creating machine learning pipelines. The modules used in the pipeline, the hyper-parameters and the results are exported as a semantic structure that can be used to enrich the original knowledge graph. The semantic representation of metadata and machine learning results offers the advantage of increasing the machine learning pipelines’ reusability, explainability, and reproducibility. The entire framework of DistRDF2ML is open source, integrated into the holistic SANSA stack, documented in scala-docs, and covered by unit tests. DistRDF2ML demonstrates its scalable design across different processing power configurations and (hyper-)parameter setups within various experiments. The framework brings the three worlds of knowledge graph engineers, distributed computation developers, and data scientists closer together and offers all of them the creation of explainable ML pipelines using a few lines of code.
%@ 9781450384469
2020
- MINDS: A Translator to Embed Mathematical Expressions Inside SPARQL QueriesIn: Eva Blomqvist, Paul Groth, Victor de Boer, Tassilo Pellegrini, Mehwish Alam, Tobias K{ä}fer, Peter Kieseberg, Sabrina Kirrane, Albert Mero{\~{n}}o-Pe{\~{n}}uela, et al. (eds.): Semantic Systems. In the Era of Knowledge Graphs. Cham : Springer International Publishing — ISBN 978-3-030-59833-4, pp. 104–117Damien Graux, Gezim Sejdiu, Claus Stadler, Giulio Napolitano and Jens LehmannThe recent deployments of semantic web tools and the expansion of available linked datasets have given users the opportunity of building increasingly complex applications. These emerging use cases often require queries containing mathematical formulas such as euclidean distances or unit conversions. Currently, the latest SPARQL standard (version 1.1) only embeds basic math operators. Thus, to address this shortcoming, some popular SPARQL evaluators provide built-in tools to cover specific needs; however, such tools are not standard yet. To offer users a more generic solution, we propose and share MINDS, a translator of mathematical expressions into SPARQL-compliant bindings which can be understood by any evaluator. MINDS thereby facilitates the query design whenever mathematical computations are needed in a SPARQL query.
@inproceedings{graux2020-minds,
abstract = {The recent deployments of semantic web tools and the expansion of available linked datasets have given users the opportunity of building increasingly complex applications. These emerging use cases often require queries containing mathematical formulas such as euclidean distances or unit conversions. Currently, the latest SPARQL standard (version 1.1) only embeds basic math operators. Thus, to address this shortcoming, some popular SPARQL evaluators provide built-in tools to cover specific needs; however, such tools are not standard yet. To offer users a more generic solution, we propose and share MINDS, a translator of mathematical expressions into SPARQL-compliant bindings which can be understood by any evaluator. MINDS thereby facilitates the query design whenever mathematical computations are needed in a SPARQL query.},
address = {Cham},
author = {Graux, Damien and Sejdiu, Gezim and Stadler, Claus and Napolitano, Giulio and Lehmann, Jens},
booktitle = {Semantic Systems. In the Era of Knowledge Graphs},
editor = {Blomqvist, Eva and Groth, Paul and de Boer, Victor and Pellegrini, Tassilo and Alam, Mehwish and K{ä}fer, Tobias and Kieseberg, Peter and Kirrane, Sabrina and Mero{\~{n}}o-Pe{\~{n}}uela, Albert and Pandit, Harshvardhan J.},
keywords = {sys:relevantFor:infai},
pages = {104–117},
publisher = {Springer International Publishing},
title = {MINDS: A Translator to Embed Mathematical Expressions Inside SPARQL Queries},
year = 2020
}%0 Conference Paper
%1 graux2020-minds
%A Graux, Damien
%A Sejdiu, Gezim
%A Stadler, Claus
%A Napolitano, Giulio
%A Lehmann, Jens
%B Semantic Systems. In the Era of Knowledge Graphs
%C Cham
%D 2020
%E Blomqvist, Eva
%E Groth, Paul
%E de Boer, Victor
%E Pellegrini, Tassilo
%E Alam, Mehwish
%E K{ä}fer, Tobias
%E Kieseberg, Peter
%E Kirrane, Sabrina
%E Mero{\~{n}}o-Pe{\~{n}}uela, Albert
%E Pandit, Harshvardhan J.
%I Springer International Publishing
%P 104–117
%T MINDS: A Translator to Embed Mathematical Expressions Inside SPARQL Queries
%U https://svn.aksw.org/papers/2020/semantics_minds/public.pdf
%X The recent deployments of semantic web tools and the expansion of available linked datasets have given users the opportunity of building increasingly complex applications. These emerging use cases often require queries containing mathematical formulas such as euclidean distances or unit conversions. Currently, the latest SPARQL standard (version 1.1) only embeds basic math operators. Thus, to address this shortcoming, some popular SPARQL evaluators provide built-in tools to cover specific needs; however, such tools are not standard yet. To offer users a more generic solution, we propose and share MINDS, a translator of mathematical expressions into SPARQL-compliant bindings which can be understood by any evaluator. MINDS thereby facilitates the query design whenever mathematical computations are needed in a SPARQL query.
%@ 978-3-030-59833-4 - Automatic Subject Indexing with Knowledge GraphsIn: LASCAR Workshop at the Extended Semantic Web Conference (ESWC)Lisa Wenige, Claus Stadler, Simon Bin, Lorenz Bühmann, Kurt Junghanns and Michael Martin
@inproceedings{wenige2020kindex,
author = {Wenige, Lisa and Stadler, Claus and Bin, Simon and Bühmann, Lorenz and Junghanns, Kurt and Martin, Michael},
booktitle = {LASCAR Workshop at the Extended Semantic Web Conference (ESWC)},
keywords = {sys:relevantFor:infai},
title = {Automatic Subject Indexing with Knowledge Graphs},
year = 2020
}%0 Conference Paper
%1 wenige2020kindex
%A Wenige, Lisa
%A Stadler, Claus
%A Bin, Simon
%A Bühmann, Lorenz
%A Junghanns, Kurt
%A Martin, Michael
%B LASCAR Workshop at the Extended Semantic Web Conference (ESWC)
%D 2020
%T Automatic Subject Indexing with Knowledge Graphs
%U https://svn.aksw.org/papers/2020/LASCAR_Kindex/public.pdf - Schema-agnostic SPARQL-driven faceted search benchmark generationIn: Journal of Web Semantics, p. 100614Claus Stadler, Simon Bin, Lisa Wenige, Lorenz Bühmann and Jens LehmannIn this work, we present a schema-agnostic faceted browsing benchmark generation framework for RDF data and SPARQL engines. Faceted search is a technique that allows narrowing down sets of information items by applying constraints over their properties, whereas facets correspond to properties of these items. While our work can be used to realise real-world faceted search user interfaces, our focus lies on the construction and benchmarking of faceted search queries over knowledge graphs. The RDF model exhibits several traits that seemingly make it a natural foundation for faceted search: all information items are represented as RDF resources, property values typically already correspond to meaningful semantic classifications, and with SPARQL there is a standard language for uniformly querying instance and schema information. However, although faceted search is ubiquitous today, it is typically not performed on the RDF model directly. Two major sources of concern are the complexity of query generation and the query performance. To overcome the former, our framework comes with an intermediate domain-specific language. Thereby our approach is SPARQL-driven which means that every faceted search information need is intensionally expressed as a single SPARQL query. In regard to the latter, we investigate the possibilities and limits of real-time SPARQL-driven faceted search on contemporary triple stores. We report on our findings by evaluating systems performance and correctness characteristics when executing a benchmark generated using our generation framework. All components, namely the benchmark generator, the benchmark runners and the underlying faceted search framework, are published freely available as open source.
@article{stadler2020facete,
abstract = {In this work, we present a schema-agnostic faceted browsing benchmark generation framework for RDF data and SPARQL engines. Faceted search is a technique that allows narrowing down sets of information items by applying constraints over their properties, whereas facets correspond to properties of these items. While our work can be used to realise real-world faceted search user interfaces, our focus lies on the construction and benchmarking of faceted search queries over knowledge graphs. The RDF model exhibits several traits that seemingly make it a natural foundation for faceted search: all information items are represented as RDF resources, property values typically already correspond to meaningful semantic classifications, and with SPARQL there is a standard language for uniformly querying instance and schema information. However, although faceted search is ubiquitous today, it is typically not performed on the RDF model directly. Two major sources of concern are the complexity of query generation and the query performance. To overcome the former, our framework comes with an intermediate domain-specific language. Thereby our approach is SPARQL-driven which means that every faceted search information need is intensionally expressed as a single SPARQL query. In regard to the latter, we investigate the possibilities and limits of real-time SPARQL-driven faceted search on contemporary triple stores. We report on our findings by evaluating systems performance and correctness characteristics when executing a benchmark generated using our generation framework. All components, namely the benchmark generator, the benchmark runners and the underlying faceted search framework, are published freely available as open source.},
author = {Stadler, Claus and Bin, Simon and Wenige, Lisa and Bühmann, Lorenz and Lehmann, Jens},
journal = {Journal of Web Semantics},
keywords = {sys:relevantFor:infai},
pages = 100614,
title = {Schema-agnostic SPARQL-driven faceted search benchmark generation},
year = 2020
}%0 Journal Article
%1 stadler2020facete
%A Stadler, Claus
%A Bin, Simon
%A Wenige, Lisa
%A Bühmann, Lorenz
%A Lehmann, Jens
%D 2020
%J Journal of Web Semantics
%P 100614
%R 10.1016/j.websem.2020.100614
%T Schema-agnostic SPARQL-driven faceted search benchmark generation
%U https://svn.aksw.org/papers/2020/JWS_Faceted_Search_Benchmark/public.pdf
%X In this work, we present a schema-agnostic faceted browsing benchmark generation framework for RDF data and SPARQL engines. Faceted search is a technique that allows narrowing down sets of information items by applying constraints over their properties, whereas facets correspond to properties of these items. While our work can be used to realise real-world faceted search user interfaces, our focus lies on the construction and benchmarking of faceted search queries over knowledge graphs. The RDF model exhibits several traits that seemingly make it a natural foundation for faceted search: all information items are represented as RDF resources, property values typically already correspond to meaningful semantic classifications, and with SPARQL there is a standard language for uniformly querying instance and schema information. However, although faceted search is ubiquitous today, it is typically not performed on the RDF model directly. Two major sources of concern are the complexity of query generation and the query performance. To overcome the former, our framework comes with an intermediate domain-specific language. Thereby our approach is SPARQL-driven which means that every faceted search information need is intensionally expressed as a single SPARQL query. In regard to the latter, we investigate the possibilities and limits of real-time SPARQL-driven faceted search on contemporary triple stores. We report on our findings by evaluating systems performance and correctness characteristics when executing a benchmark generated using our generation framework. All components, namely the benchmark generator, the benchmark runners and the underlying faceted search framework, are published freely available as open source.
2019
- RDF-based Deployment Pipelining for Efficient Dataset Release ManagementIn: Proceedings of the Posters and Demos Track of the 14th International Conference on Semantic Systems co-located with the 14th International Conference on Semantic Systems (SEMANTICS’19). Karlsruhe, GermanyClaus Stadler, Lisa Wenige, Sebastian Tramp, Kurt Junghanns and Michael MartinOpen Data portals often struggle to provide release features (i.e., stable versioning, up-to-date download links, rich metadata descriptions) for their datasets. By this means, wide adoption of publicly available datasets is hindered, since consuming applications cannot access fresh data sources or might break due to data quality issues. While there exists a variety of tools to efficiently control release processes in software development, the management of dataset releases is not as clear. This paper proposes a deployment pipeline for efficient dataset releases that is based on automated enrichment of DCAT/DATAID metadata and is a first step towards efficient deployment pipelining for Open Data publishing.
@inproceedings{stadler-n-2019-rdfdeploy,
abstract = {Open Data portals often struggle to provide release features (i.e., stable versioning, up-to-date download links, rich metadata descriptions) for their datasets. By this means, wide adoption of publicly available datasets is hindered, since consuming applications cannot access fresh data sources or might break due to data quality issues. While there exists a variety of tools to efficiently control release processes in software development, the management of dataset releases is not as clear. This paper proposes a deployment pipeline for efficient dataset releases that is based on automated enrichment of DCAT/DATAID metadata and is a first step towards efficient deployment pipelining for Open Data publishing.},
address = {Karlsruhe, Germany},
author = {Stadler, Claus and Wenige, Lisa and Tramp, Sebastian and Junghanns, Kurt and Martin, Michael},
booktitle = {Proceedings of the Posters and Demos Track of the 14th International Conference on Semantic Systems co-located with the 14th International Conference on Semantic Systems (SEMANTICS’19)},
keywords = {sys:relevantFor:infai},
title = {RDF-based Deployment Pipelining for Efficient Dataset Release Management},
year = 2019
}%0 Conference Paper
%1 stadler-n-2019-rdfdeploy
%A Stadler, Claus
%A Wenige, Lisa
%A Tramp, Sebastian
%A Junghanns, Kurt
%A Martin, Michael
%B Proceedings of the Posters and Demos Track of the 14th International Conference on Semantic Systems co-located with the 14th International Conference on Semantic Systems (SEMANTICS’19)
%C Karlsruhe, Germany
%D 2019
%T RDF-based Deployment Pipelining for Efficient Dataset Release Management
%U https://svn.aksw.org/papers/2019/semantics_rdf_pipeline/public.pdf
%X Open Data portals often struggle to provide release features (i.e., stable versioning, up-to-date download links, rich metadata descriptions) for their datasets. By this means, wide adoption of publicly available datasets is hindered, since consuming applications cannot access fresh data sources or might break due to data quality issues. While there exists a variety of tools to efficiently control release processes in software development, the management of dataset releases is not as clear. This paper proposes a deployment pipeline for efficient dataset releases that is based on automated enrichment of DCAT/DATAID metadata and is a first step towards efficient deployment pipelining for Open Data publishing.
2018
- {SQCFramework}: Generating SPARQL Query Containment Benchmarks using the SQCFrameworkIn: Demo at ISWCMuhammad Saleem, Claus Stadler, Qaiser Mehmood, Jens Lehmann and Axel-Cyrille Ngonga Ngomo
@inproceedings{Saleem2018SQCDmo,
author = {Saleem, Muhammad and Stadler, Claus and Mehmood, Qaiser and Lehmann, Jens and Ngomo, Axel-Cyrille Ngonga},
booktitle = {Demo at ISWC},
keywords = {sys:relevantFor:infai},
title = {{SQCFramework}: Generating SPARQL Query Containment Benchmarks using the SQCFramework},
year = 2018
}%0 Conference Paper
%1 Saleem2018SQCDmo
%A Saleem, Muhammad
%A Stadler, Claus
%A Mehmood, Qaiser
%A Lehmann, Jens
%A Ngomo, Axel-Cyrille Ngonga
%B Demo at ISWC
%D 2018
%T {SQCFramework}: Generating SPARQL Query Containment Benchmarks using the SQCFramework
%U https://svn.aksw.org/papers/2018/ISWC-Demo-SQCFramework/public.pdf
2017
- {SQCFramework}: {SPARQL} Query Containment Benchmark Generation FrameworkIn: K-CAP 2017: Knowledge Capture Conference : ACMMuhammad Saleem, Claus Stadler, Qaiser Mehmood, Jens Lehmann and Axel-Cyrille Ngonga Ngomo
@inproceedings{Saleem2017SQC,
author = {Saleem, Muhammad and Stadler, Claus and Mehmood, Qaiser and Lehmann, Jens and Ngomo, Axel-Cyrille Ngonga},
booktitle = {K-CAP 2017: Knowledge Capture Conference},
keywords = {sys:relevantFor:infai},
organization = {ACM},
title = {{SQCFramework}: {SPARQL} Query Containment Benchmark Generation Framework},
year = 2017
}%0 Conference Paper
%1 Saleem2017SQC
%A Saleem, Muhammad
%A Stadler, Claus
%A Mehmood, Qaiser
%A Lehmann, Jens
%A Ngomo, Axel-Cyrille Ngonga
%B K-CAP 2017: Knowledge Capture Conference
%D 2017
%T {SQCFramework}: {SPARQL} Query Containment Benchmark Generation Framework
%U https://svn.aksw.org/papers/2017/KCAP_SQCFrameWork/public.pdf - The Tale of Sansa SparkIn: Proceedings of 16th International Semantic Web Conference, Poster \& DemosIvan Ermilov, Jens Lehmann, Gezim Sejdiu, Lorenz Bühmann, Patrick Westphal, Claus Stadler, Simon Bin, Nilesh Chakraborty, Henning Petzka, et al.
@inproceedings{iermilov-2017-sansa-iswc-demo,
author = {Ermilov, Ivan and Lehmann, Jens and Sejdiu, Gezim and Bühmann, Lorenz and Westphal, Patrick and Stadler, Claus and Bin, Simon and Chakraborty, Nilesh and Petzka, Henning and Saleem, Muhammad and Ngonga Ngomo, Axel-Cyrille and Jabeen, Hajira},
booktitle = {Proceedings of 16th International Semantic Web Conference, Poster \& Demos},
keywords = {buehmann},
title = {The Tale of Sansa Spark},
year = 2017
}%0 Conference Paper
%1 iermilov-2017-sansa-iswc-demo
%A Ermilov, Ivan
%A Lehmann, Jens
%A Sejdiu, Gezim
%A Bühmann, Lorenz
%A Westphal, Patrick
%A Stadler, Claus
%A Bin, Simon
%A Chakraborty, Nilesh
%A Petzka, Henning
%A Saleem, Muhammad
%A Ngonga Ngomo, Axel-Cyrille
%A Jabeen, Hajira
%B Proceedings of 16th International Semantic Web Conference, Poster \& Demos
%D 2017
%T The Tale of Sansa Spark
%U https://svn.aksw.org/papers/2017/ISWC_SANSA_Demo/public.pdf - Distributed Semantic Analytics using the SANSA StackIn: Proceedings of 16th International Semantic Web Conference — Resources Track (ISWC’2017) : Springer, pp. 147–155Jens Lehmann, Gezim Sejdiu, Lorenz Bühmann, Patrick Westphal, Claus Stadler, Ivan Ermilov, Simon Bin, Nilesh Chakraborty, Muhammad Saleem, et al.
@inproceedings{lehmann-2017-sansa-iswc,
author = {Lehmann, Jens and Sejdiu, Gezim and Bühmann, Lorenz and Westphal, Patrick and Stadler, Claus and Ermilov, Ivan and Bin, Simon and Chakraborty, Nilesh and Saleem, Muhammad and Ngonga Ngomo, Axel-Cyrille and Jabeen, Hajira},
booktitle = {Proceedings of 16th International Semantic Web Conference — Resources Track (ISWC’2017)},
keywords = {buehmann},
pages = {147–155},
publisher = {Springer},
title = {Distributed Semantic Analytics using the SANSA Stack},
year = 2017
}%0 Conference Paper
%1 lehmann-2017-sansa-iswc
%A Lehmann, Jens
%A Sejdiu, Gezim
%A Bühmann, Lorenz
%A Westphal, Patrick
%A Stadler, Claus
%A Ermilov, Ivan
%A Bin, Simon
%A Chakraborty, Nilesh
%A Saleem, Muhammad
%A Ngonga Ngomo, Axel-Cyrille
%A Jabeen, Hajira
%B Proceedings of 16th International Semantic Web Conference — Resources Track (ISWC’2017)
%D 2017
%I Springer
%P 147–155
%T Distributed Semantic Analytics using the SANSA Stack
%U http://svn.aksw.org/papers/2017/ISWC_SANSA_SoftwareFramework/public.pdf - Benchmarking Faceted Browsing Capabilities of TriplestoresIn: 13th International Conference on Semantic Systems (SEMANTiCS 2017), September 11–14 2017, Amsterdam, NetherlandsHenning Petzka, Claus Stadler, Georgios Katsimpras, Bastian Haarmann and Jens Lehmann
@inproceedings{petzka-semantics-facets,
author = {Petzka, Henning and Stadler, Claus and Katsimpras, Georgios and Haarmann, Bastian and Lehmann, Jens},
booktitle = {13th International Conference on Semantic Systems (SEMANTiCS 2017), September 11–14 2017, Amsterdam, Netherlands},
keywords = {sys:relevantFor:infai},
title = {Benchmarking Faceted Browsing Capabilities of Triplestores},
year = 2017
}%0 Conference Paper
%1 petzka-semantics-facets
%A Petzka, Henning
%A Stadler, Claus
%A Katsimpras, Georgios
%A Haarmann, Bastian
%A Lehmann, Jens
%B 13th International Conference on Semantic Systems (SEMANTiCS 2017), September 11–14 2017, Amsterdam, Netherlands
%D 2017
%T Benchmarking Faceted Browsing Capabilities of Triplestores
%U https://svn.aksw.org/papers/2017/Semantics_faceted_browsing/public.pdf - JPA Criteria Queries over RDF DataIn: Workshop on Querying the Web of Data co-located with the Extended Semantic Web ConferenceClaus Stadler and Jens Lehmann
@inproceedings{quweda_jpa,
author = {Stadler, Claus and Lehmann, Jens},
booktitle = {Workshop on Querying the Web of Data co-located with the Extended Semantic Web Conference},
keywords = {MOLE},
title = {JPA Criteria Queries over RDF Data},
year = 2017
}%0 Conference Paper
%1 quweda_jpa
%A Stadler, Claus
%A Lehmann, Jens
%B Workshop on Querying the Web of Data co-located with the Extended Semantic Web Conference
%D 2017
%T JPA Criteria Queries over RDF Data
%U https://svn.aksw.org/papers/2017/QuWeDa_jpa/public.pdf
2016
- Towards Sustainable view-based Extract-Transform-Load (ETL) Fusion of Open DataIn: Proceedings of the 3rd Workshop on Linked Data Quality co-located with the European Semantic Web Conference 2016 (ESWC 2016)kmueller stadler singh hellmannOpenly available datasets originate from different data providers which range from government agencies, over commercial enterprises to communities of data enthusiasts. Integrating different source datasets into a single RDF graph by using ETL (Extract-Transform-Load) systems which perform offline transformation, ontology matching and linking techniques usually takes many iterations of revisions until the target dataset is made free of the most obvious mapping, linking and consistency errors. Since ETL systems produce the RDF offline, any mapping or content change requires a re-ingest of the relevant source data. When dealing with heterogeneous source datasets, creating a unified target dataset can be a tedious undertaking. Therefore the paper proposes an RDF view based ingestion approach, which allows real-time “debugging” of the unified dataset where mappings and links can be changed with immediate effect. Once the unified graph passes all data quality tests, the RDF can be materialized. This process poses an alternative to existing ETL solutions.
@inproceedings{kmueller-ldq16-etl-views,
abstract = {Openly available datasets originate from different data providers which range from government agencies, over commercial enterprises to communities of data enthusiasts. Integrating different source datasets into a single RDF graph by using ETL (Extract-Transform-Load) systems which perform offline transformation, ontology matching and linking techniques usually takes many iterations of revisions until the target dataset is made free of the most obvious mapping, linking and consistency errors. Since ETL systems produce the RDF offline, any mapping or content change requires a re-ingest of the relevant source data. When dealing with heterogeneous source datasets, creating a unified target dataset can be a tedious undertaking. Therefore the paper proposes an RDF view based ingestion approach, which allows real-time “debugging” of the unified dataset where mappings and links can be changed with immediate effect. Once the unified graph passes all data quality tests, the RDF can be materialized. This process poses an alternative to existing ETL solutions.},
author = {kmueller and stadler and singh and hellmann},
booktitle = {Proceedings of the 3rd Workshop on Linked Data Quality co-located with the European Semantic Web Conference 2016 (ESWC 2016)},
keywords = {hellmann},
month = {05},
title = {Towards Sustainable view-based Extract-Transform-Load (ETL) Fusion of Open Data},
year = 2016
}%0 Conference Paper
%1 kmueller-ldq16-etl-views
%A kmueller
%A stadler
%A singh
%A hellmann
%B Proceedings of the 3rd Workshop on Linked Data Quality co-located with the European Semantic Web Conference 2016 (ESWC 2016)
%D 2016
%T Towards Sustainable view-based Extract-Transform-Load (ETL) Fusion of Open Data
%U http://svn.aksw.org/papers/2016/ESWC_ETL_VIEWS/public.pdf
%X Openly available datasets originate from different data providers which range from government agencies, over commercial enterprises to communities of data enthusiasts. Integrating different source datasets into a single RDF graph by using ETL (Extract-Transform-Load) systems which perform offline transformation, ontology matching and linking techniques usually takes many iterations of revisions until the target dataset is made free of the most obvious mapping, linking and consistency errors. Since ETL systems produce the RDF offline, any mapping or content change requires a re-ingest of the relevant source data. When dealing with heterogeneous source datasets, creating a unified target dataset can be a tedious undertaking. Therefore the paper proposes an RDF view based ingestion approach, which allows real-time “debugging” of the unified dataset where mappings and links can be changed with immediate effect. Once the unified graph passes all data quality tests, the RDF can be materialized. This process poses an alternative to existing ETL solutions.
2015
- CubeViz — Exploration and Visualization of Statistical Linked DataIn: Proceedings of the 24th International Conference on World Wide Web, WWW 2015Michael Martin, Konrad Abicht, Claus Stadler, S{ö}ren Auer, Axel‑C. Ngonga Ngomo and Tommaso SoruCubeViz is a flexible exploration and visualization platform for statistical data represented adhering to the RDF Data Cube vocabulary. If statistical data is provided adhering to the Data Cube vocabulary, CubeViz exhibits a faceted browsing widget allowing to interactively filter observations to be visualized in charts. Based on the selected structural part, CubeViz offers suitable chart types and options for configuring the visualization by users. In this demo we present the CubeViz visualization architecture and components, sketch its underlying API and the libraries used to generate the desired output. By employing advanced introspection, analysis and visualization bootstrapping techniques CubeViz hides the schema complexity of the encoded data in order to support a user-friendly exploration experience.
@inproceedings{martin-www-2015-demo-cubeviz,
abstract = {CubeViz is a flexible exploration and visualization platform for statistical data represented adhering to the RDF Data Cube vocabulary. If statistical data is provided adhering to the Data Cube vocabulary, CubeViz exhibits a faceted browsing widget allowing to interactively filter observations to be visualized in charts. Based on the selected structural part, CubeViz offers suitable chart types and options for configuring the visualization by users. In this demo we present the CubeViz visualization architecture and components, sketch its underlying API and the libraries used to generate the desired output. By employing advanced introspection, analysis and visualization bootstrapping techniques CubeViz hides the schema complexity of the encoded data in order to support a user-friendly exploration experience.},
author = {Martin, Michael and Abicht, Konrad and Stadler, Claus and Auer, S{ö}ren and Ngonga Ngomo, Axel-C. and Soru, Tommaso},
booktitle = {Proceedings of the 24th International Conference on World Wide Web, WWW 2015},
keywords = {topic_Exploration},
title = {CubeViz — Exploration and Visualization of Statistical Linked Data},
year = 2015
}%0 Conference Paper
%1 martin-www-2015-demo-cubeviz
%A Martin, Michael
%A Abicht, Konrad
%A Stadler, Claus
%A Auer, S{ö}ren
%A Ngonga Ngomo, Axel-C.
%A Soru, Tommaso
%B Proceedings of the 24th International Conference on World Wide Web, WWW 2015
%D 2015
%T CubeViz — Exploration and Visualization of Statistical Linked Data
%U https://svn.aksw.org/papers/2015/WWW_Demo_CubeViz/public.pdf
%X CubeViz is a flexible exploration and visualization platform for statistical data represented adhering to the RDF Data Cube vocabulary. If statistical data is provided adhering to the Data Cube vocabulary, CubeViz exhibits a faceted browsing widget allowing to interactively filter observations to be visualized in charts. Based on the selected structural part, CubeViz offers suitable chart types and options for configuring the visualization by users. In this demo we present the CubeViz visualization architecture and components, sketch its underlying API and the libraries used to generate the desired output. By employing advanced introspection, analysis and visualization bootstrapping techniques CubeViz hides the schema complexity of the encoded data in order to support a user-friendly exploration experience. - Simplified {RDB2RDF} MappingIn: Proceedings of the 8th Workshop on Linked Data on the Web (LDOW2015), Florence, ItalyClaus Stadler, Joerg Unbehauen, Patrick Westphal, Mohamed Ahmed Sherif and Jens LehmannThe combination of the advantages of widely used relational databases and semantic technologies has attracted significant research over the past decade. In particular, mapping languages for the conversion of databases to RDF knowledge bases have been developed and standardized in the form of R2RML. In this article, we first review those mapping languages and then devise work towards a unified formal model for them. Based on this, we present the Sparqlification Mapping Language (SML), which provides an intuitive way to declare mappings based on SQL VIEWS and SPARQL construct queries. We show that SML has the same expressivity as R2RML by enumerating the language features and show the correspondences, and we outline how one syntax can be converted into the other. A conducted user study for this paper juxtaposing SML and R2RML provides evidence that SML is a more compact syntax which is easier to understand and read and thus lowers the barrier to offer SPARQL access to relational databases.
@inproceedings{sml,
abstract = {The combination of the advantages of widely used relational databases and semantic technologies has attracted significant research over the past decade. In particular, mapping languages for the conversion of databases to RDF knowledge bases have been developed and standardized in the form of R2RML. In this article, we first review those mapping languages and then devise work towards a unified formal model for them. Based on this, we present the Sparqlification Mapping Language (SML), which provides an intuitive way to declare mappings based on SQL VIEWS and SPARQL construct queries. We show that SML has the same expressivity as R2RML by enumerating the language features and show the correspondences, and we outline how one syntax can be converted into the other. A conducted user study for this paper juxtaposing SML and R2RML provides evidence that SML is a more compact syntax which is easier to understand and read and thus lowers the barrier to offer SPARQL access to relational databases.},
author = {Stadler, Claus and Unbehauen, Joerg and Westphal, Patrick and Sherif, Mohamed Ahmed and Lehmann, Jens},
booktitle = {Proceedings of the 8th Workshop on Linked Data on the Web (LDOW2015), Florence, Italy},
keywords = {MOLE},
title = {Simplified {RDB2RDF} Mapping},
year = 2015
}%0 Conference Paper
%1 sml
%A Stadler, Claus
%A Unbehauen, Joerg
%A Westphal, Patrick
%A Sherif, Mohamed Ahmed
%A Lehmann, Jens
%B Proceedings of the 8th Workshop on Linked Data on the Web (LDOW2015), Florence, Italy
%D 2015
%T Simplified {RDB2RDF} Mapping
%U http://svn.aksw.org/papers/2015/LDOW_SML/paper-camery-ready_public.pdf
%X The combination of the advantages of widely used relational databases and semantic technologies has attracted significant research over the past decade. In particular, mapping languages for the conversion of databases to RDF knowledge bases have been developed and standardized in the form of R2RML. In this article, we first review those mapping languages and then devise work towards a unified formal model for them. Based on this, we present the Sparqlification Mapping Language (SML), which provides an intuitive way to declare mappings based on SQL VIEWS and SPARQL construct queries. We show that SML has the same expressivity as R2RML by enumerating the language features and show the correspondences, and we outline how one syntax can be converted into the other. A conducted user study for this paper juxtaposing SML and R2RML provides evidence that SML is a more compact syntax which is easier to understand and read and thus lowers the barrier to offer SPARQL access to relational databases. - The GeoKnow Generator Workbench: An Integration Platform for Geospatial DataIn: Proceedings of the 3rd International Workshop on Semantic Web Enterprise Adoption and Best PracticeAlejandra Garcia-Rojas, Daniel Hladky, Matthias Wauer, Robert Isele, Claus Stadler and Jens Lehmann
@inproceedings{wasabi_generator,
author = {Garcia-Rojas, Alejandra and Hladky, Daniel and Wauer, Matthias and Isele, Robert and Stadler, Claus and Lehmann, Jens},
booktitle = {Proceedings of the 3rd International Workshop on Semantic Web Enterprise Adoption and Best Practice},
keywords = {ontology},
title = {The GeoKnow Generator Workbench: An Integration Platform for Geospatial Data},
year = 2015
}%0 Conference Paper
%1 wasabi_generator
%A Garcia-Rojas, Alejandra
%A Hladky, Daniel
%A Wauer, Matthias
%A Isele, Robert
%A Stadler, Claus
%A Lehmann, Jens
%B Proceedings of the 3rd International Workshop on Semantic Web Enterprise Adoption and Best Practice
%D 2015
%T The GeoKnow Generator Workbench: An Integration Platform for Geospatial Data - RDF Editing on the WebIn: SEMANTICS 2015, SEM ’15. Vienna, Austria : ACMClaus Stadler, Natanael Arndt, Michael Martin and Jens LehmannWhile several tools for simplifying the task of visualizing (SPARQL accessible) RDF data on the Web are available today, there is a lack of corresponding tools for exploiting standard HTML forms directly for RDF editing. The few related existing systems roughly fall in the categories of (a) applications that are not aimed at being reused as components, (b) form generators, which automatically create forms from a given schema — possibly derived from instance data — or (c) form template processors which create forms from a manually created specification. Furthermore, these systems usually come with their own widget library, which can only be extended by wrapping existing widgets. In this paper, we present the AngularJS-based \emph{Rdf Edit eXtension} (REX) system, which facilitates the enhancement of standard HTML forms as well as many existing AngularJS widgets with RDF editing support by means of a set of HTML attributes. We demonstrate our system through the realization of several usage scenarios.
@inproceedings{rex_pd,
abstract = {While several tools for simplifying the task of visualizing (SPARQL accessible) RDF data on the Web are available today, there is a lack of corresponding tools for exploiting standard HTML forms directly for RDF editing. The few related existing systems roughly fall in the categories of (a) applications that are not aimed at being reused as components, (b) form generators, which automatically create forms from a given schema — possibly derived from instance data — or (c) form template processors which create forms from a manually created specification. Furthermore, these systems usually come with their own widget library, which can only be extended by wrapping existing widgets. In this paper, we present the AngularJS-based \emph{Rdf Edit eXtension} (REX) system, which facilitates the enhancement of standard HTML forms as well as many existing AngularJS widgets with RDF editing support by means of a set of HTML attributes. We demonstrate our system through the realization of several usage scenarios.},
author = {Stadler, Claus and Arndt, Natanael and Martin, Michael and Lehmann, Jens},
booktitle = {SEMANTICS 2015},
keywords = {sys:relevantFor:infai},
month = {09},
publisher = {ACM},
series = {SEM ’15},
title = {RDF Editing on the Web},
year = 2015
}%0 Conference Paper
%1 rex_pd
%A Stadler, Claus
%A Arndt, Natanael
%A Martin, Michael
%A Lehmann, Jens
%B SEMANTICS 2015
%D 2015
%I ACM
%T RDF Editing on the Web
%U http://ceur-ws.org/Vol-1481/paper29.pdf
%X While several tools for simplifying the task of visualizing (SPARQL accessible) RDF data on the Web are available today, there is a lack of corresponding tools for exploiting standard HTML forms directly for RDF editing. The few related existing systems roughly fall in the categories of (a) applications that are not aimed at being reused as components, (b) form generators, which automatically create forms from a given schema — possibly derived from instance data — or (c) form template processors which create forms from a manually created specification. Furthermore, these systems usually come with their own widget library, which can only be extended by wrapping existing widgets. In this paper, we present the AngularJS-based \emph{Rdf Edit eXtension} (REX) system, which facilitates the enhancement of standard HTML forms as well as many existing AngularJS widgets with RDF editing support by means of a set of HTML attributes. We demonstrate our system through the realization of several usage scenarios.
2014
- DBpedia Viewer — An Integrative Interface for DBpedia leveraging the DBpedia Service Eco SystemIn: Proc. of the Linked Data on the Web 2014 WorkshopDenis Lukovnikov, Claus Stadler, Dimitris Kontokostas, Sebastian Hellmann and Jens Lehmann
@inproceedings{ldow_dbpedia_viewer,
author = {Lukovnikov, Denis and Stadler, Claus and Kontokostas, Dimitris and Hellmann, Sebastian and Lehmann, Jens},
booktitle = {Proc. of the Linked Data on the Web 2014 Workshop},
keywords = {hellmann},
title = {DBpedia Viewer — An Integrative Interface for DBpedia leveraging the DBpedia Service Eco System},
year = 2014
}%0 Conference Paper
%1 ldow_dbpedia_viewer
%A Lukovnikov, Denis
%A Stadler, Claus
%A Kontokostas, Dimitris
%A Hellmann, Sebastian
%A Lehmann, Jens
%B Proc. of the Linked Data on the Web 2014 Workshop
%D 2014
%T DBpedia Viewer — An Integrative Interface for DBpedia leveraging the DBpedia Service Eco System
%U https://svn.aksw.org/papers/2014/LDOW_DBpediaInterface/public.pdf - The GeoKnow Generator: Managing Geospatial Data in the Linked Data WebIn: Proceedings of the Linking Geospatial Data WorkshopJon Jay Le Grange, Jens Lehmann, Spiros Athanasiou, Alejandra Garcia Rojas, Giorgos Giannopoulos, Daniel Hladky, Robert Isele, Axel-Cyrille {Ngonga Ngomo}, Mohamed Ahmed Sherif, et al.
@inproceedings{lgd_geoknow_generator,
author = {Grange, Jon Jay Le and Lehmann, Jens and Athanasiou, Spiros and Rojas, Alejandra Garcia and Giannopoulos, Giorgos and Hladky, Daniel and Isele, Robert and {Ngonga Ngomo}, Axel-Cyrille and Sherif, Mohamed Ahmed and Stadler, Claus and Wauer, Matthias},
booktitle = {Proceedings of the Linking Geospatial Data Workshop},
keywords = {MOLE},
title = {The GeoKnow Generator: Managing Geospatial Data in the Linked Data Web},
year = 2014
}%0 Conference Paper
%1 lgd_geoknow_generator
%A Grange, Jon Jay Le
%A Lehmann, Jens
%A Athanasiou, Spiros
%A Rojas, Alejandra Garcia
%A Giannopoulos, Giorgos
%A Hladky, Daniel
%A Isele, Robert
%A {Ngonga Ngomo}, Axel-Cyrille
%A Sherif, Mohamed Ahmed
%A Stadler, Claus
%A Wauer, Matthias
%B Proceedings of the Linking Geospatial Data Workshop
%D 2014
%T The GeoKnow Generator: Managing Geospatial Data in the Linked Data Web
%U http://jens-lehmann.org/files/2014/lgd_geoknow_generator.pdf - Facilitating the Exploration and Visualization of Linked DataIn: S{ö}ren Auer, Volha Bryl and Sebastian Tramp (eds.): Linked Open Data—Creating Knowledge Out of Interlinked Data, Lecture Notes in Computer Science : Springer International Publishing — ISBN 978-3-319-09845-6, pp. 90–107Christian Mader, Michael Martin and Claus StadlerThe creation and the improvement of tools that cover exploratory and visualization tasks for Linked Data were one of the major goals focused in the LOD2 project. Tools that support those tasks are regarded as essential for the Web of Data, since they can act as a user-oriented starting point for data customers. During the project, several development efforts were made, whose results either facilitate the exploration and visualization directly (such as OntoWiki, the Pivot Browser) or can be used to support such tasks. In this chapter we present the three selected solutions rsine, CubeViz and Facete.
@incollection{LOD2Book-ExplorationVisualization,
abstract = {The creation and the improvement of tools that cover exploratory and visualization tasks for Linked Data were one of the major goals focused in the LOD2 project. Tools that support those tasks are regarded as essential for the Web of Data, since they can act as a user-oriented starting point for data customers. During the project, several development efforts were made, whose results either facilitate the exploration and visualization directly (such as OntoWiki, the Pivot Browser) or can be used to support such tasks. In this chapter we present the three selected solutions rsine, CubeViz and Facete.},
author = {Mader, Christian and Martin, Michael and Stadler, Claus},
booktitle = {Linked Open Data—Creating Knowledge Out of Interlinked Data},
editor = {Auer, S{ö}ren and Bryl, Volha and Tramp, Sebastian},
keywords = {facete},
pages = {90–107},
publisher = {Springer International Publishing},
series = {Lecture Notes in Computer Science},
title = {Facilitating the Exploration and Visualization of Linked Data},
year = 2014
}%0 Book Section
%1 LOD2Book-ExplorationVisualization
%A Mader, Christian
%A Martin, Michael
%A Stadler, Claus
%B Linked Open Data—Creating Knowledge Out of Interlinked Data
%D 2014
%E Auer, S{ö}ren
%E Bryl, Volha
%E Tramp, Sebastian
%I Springer International Publishing
%P 90–107
%R 10.1007/978-3-319-09846-3_5
%T Facilitating the Exploration and Visualization of Linked Data
%U https://svn.aksw.org/papers/2014/LOD_rsine/public.pdf
%X The creation and the improvement of tools that cover exploratory and visualization tasks for Linked Data were one of the major goals focused in the LOD2 project. Tools that support those tasks are regarded as essential for the Web of Data, since they can act as a user-oriented starting point for data customers. During the project, several development efforts were made, whose results either facilitate the exploration and visualization directly (such as OntoWiki, the Pivot Browser) or can be used to support such tasks. In this chapter we present the three selected solutions rsine, CubeViz and Facete.
%@ 978-3-319-09845-6 - LD viewer-linked data presentation frameworkIn: Proceedings of the 10th International Conference on Semantic Systems : ACM, pp. 124–131Denis Lukovnikov, Claus Stadler and Jens Lehmann
@inproceedings{ld_viewer,
author = {Lukovnikov, Denis and Stadler, Claus and Lehmann, Jens},
booktitle = {Proceedings of the 10th International Conference on Semantic Systems},
keywords = {sys:relevantFor:infai},
organization = {ACM},
pages = {124–131},
title = {LD viewer-linked data presentation framework},
year = 2014
}%0 Conference Paper
%1 ld_viewer
%A Lukovnikov, Denis
%A Stadler, Claus
%A Lehmann, Jens
%B Proceedings of the 10th International Conference on Semantic Systems
%D 2014
%P 124–131
%T LD viewer-linked data presentation framework
%U https://svn.aksw.org/papers/2014/Semantics_ld_viewer/public.pdf - Jassa — {A} JavaScript suite for SPARQL-based faceted searchIn: Proceedings of the {ISWC} Developers Workshop 2014, co-located with the 13th International Semantic Web Conference {(ISWC} 2014), Riva del Garda, Italy, October 19, 2014., pp. 31–36Claus Stadler, Patrick Westphal and Jens Lehmann
@inproceedings{jassa,
author = {Stadler, Claus and Westphal, Patrick and Lehmann, Jens},
booktitle = {Proceedings of the {ISWC} Developers Workshop 2014, co-located with the 13th International Semantic Web Conference {(ISWC} 2014), Riva del Garda, Italy, October 19, 2014.},
crossref = {DBLP:conf/semweb/2014dev},
keywords = {sys:relevantFor:infai},
pages = {31–36},
title = {Jassa — {A} JavaScript suite for SPARQL-based faceted search},
year = 2014
}%0 Conference Paper
%1 jassa
%A Stadler, Claus
%A Westphal, Patrick
%A Lehmann, Jens
%B Proceedings of the {ISWC} Developers Workshop 2014, co-located with the 13th International Semantic Web Conference {(ISWC} 2014), Riva del Garda, Italy, October 19, 2014.
%D 2014
%P 31–36
%T Jassa — {A} JavaScript suite for SPARQL-based faceted search
%U http://ceur-ws.org/Vol-1268/paper6.pdf - Quality Assurance of RDB2RDF MappingsPatrick Westphal, Claus Stadler and Jens Lehmann
@techreport{rdb2rdf_qa,
author = {Westphal, Patrick and Stadler, Claus and Lehmann, Jens},
keywords = {sys:relevantFor:infai},
title = {Quality Assurance of RDB2RDF Mappings},
year = 2014
}%0 Report
%1 rdb2rdf_qa
%A Westphal, Patrick
%A Stadler, Claus
%A Lehmann, Jens
%D 2014
%T Quality Assurance of RDB2RDF Mappings
%U http://svn.aksw.org/papers/2014/report_QA_RDB2RDF/public.pdf - {Exploring the Web of Spatial Data with Facete}In: Companion proceedings of 23rd International World Wide Web Conference (WWW), pp. 175–178Claus Stadler, Michael Martin and S{ö}ren Auer
@inproceedings{stadler-www,
author = {Stadler, Claus and Martin, Michael and Auer, S{ö}ren},
booktitle = {Companion proceedings of 23rd International World Wide Web Conference (WWW)},
keywords = {facete},
pages = {175–178},
title = {{Exploring the Web of Spatial Data with Facete}},
year = 2014
}%0 Conference Paper
%1 stadler-www
%A Stadler, Claus
%A Martin, Michael
%A Auer, S{ö}ren
%B Companion proceedings of 23rd International World Wide Web Conference (WWW)
%D 2014
%P 175–178
%T {Exploring the Web of Spatial Data with Facete}
%U https://svn.aksw.org/papers/2014/WWW_Facete/public.pdf
2013
- Increasing the Financial Transparency of European Commission Project FundingIn: Semantic Web Journal vol. Special Call for Linked Dataset descriptions, Nr. 2, pp. 157–164Michael Martin, Claus Stadler, Philipp Frischmuth and Jens LehmannThe Financial Transparency System (FTS) of the European Commission contains information about grants for European Union projects starting from 2007. It allows users to get an overview on EU funding, including information on beneficiaries as well as the amount and type of expenditure and information on the responsible EU department. The original dataset is freely available on the European Commission website, where users can query the data using an HTML form and download it in CSV and most recently XML format. In this article, we describe the transformation of this data to RDF and its interlinking with other datasets. We show that this allows interesting queries over the data, which were very difficult without this conversion. The main benefit of the dataset is an increased financial transparency of EU project funding. The RDF version of the FTS dataset will become part of the EU Open Data Portal and eventually be hosted and maintained by the European Union itself.
@article{martin-fts,
abstract = {The Financial Transparency System (FTS) of the European Commission contains information about grants for European Union projects starting from 2007. It allows users to get an overview on EU funding, including information on beneficiaries as well as the amount and type of expenditure and information on the responsible EU department. The original dataset is freely available on the European Commission website, where users can query the data using an HTML form and download it in CSV and most recently XML format. In this article, we describe the transformation of this data to RDF and its interlinking with other datasets. We show that this allows interesting queries over the data, which were very difficult without this conversion. The main benefit of the dataset is an increased financial transparency of EU project funding. The RDF version of the FTS dataset will become part of the EU Open Data Portal and eventually be hosted and maintained by the European Union itself.},
author = {Martin, Michael and Stadler, Claus and Frischmuth, Philipp and Lehmann, Jens},
journal = {Semantic Web Journal},
keywords = {MOLE},
number = 2,
pages = {157–164},
title = {Increasing the Financial Transparency of European Commission Project Funding},
volume = {Special Call for Linked Dataset descriptions},
year = 2013
}%0 Journal Article
%1 martin-fts
%A Martin, Michael
%A Stadler, Claus
%A Frischmuth, Philipp
%A Lehmann, Jens
%D 2013
%J Semantic Web Journal
%N 2
%P 157–164
%T Increasing the Financial Transparency of European Commission Project Funding
%U http://www.semantic-web-journal.net/system/files/swj435.pdf
%V Special Call for Linked Dataset descriptions
%X The Financial Transparency System (FTS) of the European Commission contains information about grants for European Union projects starting from 2007. It allows users to get an overview on EU funding, including information on beneficiaries as well as the amount and type of expenditure and information on the responsible EU department. The original dataset is freely available on the European Commission website, where users can query the data using an HTML form and download it in CSV and most recently XML format. In this article, we describe the transformation of this data to RDF and its interlinking with other datasets. We show that this allows interesting queries over the data, which were very difficult without this conversion. The main benefit of the dataset is an increased financial transparency of EU project funding. The RDF version of the FTS dataset will become part of the EU Open Data Portal and eventually be hosted and maintained by the European Union itself. - Extraktion, Mapping und Verlinkung von Daten im WebIn: Datenbank Spektrum vol. 13, Nr. 2, pp. 77–87S{ö}ren Auer, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Claus Stadler and J{ö}rg Unbehauen
@article{AUE+13a,
author = {Auer, S{ö}ren and Lehmann, Jens and Ngomo, Axel-Cyrille Ngonga and Stadler, Claus and Unbehauen, J{ö}rg},
journal = {Datenbank Spektrum},
keywords = {MOLE},
number = 2,
pages = {77–87},
title = {Extraktion, Mapping und Verlinkung von Daten im Web},
volume = 13,
year = 2013
}%0 Journal Article
%1 AUE+13a
%A Auer, S{ö}ren
%A Lehmann, Jens
%A Ngomo, Axel-Cyrille Ngonga
%A Stadler, Claus
%A Unbehauen, J{ö}rg
%D 2013
%J Datenbank Spektrum
%N 2
%P 77–87
%T Extraktion, Mapping und Verlinkung von Daten im Web
%U http://jens-lehmann.org/files/2013/db_spektrum_linked_data.pdf
%V 13 - Generating SPARQL queries using templatesIn: WIAS journal, Vol. 11, No. 3, 2013.Saeedeh Shekarpour, S{ö}ren Auer, Axel-Cyrille {Ngonga Ngomo}, Daniel Gerber, Sebastian Hellmann and Claus Stadler
@inproceedings{SHE+journal2013,
author = {Shekarpour, Saeedeh and Auer, S{ö}ren and {Ngonga Ngomo}, Axel-Cyrille and Gerber, Daniel and Hellmann, Sebastian and Stadler, Claus},
booktitle = {WIAS journal, Vol. 11, No. 3, 2013.},
keywords = {MOLE},
title = {Generating SPARQL queries using templates},
year = 2013
}%0 Conference Paper
%1 SHE+journal2013
%A Shekarpour, Saeedeh
%A Auer, S{ö}ren
%A {Ngonga Ngomo}, Axel-Cyrille
%A Gerber, Daniel
%A Hellmann, Sebastian
%A Stadler, Claus
%B WIAS journal, Vol. 11, No. 3, 2013.
%D 2013
%T Generating SPARQL queries using templates - CSV2RDF: User-Driven CSV to RDF Mass Conversion FrameworkIn: Proceedings of the ISEM ’13, September 04–06, 2013, Graz, AustriaIvan Ermilov, S{ö}ren Auer and Claus StadlerGovernments and public administrations started recently to publish large amounts of structured data on the Web, mostly in the form of tabular data such as CSV files or Excel sheets. Various tools and projects have been launched aiming at facilitating the lifting of tabular data to reach semantically structured and linked data. However, none of these tools supported a truly incremental, pay-as-you-go data publication and mapping strategy, which enables effort sharing between data owners, community experts and consumers. In this article, we present an approach for enabling the user-driven semantic mapping of large amounts of tabular data. We devise a simple mapping language for tabular data, which is easy to understand even for casual users, but expressive enough to cover the vast majority of potential tabular mappings use cases. We outline a formal approach for mapping tabular data to RDF. Default mappings are automatically created and can be revised by the community using a semantic wiki. The mappings are executed using a sophisticated streaming RDB2RDF conversion. We report about the deployment of our approach at the Pan-European data portal PublicData.eu, where we transformed and enriched almost 10,000 datasets accounting for 7.3 billion triples.
@inproceedings{ermilov-ivan-2013-isem,
abstract = {Governments and public administrations started recently to publish large amounts of structured data on the Web, mostly in the form of tabular data such as CSV files or Excel sheets. Various tools and projects have been launched aiming at facilitating the lifting of tabular data to reach semantically structured and linked data. However, none of these tools supported a truly incremental, pay-as-you-go data publication and mapping strategy, which enables effort sharing between data owners, community experts and consumers. In this article, we present an approach for enabling the user-driven semantic mapping of large amounts of tabular data. We devise a simple mapping language for tabular data, which is easy to understand even for casual users, but expressive enough to cover the vast majority of potential tabular mappings use cases. We outline a formal approach for mapping tabular data to RDF. Default mappings are automatically created and can be revised by the community using a semantic wiki. The mappings are executed using a sophisticated streaming RDB2RDF conversion. We report about the deployment of our approach at the Pan-European data portal PublicData.eu, where we transformed and enriched almost 10,000 datasets accounting for 7.3 billion triples.},
author = {Ermilov, Ivan and Auer, S{ö}ren and Stadler, Claus},
booktitle = {Proceedings of the ISEM ’13, September 04–06, 2013, Graz, Austria},
keywords = {sys:relevantFor:infai},
title = {CSV2RDF: User-Driven CSV to RDF Mass Conversion Framework},
year = 2013
}%0 Conference Paper
%1 ermilov-ivan-2013-isem
%A Ermilov, Ivan
%A Auer, S{ö}ren
%A Stadler, Claus
%B Proceedings of the ISEM ’13, September 04–06, 2013, Graz, Austria
%D 2013
%T CSV2RDF: User-Driven CSV to RDF Mass Conversion Framework
%U http://svn.aksw.org/papers/2013/ISemantics_CSV2RDF/public.pdf
%X Governments and public administrations have recently started to publish large amounts of structured data on the Web, mostly in the form of tabular data such as CSV files or Excel sheets. Various tools and projects have been launched aiming at facilitating the lifting of tabular data to reach semantically structured and linked data. However, none of these tools supported a truly incremental, pay-as-you-go data publication and mapping strategy, which enables effort sharing between data owners, community experts and consumers. In this article, we present an approach for enabling the user-driven semantic mapping of large amounts of tabular data. We devise a simple mapping language for tabular data, which is easy to understand even for casual users, but expressive enough to cover the vast majority of potential tabular mapping use cases. We outline a formal approach for mapping tabular data to RDF. Default mappings are automatically created and can be revised by the community using a semantic wiki. The mappings are executed using a sophisticated streaming RDB2RDF conversion. We report on the deployment of our approach at the Pan-European data portal PublicData.eu, where we transformed and enriched almost 10,000 datasets accounting for 7.3 billion triples.
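The abstract above describes the core of CSV2RDF: default mappings turn each table row into a resource and each column into a property, applied row by row in a streaming fashion. The following is a minimal, hedged sketch of that row/column idea in Java with Apache Jena (chosen only because Jena recurs throughout this list); the class name, the example.org namespace and the naive comma splitting are illustrative assumptions, not the paper's actual mapping language or implementation.

// Minimal sketch, assuming a header row and unquoted comma-separated cells.
// One resource per row, one property per column; values stay plain literals.
import org.apache.jena.rdf.model.*;
import java.io.*;

public class CsvToRdfSketch {
    static final String NS = "http://example.org/"; // hypothetical namespace

    public static void main(String[] args) throws IOException {
        Model model = ModelFactory.createDefaultModel();
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String[] header = in.readLine().split(","); // default mapping: column name -> property
            String line;
            int row = 0;
            while ((line = in.readLine()) != null) {
                String[] cells = line.split(",");        // naive: no quoted fields, no escapes
                Resource subject = model.createResource(NS + "row" + (++row));
                for (int i = 0; i < header.length && i < cells.length; i++) {
                    Property p = model.createProperty(NS, header[i].trim());
                    subject.addProperty(p, cells[i].trim());
                }
            }
        }
        model.write(System.out, "N-TRIPLES"); // emit the accumulated triples
    }
}

A community-revised mapping, as deployed on PublicData.eu, would replace the header-derived defaults with user-edited property URIs and datatypes, and a fully streaming variant would write each triple out immediately instead of collecting them in a model.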
2012
- The German {DBpedia}: A Sense Repository for Linking Entities. Sebastian Hellmann, Claus Stadler and Jens Lehmann
@incollection{Hellmann2012GermanDBpedia,
author = {Hellmann, Sebastian and Stadler, Claus and Lehmann, Jens},
crossref = {Springer-ldl},
keywords = {hellmann},
title = {The German {DBpedia}: A Sense Repository for Linking Entities},
year = 2012
}%0 Book Section
%1 Hellmann2012GermanDBpedia
%A Hellmann, Sebastian
%A Stadler, Claus
%A Lehmann, Jens
%D 2012
%T The German {DBpedia}: A Sense Repository for Linking Entities
- {T}owards an {O}pen-{G}overnmental {D}ata {W}eb. In: Proceedings of the KESW2012. Ivan Ermilov, Claus Stadler, Michael Martin and S{ö}ren Auer. Up to the present day, much effort has been made to publish government data on the Web. However, such data has been published in different formats. For any particular source and use (e.g. exploration, visualization, integration) of such information, special applications have to be written. This limits the overall usability of the information provided and makes it difficult to access information resources. These limitations can be overcome if the information is provided using a homogeneous data and access model complying with the Linked Data principles. In this paper we showcase how raw Open Government Data (OGD) from heterogeneous sources can be processed, converted, published and used on the Web of Linked Data. In particular, we demonstrate our experience in processing OGD on two use cases: the Digital Agenda Scoreboard and the Financial Transparency System of the European Commission.
@inproceedings{ermilov-i-2012-a,
abstract = {Up to the present day, much effort has been made to publish government data on the Web. However, such data has been published in different formats. For any particular source and use (e.g. exploration, visualization, integration) of such information, special applications have to be written. This limits the overall usability of the information provided and makes it difficult to access information resources. These limitations can be overcome if the information is provided using a homogeneous data and access model complying with the Linked Data principles. In this paper we showcase how raw Open Government Data (OGD) from heterogeneous sources can be processed, converted, published and used on the Web of Linked Data. In particular, we demonstrate our experience in processing OGD on two use cases: the Digital Agenda Scoreboard and the Financial Transparency System of the European Commission.},
author = {Ermilov, Ivan and Stadler, Claus and Martin, Michael and Auer, S{ö}ren},
booktitle = {Proceedings of the KESW2012},
keywords = {sys:relevantFor:lod2},
title = {{T}owards an {O}pen-{G}overnmental {D}ata {W}eb},
year = 2012
}%0 Conference Paper
%1 ermilov-i-2012-a
%A Ermilov, Ivan
%A Stadler, Claus
%A Martin, Michael
%A Auer, S{ö}ren
%B Proceedings of the KESW2012
%D 2012
%T {T}owards an {O}pen-{G}overnmental {D}ata {W}eb
%U http://svn.aksw.org/papers/2012/KESW2012_OpenGovWeb/public.pdf
%X Up to the present day, much effort has been made to publish government data on the Web. However, such data has been published in different formats. For any particular source and use (e.g. exploration, visualization, integration) of such information, special applications have to be written. This limits the overall usability of the information provided and makes it difficult to access information resources. These limitations can be overcome if the information is provided using a homogeneous data and access model complying with the Linked Data principles. In this paper we showcase how raw Open Government Data (OGD) from heterogeneous sources can be processed, converted, published and used on the Web of Linked Data. In particular, we demonstrate our experience in processing OGD on two use cases: the Digital Agenda Scoreboard and the Financial Transparency System of the European Commission.
- Knowledge Extraction from Structured Sources. In: Stefano Ceri and Marco Brambilla (eds.): Search Computing — Broadening Web Search, Lecture Notes in Computer Science. vol. 7538: Springer, pp. 34–52. J{ö}rg Unbehauen, Sebastian Hellmann, S{ö}ren Auer and Claus Stadler
@incollection{UnbehauenSeCo2012,
author = {Unbehauen, J{ö}rg and Hellmann, Sebastian and Auer, S{ö}ren and Stadler, Claus},
booktitle = {Search Computing — Broadening Web Search},
editor = {Ceri, Stefano and Brambilla, Marco},
keywords = {hellmann},
pages = {34–52},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
title = {Knowledge Extraction from Structured Sources},
volume = 7538,
year = 2012
}%0 Book Section
%1 UnbehauenSeCo2012
%A Unbehauen, J{ö}rg
%A Hellmann, Sebastian
%A Auer, S{ö}ren
%A Stadler, Claus
%B Search Computing — Broadening Web Search
%D 2012
%E Ceri, Stefano
%E Brambilla, Marco
%I Springer
%P 34–52
%T Knowledge Extraction from Structured Sources
%U http://svn.aksw.org/papers/2012/SearchComputing_KnowledgeExtraction/public.pdf
%V 7538
- Navigation-induced Knowledge Engineering by Example. In: JIST. Sebastian Hellmann, Jens Lehmann, J{ö}rg Unbehauen, Claus Stadler, Thanh Nghia Lam and Markus Strohmaier
@inproceedings{hellmann-jist-2012-NKE,
author = {Hellmann, Sebastian and Lehmann, Jens and Unbehauen, J{ö}rg and Stadler, Claus and Lam, Thanh Nghia and Strohmaier, Markus},
booktitle = {JIST},
keywords = {MOLE},
title = {Navigation-induced Knowledge Engineering by Example},
year = 2012
}%0 Conference Paper
%1 hellmann-jist-2012-NKE
%A Hellmann, Sebastian
%A Lehmann, Jens
%A Unbehauen, J{ö}rg
%A Stadler, Claus
%A Lam, Thanh Nghia
%A Strohmaier, Markus
%B JIST
%D 2012
%T Navigation-induced Knowledge Engineering by Example
%U http://svn.aksw.org/papers/2012/JIST_NKE/public.pdf
- LinkedGeoData: A Core for a Web of Spatial Open Data. In: Semantic Web Journal vol. 3, No. 4, pp. 333–354. Claus Stadler, Jens Lehmann, Konrad H{ö}ffner and S{ö}ren Auer
@article{SLHA11,
author = {Stadler, Claus and Lehmann, Jens and H{ö}ffner, Konrad and Auer, S{ö}ren},
journal = {Semantic Web Journal},
keywords = {sys:relevantFor:infai},
number = 4,
pages = {333–354},
title = {LinkedGeoData: A Core for a Web of Spatial Open Data},
volume = 3,
year = 2012
}%0 Journal Article
%1 SLHA11
%A Stadler, Claus
%A Lehmann, Jens
%A H{ö}ffner, Konrad
%A Auer, S{ö}ren
%D 2012
%J Semantic Web Journal
%N 4
%P 333–354
%T LinkedGeoData: A Core for a Web of Spatial Open Data
%U http://jens-lehmann.org/files/2012/linkedgeodata2.pdf
%V 3
- {DB}pedia and the {L}ive {E}xtraction of {S}tructured {D}ata from {W}ikipedia. In: Program: electronic library and information systems vol. 46, p. 27. Mohamed Morsey, Jens Lehmann, S{ö}ren Auer, Claus Stadler and Sebastian Hellmann. Purpose — DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the Web using Linked Data and SPARQL. However, the DBpedia release process is heavy-weight and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. Design/methodology/approach — Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added/deleted triples in files, in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Findings — During the realization of DBpedia-Live we learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated Wikipedia articles should have the highest priority, over mapping changes and unmodified pages. An overall finding is that there are plenty of opportunities arising from the emerging Web of Data for librarians. Practical implications — DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by timely synchronizing it with Wikipedia, which is relevant for many use cases requiring up-to-date information. Originality/value — The new DBpedia-Live framework adds new features to the old DBpedia-Live framework, e.g. abstract extraction, ontology changes, and changeset publication.
@article{dbpedia_live_2012,
abstract = {Purpose — DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the Web using Linked Data and SPARQL. However, the DBpedia release process is heavy-weight and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. Design/methodology/approach — Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added/deleted triples in files, in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Findings — During the realization of DBpedia-Live we learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated Wikipedia articles should have the highest priority, over mapping changes and unmodified pages. An overall finding is that there are plenty of opportunities arising from the emerging Web of Data for librarians. Practical implications — DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by timely synchronizing it with Wikipedia, which is relevant for many use cases requiring up-to-date information. Originality/value — The new DBpedia-Live framework adds new features to the old DBpedia-Live framework, e.g. abstract extraction, ontology changes, and changeset publication.},
author = {Morsey, Mohamed and Lehmann, Jens and Auer, S{ö}ren and Stadler, Claus and Hellmann, Sebastian},
journal = {Program: electronic library and information systems},
keywords = {hellmann},
pages = 27,
title = {{DB}pedia and the {L}ive {E}xtraction of {S}tructured {D}ata from {W}ikipedia},
volume = 46,
year = 2012
}%0 Journal Article
%1 dbpedia_live_2012
%A Morsey, Mohamed
%A Lehmann, Jens
%A Auer, S{ö}ren
%A Stadler, Claus
%A Hellmann, Sebastian
%D 2012
%J Program: electronic library and information systems
%P 27
%T {DB}pedia and the {L}ive {E}xtraction of {S}tructured {D}ata from {W}ikipedia
%U http://svn.aksw.org/papers/2011/DBpedia_Live/public.pdf
%V 46
%X Purpose — DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the Web using Linked Data and SPARQL. However, the DBpedia release process is heavy-weight and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. Design/methodology/approach — Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added/deleted triples in files, in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Findings — During the realization of DBpedia-Live we learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated Wikipedia articles should have the highest priority, over mapping changes and unmodified pages. An overall finding is that there are plenty of opportunities arising from the emerging Web of Data for librarians. Practical implications — DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by timely synchronizing it with Wikipedia, which is relevant for many use cases requiring up-to-date information. Originality/value — The new DBpedia-Live framework adds new features to the old DBpedia-Live framework, e.g. abstract extraction, ontology changes, and changeset publication.
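The one concrete engineering finding in this abstract — process the update stream through a priority queue in which live article edits outrank mapping changes, which in turn outrank sweeps over unmodified pages — is simple enough to illustrate. The Java sketch below shows only that ordering; the type names, fields and sample items are invented for illustration, and the actual DBpedia-Live framework is considerably more involved.

// Hedged illustration: a min-heap ordered by update class, then by age.
import java.util.Comparator;
import java.util.PriorityQueue;

public class LiveQueueSketch {
    // Ordinal order encodes priority: article edits first, sweeps last.
    enum Kind { ARTICLE_EDIT, MAPPING_CHANGE, UNMODIFIED_SWEEP }

    record Update(Kind kind, String page, long timestamp) {}

    public static void main(String[] args) {
        PriorityQueue<Update> queue = new PriorityQueue<>(
            Comparator.comparingInt((Update u) -> u.kind().ordinal()) // lower ordinal = higher priority
                      .thenComparingLong(Update::timestamp));          // older items first within a class

        queue.add(new Update(Kind.UNMODIFIED_SWEEP, "Leipzig", 1L));
        queue.add(new Update(Kind.ARTICLE_EDIT, "Berlin", 3L));
        queue.add(new Update(Kind.MAPPING_CHANGE, "Infobox_city", 2L));

        while (!queue.isEmpty()) {
            Update u = queue.poll(); // polls the article edit first, the sweep last
            System.out.println(u.kind() + " -> " + u.page());
        }
    }
}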
- Assessing Linked Data Mappings Using Network Measures. In: Proceedings of the 9th Extended Semantic Web Conference, Lecture Notes in Computer Science. vol. 7295: Springer, pp. 87–102. Christophe Gu{é}ret, Paul T. Groth, Claus Stadler and Jens Lehmann
@inproceedings{Gueret2012,
author = {Gu{é}ret, Christophe and Groth, Paul T. and Stadler, Claus and Lehmann, Jens},
booktitle = {Proceedings of the 9th Extended Semantic Web Conference},
keywords = {sys:relevantFor:infai},
pages = {87–102},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
title = {Assessing Linked Data Mappings Using Network Measures},
volume = 7295,
year = 2012
}%0 Conference Paper
%1 Gueret2012
%A Gu{é}ret, Christophe
%A Groth, Paul T.
%A Stadler, Claus
%A Lehmann, Jens
%B Proceedings of the 9th Extended Semantic Web Conference
%D 2012
%I Springer
%P 87–102
%T Assessing Linked Data Mappings Using Network Measures
%U http://jens-lehmann.org/files/2012/linked_mapping_qa.pdf
%V 7295
- Managing the life-cycle of Linked Data with the {LOD2} Stack. In: Proceedings of International Semantic Web Conference (ISWC 2012), 22\% acceptance rate. S{ö}ren Auer, Lorenz B{ü}hmann, Christian Dirschl, Orri Erling, Michael Hausenblas, Robert Isele, Jens Lehmann, Michael Martin, Pablo N. Mendes, et al.
@inproceedings{Auer+ISWC-2012,
author = {Auer, S{ö}ren and B{ü}hmann, Lorenz and Dirschl, Christian and Erling, Orri and Hausenblas, Michael and Isele, Robert and Lehmann, Jens and Martin, Michael and Mendes, Pablo N. and van Nuffelen, Bert and Stadler, Claus and Tramp, Sebastian and Williams, Hugh},
booktitle = {Proceedings of International Semantic Web Conference (ISWC 2012)},
keywords = {sys:relevantFor:lod2},
note = {22\% acceptance rate},
title = {Managing the life-cycle of Linked Data with the {LOD2} Stack},
year = 2012
}%0 Conference Paper
%1 Auer+ISWC-2012
%A Auer, S{ö}ren
%A B{ü}hmann, Lorenz
%A Dirschl, Christian
%A Erling, Orri
%A Hausenblas, Michael
%A Isele, Robert
%A Lehmann, Jens
%A Martin, Michael
%A Mendes, Pablo N.
%A van Nuffelen, Bert
%A Stadler, Claus
%A Tramp, Sebastian
%A Williams, Hugh
%B Proceedings of International Semantic Web Conference (ISWC 2012)
%D 2012
%T Managing the life-cycle of Linked Data with the {LOD2} Stack
%U http://iswc2012.semanticweb.org/sites/default/files/76500001.pdf
- Accessing Relational Data on the Web with SparqlMap. In: JIST. J{ö}rg Unbehauen, Claus Stadler and S{ö}ren Auer
@inproceedings{unbehauen-jist-2012-sparqlmap,
author = {Unbehauen, J{ö}rg and Stadler, Claus and Auer, S{ö}ren},
booktitle = {JIST},
keywords = {sys:relevantFor:infai},
title = {Accessing Relational Data on the Web with SparqlMap},
year = 2012
}%0 Conference Paper
%1 unbehauen-jist-2012-sparqlmap
%A Unbehauen, J{ö}rg
%A Stadler, Claus
%A Auer, S{ö}ren
%B JIST
%D 2012
%T Accessing Relational Data on the Web with SparqlMap
%U https://svn.aksw.org/papers/2012/JIST_SparqlMap/public.pdf
2011
- Keyword-driven SPARQL Query Generation Leveraging Background Knowledge. In: International Conference on Web Intelligence. Saeedeh Shekarpour, S{ö}ren Auer, Axel-Cyrille {Ngonga Ngomo}, Daniel Gerber, Sebastian Hellmann and Claus Stadler
@inproceedings{SHE+11,
author = {Shekarpour, Saeedeh and Auer, S{ö}ren and {Ngonga Ngomo}, Axel-Cyrille and Gerber, Daniel and Hellmann, Sebastian and Stadler, Claus},
booktitle = {International Conference on Web Intelligence},
keywords = {hellmann},
title = {Keyword-driven SPARQL Query Generation Leveraging Background Knowledge},
year = 2011
}%0 Conference Paper
%1 SHE+11
%A Shekarpour, Saeedeh
%A Auer, S{ö}ren
%A {Ngonga Ngomo}, Axel-Cyrille
%A Gerber, Daniel
%A Hellmann, Sebastian
%A Stadler, Claus
%B International Conference on Web Intelligence
%D 2011
%T Keyword-driven SPARQL Query Generation Leveraging Background Knowledge
- Linked Data Quality Assessment through Network Analysis. In: ISWC 2011 Posters and Demos. Christophe Gu{é}ret, Paul Groth, Claus Stadler and Jens Lehmann
@inproceedings{iswc-11-pd-linkqa,
author = {Gu{é}ret, Christophe and Groth, Paul and Stadler, Claus and Lehmann, Jens},
booktitle = {ISWC 2011 Posters and Demos},
keywords = {sys:relevantFor:infai},
title = {Linked Data Quality Assessment through Network Analysis},
year = 2011
}%0 Conference Paper
%1 iswc-11-pd-linkqa
%A Gu{é}ret, Christophe
%A Groth, Paul
%A Stadler, Claus
%A Lehmann, Jens
%B ISWC 2011 Posters and Demos
%D 2011
%T Linked Data Quality Assessment through Network Analysis
%U http://jens-lehmann.org/files/2011/iswc_pd_linkqa.pdf
2010
- {U}pdate {S}trategies for {DB}pedia {L}ive. In: 6th Workshop on Scripting and Development for the Semantic Web, co-located with ESWC 2010, 30th or 31st May, 2010, Crete, Greece. Claus Stadler, Michael Martin, Jens Lehmann and Sebastian Hellmann. Wikipedia is one of the largest public information spaces with a huge user community, which collaboratively works on the largest online encyclopedia. Its users add or edit up to 150 thousand wiki pages per day. The DBpedia project extracts RDF from Wikipedia and interlinks it with other knowledge bases. In the DBpedia live extraction mode, Wikipedia edits are instantly processed to update information in DBpedia. Due to the high number of edits and the growth of Wikipedia, the update process has to be very efficient and scalable. In this paper, we present different strategies to tackle this challenging problem and describe how we modified the DBpedia live extraction algorithm to work more efficiently.
@inproceedings{stadler-c-2010-a,
abstract = {Wikipedia is one of the largest public information spaces with a huge user community, which collaboratively works on the largest online encyclopedia. Its users add or edit up to 150 thousand wiki pages per day. The DBpedia project extracts RDF from Wikipedia and interlinks it with other knowledge bases. In the DBpedia live extraction mode, Wikipedia edits are instantly processed to update information in DBpedia. Due to the high number of edits and the growth of Wikipedia, the update process has to be very efficient and scalable. In this paper, we present different strategies to tackle this challenging problem and describe how we modified the DBpedia live extraction algorithm to work more efficiently.},
author = {Stadler, Claus and Martin, Michael and Lehmann, Jens and Hellmann, Sebastian},
booktitle = {6th Workshop on Scripting and Development for the Semantic Web, co-located with ESWC 2010, 30th or 31st May, 2010, Crete, Greece},
keywords = {MOLE},
title = {{U}pdate {S}trategies for {DB}pedia {L}ive},
year = 2010
}%0 Conference Paper
%1 stadler-c-2010-a
%A Stadler, Claus
%A Martin, Michael
%A Lehmann, Jens
%A Hellmann, Sebastian
%B 6th Workshop on Scripting and Development for the Semantic Web, co-located with ESWC 2010, 30th or 31st May, 2010, Crete, Greece
%D 2010
%T {U}pdate {S}trategies for {DB}pedia {L}ive
%U http://jens-lehmann.org/files/2010/dbpedia_live_eswc.pdf
%X Wikipedia is one of the largest public information spaces with a huge user community, which collaboratively works on the largest online encyclopedia. Its users add or edit up to 150 thousand wiki pages per day. The DBpedia project extracts RDF from Wikipedia and interlinks it with other knowledge bases. In the DBpedia live extraction mode, Wikipedia edits are instantly processed to update information in DBpedia. Due to the high number of edits and the growth of Wikipedia, the update process has to be very efficient and scalable. In this paper, we present different strategies to tackle this challenging problem and describe how we modified the DBpedia live extraction algorithm to work more efficiently.
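The update step this workshop paper optimises — and which the 2012 journal article above describes as publishing added and deleted triples — boils down to re-extracting a changed page and applying only the delta to the store. Below is a hedged Jena sketch of that diff-and-apply step; applyUpdate, the namespace and the example data are assumptions for illustration, not the authors' code or any of the specific strategies the paper evaluates.

// Sketch: keep the store in sync by diffing old and new extraction results.
import org.apache.jena.rdf.model.*;

public class DiffUpdateSketch {
    /** Apply a minimal delta for one re-extracted page (illustrative only). */
    static void applyUpdate(Model store, Model oldExtraction, Model newExtraction) {
        Model removed = oldExtraction.difference(newExtraction); // triples no longer produced
        Model added   = newExtraction.difference(oldExtraction); // newly produced triples
        store.remove(removed);
        store.add(added);
    }

    public static void main(String[] args) {
        String ns = "http://example.org/"; // hypothetical namespace
        Model store = ModelFactory.createDefaultModel();

        Model oldM = ModelFactory.createDefaultModel();
        oldM.createResource(ns + "Leipzig")
            .addProperty(oldM.createProperty(ns, "population"), "500000");
        store.add(oldM);

        Model newM = ModelFactory.createDefaultModel();
        newM.createResource(ns + "Leipzig")
            .addProperty(newM.createProperty(ns, "population"), "600000");

        applyUpdate(store, oldM, newM);
        store.write(System.out, "N-TRIPLES"); // only the updated triple remains
    }
}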
2009
- {DBpedia} Live Extraction. In: Proc. of 8th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE), Lecture Notes in Computer Science. vol. 5871, pp. 1209–1223. Sebastian Hellmann, Claus Stadler, Jens Lehmann and S{ö}ren Auer
@inproceedings{hellmann_odbase_dbpedia_live_09,
author = {Hellmann, Sebastian and Stadler, Claus and Lehmann, Jens and Auer, S{ö}ren},
booktitle = {Proc. of 8th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE)},
keywords = {MOLE},
pages = {1209–1223},
series = {Lecture Notes in Computer Science},
title = {{DBpedia} Live Extraction},
volume = 5871,
year = 2009
}%0 Conference Paper
%1 hellmann_odbase_dbpedia_live_09
%A Hellmann, Sebastian
%A Stadler, Claus
%A Lehmann, Jens
%A Auer, S{ö}ren
%B Proc. of 8th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE)
%D 2009
%P 1209–1223
%R doi:10.1007/978-3-642-05151-7_33
%T {DBpedia} Live Extraction
%U http://svn.aksw.org/papers/2009/ODBASE_LiveExtraction/dbpedia_live_extraction_public.pdf
%V 5871