214 research outputs found

Efficiently Clustering Very Large Attributed Graphs

Author: Akoglu L.
Boldi P.
Combe D.
Deza M.M.
Diestel R.
Duong K.-C.
Protter M. H.
Villa-Vialaneix N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and the structure of the graph. However, the time and space complexities of state-of-the-art algorithms limit their scalability to medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a fast and scalable algorithm for partitioning large attributed graphs. The approach is robust, being compatible with both categorical and quantitative attributes, and it is tailorable, allowing the user to weight the semantic and topological components. Further, the approach does not require the user to guess in advance the number of clusters. SToC relies on well-known approximation techniques such as bottom-k sketches, traditional graph-theoretic concepts, and a new perspective on the composition of heterogeneous distance measures. Experimental results demonstrate its ability to efficiently compute high-quality partitions of large-scale attributed graphs.

Comment: This work has been published in ASONAM 2017. This version includes an appendix with validation of our attribute model and distance function, omitted in the conference version for lack of space. Please refer to the published version.
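The abstract above names bottom-k sketches as one of the approximation techniques SToC relies on, without saying how they are applied. As a rough illustration of the general technique only (not of SToC itself), the following sketch estimates the Jaccard similarity of two sets from their bottom-k sketches. The function names, the choice of BLAKE2b as the hash, and the example sets are all assumptions made for this illustration.

```python
import hashlib


def _hash(item: str) -> int:
    # Stable 64-bit hash of a string (the hash choice is an assumption).
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(items, k):
    """Return the k smallest distinct hash values of the set, in sorted order."""
    return sorted({_hash(x) for x in items})[:k]


def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate |A ∩ B| / |A ∪ B| from the bottom-k sketches of A and B."""
    a, b = set(sketch_a), set(sketch_b)
    # The k smallest values among both sketches form the bottom-k sketch of A ∪ B.
    union_sketch = sorted(a | b)[:k]
    if not union_sketch:
        return 0.0
    # A value of the union sketch belongs to A ∩ B exactly when both sketches hold it.
    shared = sum(1 for h in union_sketch if h in a and h in b)
    return shared / len(union_sketch)


if __name__ == "__main__":
    # Hypothetical neighbour sets of two nodes.
    neighbours_u = {f"node{i}" for i in range(0, 1000)}
    neighbours_v = {f"node{i}" for i in range(500, 1500)}
    k = 128
    su = bottom_k_sketch(neighbours_u, k)
    sv = bottom_k_sketch(neighbours_v, k)
    print("estimated Jaccard:", estimate_jaccard(su, sv, k))
```

Each sketch holds at most k integers regardless of the size of the underlying set, which is what makes the technique attractive for large graphs. When k is at least the size of the union, the estimate coincides with the exact Jaccard similarity, barring hash collisions.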

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio della Ricerca - Università di Roma 3

Outlier Edge Detection Using Random Graph Generation Models and Applications

Author: A Lancichinetti
AK Jain
DJ Watts
G Karypis
H Zhang
J Leskovec
J Shi
J Yang
L Akoglu
L Danon
L Liu
L Lu
L Waltman
LC Freeman
M Choudhury De
M Coscia
M Newman
M Rosvall
ME Newman
MEJ Newman
MR Brito
R Yu
S Fortunato
S Lloyd
S Papadopoulos
SE Schaeffer
VD Blondel
VJ Hodge
X Dong
Publication date: 21/06/2016

Outliers are samples that are generated by different mechanisms from other normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has focused only on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, which can be defined as the induced graph that contains the two end nodes of an edge, their neighboring nodes and the edges that link these nodes, contains critical information for detecting outlier edges. We evaluated the proposed algorithms by injecting outlier edges into some real-world graph data. Experimental results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. Furthermore, the proposed algorithms are not limited to outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques.

Comment: 14 pages, 5 figures, journal paper

arXiv.org e-Print Archive

Qatar University Institutional Repository

Crossref

Directory of Open Access Journals

Trepo - Institutional Repository of Tampere University

Seismic risk in the city of Al Hoceima (north of Morocco) using the vulnerability index method, applied in Risk-UE project

Author: A Coburn
A Poujol
A Tahayt
A Talhaoui
AM Akoglu
ATC-13
CL Casado
D Benedetti
E d’Acremont
E Faccioli
G Grünthal
JVD Woerd
K Pitilakis
L Ait Brahim
Luis Pujades
M Bezzeghoud
Mimoun Chourak
Mohamed Abed
N Lantada
RMSI
S Lagomarsino
Seif-eddine Cherif
SO Alami El
T Cherkaoui
T Mourabit
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

The final publication is available at Springer via http://dx.doi.org/10.1007/s11069-016-2566-8

Al Hoceima is one of the most seismically active regions in the north of Morocco. This is demonstrated by the large seismic episodes reported in seismic catalogs and research studies. However, seismic risk is relatively high due to vulnerable buildings that are either old or do not comply with seismic standards. Our aim is to present a study of seismic risk and seismic scenarios for the city of Al Hoceima. The seismic vulnerability of the existing residential buildings was evaluated using the vulnerability index method (Risk-UE), which was chosen to be adapted and applied to Moroccan constructions for its practicality and simple methodology. A visual inspection of 1102 buildings was carried out to assess the vulnerability factors. Seismic hazard was evaluated in terms of macroseismic intensity for two scenarios (one deterministic and one probabilistic). The maps of seismic risk represent direct damage to buildings, damage to population and economic cost. According to the results, the main vulnerability index of the city is equal to 0.49 and the seismic risk is estimated as Slight (main damage grade equal to 0.9 for the deterministic scenario and 0.7 for the probabilistic scenario). However, moderate to heavy damage is expected in areas located in the newer extensions, in both the east and west of the city. Important economic losses and damage to the population are expected in these areas as well. The maps produced can be a potential guide to decision making in the field of seismic risk prevention and mitigation strategies in Al Hoceima.

Peer Reviewed. Postprint (author's final draft)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

UPCommons (Universitat Politècnica de Catalunya)

An exposure-effect approach for evaluating ecosystem-wide risks from human activities

Author: Akoglu E
Boicenco L
Churilova T
Fleming-Lehtinen V
Galil BS
Goodsir F
Goren M
Jongbloed RH
Knights AM
Kryvenko O
Leppanen J-M
Margonski P
Moncheva S
Oguz T
Papadopoulou KN
Piet GJ
Robinson LA
Setala O
Smith CJ
Stefanova K
Tamis JE
Timofte F
White LJ
Publication venue: 'The Korea Society of Information Technology Services'
Publication date: 01/02/2015

Ecosystem-based management (EBM) is promoted as the solution for sustainable use. An ecosystem-wide assessment methodology is therefore required. In this paper, we present an approach to assess the risk to ecosystem components from human activities common to marine and coastal ecosystems. We build on: (i) a linkage framework that describes how human activities can impact the ecosystem through pressures, and (ii) a qualitative expert judgement assessment of impact chains describing the exposure and sensitivity of ecological components to those activities. Using case study examples applied at European regional sea scale, we evaluate the risk of an adverse ecological impact from current human activities to a suite of ecological components and, once impacted, the time required for recovery to pre-impact conditions should those activities subside. Grouping impact chains by sector, pressure type, or ecological component enabled impact risks and recovery times to be identified, supporting resource managers in their efforts to prioritize threats for management, identify the most at-risk components, and generate time frames for ecosystem recovery.

Crossref

PEARL (Univ. of Plymouth)

OpenMETU (Middle East Technical University)

The iPlant Collaborative: Cyberinfrastructure for Plant Biology

Author: Akoglu A.
Andrews G.
Ane C.
Boyle B.
Brutnell T.
Cazes J.
Cranston K.
Donoghue M. J.
Dooley R.
Enquist B. J.
Feng X.
Gendler K.
Gessler D.
Goff S. A
Gonzales M
Grene R.
Hanlon M.
Helmke M.
Hilgert U.
Hopkins N.
Jordan C.
Kim S. J.
Kleibenstein D. J.
Koesterke L.
Kubach A.
Kvilekval K.
Leebens-Mack J.
Lenards A.
Lent M.
Lowenthal D.
Lowry S.
Lu Z.
Lyons E.
Manjunath B.S.
Matasci N.
McKay S.
McLay R.
Merchant N.
Micklos D.
Mock S.
Muir A.
Myers C. R.
Narro M.
Noutsos C.
O'Meara B.
Pasternak S.
Piel W. H.
Ram S.
Sanderson M. J.
Skidmore E.
Soltis D.
Soltis P.
Spalding E. P.
Stamatakis A.
Stanzione D.
Stapleton A. E
Stein L.
Tang C.
Tannen V.
Vaughn M.
Vision T. J.
Wang L.
Ware D.
Welch S. M.
White J. W.
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2011

The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure, enabling plant scientists to perform complex analyses on large datasets without the need to master the command line or high-performance computational services.

Cold Spring Harbor Laboratory Institutional Repository

Data-Driven Policy on Feasibility Determination for the Train Shunting Problem

Author: C Aggarwal
J Haahr
JE Hopcroft
JT Haahr
L Akoglu
LG Kroon
M Neumann
N Shervashidze
R Freling
S Verwer
Publication venue: Springer
Publication date: 30/04/2020

Crossref

Pure OAI Repository

Reducing Controversy by Connecting Opposing Views

Author: Akoglu L.
Conover M.
Golub G. H.
Guerra P. H. C.
Guo G.
Mejova Y.
Munson S. A.
Pariser E.
Publication venue: International Joint Conference on Artificial Intelligence, Inc
Publication date: 24/05/2018

Peer reviewed

arXiv.org e-Print Archive

Crossref

Helsingin yliopiston digitaalinen arkisto

On defining rules for cancer data fabrication

Author: A Adir
A Silvina
CE Roffman
DB Rubin
E Bilgory
E Tsang
G Caiola
H Akoglu
H-M Adorf
JP Reiter
L de Moura
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Funding: This research is partially funded by the Data Lab, and the EU H2020 project Serums: Securing Medical Data in Smart Patient-Centric Healthcare Systems (grant 826278).

Data is essential for machine learning projects, and data accuracy is crucial for being able to trust the results obtained from the associated machine learning models. Previously, we developed machine learning models for predicting the treatment outcome for breast cancer patients who have undergone chemotherapy, and developed a monitoring system for their treatment timeline that interactively shows the options and associated predictions. Available cancer datasets, such as the one used earlier, are often too small to obtain significant results, which makes it difficult to explore ways to further improve the predictive capability of the models. In this paper, we explore an alternative: enhancing our datasets through synthetic data generation. From our original dataset, we extract rules to generate fabricated data that capture the different characteristics inherent in the dataset. Additional rules can be used to capture general medical knowledge. We show how to formulate rules for our cancer treatment data, and use the IBM solver to obtain a corresponding synthetic dataset. We discuss challenges for future work.

Postprint

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Discovering Polarized Communities in Signed Networks

Author: Akoglu L.
Beigi G.
Cesa-Bianchi N.
Graells-Garrido E.
Guha R.
Leskovec J.
Liao Q. V.
Marshall A. W.
Munson S. A.
Swamy C.
Publication date: 01/01/2019

Signed networks contain edge annotations to indicate whether each interaction is friendly (positive edge) or antagonistic (negative edge). The model is simple but powerful and it can capture novel and interesting structural properties of real-world phenomena. The analysis of signed networks has many applications from modeling discussions in social media, to mining user reviews, and to recommending products in e-commerce sites. In this paper we consider the problem of discovering polarized communities in signed networks. In particular, we search for two communities (subsets of the network vertices) where within communities there are mostly positive edges while across communities there are mostly negative edges. We formulate this novel problem as a "discrete eigenvector" problem, which we show to be NP-hard. We then develop two intuitive spectral algorithms: one deterministic, and one randomized with quality guarantee $\sqrt{n}$ (where $n$ is the number of vertices in the graph), tight up to constant factors. We validate our algorithms against non-trivial baselines on real-world signed networks. Our experiments confirm that our algorithms produce higher quality solutions, are much faster and can scale to much larger networks than the baselines, and are able to detect ground-truth polarized communities.
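The abstract describes the goal as two communities with mostly positive edges inside each and mostly negative edges between them, but the listing does not give the objective itself. One plausible way to score a candidate pair of communities, assumed here purely for illustration, is to label each vertex +1, -1 or 0 (neutral), count every edge that agrees with the labelling as +1 and every edge that disagrees as -1, and normalize by the number of labelled vertices. The function name `polarity` and the toy graph are hypothetical.

```python
def polarity(edges, assignment):
    """Score a two-community labelling of a signed graph.

    edges: iterable of (u, v, sign) with sign in {+1, -1}, one entry per
        undirected edge.
    assignment: dict mapping a vertex to +1 or -1 (its community); vertices
        mapped to 0 or missing from the dict are treated as neutral.

    Returns x^T A x / x^T x, where x is the label vector and A is the signed
    adjacency matrix. This objective is an assumption for illustration.
    """
    labelled = sum(1 for label in assignment.values() if label != 0)
    if labelled == 0:
        return 0.0
    agreement = 0
    for u, v, sign in edges:
        # +1 if the edge agrees with the labelling, -1 if it disagrees,
        # 0 if either endpoint is neutral.
        agreement += sign * assignment.get(u, 0) * assignment.get(v, 0)
    # Each undirected edge appears twice in the symmetric matrix A.
    return 2 * agreement / labelled


if __name__ == "__main__":
    # Hypothetical toy graph: friendly pairs (a, b) and (c, d), hostile across.
    edges = [("a", "b", +1), ("c", "d", +1), ("a", "c", -1), ("b", "d", -1)]
    good = {"a": +1, "b": +1, "c": -1, "d": -1}
    bad = {"a": +1, "b": -1, "c": +1, "d": -1}
    print(polarity(edges, good), polarity(edges, bad))
```

Dividing by the number of labelled vertices keeps the score from rewarding arbitrarily large communities, and leaving vertices neutral lets a labelling ignore parts of the network that do not fit either side.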

arXiv.org e-Print Archive

Crossref

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale