547 research outputs found

    Inferring Social Status and Rich Club Effects in Enterprise Communication Networks

    Social status, defined as the relative rank or position that an individual holds in a social hierarchy, is known to be among the most important motivating forces in social behavior. In this paper, we consider the notion of status from the perspective of a position or title held by a person in an enterprise, and we study the intersection of social status and social networks in that enterprise. Specifically, we ask whether enterprise communication logs can reveal how social interactions and individual status manifest themselves in social networks. To that end, we use two enterprise datasets covering three communication channels --- voice call, short message, and email --- to demonstrate the social-behavioral differences among individuals of different status. We report several findings and, based on them, develop a model to predict social status. On the individual level, high-status individuals are more likely to span structural holes by linking to parts of the enterprise network that are otherwise not well connected to one another. On the community level, the principles of homophily, social balance, and clique theory indicate a "rich club" maintained by high-status individuals, in the sense that this community is much more connected, balanced, and dense. Our model predicts the social status of individuals with 93% accuracy. (Comment: 13 pages, 4 figures)
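
    The structural-hole and rich-club observations described above can be quantified with standard network measures. Below is a minimal sketch, not the authors' pipeline, that computes Burt's constraint (low constraint roughly corresponds to spanning structural holes) and the rich-club coefficient on a toy graph with networkx; the graph and the high-status node set are made-up placeholders.

```python
# Illustrative sketch (not the paper's method): structural-hole and rich-club
# measures on a stand-in communication graph, using networkx.
import networkx as nx

# Placeholder graph; in the paper's setting this would be the enterprise
# communication network, with nodes labeled by job status.
G = nx.karate_club_graph()
high_status = {0, 33}  # hypothetical "high-status" nodes for illustration

# Burt's constraint: lower values indicate nodes that span structural holes,
# i.e. bridge parts of the network that are otherwise poorly connected.
constraint = nx.constraint(G)
for node in high_status:
    print(f"node {node}: constraint = {constraint[node]:.3f}")

# Rich-club coefficient: how densely the highest-degree nodes connect to one
# another; normalized=False skips the random-rewiring baseline.
rc = nx.rich_club_coefficient(G, normalized=False)
print("rich-club coefficient by degree k:", {k: round(v, 3) for k, v in rc.items()})
```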

    Metropolis


    Lights in a Cafe


    exKidneyBERT: A Language Model for Kidney Transplant Pathology Reports and the Crucial Role of Extended Vocabularies

    Background: Pathology reports contain key information about the patient's diagnosis as well as important gross and microscopic findings. These information-rich clinical reports are an invaluable resource for clinical studies, but data extraction and analysis are often manual and tedious because the texts are unstructured. An automated data-extraction method for pathology reports would therefore be of significant value and utility. Language modeling is useful for classifying and extracting information from natural language reports. Released in 2018, Bidirectional Encoder Representations from Transformers (BERT) achieved state-of-the-art performance on several natural language processing (NLP) tasks, and pre-training BERT on a task-specific domain usually improves model performance. BioBERT, obtained by further pre-training BERT on large biomedical corpora, outperformed BERT on biomedical NLP tasks; Clinical BERT, obtained by further pre-training BioBERT on clinical data, in turn outperformed BioBERT on clinical NLP tasks. It is not clear, however, whether pre-training on ever smaller training datasets is worthwhile. Objective: To develop a language model for renal transplant pathology reports that extracts the answers to two pre-defined questions. Methods: The two pre-defined questions were: 1) "What kind of rejection does the patient show?" and 2) "What is the grade of interstitial fibrosis and tubular atrophy (IFTA)?". First, we followed the conventionally recommended procedure and further pre-trained Clinical BERT with masked language modeling on a corpus of 3.4K renal transplant reports (1.5M words) to obtain Kidney BERT. Second, we hypothesized that this conventional pre-training procedure fails to capture the intricate vocabulary of narrow technical domains, so we created extended Kidney BERT (exKidneyBERT) by adding six domain-specific words to the Clinical BERT tokenizer and pre-training on the same corpus. Third, all three models were fine-tuned with question-answering (QA) heads for the two questions. Results: For the first question, on rejection, the word-level overlap ratio of exKidneyBERT (83.3% for antibody-mediated rejection (ABMR) and 79.2% for T-cell mediated rejection (TCMR)) exceeded that of both Clinical BERT and Kidney BERT (46.1% for ABMR and 65.2% for TCMR). For the second question, on IFTA, the exact-match rate of exKidneyBERT (95.8%) exceeded that of Kidney BERT (95.0%) and Clinical BERT (94.7%). Conclusion: When working in domains with highly specialized vocabulary, extending the vocabulary of the BERT tokenizer is essential for improving model performance. In this case, pre-training BERT language models for kidney pathology reports improved performance even though the training data were relatively small.
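
    The key step the abstract describes, extending a BERT tokenizer's vocabulary before continued pre-training, looks roughly like the following sketch with the Hugging Face transformers library. The six added terms here are invented placeholders, not the actual words used for exKidneyBERT, and the checkpoint name is simply the public Clinical BERT release.

```python
# Sketch of vocabulary extension before continued pre-training (not the
# authors' exact code). The added terms below are illustrative placeholders.
from transformers import AutoTokenizer, AutoModelForMaskedLM

checkpoint = "emilyalsentzer/Bio_ClinicalBERT"  # public Clinical BERT weights
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Hypothetical domain terms that would otherwise be split into sub-word pieces.
new_terms = ["ABMR", "TCMR", "IFTA", "glomerulitis", "tubulitis", "arteritis"]
num_added = tokenizer.add_tokens(new_terms)

# Resize the embedding matrix so the new token ids get (randomly initialized)
# embeddings; these are then learned during masked-language-model pre-training.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```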

    Conference Abstracts (学会抄録)

    Observation of pulmonary artery sections (200X, HE). The pulmonary artery wall of the diseased (D) sample is noticeably thickened. In the D sample, 1) the tunica adventitia was more compact and exhibited increased connective tissue; 2) the smooth muscle fibers were thicker; 3) there was excessive fiber production; and 4) the intima was more compact. The arrows indicate the pathological changes.

    Provably Convergent Federated Trilevel Learning

    Trilevel learning, also called trilevel optimization (TLO), has been recognized as a powerful modelling tool for hierarchical decision processes and is widely applied in machine learning, for example in robust neural architecture search, hyperparameter optimization, and domain adaptation. Tackling TLO problems is challenging because of their nested decision-making structure. In addition, existing work on TLO faces two key limitations: 1) it focuses exclusively on the non-distributed setting, which may lead to privacy breaches; and 2) it offers no non-asymptotic convergence analysis characterizing how fast an algorithm converges. To address these challenges, this paper proposes an asynchronous federated trilevel optimization method for TLO problems. The proposed method uses μ-cuts to construct a hyper-polyhedral approximation of the TLO problem and solves it asynchronously. We show that the proposed μ-cuts apply not only to convex functions but also to a wide range of non-convex functions satisfying the μ-weakly convex assumption. Furthermore, we analyze the non-asymptotic convergence rate of the method by showing that its iteration complexity for reaching an ε-stationary point is upper bounded by O(1/ε²). Extensive experiments on real-world datasets demonstrate the superiority of the proposed method, e.g., it converges faster, with a maximum acceleration of approximately 80%. (Comment: Accepted at AAAI 202)
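
    As rough intuition for the cutting-plane idea behind the hyper-polyhedral approximation, the sketch below builds an ordinary polyhedral (max-of-affine) under-approximation of a one-dimensional convex function and refines it iteratively. This is the textbook construction that μ-cuts generalize, not the paper's scheme for weakly convex objectives or its federated setting, and all names in it are illustrative.

```python
# Toy illustration of polyhedral (cutting-plane) approximation, the basic idea
# generalized by the paper's μ-cuts; this is NOT the paper's algorithm.
import numpy as np

def f(x):            # convex toy objective
    return x**2 + np.exp(-x)

def grad_f(x):       # its derivative
    return 2 * x - np.exp(-x)

cut_points = [-2.0]                 # where cuts (affine minorants) are generated
grid = np.linspace(-3, 3, 601)      # evaluation grid for the lower model

for it in range(6):
    # Each cut is the tangent line f(x_i) + f'(x_i)(x - x_i); for convex f,
    # the pointwise maximum of the cuts is a polyhedral under-approximation.
    cuts = [f(xi) + grad_f(xi) * (grid - xi) for xi in cut_points]
    model = np.max(cuts, axis=0)

    # Minimize the polyhedral model on the grid and add a new cut there.
    x_next = grid[np.argmin(model)]
    gap = f(x_next) - model.min()   # how loose the approximation still is
    print(f"iter {it}: minimizer ~ {x_next:.3f}, gap ~ {gap:.4f}")
    cut_points.append(float(x_next))
```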