BioBERT: a pre-trained biomedical language representation model for biomedical text mining
TLDR
This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora, which largely outperforms BERT and previous state-of-the-art models on a variety of biomedical text mining tasks.
Self-Attention Graph Pooling
TLDR
This paper proposes a graph pooling method based on self-attention using graph convolution, which achieves superior graph classification performance on the benchmark datasets using a reasonable number of parameters.
Evaluating window joins over unbounded streams
TLDR
A unit-time-basis cost model is introduced to analyze the expected performance of algorithms for evaluating sliding window joins over pairs of unbounded streams; the analysis shows that asymmetric combinations of join algorithms can outperform symmetric join algorithm implementations.
Graph Transformer Networks
TLDR
This paper proposes Graph Transformer Networks (GTNs), which can generate new graph structures by identifying useful connections between unconnected nodes in the original graph, while learning effective node representations on the new graphs in an end-to-end fashion.
On schema matching with opaque column names and data values
TLDR
The results suggest that the two-step schema matching technique can be a useful addition to the set of (semi-)automatic schema matching techniques.
Catching the boat with Strudel: experiences with a Web-site management system
TLDR
This work addresses two main questions: when does a declarative specification of site structure provide significant benefits, and what are the main advantages of the semi-structured data model?
Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering
TLDR
It is shown that ranking paragraphs and aggregating answers using Paragraph Ranker improves the performance of an open-domain QA pipeline on four open-domain QA datasets by 7.8% on average.
Comparative study of name disambiguation problem using a scalable blocking-based framework
TLDR
Using a scalable two-step framework, this study identifies combinations that are both scalable and effective for disambiguating author names in citations, and presents extensive experimental results.
STRUDEL: a Web site management system
TLDR
The key idea in the STRUDEL system is the separation of three concerns: the logical view of the information available at a Web site, the structure of that information in linked pages, and the ability to restructure that information via queries.
DSigDB: drug signatures database for gene set analysis
TLDR
The creation of Drug Signatures Database (DSigDB), a new gene set resource that relates drugs/compounds and their target genes, for gene set enrichment analysis (GSEA) is reported.