Learn More
The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a(More)
We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact(More)
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the(More)
– Mobile social networks extend social networks in the cyberspace into the real world by allowing mobile users to discover and interact with existing and potential friends who happen to be in their physical vicinity. Despite their promise to enable many exciting applications, serious security and privacy concerns have hindered wide adoption of these(More)
As data have been growing rapidly in data centers, deduplication storage systems continuously face challenges in providing the corresponding throughputs and capacities necessary to move backup data within backup and recovery window times. One approach is to build a cluster deduplication storage system with multiple dedu-plication storage system nodes. The(More)
K-Nearest Neighbor Graph (K-NNG) construction is an important operation with many web related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Existing methods for K-NNG construction either do not scale, or are specific to certain similarity measures. We present NN-Descent, a simple yet(More)
With the completion of genome sequences of major model organisms , increasingly sophisticated genetic tools are necessary for investigating the complex and coordinated functions of genes. Here we describe a genetic manipulation system termed ''genomic engineering'' in Drosophila. Genomic engineering is a 2-step process that combines the ends-out(More)
We describe a genetic variation map for the chicken genome containing 2.8 million single-nucleotide polymorphisms (SNPs). This map is based on a comparison of the sequences of three domestic chicken breeds (a broiler, a layer and a Chinese silkie) with that of their wild ancestor, red jungle fowl. Subsequent experiments indicate that at least 90% of the(More)
Although Locality-Sensitive Hashing (LSH) is a promising approach to similarity search in high-dimensional spaces, it has not been considered practical partly because its search quality is sensitive to several parameters that are quite data dependent. Previous research on LSH, though obtained interesting asymptotic results, provides little guidance on how(More)
Thermoanaerobacter tengcongensis is a rod-shaped, gram-negative, anaerobic eubacterium that was isolated from a freshwater hot spring in Tengchong, China. Using a whole-genome-shotgun method, we sequenced its 2,689,445-bp genome from an isolate, MB4(T) (Genbank accession no. AE008691). The genome encodes 2588 predicted coding sequences (CDS). Among them,(More)