The need to process and conceptualize large sparse matrices effectively and efficiently (typically via low-rank approximations) is essential for many data mining applications, including document and image analysis, recommendation systems, and gene expression analysis. Nonnegative matrix factorization (NMF) has many advantages over alternative techniques …
It is well known that good initializations can improve the speed and accuracy of the solutions of many nonnegative matrix factorization (NMF) algorithms [56]. Many NMF algorithms are sensitive to the initialization of W or H or both. This is especially true of algorithms of the alternating least squares (ALS) type [55], including the two new …
The ranking of sports teams is of significant importance to those who are involved with or interested in the various professional and amateur leagues that exist around the world. We present a ranking algorithm that is simple to implement in SAS code and which gives results that are consistent with some of the best and most well-known computer methods for …
Learning from your customers and your competitors has become a real possibility because of the massive amount of web and social media data available. However, this abundance of data requires significantly more time and computer memory to perform analytical tasks. This paper introduces high-performance text mining technology for SAS® High-Performance …
Text mining models routinely represent each document with a vector of weighted term frequencies. This bag-of-words approach has many strengths, one of which is representing the document in a compact form that can be used by standard data mining tools. However, this approach loses most of the contextual information that is conveyed in the relationship of …
Many companies search the Web to learn about their competition and understand their potential customers. But how accurate are these search results? For instance, have you ever submitted the query "SAS", only to get results back about "Scandinavian Airline Systems"? This paper presents a SAS-based solution to accessing and clustering Yahoo! search engine …
Sparse data sets are common in applications of text and data mining, social network analysis, and recommendation systems. In SAS® software, sparse data sets are usually stored in the coordinate list (COO) transactional format. Two major drawbacks are associated with this sparse data representation: First, most SAS procedures are designed to handle dense …
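
For readers unfamiliar with the coordinate list (COO) format mentioned in the last abstract, the following is a minimal sketch in plain Python, not SAS code and not the paper's method. The matrix values and the helper names `dense_to_coo` and `coo_to_dense` are hypothetical, chosen only to illustrate the idea of storing one (row, column, value) triplet per nonzero entry.

```python
# Hypothetical 3x4 matrix that is mostly zeros (illustrative data only).
dense = [
    [0, 0, 5, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 7],
]


def dense_to_coo(matrix):
    """Return (row, col, value) triplets for the nonzero entries, in row-major order."""
    return [
        (i, j, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    ]


def coo_to_dense(triplets, n_rows, n_cols):
    """Rebuild a dense list-of-lists matrix from COO triplets."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for i, j, value in triplets:
        out[i][j] = value
    return out


coo = dense_to_coo(dense)
# Only 3 triplets are stored instead of 12 cells.
restored = coo_to_dense(coo, n_rows=3, n_cols=4)
```

The triplet form is compact when most cells are zero, but any routine that expects a dense row-per-observation layout needs a conversion step like `coo_to_dense` first, which is the kind of mismatch the abstract alludes to.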