Projects | analytics

RECENT PROJECTS

The Emergence of Machine Learning, Recommender Systems, Sentiment Analysis and Classification, Social Media, and the Wisdom of Crowds

Recommender systems have become extremely common in recent years and are used in a wide variety of applications. They predict a user’s rating of, or preference for, items in domains such as movies, products, restaurants, life insurance, financial services, online dating, and Twitter.

Social media services such as Facebook, MySpace, and Twitter have exploded in popularity as places where people create and share content and build networks at a rapid rate. Because of their ease of use, speed, and reach, social media are setting trends in topics that range from politics to technology and entertainment. Social media are also a form of collective wisdom (crowdsourcing) and, combined with Machine Learning techniques, a powerful tool for predicting real-world outcomes. They can be used to make quantitative predictions that outperform those of artificial markets.

Have you ever wondered whether you can predict future activity using tweets? That is, given a set of tweets and a future timeframe, can we extract a set of activities that will be popular during that timeframe? While this is a very general question, our main focus is on predicting the popularity of movies, because there is considerable interest among social media users, opinions are diverse, and real-world outcomes can be easily observed from box-office revenue. Our Artificial Intelligence system uses Twitter to forecast a movie’s popularity, success, and rank well before its release. Based on Machine Learning, recommender systems, the wisdom of crowds, and sentiment analysis of tweets, the system can outperform market-based predictors and demonstrates the forecasting power of social media.

Sentiment Analysis: In addition, our Artificial Intelligence system is used to predict the ratings of product reviews. The Multi-Domain Sentiment Dataset contains product reviews taken from Amazon.com across many product types (domains). Some domains (books and DVDs) have hundreds of thousands of reviews; others (musical instruments) have only a few hundred. Reviews contain star ratings (1 to 5 stars) that can be converted into binary labels if needed.
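As an illustration of the star-to-binary conversion mentioned above, here is a minimal Python sketch. The thresholds are an assumption (more than 3 stars is positive, fewer than 3 is negative, and 3-star reviews are dropped as ambiguous), and the function names and sample reviews are hypothetical.

```python
from typing import List, Optional, Tuple


def star_to_binary(stars: int) -> Optional[int]:
    """Map a 1-5 star rating to 1 (positive), 0 (negative), or None (ambiguous)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating must be between 1 and 5, got {stars}")
    if stars > 3:
        return 1
    if stars < 3:
        return 0
    return None  # assumption: 3-star reviews are treated as ambiguous


def binarize(reviews: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Keep only reviews with a clear polarity, as (text, label) pairs."""
    labeled = []
    for text, stars in reviews:
        label = star_to_binary(stars)
        if label is not None:
            labeled.append((text, label))
    return labeled


# Made-up sample reviews, for illustration only.
reviews = [
    ("Great read", 5),
    ("Fell apart in a week", 1),
    ("It was fine", 3),
    ("Solid value", 4),
]
print(binarize(reviews))
```

Dropping the middle rating keeps the two classes well separated. A different cut-off (for example, counting 3 stars as negative) is equally easy to express by changing the comparisons.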

Machine Learning and Direct Marketing - Email Analytics and Data Mining

Untargeted direct marketing often creates high costs, draws a weak response, and annoys customers, while precise, personalized, targeted direct marketing results in a greater return on investment, higher profitability, and a higher probability of a positive customer response. Using historical customer purchase data and the direct email marketing campaigns of retail stores featuring selected merchandise, and tracking the results after each campaign, our Artificial Intelligence technique has been able to precisely predict the performance of a direct marketing campaign. Our system can also predict whether a campaign scenario will be successful and which scenario will perform best. In addition, it can predict how much incremental sales per customer each campaign can drive, which customers are the best candidates to receive direct marketing emails, and which customers should be excluded from them.
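To make "incremental sales per customer" concrete, here is a minimal sketch of one common way to measure it after a campaign: compare average sales for customers who received the email with average sales for a comparable holdout group who did not. The holdout design, the function name, and the figures are assumptions for illustration, not a description of our production system.

```python
from statistics import mean
from typing import Sequence


def incremental_sales_per_customer(
    mailed_sales: Sequence[float], holdout_sales: Sequence[float]
) -> float:
    """Average sales lift per customer: mean(mailed) minus mean(holdout)."""
    if not mailed_sales or not holdout_sales:
        raise ValueError("both groups need at least one customer")
    return mean(mailed_sales) - mean(holdout_sales)


# Made-up per-customer sales during the campaign window.
mailed = [40.0, 0.0, 65.0, 15.0]   # customers who received the email
holdout = [20.0, 0.0, 30.0, 10.0]  # comparable customers who did not
print(incremental_sales_per_customer(mailed, holdout))
```

A positive value suggests the campaign drove additional sales. A value near zero or below suggests those customers would have bought anyway and could be excluded from future mailings.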

Data Analytics for Banking

Often, more than one phone call to the same client is required before a product is subscribed to or purchased. Our Artificial Intelligence techniques can help predict the outcome of direct marketing campaigns and whether they work. Using data from the direct phone marketing campaigns of banks, our Artificial Intelligence technique has been able to predict the outcome with 97-99 percent accuracy. Our system can also predict whether an approved loan is a good or bad credit risk with 96-98 percent accuracy.

Airline On-Time Performance and Delay Prediction

Have you ever been stuck in an airport because of a flight delay or cancellation? What are the reasons for such delays and cancellations? Do older planes cause more delays, or is it the number of people flying? Is it the weather, or the particular route being flown? Is there a critical link in the system, or are delays due to cascading failures? If you have wondered about these causes and effects, and whether you could have predicted such delays and avoided them, here is your chance to find out. Using hundreds of thousands to tens of millions of flight records, our Artificial Intelligence technique for airline on-time performance and delay prediction has been able to predict delayed and on-time flights with 94-97 percent accuracy. Our decision trees will also walk you through the causes and effects.

Social Media-Customer Satisfaction and Retention

With the number of competing services available, businesses need to retain their customers and reward their loyalty. Proactively identifying when and why customers will or may leave helps a business design an efficient customer retention strategy, start new marketing campaigns, decide on the timing of campaigns, and understand customers’ behavior or reactions to a given campaign before launching a new promotion, lucrative offer, or marketing strategy.

Many corporations periodically conduct surveys to ascertain customer perceptions of the firm and its products and services, and to help identify areas that require improvement or attention. After the results are compiled, respondents can be portrayed as belonging somewhere in the range from “totally dissatisfied” to “totally satisfied”. Near the middle of that range, there is potential for a company to generate more satisfied customers by improving performance on the factors that matter to that group. Likewise, a company can lose marginally satisfied customers by failing to maintain or strengthen the factors that are important to them. The main goal for any corporation is to analyze the near-middle regions, identify the factors that would cause cross-overs, and recommend actions to preserve and increase the number of satisfied customers. Successfully converting marginally dissatisfied customers into satisfied ones, even by a few percentage points, will benefit most companies. Preventing “decay” in the other direction is equally important and beneficial.

Using thousands of customer records and surveys, our Artificial Intelligence technique for customer satisfaction and retention has been able to predict satisfied and dissatisfied customers with 90-95 percent accuracy. Our decision trees will also walk you through the causes and effects, provide insight, reveal interesting facts, and open windows into a better, proactive understanding of customer behavior.

Customer Relationship Management (CRM): In addition, our extended system, called the Artificial Intelligence-CRM system, can precisely predict whether a customer will switch providers (churn), buy the main service or a new product (appetency), or buy extras, upgrades, or add-ons (up-selling). Churn is the propensity of customers to switch between service providers, appetency is the propensity of customers to buy a service, and up-selling is success in selling additional goods or services to make a sale more profitable.

Machine Learning for Demand Forecasting of Bike Rental

Bike-sharing programs, which compete with other forms of public transportation in urban environments, are popular around the world. Currently, there are over 500 bike-sharing programs worldwide, including over 50 IT-based bikesharing programs in four major cities: Los Angeles, Chicago, San Francisco, and New York. These systems use a network of kiosks where users rent a bike at one location and return it at another on an as-needed basis. The automated kiosks gather all sorts of bike usage data, including rental duration and departure and arrival locations, among others.

While bikesharing is a sustainable and environmentally friendly mode of transportation, making predictions about future station states is challenging. Bikesharing systems are highly dynamic, and riders’ behavior is difficult to predict. Their operation therefore suffers from demand that fluctuates in space and time, which leads to severe system inefficiencies, degrades the level of service and system performance, and causes disappointment that may result in a loss of users. Knowledge of future demand patterns can help reduce relocation costs and increase system performance.

In May 2014, the machine learning competition website kaggle.com opened the competition “Forecast use of a city bikeshare system”. In this competition, participants are asked to combine historical usage patterns with weather data in order to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.

Using our advanced Artificial Intelligence technique, and given the previous rental history, hourly weather measurements, and an indication of whether the day is a holiday, weekday, or weekend, we have been able to make precise predictions of future station states, such as current demand for bikes, demand at a particular hour, and demand at a particular hour on a particular day of the week. Our system will help solve dynamic bikesharing rebalancing and inventory balancing, rental network scheduling, and optimal routing, all of which keep the system balanced.
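As a point of reference for "demand at a particular hour on a particular day of the week", the sketch below computes a simple historical-average baseline from past rental counts. It is only a baseline that a learned model would need to beat, not our forecasting technique; the function name and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def mean_demand_by_slot(
    records: Iterable[Tuple[datetime, int]]
) -> Dict[Tuple[int, int], float]:
    """Average rental count for each (day of week, hour) slot.

    records: (timestamp, rentals in that hour) pairs.
    Day of week follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    totals = defaultdict(lambda: [0, 0])  # slot -> [sum of counts, observations]
    for timestamp, count in records:
        slot = (timestamp.weekday(), timestamp.hour)
        totals[slot][0] += count
        totals[slot][1] += 1
    return {slot: total / n for slot, (total, n) in totals.items()}


# Made-up hourly rental counts, for illustration only.
history = [
    (datetime(2012, 6, 4, 8), 120),   # Monday 08:00
    (datetime(2012, 6, 11, 8), 140),  # Monday 08:00
    (datetime(2012, 6, 9, 8), 30),    # Saturday 08:00
]
baseline = mean_demand_by_slot(history)
print(baseline[(0, 8)], baseline[(5, 8)])
```

Weather and holiday indicators are deliberately left out here. Adding them as further keys or as model features is where a learned model can improve on this baseline.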

Electricity Price Forecast

In a free market, what is the price of electricity when prices are not fixed and are affected by supply and demand? Prices are set every five minutes and will soon be set every minute. In this study, we look at tens to hundreds of thousands of records and identify changes in the price relative to a moving average over the last 24 hours. Our Artificial Intelligence technique has been able to predict whether the price moves up or down with 96-98 percent accuracy.
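The prediction target described above, price direction relative to a 24-hour moving average, can be sketched as follows. This shows only how the Up/Down labels could be derived, not how they are predicted. The default window of 288 assumes one price every five minutes (24 hours), and the function name is hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence


def label_price_moves(
    prices: Sequence[float], window: int = 288
) -> List[Optional[str]]:
    """Label each price "UP" or "DOWN" against the mean of the preceding `window` prices.

    The first `window` prices have no full history and are labeled None.
    """
    recent = deque(maxlen=window)  # the last `window` prices seen so far
    running_sum = 0.0
    labels: List[Optional[str]] = []
    for price in prices:
        if len(recent) == window:
            average = running_sum / window
            labels.append("UP" if price > average else "DOWN")
            running_sum -= recent[0]  # oldest price is about to drop out
        else:
            labels.append(None)
        recent.append(price)
        running_sum += price
    return labels


# Tiny made-up series with a window of 3, for illustration only.
print(label_price_moves([10, 20, 30, 10, 40], window=3))
```

Keeping a running sum avoids recomputing the average over the whole window at every step, which matters when the series contains hundreds of thousands of records.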

BioMedical/Gene Expression Application

Our applications include the classification of various cancers from gene expression microarray data, some with over 37,000 attributes (gene expressions). These are just simple examples of a class of Big Data Analytics problems with tens of thousands of attributes. We look forward to sharing the results and findings through both public and private presentations over the next few months. We have focused on many key types of cancer, such as Breast Cancer, Colon Tumor, Leukemia, Lung Cancer, Ovarian Cancer, and Prostate Cancer, and on other biomedical applications such as the Central Nervous System, Genomic Sequences, and Diffuse Large B-Cell Lymphoma. Using public datasets, we have shown that our techniques outperform existing state-of-the-art techniques, with accuracy of 95-98 percent (test runs) and 100 percent for optimized models.

Predicting Email Spam

Email spam is unsolicited bulk email (UBE) or unsolicited commercial email (UCE) sent in large quantities. Spam can be used to advertise products or web sites, promote make-money-fast schemes (fraud), circulate chain letters and pornography, and spread computer viruses, trojan horses, or other malicious software. The objective may be identity theft, or worse (e.g., advance fee fraud). The most commonly advertised products were pharmacy (81 percent), replicas (5.4 percent), enhancers (2.3 percent), and phishing (2.3 percent). In 2011, the estimated number of spam messages was around seven trillion. In 2012, global losses due to phishing attacks totaled 1.5 billion dollars. Spam, on the other hand, costs consumers and businesses an estimated 20 billion dollars per year.

Spamming remains economically viable because advertisers have no operating costs beyond the management of their mailing lists, and it is difficult to hold senders accountable for their mass mailings. Today, 18.4 percent of all spam is sent from the United States (3.2 percent in 2011), 14.7 percent comes from South Korea (6 percent in 2011), and 5.7 percent comes from Taiwan. Russia and India are responsible for 3.9 percent and 3.7 percent respectively (9 percent and 13.9 percent in 2011). The three largest targets of malicious emails are the United States at 11.8 percent, Germany at 11.4 percent, and Great Britain at 8.1 percent. India is fourth at 6.4 percent.

Using data from a postmaster and from individual/personal spam and non-spam emails, our Artificial Intelligence technique has been able to predict the outcome with 97-99 percent accuracy and has outperformed similar systems and state-of-the-art techniques (84-95 percent). Our decision trees will also provide insight, reveal interesting facts, and open windows into a better understanding of these criminal acts.

Machine Learning-Deep Learning

Big Data Analytics & Intelligent Insights

We use state-of-the-art Neuro Computing and Deep Learning architectures, such as deep neural networks, deep Boltzmann machines and restricted Boltzmann machines, convolutional deep neural networks, deep belief networks, and recurrent neural networks, for advanced analytics, fraud detection, anomaly and novelty detection, medical diagnosis and prognosis, automated sensory information systems, classification, and clustering. Our technology goes beyond traditional methods and relies on Artificial Intelligence/Machine Learning-deep learning technologies and advanced analytics, fully embedded in and complemented by an advanced platform that can extract actionable insights at speed and scale. We model user behavior at scale so that we can monitor, identify, classify, and predict any abnormal or suspicious user activity and behavior, including possible fraud. The system can also predict anomalies and novelties as they happen and develop in real time, and provide real-time alerts, notifications, and insights.

We are enhancing the technology developed at a DoE National Laboratory known as Knowledge Mobilization and Intelligent Augmentation (KnowMInA) - Application to Homeland Security and Tool for Intelligent Knowledge Management and Discovery (TIKManD). KnowMInA-TIKManD was an intelligent system that could recognize terrorist activities. KnowMInA included behavior/profile modeling, reasoning engines, decision-risk analysis, and visual data mining-analytic software. KnowMInA-TIKManD was used to find suspicious patterns in data using geographical maps and recognition technology. It was designed to detect unusual patterns, raise alarms based on the classification of activities, and offer explanations, based on automatic learning techniques, for why a certain activity is placed in a particular class such as "Safe", "Suspicious", or "Dangerous". The underlying techniques combine expert knowledge and data-driven rules to continually improve classification and adapt to dynamic changes in data and expert knowledge. This also provides the ability to answer “What if?” questions in order to decrease uncertainty and provide better risk analysis. For example, US authorities have announced that 5 out of 19 suspected September 11 hijackers met in Las Vegas before the attack. Some of the terrorists attended the same training school, had communication tracks, and flew during the same time period to the same location on the same flight. KnowMInA-TIKManD was intended to capture and recognize such activities and raise alarms.

Data Analytics for Politics

While we know the formal party affiliation (Republican or Democrat) of every congressperson, and intuitively understand what Left and Right, and Republican and Democrat, mean on the political spectrum, it is not clear, and not everyone would agree on, the precise degree to which a congressperson is Republican/Democrat or Leftist/Right-Wing. Have you ever wanted to predict the party affiliation of a member of the US House of Representatives (Democrat or Republican) using knowledge of each congressperson’s voting record or his/her position on issues? Does he/she represent his/her party affiliation and vote along party lines? To what degree is a Republican/Democrat congressperson actually Republican/Democrat? Is there a way to predict the degree to which a congressperson is Leftist/Right-Wing? How similar is one congressperson to another? What is his/her most likely vote or position on a specific issue? To what degree would one congressperson agree or disagree with another on a specific issue? Answers to such questions will also help voters better understand the viewpoints of their Representatives and decide whether to re-elect them, given the voters’ own views on an issue or their affiliation with a specific party. Given the qualitative, perceptual, and high-dimensional nature of such votes, the traditional concepts of proximity, similarity, distance, or nearest neighbor may not even be meaningful.

Using data from the U.S. House of Representatives and the vote history of each congressperson on selected key issues, our Artificial Intelligence technique, which is based on both quantitative and qualitative/perceptual information, has been able to predict both a congressperson’s affiliation and the degree to which he/she is affiliated with a specific party with 96-98 percent accuracy. Our system can also be used to predict 1) his/her most likely vote or position on a specific issue, and 2) the degree to which one congressperson would agree or disagree with another on a specific issue. Of course, this type of problem has many commercial applications (e.g., predicting the type of customer based on their purchase history).
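For readers who want a concrete starting point, the sketch below computes two simple, purely quantitative baselines from voting records: how often two members vote the same way, and how often a member votes with the majority of a party. These are naive measures of the kind noted above as potentially inadequate for this data, and they are not our technique. The vote encoding ("y", "n", or None for a missing vote), the function names, and the sample records are all hypothetical.

```python
from typing import List, Optional, Sequence

Vote = Optional[str]  # "y", "n", or None when the vote is missing


def agreement(votes_a: Sequence[Vote], votes_b: Sequence[Vote]) -> Optional[float]:
    """Fraction of issues on which two members cast the same vote.

    Issues where either vote is missing are skipped.
    """
    shared = [(a, b) for a, b in zip(votes_a, votes_b) if a is not None and b is not None]
    if not shared:
        return None
    return sum(a == b for a, b in shared) / len(shared)


def party_line_score(
    member: Sequence[Vote], party: List[Sequence[Vote]]
) -> Optional[float]:
    """Fraction of issues on which a member votes with the party majority."""
    matches = counted = 0
    for issue, vote in enumerate(member):
        if vote is None:
            continue
        cast = [record[issue] for record in party if record[issue] is not None]
        yes = cast.count("y")
        no = len(cast) - yes
        if yes == no:
            continue  # no party majority on this issue
        counted += 1
        matches += vote == ("y" if yes > no else "n")
    return matches / counted if counted else None


# Made-up voting records on four issues, for illustration only.
alice = ["y", "n", "y", None]
bob = ["y", "y", "y", "n"]
carol = ["y", "n", "n", "n"]
party = [alice, bob, carol]
print(agreement(alice, bob), party_line_score(bob, party))
```

Both scores fall between 0 and 1, so they can be read as a rough "degree" of similarity or party alignment. They treat every issue as equally important, which is one reason richer models are needed.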

BIG Data Analytics & Machine Learning

Strategic Consulting