NCRACS 2017 - National Conference on Recent Advances in Computer Sciences

"NCRACS - Data Analytics 2017 Conference Papers"

Pages   [1] [2] [3] [4] [5] [6]
 




EFFICIENT SPATIAL DATA HEURISTIC PARTITION USING VORONOI DIAGRAM OVER DYNAMIC LOCATION DATA THROUGH DECENTRALIZED SERVER


A static index is rebuilt periodically from scratch instead of being updated incrementally. It has been shown that throwaway indices outperform specialized moving-object indices that maintain location updates incrementally. However, throwaway indices suffer from poor scalability owing to their single-server design, and the only distributed throwaway index (D-MOVIES), an extension of a centralized approach, does not scale out as the number of servers increases, particularly during the query processing phase. We propose a distributed throwaway spatial index structure (D-ToSS) that not only scales out to multiple servers by using an intelligent partitioning technique but also scales up, since it fully exploits the multi-core CPUs available on every server. D-ToSS rapidly constructs a Voronoi diagram, whose flat structure makes it an ideal fit for data processing. For instance, we experimentally show a speedup of 25 in query processing compared to D-MOVIES, and this gap grows as the number of servers increases.
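
To make the partitioning idea concrete, the following is a minimal sketch of Voronoi-style partitioning, not the D-ToSS implementation itself. It assumes a hypothetical list of pivot locations (`pivots`) and assigns every point to its nearest pivot, which is exactly the Voronoi cell the point falls in; each cell could then be handed to a different server. The function names are illustrative only.

```python
from math import hypot


def nearest_pivot(point, pivots):
    """Return the index of the pivot closest to `point`.

    The set of points sharing the same nearest pivot is that pivot's Voronoi cell.
    """
    px, py = point
    return min(
        range(len(pivots)),
        key=lambda i: hypot(px - pivots[i][0], py - pivots[i][1]),
    )


def partition_by_voronoi(points, pivots):
    """Group points by Voronoi cell (one cell per pivot)."""
    cells = {i: [] for i in range(len(pivots))}
    for p in points:
        cells[nearest_pivot(p, pivots)].append(p)
    return cells


# Hypothetical pivots and object locations, chosen only for illustration.
pivots = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
points = [(1.0, 1.0), (9.0, 1.0), (5.0, 8.0), (4.0, 1.0)]
cells = partition_by_voronoi(points, pivots)
```

Because the cells form a flat structure with no hierarchy between them, each cell can be processed independently, which is what makes this kind of partition easy to spread across servers or cores.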

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PERSONALIZED TRAVEL SEQUENCE RECOMMENDATION BASED ON AUTHOR TOPIC MATRIX MODELING ALGORITHM ON MULTISOURCE BIG DATA


Tour planning is a challenging task because of tourists’ varied interest preferences and trip restrictions, such as limited time and fixed source and destination points. Resources collected from the Internet and travel guides normally recommend only familiar Points of Interest (POIs) and do not provide enough information to match a user’s interest preferences. Compared to existing approaches, the proposed approach is personalized and is also able to recommend a travel sequence. A topical package space is constructed that includes representative tags, cost distributions, visiting time and visiting season for each topic. These resources are mined to bridge the vocabulary gap between user travel preferences and travel routes. The approach utilizes two kinds of social media: travelogues and community-contributed photos. The textual descriptions of both users and routes are mapped to the topical package space to obtain a user topical package model and a route topical package model. First, famous routes are ranked according to the similarity between the user package and the route package. Then the top-ranked routes are further optimized using the travel records of socially similar users. An Author Topic Matrix Modeling Algorithm (ATMMA) is suggested for personalized tours, so that the recommended POIs are optimized for the user’s interest preferences and POI popularity.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
WEB-PAGE RECOMMENDATION - THE ONTOLOGY APPROACH


A personalized recommendation system uses ontology-based representations of items and user profiles to provide semantic applications with personalized services. In this paper we present a method supported by three new knowledge representation models and a set of Web-page recommendation strategies. The first model is an ontology-based model that represents the domain knowledge of a website. The second model is a semantic network that represents domain knowledge. The third model is a conceptual prediction model, a navigation network of domain terms based on frequently viewed Web-pages, which represents the integrated Web usage and domain knowledge for supporting Web-page prediction.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PERFORMANCE ANALYSIS OF DIFFERENT TEXT CLASSIFICATION ALGORITHMS


Today social media plays an important role in everyone’s day-to-day life. Social media is a phrase people use a lot these days to describe what they post on sites and apps such as Facebook, Twitter, Instagram, Snapchat and others. Social network analysis is primarily used by companies with a strong consumer focus: retail, financial, communication and marketing organizations. With data mining, a retailer can use point-of-sale records of customer purchases to develop products and promotions. Applications of behaviour analysis include health care, education, fraud detection, customer relationship management (CRM) and others. This paper surveys different text classification algorithms used for data analysis.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ENHANCED SHUFFLE GROUPING FOR STREAM PROCESSING IN BIG DATA ANALYTICS


Stream processing systems perform analysis on continuous data streams. A stream processing application consists of data operators and streams of tuples containing the data to be analysed. A grouping function routes tuples towards operator instances. Shuffle grouping is a technique used by stream processing frameworks to share input load among parallel instances of stateless operators. With shuffle grouping, each tuple of a stream can be assigned to any available operator instance, independently of any previous assignment. A common way to implement shuffle grouping is a round-robin routing policy, in which each operator instance receives an equal number of tuples. This simple solution works well as long as the tuple execution time is constant. However, that assumption rarely holds in practice, where execution time strongly depends on tuple content. As a result, parallel stateless operators within stream processing applications may experience unpredictable imbalance that causes an undesirable increase in tuple completion time. Proactive Online Shuffle Grouping (POSG) is a novel approach to shuffle grouping that aims at reducing the overall tuple completion time. POSG estimates the execution time of each tuple, enabling proactive and online scheduling of input load to the target operator instances. Sketches are used to efficiently store the otherwise large amount of information required to schedule the incoming load.
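
The imbalance described above can be illustrated with a small sketch. It is not the POSG algorithm: it assumes the per-tuple execution times are already known exactly (the hypothetical `costs` list), whereas POSG estimates them with sketches. It compares round-robin routing with a simple greedy policy that sends each tuple to the instance with the least accumulated work.

```python
def round_robin(costs, n_instances):
    """Route tuple i to instance i mod n; return total work per instance."""
    loads = [0.0] * n_instances
    for i, cost in enumerate(costs):
        loads[i % n_instances] += cost
    return loads


def least_loaded(costs, n_instances):
    """Route each tuple to the instance with the least accumulated work."""
    loads = [0.0] * n_instances
    for cost in costs:
        target = loads.index(min(loads))  # first instance with minimal load
        loads[target] += cost
    return loads


# Hypothetical execution times that alternate between expensive and cheap tuples.
costs = [10, 1, 10, 1, 10, 1]
rr_loads = round_robin(costs, 2)   # every expensive tuple lands on instance 0
ll_loads = least_loaded(costs, 2)  # work is spread more evenly
```

With these costs, round robin gives both instances the same number of tuples but very different amounts of work, while the cost-aware policy lowers the load of the busiest instance.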

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CROP ROTATION AND YIELD ANALYSIS USING NAIVE RATIO CLASSIFICATION


Crop rotation and yield analysis is the methodology of predicting and analyzing which crop can be cultivated during a particular month and the yield of that crop prior to harvest. Processing the given datasets in this way helps to predict crop yield and to suggest the best crop to cultivate, improving the quality and profitability of the agricultural sector. The objective of this work is to construct a crop yield predictor for farmers that gives suggestions by analyzing various attributes in a historical dataset surveyed across Tamil Nadu. A decision support system has been designed using the predicted results: it takes basic inputs from farmers and suggests which types of crops can be cultivated to obtain a better yield.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EFFECT OF SENTIMENT EMBEDDINGS IN SENTIMENT ANALYSIS


Sentiment analysis is one of the most influential factors in understanding users’ behavior. Word representation attempts to represent aspects of word meaning, and it is a critical component of many natural language processing systems because the word is usually the basic computational unit of text. Many studies therefore represent each word as a continuous, low-dimensional, real-valued vector, also known as a word embedding. Word embeddings have been leveraged as inputs or extra word features for a variety of natural language processing tasks, including machine translation, syntactic parsing, question answering and discourse parsing. Because of the large volume of sentiment data, it is essential to mine such data to determine the overall sentiment on uploaded datasets about products, politics, sports, education reforms and so on. The existing system used pattern recognition to extract the polarity of sentences or words. Its drawback is that it is not very effective at classifying or evaluating the datasets. Using neural network models, we evaluate the accuracy metric to determine the effectiveness of sentiment embeddings in sentiment analysis.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
JOB-DRIVEN SCHEDULING FOR VIRTUAL MAPREDUCE CLUSTERS


Renting virtual private servers (VPSs) from a VPS provider is a cost-efficient way for a tenant with a limited budget to establish a virtual MapReduce cluster. To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a job-driven scheduling scheme (JoSS) from a tenant’s perspective. JoSS provides not only job-level scheduling, but also map-task-level and reduce-task-level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy for each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve better map-data locality and faster task assignment. Extensive experiments are conducted to evaluate and compare the two variations with current scheduling algorithms supported by Hadoop.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SIMILAR HERB SELECTION USING DATA MINING


Natural products, given their ethnopharmacological relevance, have long been the most important source of ingredients in the discovery of new drugs. Moreover, since the Nagoya Protocol, finding alternative herbs with similar efficacy in traditional medicine has become a very important issue, as it has proved to be less effective. This project therefore proposes a novel targeted selection method that uses data mining approaches on the MEDLINE database to identify and select herbs with a similar degree of efficacy. Phytochemicals are non-nutritive plant chemicals that have protective or disease-preventive properties. It is well known that plants produce these chemicals to protect themselves, but recent research demonstrates that they can also protect humans against diseases. The main objective of this project is to provide a performance evaluation of the phytochemicals present in herbs by collecting articles from the MEDLINE database. These articles are analysed and the required phytochemical structures are obtained. Finally, candidate herbs are identified whose phytochemicals match those present in the target herb.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFREQUENT WEIGHTED ITEM SET MINING USING FREQUENT PATTERN GROWTH


Frequent weighted item sets represent correlations that frequently hold in data in which items may be weighted differently. However, in some contexts, e.g., when the need is to minimize a certain cost function, discovering rare data correlations is more interesting than mining frequent ones. This paper tackles the issue of discovering rare and weighted item sets, i.e., the infrequent weighted item set (IWI) mining problem. Two novel quality measures are proposed to drive the IWI mining process. Furthermore, two algorithms that perform IWI and Minimal IWI mining efficiently, driven by the proposed measures, are presented. Experimental results show the efficiency and effectiveness of the proposed approach.
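
The abstract does not define its two quality measures, so the sketch below uses one plausible measure purely as an assumption: the weight of an item set within a transaction is taken to be the minimum weight of its items, and its support is the sum of those weights over all transactions containing it. Item sets whose support is at most a threshold are reported as infrequent. The enumeration is brute force for readability, not the FP-growth-based mining named in the title, and all function names and data are hypothetical.

```python
from itertools import combinations


def min_weight_support(itemset, transactions):
    """Sum, over transactions containing `itemset`, of its least item weight.

    This measure is an assumption made for illustration.
    """
    total = 0.0
    for t in transactions:
        if all(item in t for item in itemset):
            total += min(t[item] for item in itemset)
    return total


def infrequent_itemsets(transactions, max_support, max_size=2):
    """Brute-force search for item sets with support <= max_support."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            support = min_weight_support(combo, transactions)
            if support <= max_support:
                result[combo] = support
    return result


# Hypothetical weighted transactions: item -> weight (e.g. resource usage).
transactions = [
    {"a": 0.0, "b": 100.0, "c": 57.0},
    {"a": 0.0, "b": 43.0, "c": 29.0},
    {"a": 43.0, "b": 0.0, "c": 43.0},
]
rare = infrequent_itemsets(transactions, max_support=50.0)
```

In this toy data, item `a` and every pair containing it carry little total weight, so they are the ones reported as infrequent.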

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ENHANCED COLLABORATIVE RECOMMENDATION SYSTEM USING USER ITEM SUBGROUPS


Collaborative filtering (CF) is one of the most efficient and successful recommendation approaches. A typical CF-based recommender system associates a user with a group of like-minded users according to their individual preferences over all items, and recommends to the user some unobserved items enjoyed by the group. Existing work uses the multiclass co-clustering (MCoC) model to find meaningful subgroups, and items are recommended based on the correlation between users and items. When collaborative filtering is applied, some groups may have very few elements because of unbalanced clustering, so some users may not have enough correlated items for recommendation. This paper proposes a novel clustering model that combines a probabilistic fuzzy c-means algorithm with conditional random fields to generate more informative subgroups using user relationships and items. The key idea is to find correlated subgroups that improve recommendation. Hence, our approach can be seen as a novel enhancement of the MCoC model.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DISEASE DIAGNOSIS FINDING SYSTEM USING DIVERGENCE ACCELERATED PARTICLE SWARM OPTIMIZATION IN BIG DATA


Feature selection is popularly used to lighten the processing load of a data mining model. However, when mining high-dimensional data, the search space from which an optimal feature subset is derived grows exponentially in size, making the search intractable. The feature selection considered here is designed particularly for mining streaming data on the fly, using the Accelerated Particle Swarm Optimization (APSO) type of swarm search, which achieves enhanced analytical accuracy within reasonable processing time. This paper discusses a modification of the APSO swarm search. We introduce a divergence concept to decrease the processing time and provide high accuracy. The difference between the positions of the global best and the local best should be less than the divergence; if the difference is greater than the divergence, that position has to be adjusted. Finally, we compare the performance of the existing APSO-based feature selection with the proposed modified APSO feature selection.
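
The divergence rule can be sketched as follows. The abstract does not say how the difference is measured or how a position is adjusted, so two assumptions are made here: the difference is the Euclidean distance, and an offending local best is pulled along the straight line towards the global best until it lies exactly at the divergence limit. The function names are hypothetical.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two positions (assumed difference measure)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enforce_divergence(local_best, global_best, divergence):
    """Return a local best that is at most `divergence` away from the global best.

    The adjustment (shrinking the offset towards the global best) is an
    assumption; the source only states that the position must be adjusted.
    """
    d = distance(local_best, global_best)
    if d <= divergence:
        return list(local_best)
    scale = divergence / d
    return [g + (l - g) * scale for l, g in zip(local_best, global_best)]


global_best = [0.0, 0.0]
kept = enforce_divergence([3.0, 4.0], global_best, divergence=10.0)
adjusted = enforce_divergence([6.0, 8.0], global_best, divergence=5.0)
```

Here the first position is within the limit and is left unchanged, while the second is 10 units away and is moved back to a distance of 5.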

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TEXTURE BASED WEED IDENTIFICATION SYSTEM FOR PRECISION FARMING


Weed control within crop fields is one of the main problems in precision farming. For centuries, various handheld weed removal tools have been used to minimize weeds in crop fields. Automating weed detection and removal in the agricultural field is a vital task that greatly improves the cost effectiveness and efficiency of weed removal. Nowadays several image processing techniques are used to identify weeds in crop fields. This paper compares two texture extraction methods tailored to the weed removal process. It also discusses the performance of those texture extraction and feature selection methods, and of a classifier that distinguishes crop from weed and overcomes the challenges faced in present-day research on image-processing-based weed removal.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EXPLAINING THE WHY-NOT QUESTIONS IN TOP-K QUERY EXECUTION


Nowadays data is digitized in almost every field, and this enormous amount of data in turn generates further data. More specialized database systems are required to manage this data and to enable efficient retrieval whenever needed. Because existing database systems are increasingly difficult to use, improving the quality and usability of database systems has gained tremendous momentum over the last few years. In particular, the feature of explaining why some expected tuples are missing from the result of a query has received growing attention. To approach this problem, we use the query-refinement method. That is, given the original top-k SQL query and a set of missing tuples as inputs, our algorithms return to the user a refined query whose result includes both the missing tuples and the original query results, together with a penalty calculated for obtaining the expected (missing) tuples. In addition, non-numerical values are internally converted into numerical values so that tuples can be retrieved based on non-numerical attributes in the database table.
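
A much simplified sketch of query refinement for a why-not question is given below. It refines only the value of k (the smallest k whose result contains every missing tuple) and, as an assumption, defines the penalty as the increase in k; the abstract does not specify the penalty function, and the paper's algorithms may refine other parts of the query as well. All names and data are hypothetical.

```python
def rank_of(rows, score, target):
    """1-based rank of `target` when rows are sorted by descending score."""
    ranked = sorted(rows, key=score, reverse=True)
    return ranked.index(target) + 1


def refine_k(rows, score, k, missing):
    """Return (new_k, penalty) so that the top-new_k result contains `missing`.

    The penalty (new_k - k) is an assumed definition used for illustration.
    """
    new_k = max([k] + [rank_of(rows, score, m) for m in missing])
    return new_k, new_k - k


# Hypothetical table of (name, rating) tuples, queried for the top 2 by rating.
rows = [("a", 5.0), ("b", 4.0), ("c", 3.0), ("d", 2.0)]
by_rating = lambda row: row[1]
new_k, penalty = refine_k(rows, by_rating, k=2, missing=[("c", 3.0)])
refined_result = sorted(rows, key=by_rating, reverse=True)[:new_k]
```

The refined result still contains the original top-2 tuples and now also includes the tuple the user expected to see.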

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
POLARITY CONSISTENCY CHECKING FOR MULTIPOLARITY BASED DOMAIN INDEPENDENT SENTIMENT DICTIONARIES


Polarity classification is very important for sentiment analysis. The polarity consistency checking problem is to find all the words with inconsistent polarity in a sentiment dictionary. We perform experiments on four sentiment dictionaries and WordNet. Several domain-independent sentiment dictionaries have been created manually or semi-automatically. OF, GI and AL are called sentiment word dictionaries (SWDs). Domain-dependent dictionaries are constructed using the positive and negative terms of a particular domain, so domain knowledge plays a vital role in constructing them. We propose a new approach that uses WordNet hypernym relations. Hyponyms are subdivisions of more general words: the semantic relationship between each of the more specific words (e.g., daisy and rose) and the more general term (flower) is called hyponymy or inclusion. The system includes both subjective and objective senses of a word, implemented using Subjectivity Word Sense Disambiguation. Finally, the proposed system implements polarity distribution with multiple polarities, which increases the accuracy of the polarity consistency check. We reduce the polarity consistency problem to the satisfiability problem and utilize two fast SAT solvers to detect inconsistencies in a sentiment dictionary. We perform experiments on five sentiment dictionaries and WordNet to show inter- and intra-dictionary inconsistencies.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SECURED ULTIMATE MULTI-PARTY CONFLICT RESOLUTION IN SOCIAL MEDIA


Facebook is an online social networking service that enables millions of users to share their views and photos. Items shared through social media may affect more than one user’s privacy, for example photos that depict multiple users or comments that mention multiple users. The lack of multi-party privacy support in current mainstream social media leaves users unable to properly control with whom these items are actually shared. To resolve this problem, various computational mechanisms are used that help merge the privacy preferences of multiple users into a single policy. However, merging multiple users’ privacy preferences is not an easy task, so methods to resolve privacy conflicts are needed. This project proposes the first computational mechanism to resolve conflicts for multi-party privacy management in social media that is able to adapt to different situations by modelling the concessions that users make to reach a solution to the conflicts.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HANDWRITING ANALYSIS BASED HUMAN PERSONALITY PREDICTION USING SUGENO FUZZY MODEL


Handwriting analysis is an important research area in graphology. This study proposes to predict a writer’s personality from handwriting features using an adaptive neuro-fuzzy inference system. The proposed work predicts personality traits using a Sugeno-based fuzzy inference model designed for predicting writer behavior. The input parameters are spacing, size, slant, shape, loop, dot, pressure, signature, zones and page margin. The fuzzy system is designed using the MATLAB 7.1 toolbox. The performance of the model has been evaluated using mean square error (MSE) and root mean square error (RMSE). The simulation results show the effectiveness and accuracy of the proposed model.
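
For reference, the two evaluation metrics named above can be computed as in the sketch below. The sample values are hypothetical and are not results from the paper; the functions follow the standard definitions of MSE (mean of squared errors) and RMSE (its square root).

```python
from math import sqrt


def mse(actual, predicted):
    """Mean square error: average of the squared prediction errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the MSE."""
    return sqrt(mse(actual, predicted))


# Hypothetical target and predicted trait scores, for illustration only.
actual = [0.0, 0.0]
predicted = [3.0, 3.0]
error_mse = mse(actual, predicted)
error_rmse = rmse(actual, predicted)
```

RMSE is expressed in the same units as the predicted quantity, which is why it is often reported alongside MSE.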

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ANALYSING THE SOCIAL DATA OPINION USING SENTIMENT SENSITIVE EMBEDDINGS


This project is entitled “Analysing the Social Data Opinion using Sentiment Sensitive Embeddings”. Its main objective is to generate real-time suggestions from users’ comments in various social networks. Numerous comments are posted on social media every day, making it a valuable platform for tracking and analyzing public sentiment. This project is used to arrive at a conclusion from these comments quickly and in a timely manner. A bag of words is generated using an Artificial Neural Network (ANN) and text categorization. The resulting words are used to analyze variations in public sentiment by a micro-analysis method called pattern recognition. Two novel generative models, clustering and classification, which contain an inbuilt data dictionary, are developed to solve the reason mining problem. Ranking is used to estimate the relevance of posted comments and obtain a more specific outcome. All kinds of user sentiment are scrutinized from hitting comments, whereas non-hitting sentences are considered moderate comments. An overall report in pictorial format is generated for each post from the given comments, which can be used for further work.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SURVEY OF BIG DATA ANALYTICS IN HEALTH CARE


The world is awash in information today. The amount of information that we gather and consume is growing rapidly in the digitized world. Increased use of new technical advancements and online networking creates virtually unlimited information that can yield useful insights if appropriately investigated. Datasets that cannot be handled by traditional database systems are referred to as big data. The size of big data differs according to the data management capacity of the organization. In healthcare, big data helps in providing the best possible diagnosis and in making smart decisions. This paper discusses the concepts and characteristics of big data and the role of big data analytics in health care. Challenges of big data in medical applications are also discussed.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------