Hortonworks Sandbox to start learning Hadoop
Hortonworks, the enterprise Hadoop software vendor, offers world-class software for enterprises with a stable suite of Hadoop products. Sandbox is the download offered by Hortonworks. As a first step to learn Hadoop we need an environment that is easy to play with. As per Hortonworks, Sandbox is a single-node cluster that comes as a learning environment for getting started with Hadoop and its ecosystem of products. It can be installed on VirtualBox or VMware on a desktop. As per Hortonworks it can be installed on Microsoft Azure as well, but based on our personal experience the Azure installation demands an A4-sized virtual machine, which is not free. Start learning Hadoop on your desktop now.
Hortonworks Dataflow 1.1.1 available for use:
Hortonworks Dataflow, the GUI-driven platform that helps in collecting, conducting, and curating distributed data from structured and unstructured data sources and pushing the enriched data onto Hadoop, has its latest version, Hortonworks Dataflow 1.1.1, released and available for download. The software was released on January 3rd, 2016 and can be downloaded from the following link.
Hortonworks Dataflow is powered by Apache NiFi and has the same GUI as Apache NiFi. It is an integrated platform that helps with transporting data, from something as small as Twitter feeds onto HDFS to as big as a data source with continuous streaming of information. Data is also transported in a secure fashion.
The basic component that comes out of the box as part of Apache NiFi is the processor, the core component of Dataflow. It is possible to choose the appropriate processor from among the 90+ processors with a simple search by name or tag in the search box, then drag and drop it from the GUI, which is accessed on port 8080. Incoming information or data is referred to as a flowfile. Typically a relationship is established between the processor and the datastore, which happens to be HDFS in a Hadoop environment.
Locating the correct processor in Hortonworks Dataflow:
Apache NiFi is the data ingestion tool that has been customized as Hortonworks Dataflow. The processor is the basic component that helps with collecting and aggregating the correct information to be processed and pushed onto HDFS. There are more than 90 processors as of date that come as an integral part of Apache NiFi, so an easy method to locate the correct processor helps. There are a few options; a scripted alternative is sketched after the list below.
1) Drag and drop the processor icon from the Apache NiFi web user interface
2) Click on tags to locate processors based on usage; for instance, the ingest tag brings up the Get* family of processors
3) Type the processor name in the search box and add it
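Beyond the GUI, the processor catalogue can also be queried over the NiFi REST API that backs the Dataflow UI. The sketch below is an assumption-laden illustration, not an official Hortonworks script: it assumes NiFi running on localhost:8080, and the /flow/processor-types endpoint follows the current Apache NiFi REST API, which may differ on older HDF releases.

```python
# Minimal sketch: search NiFi processor types by name or tag via the
# REST API. Assumes NiFi on localhost:8080; endpoint path follows the
# current Apache NiFi REST API and may differ on older HDF versions.
import requests

NIFI_URL = "http://localhost:8080/nifi-api"  # adjust host/port as needed

def find_processors(keyword):
    """Return processor type names whose name or tags contain the keyword."""
    resp = requests.get(f"{NIFI_URL}/flow/processor-types")
    resp.raise_for_status()
    kw = keyword.lower()
    matches = []
    for ptype in resp.json().get("processorTypes", []):
        name = ptype.get("type", "")  # e.g. org.apache.nifi.processors.standard.GetFile
        tags = ptype.get("tags", [])
        if kw in name.lower() or any(kw in t.lower() for t in tags):
            matches.append(name)
    return matches

if __name__ == "__main__":
    # the "ingest" tag surfaces the Get* family (GetFile, GetHTTP, ...)
    for name in find_processors("ingest"):
        print(name)
```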
What are the stages in the data science methodology?
Data science, the rapidly evolving stream that employs different strategies to come up with an answer for an existing business problem, is a methodical approach that involves the following phases, starting with understanding the business issue and ending with an answer for the business problem.
Here are the different stages of the data science methodology (a minimal code sketch of the later stages follows the list):
1) Understand the business issue
2) Determine the analytical approach
3) Determine the data requirements for building analytical models
4) Collect data from many different data sources
5) Understand the type of data; it can come from relational databases, website logs, and structured as well as unstructured sources
6) Prepare the data
7) Model the data and continuously evaluate the models
8) Deploy the models designed to predict outcomes
9) Get feedback from customers and implement changes to the models as needed
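To make the prepare/model/evaluate stages concrete, here is a minimal, hedged sketch using scikit-learn. The file name customers.csv and the columns age, visits, and signed_up are invented for illustration.

```python
# Illustrative sketch of the prepare, model, and evaluate stages.
# File and column names ("customers.csv", "age", "visits", "signed_up")
# are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("customers.csv")          # collected data
df = df.dropna(subset=["age", "visits"])   # prepare: drop incomplete rows

X, y = df[["age", "visits"]], df["signed_up"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)                  # modelling
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))  # evaluation

# once deployed, the fitted model scores new records as they arrive
```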
What does a data analyst, someone who works closely with customers to understand business requirements and utilize data science to solve their business issues, typically do?
A data analyst helps clients understand and improve the user experience with their online properties. Collecting data requires a data source; this can typically be the firm's website, as well as web-based or desktop applications that store data, like an EMR system in a typical hospital environment.
Clients look for automated analytical solutions that help them look at metrics in the form of dashboard views, reports, etc. A data analyst will work with a developer to automate this process using reporting solutions like SSRS or Tableau, depending on the solution being used for the product.
A data analyst is sometimes expected to do SAS programming that automates insight retrieval based on statistical modelling techniques like significance testing, t-tests, and regression analysis. These techniques help with pre-treatment, post-treatment, and control analysis to identify the best performing location in website projects; this can be the header, a top banner parallel to the header, or a side banner, to name a few. Location prominence helps clients place important information based on user behavior for maximum conversion, which is the primary business issue here. This is typically referred to as user experience.
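The workflow above is described in SAS; purely as an illustration, the same pre/post placement comparison can be sketched in Python with SciPy. The conversion figures for the two placements below are made up.

```python
# Illustrative only: a two-sample t-test comparing conversion rates for
# two page placements (header vs. side banner). The data are invented;
# the original workflow uses SAS, this sketch uses SciPy instead.
from scipy import stats

# daily conversion rates (%) observed for each placement (hypothetical)
header_banner = [2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.2]
side_banner   = [1.8, 1.9, 2.0, 1.7, 1.9, 1.8, 2.0]

t_stat, p_value = stats.ttest_ind(header_banner, side_banner)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level")
```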
The propensity matching technique is used to predict a set of customers likely to have similar characteristics to another customer group. It is typically used in projects demanding market segmentation, as opposed to projects that involve yes-or-no type questions.
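As a rough sketch of the idea, not a production implementation, propensity matching can be approximated by fitting a logistic model of group membership and pairing each target-group customer with the nearest-scoring customer outside the group. The file and column names (customers.csv, age, visits, spend, in_target_group) are hypothetical.

```python
# Rough propensity-score matching sketch: model group membership, then
# nearest-neighbour match on the predicted score. Data are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("customers.csv")        # hypothetical file
features = ["age", "visits", "spend"]    # hypothetical columns
X, treated = df[features], df["in_target_group"]

# propensity score = predicted probability of being in the target group
df["score"] = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

target = df[df["in_target_group"] == 1]
pool = df[df["in_target_group"] == 0]

# pair each target customer with the closest-scoring non-target customer
matches = {
    idx: (pool["score"] - row["score"]).abs().idxmin()
    for idx, row in target.iterrows()
}
print(list(matches.items())[:5])
```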
A model to predict the customers who are more likely to sign up for and activate a product that will be launched in the future is another common deliverable. This is a future business prediction that helps clients with decision making on production as well as inventory.
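A hedged sketch of such a sign-up propensity model: train on data from a comparable past launch and rank prospective customers by predicted activation likelihood. The files past_launch.csv and prospects.csv and the columns used are invented for illustration.

```python
# Illustrative sketch: rank prospects by predicted sign-up/activation
# likelihood using a model trained on a past launch. Data are invented.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

history = pd.read_csv("past_launch.csv")   # hypothetical training data
features = ["age", "visits", "spend"]
model = RandomForestClassifier(random_state=0).fit(
    history[features], history["activated"]
)

prospects = pd.read_csv("prospects.csv")   # hypothetical future customers
prospects["signup_likelihood"] = model.predict_proba(prospects[features])[:, 1]

# highest-propensity customers first, to inform production and inventory
print(prospects.sort_values("signup_likelihood", ascending=False).head(10))
```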