Channel: Data Platform

Microsoft releases HDInsight for Hadoop 2.4 And Gains 100x Performance Increase


Yesterday, at the Hadoop Summit, Microsoft announced that Azure HDInsight now supports Hadoop 2.4. This is the next release of our 100 percent Apache Hadoop-based distribution for Microsoft Azure. Together with Hortonworks' recent release of HDP 2.1 as a Hadoop-on-Windows offering, and the Analytics Platform System as a combined Hadoop and data warehousing appliance, this release continues our strategy of making Hadoop accessible to everybody by offering it in the cloud.

This release of HDInsight is important because it includes the latest benefits of Apache Hadoop 2.4, which delivers order-of-magnitude (up to 100x) improvements in query response times and continues to leverage the benefits of YARN (often described as the future "data operating system" for Hadoop 2.0). Finally, we are also providing an easy-to-use web interface that gives HDInsight users a friendly experience: you can compose and issue Hive queries through a graphical user interface.
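Queries issued through that web interface are ordinary HiveQL; as a purely illustrative sketch (the table and column names below are hypothetical, not part of the announcement):

```sql
-- Hypothetical example: top pages by hit count from a web-log table in Hive
SELECT page_url, COUNT(*) AS hits
FROM weblogs
WHERE log_date >= '2014-06-01'
GROUP BY page_url
ORDER BY hits DESC
LIMIT 10;
```

With the Hadoop 2.4 improvements, interactive queries like this one are exactly where the up-to-100x response-time gains are expected to show up.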

The 100x improvement in query response times is due to the Stinger initiative, in which Microsoft, in collaboration with Hortonworks and the open source software (OSS) community, has brought some of SQL Server's technological breakthroughs to Hadoop. We are excited to see Microsoft-led contributions bring generational improvements to Hadoop.

Since Azure HDInsight's release in October 2013, we have seen tremendous momentum among customers deploying Hadoop in the cloud. Beth Israel Deaconess Medical Center, a teaching hospital of Harvard Medical School, is using HDInsight to process large amounts of unstructured log data and to meet its stringent data retention requirements (which can run as long as 30 years). Virginia Polytechnic Institute is using the power of HDInsight to analyze massive amounts of DNA sequencing data. More examples can be found in CIO magazine, which recently highlighted several HDInsight customer stories.

With Hortonworks HDP 2.1 for Windows, the Microsoft Analytics Platform System, and Microsoft Azure, Microsoft customers have an unprecedented number of options to deploy Hadoop on-premises, in the cloud, or in hybrid configurations. We invite you to learn more through the following resources:


Announcing the preview of Apache HBase clusters inside Microsoft Azure HDInsight


On June 3, Microsoft announced an update to HDInsight to support Hadoop 2.4 for 100x faster queries.  Today, we are announcing the preview of Apache HBase clusters inside Microsoft Azure HDInsight.

HBase is a NoSQL ("not only SQL") database component of the Apache Hadoop ecosystem. While relational database management systems (RDBMS) typically use rigid tabular schemas, NoSQL databases use more fluid models such as key-value, column-family, graph, or document stores. They are usually designed for elasticity over large datasets and are less rigorous when it comes to schema.

HBase is a columnar NoSQL database built to run on top of the Hadoop Distributed File System (HDFS). As a low-latency database, it supports OLTP-style operations such as updates, inserts, and deletes of data in Hadoop. An HBase database consists of tables whose rows are organized into column families, which you must predefine. However, new columns can be added to a column family at any time, giving HBase the schema flexibility to adapt quickly to changing requirements.
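As a sketch of that flexibility, the HBase shell session below (table, row, and column names are made up for illustration) predefines a single column family and then adds new columns to it on the fly, with no schema change:

```
hbase> create 'sensor_readings', 'metrics'    # table with one predefined column family
hbase> put 'sensor_readings', 'device42', 'metrics:temperature', '21.5'
hbase> put 'sensor_readings', 'device42', 'metrics:humidity', '0.61'   # brand-new column, added at write time
hbase> get 'sensor_readings', 'device42'
```

Only the column family ('metrics') had to exist up front; the individual columns within it are created simply by writing to them.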

This preview announcement will enable customers to run HBase as a managed cluster in the cloud (as an integrated feature of Azure HDInsight). The HBase clusters are configured to store data directly in Azure Blob storage. This will enable use cases like:

  • Building interactive websites that work with large datasets stored in Azure Blobs
  • Building services that store sensor and telemetry data from millions of end points in Azure Blobs, which can then be analyzed using HDInsight (Hadoop)

To learn more about HBase, we invite you to read the following resources:

To learn more about Azure HDInsight and Hadoop, we invite you to the following resources:

SQL Server at Customer Sites – A reliable, well-governed data repository with MDS and DQS


Master data management (MDM) is a discipline that lets organizations work from a consolidated, reliable foundation for their reference data. SQL Server Master Data Services (MDS) is a master data management platform that can be combined with Data Quality Services (DQS) to manage data quality.

In this post, we will show how to build quality into a master data repository through the combined use of MDS and DQS.

...(read more)

SQL Server 2012 Service Pack 2 (SP2) RTM has been released.

Dear customers, the Microsoft SQL Server team is pleased to announce the release of SQL Server 2012 Service Pack 2 (SP2). As part of our continued commitment to software excellence for our customers, this upgrade is free and doesn't...(read more)

SQL Server 2012 SP2 is now available!


Microsoft is pleased to announce the release of SQL Server 2012 Service Pack 2. The Service Pack is available for download on the Microsoft Download Center. As part of our continued commitment to software excellence for our customers, this upgrade is available to all customers with existing SQL Server 2012 deployments.

SQL Server 2012 SP2 contains fixes for issues reported through our customer feedback platforms, as well as the hotfixes provided in SQL Server 2012 SP1 cumulative updates up to and including Cumulative Update 9. Service Pack 2 also includes a handful of design change requests and fixes for issues reported through the Windows Error Reporting system.

A few customer-requested updates in Microsoft SQL Server 2012 SP2 are:

  • Enhanced informational messages about AlwaysOn availability groups
  • Support for COPY_ONLY backups of an AlwaysOn secondary through DPM
  • Support for local cube creation
  • Additional Analysis Services and Reporting Services logging to improve troubleshooting
  • Performance improvements for SQL Server Integration Services around SSISDB deadlocks and cleanup
  • Query planning enhancements and improved troubleshooting diagnostics for hash join/aggregate operator spills to tempdb and for full-text indexes
  • Replication supportability and functionality enhancements
  • Storage engine performance enhancements

For more highlights of SQL Server 2012 SP2, please read here.

For information about upgrading to SQL Server 2012, see setup information here. To obtain SQL Server 2012 SP2 with its improved supportability please visit the links below:

SQL Server 2012 SP2

SQL Server 2012 SP2 Express

SQL Server 2012 SP2 Feature Packs

Pie in the Sky (June 13th, 2014)


Last week was soooooo busy, I didn't have time for links on Friday. This week has been a little better, so I have some links for you today.

Cloud

Client/mobile

JavaScript/Node.js

Ruby

Misc.

Enjoy!

- Larry

Machine Learning at Microsoft


This blog post is authored by Joseph Sirosh, Corporate Vice President of Machine Learning at Microsoft.

Some months ago, my colleague and friend John Platt and I were bouncing around a few ideas to disseminate the deep advances and practical expertise that Microsoft has accumulated over the years in the field of machine learning (ML). We thought we would share our experiences so that our customers and the community may benefit from it as they embark on their own ML journeys. Hence this blog.

Today we are focused on the introduction of Microsoft Azure Machine Learning – you can learn more about it in the announcement blog post here. It is a fully-managed cloud service that enables data scientists and developers to efficiently embed predictive analytics into their applications, helping organizations use massive amounts of data and bringing all the benefits of the cloud to machine learning.

John and I are privileged to be working with some of the foremost thought leaders and experts involved in ML today and we look forward to sharing more about the work we are doing. Stay tuned to this blog for more.

Joseph
Follow me on Twitter 

Real-time predictive analytics in the cloud


Predictive analytics has greatly empowered organizations across industries. One of its key benefits is the ability to shorten decision cycle times, which enables organizations to improve risk management, customer responsiveness, marketing, and supply chain operations. This is no small feat given that the ever-growing stream of data can be daunting, but when data is effectively captured and analyzed, organizations can gain a distinct, game-changing advantage.

How are financial services, life sciences, retail, technology, and manufacturing organizations increasingly perfecting the art and science of insights generation at scale from event streams and real-time data? Find out by attending our upcoming webinar: Real-time predictive analytics in the cloud.

During our live webinar Krishna R, Regional Head, Mu Sigma, and guest speakers Mike Gualtieri, Principal Analyst, Forrester Research, Inc. and Roger Barga, Group Program Manager, Microsoft will shed light on the rapid, seamless and cost-effective deployment of real-time predictive analytics on the cloud. 

Register for Real-time predictive analytics in the cloud – June 25, 2014, 9 AM PST


Why SQL Server runs best on Microsoft Azure


Microsoft Azure Infrastructure Services offers many ways to extend your on-premises data and data platform projects, from development and testing of new SQL Server applications, to migrating existing on-premises SQL Server instances onto the latest images of SQL Server, to cost-effective hybrid scenarios such as database backup and extended business continuity. So why is Microsoft Azure Infrastructure Services the ideal place to implement such scenarios?

The latest images of SQL Server, always available

Let's start with availability: the latest images of SQL Server appear in Azure as soon as they are made generally available, including images tuned for select workloads, such as an optimized data warehousing image. For example, Microsoft Azure offered the SQL Server 2014 CU1 update at the end of April, the day it became generally available; this image has yet to be uploaded by many other service providers. In addition, critical updates for SQL Server are enabled by default, ensuring that your SQL Server running in an Azure VM always receives them upon release.

Additional VM sizes to choose from, with more continually being added

With recently added larger VM sizes, including the A8 (8 vCPUs, 56 GB memory) and A9 (16 vCPUs, 112 GB memory) virtual machines, you now have access to more cores and more memory for your larger SQL Server workloads. You can scale up these large VMs even further with the in-memory OLTP architecture in SQL Server 2014, which removes database contention so you can utilize more vCPUs in parallel, supporting more concurrent users and significantly increasing transactional performance.
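As a minimal sketch of enabling that in-memory OLTP capability on such a VM (database name, path, and table definition are hypothetical, purely for illustration):

```sql
-- Add a memory-optimized filegroup to the database (FILENAME here is a container directory)
ALTER DATABASE SalesDB ADD FILEGROUP SalesDB_mod CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE SalesDB ADD FILE (NAME = 'SalesDB_mod', FILENAME = 'F:\Data\SalesDB_mod')
    TO FILEGROUP SalesDB_mod;

-- A memory-optimized table with a hash index sized for the expected row count
CREATE TABLE dbo.ShoppingCart (
    CartId      INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId      INT NOT NULL,
    CreatedDate DATETIME2 NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```

Because memory-optimized tables use lock- and latch-free structures, the extra vCPUs of the A8/A9 sizes can be used in parallel without the contention a disk-based table would hit.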

Cross-regional connectivity support for extended business continuity and global BI

Now, with the new cross-regional Virtual Network connectivity among Azure datacenters, you can extend the SQL Server business continuity scenario even further by choosing the Azure datacenters in which to place AlwaysOn secondaries. With up to 8 secondaries available in SQL Server 2014, you are not only enhancing business continuity by placing additional asynchronous secondaries, but also improving global BI performance by offloading BI reporting from your primary to the closest Azure secondary. This lets you take advantage of the global scale of Microsoft Azure datacenters, along with the cost savings that come from economies of scale.

Business Continuity with lower RTO and RPO with ExpressRoute

Azure ExpressRoute offers customers a secure, dedicated connection to Microsoft Azure. The connection offers much higher speeds than standard Azure connections, as well as greater isolation when it comes to data security. This means faster SQL Server backups to Azure storage using the SQL Server cloud backup capability introduced in SQL Server 2012 and enhanced in SQL Server 2014. In addition to improved cloud backup, you can significantly improve hybrid business continuity by combining SQL Server AlwaysOn with ExpressRoute. With a significantly faster and more reliable connection between your on-premises primary and your secondary in Azure, you can improve your recovery point objective (RPO) by reducing the potential for data loss in asynchronous mode, and also improve your recovery time objective (RTO).
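The cloud backup capability mentioned above is SQL Server's backup-to-URL feature; a minimal sketch (the credential, storage account, container, and database names are hypothetical):

```sql
-- One-time setup: a credential holding the Azure storage account name and access key
CREATE CREDENTIAL AzureBackupCred
    WITH IDENTITY = 'mystorageaccount',     -- storage account name
    SECRET = '<storage access key>';

-- Back up the database directly to an Azure Blob storage container
BACKUP DATABASE SalesDB
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/SalesDB.bak'
WITH CREDENTIAL = 'AzureBackupCred', COMPRESSION;
```

Over ExpressRoute, the data path to Blob storage is the dedicated link rather than the public internet, which is where the faster, more predictable backup times come from.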

The above capabilities in Microsoft Azure make it an ideal environment for you to maximize the benefits of SQL Server hybrid scenarios.  Ready to create your own virtual machine? Check out the resources listed below, and look for upcoming technical blogs to follow on the topics discussed here. 

Try Microsoft Azure

Learn more about Virtual Machines

Read how Amway and Lufthansa leveraged Microsoft SQL Server 2014 and Windows Azure

SQL Server at Customer Sites – Self-Service BI and Microsoft Azure Machine Learning


Business users and professionals have a growing need to analyze diverse data, but beyond simple analysis we increasingly need to understand our data, extrapolate from it, derive predictions, analyze the past to understand the future, infer behaviors, and take actions and decisions accordingly.

We will walk through these new usage scenarios and explain the foundations of machine learning through self-service BI connected to the new Microsoft Azure Machine Learning platform.

...(read more)

SQL Server AlwaysOn Availability Groups Supported between Microsoft Azure Regions


Last year we announced the support of SQL Server AlwaysOn Availability Groups on Microsoft Azure Infrastructure Services.

We’re excited to announce that AlwaysOn Availability Groups are now supported between Microsoft Azure Regions. Today we updated our official documentation describing how to configure these.

AlwaysOn Availability Groups on Microsoft Azure Infrastructure Services

Availability Groups, released in SQL Server 2012 and enhanced in SQL Server 2014, detect conditions impacting SQL Server availability (e.g. SQL service being down or losing connectivity).  When detecting these conditions, the Availability Group fails over a group of databases to a secondary replica. In the context of Azure Infrastructure Services, this significantly increases the availability of these databases during Microsoft Azure’s VM Service Healing (e.g. due to physical hardware failures), platform upgrades, or your own patching of the guest OS or SQL Server.

To ensure SQL Server high availability on Azure Infrastructure Services, you configure an Availability Group, generally with 2 replicas (1 primary, 1 secondary) for automatic failover and a Listener. The replicas correspond to SQL Server instances hosted by separate Virtual Machines within the same Azure Virtual Network (VNET). The Listener is a DNS name that client applications, inside or outside the VNET (inside or outside of Microsoft Azure), can use in their connection string to connect to the primary replica of the Availability Group. This is illustrated in the figure below:

AlwaysOn Availability Groups between Microsoft Azure Regions

Availability Groups are now supported between different Azure regions: any of the regions available today (4 in the United States, 2 in Europe, 2 in Asia Pacific, 2 in Japan, and 1 in Brazil).

This builds on top of Microsoft Azure’s new support to connect VNETs in different Azure regions via secure tunnels. After connecting 2 or more VNETs, their VMs can connect to each other, and even join the same Windows domain, as if they were part of the same VNET.

Having Availability Groups spanning two or more Azure regions enables two important SQL Server scenarios on Azure Infrastructure Services: disaster recovery and geo-distributed read scale-out.

 

Scenario 1: SQL Server Disaster Recovery

In this scenario, an Availability Group is expanded with one or more secondary replicas in a different Azure region. This allows SQL Server to be recovered quickly from a situation impacting an entire Azure region (e.g. a gateway hardware failure). It also allows disaster recovery processes to be tested whenever desired.

The scenario is depicted in the figure below. An availability group has been configured with 2 replicas (primary P and secondary S1) for automatic failover and a Listener within the virtual network VNET1 in Region 1 (e.g. West US). This guarantees high availability of SQL Server in case of failures within the region. A secure tunnel has been configured between VNET1 and another virtual network, VNET2, in Region 2 (e.g. Central US). The availability group has been expanded with a third replica (S2) configured for manual failover in this VNET to enable disaster recovery in case of failures impacting Region 1. Finally, the Listener has been configured to route connections to the primary replica, irrespective of which region hosts it. This allows client applications to connect to the primary replica, with the same connection string, after a failover between Azure regions.
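That region-independent connection string is simply one that targets the listener rather than any individual server; a hypothetical example (listener and database names are made up):

```
Server=tcp:MyAgListener,1433;Database=SalesDB;Integrated Security=SSPI;MultiSubnetFailover=True
```

Because the listener always resolves to the current primary, this string keeps working unchanged whether the primary is P in Region 1 or, after a failover, S2 in Region 2; the MultiSubnetFailover keyword speeds up reconnection across subnets.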

  

Scenario 2: SQL Server Geo-Distributed Read Workloads

In this scenario, an Availability Group is expanded with one or more readable secondary replicas in one or more different Azure regions. This allows offloading read workloads from the primary replica to readable secondary replicas in Azure regions that are closer to the source of the read workloads (e.g. reporting or BI apps).

This not only reduces the utilization of resources (CPU, memory, IO) at the primary replica, saving them for write workloads (e.g. OLTP), but also reduces the response time of the read workloads by reducing network latency and leveraging dedicated resources.

The scenario is depicted in the figure below. As before, an Availability Group has been configured with 2 replicas (primary P and secondary S1) for automatic failover and a Listener within the virtual network VNET1 in Region 1 (e.g. Central US). This guarantees high availability of SQL Server in case of failures within the region. 

Two secure tunnels have been configured between VNET1 and two other Virtual Networks: VNET2 in Region 2 (e.g. East US) and VNET3 in Region 3 (e.g. West US). The availability group has been expanded with two readable secondary replicas, one on each Azure region: S2 on Region 2 and S3 on Region 3.

Client applications, inside or outside of Azure, can connect to the closest readable secondary replica to run read workloads. For example, a reporting app connects to the secondary replica S2 within the same Azure Region 2, and a BI app connects to the secondary replica S3 from on-premises via a public endpoint.
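Read workloads typically declare their intent in the connection string, either by connecting to a readable secondary's endpoint directly or via the listener when read-only routing is configured; a hypothetical example (names are made up):

```
Server=tcp:MyAgListener,1433;Database=SalesDB;ApplicationIntent=ReadOnly;MultiSubnetFailover=True
```

ApplicationIntent=ReadOnly tells the availability group that this connection only reads, so it can be served by a secondary such as S2 or S3 instead of consuming resources on the primary.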

  

Remember that the secondary replicas in the remote regions can be failover targets, so they can support disaster recovery in addition to serving read workloads. They can also be used to take backups, which offloads backup work from the primary replica to reduce resource utilization, and keeps backups outside the operational region if needed for compliance reasons.
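Offloading a backup to a secondary amounts to running a copy-only backup against that replica, since AlwaysOn secondaries support copy-only full backups; a sketch with hypothetical names:

```sql
-- Run against the secondary replica (e.g. S2), not the primary
BACKUP DATABASE SalesDB
TO DISK = 'E:\Backups\SalesDB_secondary.bak'
WITH COPY_ONLY, COMPRESSION;
```

The COPY_ONLY option keeps this backup out of the normal differential/log chain, which is what makes it safe to take from a secondary.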

Resources

EF 6.1.1 RTM Available


Today we are pleased to announce the availability of EF6.1.1. This patch release includes a number of high priority bug fixes and some contributions from our community.

 

What’s in EF6.1.1?

You can see a list of the fixes/changes included in EF6.1.1 on our CodePlex site.

In particular, we’d like to call out the following two fixes to issues that a number of people have encountered:

 

Where do I get EF6.1.1?

The runtime is available on NuGet. If you are using Code First then there is no need to install the tooling. Follow the instructions on our Get It page for installing the latest version of Entity Framework runtime.

The tooling for Visual Studio 2012 and Visual Studio 2013 is available on the Microsoft Download Center. You only need to install the tooling if you want to use Model First or Database First.

 

Thank you to our contributors

We’d like to say thank you to folks from the community who contributed features, bug fixes, and other changes to the 6.1 release:

 

What’s next?

In addition to working on the next major version of EF (Entity Framework 7), we're also working on another update to EF6. This update is tentatively slated to be another patch release (EF6.1.2); we are working on a series of bug fixes and accepting pull requests.

If you want to learn more about our plans for EF7, see our recent EF7 - New Platforms, New Data Stores post.

Fix for index corruption issue now available for SQL Server 2012 Service Pack 2

A fix for the issue described in KB http://support.microsoft.com/kb/2969896 is now available for download for SP2 via the hotfix download link in the KB article....(read more)

SQL Server at Customer Sites – Optimizing the analytical model


This article presents best practices for optimizing the implementation of a multidimensional model in SQL Server Analysis Services, then proposes an approach for dynamically managing the partitioning of fact data in cubes, in order to speed up both data availability and reporting.

...(read more)

ODataLib 6.4.0 Release


We are happy to announce that ODataLib (ODL) 6.4.0 has been released and is available on NuGet, along with the source code on CodePlex (please see the Git history for details on the v6.4.0 code and all previous versions). Detailed release notes are listed below.

Bug Fix

  • Fixed a bug where a top-level dynamic property payload did not have the @odata.type annotation

New Features

  • ODataLib supports writing and reading async headers and payloads

  ODataLib now supports a new async API on .NET 4.0. For example, you can now write code like: var customers = await ctx.Customers.ExecuteAsync();

 

  • The OData client supports using enums in query options and operation parameters

Please refer to this blog post: http://blogs.msdn.com/b/odatateam/archive/2014/03/18/use-enumeration-types-in-odata.aspx

 

  • A new flag to enhance writer performance by disabling validation

In ODataMessageWriterSettings and ODataMessageReaderSettings, a flag "BypassValidation" has been added to bypass validation and improve performance:

      public class Microsoft.OData.Core.ODataMessageWriterSettings
      {
          public bool BypassValidation { get; set; }
      }

      public class Microsoft.OData.Core.ODataMessageReaderSettings
      {
          public bool BypassValidation { get; set; }
      }

 

 

  • Support for server-driven paging on the client
On the client side, you can now write code like:

      NorthwindEntities ctx = new NorthwindEntities(new Uri(@"http://services.odata.org/V4/Northwind/Northwind.svc/"));
      var customers = ctx.Customers.GetAllPages(); // automatically get all pages of the Customers entity set
      foreach (var customer in customers)
      {
          Console.WriteLine(customer.CustomerID);
      }

 

Call to Action

You and your team are very welcome to try out this new version if you are interested in the new features and fixes above. For any feature request, issue, or idea, please feel free to reach out to us.


Webcast: Database cloud backup and DR - a strategic imperative


The cloud is creating radical changes in how information technology is architected.  And its next big target just might be business continuity. Explosive data growth and new application types are causing enterprises to consider cloud as a strategic alternative to growing on-premises storage for backup and disaster recovery.

Many enterprises view backup and disaster recovery (DR) as an IT function that seems to provide little value, even though they must do it to support critical applications. Exponential data growth, shrinking backup windows, static budgets, and increasing deployments of business-critical mobile, cloud, and web applications are changing backup and DR requirements. Traditional on-premises approaches to backup and DR cannot keep up with this explosive demand and these new database administration requirements.

Join Forrester Research analyst Noel Yuhanna in this webinar as we talk about cloud and database trends, and why enterprises should make database cloud backup and DR part of their enterprise database strategy.

Register now for this webcast on improving business continuity by going to the cloud! The event takes place Wednesday, July 2, at 9:00 AM Pacific.

SPEAKER:

 Noel Yuhanna
Principal Analyst Serving Enterprise Architecture Professionals
Forrester Research

Noel serves Enterprise Architecture Professionals. He primarily covers database management systems (DBMSes), infrastructure-as-a-service (IaaS), data replication and integration, data security, data management tools, and related online transaction processing issues. His current primary research focus is on customer usage experiences and broad industry trends of DBMS, IaaS, data security, enterprise data grids, outsourcing, information life-cycle management, open source databases, and other emerging database technologies.

Driving Ground Breaking BI with APS


This blog post will detail how APS gives users the ability to:

  • Leverage Power Query, Power Pivot, and Power Map at massive scale
  • Iteratively query APS, adding BI on the fly
  • Combine data seamlessly from PDW, HDI, and Azure using PolyBase

The Microsoft Analytics Platform System (APS) is a powerful scale out data warehouse solution for aggregating data across a variety of platforms. In Architecture of the Microsoft Analytics Platform System and PolyBase in APS - Yet another SQL over Hadoop solution?, the base architecture of the platform was defined. Here we’ll build on this knowledge to see how APS becomes a key element of your BI story at massive scale.

Let's first start with a business case. Penelope is a data analyst at a US-based restaurant chain with hundreds of locations across the world. She is looking to use the power of the Microsoft BI stack to get insight into the business – both in real time and in aggregate form for the last quarter. With the integration of APS with the Microsoft BI stack, she is able to extend her analysis beyond simple querying. Penelope can use the MOLAP data model in SQL Server Analysis Services (SSAS) as a front end to the massive querying capabilities of APS. Using the combined tools, she is able to:

  • Quickly access data in stored aggregations that are compressed and optimized for analysis
  • Easily update these aggregations based on structured and unstructured data sets
  • Transparently access data through Excel’s front-end

Using Excel, Penelope has quick access to all of the aggregations she has stored in SSAS with analysis tools like Power Query, Power Pivot, and Power Map. Using Power Map, Penelope is able to plot the growth of restaurants across America, and sees that lagging sales in two regions, the West Coast and Mid-Atlantic, are affecting the company as a whole.

After Penelope discovers that sales are disproportionately low on the West Coast and in the Mid-Atlantic regions, she can use the speed of APS' Massively Parallel Processing (MPP) architecture to iteratively query the database, create additional MOLAP cubes on the fly, and focus on the issues driving down sales with speed and precision using Microsoft's BI stack. By isolating the regions in question, Penelope sees that sales are predominantly being affected by two states – California and Connecticut. Drilling down further, she uses Power Chart and Power Pivot to break down sales by menu item in the two states, and sees that the items with low sales in those regions are completely different.

While querying relational data stored in APS can get to the root of an issue, leveraging PolyBase makes it simple to also take advantage of the world of unstructured data, bringing additional insight from sources such as sensors or social media sites. In this way Penelope is able to incorporate the text of tweets relating to menu items into her analysis. She can use PolyBase's predicate pushdown ability to filter tweets by geographic region and by mentions of the low-selling items in those regions, honing her analysis. In this way, she is able to discover that there are two separate issues at play. In California she sees customers complaining about the lack of gluten-free options at restaurants, and in Connecticut she sees that many diners find the food to be too spicy.

Iterative Analytics

So how did Penelope use the power of APS to pull structured data such as point of sale (POS) records, inventory and ordering history, website traffic, and social sentiment into a cohesive, actionable model? By using a stack that combines the might of APS with the low time-to-insight of Excel. Let's break down the major components:

  • Microsoft Analytics Platform System (APS)
  • Microsoft HDInsight
  • Microsoft SQL Server Analysis Services (SSAS)
  • Microsoft Excel with Power Query, Power Pivot and Power Map

Loading Data in APS and Hadoop

Any analytics team is able to quickly load data into APS from many relational data sources using SSIS. By synchronizing the data flow between their production inventory and POS systems, APS is able to accurately capture and store trillions of transactional rows from within the company. By leveraging the massive scale of APS (up to 6 PB of storage), Penelope doesn’t have to create the data aggregates up front. Instead she can define them later.

Concurrently, her team uses an HDInsight Hadoop cluster running in Microsoft Azure to aggregate all of the individual tweets and posts about the company alongside its menus, locations, public accounts, customer comments, and sentiment. By storing this data in HDInsight, the company is able to utilize the elastic scale of the Azure cloud, and continually update records with real-time sentiment from many social media sites. With PolyBase, Penelope is able to join transactional data with the external tables containing social sentiment data using standard TSQL constructs.

Creating the External Tables

Using the power of PolyBase, the development team can create external tables in APS connected to the HDInsight instance running in Azure. In two such tables, Tweets and WordCloud, Twitter data is easily collected and aggregated in HDFS. Here, the Tweets table is raw data with an additional sentiment value, and the WordCloud table is an aggregate of all words used in posts about the company.
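An external table definition along these lines might look roughly as follows. This is a sketch, not the team's actual DDL: the column names, location, and the previously defined data source and file format objects are all hypothetical, and exact PolyBase syntax varies by APS version:

```sql
-- Hypothetical external table over tweet data stored in HDFS
CREATE EXTERNAL TABLE dbo.Tweets (
    TweetId   BIGINT,
    TweetText NVARCHAR(4000),
    StateCode CHAR(2),
    Sentiment FLOAT
)
WITH (
    LOCATION    = '/data/tweets/',
    DATA_SOURCE = AzureHDInsight,     -- external data source defined beforehand
    FILE_FORMAT = TextFileFormat      -- delimited-text format defined beforehand
);

-- Join relational sales data with the external tweet data using standard T-SQL
SELECT o.MenuItem, AVG(t.Sentiment) AS AvgSentiment
FROM dbo.Orders AS o
JOIN dbo.Tweets AS t
    ON t.TweetText LIKE '%' + o.MenuItem + '%'
WHERE t.StateCode IN ('CA', 'CT')
GROUP BY o.MenuItem;
```

Once defined, the external table is queried like any other table, and PolyBase's predicate pushdown can evaluate filters such as the StateCode restriction on the Hadoop side before data moves into APS.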

Connecting APS and SSAS to Excel

Within Excel, Penelope has the ability to choose how she would like to access the data. At first she uses the aggregations that are available to her via SSAS – typical sales aggregates like menu item purchases, inventory, etc. – through Power Query.

But how does Penelope access the social sentiment data directly from APS? Simple: using the same data connection tab, Penelope can connect directly to APS and pull in the sentiment data via PolyBase.

Once the process is complete, tables pulled into Excel, as well as their relationships, are shown as data connections.

Once the data connection is created, Penelope is able to create a report using Power Pivot with structured data from the Orders table and the unstructured social sentiment data from HDInsight in Azure.

With both data sets combined in Excel, Penelope is able to then create a Power Map of the sales data layered with the social sentiment. By diving into the details, she can clearly see issues with sentiment from customers in Connecticut and California.

To learn more about APS, please visit http://www.microsoft.com/aps.

Drew DiPalma – Program Manager – Microsoft APS
Drew is a Program Manager working on Microsoft Analytics Platform System.  His work on the team has covered many areas, including MPP architecture, analytics, and telemetry.  Prior to starting with Microsoft, he studied Computer Science and Mathematics at Pomona College in Claremont, CA. 

The Joy (and Hard Work) of Machine Learning


This blog post is authored by Joseph Sirosh.

Few people appreciate the enormous potential of machine learning (ML) in enterprise applications. I was lucky enough to get a taste of its potential benefits just a few months into my first job. It was 1995 and credit card issuers were beginning to adopt neural network models to detect credit card fraud in real-time. When a credit card is used, transaction data from the point of sale system is sent to the card issuer's credit authorization system where a neural network scores for the probability of fraud. If the probability is high, the transaction is declined in real-time. I was a scientist building such models and one of my first model deliveries was for a South American bank. When the model was deployed, the bank identified over a million dollars of previously undetected fraud on the very first day. This was a big eye-opener. In the years since, I have seen ML deliver huge value in diverse applications such as demand forecasting, failure and anomaly detection, ad targeting, online recommendations and virtual assistants like Cortana. By embedding ML into their enterprise systems, organizations can improve customer experience, reduce the risk of systemic failures, grow revenue and realize significant cost savings.

However, building ML systems is slow, time-consuming and error prone. Even though we are able to analyze very large data sets these days and deploy at very high transaction rates, several bottlenecks remain:

  • ML system development requires deep expertise. Even though the core principles of ML are now accessible to a wider audience, talented data scientists are as hard to hire today as they were two decades ago.
  • Practitioners are forced to use a variety of tools to collect, clean, merge and analyze data. These tools have a steep learning curve and are not integrated. Commercial ML software is expensive to deploy and maintain.
  • Building and verifying models requires considerable experimentation. Data scientists often find themselves limited by compute and storage because they need to run a large number of experiments that generate considerable new data.
  • Software tools do not support scalable experimentation or methods for organizing experiment runs. The act of collaborating with a team on experiments, sharing derived variables, scripts, etc. is manual and ad-hoc, without tools support. Evaluating and debugging statistical models remains a challenge.

Data scientists work around these limitations by writing custom programs and by doing undifferentiated heavy lifting as they perform their ML experiments. But it gets harder in the deployment phase. Deploying ML models in a mission-critical business process such as real-time fraud prevention or ad targeting requires sophisticated engineering:

  • Typically, ML models that have been developed offline now have to be re-implemented in a language such as C++, C# or Java.
  • The transaction data pipelines have to be plumbed. Data transformations and variables used in the offline models have to be re-coded and compiled.
  • These re-implementations inevitably introduce bugs, requiring verification that the models work as originally designed.
  • A custom container for the model has to be built, with appropriate monitors, metrics and logging.
  • Advanced deployments require A/B testing frameworks to evaluate alternative models side-by-side. One needs mechanisms to switch models in or out, preferably without recompiling and deploying the entire application.
  • One has to validate that the candidate production model works as originally designed through statistical tests.
  • The automated decisions made by the system and the business outcomes have to be logged for refining the ML models and for monitoring.
  • The service has to be designed for high availability, disaster recovery and geo proximity to end points.
  • When the service has to be scaled to meet higher transaction rates and/or low latency, more work is required to provision new hardware, deploy the service to new machines and scale out.

All of these are time consuming and engineering-intensive steps. It is expensive in terms of both infrastructure and manpower. The end-to-end engineering and maintenance of a production ML application requires a highly skilled team that few organizations can build and sustain.

Microsoft Azure ML was designed to solve these problems:

  • It’s a fully managed cloud service with no software to install, no hardware to manage, no OS versions or development environments to grapple with.
  • Armed with nothing but a browser, data scientists can log on to Azure and start developing ML models from any location, from any device. They can host a practically unlimited number of files on Azure storage.
  • ML Studio, an integrated development environment for ML, lets you set up experiments as simple data flow graphs, with an easy-to-use drag, drop, and connect paradigm. Data scientists can avoid programming for a large number of common tasks, allowing them to focus on experiment design and iteration.
  • Many sample experiments are provided to make it easy to get started.
  • A collection of best-of-breed algorithms developed by Microsoft Research comes built-in, as does support for custom R code – over 350 open source R packages can be used securely within Azure ML.
  • Data flow graphs can have several parallel paths which automatically run in parallel, allowing scientists to execute complex experiments and make side-by-side comparisons without the usual computational constraints.
  • Experiments are readily sharable, so others can pick up on your work and continue where you left off.

Azure ML also makes it simple to create production deployments at scale in the cloud. Pre-trained ML models can be incorporated into a scoring workflow and, with a few clicks, a new cloud-hosted REST API can be created. This REST API has been engineered to respond with low latency. No reimplementation or porting is required – a key benefit over traditional data analytics software.

Data from anywhere on the internet – laptops, websites, mobile devices, wearables and connected machines – can be sent to the newly created API to get back predictions. For example, a data scientist can create a fraud detection API that takes transaction information as input and returns a low/medium/high risk indicator as output. Such an API would then be “live” on the cloud, ready to accept calls from any software that a developer chooses to call it from. The API backend scales elastically, so that when transaction rates spike, the Azure ML service can automatically handle the load.

There are virtually no limits on the number of ML APIs that a data scientist can create and deploy – and all this without any dependency on engineering. For engineering and IT, it becomes simple to integrate a new ML model using those REST APIs, and testing multiple models side-by-side before deployment becomes easy, allowing dramatically better agility at low cost. Azure provides mechanisms to scale and manage APIs in production, including mechanisms to measure availability, latency, and performance. Building robust, highly available, reliable ML systems and managing the production deployment is therefore dramatically faster, cheaper and easier for the enterprise, with huge business benefits.
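To make the request/response flow concrete, here is a minimal Python sketch of calling such a scoring API. The JSON envelope (Inputs / ColumnNames / Values / GlobalParameters) follows the Azure ML Studio web service request format of that era, but the endpoint URL, API key, and column names below are placeholder assumptions, not values from the post.

```python
import json
import urllib.request

def build_request(column_names, rows):
    """Assemble the JSON body for an Azure ML Studio web service call:
    column names plus rows of values (values sent as strings)."""
    return {
        "Inputs": {
            "input1": {
                "ColumnNames": column_names,
                "Values": [[str(v) for v in row] for row in rows],
            }
        },
        "GlobalParameters": {},
    }

# Hypothetical endpoint and key -- substitute your own service's values.
ENDPOINT = ("https://ussouthcentral.services.azureml.net/"
            "workspaces/<workspace-id>/services/<service-id>/"
            "execute?api-version=2.0")
API_KEY = "<your-api-key>"

def score(transaction):
    """Send one transaction to the scoring API and return its response."""
    body = json.dumps(build_request(
        ["amount", "merchant", "country"],
        [[transaction["amount"], transaction["merchant"],
          transaction["country"]]],
    )).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT, body,
        {"Content-Type": "application/json",
         "Authorization": "Bearer " + API_KEY})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Any client that can issue an HTTPS POST – a website backend, a mobile app, a point-of-sale system – can call the same endpoint, which is what makes the "no reimplementation" claim above practical.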

We believe Azure ML is a game changer. It makes the incredible potential of ML accessible both to startups and large enterprises. Startups are now able to use the same capabilities that were previously available to only the most sophisticated businesses. Larger enterprises are able to unleash the latent value in their big data to generate significantly more revenue and efficiencies. Above all, the speed of iteration and experimentation that is now possible will allow for rapid innovation and pave the way for intelligence in cloud-connected devices all around us.

When I started my career in 1995, it took a large organization to build and deploy credit card fraud detection systems. With tools like Azure ML and the power of the cloud, a single talented data scientist can accomplish the same feat.

Joseph
Follow me on Twitter

Pie in the Sky (June 27th, 2014)


 

A weekly link roundup covering Cloud, Client/Mobile, Node.js, Ruby, and Misc. topics (the individual links were not preserved in this archive).

Cumulative Update #2 for SQL Server 2014 RTM

Dear Customers, The 2nd cumulative update release for SQL Server 2014 RTM is now available for download at the Microsoft Support site. To learn more about the release or servicing model, please visit: CU#2 KB Article: http://support.microsoft...(read more)