
SQL Server chez les clients – Accelerating growth with Microsoft Business Intelligence


This article presents five case studies of growth opportunities in the field of Business Intelligence (BI), implemented by consultants from the Data Insight team of Microsoft Consulting Services (MCS).

The goal of this article is to show how each of these case studies can act as a true growth accelerator, improve productivity, and increase the company's competitiveness, based on work with two major MCS accounts: a multinational group in the environmental management sector and a large French construction and concessions group.


What is Machine Learning?


This blog post is authored by John Platt, a Distinguished Scientist at Microsoft Research.

Hi, I’m John Platt. For 17 years now, I’ve been creating and using machine learning (ML) algorithms at Microsoft. ML has become very popular lately, so often people ask me, “What is machine learning? What do you use it for?”

I am happy to answer these questions, because, as it turns out, the use of ML is pervasive across Microsoft’s vast array of products – something that makes Microsoft a fun and impactful place to do ML (it’s like being a kid in a candy store).

In general, ML converts data sets into pieces of software, known as “models,” that can represent the data set and generalize to make predictions on new data. Because ML is so broadly used, I find it useful to put ML applications into a rough taxonomy. ML can be used in three different ways:

  1. Data Mining: ML can be used by people to gain insights from large databases.

  2. Statistical Engineering: ML can be used to convert data into software that makes decisions about uncertain data.

  3. Artificial Intelligence: ML can be used to emulate the human mind, to create computers that can see, hear, and understand.

At Microsoft, we use ML for all three of these purposes. For example:

Fighting Malware

The Microsoft Malware Protection Center, collaborating with Microsoft Research, has used ML to create software to automatically detect malware, and to help analysts gain insight into malware development. You can read more about this in a blog entry from Dennis Batchelder.

Building a Search Engine

Microsoft’s Bing search engine is a very complex system that interprets your query, scours the web, and returns information that you will find useful. Because Bing has such a high volume of traffic, we must use automated decision-making to handle the uncertainty and ambiguity of natural language. We have exploited ML to create many components of Bing that work together to form a high-quality search engine. One particular form of ML that is useful for search is ranking: a few years ago, a team from Microsoft Research won a Learning to Rank competition using the algorithms we’ve developed.

Enabling Computers to See and Hear

Microsoft has been pushing the state of the art in both computer vision and speech recognition. The software that recognizes your gestures in the Kinect was created by ML. Microsoft’s speech recognition system is based on deep learning, a form of ML model that is inspired by the structure of the brain. We have also used ML to create a real-time speech-to-speech translator.

Looking at these exciting applications, I realize that now is a magical time for machine learning. Many researchers and developers have been working steadily on these applications for years. Because of Moore’s Law and the internet, we now have enough labeled data and computation to enable ML to create remarkable software. I’m looking forward to providing our algorithms and tools to other developers, so that they can use their creativity to create their own remarkable applications.

In my next post, I will talk about how ML at Microsoft has evolved over the last 20 years. I’m looking forward to the opportunity to interact with our readers through this blog!

John Platt
Learn more about my research. Follow me on twitter.

Real world use cases of the Microsoft Analytics Platform System


This blog post was authored by: Murshed Zaman, AzureCAT PM and Sumin Mohanan, DS SDET

With the advent of SQL Server Parallel Data Warehouse (the MPP version of SQL Server) V2 AU1 (Appliance Update 1), PDW got a new name: the Analytics Platform System [Appliance] or APS. The name changed with the addition of Microsoft’s Windows distribution of Hadoop (HDInsight or HDI) and PDW sharing the same communication fabric in one appliance. Customers can buy an APS appliance with PDW or with PDW and HDI in configurable combinations.

Used in current versions of PDW, Polybase is a technology that allows PDW users to query HDFS data. SQL users can quickly get results from Hadoop data without learning Java or C#.

Features of Polybase include:

  1. Schematization of Hadoop data in PDW as external tables
  2. Querying Hadoop data
  3. Querying Hadoop data and joining with PDW tables
  4. High speed export and archival of PDW data into Hadoop
  5. Creating persisted tables in PDW from Hadoop data 

In V2AU1 Polybase improvements include:

  1. Predicate push-down for queries in Hadoop as Map/Reduce jobs
  2. Statistics on Hadoop data in PDW

Another new feature introduced in PDW V2AU1 is the capability to query data that resides in Microsoft Azure Storage Accounts. Just like HDFS data, PDW can place a schema on data in Microsoft Azure Storage Accounts and move data from PDW to Azure and back.

APS, with these new features and improvements, has become a first-class citizen in analytics for any type of data. Any company that has Big Data requirements and wants a highly scalable, scale-out data warehouse appliance can use APS.

Here are four cases that illustrate how different industries are leveraging APS:

One: Retail brand vs. Name brand

Retail companies that use PDW often also want to harvest and curate data from their social analytics sites. This data provides insights into their products and into the behavior of their customers. Using APS, a company can offer the right promotion at the right time and to the right demographics. The data also shows that a brand recommendation coming from a friend, relative, or trusted support group can be much more effective than marketing literature alone. By monitoring and profiling social media, these companies can also gain a competitive advantage.

Today’s empowered shoppers want personalized offers that appeal to their emotional needs. Using social media, retailers offer promotions that are tailored to individuals using real-time analytics. This process starts by ranking blogs, forums, Twitter feeds, and Facebook posts against predetermined KPIs revealed in these posts and conversations. Retail organizations analyze and use the data to profile shoppers and personalize future marketing campaigns. Measurable sales data reveals the effectiveness of the campaign, and the whole process starts again with the insight gained.

In this example, PDW houses the relational sales data and Hadoop houses the social sentiment data. APS, with its built-in HDI region, gives the company the arsenal to analyze both data sources in a timely manner and to react and make changes.

Retail store APS diagram:

Two: Computer Component Manufacturing

Companies that generate massive amounts of electronic test data can get valuable insights from APS. Test data are usually a good candidate for Hadoop due to its key-value type (JSON or XML) structure.

One example in this space is a computer component manufacturer. Due to the volume, velocity, and variety of this data (e.g., Sort/Class test data), a conventional ETL process can be very resource-intensive. Using APS, companies can gain insight from their data by putting the semi-structured (key-value pair) data into the HDI region and other complementary structured data sources (e.g., Wafer Electrical Test data) into PDW. With the Polybase query feature, these two types of data can easily be combined and evaluated for success/failure rates.

Computer Component Manufacturing Diagram:

Three: Game Analytic Platform for online game vendors

APS, combining PDW with an HDI region, can offer a complete solution for online game companies to derive insights from their data. MMORPGs (Massively Multiplayer Online Role Playing Games) are good examples where APS can deliver value. Game engines produce a lot of transactional data (events such as which avatar got killed in the currently active game) and a lot of semi-structured data, such as activity logs containing chat data and historical logs. APS is well suited to loading the transactional data into the PDW workload and the semi-structured data into the HDI region. The data can then be used to derive insights such as:

  1. Customer retention - Discovering when to give customers offers and incentives to keep them in the game
  2. Improving game experience - Discovering where customers are spending more time in the game, and improving in-game experience
  3. Detecting fraudulent gaming activities

Currently these companies deal with multiple solutions and products to achieve this goal. APS provides a single solution to power both their transactional and non-transactional analytics.

Four: Click stream analysis of product websites for targeted advertising

In the past, a relational database system was sufficient to satisfy the data requirements of a medium-scale production website. Ever-increasing competition and advancements in technology have changed the way in which websites interact with customers. Apart from storing data that customers explicitly provide the company, sites now record how customers interact with their website.  As an example, when a registered user browses a particular car model, additional targeted advertisements and offers can be sent to the user.

This scenario can be captured using collected clickstream data and the Hadoop eco-system. APS acts as the complete solution to these companies by offering the PDW workload to store and analyze transactional data, combined with HDI region to derive insights from the click-stream data.

This solution also applies to third-party companies that specialize in targeted advertising campaigns for their clients.

While “Big Data” is a hot topic, we very often receive questions from customers about the actual use cases that apply to them and how they can derive new business value from “Big Data.” Hopefully these use cases highlight how various industries can truly leverage their data to mine insights that deliver business value, in addition to showcasing how traditional data warehouse capabilities work together with Hadoop.

Visit the Microsoft Analytics Platform System page to learn more. 

ODataLib 6.5.0 Release


We are happy to announce that ODataLib 6.5.0 is released and available on NuGet, along with the source code on CodePlex (please read the git history for the v6.5.0 code and all previous versions). Detailed release notes are listed below.

Bug Fix

  • Fixed a bug in support for the “Core.OptimisticConcurrency” annotation.
  • Fixed enum bugs: 1) ODataUriParser now supports a nullable enum as a function parameter; 2) a dynamic enum property now gets the @odata.type annotation.

New Features

  • EdmLib & ODataLib now support TypeDefinition

A type definition defines a specialization of one of the primitive types. For details, please refer to the OData CSDL specification; a rough sketch of declaring one with EdmLib follows.
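As an illustration only (the namespace and names below are made up, and the exact EdmLib builder calls may differ slightly between versions):

var model = new EdmModel();

// Define a "Height" type definition that specializes Edm.Double.
var heightType = new EdmTypeDefinition("MyNS", "Height", EdmPrimitiveTypeKind.Double);
model.AddElement(heightType);

// Use the type definition as the type of a structural property.
var person = new EdmEntityType("MyNS", "Person");
person.AddKeys(person.AddStructuralProperty("Id", EdmPrimitiveTypeKind.Int32));
person.AddStructuralProperty("Height", new EdmTypeDefinitionReference(heightType, false));
model.AddElement(person);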

 

  • EdmLib & ODataLib now support serializing and deserializing unsigned integers

A shared model that contains type definitions for unsigned integers is defined. We will have a blog post on this website soon about how to use it in your service.

 

  • ODataLib now supports $count on collections of primitive or complex type
$count is used to request only the number of items of a collection of entities or items of a collection-valued property.
  • Support Capabilities.ChangeTracking annotation
Services advertise their change-tracking capabilities by annotating entity sets with the Capabilities.ChangeTracking annotation; see the specification for details.
  • OData Client for .Net now supports changing http method in BuildingRequest event handler
  • OData Client for .Net & ODataLib now support Windows Phone 8.1 project.

MISC

  • Renamed the client portable DLL from “Microsoft.Data.Services.Client.Portable” to “Microsoft.Data.Services.Client”

Call to Action

You and your team are welcome to try out this new version if you are interested in the new features and fixes above. For any feature request, issue, or idea, please feel free to reach out to us at odatafeedback@microsoft.com.

SQL Server chez les clients – Enriching enterprise BI with Self-Service BI


Business teams regularly express the need to enrich an operational business intelligence application with external data (reference data or additional indicators, for example) in an agile way, without going through a project-mode development phase.

This is now possible thanks to the “Self-Service BI” tools of Microsoft’s Power BI suite, which give users the ability to build their own business intelligence applications.


Using ODataUriParser for OData V4


Background

This post is intended to guide you through the UriParser for OData V4, which is released within ODataLib V6.0 and later.

You may have already read the following posts about OData UriParser in ODataLib V5.x:

Some parts of those articles still apply to the V4 UriParser, such as the introduction to ODataPath and the QueryNode hierarchy. In this post, we will deal with API changes and newly introduced features.

UriParser Overview

The main reference document for UriParser is the URL Conventions specification. The ODataUriParser class is its main implementation in ODataLib.

The ODataUriParser class has two main functionalities:

  • Parse resource path
  • Parse query options

We’ve also introduced the new ODataQueryOptionParser class in ODataLib 6.2+, in case you do not have the full resource path and only want to parse the query options. ODataQueryOptionParser shares the same API signature for parsing query options; you can find more information below.

What’s new?

If you are familiar with ODataLib V5.x UriParser, you may find the following new things in V6.x:

  1. All static methods are removed; the UriParser now has only instance methods;
  2. Support alias parsing (see this article);
  3. Support $count, $skip, $top, and $search (see below);

Using ODataUriParser

Using the ODataUriParser class is easy and straightforward. As mentioned, static methods are no longer supported, so we will begin by creating an ODataUriParser instance.

ODataUriParser has only one constructor:

public ODataUriParser(IEdmModel model, Uri serviceRoot, Uri fullUri);

Parameters:

  • model is the Edm model the UriParser will refer to;
  • serviceRoot is the base Uri for the service, which is typically a constant for a given service. Note that serviceRoot must be an absolute Uri;
  • fullUri is the full request Uri including query options. When it is an absolute Uri, it must be based on the serviceRoot, or it can be a relative Uri.

In the following demo we will use the model from the OData V4 demo service and create an ODataUriParser instance.
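A sketch of that setup (one possible way to obtain the model is to read the service's $metadata document; the URL below assumes the public OData V4 demo service):

// Read the Edm model from the $metadata document of the demo service.
IEdmModel model;
using (var reader = XmlReader.Create("http://services.odata.org/V4/OData/OData.svc/$metadata"))
{
    model = EdmxReader.Parse(reader);
}

var serviceRoot = new Uri("http://services.odata.org/V4/OData/OData.svc/");
var fullUri = new Uri("http://services.odata.org/V4/OData/OData.svc/Products(1)");
var parser = new ODataUriParser(model, serviceRoot, fullUri);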

Parsing Resource Path

You can use the following API to parse resource path:
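Continuing with the parser built above:

ODataPath path = parser.ParsePath();    // parses the resource path portion of fullUri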

You don’t need to pass in the resource path as a parameter here, because the constructor has already taken the full Uri.

The ODataPath holds the enumeration of path segments for resource path. All path segments are represented by classes derived from ODataPathSegment.

In our demo, the resource path in the full Uri is Products(1), so the resulting ODataPath contains two segments: an EntitySetSegment for the entity set named Products, and a KeySegment for the key with integer value 1.

Parsing Query Options

ODataUriParser supports parsing the following query options: $select, $expand, $filter, $orderby, $search, $top, $skip, and $count.

For the first five, the parsing result is an instance of a corresponding XXXClause class, which represents the query option as an Abstract Syntax Tree (with semantic information bound). Note that the $select and $expand query options are merged together into one SelectExpandClause class. The last three all have primitive type values, and the parsing result is the corresponding primitive type wrapped by the Nullable class.

For all query option parsing results, the Null value indicates the corresponding query option is not specified in the request URL.

Here is a demo of parsing a Uri with various query options (please notice that the value of skip will be null, as $skip is not specified in the request Uri):
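A rough equivalent of that demo in code (the property names used in the query options are illustrative and depend on your model):

var uri = new Uri(serviceRoot, "Products?$filter=Price gt 10&$orderby=Name" +
                               "&$select=Name,Price&$search=office&$top=5&$count=true");
var optionsParser = new ODataUriParser(model, serviceRoot, uri);

SelectExpandClause selectExpand = optionsParser.ParseSelectAndExpand();
FilterClause filter = optionsParser.ParseFilter();
OrderByClause orderBy = optionsParser.ParseOrderBy();
SearchClause search = optionsParser.ParseSearch();
long? top = optionsParser.ParseTop();        // 5
long? skip = optionsParser.ParseSkip();      // null: $skip is not in the request Uri
bool? count = optionsParser.ParseCount();    // true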

 

The data structures for SelectExpandClause, FilterClause, and OrderByClause have already been presented in the two previous articles mentioned at the top of this post. Here I’d like to talk about the newly introduced SearchClause.

SearchClause contains a tree representation of the $search query. The detailed rules for the $search query option can be found in the URL Conventions specification. In general, the search query string can contain search terms combined with the logic keywords AND, OR and NOT.

All search terms are represented by SearchTermNode, which is derived from SingleValueNode. SearchTermNode has one property named Text, which contains the original word or phrases.

SearchClause’s Expression property holds the tree structure for $search. If the $search value is a single word, the Expression is set to that SearchTermNode. But when $search is a combination of terms and logic keywords, the Expression will also contain nested BinaryOperatorNode and UnaryOperatorNode instances.

For example, if the query option $search has the value “a AND b”, the resulting Expression would have the following structure:
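Roughly, in code (the node types are those described above):

SearchClause search = parser.ParseSearch();            // for $search=a AND b
var andNode = (BinaryOperatorNode)search.Expression;   // OperatorKind == BinaryOperatorKind.And
var left = (SearchTermNode)andNode.Left;               // left.Text == "a"
var right = (SearchTermNode)andNode.Right;             // right.Text == "b"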

 

Using ODataQueryOptionParser

There may be cases where you already know the query context information but do not have the full request Uri. ODataUriParser is not a good fit there, as it always requires the full Uri, so the user would have to fake one.

In ODataLib 6.2 we shipped a new Uri parser that targets query options only. It requires the model and type information to be provided through its constructor, and it can then be used to parse query options in the same way as ODataUriParser.

The constructor looks like this:

public ODataQueryOptionParser(IEdmModel model, IEdmType targetEdmType, IEdmNavigationSource targetNavigationSource, IDictionary<string, string> queryOptions);

Parameters (here the target object indicates what the resource path was addressing; see the spec):

  • model is the model the UriParser will refer to;
  • targetEdmType is the type the query options apply to; it is the type of the target object;
  • targetNavigationSource is the EntitySet or Singleton where the target comes from, it is usually the NavigationSource of the target object;
  • queryOptions is the dictionary containing the key-value pairs for query options.

 

Its usage is almost the same as with ODataUriParser.
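Here is a sketch of a demo, reusing the model from earlier (the type and entity set names are assumptions about that demo model):

var queryOptions = new Dictionary<string, string>
{
    { "$filter", "Price gt 10" },
    { "$orderby", "Name" },
    { "$top", "5" }
};

IEdmType productType = model.FindDeclaredType("ODataDemo.Product");
IEdmNavigationSource productsSet = model.EntityContainer.FindEntitySet("Products");

var optionParser = new ODataQueryOptionParser(model, productType, productsSet, queryOptions);
FilterClause filter = optionParser.ParseFilter();
OrderByClause orderBy = optionParser.ParseOrderBy();
long? top = optionParser.ParseTop();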

 

Conor speaking at SQL Saturday Dublin on Sept. 20, 2014

I will be travelling to Ireland to speak at the SQL Saturday on Sept. 20th. I plan to give a talk on how the query processor works in SQL Server (including the new execution models used for in-memory OLTP and DW, which are separate from the traditional...

SQL Server chez les clients – Data integration and Self-Service


Ever-expanding access to and sharing of information, social networks, and the collaborative tools available on the Internet have changed users' expectations of their data while democratizing the use of IT tools.

What about business intelligence tools? Users want their professional tools to work the same way as their personal tools, with the same ease of use and the same availability.

But these constantly evolving user needs must still be controlled and validated by the teams responsible for the company's information system.

It is on this basis that Microsoft has built its vision of business intelligence systems and now offers Self-Service Business Intelligence (Self-Service BI) tools.


Visual Studio “14” CTP 2 and Entity Framework


Today we’re providing a second CTP of the next version of Visual Studio, to gather your early feedback. For more information on this release, see Visual Studio “14” CTPs. This post covers the places that Entity Framework is included in the release and some limitations to be aware of when using it.

The EF related information for CTP 2 is very similar to what we provided for the first CTP of Visual Studio “14”. In the next preview of Visual Studio “14” we will be updating the EF6.x components to the latest version. The builds of EF7 will also become more usable in future previews as we round out the implementation of the core pieces of the EF7 code base.

 

Entity Framework Tools

As with past versions of Visual Studio, the Entity Framework Tools are included in-the-box. These tools are capable of working with models created with all versions of Entity Framework up to and including EF6.x.

Visual Studio “14” CTP includes an older build of the EF6 tooling which does not include the bug fixes and improvements from the 6.1.0 and 6.1.1 releases. The next preview of Visual Studio “14” will be updated to version 6.1.1 of the tooling.

At this stage, there isn’t a version of the 6.1.0 or 6.1.1 tooling that can be installed on Visual Studio “14”.

 

Entity Framework 6 Runtime

An older build of the EF6 runtime is included in a number of places in Visual Studio “14” CTP. This build does not include the bug fixes and improvements from the 6.1.0 and 6.1.1 releases.

  • The runtime will be installed if you create a new model using the Entity Framework Tools in a project that does not already have the EF runtime installed.
  • The runtime is pre-installed in new ASP.NET projects, depending on the project template you select.

We recommend using NuGet to update to the latest version of the runtime. At the time of writing 6.1.1 was the latest stable release. For detailed information on how to upgrade, see Updating a Package in the NuGet documentation.

PM> Update-Package EntityFramework

The next preview of Visual Studio “14” will be updated to version 6.1.1 of the runtime.

 

Entity Framework 7

We recently blogged about our plans for Entity Framework 7. Visual Studio “14” CTP 2 includes an early preview of ASP.NET vNext, which in turn includes a very early build of EF7. The EF7 runtime is installed in new ASP.NET vNext projects that are created.

As with the first CTP of Visual Studio “14”, this build of EF7 only implements very basic functionality and there are a number of limitations with the features that are implemented. Please bear in mind that this preview is designed to give you an idea of what the experience will be like and you will quickly hit limitations if you deviate from the code from the default project template.

For more information on what’s included in this build of EF7, see the release notes page on our GitHub project.

We’re making good progress on the EF7 code base, but it is still in the early stages of development. If you want to experiment with a build we would recommend visiting our GitHub wiki for information on using nightly builds. Just remember that there are lots of things that don’t work… seriously… we warned you :)!

Twenty Years of Machine Learning at Microsoft


This blog post is authored by John Platt, a Distinguished Scientist at Microsoft Research.

People may not realize it: Microsoft has more than twenty years of experience in creating machine learning systems and applying them to real problems. This experience is much longer than the recent buzz around Big Data and Deep Learning. It certainly gives us a good perspective on a variety of technologies and what it takes to actually deploy ML in production.

The story of ML at Microsoft started in 1992. We started working with Bayesian Networks, language modeling, and speech recognition. By 1993, Eric Horvitz, David Heckerman, and Jack Breese started the Decision Theory Group in Research and XD Huang started the Speech Recognition Group. In the 90s, we found that many problems, such as text categorization and email prioritization, were solvable through a combination of linear classification and Bayes networks. That work produced the first content-based spam detector and a number of other prototypes and products.

As we were working on solving specific problems for Microsoft products, we also wanted to get our tools directly into the hands of our customers. Making usable tools requires more than just clever algorithms: we need to consider the end-to-end user experience. We added predictive analytics to the Commerce Server product in order to provide recommendation service to our customers. We shipped the SQL Server Data Mining product in 2005, which allowed customers to build analytics on top of our SQL Server product.

As our algorithms became more sophisticated, we started solving tougher problems in fields related to ML, such as information retrieval, computer vision, and speech recognition. We blended the best ideas from ML and from these fields to make substantial forward progress. As I mentioned in my previous post, there are a number of such examples. Jamie Shotton, Antonio Criminisi, and others used decision forests to perform pixel-wise classification, both for human pose estimation and for medical imaging. Li Deng, Frank Seide, Dong Yu, and colleagues applied deep learning to speech recognition.

In addition to more sophisticated algorithms for existing problems, we have been exploring new frameworks for machine learning. The most common frameworks in ML are classification and regression. In these frameworks, ML learns a mapping from a vector of data to either a label (classification) or a value (regression). But, ML can do much more than produce labels or values. There’s a whole sub-field of ML called “structured output prediction”. An early example of this was “learning to rank”, where ML produces a ranked list of items (very useful for Bing, as I mentioned before). Another interesting framework is the construction of causal models, which we have used to model our advertising system. Yet another framework is generating programs directly from data (rather than through a model).

As ML researchers, we are super excited about Microsoft Azure ML. Azure ML will create models that can be deployed to the cloud, rather than being restricted to one particular data management platform (such as SQL). Creating cloud services with ML should reduce the friction of getting ML into specific applications. As researchers, we would love to capture all of our experience and algorithms into the Azure ML product, so that our customers can use their creativity to build ML-based products.

In future blog posts, we will describe some of our current ML research topics. We can also go into more detail about some of the technology mentioned, above. If you find a particular research topic interesting, please let us know and we will try to get guest blog posts written by the creator of the technology. Thanks for reading!

John Platt
Learn more about my research. Follow me on twitter.

SQL Server 2012 with SP2 Slipstream ISO images do not install SP2

Hi all, We have been informed that SQL Server 2012 with SP2 Slipstream ISO images do not install SP2. Unfortunately there is the same issue that we had with SP1 that was documented here. The same workaround applies. We are working on fixing...

[Announcement] OData Client Code Generator 2.0.0 release


We are happy to announce that the OData Client Code Generator 2.0.0 is released and available on the Visual Studio Gallery. In this release, we focused on lighting up more OData features and on the usability of the generated APIs, while preserving as much consistency as possible with the former version. The release notes follow:

Improvements of this OData Client Code Generator update

Usability

  • The exception message shown when the “MetadataDocumentUri” value is not filled in before running the code generator is improved.
  • The exception message shown when proxy generation fails for a service whose metadata document access needs authentication is improved.
  • The configuration of the code generation is now migrated into the .tt template file.

It increases the usability of the template in the following ways:

    1. Before this improvement, users might forget to regenerate the code by saving the .tt template file or using “Run Custom Tool” after modifying the configuration in the .odata.config file. Now, as the configuration lives in the .tt file, whenever it is changed and saved, the code will always be regenerated.
    2. The configuration can be included as C# code in the .tt file, which is more readable and referenceable after installing an extension for T4 templates: 

  • The type of the generated class for singletons is redesigned.

It was formerly designed as a DataServiceQuery class, which has the following limitations:

    1. It supports query options that logically shouldn’t be supported by singletons (e.g. $filter, $orderby, etc.).
    2. Executing the singleton (e.g. context.Boss.Execute()) returns an IEnumerable (e.g. IEnumerable<Person>). This means users have to iterate through a singleton to get the actual single object it contains, which is logically odd.
    3. The singleton class has to call Execute() to get the value, and because it is a DataServiceQuery it doesn’t support nested navigation calls (also called “delayed query”, since you can delay the execution of the query until you explicitly execute it), such as context.Boss.Wife.Company.Execute(), as navigation properties are implemented on the entity type classes rather than on DataServiceQuery.

Now it is redesigned as a DataServiceQuerySingle class, which

    1. Has its native implementation of OData query options supported by singletons
    2. Doesn’t implement IEnumerable or IQueryable, so users cannot iterate through it
    3. Has a GetValue() method whose return type is the type of the singleton
    4. Has all the navigation properties of the T type implemented on it so it can support delayed query.

With this redesign and a few new APIs introduced on DataServiceQuery, the generated client side proxy can better support complex queries and support them in a more usable way:

Scenario: Get a boss singleton
Query: GET http://host/service/Boss
Client code in old version: Person boss = context.Boss.First();
Client code in new version: Person boss = context.Boss.GetValue();

Scenario: Get the parent of the boss of the company whose ID is 1
Query: GET http://host/service/Companies(1)/Boss/Parent
Client code in old version: Not possible
Client code in new version: Person bossesParent = context.Company.ByKey(new Dictionary<string, object>() {{"ID", 1}}).Boss.Parent.GetValue();

Functionality

  • Now the client side proxy can be generated referencing a metadata document which is stored in a local file.

This lights up the scenario of generating a client side proxy for services whose metadata document access needs authentication. For such services, customers can access the metadata document using a web browser, then download and store the metadata in a local file. The value of the “MetadataDocumentUri” variable in the configuration file can then be set to a local path such as “File:///C:/Odata.edmx” or @"C:\Odata.edmx".

  • Now the code generator supports generating proxy according to metadata documents which reference external CSDL documents.
  • Code generation of some actions & functions defined in the model is supported.

The following kinds of operations are now supported: function imports, action imports, actions & functions bound to a collection of entities, and actions & functions bound to an entity type.

Configurability

  • A configuration option has been added to the configuration file “ODataClient.tt” to enable the conversion of lower camel case property, entity, and namespace names (e.g. “myName”) to upper camel case ones (e.g. “MyName”). 

// This flag indicates whether to enable naming alias. The value must be set to true or false.

public const bool EnableNamingAlias = true;

If “EnableNamingAlias” is set to true, all the names of the generated properties will be in upper camel case. If it is set to false, the original casing defined in the model will be preserved.

Bug fixes

  • Fixed the bug of generated client code missing System.Nullable<> for nullable EnumType.
  • Fixed the bug that some of the global namespace in the code generated are missing "global::" prefix which can cause a namespace collision.
  • Fixed the bug where the T4 template adds a trailing slash to the end of the metadata document URI (the default address format for "Metadata as a Service").

Features enabled by the OData core and client library updates

  • The code generator can now recognize the inheritance relationship between complex types in the metadata document and generate the complex type classes reflecting this relationship.
  • The client property tracking is supported by the generated proxy (see this blog post for details) to reduce the payload size for update requests.
  • Using enum type objects in query options & operation parameter is now supported.
  • The API of OData client for server-side paging support is improved.

Sample:

NorthwindEntities ctx = new NorthwindEntities(new Uri(@"http://services.odata.org/V4/Northwind/Northwind.svc/"));

var customers = ctx.Customers.GetAllPages(); // automatically get all pages of the Customers entity set 

foreach (var customer in customers)

{

   Console.WriteLine(customer.CustomerID);

}

  • New asynchronous APIs in the .NET 4.0 style are supported.

[Tutorial & Sample] Client Delayed Query


In OData Client 6.5.0, together with OData Client Code Generator 2.0.0, we have improved the user experience on the client side by introducing delayed query. This feature lets you chain as many valid compositions as you want when building a query on the client, assembling all parts into one request URL and sending it out only when you are ready.

Currently, OData Client Code Generator 2.0.0 supports generating all operation imports and all operations bound to a single entity or a collection of entities, so they can be invoked directly in C# or VB .NET when writing client code to build queries. Bound functions can be used in query options such as “$filter” or “$orderby” too. Operations bound to other resources (e.g. primitive types, complex types, etc.) are not supported yet, and neither are unbound operations other than operation imports.

Basically, we aligned with the previous coding experience in the OData client, but some minor changes have been made to differentiate a single entity from a collection of entities when building queries. In addition, some new features together with new code patterns are introduced here.

Hints at the beginning:

1. All samples below are based on the sample service EDM model (the OData V4 TripPin reference service).

2. You can read “How to use OData Client Code Generator to generate client-side proxy class” to learn how to generate the client-side proxy class.

Now, let’s start!

Query a Singleton or a Single Entity

As you know, we simply treated Singletons the same as EntitySets before, which means you could not distinguish a Singleton from an EntitySet without looking them up in the metadata document. This has been improved by using another type, generated by the T4 template, to represent Singletons, while EntitySets stay unchanged.

An entity selected from a collection of entities is also represented by this new type when building queries. Use the “ByKey” method to get a specific entity, passing a dictionary as the argument to specify its key(s). The usage of projection and expand is the same as on an EntitySet. To retrieve the item from the server after query building is finished, call “GetValue” to send the request out and get the result back.

> Query a Singleton:

> Query a person whose UserName is “russellwhyte”:

> Projection and Expand:
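The original snippets were shown as screenshots. A rough equivalent for the three cases above, assuming a proxy generated from the TripPin sample service (the container, type, and property names are assumptions about that model, and the method surface follows the patterns described in this post):

var context = new DefaultContainer(new Uri("http://services.odata.org/V4/TripPinServiceRW/"));

// Query the Me singleton.
Person me = context.Me.GetValue();

// Query a person whose UserName is "russellwhyte".
Person russell = context.People
    .ByKey(new Dictionary<string, object> { { "UserName", "russellwhyte" } })
    .GetValue();

// Projection: select only a few properties of the single entity.
Person slim = context.People
    .ByKey(new Dictionary<string, object> { { "UserName", "russellwhyte" } })
    .Select(p => new Person { UserName = p.UserName, FirstName = p.FirstName })
    .GetValue();

// Expand: include a navigation property in the same request.
Person withTrips = context.People
    .ByKey(new Dictionary<string, object> { { "UserName", "russellwhyte" } })
    .Expand("Trips")
    .GetValue();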

Query Navigation Properties

Previously, to get a navigation property we could only use “Expand” on an entity, which severely limited our query capability on the client side. Things have changed now. We can retrieve just the navigation properties of an entity without “Expand”, and invoke operations on them conveniently.

> Query a navigation property on an entity:

> Query a navigation property on an entity for multiple levels:
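A sketch of both cases, with the same TripPin-style assumptions as above (key values are illustrative):

// Navigation property on an entity: /People('russellwhyte')/Trips
var russellsTrips = context.People
    .ByKey(new Dictionary<string, object> { { "UserName", "russellwhyte" } })
    .Trips;
foreach (Trip trip in russellsTrips)
{
    Console.WriteLine(trip.Name);
}

// Multiple levels of navigation: /Me/Trips(1001)/PlanItems
var planItems = context.Me.Trips
    .ByKey(new Dictionary<string, object> { { "TripId", 1001 } })
    .PlanItems;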

Invoke actions

ActionImports and actions bound to an EntityType or a collection of entities are supported now. Action requests are sent with the “POST” method, with all parameters (if any) serialized into the body. Actions cannot be further composed, so they are always the termination of a query. Actions may or may not have return values. Please call “GetValue” for a single result, and “Execute” or “foreach” (to iterate directly) for collection results. If the action has no return value, please call “Execute” to send the request out.

> Invoke an ActionImport:

> Invoke Action bound on a collection of entities

> Invoke Action bound on an EntityType:

> Invoke bound Action on a retrieved entity:
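A sketch covering the action-import and bound-action cases above (the action names and signatures are assumptions in the spirit of the TripPin model):

// Action import with no return value: POST /ResetDataSource
context.ResetDataSource().Execute();

// Action bound to an entity: POST /People('russellwhyte')/ShareTrip (namespace segment omitted)
context.People
    .ByKey(new Dictionary<string, object> { { "UserName", "russellwhyte" } })
    .ShareTrip("scottketchum", 1001)
    .Execute();

// Per the post, a bound action can also be invoked on an entity that was already retrieved
// via GetValue(), using the same generated method on the entity instance.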

Invoke functions

FunctionImports and functions bound to an EntityType or a collection of entities are supported now. The parameters (if any) of a function are serialized into its request URI as inline parameters. A function is composable if it is marked with the attribute “IsComposable” set to “true” in the metadata; it can then be further composed in a query. If an uncomposable function is attached with more segments, an exception will be thrown. Functions must have return values; please call “GetValue” for a single result, and “Execute” or “foreach” (to iterate directly) for collection results.

> Invoke FunctionImport returning a single result with inline parameters:

> Invoke Function bound on a collection of entities

> Invoke Function bound on EntityType returning a collection result:

> Invoke bound Function on a retrieved entity:

> Invoke Function in query options:
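A partial sketch of the function cases above, again with TripPin-style names (the exact signatures and LINQ translation are assumptions):

// Function import with inline parameters: GET /GetNearestAirport(lat=33.0,lon=-118.0)
Airport nearest = context.GetNearestAirport(33.0, -118.0).GetValue();

// Function bound to an entity, returning a collection:
// GET /People('russellwhyte')/GetFriendsTrips(userName='scottketchum') (namespace segment omitted)
var friendsTrips = context.People
    .ByKey(new Dictionary<string, object> { { "UserName", "russellwhyte" } })
    .GetFriendsTrips("scottketchum")
    .Execute();

// Bound function used inside a query option ($filter in this case):
var aaFans = context.People
    .Where(p => p.GetFavoriteAirline().AirlineCode == "AA");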

Use “GetValue” to get the real typed value of a function return value in synchronous scenarios; you can then access its properties as needed. In an async environment there is no “GetValue” method provided, so please substitute it with “GetValueAsync().Result”. Don’t worry, neither “GetValue” nor “GetValueAsync().Result” will send a partial request out when they are used inside a LINQ query expression.

> Invoke Function with further composition:
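A sketch of composing further query options onto a composable bound function (names and the Budget property are assumptions about the TripPin model):

// GET /People('russellwhyte')/GetFriendsTrips(userName='scottketchum')?$filter=Budget gt 3000
var expensiveTrips = context.People
    .ByKey(new Dictionary<string, object> { { "UserName", "russellwhyte" } })
    .GetFriendsTrips("scottketchum")
    .Where(t => t.Budget > 3000);

foreach (Trip trip in expensiveTrips)
{
    Console.WriteLine(trip.Name);
}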

Type Conversion

In OData, we can add a type segment to an entity resource in order to cast it to a derived type. If you want to cast a single entity, “CastTo[DerivedType]” methods are generated on it, one per derived type, to accomplish this. If it is a collection of entities and you want to get all items of the derived type from it, “OfType” is the best choice.

> Add type segment to a collection

> Add type segment to an entity
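A sketch of the two patterns, using a hypothetical Customer entity set with a derived VipCustomer type (these names are made up; the CastTo method follows the “CastTo[DerivedType]” pattern described above):

// Type segment on a collection: GET /Customers/MyNS.VipCustomer
var vips = context.Customers.OfType<VipCustomer>();

// Type segment on a single entity: GET /Customers(1)/MyNS.VipCustomer
VipCustomer vip = context.Customers
    .ByKey(new Dictionary<string, object> { { "ID", 1 } })
    .CastToVipCustomer()
    .GetValue();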

New: ASP.NET Session State Provider for SQL Server In-Memory OLTP


Microsoft SQL Server 2014 brings new performance and scalability gains by introducing In-Memory OLTP. In-Memory OLTP contains tables and indexes that are optimized for memory. Transactions execute under lock-free algorithms to provide linear scalability, and Transact-SQL stored procedures can be compiled to native machine code for maximum efficiency in processing.

Working with SQL Server customers on In-Memory OLTP engagements, a common pattern emerged around the desire for increased performance and scalability when using ASP.NET session state. Some early adopters modified their SQL Server objects to take advantage of In-Memory OLTP for ASP.NET session state, with great success. To learn more, read the bwin.party case study “Gaming site can scale to 250,000 requests per second and improve player experience”. To further enhance this scenario, we have created a new provider to make it easier for customers to take advantage of SQL Server In-Memory OLTP when using ASP.NET session state.

This ASP.NET session state provider is fully optimized for In-Memory OLTP by calling natively compiled Transact-SQL stored procedures and by creating all tables as memory-optimized. The functionality of the provider was tested both internally and by external customers. The results showed the implementation was able to provide some significant gains at scale levels which would have previously exhibited a bottleneck on the database.

NOTE: While some testing has been done before the release, we recommend executing your own testing and validation to understand how this implementation behaves in your specific environment.

Getting Started

Setting up the provider requires two steps, installing the provider into the ASP.NET application and creating the In-Memory OLTP database and object in Microsoft SQL Server 2014.

The provider and scripts can be accessed in two ways:

1. The package has been uploaded to NuGet: https://www.nuget.org/packages/Microsoft.Web.SessionState.SqlInMemory/

2. The source code is also accessible through Codeplex:

NuGet Installation

Download the ASP.NET Session State Provider for SQL Server In-Memory OLTP from the NuGet gallery by running the following command from the Visual Studio Package Manager Console:

PM> Install-Package Microsoft.Web.SessionState.SqlInMemory

More information about the NuGet package can be found here:

https://www.nuget.org/packages/Microsoft.Web.SessionState.SqlInMemory/

Installing the package will do the following things:

  • Add references to the ASP.NET Session State Provider assembly.
  • Add a custom session-state provider named "SqlInMemoryProvider" to the web.config file, similar to the following (the connectionString attribute needs to be updated):

    <system.web>
      <sessionState mode="Custom" customProvider="SqlInMemoryProvider">
        <providers>
          <add name="SqlInMemoryProvider"
               type="Microsoft.Web.SessionState.SqlInMemoryProvider"
               connectionString="data source=sqlserver;initial catalog=ASPStateInMemory;User ID=user;Password=password;" />
        </providers>
      </sessionState>
    </system.web>
  • Adds an ASPStateInMemory.sql file that includes the script for creating the SQL Server database configured to support In-Memory OLTP.

Setting up In-Memory OLTP Database and objects

Open the T-SQL script file "ASPStateInMemory.sql" and update the 'CREATE DATABASE' statement, replacing the 'FILENAME' attributes to specify a path that exists on your SQL Server machine where the memory-optimized filegroup should live. For further considerations on the placement of this filegroup, see the Books Online section Creating and Managing Storage for Memory-Optimized Objects.

CREATE DATABASE [ASPStateInMemory]
ON PRIMARY (
  NAME = ASPStateInMemory, FILENAME = 'D:\SQL\data\ASPStateInMemory_data.mdf'
),
FILEGROUP ASPStateInMemory_xtp_fg CONTAINS MEMORY_OPTIMIZED_DATA (
  NAME = ASPStateInMemory_xtp, FILENAME = 'D:\SQL\data\ASPStateInMemory_xtp'
)
GO

After updating the 'FILENAME' attributes, run the entire script for creating the In-Memory tables and the natively compiled stored procedures.

Additionally, create a periodic task in SQL Server to run the stored procedure 'dbo.DeleteExpiredSessions'. This procedure removes the expired sessions and frees up the memory consumed.

NOTE: The memory-optimized tables are created with a durability of SCHEMA_ONLY to optimize for performance. If session data durability is required, then change the 'DURABILITY' attribute from 'SCHEMA_ONLY' to 'SCHEMA_AND_DATA'. More information can be found in Books Online sections Defining Durability for Memory-Optimized Objects and Durability for Memory-Optimized Tables.

Conclusion

SQL Server In-Memory OLTP has shown to greatly improve the performance of ASP.NET session state applications. This provider allows customers to optimize ASP.NET web farms to take advantage of SQL Server In-Memory OLTP using a packaged solution with ease.

For further considerations on session state with In-Memory OLTP, along with other solution patterns which have shown success with SQL Server In-Memory OLTP, please reference the whitepaper: In-Memory OLTP – Common Workload Patterns and Migration Considerations.  

Download the Microsoft SQL Server 2014 Evaluation and see how in-memory processing built into SQL Server 2014 delivers breakthrough performance.

Recommendations Everywhere


This blog post is authored by Thore Graepel, Principal Researcher at Microsoft.

Good recommendations are needed everywhere. Whether you are looking for a movie you might enjoy watching or a book that you might enjoy reading or even suggestions for people with similar interests who you could connect with on Facebook or LinkedIn, automatic recommender systems are the solution.

Whereas such systems were previously available primarily to the largest online players, with the upcoming release of Microsoft Azure ML we will have a recommender system available to a much wider audience of individuals and businesses who can use it for the benefit of their own customers.

How Do Recommender Systems Work?

Typically, there are two types of entities involved in a recommender system (RS), let’s call them users and items. Users are the people to whom you would like to make recommendations. Items are the things you would like to recommend to them such as movies, books, web pages, recipes, or even other people.

Suppose we would like to recommend, say, a restaurant to a given user based on the 5-star ratings that this user and other users have provided for some of the restaurants in your universe. We can break down the recommendation task into two steps:

  • Predict, for each restaurant, how the user would rate it, e.g. on a 5-star rating scale.
  • From a list of eligible restaurants recommend those for which we predict the highest rating by that user.

But how can we predict how that particular user would rate all of those restaurants he has not actually rated? This is where machine learning (ML) comes into play.

How to Predict Ratings?

In order to build an ML model that can predict, for a given user/item combination, how the user would rate the item, we need to collect data of the form (userID, itemID, rating). You can think of this as a large matrix, users as rows, items as columns and entries as ratings.

This will be a sparse matrix (with many missing entries) because typical users will only rate a small subset of items. The Bayesian RS implemented in Azure ML takes this training data, trains a model, and essentially returns a function that predicts for a given user/item pair how the user would rate that item. These ratings are not restricted to 5-star ratings. Other signals such as purchase, clicks, or time-spent can be equally if not more informative for making good recommendations.

So how does this work? The RS learns an embedding of users and items into what we call a latent trait space (see image below). A user (blue dot) rates an item (red dot) positively if the user vector is aligned with the item vector, and negatively if the two vectors point in opposite directions. Similar users and similar items will be placed close together in trait space, thus making it possible to infer ratings even for user/item combinations for which no ratings are available in the training data. While the image below shows a two-dimensional trait space for illustration purposes, we use 20 to 100 dimensions in our deployed systems. Sometimes, we can even find interpretable axes ("traits") in trait space. For example, below, the North-South trait could be "grown-ups" vs "kids", and the West-East trait could be "mainstream" vs "cult".    
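In symbols, a minimal sketch of the bilinear form such a latent-trait model uses (leaving out the bias and noise terms a full Bayesian treatment adds) is

\hat{r}_{ui} \approx \mathbf{u}^{\top}\mathbf{v}_i = \sum_{k=1}^{K} u_k\, v_{ik}

where u and v_i are the K-dimensional trait vectors of the user and the item: aligned vectors give a large positive score, opposed vectors a negative one.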

How About New Users or New Items?

One key problem for an RS is cold-start. New users may not have rated enough items, and new items may not have been rated by enough users to make good predictions. To mitigate this problem, the Azure ML RS makes it possible to represent users and items not just by their ID, but by a feature vector constructed from meta-data. For users, this may include any profile information such as age or geo-location, for items such as movies it may include information such as genre, actors, director, year of production etc. As a consequence, the system can generalize across users and items by making use of common attributes in the meta data.

Learn More

If you are curious about the mathematical underpinnings of recommender systems, take a look at the paper Matchbox: Large Scale Bayesian Recommendations.

If you are eager to build your very own recommender system pipeline, try your hand at doing so using Azure ML once it becomes available (very soon). Azure ML Studio, shown in the picture below, has a recommender system module and a powerful browser-based graphical user interface including drag/drop capabilities, making your task relatively easy.

In fact, the Azure ML recommender system combines two of the most powerful paradigms for predicting ratings – content-based filtering and collaborative filtering – and, by making this widely available, our hope is that it will result in a much broader use of automatic recommendation systems and in many more cool scenarios that will benefit customers everywhere.

Thore Graepel
Learn more about my research. Follow me on Twitter.


End of Mainstream support for SQL Server 2008 and SQL Server 2008 R2

We would like to remind all customers that Mainstream Support for SQL Server 2008 and SQL Server 2008 R2 has ended on July 8, 2014. Microsoft is ending support for these products as part of our Support Lifecycle policy, found in http://support.microsoft...

Machine Learning for Industry: A Case Study


This blog post is authored by Chris Burges, Principal Research Manager at Microsoft Research, Redmond.

Hi, I’m Chris Burges. Over my last 14 years at Microsoft, and my previous 14 at Bell Labs, I’ve spent much of my time dabbling with machine learning (ML), with some of that time spent on solving industrial strength problems. Since interest in ML, especially in industrial settings, has blossomed lately, it seems like a good time to think about the big picture of how ML works, both from a practical view and from an algorithmic one.

In 2004, Microsoft Research (MSR) and Microsoft’s Web Search team decided to see if we could jointly improve the relevance of our web search results. The system used at the time was called The Flying Dutchman.  Over a period of several months we designed a system that not only improved relevance, but, in addition, proved much easier to experiment with: whereas The Flying Dutchman took several days and a cluster to produce a model (to rank web search results), a simple neural net ranker called RankNet was able to produce a ranking model in about an hour using just one machine.

What unfolded over the next few years is a fascinating story of the interplay between science, research, algorithm design and product engineering. In this first post, I’m hoping to give you a feel for that interplay, and in later posts, I’ll explain how the basic algorithms used today actually work, assuming no prior knowledge of ML. We have already touched on one of the keystones of progress: the ability to do rapid experimentation. If you have what you think is a good idea, ideally you’d like experimental evidence, one way or the other, immediately. Thus even if a model initially does not perform quite as well as what one already has, if it is much faster to train and test, overall progress can be much faster, and this alone will often enable the model to quickly surpass the current one in accuracy and speed.

Today, a family of models called Boosted Decision Trees (BDTs) are particularly popular. BDTs are flexible in that they can be used to solve different kinds of predictive tasks, for example:

  • Ranking, e.g. placing the most relevant web search results at the top of the list,

  • Classification, e.g. determining if a particular email is spam or not, and

  • Regression, e.g., predicting what price your house might sell for. 

Flexibility is great, but how useful are BDTs, really? Logs collected on an ML service that is used internally within Microsoft show that, over the past year alone, there were over 670,000 training runs using BDTs throughout Microsoft. This number is inflated because a given experimenter will typically perform model selection (i.e. train multiple models, each with a different parameter setting, and using a hold out data set to pick the best model), but it gives the general picture. Is this preoccupation with BDTs a Microsoft proclivity, or do people elsewhere like them too? In 2010, Yahoo! organized a learning to rank challenge, one track of which was designed to see who had the best web search ranking algorithm. Over one thousand teams registered for the challenge. While it was gratifying that the Microsoft team won, the rankings were close, and for me the most interesting takeaway was that the top 5 systems all used ensembles of decision trees, and boosting, in one form or another (in fact our system was an ensemble of BDTs and neural nets). So, if you’re thinking of training a fixed model to solve a predictive task, it’s worth considering BDTs.

Let’s use web search ranking as our canonical example to explore a typical research / product cycle. The hardest challenges of doing research are asking the right question, and getting good validation of the ideas (in addition to the time-honored method of publication, which as a validation test can be quite noisy). Working on real problems that matter to millions of people is a pretty good way of getting help with both of these challenges.

When you issue a query to Bing, we will effectively scan all the documents in our index. A large number of candidate documents are weeded out by applying some very fast filters (e.g. we may not even consider documents that have no words in common with your query). This reduces the set of candidate documents to a manageable size. For each such candidate document, we generate several thousand features that indicate how relevant that document might be for your query. For example, one feature might be “does the document title contain any words in the query?” or, at a higher level, “does the document refer to an entity mentioned in the query?” The task of the ranking model is to take this list of features and map it to a single score that encodes the relevance of that document for that query. This, in combination with the initial filtering process, allows us to rank all documents on the web by their relevance to your query. We used to measure the quality of the search results using a single metric called NDCG (we now use several metrics to try to gauge user satisfaction). The NDCG value for a given query depends on the entire ranked list and it takes values between 0 and 1, where 1 indicates the best ranking achievable on a special, labeled set of data (which we’ll call D).

So, how did we get from RankNet to BDTs? RankNet, although a breakthrough at the time, is not well adapted to the task: in particular, it ignores the NDCG measure, and just tries to get the pairwise ordering of the documents correct. So if, for a given query, you had a pair of documents from D, one of which had been labeled a perfect match for the query, and the other terrible, RankNet would spend just as much effort trying to get the perfect placed above the terrible as it would a good above a not quite as good (I should add that these are not the actual labels we use!). The problem in creating a model that directly optimizes for NDCG is that NDCG is ill-behaved, mathematically; if you think of each document as having a score (assigned by your ranking model), such that the ranking is obtained by ordering the documents by their score, then the NDCG changes discontinuously as those scores change continuously. To address this problem we used the fact that, when you train a neural net, you don’t have to provide actual values of the function you’re optimizing, just the gradients (values that indicate how that function would change as the neural net’s output score changes). For the ranking task, you can think of these values as little arrows or forces, pulling each document up or down in the ranked list. We can model these little forces between a pair of documents as the change in NDCG you’d get by swapping the two documents (for the set D), then add up all the forces for each document for a given query, and then use these as gradients to train the neural net. Thus was born LambdaRank, which while still a neural net model, gave better relevance performance than RankNet. Later we extended this idea to boosted tree models with an algorithm called LambdaMART, to leverage some of the advantages that BDTs offer over neural nets, two of which are:
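For readers who want the gist in symbols, a rough sketch following the published LambdaRank formulation (not a derivation specific to this post): for a pair of documents i and j where i is labeled more relevant than j, and s_i, s_j are their current model scores, the pairwise force is

\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma (s_i - s_j)}}\,\bigl|\Delta \mathrm{NDCG}_{ij}\bigr|

and the total force (gradient) on document i is obtained by summing the \lambda_{ij} over all pairs containing i, with opposite signs for the less relevant member of each pair; these sums are then used as the gradients for training.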

  1. The ability to more naturally handle features whose ranges vary hugely from one feature to another, and

  2. Faster training, and hence faster experimentation turnaround time. 

Subsequently, a team led by Ofer Dekel showed how to engineer BDTs so that training became approximately two orders of magnitude faster than for the neural nets and could scale to much larger datasets.
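To make the lambda idea above a little more concrete, here is a deliberately simplified Python sketch of turning swap-based NDCG changes into per-document forces. It reuses the gain/discount convention from the earlier snippet, ignores the pairwise sigmoid weighting and many other details of the real LambdaRank/LambdaMART algorithms, and is meant only as an illustration of the concept, not the production method:

    import math

    def ndcg(ranked_labels):
        # Same NDCG convention as in the earlier sketch.
        def dcg(rels):
            return sum((2 ** r - 1) / math.log2(k + 2) for k, r in enumerate(rels))
        ideal = dcg(sorted(ranked_labels, reverse=True))
        return dcg(ranked_labels) / ideal if ideal > 0 else 0.0

    def lambda_gradients(scores, labels):
        # For every pair of documents with different labels, the "force" is the
        # absolute change in NDCG that swapping them would cause; it pulls the
        # better-labeled document up and pushes the worse-labeled one down.
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        ranked = [labels[i] for i in order]
        base = ndcg(ranked)
        lambdas = [0.0] * len(scores)
        for a in range(len(order)):
            for b in range(a + 1, len(order)):
                i, j = order[a], order[b]
                if labels[i] == labels[j]:
                    continue
                swapped = ranked[:]
                swapped[a], swapped[b] = swapped[b], swapped[a]
                delta = abs(ndcg(swapped) - base)
                hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
                lambdas[hi] += delta
                lambdas[lo] -= delta
        return lambdas

    # The model currently scores a mediocre document above a perfect one;
    # the perfect document receives the largest upward force.
    print(lambda_gradients(scores=[2.0, 1.0, 0.5], labels=[1, 3, 0]))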

That, in a nutshell, is how we came to love BDTs. The overall process was a cycle of engineering and product needs driving the research, and the research opening new opportunities for the product. For two of the three steps (RankNet and BDTs), the main contribution was the ability to do faster experimentation with more data. Although I’ve focused here on the ranking story, it should be noted that there is a great deal more that goes into the quality and engineering of Bing than just the ranking algorithms, which are a small but vital part. In my next post, we’ll take a look at how BDTs actually work.

Chris Burges
Learn about my research.

Pie in the Sky (July 11th, 2014)


Here are some links for your weekend reading.

Cloud

Client\mobile

Node.js

  • next-update: A utility to test if a module's dependencies can be updated to the latest version.

  • jsonapitest: A JSON driven test runner for REST APIs.

  • current-processes: A library to get a snapshot of the currently running processes.

  • Passwordless: Token based authentication for Express & Node.js.

Ruby

Misc.

Enjoy!

- Larry

How Azure ML Partners are Innovating for their Customers


This blog post is authored by Joseph Sirosh, Corporate Vice President of Machine Learning at Microsoft.

Last week, Microsoft announced a preview release of Azure Machine Learning (Azure ML) which is now available for customers and partners to try. Azure ML is a fully managed service in the cloud that allows you to publish advanced analytic web services in minutes and build robust enterprise grade applications. Because ML is a new science to many customers and partners, I’m also excited to introduce, just in time for this year’s Worldwide Partner Conference, our new online Machine Learning University (MLU). MLU is a collection of online learning assets to help partners get up and running on Azure ML. It includes walkthroughs of the data science lifecycle from importing and cleaning data to building predictive models and deploying them as production web services. MLU gives partners access to in-person training events, regular product updates and other valuable Azure ML resources.
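For readers who have not yet tried the preview, the snippet below is a minimal, hypothetical sketch of what calling a published scoring web service from Python might look like. The endpoint URL, API key, column names, and exact JSON payload shape are placeholders and assumptions for illustration; consult the service’s own documentation (or MLU) for the real request format:

    import requests

    # Placeholder values: substitute the scoring URL and API key for your own
    # published web service; the input columns below are purely illustrative.
    SCORING_URL = "https://example.azureml.net/workspaces/<workspace>/services/<service>/score"
    API_KEY = "<your-api-key>"

    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["age", "income", "num_purchases"],
                "Values": [[34, 52000, 7]],
            }
        },
        "GlobalParameters": {},
    }

    response = requests.post(
        SCORING_URL,
        json=payload,
        headers={"Authorization": "Bearer " + API_KEY},
    )
    response.raise_for_status()
    print(response.json())  # the scored result returned by the service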

My earlier post talked about why Azure ML changes the game for building ML applications. This post describes how our partners are using it to rapidly build novel solutions for our customers.

Azure ML partners, with their wealth of specialized analytics knowledge and vertical expertise, are helping customers transform their mountains of data into actionable insights. Partners such as MAX451, Neal Analytics, OSIsoft, and Versium are already deploying enterprise-grade predictive analytics solutions for our customers with Azure ML. The breadth of solutions they are building is quite remarkable. Let me share four stories and quotes.

MAX451 helps Pier 1 Imports predict what customers might like to buy next

Operating over 1,000 stores, Pier 1 Imports aims to be their customers’ neighborhood store for furniture and home décor. They recently launched a multi-year, omni-channel strategy called “1 Pier 1” with a key goal being to understand their customers better and serve them with a more personalized experience across all interactions and touch points with the Pier 1 brand.

MAX451 has built an Azure ML solution to predict what a customer’s future product preferences might be and how they would like to purchase and receive these products. To quote Eric Hunter, Executive VP of Marketing at Pier 1 Imports:

Deepening our customer relationship is important to us. Gaining better insight into our data enables us to be there for her when, how and where she wants to shop, and with predictive analytics, we can invite her back to shop by featuring a product we know she’ll love. Whatever the medium may be, a more personalized message will likely encourage her to visit Pier 1 Imports again sooner… During this test phase, we have been able to improve the accuracy of predicting which product might speak to her next by more than 40 percent. Historically, translating data into great, usable information has been rather slow. Now we can reduce that time to a matter of days.

Sharon Leite, Executive Vice President of Sales and Customer Experience at Pier 1, had this to say:

Pier 1 Imports is helping prove Microsoft can take something as complex as advanced predictive data analytics and machine learning and make it accessible via the cloud. We are especially pleased that our analysts can focus on the results and not worry about the complex algorithms that are used to generate this data. We are extremely pleased with how quickly the team was able to get to meaningful results during this project. We enjoyed working with MAX451 – Pier 1 Imports is pleased with the results of working with a small and nimble partner.

And MAX451 CEO, Kristian Kimbro, had this to add:

At MAX451, we operate our entire business in the cloud, and our services and products are geared towards customers who are either already in or are migrating to the cloud… Microsoft’s machine learning products do not require an army of data scientist consultants to help customers. We are small, agile, and we move quickly, and we wanted to keep it that way – Microsoft’s machine learning products allow us to continue providing the same great services we always have, without straining our recruiters to find elusive, highly-skilled, highly-paid consultants…

Neal Analytics helps an ecommerce site optimize their marketing spend

Neal Analytics has built an Azure ML solution that is helping a large ecommerce site optimize their marketing spend on search terms, to drive traffic to their site. Search companies use auctions to rank ads on different search terms, balancing bids with content quality. The solution developed by Neal Analytics allows this customer to predict how much of a bid increase they need to spend in an auction in order to move up to the position they want for a given search term. Since competitors don’t tend to change their bids on search terms regularly, having this sort of timely, in-the-moment response to bid strategy is giving this customer a competitive advantage.

Neal Analytics had to build a predictive modeling solution for optimizing the bids on a large set of low-frequency keywords. The solution had to be easy to deploy and maintain: they wanted to avoid standing up a net-new R/Linux computing stack to handle the volume, and they wanted rapid turnaround in deploying, testing, and refining their model to stay current with trends. Azure ML was a good fit, allowing their data scientists to focus on their job rather than on the complexity of setting up a big data computing infrastructure. Neal Analytics CEO and President Dylan Dias had this to say:

…because Azure ML is built on Azure, we enjoy the scalability that is seamlessly built in. Speed. Accuracy. TCO – Azure ML trumps other options out there. The learning curve with Azure ML is the shortest. It’s also much easier to drive adoption because of the short time-to-operationalize cycle. I am able to scale my precious data science talent. Relatively inexperienced analysts are now effective in their jobs. Azure ML helps our data science practice to improve time-to-insight and time-to-action metrics significantly (2-4 times quicker). We can do more with less, which results in happier clients.

OSIsoft helps Carnegie Mellon University conserve energy

The Center for Building Performance and Diagnostics at Carnegie Mellon University develops integrated hardware and software solutions to improve the efficiency of buildings on the CMU campus while achieving higher occupant comfort. Over the past decade, the center has conducted thousands of field surveys and measurements with a view to identifying the critical factors that affect occupant satisfaction. External factors such as weather forecasts also help predict cooling or heating energy consumption, of course. Using all this data, the center wanted to create a system to increase the overall energy performance of its buildings. ML was viewed as a critical component of any solution.

The center worked with OSIsoft, first to collect the real-time data mentioned above using the OSIsoft PI System, and then to develop a system to predict energy consumption across buildings, detect faults, take action to mitigate issues in real time and deliver cost savings. CMU has seen energy reductions of up to 30 percent in some buildings since this system was deployed. Bertrand Lasternas, a CMU researcher working on this project, had this to say:

A web-based, platform independent, machine learning solution was extremely appealing to us… The ease of implementation makes machine learning accessible to a larger number of investigators with various backgrounds, even non-data scientists. The Azure ML solution provides comparable accuracy with a more user-friendly set up and better integration with existing systems. A RESTful API is key to a seamless and successful integration... Data handling is the biggest advantage as a seamless stream can be set up very quickly and be integrated into existing solutions.

Gregg Le Blanc is Director of Research and Innovation at OSIsoft and responsible for evaluating new technologies. He has evaluated several ML technologies and here’s what he had to share:

We found Azure ML has the right balance of readiness and capabilities. While our infrastructure has long enabled real-time operation intelligence, the holy-grail is to predict issues before they are seen in real-time sensor-data. Using Azure ML we are investigating the ability to predict the implications of different actions based on acquired data in the PI System and store the data within the PI System for exploratory investigations. The result will be the ability for our customers to predict more and test less, enabling our customers to find the optimal balance for delivering operational excellence in the shortest possible time.

Versium helps a major retailer more accurately predict gift card fraud

Versium operates a predictive analytics scoring service called LifeData™. Pulling together over 400 billion real-life attributes across disparate sets of data such as purchase interests, social behavior, demographic data and financial information, Versium creates unique insights into customer behavior and helps companies leverage these insights in their promotion and marketing campaigns.

Versium is working with a major retail customer to help them detect fraudulent purchases of gift cards. This retailer already has a rules-based system to detect such fraud, but it generates many false positives (i.e. erroneous predictions of fraud). Minimizing such errors while still stopping fraud was an important success criterion for this customer. Versium was able to quickly put together a predictive modeling solution on Azure ML, which, in a test run, showed that only 6 percent of 1,000 transactions that had been denied by the old rules-based system were actually fraudulent (in other words, the vast majority of those denials had blocked legitimate purchases) – numbers that translate into much higher customer satisfaction, higher revenue and considerable value for this retailer. Here’s what Chris Matty, CEO of Versium, had to say:

Main advantages [of Azure ML] are in being able to interactively visualize the whole machine learning process, data and metrics of the model, being able to publish a web service quickly after the model is built. From my perspective, the solutions we deploy are very high value and mission critical – e.g. fraud prevention. So accuracy, speed and security are critical and I see all of these as value points in the technology. We deploy many scores and being able to build, tune and validate a model within days is a strong value benefit. Leveraging the Azure ML platform enables us to create and deploy a predictive score that uses Versium's proprietary LifeData, in combination with our partners' enterprise CRM, marketing, or other internal data elements in a matter of days.

In conclusion

Often, the successful application of machine learning requires not only great tools but also deep domain expertise, painstaking work to acquire and understand the client’s data, and experience in integrating with the client’s software solutions. Our partners bring this much-needed expertise across a variety of industries. They share our passion for helping customers transform data into actionable business insights and are blazing the trail on cloud-hosted ML solutions. Boundless opportunities await.

Joseph

Follow me on Twitter.

SQL Server Data Tools July Update


We’d like to announce the availability of the July 2014 release of SSDT. This update is now available for Visual Studio 2012 and 2013. For Visual Studio 2012, use the “SQL –> Check for Updates” tool inside Visual Studio or download it via the link below. For Visual Studio 2013, check the Visual Studio update channel (Tools –> Extensions and Updates –> Updates) for this update. The update for VS 2013 is still being propagated, so it may not show up until later this evening.

Get it here: http://msdn.microsoft.com/en-us/data/hh297027

Contact Us

If you have any questions or feedback, please visit our forum or Microsoft Connect page. We look forward to hearing from you.

 

What’s New

The July 2014 update includes many bug fixes along with the following enhancements:

  • Schema Compare update
    • Added MSBuild support for Schema Compare with text and XML output.  A blog post will be published this week with additional information about this change.
  • Improved Windows Azure SQL Database node in the Server Explorer
    • Added Token-based authentication using a Microsoft account (MSA) or organizational account (OrgId)
    • Added support for VS2012
  • Improved PDW support
    • PDW tooling is now part of the Microsoft Visual Studio Express 2013 for Windows Desktop SKU. This requires VS 2013 Update 2 or later to be installed.
    • Support for PDW appliance updates in both VS2012 and VS2013