
Six Benefits to Planning for SQL Server 2005 and Windows Server 2003 End of Support Now


As the end of 2014 nears, now is the perfect time to review IT infrastructure plans for the coming year. If you haven’t made supportability a key initiative for 2015, there are some important dates that you should know about: extended support for Windows Server 2003 ends on July 14, 2015, and extended support for SQL Server 2005 ends on April 12, 2016.

After the end of extended support, no more security updates will be made available for these products. Staying ahead of these support dates will help you achieve regulatory compliance and mitigate future security risks. That means SQL Server 2005 users, especially those running databases on Windows Server 2003, should make upgrading the data platform an IT priority.

Security isn’t the only reason to think about upgrading. Here are six benefits to upgrading and migrating your SQL Server 2005 databases before the end of extended support:

  1. Maintain compliance – As encryption technologies age, it will become harder to prove compliance with the latest regulations such as the upcoming PCI DSS 3.0. Protect your data and stay on top of regulatory compliance and internal security audits by running an upgraded version of SQL Server.
  2. Achieve breakthrough performance – Per industry benchmarks, SQL Server 2014 delivers 13x performance gains relative to SQL Server 2005 and 5.5x performance gains over SQL Server 2008. Customers using SQL Server 2014 can further accelerate mission-critical applications with up to 30x transaction performance gains with the new in-memory OLTP engine, and accelerate queries up to 100x with the in-memory columnstore.
  3. Virtualize and consolidate with Windows Server – Scale up on-premises or scale out via private cloud with Windows Server 2012 R2. Reduce costs by consolidating more database workloads on fewer servers, and increase agility using the same virtualization platform on-premises and in the cloud.
  4. Reduce TCO and increase availability with Microsoft Azure – Azure Virtual Machines can help you reduce the total cost of ownership of deployment, management, and maintenance of your enterprise database applications. And it’s easier than ever to upgrade your applications and achieve high availability in the cloud using pre-configured templates in Azure.
  5. Use our easy on-ramp to cloud for web applications – The new preview of Microsoft Azure SQL Database announced last week has enhanced compatibility with SQL Server that makes it easier than ever to migrate from SQL Server 2005 to Microsoft Azure SQL Database. Microsoft’s enterprise-strength cloud brings global scale and near-zero maintenance to database-as-a-service, and enables you to scale out your application on demand.
  6. Get more from your data platform investments – Upgrading and migrating your databases doesn’t have to be painful or expensive. A Forrester Total Economic Impact™ of Microsoft SQL Server study found a payback period of just 9.5 months for moving to SQL Server 2012 or 2014.

Here are some additional resources to help with your upgrade or migration:


Relational Data Warehouse + Big Data Analytics: Analytics Platform System (APS) Appliance Update 3


This blog post was authored by Matt Usher, Senior PM on the Microsoft Analytics Platform System (APS) team.

Microsoft is happy to announce the release of Analytics Platform System (APS) Appliance Update (AU) 3. APS is Microsoft’s big-data-in-a-box appliance for serving the needs of relational data warehouses at massive scale. With this release, the APS appliance supports new scenarios for using Power BI modeling, visualization, and collaboration tools over on-premises data sets. In addition, this release extends PolyBase, allowing customers to utilize the HDFS infrastructure in Hadoop for ORC files and directory modeling and making it easier to integrate non-relational data into their data insights.

The AU3 release includes:

  • PolyBase recursive directory traversal and ORC file format support
  • Integrated Data Management Gateway, enabling queries from Power BI to on-premises APS
  • TSQL compatibility improvements to reduce migration friction from SQL Server SMP
  • Replatformed to Windows Server 2012 R2 and SQL Server 2014

PolyBase Directory Traversal and ORC File Support

PolyBase is an integrated technology that allows customers to use the skill set they have developed in TSQL to query and manage data in Hadoop platforms. With the AU3 release, the APS team has augmented this technology with the ability to define an external table that targets a directory structure as a whole. This new ability unlocks a whole new set of scenarios for customers to leverage their existing investments in Hadoop as well as APS to gain greater insight into all of the data collected within their data systems. In addition, AU3 introduces full support for the Optimized Row Columnar (ORC) file format – a common storage format for files within Hadoop.
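As a rough sketch of what this looks like, an ORC-based external file format might be declared along the following lines (the format name and optional compression codec here are illustrative assumptions, not from the original post):

-- Hypothetical example: an external file format for ORC files stored in Hadoop
CREATE EXTERNAL FILE FORMAT OrcLogFileFormat
WITH
(
	FORMAT_TYPE = ORC,
	DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

An external table can then reference this format through its FILE_FORMAT option, as the example later in this section shows.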

As an example of this new capability, consider a customer that uses APS to host inventory and Point of Sale (POS) data in the appliance while storing the web logs from their ecommerce site in a Hadoop directory structure. With AU3, the customer can maintain their logs in Hadoop in an easy-to-construct structure such as year/month/date/server/log, keeping storage and recovery simple within Hadoop, and then expose the whole structure as a single table to analysts and data scientists for insights. A sketch of such a layout follows.
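For illustration, the directory tree might look like this (the paths are hypothetical, following the year/month/date/server/log convention described above):

//Logs/2014/12/01/Server01/web.log
//Logs/2014/12/01/Server02/web.log
//Logs/2014/12/02/Server01/web.log
...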

In this example, let’s assume that each of the Serverxx folders contains the log file for that server on that particular day. In order to surface the entire structure, we can construct an external table using the following definition:

CREATE EXTERNAL TABLE [dbo].[WebLogs]
(
	[Date] DATETIME NULL,
	[Uri] NVARCHAR(256) NULL,
	[Server] NVARCHAR(256) NULL,
	[Referrer] NVARCHAR(256) NULL
)
WITH
(
	LOCATION='//Logs/',
	DATA_SOURCE = Azure_DS,
	FILE_FORMAT = LogFileFormat,
	REJECT_TYPE = VALUE,
	REJECT_VALUE = 100
);
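For completeness, the Azure_DS data source referenced above would be defined separately with CREATE EXTERNAL DATA SOURCE. A minimal sketch might look like the following (the storage type and address are illustrative assumptions, not from the original post):

-- Hypothetical example: the external data source the table above points at
CREATE EXTERNAL DATA SOURCE Azure_DS
WITH
(
	TYPE = HADOOP,
	LOCATION = 'wasbs://logs@myaccount.blob.core.windows.net'
);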

By setting the LOCATION to the //Logs/ folder, the external table pulls data from all folders and files within the directory structure. In this case, a simple select returns the first five entries ordered by date, regardless of which log file contains the data:

SELECT TOP 5
	*
FROM
	[dbo].[WebLogs]
ORDER BY
	[Date]

The results are:

Note: PolyBase, like Hadoop, will not return results from hidden folders or from any file that begins with an underscore (_) or a period (.).

Integrated Data Management Gateway

With the integration of the Microsoft Data Management Gateway into APS, customers now have a scale-out compute gateway that lets Azure cloud services query sophisticated sets of on-premises data more effectively. Power BI users can leverage PolyBase in APS to perform more complicated mash-ups of results from on-premises unstructured data sets in Hadoop distributions. By exposing the data from the APS appliance as an OData feed, Power BI can easily and quickly consume the data for display to end users.

For more details, please look for an upcoming blog post on the Integrated Data Management Gateway.

TSQL Compatibility Improvements

The AU3 release incorporates a set of TSQL improvements targeted at richer language support, improving the types of queries and procedures that can be written for APS. For AU3, the primary focus was on implementing full error handling within TSQL, allowing customers to port existing applications to APS with minimal code change and bringing full error handling to existing APS customers. AU3 ships the standard TSQL error-handling keywords and constructs, such as BEGIN TRY...END TRY, BEGIN CATCH...END CATCH, and THROW.

In addition to the error handling components, the AU3 release also includes support for the XACT_STATE scalar function, which indicates the current transaction state of a user request.
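As a rough sketch of how these constructs fit together, the canonical TRY...CATCH pattern with XACT_STATE looks like the following (the table and values are hypothetical):

BEGIN TRY
	BEGIN TRANSACTION;
	UPDATE dbo.Inventory
	SET Quantity = Quantity - 1
	WHERE ProductID = 42;
	COMMIT TRANSACTION;
END TRY
BEGIN CATCH
	-- XACT_STATE() <> 0 means a transaction is still open;
	-- -1 means it is uncommittable and can only be rolled back
	IF XACT_STATE() <> 0
		ROLLBACK TRANSACTION;
	THROW;
END CATCH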

Replatformed to Windows Server 2012 R2 and SQL Server 2014

The AU3 release also marks the upgrade of the core fabric of the APS appliance to Windows Server 2012 R2 and SQL Server 2014. With the upgrade to the latest versions of Microsoft’s flagship server operating system and core relational database engine, the APS appliance takes advantage of the improved networking, storage, and query execution components of these products. For example, the APS appliance now utilizes a virtualized Active Directory infrastructure, which helps reduce cost and increase domain reliability within the appliance, helping to make APS the price/performance leader in the big data appliance space.

APS on the Web

To learn more about the Microsoft Analytics Platform System, please visit us on the web at http://www.microsoft.com/aps

Azure Data Factory Now Integrates with Azure ML!


An update to Azure Data Factory (ADF) now integrates this service with Azure Machine Learning, allowing you to run finished Azure ML models from within ADF pipelines.

Click on this link for more details on how to take advantage of this feature. You can also visit the Azure Data Factory GitHub repository, where there’s an end-to-end Twitter analytics sample that takes advantage of this new integration capability.

ML Blog Team

Results are Beautiful: 3 Best Practices for Big Data in Healthcare


When you put big data to work, results can be beautiful. Especially when those results are as impactful as saving lives. Here are four best practice examples of how big data is being used in healthcare to improve, and often save, lives.

Aerocrine improves asthma care with near-real-time data

Millions of asthma sufferers worldwide depend on Aerocrine monitoring devices to diagnose and treat their disease effectively. But those devices are sensitive to small changes in ambient environment. That’s why Aerocrine is using a cloud analytics solution to boost reliability. Read more.

Virginia Tech advances DNA sequencing with cloud big data solution

DNA sequencing analysis is a form of life sciences research that has the potential to lead to a wide range of medical and pharmaceutical breakthroughs. However, this type of analysis requires supercomputing resources and Big Data storage that many researchers lack. Working through a grant provided by the National Science Foundation in partnership with Microsoft, a team of computer scientists at Virginia Tech addressed this challenge by developing an on-demand, cloud-computing model using the Windows Azure HDInsight Service. By moving to an on-demand cloud computing model, researchers will now have easier, more cost-effective access to DNA sequencing tools and resources, which could lead to even faster, more exciting advancements in medical research. Read more.

The Grameen Foundation expands global humanitarian efforts with cloud BI

Global nonprofit Grameen Foundation is dedicated to helping as many impoverished people as possible, which means continually improving the way Grameen works. To do so, it needed an ongoing sense of its programs’ performance. Grameen and Microsoft brought people and technology together to create a BI solution that helps program managers and financial staff: glean insights in minutes, not hours; expand services to more people; and make the best use of the foundation’s funding. Read more.

Ascribe transforms healthcare with faster access to information

Ascribe, a leading provider of IT solutions for the healthcare industry, wanted to help clinicians identify trends and improve services by supplying faster access to information. However, exploding volumes of structured and unstructured data hindered insight. To solve the problem, Ascribe designed a hybrid-cloud solution with built-in business intelligence (BI) tools based on Microsoft SQL Server 2012 and Windows Azure. Now, clinicians can respond faster with self-service BI tools. Read more.

Learn more about Microsoft’s big data solutions

Cumulative Update #5 for SQL Server 2014 RTM

Dear Customers, the 5th cumulative update release for SQL Server 2014 RTM is now available for download at the Microsoft Support site. To learn more about the release or servicing model, please visit the CU#5 KB Article: http://support.microsoft...(read more)

SQL Server Database Tooling Preview Release for the latest Azure SQL Database Update V12 (preview)


We are excited to announce the release of the Preview of Microsoft® SQL Server Database Tooling in Visual Studio for the latest Azure SQL Database Update V12 (preview) and the Microsoft® Data-Tier Application Framework Preview for the latest Azure SQL Database Update V12 (preview). Note that this preview is only available for English versions of Visual Studio 2013. As a pre-release version, this preview may not work the way a final version of the software will and does not support upgrade. Do not use this preview software in a live operating environment. For more information, refer to the License Terms installed by this preview software.

 

Get it 

Microsoft® SQL Server Database Tooling Preview for the latest Azure SQL Database Update V12 (preview) is available here.

Microsoft® Data-Tier Application Framework Preview for the latest Azure SQL Database Update V12 (preview) is available here.

 

What’s New 

This update adds support to SQL Server Database Tooling in Visual Studio for the latest Microsoft Azure SQL Database Update V12 (Preview): 

  • Improved reliability when targeting Basic and Standard Performance Tiers in Microsoft Azure SQL Databases. 

  • SQL Server Object Explorer supports creating, editing and browsing all database objects located in your Azure SQL Database, including support for the latest features added in Azure SQL Database Update V12 (Preview).  

  • SQL Server Database Project support for targeting Azure SQL Database Update V12 (preview). The project system supports build-time validation, compilation, refactoring, and support for incremental deployment to SQL Server databases. Now all the great features added in Azure SQL Database Update V12 can be modeled in your database project.


  • Ability to migrate existing on-premises databases to Azure SQL Database, including to databases supporting the latest Azure SQL Database Update V12 (preview) features, and to develop and maintain databases across both on-premises and cloud deployments. 

  • Schema Compare now has improved support for comparing databases, projects and dacpacs across different SQL Server platforms.  Script/update operations are deployable to a different target platform if all source objects are compatible with the target platform.  Comparison between on-premises and Azure SQL Databases is now allowed by default.   

  • Data Compare support for Azure SQL Database Update V12 (preview). Please note that using Data Compare against Azure SQL Databases will incur bandwidth charges, and as such, is not recommended for use with large databases. 

  • Support for the latest Azure SQL Database Update V12 (preview) when using the SQL Server Data-Tier Application Framework APIs. 

  Enhancements include: 

  • Improved working experience against Azure SQL Database Basic and Standard Editions with increased default CommandTimeout value. 

  • Ability to specify CommandTimeout value in SQLPackage.exe. Note that for deployments to Azure SQL Databases, if this value is lower than the new default timeout for Azure SQL Databases then the default value will be used instead. 

  • Ability to specify Azure SQL Database version during export. 

  • Ability to specify Azure SQL Database edition, service objective, and maximum size for publish and import actions. 

  • You will need to apply SQL Server 2014 CU5, available here, to be able to use the service objective and edition support in SQLPackage.exe.  

 

Installation Instructions 

This update can only be applied to an existing installation of Visual Studio 2013 Professional, Premium, Ultimate, Express for Web, or Express for Windows Desktop.  

  • Close any open instances of Visual Studio 2013 before installation 

  • Launch SSDTSetup.exe by either: 

    • Downloading the software from Download Center and starting the SSDTSetup.exe file from your download location, or 

    • Running SSDTSetup.exe directly from Download Center 

  • Follow the installation wizard to install 

  • If Visual Studio was open during installation, it must be restarted or errors will occur in the SQL Server database features 

 

An administrative install point can be created by running  

SSDTSetup.exe /layout <install location>  
where <install location> is the path at which you wish to create the administrative install point. 

 

Contact Us 

If you have any questions or feedback, please visit our forum or Microsoft Connect page. We look forward to hearing from you. 

 

 

 

[Announcement] WCF Data Services 5.6.3 RTM Tools Installer Release


We are happy to announce the release of WCF Data Services tooling version 5.6.3. It enables “Add Service Reference” to work with .NET 4.5.2 for consuming OData V1-3 services. 

Previously, when you tried to “Add Service Reference” for an OData V1-3 service in VS2012 or VS2013 with .NET 4.5.2, you would get the following error window with the message: “The custom tool ‘DataServicesCoreClientGenerator’ failed. Data service client code-generation failed: The element ‘DataService’ has an attribute ‘DataServiceVersion’ with an unrecognized version ‘3.0’...”

 

We have fixed this issue in WCF Data Services tooling 5.6.3. You can download it here and install it. Then “Add Service Reference” with an OData V1-3 service in VS2012 and VS2013 will work with .NET 4.5.2 and future 4.5.x versions. 

SQL Server 2014 Management Studio - updated support for the latest Azure SQL Database Update V12 (preview)

We are excited to announce enhanced SQL Server 2014 Management Studio (SSMS) support for Azure SQL database including the latest SQL Database Update V12 (preview). Bringing this functionality into SSMS provides an easy way to discover and better leverage...(read more)

Data Science Perspectives: Q&A with Microsoft Data Scientists Val Fontama and Wee Hyong Tok


You can’t read the tech press without seeing news of exciting advancements or opportunities in data science and advanced analytics. We sat down with two of our own Microsoft Data Scientists to learn more about their role in the field, some of the real-world successes they’ve seen, and get their perspective on today’s opportunities in these evolving areas of data analytics.

If you want to learn more about predictive analytics in the cloud or hear more from Val and Wee Hyong, check out their new book, Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes.

First, tell us about your roles at Microsoft?

 [Val] Principal Data Scientist in the Data and Decision Sciences Group (DDSG) at Microsoft

 [Wee Hyong] Senior Program Manager, Azure Data Factory team at Microsoft

 And how did you get here? What’s your background in data science?

[Val] I started in data science over 20 years ago when I did a PhD in Artificial Intelligence. I used Artificial Neural Networks to solve challenging engineering problems, such as the measurement of fluid velocities and heat transfer. After my PhD, I applied data mining in the environmental science and credit industry: I did a year’s post-doctoral fellowship before joining Equifax as a New Technology Consultant in their London office. There, I pioneered the application of data mining to risk assessment and marketing in the consumer credit industry. I hand coded over ten machine learning algorithms, including neural networks, genetic algorithms, and Bayesian belief networks in C++ and applied them to fraud detection, predicting risk of default, and customer segmentation.    

[Wee Hyong] I’ve worked on database systems for over 10 years, from academia to industry. I joined Microsoft after I completed my PhD in data streaming systems. When I started, I worked on shaping the SSIS server from concept to release in SQL Server 2012. I was super passionate about data science even before joining Microsoft: prior to joining, I wrote code to integrate association rule mining into a relational database management system, allowing users to combine association rule mining queries with SQL queries. I was also a SQL Server Most Valuable Professional (MVP), running data mining boot camps for IT professionals in Southeast Asia and showing how to transform raw data into insights using the data mining capabilities in Analysis Services.

What are the common challenges you see with people, companies, or other organizations who are building out their data science skills and practices?

[Val] The first challenge is finding the right talent. Many of the executives we talk to are keen to form their own data science teams but may not know where to start. First, they are not clear what skills to hire – should they hire PhDs in math, statistics, computer science or other? Should the data scientist also have strong programming skills? If so, in what programming languages? What domain knowledge is required? We have learned that data science is a team sport, because it spans so many disciplines including math, statistics, computer science, etc. Hence it is hard to find all the requisite skills in a single person. So you need to hire people with complementary skills across these disciplines to build a complete team.

The next challenge arises once there is a data science team in place – what’s the best way to organize this team? Should the team be centralized or decentralized? Where should it sit relative to the BI team? Should data scientists be part of the BI team or separate? In our experience at Microsoft, we recommend having a hybrid model with a centralized team of data scientists, plus additional data scientists embedded in the business units. Through the embedded data scientists, the team can build good domain knowledge in specific lines of business. In addition, the central team allows them to share knowledge and best practices easily. Our experience also shows that it is better to have the data science team separate from the BI team. The BI team can focus on descriptive and diagnostic analysis, while the data science team focuses on predictive and prescriptive analysis. Together they will span the full continuum of analytics.

The last major challenge I often hear about is the actual practice of deploying models in production. Once a model is built, it takes time and effort to deploy it in production. Today many organizations rewrite the models to run on their production environments. We’ve found success using Azure Machine Learning, as it simplifies this process significantly and allows you to deploy models to run as web services that can be invoked from any device.

[Wee Hyong] I also hear about challenges in identifying tools and resources to help build these data science skills. There are a significant number of online and printed resources covering a wide spectrum of data science topics – from the theoretical foundations of machine learning to practical applications. One of the challenges is navigating this sea of resources and selecting the right ones to help them begin.

Another challenge I have often seen is identifying and figuring out the right set of tools to model a given predictive analytics scenario. Once they have figured out the right set of tools, it is equally important for people and companies to be able to easily operationalize the predictive analytics solutions they have built, to create new value for their organization.

What is your favorite data science success story?

[Val] My two favorite projects are the predictive analytics projects for ThyssenKrupp and Pier 1 Imports. I’ll speak today about the Pier 1 project. Last spring my team worked with Pier 1 Imports and their partner, MAX451, to improve cross-selling and upselling with predictive analytics. We built models that predict the next logical product category once a customer makes a purchase. Based on Azure Machine Learning, this solution will lead to a much better experience for Pier 1 customers.

[Wee Hyong] One of my favorite data science success stories is how OSIsoft collaborated with the Carnegie Mellon University (CMU) Center for Building Performance and Diagnostics to build an end-to-end solution that addresses several predictive analytics scenarios. With predictive analytics, they were able to solve many of their business challenges, ranging from predicting energy consumption in different buildings to fault detection. The team was able to effectively operationalize the machine learning models built using Azure Machine Learning, which led to better energy utilization in the buildings at CMU.

What advice would you give to developers looking to grow their data science skills?
[Val] I would highly recommend learning multiple subjects: statistics, machine learning, and data visualization. Statistics is a critical skill for data scientists that offers a good grounding in correct data analysis and interpretation. With good statistical skills we learn best practices that help us avoid pitfalls and wrong interpretation of data. This is critical because it is too easy to unwittingly draw the wrong conclusions from data. Statistics provides the tools to avoid this. Machine learning is a critical data science skill that offers great techniques and algorithms for data pre-processing and modeling. And last, data visualization is a very important way to share the results of analysis. A good picture is worth a thousand words – the right chart can help to translate the results of complex modeling into your stakeholder’s language. So it is an important skill for a budding data scientist.

[Wee Hyong] Be obsessed with data, and acquire a good understanding of the problems that can be solved by the different algorithms in the data science toolbox. It is a good exercise to jumpstart by modeling a business problem in your organization where predictive analytics can help to create value. You might not get it right in the first try, but it’s OK. Keep iterating and figuring out how you can improve the quality of the model. Over time, you will see that these early experiences help build up your data science skills.

Besides your own book, what else are you reading to help sharpen your data science skills?

[Val] I am reading the following books:

  • Data Mining and Business Analytics with R by Johannes Ledolter
  • Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems) by Ian H. Witten, Eibe Frank, and Mark A. Hall
  • Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die by Eric Siegel

[Wee Hyong] I am reading the following books:

  • Super Crunchers: Why Thinking-By-Numbers Is the New Way to Be Smart by Ian Ayres
  • Competing on Analytics: The New Science of Winning by Thomas H. Davenport and Jeanne G. Harris.

Any closing thoughts?

[Val] One of the things we share in the book is that, despite the current hype, data science is not new. In fact, the term data science has been around since 1960. That said, I believe we have many lessons and best practices to learn from other quantitative analytics professions, such as actuarial science. These include the value of peer reviews, the role of domain knowledge, etc. More on this later.

[Wee Hyong] One of the reasons that motivated us to write the book is we wanted to contribute back to the data science community, and have a good, concise data science resource that can help fellow data scientists get started with Azure Machine Learning. We hope you find it helpful. 

EF6.1.2 RTM Available


Today we are pleased to announce the availability of EF6.1.2. This patch release includes a number of high priority bug fixes and some contributions from our community.

 

What’s in EF6.1.2?

EF6.1.2 is mostly about bug fixes; you can see a list of the fixes included in EF6.1.2 on our CodePlex site.

We also accepted a couple of noteworthy changes from members of the community:

  • Query cache parameters can be configured from the app.config/web.config file
  • SqlFile and SqlResource methods on DbMigration allow you to run a SQL script stored as a file or embedded resource.

 

Where do I get EF6.1.2?

The runtime is available on NuGet. Follow the instructions on our Get It page for installing the latest version of Entity Framework runtime.

The tooling is available on the Microsoft Download Center. You only need to install the tooling if you want to create models using the EF Designer, or generate a Code First model from an existing database.

 

Thank you to our contributors

We’d like to say thank you to folks from the community who have contributed to the 6.1.2 release so far:

  • BrandonDahler
  • ErikEJ
  • Honza Široký
  • martincostello
  • UnaiZorrilla

 

What’s next?

In addition to working on the next major version of EF (Entity Framework 7), we’re also working on another update to EF6. This update is tentatively slated to be another patch release (EF6.1.3); we are working on a series of bug fixes and accepting pull requests.

WIRED: How Skype Used AI to Build Its Amazing New Language Translator


Re-post of an article that recently appeared on WIRED.

“… a new Microsoft technology that seems borrowed from the world of Star Trek

“… a Skype add-on that listens to the English words you speak into Microsoft’s internet phone-calling software and translates them into Spanish, or vice versa.”

“… an amazing technology, and it’s based on work that’s been going on quietly inside Microsoft’s research and development labs for more than a decade.”

Read the original WIRED magazine post here.

ML Blog Team


Readers’ Choice – Our 10 Most Popular ML Blog Posts of 2014


We launched this blog in June 2014 with the intent of sharing important advances and practical knowledge accumulated by Microsoft in the field of ML. After six months of regular posts, many of them authored by world-leading ML researchers and practitioners, we are seeing tens of thousands of readers such as yourself regularly visiting our blog site where, we hope, you are finding articles of value and relevance to your own ML journeys.

As we take one final look back at the year 2014, we figured we would share the top 10 most-read posts of 2014. Here they are, listed below, in increasing order of popularity.

10. Machine Learning, meet Computer Vision
Jamie Shotton, Antonio Criminisi and Sebastian Nowozin explore the challenges of computer vision and touch on the powerful ML technique of decision forests for pixel-wise classification.

9. Python Tools for Visual Studio now integrates with Azure Machine Learning
Shahrokh Mortazavi talks about Python support in Azure ML, including the powerful Python centric Data Science IDE, PTVS – a completely free and open source tool that is helping democratize ML and advanced analytics. 

8. Vowpal Wabbit for Fast Learning
John Langford shares information about the speedy VW open source ML system sponsored by Microsoft.

7. Machine Learning and Text Analytics
Dr. Ashok Chandra talks about how we are now able to take advantage of signals to determine the salient entities being discussed in textual articles.

6. The Joy (and Hard Work) of Machine Learning
Joseph Sirosh discusses how enterprises can tap into the potential of ML to deliver enormous value in diverse applications that can improve customer experience, reduce the risk of systemic failures, grow revenue and bring about significant cost savings.

5. Machine Learning Trends from NIPS 2014
John Platt shares 3 exciting trends he saw at the Neural Information Processing Systems (NIPS) 2014 conference in Montreal this year.

4. What is Machine Learning?
John Platt provides some much-needed context around ML and also shares a taxonomy of ML applications.

3. Twenty Years of Machine Learning at Microsoft
John Platt discusses Microsoft’s 20+ years of experience in creating ML systems and applying them to real world problems, including what it takes to actually deploy ML in production.

2. How Azure ML Partners are Innovating for their Customers
At the Worldwide Partner Conference, Joseph Sirosh talks about how Azure ML – which is changing the game for building ML applications at scale and in the cloud – is being used by Microsoft’s partners to rapidly build novel solutions for our customers.

1. Rapid Progress in Automatic Image Captioning
John Platt talks about the exciting progress researchers have made in creating systems to automatically generate descriptive captions of images.

 

We wish our readers a very happy and productive 2015!   

ML Blog Team

[Tutorial & Sample] ODataLib Custom Payload Format


This post guides you through the custom payload format support in ODataLib, introduced in the ODataLib 6.9.0 release.

All the sample code in this post is available in the ODataSamples project; you can find it under the Scenarios\CustomFormat folder.

Background

The OData specification defines several payload formats for data exchange. ODataLib has built-in support for the JSON format (see the OData JSON Format spec). But there are cases where someone building a RESTful service following OData conventions wants to use a custom payload format other than JSON. For example, assume we have an existing service that uses a custom data format understood by existing clients. If we change the service to OData, we may prefer to keep the current data format, so that while taking advantage of OData features, existing clients can still consume the data returned by the service, or generate requests that the service can read.

The OData library was designed to support various payload formats; however, some of those read/write APIs were not publicly visible. In the 6.9.0 release, we made some of those APIs public, so that users are able to write out payloads in a custom format.

In the following sections, we’ll first look at the overall architecture of ODataLib’s reader/writer components and the media type resolving process. Then I will demonstrate how to write a custom payload extension for the CSV (comma-separated values) format.

Reader/Writer Overview

Here is a figure describing the main classes used by ODataLib’s reader and writer components.

 


As you can see, ODataLib’s reader and writer share a similar structure. The main entry point classes are ODataMessageReader and ODataMessageWriter. They both take a message and a settings class as input. When the user calls a read/write API, ODataMessageReader/ODataMessageWriter internally figures out the proper payload format to use, then performs the corresponding input/output actions using that format.

Let’s take a look at what happens when a user writes an entry in a response payload:

  1. The user prepares an OData response message, sets the header information, and then creates an ODataMessageWriter;
  2. The user calls the CreateODataEntryWriter method on ODataMessageWriter and gets a format-specific ODataWriter;
  3. The user calls writing actions on the ODataWriter, such as StartEntry, EndEntry, etc.

In step 2, when creating the format-specific ODataWriter, ODataMessageWriter first figures out which payload format to use; we call this media type resolving. The result is an ODataFormat instance. ODataMessageWriter then calls the CreateOutputContext method on the ODataFormat to get the format-specific output context, and then calls the corresponding method on that context (CreateEntryWriter in this case).

Media Type Resolving

For media type resolving, the input is the content-type header from the ODataMessage, or instructions from settings (for writing only), and the output is an ODataFormat instance. The ODataFormat represents the actual payload format, and it is the key to decoupling the various formats from the reader/writer APIs. The ODataMediaType class represents a particular media type, and ODataMediaTypeFormat binds an ODataFormat to an ODataMediaType. Finally, the MediaTypeResolver class is responsible for choosing the correct media type.

The MediaTypeResolver class contains the following method:

public virtual IEnumerable<ODataMediaTypeFormat> GetMediaTypeFormats(ODataPayloadKind payloadKind)

This API returns the media types available for a given payload kind. Internally, ODataLib’s reader and writer first call this method for a certain payload kind to get the list of supported ODataMediaTypeFormat instances, then choose the best match based on the media type information from the request message.

In general, the default implementation of MediaTypeResolver returns the JSON format for data requests and the XML format for metadata requests. Derived classes can override this method to provide their own behavior. Thus, users can provide a custom payload format by overriding it and returning the expected ODataMediaTypeFormat. We’ll demonstrate how to write a custom MediaTypeResolver in the following section.

Implementing a CSV Format extension

Here we’ll demonstrate how to implement an extension that supports writing out CSV format payloads. Our goal is to write an ODataEntry in CSV format, with each property in its own column. Please make sure your project has the Microsoft.OData.Core 6.9.0 package installed from the NuGet gallery before getting started.

At first, we will implement the class CsvWriter and CsvOutputContext.

  • The class CsvWriter derives from ODataWriter; to simplify the demo, we omit some validation logic and the async API implementations here.
  • The class CsvOutputContext derives from ODataOutputContext and acts as a bridge between ODataMessageWriter and the specific writer; we'll return CsvWriter instances from the CreateODataEntryWriter/CreateODataFeedWriter methods here.

 

Then we can write our own CsvFormat class, which is quite simple. Please note that we omit the implementations of the other abstract methods here.

 

At last, we can implement the MediaTypeResolver for CSV; here we’ll bind the media type ‘text/csv’ to our CsvFormat.

 

Here is a usage demo for writing an OData feed in CSV format. Please note that we use almost the same code logic as for writing a JSON payload. The only difference is that we set our CsvMediaTypeResolver on the MessageWriterSettings, while the message’s content-type header is set to the CSV format.

 

Here is the output:

Id,Name,
51,Name_A,
52,Name_B,

 

In our ODataSamples project, we have a Web API service project that supports both CSV and VCard formats; you can check it out here.

Azure ML Predicts Customers’ Shopping Lists – Even Before They Shop!


We continue our series of posts on how Microsoft customers are gaining actionable insights from their data by operationalizing ML and advanced analytics – at scale and in the cloud.

As one of the largest independent food delivery service companies in the UK, JJ Food Service provides over 60,000 customers with everything they need for their own food businesses. Their catalog has over 4,500 products ranging from fresh, frozen, or dry foods to paper and cleaning supplies, and orders are fulfilled from any of eight warehouses.

Customers can either place orders online or by speaking to call center representatives over the phone. As orders come in each day, logistics teams route and sequence these orders, employees at warehouses then load the appropriate products overnight, and drivers hit their delivery routes the next morning – and the same cycle repeats all over again.

Although the existing processes at JJ Food Service are quite streamlined, as a company that prides itself on staying at the cutting-edge of technology, their ambitions ran much further.

Back in 2004, JJ Food Service implemented Microsoft Dynamics for their ERP and CRM needs. Over the past decade, they refined their operations and Microsoft Dynamics AX now powers their entire operations – right from HR, procurement and sales to warehouse management and order processing.

Recognizing that they had an exceptionally rich vein of customer data, the Chief Operating Officer at JJ Food Service, Mushtaque Ahmed, saw an opportunity to use this data to further boost customer satisfaction. One area where they felt they could save their customers some time was anticipating customer orders, i.e., recommending products to them even before they had entered anything into the system. They had several other ideas for predictive analytics too. At the same time, the company was concerned about the potentially big costs they might incur in staffing up and implementing an advanced analytics project such as this.

That’s when Azure ML entered the picture. 

Predictive Shopping Lists

Customer orders at JJ Food Service, of course, vary widely in terms of what gets purchased and when, order size, type, frequency and many other criteria. In anticipating customers’ future needs, what they needed were tailored insights based on each customer’s past order patterns. For instance, a particular restaurant might order salad greens every day, flour about every two weeks, and cooking oil once a month. “To be successful, we needed to be relevant for that week, that day, that exact point in time,” Ahmed explained.

JJ Food Service was convinced that Azure ML could help them address their needs in a very cost-effective manner. They started working with the Microsoft Azure team, first writing code for their website to capture customer behavior and then using three years of transactional data to train an Azure ML predictive model. Next, they integrated the recommendations from this model into both their call center environment and their website, thus ensuring that their phone-based customers would get the exact same recommendations (via call center representatives) as what online customers would see on their site.

The system took only three months to implement. Today, whether customers call in or log in, the system bubbles up the same predictions using its analysis of past purchases – in both cases, the order pad gets filled out in the same fashion, and automatically.

The net result? More satisfied customers who find a high level of efficiency in their shopping experience.

Recommendations Add a More Personal Touch

In addition to the predictive shopping list, customers also get recommendations for related items that they might want to order. For instance, if a fish and chips shop were to order batter, the system might ask whether they need specific spices that go along with that. Also, prior to checkout, the system reviews the overall order to determine whether the combination of items shopped indicates a need for additional products. For instance, if a fast-food restaurant orders meat, poultry, vegetables and beverages, would it also need cooking oil? Or perhaps paper cups, if their supply might be running low?

JJ Food Service estimates that these recommendations currently make up about 5% of the shopping cart. While that may not seem large – and, in fact, Ahmed expects this number to go down a bit as the system gets smarter at predicting orders even more accurately – when you consider the company’s size, this really adds up. Plus it’s a nice personal touch for customers. As Ahmed says, “The wow factor is huge. Customers are amazed that we can predict so accurately what they need.”

Targeting New Customers More Effectively

JJ Food Service realizes that there’s no better way to capture business from new customers than by making themselves indispensable from the very moment they log on.

By using the Azure ML recommendation system to display products purchased by similar companies, they are now able to show immediate value to brand new customers, shaving valuable time that would otherwise be spent in browsing a new catalog and compiling orders for the first time.

At JJ Food Service this is just the start of a journey. They are looking at additional possibilities beyond increasing customer satisfaction and driving incremental sales. For instance, they plan to stock their warehouses more efficiently by using forecasts of what customers, in aggregate, are likely to buy in the near future. They are also exploring how to use the recommendation system for promotions and to target new product launches at specific types of customers.

As Ahmed concludes: “Microsoft Dynamics AX works hard for us, automating processes. But we also need to make these processes intelligent – and that’s where Microsoft Azure Machine Learning is vital.”

ML Blog Team
Subscribe to this blog. Follow us on Twitter.


[Announcement] RESTier - A turn-key solution to build OData services


What is RESTier

RESTier is a RESTful API development framework for building standardized, OData V4-based REST services on .NET. It can be seen as middleware on top of Web API OData. RESTier was built to combine the simplicity of WCF Data Services with the flexibility of Web API OData.

The main exciting features of RESTier are:

  • Helps developers quickly build an OData service within minutes: you need just one controller and no more than 100 lines of code to bootstrap an OData service. 
  • Helps developers easily add business logic to their services.

What about ASP.NET Web API OData?

As mentioned above, RESTier is based on Web API OData. Web API OData will continue to be improved, and RESTier will benefit from those improvements.

Getting started

The getting-started tutorials below show you how to use RESTier step by step.

Sample 1: Getting started - basic shows you how to use RESTier to build an OData V4 RESTful service within minutes.

Sample 2: Getting started - advanced builds on the basic tutorial with more advanced scenarios, showing you how to add rich business logic to the RESTful service and how to fall back to Web API OData to enable complex features not yet directly supported by RESTier.

Document and more samples

RESTier is intended to be fully open source; the source code will be available on GitHub soon.

  • GitHub repository. We use GitHub to track issues; you can report bugs and provide improvement suggestions directly on GitHub.
  • RESTier wiki. Detailed documentation and samples are available here.

All kinds of contribution and feedback are warmly welcomed.

Please note

  • RESTier is still at a preview stage.
  • RESTier currently only supports Entity Framework data provider. Other data providers will be added in the future.

Advice or suggestion

There are some issues with the commenting system of the MSDN blog. For any advice or suggestions, please send mail to odatafeedback@microsoft.com or open a GitHub issue on the repository listed above.

The commenting system is not working


Dear OData blog readers,

The commenting system is not working. We have reached out to the MSDN support team, and the rough fix timeline is early February.

In the meantime, for any advice or suggestions, please send mail to odatafeedback@microsoft.com or open GitHub issues at our project repositories on GitHub.

Best regards,

The OData team

Perspectives from Microsoft Data Scientists Val Fontama and Wee Hyong Tok


Repost of an article earlier published on the SQL Data Platform Insider blog. 

In an earlier post we talked about a new book titled Predictive Analytics with Microsoft Azure Machine Learning, which was released in December on Amazon.com, where it has been doing rather well.

The Data Platform Insider blog team recently had an opportunity to sit down with two of the book’s Microsoft authors to learn more about their roles as data scientists, some of the real-world successes they’ve seen, and their perspectives on opportunities in this evolving field, as well as about their new book. Click here to hear directly from authors Val and Wee Hyong.

ML Blog Team

 

 

How to consume SQL Spatial Data with Web API V2.2 for OData V4


Introduction

Today, with the increasing demands of Location-Based Services (LBS), it is becoming more and more important to provide functionality over SQL Spatial Data through a unified, robust, and scalable service based on a standard protocol.

This post is a tutorial on how to consume SQL Spatial Data through EF & Web API V2.2 for OData V4. The method in this post is similar to the one in How to use SQL Spatial Data with WCF ODATA Spatial, but the latter is based on WCF Data Services. For more information about how to create a simple OData V4 service based on Web API, please refer to Create an OData v4 Endpoint Using ASP.NET Web API 2.2.

Ok, let’s get started.

Overview of Spatial Data types

First of all, let’s quickly review the Spatial Data types in both OData and SQL. OData defines eight Spatial Data types each for geography and geometry. Both sets are implemented in the Microsoft.Spatial namespace of the corresponding OData library. The CLR classes for SQL Spatial Data types, however, are defined in the System.Data.Spatial namespace, in which the DbGeography class represents the geography Spatial Data type and the DbGeometry class represents the geometry Spatial Data type.

In order to make the SQL Spatial Data types work with Web API, we need a mapping between them. Below are the Spatial Data types in OData and their mapping to the SQL Spatial Data types.

OData & SQL Spatial Data Types and the Mapping

Geography                     | Geometry                     | SQL Spatial Data Type
Edm.Geography                 | Edm.Geometry                 | ~
Edm.GeographyPoint            | Edm.GeometryPoint            | Point
Edm.GeographyLineString       | Edm.GeometryLineString       | LineString
Edm.GeographyPolygon          | Edm.GeometryPolygon          | Polygon
Edm.GeographyMultiPoint       | Edm.GeometryMultiPoint       | MultiPoint
Edm.GeographyMultiLineString  | Edm.GeometryMultiLineString  | MultiLineString
Edm.GeographyMultiPolygon     | Edm.GeometryMultiPolygon     | MultiPolygon
Edm.GeographyCollection       | Edm.GeometryCollection       | GeometryCollection

 

Create database with Spatial Data types

Code First

Let’s define a simple CLR class and use Code First to perform the database access.

Here, Location describes a point and Line describes a LineString.

Database context

Based on the above class, we can define a class derived from DbContext to represent a connection to the database, through which we can create, query, update, and delete data.

For simplicity, we create the following sample “CustomerGeoContext” table for testing; a sketch of its shape follows.
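As a rough T-SQL sketch, the generated table and a sample row might look like the following (the names and values are hypothetical, not from the original post):

-- Hypothetical shape of the table, with geography columns for Location and Line
CREATE TABLE [dbo].[Customers]
(
	[Id] INT NOT NULL PRIMARY KEY,
	[Name] NVARCHAR(50) NULL,
	[Location] GEOGRAPHY NULL,	-- a point
	[Line] GEOGRAPHY NULL		-- a LineString
);

INSERT INTO [dbo].[Customers] ([Id], [Name], [Location], [Line])
VALUES (2, N'Sam',
	geography::Point(47.6, -122.3, 4326),
	geography::STGeomFromText('LINESTRING(-122.36 47.65, -122.34 47.60)', 4326));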

Apply the OData Spatial Data types

Wrapping the Spatial Data types

We use explicit and implicit operators to define wrappers by which SQL Spatial Data types can be converted to and from ODL Spatial Data types. Here, we provide wrappers for Point and LineString; users can easily extend them to convert other Spatial Data types.

//GeographyWrapper

//GeographConvert

Change the Model

Based on the wrapper, we can change the class to apply the OData Spatial Data types:

Here, EdmLocation and EdmLine are newly added properties, marked with NotMappedAttribute to exclude them from database mapping.

Build Edm Model

Now, we can use the model builder to build the Edm model.

Here, the two Ignore() calls are necessary to exclude the DbGeography type from the final Edm model.

Expose Metadata document

Once the Edm model is built, we can query the metadata document as:

From the metadata document, we can find that:

  1. Customer is an entity type with four properties.
  2. Location and Line are OData Spatial Data types.

Consume SQL Spatial Data

It’s time to query the SQL Spatial Data through Web API OData service.

Build OData Controller

Let’s build a Web API convention controller in which we provide basic query functionality.

Expose single entity

Let’s look at an example of querying a single entity with spatial data.

Request:
GET ~/Customers(2)

Here's the response: 

Thanks.

Addressing Fairness, Accountability, and Transparency in Machine Learning


Hanna Wallach is a Microsoft ML researcher based out of New York City and also serves as a faculty member in the Computational Social Science Initiative at the University of Massachusetts Amherst.

Hanna recently gave a talk on the topic of Fairness, Accountability, and Transparency in ML at NIPS 2014.

You can read a transcript of her talk by clicking here or on the image below.

ML Blog Team
