Channel: Data Platform

MSBuild support for Schema Compare is available


Schema compare is one of the most important Visual Studio SQL Server tooling components. As of our July release the schema compare functionality is available via MSBuild. It can be run from the command line or as an integrated part of automated project build systems to detect changes and generate reports.

Supported versions

  • SQL Server 2005, 2008, 2008 R2, 2012, 2014, and Microsoft Azure SQL Databases
  • Dacpac files targeting SQL Server 2005, 2008, 2008 R2, 2012, 2014, and Microsoft Azure SQL Databases

Installation and Usage

Environment Setup

Schema Comparison requires a supported version of Visual Studio. This version must include the latest version of the SQL Server tooling. Note that this is required in all cases, whether running on a local machine or on a build server.

Functionality

The core schema compare engine has been totally redesigned in this latest release. One benefit is improved functionality and configurability. MSBuild integration was at the top of our list for new functionality and we’re very happy to add this much-requested feature.

You can now embed schema compare into your daily build process and easily trace your schema changes. Databases and .dacpac files are supported and all the settings and options you are familiar with are included. Two report formats are supported in this version, plain text and XML. The report contains exactly the same information you would see in Visual Studio. You can even customize your report by providing your own XSD when generating an XML report.

Supported Features

Participant Type (Source/Target for Schema Comparison)    Supported?
Database                                                  Yes
Dacpac                                                    Yes
Project                                                   No. The generated .dacpac file from a build must be used instead.

 

Action                       Supported?
Generate Text Report         Yes
Generate XML Report          Yes
Update Database              Yes
Generate DB Update script    No
Update Project               No (Projects not directly supported in this release)

Sample: Running Schema Compare from the command line

Note:

Because Schema Compare is run via MSBuild, a valid project file is required. This can be a .sqlproj file, in which case all necessary targets are already defined, but it can also be any project file as long as it imports the SSDT targets file. If you are using a .sqlproj file, remember that MSBuild defaults to the Visual Studio version associated with the .NET Framework on your machine. If you have both Visual Studio 2012 and Visual Studio 2013 installed, running MSBuild from the command line will default to the SQL Server Data Tools components installed inside Visual Studio 2012. To overcome this, add /p:VisualStudioVersion=12.0 to your MSBuild statement if you wish to run using the SQL Server Data Tools components installed inside the VS2013 install directory. Here is the simplest possible project file you might need. Creating a “MinSchemaCompare.proj” file and copying the following into it allows you to run schema compare against any of the supported targets:

xml version="1.0" encoding="utf-8"?><Project DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="4.0"><Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')"/><PropertyGroup><VisualStudioVersion>12.0VisualStudioVersion>PropertyGroup><Import Project="$(SqlTaskTargetPath)\Microsoft.Data.Tools.Schema.SqlTasks.targets"/>Project>

If you are not working on a SQL Project, make sure you set VisualStudioVersion and have Microsoft.Data.Tools.Schema.SqlTasks.targets imported in your project file. You can find it under %ProgramFiles(x86)%\MSBuild\Microsoft\VisualStudio\{VisualStudioVersion}\SSDT.

Here is what the command looks like:

msbuild "d:\sample.csproj" /t:SqlSchemaCompare /p:source="d:\source.dacpac" /p:target="d:\target.dacpac" /p:XmlOutput="d:\1.xml" /p:XsdPath="d:\SampleXsd.xsd"

This command line compares source and target .dacpac files and generates the comparison result to an XML file based on the XML schema provided.

If you don’t want to write long command line strings (especially when you are comparing databases), you can save a .scmp file and use it in the schema compare MSBuild task as follows

D:\SampleProject > msbuild /t:SqlSchemaCompare /p:SqlScmpFilePath="d:\sc.scmp" /p:XmlOutput="d:\1.xml" /p:Deploy="true"

Note that schema compare uses the default XSD file if one is not specified.

The settings inside the .scmp file can be overridden by specifying values on the command line.

C:\SampleProject > msbuild /t:SqlSchemaCompare /p:SqlScmpFilePath="d:\sc.scmp" /p:target="d:\target.dacpac" /p:TextOutput="d:\1.out" /p:Deploy="true"

Schema Compare Parameters

Source and Target

Database: The input is the connection string

.dacpac file: The input is the file location.
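For example, a hypothetical run that compares a live database (identified by its connection string) against a .dacpac file and writes a plain text report might look like the following; the server name, database name, and file paths are placeholders:

msbuild "d:\MinSchemaCompare.proj" /t:SqlSchemaCompare /p:source="Data Source=MyServer;Initial Catalog=SourceDb;Integrated Security=True" /p:target="d:\target.dacpac" /p:TextOutput="d:\report.out"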

Schema Compare Options

There are a lot of comparison and deployment settings; see the Default Settings section at the end for a list with default values. Here is how some of the buttons in the Schema Compare toolbar map to command-line arguments.

 

Parameter                  Supported?                               Default Value               Note
Deploy                     Yes                                      False                       A .dacpac file is not deployable.
GroupBy                    Yes                                      Action
Timeout                    Yes                                      5*60*1000 ms (5 minutes)
ShowEqualObjects           Will be supported in a future release
ShowUndeployableObjects    Will be supported in a future release

Output Format
  • Plain text output (Indicated by parameter TextOutput)

The default output format for each difference entry (DiffEntry) is: entry name, entry type, source value, update type, target value. To illustrate the output, let’s look at a difference in the Visual Studio UI and see how this will be output by the text formatter:

Here is the text-format comparison output corresponding to what we see in the UI.


by action(Root): NotSpecified
  Change(Folder): Change
    Table(TopLevelElement): dbo.Table(Table) Change dbo.Table(Table)
      Columns(Folder): Delete
        Column(Element): Delete dbo.Table.c1
          Properties(Folder): Delete
            Collation(Property): Delete null
            IdentityIncrement(Property): Delete 1
            IdentityIsNotForReplication(Property): Delete False
            IdentitySeed(Property): Delete 1
            IsFileStream(Property): Delete False
            IsIdentity(Property): Delete False
            IsMax(Property): Delete False
            IsNullable(Property): Delete True
            IsRowGuidColumn(Property): Delete False
            IsSparse(Property): Delete False
            Length(Property): Delete 0
            Precision(Property): Delete 0
            Scale(Property): Delete 0

When running from the command line it’s possible to generate extra information that isn't shown in the UI. This is configurable using the following command line parameters:

OutputOrdering (default False): If set to true, the output shows OrderChanged, Ordinal, ContainsOrderChanged, SiblingOrderChanged in order.

OutputRefactoring (default False): If set to true, the output shows Refactored and ChildRefactored in order.

OutputInclusionState (default False): If set to true, the output shows InclusionState. This corresponds to the checkbox on each entry line in the UI, which indicates whether this difference will be included in future deployments.

OnlyTopLevelItems (default False)

IgnorePropertyFileNameValue (default False): If set to true, the value of the property named “filename” will be set to the default fixed value “File_Name”. This is useful if the file name keeps changing for every compare operation and you want to ignore this difference.

If you choose to show all information of a DiffEntry, the sequence will be: Default output values, Ordering values, Refactoring values, Inclusion values.

  • XML file output (Indicated by parameter XmlOutput)

Unlike the plain text format, which is compact but not easy to read, the XML format is more human readable and easier to manipulate. We provide a default XML schema file as well as a Common Types XSD in case you wish to create your own custom report format.

 

Option              Parameter    Note
Use default XSD     N/A          You don’t need to specify the XSD file.
Use your own XSD    XsdPath      Indicates the path of your XSD file. Multiple paths are separated by semicolons.

We pre-define a few types corresponding to each piece of DiffEntry information, such as SourceValueType, TargetValueType, and InclusionStateType. For example, the schema compare task populates the InclusionState value when it finds an attribute or element associated with the InclusionStateType type. You can find the pre-defined types in %ProgramFiles(x86)%\MSBuild\Microsoft\VisualStudio\{version}\SSDT\Microsoft.Data.Tools.Tasks.SchemaCompare.CommonTypes.xsd.

At the bottom of this file, there are four types you need to extend, besides creating a ResultType element.


You have the flexibility to create your own format but you also need to follow some rules. Results have a hierarchical layout format, whether in the UI or when output on the command line. This is something that must be preserved in any customized XSD you create – the XML also needs to have the same hierarchy, which means the root is on top followed by the group, with DiffEntry nested inside the group.

Here is a sample XSD:

xml version="1.0" encoding="utf-8"?><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://schemas.microsoft.com/SSDT/SqlTasks/SchemaCompare.xsd" xmlns="http://schemas.microsoft.com/SSDT/SqlTasks/SchemaCompare.xsd"><xs:redefine schemaLocation="Microsoft.Data.Tools.Tasks.SchemaCompare.CommonTypes.xsd"><xs:complexType name="DiffEntryType"><xs:complexContent><xs:extension base="DiffEntryType"><xs:sequence><xs:element name="MyResult"><xs:complexType><xs:sequence><xs:element name ="MySource" type="SourceValueType"/><xs:element name ="MyTarget" type="TargetValueType"/>xs:sequence>xs:complexType>xs:element><xs:element name="MyOrderChanged" type ="OrderChangedType"/><xs:element name="MyOrdinal" type ="OrdinalType"/><xs:element name="MySiblingOrderChanged" type ="SiblingOrderChangedType"/><xs:element name="MyRefactored" type ="RefactoredType"/><xs:element name="MyChildRefactored" type ="ChildRefactoredType"/><xs:element name="MyInclusionState" type ="InclusionStateType"/><xs:element name="MyChildren" type="ChildrenEntiesType"/>xs:sequence><xs:attribute name="MyUpdate" type="UpdateCategoryType"/><xs:attribute name="MyName" type="DisplayNameType"/><xs:attribute name="MyType" type="EntryTypeType"/>xs:extension>xs:complexContent>xs:complexType><xs:complexType name="ChildrenEntiesType"><xs:complexContent><xs:extension base="ChildrenEntiesType"><xs:sequence><xs:element name="MyEntry" type="DiffEntryType" minOccurs="0" maxOccurs="unbounded"/>xs:sequence>xs:extension>xs:complexContent>xs:complexType><xs:complexType name="GroupType"><xs:complexContent><xs:extension base="GroupType"><xs:sequence><xs:element name="MyEntry" type="DiffEntryType" minOccurs="0" maxOccurs="unbounded"/>xs:sequence><xs:attribute name="Value" type="GroupByVauleType"/>xs:extension>xs:complexContent>xs:complexType><xs:complexType name="ResultType"><xs:complexContent><xs:extension base="ResultType"><xs:sequence><xs:element name="MyGroup" type="GroupType" minOccurs="0" maxOccurs="unbounded"/>xs:sequence><xs:attribute name="MyGroupBy" type="GroupByCategoryType"/>xs:extension>xs:complexContent>xs:complexType>xs:redefine><xs:element name="Result" type="ResultType"/>xs:schema>

The output looks like:

<Result MyGroupBy="by action">
  <MyGroup Value="Change">
    <MyEntry MyUpdate="Change" MyName="Table" MyType="TopLevelElement">
      <MyResult>
        <MySource>dbo.Table</MySource>
        <MyTarget>dbo.Table</MyTarget>
      </MyResult>
      <MyOrderChanged>False</MyOrderChanged>
      <MyOrdinal>0</MyOrdinal>
      <MyContainsOrderChanged>False</MyContainsOrderChanged>
      <MySiblingOrderChanged>False</MySiblingOrderChanged>
      <MyRefactored>False</MyRefactored>
      <MyChildRefactored>False</MyChildRefactored>
      <MyInclusionState>Included</MyInclusionState>
      <MyChildren>
        <MyEntry MyUpdate="Delete" MyName="Columns" MyType="Folder">
          <MyResult>
            <MySource />
            <MyTarget />
          </MyResult>
          <MyOrderChanged>False</MyOrderChanged>
          <MyOrdinal>2147483647</MyOrdinal>
          <MyContainsOrderChanged>False</MyContainsOrderChanged>
          <MySiblingOrderChanged>False</MySiblingOrderChanged>
          <MyRefactored>False</MyRefactored>
          <MyChildRefactored>False</MyChildRefactored>
          <MyInclusionState>None</MyInclusionState>
          <MyChildren>
            <MyEntry MyUpdate="Delete" MyName="Column" MyType="Element">
              <MyResult>
                <MySource />
                <MyTarget>dbo.Table.c1</MyTarget>
              </MyResult>
              . . . .
            </MyEntry>
          </MyChildren>
        </MyEntry>
      </MyChildren>
    </MyEntry>
  </MyGroup>
</Result>
Deployment

If you want to deploy differences to the target, just use /p:Deploy="true". When you deploy from the UI you can choose which differences to deploy. This is also supported from the command line if the SelectedObjectsFilePath property is set. The SelectedObjectsFile looks like:

1xml version="1.0" encoding="utf-8"?> 2<root> 3 4<Set Included="true"> 5<SelectedItem Type="Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlTable, Microsoft.Data.Tools.Schema.Sql, Version=12.0.0.0 , Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a"> 6<Name>dboName> 7<Name>table1Name> 8SelectedItem> 9Set>1011-Toggle the inclusion status of dbo.table2 -->12<Toggle>13<SelectedItem Type="Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlTable, Microsoft.Data.Tools.Schema.Sql, Version=12.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a">14<Name>dboName>15<Name>table2Name>16SelectedItem>17Toggle>1819root>20

 

 

Default Settings

Setting                                     Default Value
AdditionalDeploymentContributorArguments    null
AdditionalDeploymentContributors            null
AllowDropBlockingAssemblies                 FALSE
AllowIncompatiblePlatform                   FALSE
BackupDatabaseBeforeChanges                 FALSE
BlockOnPossibleDataLoss                     TRUE
BlockWhenDriftDetected                      FALSE
CommentOutSetVarDeclarations                FALSE
CompareUsingTargetCollation                 FALSE
CreateNewDatabase                           FALSE
DeployDatabaseInSingleUserMode              FALSE
DisableAndReenableDdlTriggers               TRUE
DoNotAlterChangeDataCaptureObjects          TRUE
DoNotAlterReplicatedObjects                 TRUE
DropConstraintsNotInSource                  TRUE
DropDmlTriggersNotInSource                  TRUE
DropExtendedPropertiesNotInSource           TRUE
DropIndexesNotInSource                      TRUE
DropObjectsNotInSource                      TRUE
DropPermissionsNotInSource                  FALSE
DropRoleMembersNotInSource                  FALSE
DropStatisticsNotInSource                   TRUE
ExcludedTypes                               (list below)
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlServerDdlTrigger"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlRoute"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlEventNotification"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlEndpoint"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlErrorMessage"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlFile"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlLogin"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlLinkedServer"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlCredential"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlDatabaseEncryptionKey"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlMasterKey"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlDatabaseAuditSpecification"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlServerAudit"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlServerAuditSpecification"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlCryptographicProvider"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlUserDefinedServerRole"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlEventSession"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlDatabaseOptions"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlLinkedServerLogin"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlServerRoleMembership"
  "Microsoft.Data.Tools.Schema.Sql.SchemaModel.SqlAssemblyFile"
GenerateSmartDefaults                       FALSE
IgnoreAnsiNulls                             TRUE
IgnoreAuthorizer                            FALSE
IgnoreColumnCollation                       FALSE
IgnoreComments                              FALSE
IgnoreCryptographicProviderFilePath         TRUE
IgnoreDdlTriggerOrder                       FALSE
IgnoreDdlTriggerState                       FALSE
IgnoreDefaultSchema                         FALSE
IgnoreDmlTriggerOrder                       FALSE
IgnoreDmlTriggerState                       FALSE
IgnoreExtendedProperties                    FALSE
IgnoreFileAndLogFilePath                    TRUE
IgnoreFilegroupPlacement                    TRUE
IgnoreFileSize                              TRUE
IgnoreFillFactor                            TRUE
IgnoreFullTextCatalogFilePath               TRUE
IgnoreIdentitySeed                          FALSE
IgnoreIncrement                             FALSE
IgnoreIndexOptions                          FALSE
IgnoreIndexPadding                          TRUE
IgnoreKeywordCasing                         TRUE
IgnoreLockHintsOnIndexes                    FALSE
IgnoreLoginSids                             TRUE
IgnoreNotForReplication                     FALSE
IgnoreObjectPlacementOnPartitionScheme      TRUE
IgnorePartitionSchemes                      FALSE
IgnorePermissions                           FALSE
IgnoreQuotedIdentifiers                     TRUE
IgnoreRoleMembership                        FALSE
IgnoreRouteLifetime                         TRUE
IgnoreSemicolonBetweenStatements            TRUE
IgnoreTableOptions                          FALSE
IgnoreUserSettingsObjects                   FALSE
IgnoreWhitespace                            TRUE
IgnoreWithNocheckOnCheckConstraints         FALSE
IgnoreWithNocheckOnForeignKeys              FALSE
IncludeCompositeObjects                     FALSE
IncludeTransactionalScripts                 FALSE
NoAlterStatementsToChangeCLRTypes           FALSE
PopulateFilesOnFileGroups                   TRUE
RegisterDataTierApplication                 FALSE
TargetDatabaseName                          null
TreatVerificationErrorsAsWarnings           FALSE
UnmodifiableObjectWarnings                  TRUE
VerifyCollationCompatibility                TRUE
VerifyDeployment                            TRUE

Here is the default XSD for XML output:
xml version="1.0" encoding="utf-8"?><xs:schema targetNamespace="http://schemas.microsoft.com/SSDT/SqlTasks/SchemaCompare.xsd" xmlns="http://schemas.microsoft.com/SSDT/SqlTasks/SchemaCompare.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:simpleType name="SourceValueType"><xs:restriction base="xs:string"/>xs:simpleType><xs:simpleType name="SourceValueWithoutSchemaType"><xs:restriction base="xs:string"/>xs:simpleType><xs:simpleType name="TargetValueType"><xs:restriction base="xs:string"/>xs:simpleType><xs:simpleType name="TargetValueWithoutSchemaType"><xs:restriction base="xs:string"/>xs:simpleType><xs:simpleType name="UpdateCategoryType"><xs:restriction base="xs:string"/>xs:simpleType><xs:simpleType name="DisplayNameType"><xs:restriction base="xs:string"/>xs:simpleType><xs:simpleType name="EntryTypeType"><xs:restriction base="xs:string"/>xs:simpleType><xs:simpleType name="OrderChangedType"><xs:restriction base="xs:boolean"/>xs:simpleType><xs:simpleType name="OrdinalType"><xs:restriction base="xs:integer"/>xs:simpleType><xs:simpleType name="ContainsOrderChangedType"><xs:restriction base="xs:boolean"/>xs:simpleType><xs:simpleType name="SiblingOrderChangedType"><xs:restriction base="xs:boolean"/>xs:simpleType><xs:simpleType name="RefactoredType"><xs:restriction base="xs:boolean"/>xs:simpleType><xs:simpleType name="ChildRefactoredType"><xs:restriction base="xs:boolean"/>xs:simpleType><xs:simpleType name="InclusionStateType"><xs:restriction base="xs:string"/>xs:simpleType><xs:simpleType name="GroupByCategoryType"><xs:restriction base="xs:string"/>xs:simpleType><xs:simpleType name="GroupByVauleType"><xs:restriction base="xs:string"/>xs:simpleType><xs:complexType name="DiffEntryType"><xs:sequence><xs:element name="Source" type="SourceValueType"/><xs:element name="Target" type="TargetValueType"/><xs:element name="OrderChanged" type ="OrderChangedType"/><xs:element name="Ordinal" type ="OrdinalType"/><xs:element name="ContainsOrderChanged" type ="ContainsOrderChangedType"/><xs:element name="SiblingOrderChanged" type ="SiblingOrderChangedType"/><xs:element name="Refactored" type ="RefactoredType"/><xs:element name="ChildRefactored" type ="ChildRefactoredType"/><xs:element name="InclusionState" type ="InclusionStateType"/><xs:element name="Children" type="ChildrenEntiesType"/>xs:sequence><xs:attribute name="Update" type="UpdateCategoryType"/><xs:attribute name="Name" type="DisplayNameType"/><xs:attribute name="Type" type="EntryTypeType"/>xs:complexType><xs:complexType name="ChildrenEntiesType"><xs:sequence><xs:element name="Entry" type="DiffEntryType" minOccurs="0" maxOccurs="unbounded"/>xs:sequence>xs:complexType><xs:complexType name="GroupType"><xs:sequence><xs:element name="Entry" type="DiffEntryType" minOccurs="0" maxOccurs="unbounded"/>xs:sequence><xs:attribute name="Value" type="GroupByVauleType"/>xs:complexType><xs:complexType name="ResultType"><xs:sequence><xs:element name="Group" type="GroupType" minOccurs="0" maxOccurs="unbounded"/>xs:sequence><xs:attribute name="GroupBy" type="GroupByCategoryType"/>xs:complexType><xs:element name="Result" type="ResultType"/>xs:schema>

Microsoft named a Leader in Agile Business Intelligence by Forrester


We are pleased to see Microsoft acknowledged by Forrester Research as a Leader in The Forrester Wave™: Agile Business Intelligence Platforms, Q3 2014.  

We are happy to see what we believe to be an affirmation of our approach and of the strength of our technologies. Our placement in this report reflects both high scores from our clients for product vision and strong client feedback collected as part of the customer survey. Forrester notes that “Microsoft received high client feedback scores for its agile, business user self-service and [advanced data visualization] ADV functionality. Clients also gave Microsoft BI a high score for its product vision”. This feedback from our customers is especially gratifying to see.

Microsoft is delivering on our vision of making business intelligence more agile and accessible through the tools that people use every day. With the accessibility of Excel and the recent release of Power BI for Office 365, we aim to lower the barrier of entry for users and reduce the complexity of deploying business intelligence solutions for IT. Using Microsoft’s business intelligence solution, companies such as MediaCom have reduced time to reporting from weeks to days, Carnegie Mellon is using data to reduce energy consumption by 30%, and Helse Vest is combining hospital data to visualize trends in real time.

We appreciate the recognition of our software in this report. Above all, we value our customers’ voice in helping shape and validate this approach.

Sentiment Analysis with Microsoft APS and StreamInsight


In this overview and demo, we will show you what sentiment analysis is and how to build a quick mashup that combines real-time access to multiple data sources using tools from Microsoft.

Sentiment analysis is one of the hottest topics in the Big Data space. Sentiment analysis is the process of analyzing customer comments and feedback from Facebook, Twitter, email, and more. The purpose of the analysis is to understand the overall sentiment the customer is trying to convey. This could be negative sentiment, when the customer is unhappy with a company or its product; neutral sentiment, when the customer only mentions a company or product in passing, without a good or a bad feeling; or positive sentiment, when a customer is happy or excited about a company or its product.

Traditionally sentiment analysis was complicated because it required a mixture of very complex platforms and tools. Each component required for sentiment analysis was offered by a different company and required a large amount of custom work. The difficulty is further exacerbated by hard-to-achieve business requirements. When we discuss sentiment analysis there are 3 key business requirements we see repeated:

  • Real-time access
  • Full granular data set (structured & unstructured)
  • BI and SQL front-end

Real-time Access

In the case of real-time access, business users need access to fresh data. In the world of social media, customer sentiment can change rapidly. With images and videos quickly being posted with re-tweets and Facebook ‘like’ capabilities, a good or bad aspect of a company’s product can go viral in minutes. Business users need to have the ability to analyze data as it comes in, in real-time. We will show in our overview video and demo, how we can utilize Microsoft’s StreamInsight technology for real-time data analysis and complex-event processing.

Full Granular Data Set

In the case of full granular data, in practice we have seen that using a traditional database system can hinder development. This is because a lot of the data that comes in for sentiment analysis such as email, is in a semi-structured or unstructured format. This means the data is not easily modeled into a database. The data does not come in a simple row/column format. Thus we utilize our Big Data technology that is meant for this type of data:  HDInsight (Hadoop). HDInsight is essentially Hortonworks Data Platform running on Windows. In our case we utilize HDInsight to land all of the data, in its raw original format, into the distributed file system HDFS. This allows us to ingest any kind of data, regardless of structure, and store that data online for further analysis at low cost. The Hadoop software is open-source and readily available.

BI and SQL Front-End

The most important area around delivering sentiment analysis to the business is access: making sure we are able to provide the data, in real time and at full fidelity, within the tools that our business users know and love. Previously, when our customers were doing sentiment analysis on Hadoop systems, BI and SQL access was not available. This was not because the tools could not integrate with Hadoop systems, but because they could not scale or offer the same level of functionality. Some BI users have chosen Hive ODBC in Hadoop, which many claim to be slow and ‘buggy’. Instead here we utilize one of our flagship technologies: PolyBase. With PolyBase we expose the data in Hadoop, and relational SQL Server, with one T-SQL query. What this means is users can use BI tools like Excel, SSAS, or other 3rd party tools. They can then utilize PolyBase within Analytics Platform System (APS) to query that data either in Hadoop, or Parallel Data Warehouse (SQL Server), or mash up the data from both systems!

How It Works

Now we will show you how to use all of the tools from the SQL Server data platform to achieve sentiment analysis. This will allow you to quickly deploy and meet all 3 business requirements through a set of tools and platforms that are very easy to use, fully integrated, and ‘just work’ together.

Let’s get started with the first video (~5 minutes) where we present sentiment analysis using Microsoft technologies. We show you how sentiment analysis works, and how the Microsoft products fit. We then follow up by discussing the architecture in detail surrounding StreamInsight, HDInsight, and Analytics Platform System.

Watch the overview video:

Demo

In the second video (~7 minutes), we show you sentiment analysis in action. The demo includes a full sentiment-analysis engine running in real-time against Twitter data along with a web dashboard. We then stream Twitter data to both HDInsight and Parallel Data Warehouse. Finally, we end the demo by showcasing PolyBase, our flagship technology. With PolyBase we can do data mashups combining data from relational and non-relational systems. We will use PolyBase to write standard T-SQL queries against this data to determine tweet analytics and how social sentiment is faring for our marketing campaigns and products.

Watch the demo video:

Users Embrace Azure ML Public Preview


This blog post is authored by Roger Barga, Group Program Manager for Microsoft Azure Machine Learning

Last month we announced Microsoft Azure Machine Learning and, this week, we made it available for public preview at our Worldwide Partner Conference 2014 (WPC). There was a lot of excitement and anticipation in our team leading up to the launch. We have worked closely with customers and partners in our Technical Advisory Program (TAP) and Private Preview program, listening to their feedback and adjusting our service accordingly. But there still was the open question of how it would be received by the general user community. In machine learning (ML) speak, we had great training data through our early private preview customers, but would our model generalize? That question was answered in short order.

Minutes after Scott Guthrie announced the public preview launch of Azure ML in his WPC opening keynote on Monday morning, our service meters signaled the first users had provisioned their own Azure ML workspace from the Azure Portal and had started running experiments. Momentum built throughout the first day and by day two of WPC on Tuesday, over 1,000 users had provisioned roughly 1,300 modeling workspaces on Azure ML, and these new users had built and run over 2,000 experiments and deployed over 50 ML web services on Azure. 

These numbers were encouraging but there is nothing quite like hearing directly from our users. On Tuesday evening at a WPC social event, I met a data scientist from one of our partner companies. He shared with me that, upon hearing of Azure ML in the keynote on Monday, he had skipped all social events that evening and returned to his hotel room where he worked with Azure ML until the wee hours of the morning. He was thrilled with the service and noted that he had never been able to put a model into production so fast in his professional career.

It’s electrifying to see this level of passion and intellectual curiosity around data science and ML, as it is to see our customers using Azure ML to build and evaluate predictive models, run experiments, and then publish their model as a web service in minutes.

There was a lot of activity at our demo booth at WPC as attendees stopped by to learn more about Azure ML and to see the applications that our partners had built and deployed for their customers. If you wish to learn more or get started yourself you will find self-learning resources and a user forum on the Azure ML Central site.

And, something else I am very excited to share…

At the 2014 Microsoft Research Faculty Summit which took place in Redmond earlier this week, MSR announced a new program which will provide Azure ML access grants to both seasoned researchers and students. There are two flavors of these access grants. The first is a data science instructional award which will provide an individual account on Azure ML for each student in an intermediate or advanced data science class, along with 500 GB of cloud data storage for each student. The second is a research collaboration award which will provide a shared workspace on Azure ML, along with 10 TB of cloud data storage, to enable a group of researchers interested in hosting a data collection in Microsoft Azure ML to discover and share predictive models.

We look forward to seeing the data science courses that will use Azure ML, the creative ML web services that students will build, and the research collaborations that spring up in the academic community around shared Azure ML workspaces. Read more about the MSR Azure ML grant program here.

Having returned from WPC, our team is now turning our collective attention to the road ahead. This is just the beginning for our new service. We look forward to seeing what exciting things our customers, partners and researchers in academia accomplish with Azure ML. We’ll listen closely to their feedback and requests while our service is in public preview. Just as an ML model never really ships, but rather it constantly improves over time with feedback and learning, our team will continue to refine and improve Azure ML in response to customer feedback and our own learnings while in public preview.

If you have not tried Azure ML yet, you can go ahead and get started right now. Happy modeling and let us know your thoughts – we are listening…

Roger

Pie in the Sky (July 18th, 2014)


Flying this weekend, so definitely need stuff to read. Here's part of what I will be reading while traveling.

Cloud

Client/mobile

Hardware/Internet of Things

Node.js\JavaScript

Ruby

Misc.

Enjoy!

-Larry

SQL Server 2012 with SP2 Slipstream ISO images fixed.

Hi all, just to let you know that we have fixed the issue that I was referring to a couple of days ago ( SQL Server 2012 with SP2 Slipstream ISO images do not install SP2 ). The new ISOs have been posted on their respective release channels (MSDN......(read more)

Open complex type step by step with Web API 2.2 for OData v4.0


Introduction

Recently, Microsoft officially announced Web API 2.2 for OData v4 via the blog post Announcing the Release of ASP.NET MVC 5.2, Web API 2.2 and Web Pages 3.2. In this release, a new feature named open complex type is introduced.

As the ODL v4 spec says: Open complex types are keyless named structured types consisting of a set of declared and dynamic (un-declared) properties.

Open complex types allow customers to add undeclared properties in the payload. And in the future they can use these properties in queries.

This blog is intended to provide a step by step guide for you to use the open complex types with Web API 2.2 for OData v4.0. Let’s get started.

BookStore Console Application

For simplicity, we’ll start by creating a simple console application named BookStore. In this console application, we’ll create an inline Web API OData Service to provide the basic functionality of a book store:

  1. Query the metadata information of the book store.
  2. Query the books from the book store.
  3. Create new books into the book store.

Install the Nuget package

Once the empty console application has been created, the first thing is to install the Web API 2.2 for OData v4.0 NuGet package from Nuget.org. In Solution Explorer, right click “References” in the BookStore project and select “Manage NuGet Packages” to open the NuGet package management dialog.

 

In the dialog, search for the “Microsoft ASP.NET Web API 2.2 for OData v4.0” package and click the Install button to install the package into the console application. After installation, the updated references are as follows:

 

Where:

  1. Microsoft.OData.Core, Microsoft.OData.Edm, Microsoft.Spatial are the OData v4 Dlls.
  2. System.Web.OData is the Web API 2.2 Dll.

And the packages.config has the following values:

Build the open complex type model

CLR type definition

For developers, it’s quite easy to define a model with an open complex type. You only need to add an extra property of type IDictionary<string, object> to your CLR class.

For the BookStore application, we'll create a couple of C# classes to build the model. First of all, add a new folder in the solution named “Models”. In the “Models” folder, add the following classes:

// CLR classes:

Where in OData terms:

  • Book is an entity type.
  • Press is an open complex type, because it has an extra property named DynamicProperties of type IDictionary<string, object>.
  • Address is a complex type that is not open.
  • Category is an Enum type.

Note: The DynamicProperties property in the Press type is a container used to hold the dynamic properties. In Web API 2.2 for OData v4, a complex type with an IDictionary<string, object> property will be built as an open complex type.
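Since the original code listing is not reproduced above, here is a minimal sketch of what these CLR classes might look like; the ISBN key and the Name, Web, and Category properties of Press are taken from the text, while the remaining property names and the enum members are illustrative assumptions:

using System.Collections.Generic;

public class Book
{
    public string ISBN { get; set; }        // entity key, e.g. "978-1-107-63706-1"
    public string Title { get; set; }       // illustrative property
    public Press Press { get; set; }
}

public class Press                          // open complex type
{
    public string Name { get; set; }
    public string Web { get; set; }
    public Category Category { get; set; }

    // This dictionary is what makes Press an open complex type:
    // any undeclared (dynamic) properties end up here.
    public IDictionary<string, object> DynamicProperties { get; set; }
}

public class Address                        // complex type that is not open
{
    public string City { get; set; }        // illustrative
    public string Street { get; set; }      // illustrative
}

public enum Category                        // enum type (members are illustrative)
{
    Book,
    Magazine,
    EBook
}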

Inline model data

For simplicity we'll store all the data in memory using a BooksContext class which as you can see below has three books.

// BooksContext.cs

Where:

  1. The Press of Book1 has no dynamic properties.
  2. The Press of Book2 has two dynamic primitive properties.
  3. The Press of Book3 has one dynamic complex property.

Build the Edm Model

Now it’s easy to build  the Edm Model like this:

// GetEdmModel()

Note: The convention model builder won’t automatically add the Address type because there are no properties of the Book or the related Press classes that explicitly reference the Address class. However we plan on using Address in our open Press class, so we need to add it explicitly to the model.
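The GetEdmModel code itself is not shown above; a sketch using the Web API OData convention model builder, under the assumptions of the classes sketched earlier, could look like this:

using Microsoft.OData.Edm;
using System.Web.OData.Builder;

private static IEdmModel GetEdmModel()
{
    var builder = new ODataConventionModelBuilder();
    builder.EntitySet<Book>("Books");

    // Address is not referenced by any declared property of Book or Press,
    // so it must be added explicitly before it can be used as the value
    // of a dynamic property.
    builder.ComplexType<Address>();

    return builder.GetEdmModel();
}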

Build the controller

It's time to build the controller to implement the OData routing. Add a new folder named "Controllers" in the BookStore project. In this folder, add a C# class named BooksController and derive it from ODataController. In this class, we'll add a private instance of BooksContext to play the DB role like this:

// BooksController.cs

Note: While this controller only supports Querying Books, Getting a single Book by key and Creating a new Book, you can easily add additional methods to implement the rest of OData’s supported interactions if needed.
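The controller body is not included above; a rough sketch of the three supported actions, assuming BooksContext exposes an in-memory Books collection, might look like this:

using System.Linq;
using System.Web.Http;
using System.Web.OData;

public class BooksController : ODataController
{
    private readonly BooksContext _db = new BooksContext();

    // GET ~/odata/Books
    [EnableQuery]
    public IQueryable<Book> Get()
    {
        return _db.Books.AsQueryable();
    }

    // GET ~/odata/Books('978-1-107-63706-1')
    public IHttpActionResult Get([FromODataUri] string key)
    {
        var book = _db.Books.FirstOrDefault(b => b.ISBN == key);
        if (book == null)
        {
            return NotFound();
        }
        return Ok(book);
    }

    // POST ~/odata/Books
    public IHttpActionResult Post(Book book)
    {
        _db.Books.Add(book);
        return Created(book);
    }
}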

Build the client

For simplicity, we'll build the client in the same console application. First, we rename the Program class to BookStoreApp and use it to serve as our client. We then add the following method to create the instance of HttpClient:

// GetClient()

Query the metadata

For customers to use the OData service, they first need to query the metadata document. Here’s how you can do that:

// QueryMetadata()
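The QueryMetadata method is not reproduced above; a minimal sketch, assuming the service is mapped under an "odata" route prefix and GetClient returns an HttpClient pointed at the service's base address, might be:

using System;
using System.Net.Http;
using System.Threading.Tasks;

private static async Task QueryMetadata()
{
    HttpClient client = GetClient();

    // The $metadata document describes the entity and complex types,
    // including the OpenType="true" attribute on Press.
    HttpResponseMessage response = await client.GetAsync("odata/$metadata");
    string metadata = await response.Content.ReadAsStringAsync();
    Console.WriteLine(metadata);
}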

The resulting metadata document is below. A customer can see that the complex type “Press” has an attribute named OpenType whose value is true, while the complex type Address doesn’t have such an attribute. Most importantly, the “Press” complex type has only THREE declared properties, named “Name, Web, Category”. The customer doesn’t know anything about the “DynamicProperties” property, because this is merely an implementation detail.

// Metadata document

Query the entity with the dynamic properties

Customers can now retrieve a single entity (and its dynamic properties) like this:

 // QueryEntity()

The payload of the entity with dynamic properties should be:

// Payload

Here the customer can see that the Press property of Book('978-1-107-63706-1') has four properties (three declared properties and one dynamic property). The dynamic property is named “Address” and its type is #BookStore.Address.

Create an entity with dynamic properties

The customer can post an entity with dynamic properties to the service. The code is similar to above. In this case the request looks like this:

POST ~/odata/Books

Content-Type: application/json

Content:
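The request body is not reproduced above; an illustrative payload (all values are made up) that adds one dynamic primitive property and one dynamic complex property to Press might look like:

{
  "ISBN": "978-0-0000-0000-0",
  "Title": "A New Book",
  "Press": {
    "Name": "Contoso Press",
    "Web": "http://press.contoso.com",
    "Category": "Book",
    "Profit@odata.type": "#Int32",
    "Profit": 100,
    "Address": {
      "@odata.type": "#BookStore.Address",
      "City": "Redmond",
      "Street": "One Microsoft Way"
    }
  }
}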

Summary

The open complex type feature included in Web API 2.2 for OData v4  provides a very easy way for customers to post their customized properties to the data service and allow them to be queried and retrieved in the future.  We believe open complex type support is a really useful feature for modelling real world problems and in the future, after we add open entity type and the dynamic collection property support to the next release of Web API OData, it will be even better.

Thanks. 

Cumulative Update #11 for SQL Server 2012 SP1

Dear Customers, The 11th cumulative update release for SQL Server 2012 SP1 is now available for download at the Microsoft Support site. Cumulative Update 11 contains all the SQL Server 2012 SP1 hotfixes which have been available since the initial...(read more)

Machine Learning and Text Analytics


The following post is from Dr. Ashok Chandra, Distinguished Scientist at Microsoft Research and Dhyanesh Narayanan, Program Manager at Microsoft Research

When I (Ashok) was a student at the Stanford Artificial Intelligence Laboratory in the 70’s, there was great optimism that human-level machine intelligence was just around the corner. Well, it is still just around the corner. But meanwhile computers are becoming more capable all the time, using machine learning (ML) technologies. So much so, that almost all the new products created in Microsoft now use some level of ML, for analyzing speech, data or text. In this post we focus largely on text.

As computers better understand natural language, new frontiers open up everywhere – improved user interfaces for applications; better search engines; personal assistants like Cortana and Siri; and tools for figuring out what a given document is really about. For instance, a news website might enable a more engaging experience for its users if the individuals mentioned in its articles were algorithmically linked to Wikipedia (or some appropriate database), so the reader could easily obtain more information about those people. Furthermore, by leveraging additional signals from the text, one could also determine the salient entities (e.g. players, teams) that the article was talking about, as seen in Figure 1.

Figure 1 Motivating Scenario for Text Analytics

Text Analytics has been, and continues to be, an area of active scientific research. After all, creating a semantic model of all human knowledge (represented as text) is no easy task. Early work, dating to the early 90’s, included Brill taggers [1] that determine parts of speech in sentences, and [2] gives just a hint of more recent work. Microsoft Research has been very active in creating ideas in this scientific field, but we go further in tailoring new science with pragmatic considerations to create production-level technologies.

In this blog post, we present a glimpse of how ML techniques can be leveraged for text analytics, using Named Entity Recognition (NER) as a reference point. As a platform that offers turnkey ML functionality, Microsoft Azure ML includes text analytics capabilities in general, and support for NER in particular – so we use that to make the connection from general concepts to specific design choices.

NER is the task of determining references in text to people, places, organizations, sports teams, etc. Let’s take a quick look at how we might solve this problem using a “supervised learning” approach.

Figure 2: Workflows for Named Entity Recognition

At Design Time or "learning time", the system uses training data to create a “model” of what is to be learned. The idea is for the system to generalize from a small set of examples to handle arbitrary new text.

The training data consists of human-annotated tags for the named entities to be learned. It might look something like this: “When Chris Bosh excels, Miami Heat becomes juggernaut”, with “Chris Bosh” annotated as a Player and “Miami Heat” as a Team. The expectation is that a model that learns from examples of this nature will be trained to recognize Player entities and Team entities in new input text.

The effectiveness of the Design Time workflow hinges on the Feature Extraction phase – typically, the more diligently engineered the features, the more powerful the resulting models. For instance, the local context associated with a word in a piece of text [say, the previous k words and next k words] is a strong feature that we as humans use to associate entities with words. For example, in the sentence “San Francisco beat the Cardinals in an intense match yesterday”, it is apparent from the context that the mention “San Francisco” refers to a sports team (i.e. the San Francisco Giants) rather than the city of San Francisco. Capitalization is another feature that is often useful for recognizing named entities such as People or Locations that occur in text.

Model Training is what ML is about, i.e. producing a good model. The model is typically a complex combination of the selected features. There are several ML techniques available, including Perceptron, Conditional Random Fields and more. The choice of technique depends on how accurate the model can become with limited training data, the speed of processing, and the number of different named entity types to be learned simultaneously. For instance, the Azure ML NER module supports three entity types by default, namely People, Places, and Organizations.

The goal of the Run Time workflow is to take unlabeled input text and produce corresponding output text with entities recognized by the model that was created at Design Time. As one can observe, the Run Time workflow reuses the Feature Extraction module from the Design Time workflow – accordingly, if high throughput of entity recognition is necessary for an application, one has to provision relatively lightweight yet high-value features in the pipeline. As an illustrative example, the Azure ML NER module uses a small set of easy-to-compute features that are primarily based on local context, which also turn out to be very effective. Ambiguity during processing is often resolved using something like Viterbi decoding for assigning entity-labels to the sequence of input words.

It is important to realize that NER is just the beginning, but it is nevertheless an important first step towards capturing “knowledge” from raw text. This recent blog post describes how NER plus a set of related technologies were used to light up compelling experiences in the Bing Sports app - and the very same NER stack is available for you to use in Azure ML here. Beyond NER, general natural language parsing, linking and salience, sentiment analysis, fact extraction, etc. represent additional steps to enhance the user experience of applications built around content; these are additional techniques that can help you make your text "come alive".

We hope you enjoyed reading this post and look forward to your comments.

Ashok Chandra.
Follow my research here.

Dhyanesh Narayanan.
Follow my research here.

References

[1] Eric Brill, 1992, A simple rule-based part of speech tagger, Applied natural language processing (ANLC '92)
[2] Li Deng, Dong Yu, 2014, Deep Learning: Methods and Applications

Creating an administrative install of SSDT update for Visual Studio 2013


Since we are offering our VS2013 update via the Visual Studio update channel, there aren't any specific instructions on how to get the download if you need to create an administrative layout.  You may need to do this if your firewall or proxy settings do not allow some computers to access the download center.

You can either choose the update from within Visual Studio on a machine that has internet access and, instead of running SSDTSetup.exe, download it locally; or you can use this fwlink to get to the download: http://go.microsoft.com/fwlink/?LinkID=393521

*Disclaimer - this FWLink may change in the future, so while it works for the July update, that doesn't mean it will work for upcoming updates

 

Once you have the SSDTSetup.exe file downloaded locally, you can run the administrative layout command on a computer with internet access to create a copy that you can burn or place on a share

  • Run the following command using an administrator command prompt (cmd.exe run as administrator):

    SSDTSetup.exe /layout <install point>

    Where <install point> is the location where you wish to create the administrative install point (e.g. on a USB drive, a LAN drive or other accessible location).
     
  • To use the install point once created on a computer without internet access, simply run SSDTSetup.exe from the location with no arguments. This will use the install point rather than attempting to download new copies of the relevant chained components.

Cumulative Update #1 for SQL Server 2012 SP2

Dear Customers, The 1st cumulative update release for SQL Server 2012 SP2 is now available for download at the Microsoft Support site. Cumulative Update 1 contains all hotfixes in the initial release of SQL Server 2012 SP2 as well as all hotfixes...(read more)

Using Unsigned Integers in OData


Unsigned integers can be useful in many ways, such as representing data sizes, resource handles and so on. Though OData V4 only supports signed integer natively, the protocol offers a flexible way called type definition to allow users to ‘define’ unsigned integer types themselves.

As an example, we know that any UInt16 integer can be represented by the primitive type Edm.Int32. Thus by type definition, we can define a new type named MyNamespace.UInt16 whose underlying type is Edm.Int32. By doing so, we can store and serialize UInt16 integers as Edm.Int32 ones. There are three advantages of leveraging type definition here:

(1)    Prevent breaking clients who conform to the protocol (to recognize type definitions) but are unaware of unsigned integer types.

(2)    Give the underlying type a different name that is meaningful to the context.

(3)    Enable the flexibility to change the underlying type without breaking the ‘unsigned integer’ type semantics.

From version 6.5.0, our OData library provides built-in support for unsigned integer types (including UInt16, UInt32 and UInt64 for now) as a protocol extension. Generally a user has to write very little code to gain a workable model with the default implementation of unsigned integers. Meanwhile, the library is flexible enough that users are allowed to use their own customized implementation as well.

Introducing Type Definition

Before diving into unsigned integers, let’s first take a look at how to define and use type definitions in OData. Suppose we want to define a new type Height whose underlying type is Edm.Double and use it to define a property MyHeight in an entity type Person (it is the same for complex type).

We can write the following CSDL for the model:

Or write the equivalent code in C#:
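The original C# listing is omitted above; a minimal sketch using the EdmLib object model (class and namespace names as of ODataLib 6.x, with the "SomeNamespace" namespace assumed) might look like this:

using Microsoft.OData.Edm;
using Microsoft.OData.Edm.Library;

var model = new EdmModel();

// Type definition SomeNamespace.Height with underlying type Edm.Double.
var heightType = new EdmTypeDefinition("SomeNamespace", "Height", EdmPrimitiveTypeKind.Double);
model.AddElement(heightType);

// Entity type Person with a MyHeight property typed as the type definition.
var person = new EdmEntityType("SomeNamespace", "Person");
person.AddKeys(person.AddStructuralProperty("Id", EdmPrimitiveTypeKind.Int32));
person.AddStructuralProperty("MyHeight", new EdmTypeDefinitionReference(heightType, false));
model.AddElement(person);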

The following code demonstrates the creation of a Person entry:
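Again the original snippet is not shown; creating such an entry with the ODataLib object model might look roughly like this (values are illustrative):

using Microsoft.OData.Core;

var entry = new ODataEntry
{
    TypeName = "SomeNamespace.Person",
    Properties = new[]
    {
        new ODataProperty { Name = "Id", Value = 1 },
        // The value is supplied as the underlying primitive type (Edm.Double).
        new ODataProperty { Name = "MyHeight", Value = 1.80d }
    }
};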

You can serialize/deserialize the above entry to/from payload with the above model as usual.

The underlying type of a type definition must be a primitive type rather than an entity type, a complex type, an enumeration type, or even a type definition. However, two type definitions with the same underlying type along with the underlying primitive type itself are treated assignable from (or equivalent to) each other so all the expressions below evaluate to true:

This means the three types are type-comparable and exchangeable to each other in model definition, serialization and deserialization. Take entry deserialization for example, you can post Person entities specifying the type of MyHeight as SomeNamespace.Length or Edm.Double, which should both work perfectly.

Using Unsigned Integers

Unsigned integers are supported based on type definition. If a user wants to use UInt32 as a property type in his model, a corresponding type definition should be added to the model so that a compliant client can recognize the UInt32 type.

This is done automatically if using the default implementation of unsigned integers of our library. Or you can add your own type definition if you want to override the underlying type.

But this is just about model definition. The next thing to consider is how to serialize/deserialize entries with unsigned integers. Suppose we have the following Employee entry:

Since OData V4 only supports signed integers, we have to convert UInt32 value to the underlying type Int64 known to the protocol before serializing it to the payload. Thus we may obtain an entry payload like:

If we want to deserialize the above payload to an Employee entry, we first get an Int64 value of StockQuantity directly from the payload. Then we need to convert the value from the underlying type Int64 to UInt32.

These two kinds of conversion are defined by the interface IPrimitiveValueConverter and its implementing classes:

Each model has an internal dictionary that maps each type definition within the model to a primitive value converter, which converts values between the user type (e.g., UInt32 here) and its underlying type (e.g., Int64 here). The library also offers a DefaultPrimitiveValueConverter used to handle the default conversions of unsigned integers. If a type definition is not associated with a converter in the model, the library uses the internal PassThroughPrimitiveValueConverter to directly pass through the value without conversion.

Default Implementation

The default implementation of unsigned integers enables users to write the least code to support unsigned integers in their models. It consists of two parts: (1) default type definitions for the unsigned integer types; (2) a default primitive value converter for unsigned integers.

For the first part, the default type definitions are listed below:

Type Definition Name     Default Underlying Type
SomeNamespace.UInt16     Edm.Int32
SomeNamespace.UInt32     Edm.Int64
SomeNamespace.UInt64     Edm.Decimal

For the second part, the default conversions of unsigned integers are listed below:

User Type        Underlying Type    Type Definition
System.UInt16    System.Int32       SomeNamespace.UInt16
System.UInt32    System.Int64       SomeNamespace.UInt32
System.UInt64    System.Decimal     SomeNamespace.UInt64

The following example illustrates the usage of the default implementation. Suppose we want to create an entity type Product with a Quantity of UInt16, a StockQuantity of UInt32 and a LifeTimeSeconds of UInt64; we can simply write the following code:

You can then serialize/deserialize the entry with the model as usual and the default primitive value converter will automatically handle all the underlying conversions.

User Customization

In case you want to override the underlying type and the conversions of an unsigned integer type, you can define your own type definition and primitive value converter.

Say if you want to use Edm.String as the underlying type of UInt64, you first need to create a new type definition along with the types that need it.

Secondly define a custom converter between UInt64 and String.

Thirdly associate a MyConverter instance with that type definition in the model.

Then you will be able to serialize an entry with UInt64:

You may get the payload like:

If you want to get the corresponding converter for a type definition, you can do as follows:

Querying Unsigned Integer Properties

You can query unsigned integer properties just as querying other primitive ones. Regarding the above sample, the following queries are supported:
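The sample query URLs are not listed above; requests along these lines (the "Products" entity set name is assumed) would be typical:

GET ~/Products
GET ~/Products(1)
GET ~/Products(1)/Quantity
GET ~/Products(1)/StockQuantity/$value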

For query options, support of custom unsigned integer types is NOT guaranteed. Currently only unsigned integers of default implementation are well supported. Here are a few examples:

Get started backing up to the cloud with SQL Server Backup to Microsoft Azure Tool


If you’re considering backing up your SQL Server database to the cloud, there are many compelling reasons. Not only will you have an offsite copy of your data for business continuity and disaster recovery purposes, but you can save on CAPEX by using Microsoft Azure for cost-effective storage.  And now, you can choose to backup to Microsoft Azure even for databases that aren’t running the latest version of SQL Server – creating a consistent backup strategy across your database environment. 

SQL Server has these tools and features to help you back up to the cloud:

  • In SQL Server 2014, Managed Backup to Microsoft Azure manages your backup to Microsoft Azure, setting backup frequency based on data activity.  It is available inside the SQL Server Management Studio in SQL Server 2014.
  • In SQL Server 2012 and 2014, Backup to URL provides backup to Microsoft Azure using T-SQL and PowerShell scripting (see the example after this list).
  • For prior versions, SQL Server Backup to Microsoft Azure Tool enables you to back up to the cloud all supported versions of SQL Server, including older ones.  It can also be used to provide encryption and compression for your backups – even for versions of SQL Server that don’t support these functions natively.
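As a rough illustration of the Backup to URL option above, a T-SQL sketch might look like the following; the storage account, container, credential name, and database name are placeholders:

-- Create a credential holding the storage account name and its access key.
CREATE CREDENTIAL AzureBackupCredential
WITH IDENTITY = 'mystorageaccount',
     SECRET = '<storage access key>';

-- Back up directly to a blob in an existing container.
BACKUP DATABASE MyDatabase
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/MyDatabase.bak'
WITH CREDENTIAL = 'AzureBackupCredential', COMPRESSION, STATS = 5;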

To show you how easy it is to get started with SQL Server Backup to Microsoft Azure Tool, we’ve outlined the four simple steps you need to follow:

Prerequisites: Microsoft Azure subscription and a Microsoft Azure Storage Account.  You can log in to the Microsoft Azure Management Portal using your Microsoft account.  In addition, you will need to create a Microsoft Azure Blob Storage Container:  SQL Server uses the Microsoft Azure Blob storage service and stores the backups as blobs. 

Step 1: Download the SQL Server Backup to Microsoft Azure Tool, which is available on the Microsoft Download Center.

Step 2: Install the tool. From the download page, download the MSI (x86/x64) to your local machine that has the SQL Server instances installed, or to a local share with access to the Internet. Use the MSI to install the tool on your production machines. Double click to start the installation.

Step 3: Create your rules. Start the Microsoft SQL Server Backup to Microsoft Azure Tool Service by running SQLBackup2Azure.exe. Going through the wizard to set up the rules allows the program to process the backup files that should be encrypted, compressed or uploaded to Azure storage. The Tool does not do job scheduling or error tracking, so you should continue to use SQL Server Management Studio for this functionality.

On the Rules page, click Add to create a new rule. This will launch a three-screen rule entry wizard.

The rule will tell the Tool what local folder to watch for backup file creation. You must also specify the file name pattern that this rule should apply to.

To store the backup in Microsoft Azure Storage, you must specify the name of the account, the storage access key, and the name of the container.  You can retrieve the name of the storage account and the access key information by logging into the Microsoft Azure management portal.

At this time, you can also specify whether or not you wish to have the backup files encrypted or compressed.

Once you have created one or more rules, you will see the existing rules and the option to Modify or Delete the rule.

Step 4: Restore a Database from a Backup Taken with SQL Server Backup to Microsoft Azure Tool in place. The SQL Server Backup to Microsoft Azure Tool creates a ‘stub’ file with some metadata to use during restore.  Use this file like your regular backup file when you wish to restore a database.  SQL Server uses the metadata from this file and the backup on Microsoft Azure storage to complete the restore. 

If the stub file is ever deleted, you can recover a copy of it from the Microsoft Azure storage container in which the backups are stored.  Place the stub file into a folder on the local machine where the Tool is configured to detect and upload backup files.
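
As a hedged sketch of that restore, the database name and stub file path below are hypothetical; the point is simply that RESTORE is pointed at the local stub file and SQL Server retrieves the actual backup from Azure Blob storage.

-- Restore using the stub file created by the tool (path is a placeholder).
RESTORE DATABASE AdventureWorks2012
FROM DISK = N'C:\SQLBackups\AdventureWorks2012.bak'
WITH RECOVERY;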

That’s all it takes!  Now you’re up and running with Backup to and Restore from Microsoft Azure.

To learn more about why to back up to the cloud, join Forrester Research analyst Noel Yuhanna in a webinar on Database Cloud Backup and Disaster Recovery.  You’ll find out why enterprises should make database cloud backup and DR part of their enterprise database strategy. 

The webinar takes place on Tuesday, 7/29 at 9 AM Pacific time; register now.

For proven in-memory technology without costly add-ons, migrate your Oracle databases to SQL Server 2014


Today, we are making available a new version of SQL Server Migration Assistant (SSMA), a free tool to help customers migrate their existing Oracle databases to SQL Server 2014. Microsoft released SQL Server 2014 earlier this year, after months of customer testing, with features such as In-Memory OLTP to speed up transaction performance, In-Memory Columnstore to speed up query performance, and other hybrid cloud capabilities such as backup to the cloud directly from SQL Server Management Studio and the ability to use Azure as a disaster recovery site with SQL Server 2014 AlwaysOn.

Available now, SQL Server Migration Assistant version 6.0 for Oracle databases greatly simplifies the database migration process from Oracle to SQL Server. SSMA automates all aspects of migration, including assessment analysis, schema and SQL statement conversion, data migration, and migration testing, to reduce the cost and risk of database migration projects. In addition, SSMA version 6.0 for Oracle can automatically move Oracle tables into SQL Server 2014 in-memory tables, process 10,000 Oracle objects in a single migration, and deliver improved performance for database migration and report generation.

Many customers have realized the benefits of migrating their databases to SQL Server using previous versions of SSMA.

SSMA for Oracle supports migration from Oracle 9i or later to all editions of SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, and SQL Server 2014. The SSMA product team is also available to answer your questions and provide technical support at ssmahelp@microsoft.com.

To download SSMA for Oracle, go here. To evaluate SQL Server 2014, go here.  

Clustered Column Store Index: Concurrency and Isolation Level


Clustered Column Store and Concurrency

The clustered column store index (CCI) has been designed for the data warehouse scenario, which primarily involves:

  • Write once, read many times – CCI is optimized for query performance. It gives an order of magnitude better query performance by compressing the data in columnar format, processing sets of rows in batches, and reading only the columns required by the query.
  • Bulk data import and trickle data load – insert operations

While CCI supports UPDATE/DELETE operations, it is not optimized for large numbers of them. In fact, concurrent DELETE/UPDATE operations can cause blocking in some cases and can lead to multiple delta rowgroups. To understand the concurrency model, note that there is a new lock resource called ROWGROUP. Let us see how locks are taken in different scenarios. I will walk through concurrency in a series of blogs, starting with transaction isolation levels.

 

Transaction Isolation Levels Supported

  • Read Uncommitted – This is acceptable for most DW queries; in fact, queries running on a PDW appliance access CCI under read uncommitted to avoid blocking with concurrent DML operations. This is how CCI is queried in the Analytics Platform System, a re-branding of PDW. For details, see http://www.microsoft.com/en-us/server-cloud/products/analytics-platform-system/default.aspx#fbid=CRIMcFvfkD2
  • Read Committed – Only the lock-based implementation of read committed isolation is supported, which can get blocked by concurrent DML transactions.

If read committed snapshot isolation (RCSI) is enabled on a database containing one or more tables with a CCI, all tables other than those with a CCI can be accessed with non-blocking semantics under the read committed isolation level; tables with a CCI cannot.

Example:

select is_read_committed_snapshot_on, snapshot_isolation_state_desc, snapshot_isolation_state
from sys.databases where name = 'AdventureWorksDW2012'

 

CREATE TABLE [dbo].[T_ACCOUNT](
       [accountkey] [int] IDENTITY(1,1) NOT NULL,
       [accountdescription] [nvarchar](50) NULL
) ON [PRIMARY]

-- create a CCI
CREATE CLUSTERED COLUMNSTORE INDEX ACCOUNT_CCI ON T_ACCOUNT

 Session-1

use AdventureWorksDW2012
go
-- Do a DML transaction on CCI but don't commit
begin tran
insert into T_ACCOUNT (accountdescription)
values ('value-1');

 

 Session-2

-- query the table under read committed in a different session
set transaction isolation level read committed
go
select * from t_account

You will see that the query against the CCI is blocked by session 1, as shown by the locking query below.

select
    request_session_id as spid,
    resource_type as rt,
    resource_database_id as rdb,
    (case resource_type
      WHEN 'OBJECT' then object_name(resource_associated_entity_id)
      WHEN 'DATABASE' then ' '
      ELSE (select object_name(object_id)
            from sys.partitions
            where hobt_id = resource_associated_entity_id)
    END) as objname,
    resource_description as rd,
    request_mode as rm,
    request_status as rs
from sys.dm_tran_locks

 

 

Even though the database has the non-blocking, row-versioning-based read committed snapshot isolation enabled, the CCI is accessed using the lock-based implementation of read committed.

  • Snapshot Isolation – It can be enabled on a database containing a CCI. Any disk-based table other than a CCI can be accessed under snapshot isolation, but access to a CCI is disallowed and generates the following error (a repro sketch follows the error message):

Msg 35371, Level 16, State 1, Line 26

SNAPSHOT isolation level is not supported on a table which has a clustered columnstore index.
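
For reference, here is a minimal repro sketch of that error, assuming the AdventureWorksDW2012 database and the t_account CCI from the examples above; it is illustrative rather than part of the original walkthrough.

-- Enable snapshot isolation on the database (assumption: no other sessions block the ALTER).
ALTER DATABASE AdventureWorksDW2012 SET ALLOW_SNAPSHOT_ISOLATION ON
go
use AdventureWorksDW2012
go
set transaction isolation level snapshot
go
begin tran
select * from t_account  -- fails with Msg 35371 because t_account has a clustered columnstore index
rollback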

  • Repeatable Read – Supported for CCI

set transaction isolation level repeatable read
go
begin tran
select * from t_account
go

Here are the locks. Note that it takes an S lock on every rowgroup because we are doing a full table scan.

 

  • Serializable – Supported in CCI

set transaction isolation level serializable
go
begin tran
select * from t_account
go

Here are the locks. Note that it takes an S lock at the table level to guarantee the serializable isolation level.

 

In the next blog, I will discuss the locks taken when inserting rows into a CCI.

Thanks

Sunil Agarwal

 


Clustered Column Store Index: Concurrency with INSERT Operations


Clustered Column Store: Insert Operations

As described in the blog http://blogs.msdn.com/b/sqlserverstorageengine/archive/2014/07/27/clustered-column-store-index-concurrency-and-isolation-level.aspx, the clustered column store index has been optimized for the typical DW scenario: nightly or trickle data loads combined with fast query performance. Multiple inserts can load data concurrently in parallel while DW queries run under the read uncommitted transaction isolation level.

This blog describes locking behavior when data is inserted concurrently. For the scenarios below, we will use the following table

CREATE TABLE [dbo].[T_ACCOUNT](
       [accountkey] [int] IDENTITY(1,1) NOT NULL,
       [accountdescription] [nvarchar](50) NULL
) ON [PRIMARY]

-- create a CCI
CREATE CLUSTERED COLUMNSTORE INDEX ACCOUNT_CCI ON T_ACCOUNT

 

Insert Operations

Let us insert one row and look at the locks taken. Note that we did not commit the transaction.

begin tran

       insert into T_ACCOUNT (accountdescription ) values ('row-1');

Here are the locks. Note that the new row is inserted into a delta rowgroup, which is organized as a btree in the traditional row storage format. There is a new lock resource, ROWGROUP, in the context of CCI. The current transaction has taken an IX lock on the ROWGROUP.
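
To see these locks yourself, here is a trimmed-down variant of the lock query from the previous post, filtered to the new resource. The filter value assumes the rowgroup lock surfaces in sys.dm_tran_locks with resource_type = 'ROWGROUP', as the post describes.

select request_session_id as spid,
       resource_type as rt,
       resource_description as rd,
       request_mode as rm,
       request_status as rs
from sys.dm_tran_locks
where resource_type = 'ROWGROUP'  -- assumption: rowgroup locks report this resource_type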

 

Now, let us insert another row in another session as follows and look at the locks.

begin tran

       insert into T_ACCOUNT (accountdescription ) values ('row-2');

Note that the second transaction, in session 55, also inserted its row into the same rowgroup. In other words, concurrent inserts can load data into the same rowgroup without blocking each other.

 

In summary, an insert into a CCI does not block other concurrent inserts, and concurrent inserts load data into the same delta rowgroup. In the next blog, we will look at bulk load operations.

Thanks

Sunil Agarwal

 

Clustered Column Store Index: Bulk Loading the Data


Clustered Column Store: Bulk Load

As described in the blog http://blogs.msdn.com/b/sqlserverstorageengine/archive/2014/07/27/clustered-column-store-index-concurrency-and-isolation-level.aspx, the clustered column store index has been optimized for the typical DW scenario: nightly or trickle data loads combined with fast query performance. Multiple inserts can load data concurrently in parallel while DW queries run under the read uncommitted transaction isolation level.

This blog describes the locking behavior when data is inserted through a bulk load command. Here is the table we will use in the example:

Create table t_bulkload (
       accountkey                 int not null,
       accountdescription         nvarchar(50),
       accounttype                nvarchar(50),
       AccountCodeAlternatekey    int)

 Bulk loading into CCI

A more common scenario is to bulk import data into a CCI. The bulk import loads the data into the delta store if the batch size is less than 102,400 rows; otherwise the rows are loaded directly into a compressed rowgroup. Let us walk through an example to illustrate bulk load.

 

-- Let us prepare the data
-- insert 110K rows into a regular table
begin tran
declare @i int = 0
while (@i < 110000)
begin
       insert into t_bulkload values (@i, 'description', 'dummy-accounttype', @i*2)
       set @i = @i + 1
end
commit

-- bcp out the data... run the following command in a command window
bcp adventureworksDW2012..t_bulkload out c:\temp\t_bulkoad.dat -c -T

 

As the next step, let us truncate the table t_bulkload and create a clustered columnstore index on it. At this point there are no rowgroups, since the table has no rows.

--truncate the table
Truncate table t_bulkload

-- create a clustered columnstore index on the empty table
CREATE CLUSTERED COLUMNSTORE index t_bulkload_cci on t_bulkload

Now we will bulk import the data with a batch size greater than 102,400, as follows. Notice that I am running this command inside a transaction; this will help us see what locks are taken.

-- now bulkload the data
begin tran
bulk insert t_bulkload
   FROM 'c:\temp\t_bulkoad.dat'
   WITH
      (
         BATCHSIZE = 103000
      )

-- show rowgroups
select * from sys.column_store_row_groups where object_id = object_id('t_bulkload')

 

The output below shows that two rowgroups were created. The first rowgroup, with row_group_id = 0, is ‘COMPRESSED’ with 103,000 rows. Because the batch size was >= 102,400, SQL Server compressed this rowgroup directly. This is useful because bulk load is a common way to load data into a data warehouse: by compressing the rows directly, SQL Server can minimize logging (I will blog about transaction logging into CCI later) since the rows do not pass through a delta rowgroup, and there is no need for the tuple mover to move the data. The second batch had only 7,000 rows because we ran out of rows in the data file (remember, the data file had only 110,000 rows), so this set of rows was inserted into delta rowgroup 1. Note that this rowgroup is still marked ‘OPEN’, meaning it is not closed. It will eventually be closed and compressed by the background ‘tuple mover’ once its row count reaches the one-million-row mark.

Let us now look at the locks. Here is the output. Note that we have an X lock on both the delta rowgroup and the compressed rowgroup. Taking locks at the rowgroup level minimizes locking overhead.

 

You may wonder what will happen if we insert a row from another session. Let us do just that.

begin tran

       insert into t_bulkload values (-1, 'single row', 'single row', -1)

Now let us look at the rowgroups. You will note that the new row was inserted into a new delta rowgroup, as highlighted below, because the bulk insert transaction holds an X lock on rowgroup 1. SQL Server allows the INSERT operation to succeed instead of blocking it because INSERT is a common operation for DW workloads and maximum concurrency is needed. The downside is that you now have two open delta rowgroups. Future inserts can go into either of them, so in the worst case you may have two million rows in the delta rowgroups before they get compressed. This will impact DW query performance because the part of a query that accesses rows from a delta rowgroup is not as efficient.
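
If the extra open delta rowgroups become a concern, one option (a sketch that is not part of the original walkthrough) is to rebuild the columnstore index, which recompresses the delta rowgroups at the cost of an offline rebuild:

-- Rebuild the clustered columnstore index to fold delta rowgroups into compressed rowgroups.
ALTER INDEX t_bulkload_cci ON t_bulkload REBUILD

-- Verify the rowgroup states afterwards.
select * from sys.column_store_row_groups where object_id = object_id('t_bulkload')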

 

I hope this blog clarifies how data is bulk imported into a clustered columnstore index. In most cases, there are no issues if you are loading a large amount of data. In the worst case, I expect the number of delta rowgroups to be the same as the degree of concurrency for bulk import operations.

Thanks

Sunil Agarwal

Machine Learning Summer School at CMU


This blog post is authored by Markus Weimer, Principal Scientist at Microsoft

Eight years ago, I made the 24-hour journey from my college town of Darmstadt, Germany, to Canberra, Australia to attend a Machine Learning Summer School (MLSS) there. Why, you might ask? At the time, I didn’t have a good answer myself, to be quite honest. Well, at least nothing beyond my love for the land down under and a suggestion from my PhD advisor that it would be a good idea to attend. In hindsight, he was very right. During the two weeks I was in Canberra, I made many new friends and learned things that changed the course of my PhD research and ultimately set me onto a path that led me to the US and indeed to my current position at Microsoft.

I was reminded of this trip earlier this year, when I received an email from Carnegie Mellon University (CMU) professors Alex Smola and Zico Kolter inviting me to teach a hands-on class on REEF, my current project at Microsoft, at this year’s Machine Learning Summer School at CMU in Pittsburgh. This seemed like a unique way for me to give back to the student community, so I jumped on the opportunity. REEF is a framework for writing distributed applications on top of Apache Hadoop 2 clusters. In order to give the students an opportunity to experience a real Big Data environment, Microsoft sponsored a 1000-core Azure HDInsight cluster for the duration of the class. HDInsight is Microsoft’s fully-managed Hadoop-on-Azure offering.

In the course of five lectures during the week, I walked students through the basics of the Big Data cloud environment, introduced REEF and Azure HDInsight, and discussed what we call “Resource Aware Machine Learning”. The main idea behind the latter is that system events, such as adding and removing machines from a distributed application, have implications for machine learning (ML). For instance, losing a machine to hardware failure in the middle of a computation means losing a partition of the data, which in turn causes estimates computed on that data to have higher variance. And variance of estimators is a first-class object in ML. Hence, we just might find more efficient ways to deal with machine failures than requiring the underlying system to handle them.

 

Summer School attendees and lecturers after the farewell dinner. Photo by Alex Smola

This decidedly hands-on-keyboard lecture was embedded into a full schedule for the students, with lectures from industry and academia alike, on topics ranging from new theoretical foundations for learning factor models to practical lessons from operating internet-scale recommender systems.

I sure hope that the Machine Learning Summer School at CMU this year will have the same profound impact on at least a few students as the one from many years ago had on my own research and career. For those of you who wish to learn more about the CMU MLSS, I highly recommend the lecture recordings posted online by the organizers.

Markus Weimer
Follow me on twitter. Follow my blog.

Transitioning from SMP to MPP, the why and the how


This blog post was authored by Sahaj Saini, PM on the Microsoft Analytics Platform System (APS) team.

In this blog post, we’ll provide a quick overview of Symmetric Multi-Processing (SMP) vs. Massively Parallel Processing (MPP) systems, how to identify triggers for migrating from SMP to MPP, key considerations when moving to Microsoft Analytics Platform System (APS), and a discussion about how to take advantage of the power of an MPP solution such as APS.

Let us begin with a scenario. Emma is the Database Administrator at Adventure Works Cycles, a bicycle manufacturing company. At Adventure Works, Emma and her team are using traditional SQL Server SMP as their data warehousing solution. The company has been growing rapidly and with growing competition in the bicycle industry, the business analysts at Adventure Works Cycles would like quicker insight into their data. Emma is now facing the following challenges with the SMP deployment –

  • High Data Volume and Data Growth: With increasing sales and a growing customer base, the data volume has grown rapidly to cross 10 TB.
  • Longer Data Loading/ETL Times: With the need to produce daily reports for management, Emma finds the current ETL speed inadequate to ingest and process the increasing quantity of data flowing from other OLTP and non-relational systems.
  • Slow Query Execution: Query execution times are slowing down due to the increase of data and it is becoming increasingly difficult to generate insights for daily reporting in a timely manner.
  • Long Cube Processing Time: With the current cube processing time, it is difficult to meet the real-time reporting needs of the company.

To overcome these challenges, Emma and her team evaluate purchasing a larger, more powerful (and more expensive) set of server and storage hardware for their datacenter. This approach would solve their problem, but only in the short term, as data growth is expected to explode in the next 12 months. With the growth that Adventure Works is expecting, even bigger and more powerful SMP solutions would hit a wall very quickly. Emma would like a solution that scales as their data needs grow.

What’s the difference between SMP and MPP?

Before we jump into solving Emma’s problems, let’s quickly define SMP and MPP. Symmetric Multi-Processing (SMP) is a tightly coupled multiprocessor system in which processors share resources – a single instance of the operating system (OS), memory, and I/O devices – and are connected by a common bus. SMP is the primary parallel architecture employed in servers and is depicted in the following image.

Massively Parallel Processing (MPP) is the coordinated processing of a single task by multiple processors, each using its own OS and memory and communicating with the others through some form of messaging interface. MPP can be set up with a shared-nothing or shared-disk architecture.

In a shared nothing architecture, there is no single point of contention across the system and nodes do not share memory or disk storage. Data is horizontally partitioned across nodes, such that each node has a subset of rows from each table in the database. Each node then processes only the rows on its own disks. Systems based on this architecture can achieve massive scale as there is no single bottleneck to slow down the system. This is what Emma is looking for.

MPP with shared-nothing architecture is depicted in the following image.

Microsoft Parallel Data Warehouse (PDW) running on a Microsoft Analytics Platform System appliance is implemented as an MPP shared-nothing architecture. It consists of one control node and storage-attached compute nodes interconnected by Ethernet and InfiniBand. The control node hosts the PDW engine – the brains of the MPP system – which creates parallel query plans, coordinates query execution on the compute nodes, and aggregates data across the entire appliance. All nodes, including control and compute, host a Data Movement Service (DMS) to transfer data between nodes.

For more details on PDW architecture, you can read the Architecture of the Microsoft Analytics Platform System post.

Transitioning to MPP

To realize the value offered by MPP, Emma and her team purchase a Microsoft APS appliance and begin transitioning to MPP. Let’s take a look at how they adapt their solution to take full advantage of APS’s shared nothing MPP architecture.

Table Design

As previously mentioned, APS is based on a shared nothing MPP architecture which means that nodes are self-sufficient and do not share memory or disks. The architecture, therefore, requires you to distribute your large tables across nodes to get the benefits of the massively parallel processing. APS allows the definition of a table as either distributed or replicated. The decision to choose one versus the other depends on the volume of data and the need for access to all of the data on a single node.

Distributed Tables

A distributed table is one whose row data is distributed across the nodes within the appliance to allow for massive scale. Each row ends up in one distribution on one compute node, as depicted in the image below.

To take advantage of the distributed nature of APS, Emma modifies the large tables, typically Fact and large dimension tables, to be distributed in APS as follows:

CREATE TABLE [dbo].[FactInternetSales]
(
  [ProductKey] [int] NOT NULL,
  [OrderDateKey] [int] NOT NULL,
  .
  .
  [ShipDate] [datetime] NULL
) 
WITH
(
  DISTRIBUTION = HASH(ProductKey),
CLUSTERED COLUMNSTORE INDEX
);

As you can see, this is a typical DDL statement for table creation with a minor addition for distributed tables. Tables are distributed by a deterministic hash function applied to the distribution column chosen for that table. Emma chooses ProductKey as the distribution column for the FactInternetSales table because of its high cardinality and absence of skew, thereby distributing the table evenly across nodes.
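
One way to sanity-check that choice (a hedged sketch, assuming an APS/PDW appliance where this DBCC command is available) is to look at how evenly the rows land across distributions:

-- APS/PDW: report rows and space used per distribution for the table.
DBCC PDW_SHOWSPACEUSED("dbo.FactInternetSales")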

Replicated Tables

If all tables were distributed, however, a great deal of data movement between nodes would be required before performing join operations. Therefore, for smaller dimension tables such as language, country, and so on, it makes sense to replicate the entire table on each compute node. That is to say, the benefit of enabling local join operations with these tables outweighs the cost of the extra storage consumed. A replicated table is one that is replicated across all compute nodes, as depicted below.

Emma designs the small tables, typically dimension tables, to be replicated as follows:

 CREATE TABLE [dbo].[DimDate](
  [DateKey] [int] NOT NULL,
  .
  .
  [SpanishDayNameOfWeek] [nvarchar](10) NOT NULL,
)
WITH
(
CLUSTERED COLUMNSTORE INDEX
);

By appropriately designing distributed and replicated tables, Emma aligns her solution with common MPP design best practices and enables efficient processing of high volumes of data. For example, a query against 100 billion rows in a SQL Server SMP environment would require processing all of the data in a single execution space. With MPP, the work is spread across many nodes, breaking the problem into smaller, more manageable tasks. In a four-node appliance (see the picture above), each node processes only roughly 25 billion rows – a much quicker task. As a result, Emma observes significant improvements in query execution time, and her business can now make better decisions, faster. Additionally, Emma can grow the data warehouse from a few terabytes to over 6 petabytes of data by adding “scale units” to APS.

Data Loading

With SQL Server SMP, Emma and her team were using ETL processes via a set of SSIS packages to load data into the data warehouse – (1) extracting data from the OLTP and other systems; (2) transforming the data into dimensional format; and (3) loading the data into target dimension or fact tables in the data warehouse. With increasing volumes of data, the SSIS server in the middle becomes a bottleneck while performing transformations, resulting in slow data loading.

With APS, Emma and her team can use ELT instead: Extract the data from the OLTP and other systems and Load it into a staging location on APS. The data can then be Transformed into dimensional format not with SSIS but with the APS engine, using the distributed nature of the appliance and the power of parallel processing. In a 4-node appliance, four servers perform the transformations on subsets of data, versus the single SSIS server.

This parallel processing results in a significant boost in data loading performance. Emma can then use the Create Table As Select (CTAS) statement to create the table from the staging table as follows.

CREATE TABLE [dbo].[DimCustomer] 
WITH
(
  CLUSTERED COLUMNSTORE INDEX,
  DISTRIBUTION = HASH (CustomerKey)
)
AS
SELECT * FROM [staging].[DimCustomer];

By switching to an ELT process, Emma utilizes the parallel processing power of APS to see performance gains in data loading.

In conclusion, Emma and her team have found answers to their SMP woes with MPP. They can now feel confident handling the data volume and growth at Adventure Works with the ability to scale the data warehouse as needed. With ELT and the power of parallel processing in APS, they can load data into APS faster and within the expected time-window. And by aligning with APS’s MPP design, they can achieve breakthrough query performance, allowing for real-time reporting and insight into their data.

Visit the Analytics Platform System page to access more resources, including a datasheet, video, solution brief, and more.

To learn more about migrating from SQL Server to the Analytics Platform System, visit the Analytics Platform System page.

PASS Summit 2014: Inside the World’s Largest Gathering of SQL Server and BI Professionals


PASS VP of Marketing Denise McInerney – a SQL Server MVP and Data Engineer at Intuit – began her career as a SQL Server DBA in 1998 and attended her first PASS Summit in 2002. The SQL Server Team caught up with her ahead of this year’s event, returning to Seattle, WA, Nov. 4-7, to see what she’s looking forward to at the world’s largest conference for SQL Server and BI professionals.

For those who’ve never attended or who’ve been away for a while, what is PASS Summit?
PASS Summit is the world’s largest gathering of Microsoft SQL Server and BI professionals. Organized by and for the community, PASS Summit delivers the most technical sessions, the largest number of attendees, the best networking, and the highest-rated sessions and speakers of any SQL Server event.

We like to think of PASS Summit as the annual reunion for the #sqlfamily. With over 200 technical sessions and 70+ hours of networking opportunities with MVPs, experts and peers, it’s 3 focused days of SQL Server. You can take hands-on workshops, attend Chalk Talks with the experts, and get the answers you need right away at the SQL Server Clinic, staffed by the Microsoft CSS and SQLCAT experts who build and support the features you use every day. Plus, you can join us early for 2 days of pre-conference sessions with top industry experts and explore the whole range of SQL Server solutions and services under one roof in the PASS Summit Exhibit Hall.

Nowhere else will you find over 5,000 passionate SQL Server and BI professionals from 50+ countries and 2,000 different companies connecting, sharing, and learning how to take their SQL Server skills to the next level.

What’s on tap this year as far as sessions?
We’ve announced a record 160+ incredible community sessions across 5 topic tracks: Application and Database Development; BI Information Delivery; BI Platform Architecture, Development and Administration; Enterprise Database Administration and Deployment; and Professional Development. And watch for over 60 sessions from Microsoft’s top experts to be added to the lineup in early September.

You can search by speaker, track, session skill level, or session type – from 10-minute Lightning Talks, to 75-minute General Sessions, to 3-hour Half-Day Sessions and our full-day pre-conference workshops.

And with this year’s new Learning Paths, we’ve made it even easier to find the sessions you’re most interested in. Just use our 9 Learning Path filters to slice and dice the lineup by everything from Beginner sessions to Big Data, Cloud, Hardware Virtualization, and Power BI sessions to SQL Server 2014, High Availability/Disaster Recovery, Performance, and Security sessions.

Networking is at the heart of PASS Summit – what opportunities do you have for attendees to connect with each other?
PASS Summit is all about meeting and talking with people, sharing issues and solutions, and gaining knowledge that will make you a better SQL Server professional. Breakfasts, lunches, and evening receptions are all included and are designed to offer dedicated networking opportunities. And don't underestimate the value of hallway chats and the ability to talk to speakers after their sessions, during lunches and breaks, and at the networking events.

We have special networking activities for first-time attendees, for people interested in the same technical topics at our Birds of a Feather luncheon, and at our popular annual Women in Technology luncheon, which connects 600+ attendees interested in advancing the role of women in STEM fields. Plus, our Community Zone is THE place to hang out with fellow attendees and community leaders and learn how to stay involved year-round.

You mentioned the networking events for first-time attendees. With everything going on at Summit, how can new attendees get the most out of their experience?
Our First-Timers Program takes the hard work out of conference prep and is designed specifically to help new attendees make the most of their time at Summit. We connect first-timers with conference alumni, take them inside the week with community webinars, help them sharpen their networking skills through fun onsite workshops, and share inside advice during our First Timers orientation meeting.

In addition, in our “Get to Know Your Community Sessions,” longtime PASS members share how to get involved with PASS and the worldwide #sqlfamily, including encouraging those new to PASS to connect with their local SQL Server communities through PASS Chapters and continue their learning through Virtual Chapters, SQLSaturdays, and other free channels.

How can you learn more about sessions and the overall PASS Summit experience?
A great way to get a taste of Summit is by watching PASS Summit 2013 sessions, interviews, and more on PASStv. You can also check out the best of last year’s Community blogs.

Plus, stay tuned for 24 Hours of PASS: Summit Preview Edition on September 9 to get a free sneak peek at some of the top sessions and speakers coming to PASS Summit this year. Make sure you follow us on Twitter at @PASS24HOP / #pass24hop for the latest updates on these 24 back-to-back webinars.

Where can you register for PASS Summit?
To register, just go to Register Now – and remember to take advantage of the $150 discount code from your local or Virtual PASS Chapter. We also have a great group discount for companies sending 5 or more employees. And don’t forget to purchase the session recordings for year-round learning on all aspects of SQL Server.

Once you get a taste for the learning and networking waiting for you at PASS Summit, we invite you to join the conversation by following us on Twitter (watch the #sqlpass and #summit14 hashtags) and joining our Facebook and LinkedIn groups. We’re looking forward to an amazing, record-breaking event, and can’t wait to see everyone there!

Please stay tuned for regular updates and highlights on Microsoft and PASS activities planned for this year’s conference. 
