Saturday, May 28, 2011

Facebook allows Bing to let your friends help you find what they "like"

Bing's collaboration with Facebook opens up Facebook's "Likes" data for Bing to include in its search results.

After all, friends' suggestions and expertise do matter :)
Bing will track recommendations from your peers on Facebook and return more granular, relevant search results tailored to your interests.
Your search becomes more like taking your friends along and telling them about the laptop you are planning to buy, so they can guide you based on what they "like".

This could be a major breakthrough in internet search. It might also give Bing a substantial edge over its search rival Google, which may be working to exploit social search using data from Orkut, but Orkut has a much smaller user base than Facebook.

To enable Bing social search with Facebook, go to Bing and follow the steps below.



Friday, May 27, 2011

Offline Wikipedia: an example of the anti-cloud

Offline Wikipedia, offered through Apple's iTunes, brings the world's content to small hand-held devices as well as the PC.

This gives users great ease and the power of information while travelling with little or no connectivity, or anywhere else without internet access.

It further offers customization and personalization, such as queuing articles for later review, organizing preferred articles into folders, and synchronizing with the latest text and images from the cloud.

When deployed on a shared PC accessed by multiple users at a workplace, the app could also save internet traffic and avoid bandwidth issues.

Google Wallet, a mobile payment platform. Still Carrying Credit Cards?

Google has launched its Near-Field Communication (NFC) enabled mobile payment platform, which allows payments to be sent directly from your smartphone. Your phone will be your wallet.
Google has partnered with big industry players, including MasterCard and Citibank, for Google Wallet.
The launch has created a buzz across related industries, from point-of-sale vendors, retailers, and merchants through banks and software companies, all working to support the platform and stay compatible with the future of customer payments.
There are also rumors that Apple is working on an NFC-enabled mobile payment system for its iPhone.

Read More about Google Wallet.

Wednesday, May 25, 2011

Google's Cloud-Based Machine Learning Service

Google Labs now offers its cloud-based machine learning API and BigQuery, powered by a full-fledged BI-enabled engine with a handful of Google's industry-tested and renowned prediction algorithms.

This empowers developers and organizations alike to jump-start major improvements to their applications and services by making use of Google's deep data analytics capabilities.
It especially appeals to organizations that lack the high-tech infrastructure to maintain a data warehouse, and that want to avoid the high licensing costs of the data mining engines and business intelligence applications offered by major vendors like Oracle, Teradata, and Microsoft.

To make this work, you upload your data to Google, process it using BigQuery, and then apply the set of Prediction APIs.

However, when it comes to prediction, machine learning, and data mining, we are talking about data in the gigabytes or terabytes.
Depending on the type of business and the size and sensitivity of the data, uploading it over the internet to Google's infrastructure for the first time, followed by daily incremental uploads, raises feasibility and security concerns. But as the cloud buzz continues to grow massively, so does adoption.

Read More!

Monday, February 15, 2010

SP4 for SQL Server 2005 and SP2 for SQL Server 2008

Microsoft has agreed to release the last service pack, SP4, for SQL Server 2005 in the fourth quarter of 2010.
SP2 for SQL Server 2008 is also expected in the third quarter.

http://blogs.msdn.com/sqlreleaseservices/archive/2010/02/12/sql-server-servicing-plans.aspx

Monday, March 23, 2009

Business Intelligence & Basic Data Mining in Excel 2007 with SQL Server 2008 Analysis Services

One of the myths of decision making is that we get selective in gathering information rather than being objective! Thanks to business intelligence tools, which help reveal hidden patterns in the data, we can broaden the scope of information and help people think outside the box.
The SQL Server data mining add-ins for Excel 2007 have a very rich and wide range of business intelligence features, including analyzing key influencers, forecasting and trend analysis, categorizing data, identifying anomalies in the data (exception highlighting), shopping basket analysis, and more.
The coolest thing about these add-ins is that managers, data analysts, and decision makers can do a lot without any substantial technical IT background!
Note: the installation and configuration procedure is at the end of this discussion.

Here is sample base customer data from a bike-selling company:

The company's marketing and sales people need to build a future strategy and align future campaigns to make sure they hit more sales and attract high-value customers! The focus is:
  • Potential customers: i.e., what type of people buy a bike and which do not (e.g., a car owner is probably not going to buy a bike, while someone with a substantial commute distance and no car is a potential bike buyer)
  • Categories and hidden patterns in the data
  • Sales forecasting
  • Buyer trends
Below are some examples of the data mining features!
Analyze Key Influencers:
This feature shows how much impact/influence the other columns of the sample data have on a target column. In this case, we have selected the influence of all columns on the Bike Purchase column to see which characteristics in the other columns affect bike purchases!

Below is the result!
Here, you see that the result shows that people with zero cars are the most valuable customers, and this helps management save a lot of money otherwise spent on marketing campaigns targeting irrelevant people!
Detect Categories:
This feature mines the data and then classifies it into very interesting categories!

Excel 2007 will add the category column to the source data as well.

Forecasting:
Given below is a sample of region-wise monthly sales!


Below is the forecast report for the sample data above!
The estimated sales have also been added to the actual source data:

Forecast Graph report!

Buyer Trends: Shopping Basket Analysis:
Sample data of customer orders and trends of items purchased!
The analysis shows which other items are purchased along with a particular item in a single order, so that related items can be placed together for sale; that's how METRO, Walmart, and other large cash & carry stores hit more sales!

Below are the results of shopping basket analysis for the sample data above!
Highlight Exceptions:
This feature mines the data and detects anomalies/outliers within the data patterns!
Here's how we use this feature:

Below are the examples of exceptions highlighted by the tool:
Summary of exceptions found!

Exceptions in the data (outliers): the row is marked yellow, with the actual field causing the exception highlighted.

If you fix the anomaly by changing the field, the exception goes away automatically!


Installation/Configuration Procedure:
Install SQL Server 2008 Analysis Services.
Follow the link below for the details!
After the installation, connect to the Analysis Server through SQL Server Management Studio and open the properties of the server.
Choose Show Advanced (All) Properties.
Set the below two options to TRUE:
  • DataMining \ AllowAdHocOpenRowsetQueries
  • DataMining \ AllowSessionMiningModels

Additionally, you can download the sample databases for SQL Server 2008, including the data warehouse database, to build your own mining models using Business Intelligence Development Studio!


Download the SQL Server 2008 Data Mining Add-ins for Excel 2007:

Run the setup and install the data mining add-ins.
Run Excel 2007 and open the file:
c:\Program Files\Microsoft SQL Server 2008 DM Add-Ins\DMAddins_SampleData.xlsx
You will see a new Data Mining button appear in the Excel ribbon.

Note: if the data mining buttons do not appear in Excel, click Excel Options from the menu and choose Add-Ins; under the Manage options, choose Excel Add-ins and make sure the data mining add-ins are selected.
Click the Data Mining button and, from the list, choose CONNECTIONS and add a connection to the Analysis Server.
You are now ready to use the business intelligence and data mining features of SQL Server 2008 Analysis Services.
In addition to the Data Mining button, if you load data and format it as a table, an ANALYZE button appears in the menu that contains many data mining features!

Please leave comments on this discussion if you have any questions or issues installing and configuring the setup!

Cheers!

Wednesday, March 11, 2009

Enterprise Policy Management Framework with SQL Server 2008

If you have dozens of SQL Server instances (including SQL 2000, Yukon, and Katmai) running within the enterprise, policy management was never easy; with SQL Server 2008's Enterprise Policy Management Framework, it now can be!


Monday, February 09, 2009

The old NEWID() vs NEWSEQUENTIALID()

SQL Server 2005 introduced NEWSEQUENTIALID(), which avoids page splits by always producing a value greater than the last one it generated, unlike its predecessor NEWID(), which generates random values and causes I/O issues.
That also helps replication performance; see the sketch below.
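For illustration, here is a minimal sketch (the table and column names are hypothetical); note that NEWSEQUENTIALID() may only be used in a DEFAULT constraint:

create table dbo.Orders
(
    -- each new row gets a value greater than the previous one, so inserts
    -- append to the last page instead of splitting random pages (as NEWID() does)
    order_id uniqueidentifier not null default newsequentialid() primary key clustered,
    order_date datetime not null default getdate()
)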

Details below:

SQL Server 2005 SP3 released

Posting late though!
Link below.

Sunday, November 09, 2008

Stored Procedures Best Practices (learn from the Best)

Instead of adding more to the already overloaded junk on the web regarding SQL stored procedure best practices, let's learn from those who are the best! :)

Wednesday, October 22, 2008

Avoiding Cursors and Loops for Append Operations

Most of the time, for basic append operations in SQL, the very first thing that comes to mind is a CURSOR and a LOOP.
For example, if I want to print a comma-separated list of the columns in any given table, I would quickly think of a cursor with a loop assigning the comma-separated column list to a variable.

BUT there is a pretty simple, smarter, and quicker workaround for appends like this:

TRADITIONAL CURSOR WITH WHILE LOOP SOLUTION FOR APPENDS:

-- create a temp table whose column names we want as a comma-separated list
create table #mytable (column_a varchar(2), column_b varchar(2), column_c varchar(3))

-- cursor over the temp table's columns in tempdb's system tables
declare cursor_mytable cursor
for select c.name
    from tempdb.dbo.sysobjects o
    join tempdb.dbo.syscolumns c on c.id = o.id
    where o.id = object_id('tempdb.dbo.#mytable')
    and o.type = 'U'

declare @v varchar(100), @v2 varchar(800)
set @v2 = ''

open cursor_mytable
fetch cursor_mytable into @v
while @@fetch_status = 0
begin
    -- append each column name, comma-separating after the first
    if @v2 = ''
        set @v2 = @v
    else
        set @v2 = @v2 + ',' + @v
    fetch cursor_mytable into @v
end
close cursor_mytable
deallocate cursor_mytable

drop table #mytable
select @v2
SIMPLE, CURSORLESS AND LOOPLESS, YET SMARTER SOLUTION FOR APPENDS:

-- same temp table as above
create table #mytable (column_a varchar(2), column_b varchar(2), column_c varchar(3))

declare @v varchar(100), @v2 varchar(800)
select @v2 = '', @v = ''

-- a single select appends every column name to @v2: no cursor or loop needed
select @v = c.name, @v2 = @v2 + ',' + @v
from tempdb.dbo.syscolumns c
inner join tempdb.dbo.sysobjects o on c.id = o.id
where o.id = object_id('tempdb.dbo.#mytable')
and o.type = 'U'

drop table #mytable
select substring(@v2, 2, len(@v2))   -- strip the leading comma

The output of both code snippets above is the same, but the simplicity of the latter is obvious!

Wednesday, August 27, 2008

SQL Server 2005 Versions and Features Comparisons



One thing I found missing in the comparison is that Peer-to-Peer Transactional Replication, used for scaling out, is only supported in the Enterprise Edition.

Wednesday, July 23, 2008

Advanced SQL Server Interview Questions

I came across several questions during an interview for a SQL admin position; I thought other folks might also benefit from them.

And I will try to add more to the list as long as I keep being interviewed! ;)

  • Question: When you fail over a SQL Server from node 1 to node 2, are all processes/users automatically shifted to node 2?
  • Answer: Jobs running on node 1 will fail and do not automatically fail over to node 2. You might need to run those jobs manually after the failover.

  • Question: How can you move a physical table from one disk to another (using filegroups)?
  • Answer: Recreate the clustered index of the table on the destination filegroup; the physical table automatically moves along with the clustered index to the new disk. See the sketch below.
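A minimal sketch, assuming a hypothetical table dbo.Orders and an existing filegroup FG2:

-- rebuilding the clustered index on filegroup FG2 moves the table's data pages with it
create clustered index IX_Orders_OrderID
on dbo.Orders (order_id)
with (drop_existing = on)
on FG2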

  • Question: When you upgrade a database, what doesn't automatically get updated?
  • Answer: Run UPDATE STATISTICS manually, because statistics are not automatically updated upon upgrade. See the sketch below.
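A quick sketch (dbo.Orders is a hypothetical table name):

-- refresh statistics for one table with a full scan...
update statistics dbo.Orders with fullscan
-- ...or refresh statistics for every table in the database
exec sp_updatestats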

  • Question: Is it fine to accept the default fill factor value of 100 when creating an index?
  • Answer: Do not automatically accept the default fill factor value of 100 unless your data seldom changes. You might need a lower fill factor value on an OLTP database to reduce page splitting and save I/O, as in the sketch below.
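A hedged example (the index and table names are hypothetical):

-- fillfactor = 80 leaves 20% free space in each leaf page,
-- reducing page splits on a write-heavy OLTP table
create nonclustered index IX_Orders_CustomerID
on dbo.Orders (customer_id)
with (fillfactor = 80)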

  • Question: Is it fine to have the SQL startup account as a member of the BUILTIN\Administrators group? If not, what explicit rights might be required for a startup account with limited rights?

  • Answer: Though it's recommended that the SQL startup account not be a domain admin, and should have limited rights to avoid risking the complete machine/network in case of injection/intrusion, you might need the LOCK PAGES IN MEMORY setting explicitly granted to the SQL startup account in order to start the SQL service.

  • Question: How do we replicate databases over a WAN with limited internet bandwidth, and is it possible to set up a single large publication in that case?
  • Answer: When replicating large DBs over a WAN, you need to set up the FTP options on the publication so the snapshot can be uploaded manually to FTP for the subscriber to download. You cannot create a snapshot file (.cab) over 2 GB, so if you have large tables, use multiple publications, each with a snapshot cab file under 2 GB. See the sketch below.
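A hedged sketch of pointing an existing merge publication at an FTP server (the publication name and address are hypothetical):

-- run at the publisher, in the published database
exec sp_changemergepublication
    @publication = N'MyPub',
    @property = N'ftp_address',
    @value = N'ftp.example.com'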

  • Question: How can we browse the SQL configuration options without having to enable 'show advanced options'?
  • Answer: SELECT * FROM sys.configurations

  • Question: What is the recommended block size when setting up a disk or RAID for a SQL database?
  • Answer: The optimum block size when formatting a disk or setting up RAID for a SQL database is 64 KB, as SQL Server's basic unit of storage is the extent, which is 64 KB.

  • Question: Is it a problem if your NTFS disk is 90% used?
  • Answer: NTFS needs room to work, and when you exceed 80% capacity, NTFS becomes less efficient and I/O can suffer. Keeping adequate free space on the disk will prevent the MFT from becoming fragmented, which is important because the Disk Defragmenter tool can't defragment the MFT.

  • Question: In SQL 2000, we know that the page size in the SQL storage structure is 8064 bytes, of which 4 bytes is header information. What's the actual maximum row size in a data page?
  • Answer: The actual maximum row size in SQL 2000 is 8039 bytes, not 8060. A table to hold a row of 8040 bytes is not possible, and the following command will fail:
  • CREATE TABLE ROWSIZE ( address CHAR(8000), name CHAR (40) )

  • Question: In SSMS, what will be the output of executing the command below?
    SELECT @@version
    GO 5
  • Answer: It will execute the SELECT command five times.

  • Question: Does enabling AWE on a 32-bit OS mean a better procedure cache?
  • Answer: SQL Server on a 32-bit OS with AWE enabled still can't address more than 4 GB of virtual address space. Only the buffer cache can benefit from the extra RAM on 32-bit with AWE enabled. For reference, the sketch below shows how the option is switched on.
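A minimal sketch of enabling the option ('awe enabled' is an advanced sp_configure option, so advanced options must be exposed first):

exec sp_configure 'show advanced options', 1
reconfigure
exec sp_configure 'awe enabled', 1
reconfigure
-- the change takes effect after the SQL Server service restarts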

  • Question: Does enabling AWE on a 64-bit OS mean a better procedure cache?
  • Answer: True; however, one of the drawbacks of SQL Server on a 64-bit OS is that we cannot manage the procedure cache size, and SQL Server may prefer to keep large, expensive plans in cache, which drains memory. A larger procedure cache may also increase the time SQL Server takes to search the procedure cache to determine whether a plan exists.

  • Question: What's the purpose of the AWE enabled option in SQL 2005 64-bit?
  • Answer: On a 64-bit OS, AWE isn't ignored (as per several MSDN posts): "locked pages, awe mechanism, enable better application's scalability and performance both during ramp up time and over long period of time." http://blogs.msdn.com/slavao/archive/2005/04/29/413425.aspx

  • Question: How can we make a SQL statement roll back if it gets blocked, instead of waiting for the blocking process to commit or roll back?
  • Answer: By using SET LOCK_TIMEOUT; see the sketch below.
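A small sketch (the table and column names are hypothetical); when the timeout expires, the statement fails instead of waiting indefinitely:

-- give up after 5 seconds instead of waiting on a blocker forever
set lock_timeout 5000
update dbo.Orders set status = 'shipped' where order_id = 42
-- on timeout, error 1222 'Lock request time out period exceeded' is raised;
-- set lock_timeout -1 restores the default behavior of waiting indefinitely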

  • Question: What's wrong with the following queries?
SELECT name FROM DepartmentsTable ORDER BY id
SELECT DISTINCT name FROM DepartmentsTable ORDER BY id
  • Answer: The second query will return an error, because when using DISTINCT, the ORDER BY clause can only refer to columns that appear in the SELECT statement. See the corrected versions below.
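Two possible rewrites of the failing query, depending on the intent:

-- either order by a column that appears in the SELECT list...
SELECT DISTINCT name FROM DepartmentsTable ORDER BY name
-- ...or add the ordering column to the SELECT list
SELECT DISTINCT name, id FROM DepartmentsTable ORDER BY id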

Monday, June 09, 2008

Purging Merge Replication History to avoid Locking/Blocking

Last week I visited a data warehouse that pulls data over the WAN from several OLTP servers set up as merge publishers. One of the main OLTP servers, which I was asked to look at closely, had very high processor utilization and consistent spikes in processor queue length because of heavy lock escalation and blocking. The machine was configured pretty well, with adequate hardware resources. All production tables were fairly defragmented and partitioned, tempdb was on separate SCSI disks, blah blah blah. Yet overall I/O was sky high, and the processor spikes were shouting that something was fairly wrong.
During the investigation, I found that the merge replication agent was configured to run every 5 minutes on the server. I executed the following query, which showed me the potential cause of the problem:

select min(coldate) min_repldate, max(coldate) max_repldate, count(*) rows_replicated
from msmerge_contents a
join msmerge_genhistory b on a.generation = b.generation and a.tablenick = b.art_nick
where b.genstatus <> 0


________________________________________
min_repldate   max_repldate   rows_replicated
2008-01-01     2008-06-08     12010000

There were around 12 million records in the replication history tables from the last six months.
Since replication doesn't automatically delete old replicated transactions from these tables (except for some of the tables in the distribution database), each time the replication agent kicked off, queries and inserts hit these highly fragmented tables, causing high I/O and processor time, while user activity involving the published tables was being blocked.
Reinitializing from a snapshot was not an adequate solution, because the servers were on the WAN and the published tables were substantially large.
I used the following commands to purge the history records older than 7 days and bring relief to the server.

use publisheddb
GO
begin tran
-- take exclusive table locks up front so the purge runs without interference
select count(*) from msmerge_contents with(tablockx)
select count(*) from MSmerge_genhistory with(tablockx)

declare @coldate datetime
select @coldate = getdate() - 7

-- delete replicated changes older than 7 days
delete MSmerge_contents
from MSmerge_contents a, MSmerge_genhistory b
where a.generation = b.generation and a.tablenick = b.art_nick
and b.coldate < @coldate and a.generation > 0

delete MSmerge_genhistory where coldate < @coldate and pubid is not null
commit
GO
-- defragment the history tables and refresh statistics after the mass delete
dbcc dbreindex (MSmerge_contents)
GO
dbcc dbreindex (MSmerge_genhistory)
GO
update statistics MSmerge_contents
GO
update statistics MSmerge_genhistory
GO

Ideally, we need to keep these two tables (msmerge_contents, msmerge_genhistory) as narrow as possible. That helps replication complete quickly and avoids long periods of locking/blocking.
There should be a scheduled routine to purge all previously replicated transactions daily or weekly, depending upon the setup and environment; a sketch of such a routine follows.
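A minimal sketch of wrapping the purge above in a stored procedure (the procedure name and retention default are hypothetical) that a SQL Agent job could then run nightly:

create procedure dbo.usp_purge_merge_history
    @retention_days int = 7
as
begin
    declare @coldate datetime
    select @coldate = getdate() - @retention_days
    begin tran
    -- purge replicated changes older than the retention window
    delete MSmerge_contents
    from MSmerge_contents a, MSmerge_genhistory b
    where a.generation = b.generation and a.tablenick = b.art_nick
    and b.coldate < @coldate and a.generation > 0
    delete MSmerge_genhistory where coldate < @coldate and pubid is not null
    commit
end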

Thursday, May 08, 2008

Find out Pending Merge Replication changes

In many merge replication setups, the subscribers are not on the local LAN, and replicating large transactions becomes a nightmare. If someone updates, inserts, or deletes a very large number of rows, the merge agent seems to get stuck replicating the changes, and users cannot see their regular changes.
Large DML in merge replication also results in locking, high I/O, and slow response times, because changes are applied at the subscribers row by row.

So, you might need to find out exactly which tables and which rows are pending replication.


SCRIPT!
-- run on Publisher/Subscriber
-- NOTE: if the table is published in multiple publications, the number of rows
-- returned by the below queries would be articlechanges*subscribers

-- Finding inserted/update records
select distinct a.name, b.generation, b.coldate transaction_time, c.rowguid changed_row, b.changecount tran_count
from sysmergearticles a
join sysobjects ab on ab.name = a.name
join msmerge_genhistory b on a.nickname = b.art_nick
join msmerge_contents c on c.generation = b.generation and c.tablenick = b.art_nick
where b.genstatus = 0
order by c.rowguid

-- Finding deleted records
select a.name table_name, b.generation, b.coldate transaction_time, t.rowguid changed_row, b.changecount
from sysmergearticles a
join MSmerge_tombstone t on a.nickname = t.tablenick
join msmerge_genhistory b on a.nickname = b.art_nick and b.generation = t.generation
where b.genstatus = 0
order by coldate desc


In the next post, we will discuss how to skip replicating large DML rows for a particular table and apply the change manually at the subscriber, restoring the normal merge process and saving business time.

Monday, April 21, 2008

How to log in to SQL Server as SA while running in Windows Authentication only mode

A colleague of mine ran into the following situation:
My company's policies enforce that the domain admins group be removed from the sysadmin role on production SQL Servers, and that Windows authentication mode only be used.
He removed the domain admins group from the sysadmin role, then set Windows authentication only and restarted the server at night.
When he tried to log back in with his domain user account (which was part of the domain admins group too), he could only log on to the master db with the public role!
He had forgotten to add his own account explicitly in SQL Server before removing the domain admins group.

Interesting situation:

  • Domain admins removed from the SQL sysadmin role
  • SQL running in Windows Authentication only mode
  • No Windows user exists in SQL with the sysadmin role
Are there steps to log back in to SQL with sysadmin rights using SA?
Meanwhile, my colleague was planning to run the rebuildm.exe utility to rebuild the master db and then set up the logins and all user databases using scripts.
I tried something on my local test machine and found that there's a way!

I opened Enterprise Manager and tried to connect using SA with SQL authentication. Error: Login failed for user 'sa'. Reason: Not associated with a trusted SQL Server connection.
hmmmm! OK
Solution:
  • I logged into the operating system of the machine running the production SQL instance using a local admin account.
  • Checked that NAMED PIPES is enabled under the Server Network Utility.
  • Start --> Program Files --> Microsoft SQL Server --> SQL Server Client Network Utility
  • Went to Alias, selected Add.
  • Under network libraries, selected NAMED PIPES.
  • Gave the server alias prod1.
  • Under Server Name, typed the actual SQL instance name (in our case PRODUK1).
  • Clicked OK.
  • Opened Enterprise Manager again.
  • New SQL Server Registration.
  • Under the server name, typed the alias: prod1.
  • Selected "Use SQL Server authentication".
  • User name: SA
  • Password: typed the password
  • Clicked OK.
Successfully connected using the sa account! :)
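Once connected as sa, presumably the first order of business is to put the missing sysadmin back; a quick sketch, with a hypothetical login name:

-- grant the Windows account access again and re-add it to the sysadmin role
exec sp_grantlogin 'MYDOMAIN\myaccount'
exec sp_addsrvrolemember 'MYDOMAIN\myaccount', 'sysadmin'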

This isn't a security breach: since you have already logged in to the machine locally, you are an authenticated user on the system. The Named Pipes library only supports connections from valid Windows logins. You cannot connect using Named Pipes over the WAN or from outside a trusted domain. Your Windows login must be authenticated on the machine running SQL Server in order to use Named Pipes!


Cheers!