Arrao4u

…a blog by Rama Rao

Archive for the ‘Removing dulicate records’ Category

How to delete duplicate records and maintain one record among the duplicates

Posted by arrao4u on December 7, 2009

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn2)

if table dont have primary key:

CREATE TABLE [dbo].[Example]
(
[ID] [int] NOT NULL,
[Name] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
)
ON [PRIMARY]
INSERT INTO [Example] ([ID],[Name]) VALUES (1,’Chirag’)
INSERT INTO [Example] ([ID],[Name]) VALUES (1,’Chirag’)
INSERT INTO [Example] ([ID],[Name]) VALUES (2,’Shailesh’)
INSERT INTO [Example] ([ID],[Name]) VALUES (3,’Dipak’)
INSERT INTO [Example] ([ID],[Name]) VALUES (4,’Mihir’)
INSERT INTO [Example] ([ID],[Name]) VALUES (4,’Mihir’)
INSERT INTO [Example] ([ID],[Name]) VALUES (4,’Piyush’)

SELECT
[ID],[NAME],COUNT([ID])
FROM
[Example]
GROUP BY
[ID],[NAME]
HAVING
COUNT([ID]) > 1
WITH

Test AS (SELECT ROW_NUMBER () OVER ( PARTITION BY ID ORDER BY ID) AS A FROM example )
DELETE FROM Test WHERE A > 1

Eg:2

Code Snippet

CREATE TABLE [DAS](

[INSTANCE_NAME] Varchar(16) NULL,

[SESSIONID] float NOT NULL,

[IMPORT_DATE] smalldatetime NULL

)

INSERT INTO DAS VALUES(‘IN1’, 1, DATEADD(dd,1,GETDATE()))

INSERT INTO DAS VALUES(‘IN2’, 1, DATEADD(dd,2,GETDATE()))

INSERT INTO DAS VALUES(‘IN4’, 2, DATEADD(dd,3,GETDATE()))

INSERT INTO DAS VALUES(‘IN4’, 2, DATEADD(dd,3,GETDATE()))

INSERT INTO DAS VALUES(‘IN5’, 2, DATEADD(dd,4,GETDATE()))

INSERT INTO DAS VALUES(‘IN6’, 3, DATEADD(dd,6,GETDATE()))

INSERT INTO DAS VALUES(‘IN5’, 3, DATEADD(dd,6,GETDATE()))

WITH CTE

AS

(

SELECT ROW_NUMBER() OVER (PARTITION BY sessionid ORDER BY [Import_Date] desc) as rn, instance_name, sessionid, Import_date

FROM

DAS

)

SELECT instance_name, sessionid, Import_date FROM CTE WHERE rn > 1

Results

—————————–

IN1 1 2008-10-22 17:07:00
IN4 2 2008-10-24 17:07:00
IN4 2 2008-10-24 17:07:00
IN5 3 2008-10-27 17:07:00

eg:3

DELETE TOP(1) FROM [Example] WHERE [ID] = 1

Fig – (3) Delete single row from duplicate rows.

Here I have used TOP(1) , If you have n rows which has all the values same than you have to use TOP(n-1) so that only 1 row will be remain after delete statement. To delete all the duplicate rows you need to write a cursor as shown below,

DECLARE @ID int
DECLARE @NAME NVARCHAR(50)
DECLARE @COUNT int

DECLARE CUR_DELETE CURSOR FOR
SELECT [ID],[NAME],COUNT([ID]) FROM [Example] GROUP BY [ID],[NAME] HAVING COUNT([ID]) > 1

OPEN CUR_DELETE

FETCH NEXT FROM CUR_DELETE INTO @ID,@NAME,@COUNT
/* Loop through cursor for remaining ID */
WHILE @@FETCH_STATUS = 0
BEGIN

DELETE TOP(@COUNT -1) FROM [Example] WHERE ID = @ID

FETCH NEXT FROM CUR_DELETE INTO @ID,@NAME,@COUNT
END

CLOSE CUR_DELETE
DEALLOCATE CUR_DELETE

Fig – (4) Cursor to delete all duplicate records

This is all about deleting duplicate rows in SQL Server 2005.

For removing duplicates using row_number?
select ROW_NUMBER() over(partition by name order by name) as a,* from example
a ID Name
——————– ———– ————————————————–
1 1 Chirag
2 1 Chirag
1 3 Dipak
1 4 Mihir
2 4 Mihir
1 4 Piyush
1 2 Shailesh

using rank():
ID Name rownumber
———– ————————————————– ——————–
1 Chirag 1
1 Chirag 1
2 Shailesh 3
3 Dipak 4
4 Mihir 5
4 Mihir 5
4 Piyush 5

with test as
(

select *,rank() over(order by id) as rownumber from example
)
select distinct rownumber,name from test

rownumber name
——————– ————————————————–
1 Chirag
3 Shailesh
4 Dipak
5 Mihir
5 Piyush

(5 row(s) affected)

ex:
select *,rank() over(order by id) as rownumber from example

ID Name rownumber
———– ————————————————– ——————–
1 Chirag 1
1 Chirag 1
2 Shailesh 3
3 Dipak 4
4 Mihir 5
4 Mihir 5
4 Piyush 5

select *,rank() over(order by name) as rownumber from example

ID Name rownumber
———– ————————————————– ——————–
1 Chirag 1
1 Chirag 1
3 Dipak 3
4 Mihir 4
4 Mihir 4
4 Piyush 6
2 Shailesh 7

(7 row(s) affected)

Posted in Removing dulicate records, SqlServer | Leave a Comment »