SQL Server – How to remove HTML tags from data – SQL Circuit

SQL Server – How to remove HTML tags from data

1.     Background

The purpose of this article
is to provide a way of cleaning up of HTML tags within the data. When we use various styles or tabular format data in UI
using Rich Text Editor/ Rad Grid etc, it will save data in database with HTML
tags.
·        
Fig. 1 showing data with style using Rich text editor in UI
 
·        
Fig 2 showing data saved in table corresponding to fig 1 with HTML tags
           
            
·        
If we simply pull the data from the table for SSRS or excel then data
will come with HTML tags unless we explicitly remove the HTML tags from the
data.

2.    
What
is HTML Tags?

An HTML element starts with a start tag (<p>) and ends with end
tag (<p/>) and everything between
Start tag and End tag is HTML element. e.g.
       <b>Following are the popular
databases: <br />  </b>1. SQL
Server <br />  2. Oracle <br
/>  3.    
      Teradata <br />  4. Sybase

3.    
Script
to remove HTML tags from the data?
  

We are using SQL Server CHARINDEX function to get the
positions of ‘<’ and ‘>’ and once found replacing the string between <….> with blank using STUFF
function. We are using WHILE Loop that will run till presence of ‘<’ & ‘>’
in the string. Below is the UDF script that performs HTML Tags clean up from data.
/****** Object:  UserDefinedFunction
[dbo].[usp_ClearHTMLTags]   ******/
 
    SET ANSI_NULLS ON 
    GO 
    SET QUOTED_IDENTIFIER
ON 
    GO 
    /**************************************************************************** 
   
Name of Author  :   Vishal Jharwade 
   
Purpose         :   The Purpose of this function is to clean the
html tags from the data. 
   
***************************************************************************************/
 
    CREATE FUNCTION [dbo].[usp_ClearHTMLTags] 
   
(@String NVARCHAR(MAX)) 
     
    RETURNS NVARCHAR(MAX) 
    AS 
    BEGIN 
        DECLARE @Start INT, 
               
@End INT, 
               
@Length INT 
         
        WHILE CHARINDEX(‘<‘, @String) > 0 AND CHARINDEX(‘>’, @String, CHARINDEX(‘<‘, @String)) > 0 
        BEGIN 
            SELECT 
@Start  =
CHARINDEX(‘<‘, @String),  
                   
@End    =
CHARINDEX(‘>’, @String, CHARINDEX(‘<‘, @String)) 
            SELECT @Length = (@End @Start) + 1 
             
            IF @Length >
0 
            BEGIN 
                SELECT @String = STUFF(@String, @Start, @Length, ) 
             END 
         END 
         
        RETURN @String 
    END 

4.     Conclusion

By using above UDF, We can clean the
HTML tags from the data.

———————————–End
of Article———————————————-

Leave a Reply

Your email address will not be published. Required fields are marked *