ML HierarchicalClustering: A custom implementation of the HierarchicalClustering (hierarchical clustering) algorithm


Source: Chongqing Software Legalization Service Center | Date: 2022-09-20


Output

(To be updated…)

Implementation code


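```python
# -*- encoding=utf-8 -*-

from numpy import *  # provides array, sqrt and the other NumPy functions used below


class cluster_node:
    # A node of the cluster tree; __init__ plays the role of a constructor in Java
    def __init__(self, vec, left=None, right=None, distance=0.0, id=None, count=1):
        self.left = left          # left child (None for a leaf)
        self.right = right        # right child (None for a leaf)
        self.vec = vec            # feature vector (the averaged vector for a merged node)
        self.id = id              # row index for a leaf, negative number for a merged node
        self.distance = distance  # distance between the two children when they were merged
        self.count = count        # only used for weighted average


def L2dist(v1, v2):
    # Euclidean distance
    return sqrt(sum((v1 - v2) ** 2))


def L1dist(v1, v2):
    # Manhattan (city-block) distance
    return sum(abs(v1 - v2))
```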
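L2dist is the Euclidean distance and L1dist is the Manhattan (city-block) distance. hcluster accepts either one through its `distance` parameter.

The sketch below assumes NumPy. It computes the same two metrics for every pair of rows at once with broadcasting, as an alternative to calling a distance function pair by pair. The helper `pairwise_distances` is a hypothetical name and is not part of the original code.

```python
import numpy as np


def L2dist(v1, v2):
    # Euclidean distance: square root of the summed squared differences
    return np.sqrt(np.sum((v1 - v2) ** 2))


def L1dist(v1, v2):
    # Manhattan distance: sum of the absolute differences
    return np.sum(np.abs(v1 - v2))


def pairwise_distances(features, metric="l2"):
    # Hypothetical helper: n x n distance matrix for the rows of `features`
    x = np.asarray(features, dtype=float)
    diff = x[:, None, :] - x[None, :, :]  # shape (n, n, d) via broadcasting
    if metric == "l2":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "l1":
        return np.abs(diff).sum(axis=-1)
    raise ValueError("metric must be 'l2' or 'l1'")


a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
print(L2dist(a, b))  # 5.0
print(L1dist(a, b))  # 7.0
print(pairwise_distances([[0, 0], [3, 4], [6, 8]]))
```

The intermediate `diff` array holds n × n × d values, so this form suits small inputs. For large inputs, the per-pair approach with a cache, as in hcluster, uses far less memory.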
 
```python
def hcluster(features, distance=L2dist):
    # cluster the rows of the "features" matrix
    distances = {}       # cache of distance calculations, keyed by pairs of cluster ids
    currentclustid = -1  # merged clusters get negative ids: -1, -2, ...

    # clusters are initially just the individual rows
    clust = [cluster_node(array(features[i]), id=i) for i in range(len(features))]

    while len(clust) > 1:
        lowestpair = (0, 1)
        closest = distance(clust[0].vec, clust[1].vec)

        # loop through every pair, looking for the smallest distance
        for i in range(len(clust)):
            for j in range(i + 1, len(clust)):
                key = (clust[i].id, clust[j].id)
                if key not in distances:
                    distances[key] = distance(clust[i].vec, clust[j].vec)

                d = distances[key]
                if d < closest:
                    closest = d
                    lowestpair = (i, j)

        # the merged cluster's vector is the average of the two vectors
        a = clust[lowestpair[0]]
        b = clust[lowestpair[1]]
        mergevec = (a.vec + b.vec) / 2.0

        newcluster = cluster_node(mergevec, left=a, right=b,
                                  distance=closest, id=currentclustid)

        currentclustid -= 1
        # delete the higher index first so that the lower index stays valid
        del clust[lowestpair[1]]
        del clust[lowestpair[0]]
        clust.append(newcluster)

    return clust[0]
```
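hcluster works bottom-up. It repeatedly finds the closest pair of clusters and replaces them with a new node whose vector is the plain midpoint of the two. As a result, a single row and a cluster of many rows carry equal weight in the merged vector. The `count` field in cluster_node is commented as "only used for weighted average", but hcluster never reads it.

Below is a minimal sketch of a size-weighted merge, assuming a simplified node class. The names `Node`, `merge_midpoint` and `merge_weighted` are hypothetical and are not part of the original code.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Node:
    # Simplified stand-in for cluster_node
    vec: np.ndarray
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    distance: float = 0.0
    id: int = 0
    count: int = 1  # number of original rows under this node


def merge_midpoint(a, b, distance, new_id):
    # Same rule as hcluster: the plain average of the two vectors
    return Node((a.vec + b.vec) / 2.0, a, b, distance, new_id, a.count + b.count)


def merge_weighted(a, b, distance, new_id):
    # Size-weighted centroid: each child counts once per row it contains
    total = a.count + b.count
    vec = (a.vec * a.count + b.vec * b.count) / total
    return Node(vec, a, b, distance, new_id, total)


# Three 1-D rows: 0, 2 and 10. The first merge joins 0 and 2 (distance 2).
p0, p1, p2 = (Node(np.array([x]), id=i) for i, x in enumerate([0.0, 2.0, 10.0]))
pair = merge_weighted(p0, p1, 2.0, -1)        # vec [1.0], count 2
print(merge_midpoint(pair, p2, 9.0, -2).vec)  # [5.5]
print(merge_weighted(pair, p2, 9.0, -2).vec)  # [4.]  (the mean of 0, 2 and 10)
```

With the weighted rule, each merged vector stays equal to the mean of all rows beneath it. The original code uses the midpoint rule, and which rule suits a given data set is a judgment call.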
 
```python
def extract_clusters(clust, dist):
    # clust: the tree built by hcluster; dist: the distance threshold
    # extract list of sub-tree clusters from hcluster tree with distance < dist
    if clust.distance < dist:
        # we have found a cluster subtree
        return [clust]
    else:
        # check the right and left branches
        cl = []
        cr = []
        if clust.left is not None:
            cl = extract_clusters(clust.left, dist=dist)
        if clust.right is not None:
            cr = extract_clusters(clust.right, dist=dist)
        return cl + cr
```
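extract_clusters walks down from the root and stops at the first node on each path whose merge distance is below the threshold. Raising `dist` therefore yields fewer, larger clusters.

The sketch below uses a hand-built tree for the 1-D rows 0, 1, 10 and 11. This is the tree hcluster produces for those rows with L2dist: {0, 1} and {10, 11} merge at distance 1, then the two pairs merge at distance 10. `Node` is a hypothetical stand-in for cluster_node without the vectors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Minimal stand-in for cluster_node (vectors omitted)
    id: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    distance: float = 0.0


def extract_clusters(clust, dist):
    # Return the highest subtrees whose merge distance is below dist
    if clust.distance < dist:
        return [clust]
    found = []
    for child in (clust.left, clust.right):
        if child is not None:
            found += extract_clusters(child, dist)
    return found


# {0,1} and {10,11} merge at distance 1; their centroids 0.5 and 10.5 merge at distance 10
left = Node(-1, Node(0), Node(1), distance=1.0)
right = Node(-2, Node(2), Node(3), distance=1.0)
root = Node(-3, left, right, distance=10.0)

for threshold in (0.5, 5.0, 20.0):
    print(threshold, [c.id for c in extract_clusters(root, threshold)])
# 0.5 [0, 1, 2, 3]   every row on its own
# 5.0 [-1, -2]       two clusters
# 20.0 [-3]          one cluster
```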
        
```python
def get_cluster_elements(clust):
    # retrieve the elements of a finished cluster:
    # return ids for elements in a cluster sub-tree
    if clust.id >= 0:
        # a non-negative id means that this is a leaf
        return [clust.id]
    else:
        # check the right and left branches
        cl = []
        cr = []
        if clust.left is not None:
            cl = get_cluster_elements(clust.left)
        if clust.right is not None:
            cr = get_cluster_elements(clust.right)
        return cl + cr
```
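get_cluster_elements is recursive, as are extract_clusters, printclust, getheight and getdepth. The recursion depth equals the depth of the tree. On a strongly unbalanced tree, for example when rows are merged one at a time into a growing chain, this can exceed CPython's default recursion limit (typically 1000, see `sys.getrecursionlimit()`).

Below is a sketch of an iterative variant that uses an explicit stack and returns the leaf ids in the same left-to-right order. The name `get_cluster_elements_iter` and the simplified `Node` class are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Minimal stand-in for cluster_node: leaves have id >= 0
    id: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def get_cluster_elements_iter(clust):
    # Depth-first traversal with an explicit stack instead of recursion
    ids = []
    stack = [clust]
    while stack:
        node = stack.pop()
        if node.id >= 0:
            ids.append(node.id)  # leaf: record its row index
            continue
        # push the right branch first so that the left branch is visited first
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)
    return ids


# A chain-shaped tree with 5000 leaves: each merge adds one row to the previous cluster
tree = Node(0)
for k in range(1, 5000):
    tree = Node(-k, left=tree, right=Node(k))

elements = get_cluster_elements_iter(tree)
print(len(elements), elements[:5])  # 5000 [0, 1, 2, 3, 4]
```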
 
 
```python
def printclust(clust, labels=None, n=0):
    # print the tree, one node per line
    # indent to make a hierarchy layout
    print('  ' * n, end='')
    if clust.id < 0:
        # negative id means that this is a branch
        print('-')
    else:
        # non-negative id means that this is an endpoint
        if labels is None:
            print(clust.id)
        else:
            print(labels[clust.id])

    if clust.left is not None:
        printclust(clust.left, labels=labels, n=n + 1)
    if clust.right is not None:
        printclust(clust.right, labels=labels, n=n + 1)
```
 
 
 
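```python
def getheight(clust):
    # height of the tree, computed recursively (equals the number of leaves)
    # Is this an endpoint? Then the height is just 1
    if clust.left is None and clust.right is None:
        return 1

    # Otherwise the height is the sum of the heights of
    # each branch
    return getheight(clust.left) + getheight(clust.right)


def getdepth(clust):
    # depth of the tree, computed recursively:
    # an endpoint has depth 0; a branch adds its own merge distance
    # to the depth of its deeper branch
    if clust.left is None and clust.right is None:
        return 0

    return max(getdepth(clust.left), getdepth(clust.right)) + clust.distance
```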
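getheight counts leaves: each endpoint contributes 1, so the height of the root is the number of rows. getdepth adds up merge distances along the root-to-leaf path with the largest total, which is generally larger than the root's own merge distance.

The worked example below uses a hand-built tree for the rows 0, 1, 10 and 11: the pairs merge at distance 1, and the two pairs merge at distance 10. `Node` is a hypothetical, simplified stand-in for cluster_node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Minimal stand-in for cluster_node
    id: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    distance: float = 0.0


def getheight(clust):
    # Endpoints have height 1; a branch's height is the sum over its two sides
    if clust.left is None and clust.right is None:
        return 1
    return getheight(clust.left) + getheight(clust.right)


def getdepth(clust):
    # Endpoints have depth 0; a branch adds its merge distance to the deeper side
    if clust.left is None and clust.right is None:
        return 0
    return max(getdepth(clust.left), getdepth(clust.right)) + clust.distance


left = Node(-1, Node(0), Node(1), distance=1.0)
right = Node(-2, Node(2), Node(3), distance=1.0)
root = Node(-3, left, right, distance=10.0)

print(getheight(root))  # 4     (four leaves)
print(getdepth(root))   # 11.0  (10.0 at the root plus 1.0 below it)
```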
