
HierarchicalClustering in ML: A Custom Implementation of the Hierarchical Clustering Algorithm

Source: 重庆市软件正版化服务中心    |    Date: 2022-09-20    |    Views: 63702


Contents

Output

Implementation code


Output

To be updated……

Implementation code

# -*- coding: utf-8 -*-
import numpy as np


class cluster_node:
    # Node of the cluster tree; __init__ plays the role of a constructor in Java.
    def __init__(self, vec, left=None, right=None, distance=0.0, id=None, count=1):
        self.left = left
        self.right = right
        self.vec = vec
        self.id = id
        self.distance = distance
        self.count = count  # only used for weighted average


def L2dist(v1, v2):
    # Euclidean (L2) distance
    return np.sqrt(np.sum((v1 - v2) ** 2))


def L1dist(v1, v2):
    # Manhattan (L1) distance
    return np.sum(np.abs(v1 - v2))


def hcluster(features, distance=L2dist):
    # cluster the rows of the "features" matrix
    distances = {}
    currentclustid = -1

    # clusters are initially just the individual rows
    clust = [cluster_node(np.array(features[i]), id=i) for i in range(len(features))]

    while len(clust) > 1:
        lowestpair = (0, 1)
        closest = distance(clust[0].vec, clust[1].vec)

        for i in range(len(clust)):
            for j in range(i + 1, len(clust)):
                # distances is the cache of distance calculations
                if (clust[i].id, clust[j].id) not in distances:
                    distances[(clust[i].id, clust[j].id)] = distance(clust[i].vec, clust[j].vec)
                d = distances[(clust[i].id, clust[j].id)]
                if d < closest:
                    closest = d
                    lowestpair = (i, j)

        # the merged cluster sits at the unweighted midpoint of the closest pair
        mergevec = [(clust[lowestpair[0]].vec[i] + clust[lowestpair[1]].vec[i]) / 2.0
                    for i in range(len(clust[0].vec))]
        newcluster = cluster_node(np.array(mergevec), left=clust[lowestpair[0]],
                                  right=clust[lowestpair[1]],
                                  distance=closest, id=currentclustid)

        # merged clusters get negative ids; delete the higher index first
        currentclustid -= 1
        del clust[lowestpair[1]]
        del clust[lowestpair[0]]
        clust.append(newcluster)

    return clust[0]


def extract_clusters(clust, dist):
    # clust: the tree built by hcluster; dist: distance threshold
    # extract list of sub-tree clusters from hcluster tree with distance<dist
    if clust.distance < dist:
        # we have found a cluster subtree
        return [clust]
    else:
        # check the right and left branches
        cl = []
        cr = []
        if clust.left is not None:
            cl = extract_clusters(clust.left, dist=dist)
        if clust.right is not None:
            cr = extract_clusters(clust.right, dist=dist)
        return cl + cr


def get_cluster_elements(clust):
    # return ids for elements in a cluster sub-tree
    if clust.id >= 0:
        # non-negative id means that this is a leaf
        return [clust.id]
    else:
        # check the right and left branches
        cl = []
        cr = []
        if clust.left is not None:
            cl = get_cluster_elements(clust.left)
        if clust.right is not None:
            cr = get_cluster_elements(clust.right)
        return cl + cr


def printclust(clust, labels=None, n=0):
    # print the tree; indent to make a hierarchy layout
    print(' ' * n, end='')
    if clust.id < 0:
        # negative id means that this is a branch
        print('-')
    else:
        # non-negative id means that this is an endpoint
        if labels is None:
            print(clust.id)
        else:
            print(labels[clust.id])
    if clust.left is not None:
        printclust(clust.left, labels=labels, n=n + 1)
    if clust.right is not None:
        printclust(clust.right, labels=labels, n=n + 1)


def getheight(clust):
    # height of the tree, computed recursively
    # Is this an endpoint? Then the height is just 1
    if clust.left is None and clust.right is None:
        return 1
    # Otherwise the height is the sum of the heights of each branch
    return getheight(clust.left) + getheight(clust.right)


def getdepth(clust):
    # depth of the tree, computed recursively: the largest accumulated
    # merge distance along any path from this node down to a leaf
    if clust.left is None and clust.right is None:
        return 0
    return max(getdepth(clust.left), getdepth(clust.right)) + clust.distance
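
To illustrate how the pieces fit together, here is a minimal, dependency-free sketch of the same merge-the-closest-pair procedure on a toy dataset. It is a condensed stand-in for the listing, not the listing itself: the names `Node`, `build_tree`, `cut` and `leaf_ids`, the four sample points and the threshold of 5.0 are all assumptions introduced for illustration. As in hcluster, a merged cluster is placed at the unweighted midpoint of its two children and receives a negative id.

```python
import math
from itertools import combinations


class Node:
    """Binary tree node; leaves get ids >= 0, merged clusters get ids < 0."""

    def __init__(self, vec, left=None, right=None, distance=0.0, id=None):
        self.vec = vec
        self.left = left
        self.right = right
        self.distance = distance
        self.id = id


def l2(v1, v2):
    # Euclidean distance between two equal-length sequences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def build_tree(points):
    # Hypothetical helper: start with one leaf per point, then merge repeatedly.
    clust = [Node(tuple(p), id=i) for i, p in enumerate(points)]
    next_id = -1
    while len(clust) > 1:
        # Find the closest pair among the current clusters (i < j).
        i, j = min(combinations(range(len(clust)), 2),
                   key=lambda ij: l2(clust[ij[0]].vec, clust[ij[1]].vec))
        a, b = clust[i], clust[j]
        midpoint = tuple((x + y) / 2.0 for x, y in zip(a.vec, b.vec))
        merged = Node(midpoint, left=a, right=b,
                      distance=l2(a.vec, b.vec), id=next_id)
        next_id -= 1
        # j > i, so delete the higher index first to keep i valid.
        del clust[j]
        del clust[i]
        clust.append(merged)
    return clust[0]


def cut(node, dist):
    # Return the sub-trees whose merge distance is below the threshold.
    # Leaves are returned as-is (an assumption of this sketch).
    if node.distance < dist or node.left is None:
        return [node]
    return cut(node.left, dist) + cut(node.right, dist)


def leaf_ids(node):
    # Collect the ids of the original points under a sub-tree.
    if node.id >= 0:
        return [node.id]
    return leaf_ids(node.left) + leaf_ids(node.right)


# Two well-separated pairs of points (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
root = build_tree(points)
groups = [sorted(leaf_ids(c)) for c in cut(root, dist=5.0)]
print(groups)  # [[0, 1], [2, 3]]
```

With a threshold between the within-pair distance (1.0) and the distance between the two midpoints (about 14.14), the tree is cut into the two expected groups. A larger threshold would return the root as a single cluster.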
