Follow the WeChat official account: 小程在线
Follow the CSDN blog: 程志伟的博客
1. Load the packages
julia> using Clustering, Gadfly, RDatasets, Dates
2. Load the iris dataset
julia> mydat = dataset("datasets", "iris")
150×5 DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
│ │ Float64 │ Float64 │ Float64 │ Float64 │ Cat… │
├─────┼─────────────┼────────────┼─────────────┼────────────┼───────────┤
│ 1 │ 5.1 │ 3.5 │ 1.4 │ 0.2 │ setosa │
│ 2 │ 4.9 │ 3.0 │ 1.4 │ 0.2 │ setosa │
│ 3 │ 4.7 │ 3.2 │ 1.3 │ 0.2 │ setosa │
│ 4 │ 4.6 │ 3.1 │ 1.5 │ 0.2 │ setosa │
│ 5 │ 5.0 │ 3.6 │ 1.4 │ 0.2 │ setosa │
│ 6 │ 5.4 │ 3.9 │ 1.7 │ 0.4 │ setosa │
│ 7 │ 4.6 │ 3.4 │ 1.4 │ 0.3 │ setosa │
│ 8 │ 5.0 │ 3.4 │ 1.5 │ 0.2 │ setosa │
│ 9 │ 4.4 │ 2.9 │ 1.4 │ 0.2 │ setosa │
│ 10 │ 4.9 │ 3.1 │ 1.5 │ 0.1 │ setosa │
⋮
│ 140 │ 6.9 │ 3.1 │ 5.4 │ 2.1 │ virginica │
│ 141 │ 6.7 │ 3.1 │ 5.6 │ 2.4 │ virginica │
│ 142 │ 6.9 │ 3.1 │ 5.1 │ 2.3 │ virginica │
│ 143 │ 5.8 │ 2.7 │ 5.1 │ 1.9 │ virginica │
│ 144 │ 6.8 │ 3.2 │ 5.9 │ 2.3 │ virginica │
│ 145 │ 6.7 │ 3.3 │ 5.7 │ 2.5 │ virginica │
│ 146 │ 6.7 │ 3.0 │ 5.2 │ 2.3 │ virginica │
│ 147 │ 6.3 │ 2.5 │ 5.0 │ 1.9 │ virginica │
│ 148 │ 6.5 │ 3.0 │ 5.2 │ 2.0 │ virginica │
│ 149 │ 6.2 │ 3.4 │ 5.4 │ 2.3 │ virginica │
│ 150 │ 5.9 │ 3.0 │ 5.1 │ 1.8 │ virginica │
julia> myf = Matrix(mydat[:, 1:4])
150×4 Array{Float64,2}:
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4
4.6 3.4 1.4 0.3
5.0 3.4 1.5 0.2
4.4 2.9 1.4 0.2
4.9 3.1 1.5 0.1
5.4 3.7 1.5 0.2
4.8 3.4 1.6 0.2
4.8 3.0 1.4 0.1
⋮
6.0 3.0 4.8 1.8
6.9 3.1 5.4 2.1
6.7 3.1 5.6 2.4
6.9 3.1 5.1 2.3
5.8 2.7 5.1 1.9
6.8 3.2 5.9 2.3
6.7 3.3 5.7 2.5
6.7 3.0 5.2 2.3
6.3 2.5 5.0 1.9
6.5 3.0 5.2 2.0
6.2 3.4 5.4 2.3
5.9 3.0 5.1 1.8
julia> myl = String.(mydat[:, 5])
150-element Array{String,1}:
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
⋮
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
3. Use initseeds() to randomly choose 3 initial centroids (the result is the column indices of the samples picked as seeds)
julia> x = initseeds(:rand, convert(Matrix, myf'), 3)
3-element Array{Int64,1}:
124
110
41
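Conceptually, random seeding just picks k distinct samples uniformly at random and uses them as the starting centroids. Below is a minimal standard-library sketch of that idea, assuming the same d×n layout (one column per sample). `random_seeds` is a hypothetical helper, not Clustering.jl's implementation, and the data are synthetic.

```julia
using Random

# Hypothetical helper: choose k distinct column indices of X (d×n, one column per sample)
# uniformly at random. A sketch of the idea behind :rand seeding, not Clustering.jl's code.
function random_seeds(X::AbstractMatrix, k::Integer; rng::AbstractRNG = Random.default_rng())
    n = size(X, 2)
    k <= n || throw(ArgumentError("k must not exceed the number of samples"))
    return randperm(rng, n)[1:k]
end

# Synthetic 4×150 data standing in for the transposed iris features.
X = rand(MersenneTwister(1), 4, 150)
seeds = random_seeds(X, 3; rng = MersenneTwister(42))
centers = X[:, seeds]   # 4×3 matrix: one initial centroid per column
println(seeds)
```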
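4. Run k-means with kmeans()
Like initseeds(), kmeans() expects a d×n matrix with one column per sample, so the 150×4 matrix has to be transposed first. Passing myf directly makes kmeans() treat the 4 feature columns as 4 samples of dimension 150, which yields only 4 assignments instead of one per flower.
julia> myres = kmeans(Matrix(myf'), 3)
The returned KmeansResult stores the 4×3 matrix of cluster centers in myres.centers, the cluster index of each of the 150 flowers in myres.assignments, and the cluster sizes in myres.counts.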
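To show what a k-means call does internally, here is a minimal from-scratch sketch of Lloyd's algorithm (alternate assignment and update steps until no sample changes cluster) using only Julia's standard library. `lloyd_kmeans` and the synthetic blobs are hypothetical; Clustering.jl's own kmeans() is more complete (for example, it uses k-means++ seeding by default and supports weights and convergence tolerances).

```julia
using Random, Statistics

# Hypothetical teaching sketch of Lloyd's k-means on a d×n matrix (one column per sample).
function lloyd_kmeans(X::AbstractMatrix{<:Real}, k::Integer; maxiter::Integer = 100,
                      rng::AbstractRNG = MersenneTwister(0))
    n = size(X, 2)
    centers = float(X[:, randperm(rng, n)[1:k]])   # k random samples as initial centroids (a copy)
    assignments = zeros(Int, n)
    for _ in 1:maxiter
        # Assignment step: put each sample in the cluster of its nearest center (squared Euclidean distance).
        changed = false
        for j in 1:n
            best, bestdist = 0, Inf
            for c in 1:k
                dist = sum(abs2, view(X, :, j) .- view(centers, :, c))
                if dist < bestdist
                    best, bestdist = c, dist
                end
            end
            if assignments[j] != best
                assignments[j] = best
                changed = true
            end
        end
        changed || break   # converged: no sample changed cluster
        # Update step: each center becomes the mean of its samples (left unchanged if the cluster is empty).
        for c in 1:k
            members = findall(==(c), assignments)
            if !isempty(members)
                centers[:, c] = vec(mean(X[:, members]; dims = 2))
            end
        end
    end
    return centers, assignments
end

# Synthetic 2×60 data: three blobs of 20 points around (0, 0), (5, 5) and (0, 5).
rng = MersenneTwister(1)
X = hcat(randn(rng, 2, 20), randn(rng, 2, 20) .+ [5.0, 5.0], randn(rng, 2, 20) .+ [0.0, 5.0])
centers, assignments = lloyd_kmeans(X, 3)
println(size(centers))   # (2, 3)
```

Like any Lloyd iteration, this only finds a local optimum that depends on the starting centroids, which is why library implementations use smarter seeding or multiple restarts.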
5. Visualize the clusters, coloring each flower by its assigned cluster (the plots open in a browser window)
julia> myplo = Gadfly.plot(mydat, x=:PetalLength, y=:PetalWidth, color=string.(myres.assignments), Geom.point)
julia> myplo2 = Gadfly.plot(mydat, x=:SepalLength, y=:SepalWidth, color=string.(myres.assignments), Geom.point)
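The species labels collected in myl can be used to check how well the clusters match the true classes, for example with a contingency table that counts how many flowers of each species land in each cluster. Below is a minimal standard-library sketch; `crosstab` is a hypothetical helper and the toy input is made up, not the iris results. In the session above it could be called as `crosstab(myl, myres.assignments, 3)`, assuming myres holds one assignment per flower (that is, kmeans() was run on the 4×150 transposed matrix). Cluster numbers are arbitrary, so cluster 1 need not correspond to the first species.

```julia
# Hypothetical helper: cross-tabulate true labels (rows) against cluster indices 1..k (columns).
function crosstab(labels::AbstractVector, assignments::AbstractVector{<:Integer}, k::Integer)
    length(labels) == length(assignments) ||
        throw(DimensionMismatch("labels and assignments must have the same length"))
    classes = sort(unique(labels))
    table = zeros(Int, length(classes), k)
    for (lab, a) in zip(labels, assignments)
        table[findfirst(==(lab), classes), a] += 1
    end
    return classes, table
end

# Made-up toy input, not the iris results.
labels = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]
assignments = [1, 1, 2, 3, 2, 2]
classes, table = crosstab(labels, assignments, 3)
for (cls, row) in zip(classes, eachrow(table))
    println(rpad(cls, 12), row)
end
```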