Is there a more efficient way to do this search algorithm?

Views: 108   Posted: 2023-06-13 20:24:02

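I'd like to know whether there is a better way to run this algorithm. I find I need to do this kind of operation often, and my current approach takes hours, since I believe it counts as an O(n^2) algorithm. I've attached it below.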

import csv

with open("location1", 'r') as main:
    csvMain = csv.reader(main)
    mainList = list(csvMain)

with open("location2", 'r') as anno:
    csvAnno = csv.reader(anno)
    annoList = list(csvAnno)

output = []

# Nested loops compare every main row with every annotation row: O(n * m).
for full in mainList:
    geneName = full[2].lower()
    for annot in annoList:
        if geneName == annot[2].lower():
            # Build a fresh row for each match (main row plus annotation
            # columns 3-6); reusing one shared list would corrupt the output.
            output.append(full + annot[3:7])

with open("location3", 'w') as final:
    a = csv.writer(final, delimiter=',')
    a.writerows(output)

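I have two CSV files with 15,000 strings each. I want to compare one column from each file and, where the values match, append the trailing columns of the row from the second CSV to the end of the row from the first CSV. Any help would be much appreciated!

Thanks!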

This should be more efficient:

import csv
from collections import defaultdict

with open("location1", 'r') as main:
  csvMain = csv.reader(main)
  mainList = list(csvMain)

with open("location2", 'r') as anno:
  csvAnno = csv.reader(anno)
  annoList = list(csvAnno)

output = []
annoMap = defaultdict(list)

for annot in annoList:
  tempList = annot[3:]  # adapt this to the needed columns
  annoMap[annot[2].lower()].append(tempList)  # group these columns under the lowercased key column

for full in mainList:
  geneName = full[2].lower()
  if geneName in annoMap:  # check if a matching key exists
    for extra in annoMap[geneName]:
      output.append(full + extra)  # main row followed by the annotation columns

with open("location3", 'w') as final:
  a = csv.writer(final, delimiter=',')
  a.writerows(output)

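This is more efficient because you only iterate over each list once. Dictionary lookups are O(1) on average, so you essentially get a linear algorithm.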
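To make the complexity argument concrete, here is a minimal sketch of the same dictionary-based join on in-memory rows. The helper name `join_on_column`, its parameters, and the sample rows are made up for illustration and do not come from the original post.

```python
from collections import defaultdict


def join_on_column(main_rows, anno_rows, key_col=2, extra_start=3):
    """Append annotation columns to each main row whose key matches.

    Hypothetical helper: builds the index in one pass over anno_rows,
    then does one average-O(1) dictionary lookup per main row.
    """
    index = defaultdict(list)
    for annot in anno_rows:
        index[annot[key_col].lower()].append(annot[extra_start:])

    joined = []
    for full in main_rows:
        # .get avoids creating empty entries for keys that have no match
        for extra in index.get(full[key_col].lower(), []):
            joined.append(full + extra)
    return joined


# Made-up sample data; column 2 holds the gene name.
main_rows = [["1", "chr1", "BRCA1"], ["2", "chr2", "TP53"], ["3", "chr3", "MYC"]]
anno_rows = [["a", "x", "brca1", "repair"], ["b", "y", "tp53", "tumor suppressor"]]

result = join_on_column(main_rows, anno_rows)
```

With n main rows and m annotation rows this takes roughly n + m steps instead of n * m. For two 15,000-row files that is on the order of 30,000 dictionary operations rather than 225,000,000 comparisons.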

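A simple approach is to use a library such as pandas. Its built-in functions are very efficient.

You can load a CSV into a DataFrame with pandas.read_csv() and then manipulate it with pandas functions.

For example, you can use pandas.merge() to merge two DataFrames (that is, your two CSV files) on a specific column and then drop the data you don't need.

If you have some database knowledge, the logic here is very similar.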
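The pandas approach might look something like the sketch below. The CSV contents and the column names (`gene`, `function`, and the helper column `gene_key`) are assumptions made for illustration, since the original post does not show the files' headers.

```python
import io

import pandas as pd

# Made-up CSV contents standing in for the two files; column names are assumptions.
main_csv = io.StringIO("id,chrom,gene\n1,chr1,BRCA1\n2,chr2,TP53\n3,chr3,MYC\n")
anno_csv = io.StringIO("aid,src,gene,function\na,x,brca1,repair\nb,y,tp53,tumor suppressor\n")

main_df = pd.read_csv(main_csv)
anno_df = pd.read_csv(anno_csv)

# Normalise case in a helper column so the match is case-insensitive.
main_df["gene_key"] = main_df["gene"].str.lower()
anno_df["gene_key"] = anno_df["gene"].str.lower()

# Inner join, comparable to SQL's INNER JOIN: keep only keys present in both frames.
merged = pd.merge(
    main_df,
    anno_df[["gene_key", "function"]],  # keep only the annotation columns needed
    on="gene_key",
    how="inner",
)

# Drop the helper column once the merge is done.
merged = merged.drop(columns=["gene_key"])
```

If the real files have no header row, `pd.read_csv(path, header=None)` gives integer column labels, and the merge key would then be the integer label of the gene column instead of a name.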

Thanks for the help, @limes. Here is the final script I used; I thought I'd post it in case it helps others. Thanks again!

import csv
from collections import defaultdict

with open("location1", 'r') as main:
  csvMain = csv.reader(main)
  mainList = list(csvMain)

with open("location2", 'r') as anno:
  csvAnno = csv.reader(anno)
  annoList = list(csvAnno)

output = []
annoMap = defaultdict(list)

for annot in annoList:
  tempList = annot[3:]  # adapt this to the needed columns
  annoMap[annot[2].lower()].append(tempList)  # group these columns under the lowercased key column

for full in mainList:
  geneName = full[2].lower()
  if geneName in annoMap:  # check if a matching key exists
    matches = annoMap[geneName]  # renamed from `list`, which shadowed the built-in
    full.extend(matches[0])  # only the first matching annotation row is used
    output.append(full)

with open("location3", 'w') as final:
  a = csv.writer(final, delimiter=',')
  a.writerows(output)