A more efficient way to do this search algorithm? (Python)

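I was just wondering whether there is a better way to do this algorithm. I find that I need to do this type of operation often, and the way I am currently doing it takes hours, since I believe it would be considered an n^2 algorithm. I've attached it below.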

import csv

with open("location1", 'r') as main:
    csvMain = csv.reader(main)
    mainList = list(csvMain)

with open("location2", 'r') as anno:
    csvAnno = csv.reader(anno)
    annoList = list(csvAnno)

tempList = []
output = []

for full in mainList:
    geneName = full[2].lower()
    for annot in annoList:
        if geneName == annot[2].lower():
            tempList.extend(full)
            tempList.append(annot[3])
            tempList.append(annot[4])
            tempList.append(annot[5])
            tempList.append(annot[6])
            output.append(tempList)

        for i in tempList:
            del i

with open("location3", 'w') as final:
    a = csv.writer(final, delimiter=',')
    a.writerows(output)

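I have two CSV files with 15,000 strings each, and I want to compare one column from each file; where they match, I want to concatenate the end of the row from the second CSV onto the end of the row from the first CSV. Any help would be greatly appreciated!

Thanks!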

This should be more efficient:

import csv
from collections import defaultdict

with open("location1", 'r') as main:
  csvMain = csv.reader(main)
  mainList = list(csvMain)

with open("location2", 'r') as anno:
  csvAnno = csv.reader(anno)
  annoList = list(csvAnno)

output = []
annoMap = defaultdict(list)

for annot in annoList:
  tempList = annot[3:]  # adapt this to the needed columns
  annoMap[annot[2].lower()].append(tempList)  # store these columns in the map under the column of interest

for full in mainList:
  geneName = full[2].lower()
  if geneName in annoMap:  # check if matching column exists
    for extra in annoMap[geneName]:
      output.append(full + extra)  # main row followed by the annotation columns

with open("location3", 'w') as final:
  a = csv.writer(final, delimiter=',')
  a.writerows(output)

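Because you only need to iterate over each list once, this is more efficient. Dictionary lookups are O(1) on average, so you essentially get a linear algorithm.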
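To make the difference concrete, here is a minimal sketch of the same idea on small in-memory lists instead of files. The helper name `join_on_column` and the sample rows are made up for illustration; the key column index 2 and the `[3:]` slice follow the script above.

```python
from collections import defaultdict


def join_on_column(main_rows, anno_rows, key_col=2, extra_from=3):
    """Append the trailing columns of matching annotation rows to each main row.

    Hypothetical helper for illustration: it builds a dict index over
    anno_rows once, then does one average-O(1) lookup per main row.
    """
    index = defaultdict(list)
    for annot in anno_rows:
        index[annot[key_col].lower()].append(annot[extra_from:])

    joined = []
    for full in main_rows:
        # .get avoids inserting empty entries into the defaultdict
        for extra in index.get(full[key_col].lower(), []):
            joined.append(full + extra)
    return joined


# Made-up sample data: column 2 holds the gene name.
main_rows = [
    ["1", "chr1", "BRCA1"],
    ["2", "chr2", "tp53"],
    ["3", "chr3", "MYC"],
]
anno_rows = [
    ["a", "x", "brca1", "repair", "tumor suppressor"],
    ["b", "y", "TP53", "apoptosis", "tumor suppressor"],
]

result = join_on_column(main_rows, anno_rows)
for row in result:
    print(row)
```

With n main rows and m annotation rows this takes roughly n + m steps instead of n × m. For two files of 15,000 rows each, that is about 30,000 steps rather than 225 million comparisons.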

A simple approach is to use a library like pandas. Its built-in functions are very efficient.

You can load your CSV files into DataFrames with pandas.read_csv(), and then manipulate them using pandas functions.

For example, you can use pandas.merge() to merge your two DataFrames (that is, your two CSV files) on a specific column, and then drop the columns you don't need.
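A rough sketch of that approach might look like the following. It assumes the files have header rows, and the column names (`gene`, `function`, `category`) and sample contents are invented for illustration. For headerless files like those in the question, you would pass `header=None` to `pandas.read_csv()` and refer to columns by integer label instead.

```python
import io

import pandas as pd

# Made-up CSV contents standing in for the two files; real code would pass
# file paths to pd.read_csv instead of StringIO objects.
main_csv = io.StringIO("id,chrom,gene\n1,chr1,BRCA1\n2,chr2,tp53\n3,chr3,MYC\n")
anno_csv = io.StringIO(
    "anno_id,source,gene,function,category\n"
    "a,x,brca1,repair,tumor suppressor\n"
    "b,y,TP53,apoptosis,tumor suppressor\n"
)

main_df = pd.read_csv(main_csv)
anno_df = pd.read_csv(anno_csv)

# Normalise the key so the match is case-insensitive, as in the question.
main_df["gene_key"] = main_df["gene"].str.lower()
anno_df["gene_key"] = anno_df["gene"].str.lower()

# Inner join: keep only rows whose key appears in both frames.
merged = pd.merge(
    main_df,
    anno_df[["gene_key", "function", "category"]],
    on="gene_key",
    how="inner",
)

# Drop the helper column that is not wanted in the output.
merged = merged.drop(columns=["gene_key"])
print(merged)
```

The result could then be written out with `merged.to_csv("location3", index=False)`.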

If you have some database knowledge, the logic here is very similar.
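To illustrate the database analogy, here is the same match written as an SQL inner join, using Python's built-in sqlite3 module and an in-memory database. The table names, column names, and rows are hypothetical; this is only meant to show the parallel, not a recommended way to process the files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (id TEXT, chrom TEXT, gene TEXT)")
conn.execute("CREATE TABLE anno (gene TEXT, function TEXT, category TEXT)")

# Made-up rows standing in for the two CSV files.
conn.executemany(
    "INSERT INTO main VALUES (?, ?, ?)",
    [("1", "chr1", "BRCA1"), ("2", "chr2", "tp53"), ("3", "chr3", "MYC")],
)
conn.executemany(
    "INSERT INTO anno VALUES (?, ?, ?)",
    [
        ("brca1", "repair", "tumor suppressor"),
        ("TP53", "apoptosis", "tumor suppressor"),
    ],
)

# Inner join on the lower-cased gene name, mirroring the case-insensitive match.
rows = conn.execute(
    """
    SELECT main.id, main.chrom, main.gene, anno.function, anno.category
    FROM main
    JOIN anno ON LOWER(main.gene) = LOWER(anno.gene)
    ORDER BY main.id
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```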

Thanks @limes for the help. Here is the final script I used; I thought I would post it to help others. Thanks again!

import csv
from collections import defaultdict

with open("location1", 'r') as main:
  csvMain = csv.reader(main)
  mainList = list(csvMain)

with open("location2", 'r') as anno:
  csvAnno = csv.reader(anno)
  annoList = list(csvAnno)

output = []
annoMap = defaultdict(list)

for annot in annoList:
  tempList = annot[3:]  # adapt this to the needed columns
  annoMap[annot[2].lower()].append(tempList)  # store these columns in the map under the column of interest

for full in mainList:
  geneName = full[2].lower()
  if geneName in annoMap:  # check if matching column exists
    matches = annoMap[geneName]  # renamed from "list" to avoid shadowing the built-in
    full.extend(matches[0])  # only the first matching annotation row is used
    output.append(full)

with open("location3", 'w') as final:
  a = csv.writer(final, delimiter=',')
  a.writerows(output)
