1.21 LOAD CSV
LOAD CSV用于从CSV文件中导入数据。
- CSV文件的URL可以由FROM后面紧跟的任意表达式来指定。
- 需要使用AS来为CSV数据指定一个变量
- LOAD CSV支持以gzip,Deflate和ZIP压缩的资源
- CSV文件可以存在数据库服务器上,通过file:///URL来访问。LOAD CSV也支持通过HTTPS、HTTP和FTP来访问CSV文件。
- LOAD CSV支持HTTP重定向,但基于安全考虑,重定向时不能改变协议类型,比如从HTTPS重定向到HTTP。
(1)文件URL的配置项
dbms.security.allow_csv_import_from_file_urls
这个选项决定Cypher在使用LOAD CSV时是否支持使用fille:/// URL来加载数据。该URL唯一标识了数据库服务器文件系统上的文件。
dbms.security.allow_csv_import_from_file_urls=false//完全禁止LOAD CSV访问文件系统。
dbms.directories.import //设置LOAD CSV中file:/// URL中的根路径。
这必须设置为数据库服务器上的文件系统的单个目录,它让所有的请求从file:/// URL加载时都使用根路径的相对路径(类似unix下的chroot操作)。默认值是import,这是基于安全考虑阻止数据库访问标准的import之外的目录下的文件。将dbms.directories.import设置为空将移除这个安全考虑,允许访问系统上的任何文件,但是不推荐这么做。
文件URLs将相对于dbms.directories.import来解析。例如,一个典型的URL类似file:///myfile.csv或者file:///myproject/myfile.csv。
- 如果dbms.directories.import设置的是默认值import,那么在LOAD CSV语句将分别从<NEO4J_HOME>/import/myfile.csv和<NEO4J_HOME>/import/myproject/myfile.csv中读取数据。
- 如果设置为/data/csv,上面的LOAD CSV中的URL将分别从/data/csv/myfile.csv和/data/csv/myproject/myfile.csv中读取数据。
(2) 从本地文件导入
- 将要导入的文件存到路径:C:\Users\14513\.Neo4jDesktop\neo4jDatabases\database-d3841793-9d99-418c-8fcf-311c367b7323\installation-3.5.6\import 。
categories表描述商品种类信息:
customers表描述用户信息:
employees描述员工信息表:
orders表描述订单信息:
products表描述产品信息:
- 导入数据库的语句
//1、创建标签索引 tag::indexes[]
CREATE INDEX ON :Product(productID);
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryID);
CREATE INDEX ON :Employee(employeeID);
CREATE INDEX ON :Supplier(supplierID);
CREATE INDEX ON :Customer(customerID);
CREATE INDEX ON :Customer(customerName);
// 2、Create customers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/customers.csv" AS row
CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID, fax: row.Fax, phone: row.Phone});
// 3、Create products
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/products.csv" AS row
CREATE (:Product {productName: row.ProductName, productID: row.ProductID, unitPrice: toFloat(row.UnitPrice)});
//4、 Create suppliers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/suppliers.csv" AS row
CREATE (:Supplier {companyName: row.CompanyName, supplierID: row.SupplierID});
//5、 Create employees
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/employees.csv" AS row
CREATE (:Employee {employeeID:row.EmployeeID, firstName: row.FirstName, lastName: row.LastName, title: row.Title});
// 6、Create categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/categories.csv" AS row
CREATE (:Category {categoryID: row.CategoryID, categoryName: row.CategoryName, description: row.Description});
//7、create orders
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/orders.csv" AS row
MERGE (order:Order {orderID: row.OrderID}) ON CREATE SET order.shipName = row.ShipName;
// end::nodes[]// tag::constraints[]
CREATE CONSTRAINT ON (o:Order) ASSERT o.orderID IS UNIQUE;
// end::constraints[]
// 8、创建订单与产品的关系tag::rels_orders[]
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[pu:PRODUCT]->(product)
ON CREATE SET pu.unitPrice = toFloat(row.UnitPrice), pu.quantity = toFloat(row.Quantity);
//9、创建员工和订单之间的关系USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);
//10、创建用户和订单的关系
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (customer:Customer {customerID: row.CustomerID})
MERGE (customer)-[:PURCHASED]->(order);
// end::rels_orders[]
//11、供应商与产品的关系 tag::rels_products[]
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);
//12、创建产品与种类的关系
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (category:Category {categoryID: row.CategoryID})
MERGE (product)-[:PART_OF]->(category);
// end::rels_products[]
//13、创建员工与管理者的关系 tag::rels_employees[]
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/employees.csv" AS row
MATCH (employee:Employee {employeeID: row.EmployeeID})
MATCH (manager:Employee {employeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);
// end::rels_employees[]
- 执行后的结果
(3)从HTTP导入
LOAD CSV FROM ' http://neo4j.com/docs/developer-manual/3.1/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInt(line[2])})
1.21.1 csv文件格式
使用LOAD CSV导入的CSV文件必须满足如下要求:
- 字符编码为UTF-8;
- 行结束符与操作系统关联,如unix上为\n,windows上为\r\n;
- 默认的字段终止符为,;
- 字段终止符可以使用LOAD CSV中 的FIELDTERMINATOR选项来修改。
- CSV文件允许引号字符串,但读取数据的时候引号字符会被丢弃。
- 字符串的引号字符为双引号"。
- 转义字符为\。
1.21.2 从CSV文件导入数据
从CSV文件导入数据到Neo4j,可以用LOAD CSV把数据加载到查询语句中。然后使用正常的Cypher更新语句将数据写入到数据库中。
artists.csv"1","ABBA","1992""2","Roxette","1986""3","Europe","1979""4","The Cardigans","1992"
LOAD CSV FROM ' http://neo4j.com/docs/developer-manual/3.1/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInt(line[2])})
CSV文件中的每一行都创建一个标签为Artist的节点。CSV文件中的另外两列分别设置为节点的属性。
同时创建节点和关系
LOAD CSV FROM ' http://neo4j.com/docs/developer-manual/3.1/csv/artists.csv' AS line
CREATE (n:Artist { name: line[1], year: toInt(line[2])}), (n:Artist { name: line[1], year: toInt(line[2])})
CREATE (n)-[r: line[1]]->(m)
1.21.3 导入包含文件头的CSV文件
当导入的CSV文件包含文件头时,可以把每一行看作一个map,而不是字符串数组。
artists.csv"Id","Name","Year""1","ABBA","1992""2","Roxette","1986""3","Europe","1979""4","The Cardigans","1992"
LOAD CSV WITH HEADERS FROM 'http://neo4j.com/docs/developer-manual/3.1/csv/artists-with-headers.csv' AS line
CREATE (:Artist { name: line.Name, year: toInt(line.Year)})
这时,文件的开始行包含列的名称。指定WITH HEADERS后,可以通过对应的列名来访问指定的字段。
1.21.4 导入自定义分隔符的CSV文件
CSV文件的分隔符有时候不是逗号,而是其他分隔符。这时,可以使用FIELDTERMINATOR来指定。
artists-fieldterminator.csv"1";"ABBA";"1992""2";"Roxette";"1986""3";"Europe";"1979""4";"The Cardigans";"1992"
LOAD CSV FROM 'http://neo4j.com/docs/developer-manual/3.1/csv/artists-fieldterminator.csv' AS line FIELDTERMINATOR ';'
CREATE (:Artist { name: line[1], year: toInt(line[2])})
本例中字段之间以分号分隔,因此,在LOAD CSV中使用了FIELDTERMINATOR自定义分隔符。
1.21.5 导入巨量数据
如果导入的CSV文件包含百万数量级的行,可以使用USING PERIODIC COMMIT来指导Neo4j每导入一定数量行之后就提交(commit)一次。这样可避免在事务过程中耗费大量的内存。默认情况下,每1000行会提交一次。
USING PERIODIC COMMIT
LOAD CSV FROM 'http://neo4j.com/docs/developer-manual/3.1/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInt(line[2])})
1.21.6 设置提交频率
可以设置提交的频率,如本例中设置为500行。
USING PERIODIC COMMIT 500
LOAD CSV FROM 'http://neo4j.com/docs/developer-manual/3.1/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInt(line[2])})
1.21.7导入包含转义字符的数据
本例中同时包含了引用字符和转义字符。
artists-with-escaped-char.csv"1","The ""Symbol""","1992"
LOAD CSV FROM 'http://neo4j.com/docs/developer-manual/3.1/csv/artists-with-escaped-char.csv' AS line
CREATE (a:Artist { name: line[1], year: toInt(line[2])})
RETURN a.name AS name, a.year AS year, length(a.name) AS length
注意:这里的字符串用双引号括起来。同时,关注本例中字符串的长度。