问题描述
本质上,我有一个大文件(?1 GB),其中每行都是一个JSON对象,其中包含嵌套的属性,其中一些值可能是对象数组。
对象的示例:
{
"business_id": "b9WZJp5L1RZr4F1nxclOoQ",
"full_address": "1073 Washington Ave\nCarnegie, PA 15106",
"hours": {
"Monday": {
"close": "14:30",
"open": "06:00"
},
"Tuesday": {
"close": "14:30",
"open": "06:00"
},
"Friday": {
"close": "14:30",
"open": "06:00"
},
"Wednesday": {
"close": "14:30",
"open": "06:00"
},
"Thursday": {
"close": "14:30",
"open": "06:00"
},
"Sunday": {
"close": "12:30",
"open": "07:00"
},
"Saturday": {
"close": "12:30",
"open": "06:00"
}
},
"open": true,
"categories": ["Breakfast & Brunch", "Restaurants"],
"city": "Carnegie",
"review_count": 38,
"name": "Gab & Eat",
"neighborhoods": [],
"longitude": -80.084799799999999,
"state": "PA",
"stars": 4.5,
"latitude": 40.396744099999999,
"attributes": {
"Alcohol": "none",
"Noise Level": "average",
"Has TV": true,
"Attire": "casual",
"Ambience": {
"romantic": false,
"intimate": false,
"classy": false,
"hipster": false,
"divey": true,
"touristy": false,
"trendy": false,
"upscale": false,
"casual": true
},
"Good for Kids": true,
"Wheelchair Accessible": false,
"Delivery": false,
"Caters": true,
"BYOB": false,
"Corkage": false,
"Accepts Credit Cards": false,
"BYOB/Corkage": "yes_free",
"Take-out": true,
"Price Range": 1,
"Outdoor Seating": false,
"Takes Reservations": false,
"Waiter Service": true,
"Wi-Fi": "no",
"Order at Counter": true,
"Good For": {
"dessert": false,
"latenight": false,
"lunch": false,
"dinner": false,
"brunch": false,
"breakfast": true
},
"Parking": {
"garage": false,
"street": false,
"validated": false,
"lot": true,
"valet": false
},
"Good For Kids": true,
"Good For Groups": false
},
"type": "business"
},
如何将其展平并将其转换为CSV,以便有一个具有属性(列)的单个对象,例如business_id
, hours.Monday.close
, attributes.Ambience.hipster
等?
问题在于,并非所有对象都具有所有属性,因此我需要扫描整个文件以获取所有可能的平面属性的列表。 本质上,我试图模仿的功能,但对于数组值属性,我没有将其拆分为多列,而是将整个数组字符串作为值存储在CSV中。
如何使用Python或.NET完成此操作?
1楼
这可行。 可能需要进一步扩展以钻取数组等。使用Newtonsoft的JSON库,并假定JSON字符串是一个对象 ,而不是数组或基元(或其他任何东西)
void Main()
{
var obj = JsonConvert.DeserializeObject(jsonStr) as JObject;
var props = GetPropPaths(string.Empty, obj);
props.Dump();
}
private IEnumerable<Tuple<string, string>> GetPropPaths(string currPath, JObject obj)
{
foreach(var prop in obj.Properties())
{
var propPath = string.IsNullOrWhiteSpace(currPath) ? prop.Name : currPath + "." + prop.Name;
if (prop.Value.Type == JTokenType.Object)
{
foreach(var subProp in GetPropPaths(propPath, prop.Value as JObject))
yield return subProp;
} else {
yield return new Tuple<string, string>(propPath, prop.Value.ToString());
}
}
}
对于您上面的json,它提供了以下内容: