We’re going to look at some data on different varieties of ramen, from a dataset provided by residentmario on Kaggle.com! Include the code you use for your analyses.
(a) Use pandas to import Ramen-Ratings.csv. Look at the data to get a feel for it. How many columns are there? How many observations/rows?
(b) Create two new DataFrames containing ramen ratings from the USA and Japan. Label these as usa_ramen and japan_ramen respectively.
(c) For each country, find the number of unique brands rated in that country. Print out both of these values.
(d) Use .unique() to look at the different entries in the Stars column in both the USA and Japan ramen DataFrames. What is the datatype of the entries? If there are any unusual entries, remove them.
(e) Pandas contains a function to_numeric, which converts the data type in a pandas Series or DataFrame to a numerical type like int or float. You can access this with pd.to_numeric(). Use this to convert the Stars column to a numerical type. If you get a SettingWithCopy warning, it’s okay to ignore it in this case.
(f) Compute and print the average ratings for the USA ramen reviews and the Japanese ramen reviews.
(g) Make a histogram plot of the USA ramen star scores.
(h) Here is an approximation of the probability density functions for the distribution of star ratings for the USA and Japanese ramen respectively.
What is the answer to each part?
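Importing the data and a first look
Import the dataset with pandas and inspect it.
import pandas as pd
# Load the data
ramen_data = pd.read_csv('Ramen-Ratings.csv')
# Preview the first rows, then print the shape as (rows, columns)
print(ramen_data.head())
print(ramen_data.shape)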
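Creating the country-specific DataFrames
Filter the data by country to create the two new DataFrames.
# Ramen ratings for the USA and Japan; the column name 'Country' is assumed to match the CSV header
# .copy() makes each subset independent of ramen_data
usa_ramen = ramen_data[ramen_data['Country'] == 'USA'].copy()
japan_ramen = ramen_data[ramen_data['Country'] == 'Japan'].copy()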
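Counting brands per country
Count the unique brands rated in each country.
# Number of unique brands per country (column name 'Brand' assumed from the CSV header)
print(usa_ramen['Brand'].nunique())
print(japan_ramen['Brand'].nunique())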
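As a cross-check, the same counts can be obtained for every country at once with groupby. This is a minimal sketch on a small made-up table: the rows are hypothetical, and the column names Country and Brand are assumed to match the CSV header.

```python
import pandas as pd

# Hypothetical rows standing in for Ramen-Ratings.csv
sample = pd.DataFrame({
    'Country': ['USA', 'USA', 'USA', 'Japan', 'Japan'],
    'Brand': ['A', 'A', 'B', 'C', 'D'],
})

# Number of distinct brands within each country
brands_per_country = sample.groupby('Country')['Brand'].nunique()
print(brands_per_country)
```

On the real data, replacing sample with ramen_data would give one count per country, which should agree with the two printed values for USA and Japan.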
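Inspecting and cleaning the star ratings
Inspect the entries in the Stars column and remove any unusual ones. If the column contains any text entry, pandas reads the whole column as strings (dtype object). dropna only removes missing values, so non-numeric text has to be filtered out explicitly.
# Look at the distinct entries and the column's data type
print(usa_ramen['Stars'].unique())
print(japan_ramen['Stars'].unique())
print(usa_ramen['Stars'].dtype)
# Keep only rows whose Stars entry can be parsed as a number (this also drops missing values)
usa_ok = pd.to_numeric(usa_ramen['Stars'], errors='coerce').notna()
japan_ok = pd.to_numeric(japan_ramen['Stars'], errors='coerce').notna()
usa_ramen_clean = usa_ramen[usa_ok].copy()
japan_ramen_clean = japan_ramen[japan_ok].copy()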
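To see why dropping missing values is not enough on its own, here is a small sketch of one way to filter out non-numeric entries. The Stars values below are hypothetical (including the 'Unrated' placeholder), not copied from the dataset.

```python
import pandas as pd

# Hypothetical Stars values stored as strings, with one non-numeric placeholder
sample = pd.DataFrame({'Stars': ['3.75', '5', 'Unrated', '4.25']})

# errors='coerce' turns anything that cannot be parsed into NaN
parsed = pd.to_numeric(sample['Stars'], errors='coerce')

# Keep only the rows that parsed; .copy() gives an independent DataFrame,
# so the assignment below does not trigger SettingWithCopyWarning
sample_clean = sample[parsed.notna()].copy()
sample_clean['Stars'] = pd.to_numeric(sample_clean['Stars'])

print(sample_clean['Stars'].tolist())
print(sample_clean['Stars'].dtype)
```

Here sample.dropna(subset=['Stars']) would have kept all four rows, because 'Unrated' is a string and not a missing value.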
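Converting the star ratings to a numeric type
Convert the Stars column to a numeric type.
# Convert Stars to numbers; errors='coerce' turns any leftover non-numeric entry into NaN
usa_ramen_clean['Stars'] = pd.to_numeric(usa_ramen_clean['Stars'], errors='coerce')
japan_ramen_clean['Stars'] = pd.to_numeric(japan_ramen_clean['Stars'], errors='coerce')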
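Computing the average ratings
Compute the average rating for the USA and the Japanese ramen reviews.
# Average rating per country (mean() skips NaN by default)
print(usa_ramen_clean['Stars'].mean())
print(japan_ramen_clean['Stars'].mean())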
Histogram of the USA star ratings
Plot a histogram of the USA ramen star scores.
import matplotlib.pyplot as plt
# Plot the histogram
usa_ramen_clean['Stars'].plot(kind='hist', title='USA Ramen Star Scores')
plt.xlabel('Stars')
plt.show()
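Approximating the probability density functions
Part (h) refers to approximate probability density functions for the USA and Japanese star-rating distributions.
# The density approximations mentioned in part (h) are not included in the question
# Without them, no code can be given for this part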
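For orientation only, one common way to approximate a probability density from a sample is a Gaussian kernel density estimate. The sketch below is not the approximation referred to in part (h): the helper gaussian_kde_1d, the ratings list, and the bandwidth of 0.4 are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Hypothetical star ratings, NOT taken from the dataset
stars = [2.5, 3.0, 3.5, 3.75, 4.0, 4.0, 4.25, 5.0]
grid = np.linspace(-2.0, 9.0, 1101)
density = gaussian_kde_1d(stars, grid, bandwidth=0.4)

# A density should integrate to roughly 1 over a wide enough grid
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

With the cleaned data, passing usa_ramen_clean['Stars'] and japan_ramen_clean['Stars'] in place of the hypothetical list would give one curve per country, which could then be drawn with plt.plot(grid, density).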