An excellent personal blog by 低调大师

Optimizing Spring API response data: return only the fields the client needs

Overview: The JSON returned by Spring / Spring Boot usually contains every field of an object, which can waste bandwidth. For example, if an endpoint's object has 10 fields but the front end only needs 2, returning all of them is wasteful. The solution: the front end passes the fields it wants included or excluded in a request header, and the back end intercepts every response and returns only those fields. There are several ways to implement this (only Spring Boot is covered here).

First, agree on the response format: a BaseResult object whose result property carries the actual payload.

```json
{
    "ret": 0,
    "msg": null,
    "result": {
        "id": 1,
        "name": "后摄像头53"
    },
    "time": 1540972430498
}
```

Approach 1: AOP around the controllers

The aspect works in three steps:
1. Check whether the return value is a BaseResult object.
2. Check whether the request header or parameters carry an x-include-fields or x-exclude-fields attribute (if so, split it into a Set).
3. If both conditions hold, process the BaseResult.result object: replace it with a Map that contains only the requested fields. If it is an Array or Collection, replace each item with a Map.

```java
import com.cehome.cloudbox.common.object.BaseResult;
import com.cehome.cloudbox.common.object.ItemsResult;
import com.cehome.cloudbox.common.object.PageResult;
import com.cehome.cloudbox.common.page.Page;
import com.cehomex.spring.feign.FeignRequestHolder;
import org.apache.commons.beanutils.PropertyUtils;
import org.apache.commons.lang3.StringUtils;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;

import javax.servlet.http.HttpServletRequest;
import java.beans.PropertyDescriptor;
import java.util.*;

@Aspect
public class ControllerAOP {

    private static final String INCLUDE_FIELDS = "x-include-fields";
    private static final String EXCLUDE_FIELDS = "x-exclude-fields";
    private static final String P_INCLUDE_FIELDS = "x-include-fields";
    private static final String P_EXCLUDE_FIELDS = "x-exclude-fields";
    private static final Logger logger = LoggerFactory.getLogger(ControllerAOP.class);

    @Pointcut("within(@org.springframework.stereotype.Controller *)")
    public void controller() {
    }

    @Pointcut("within(@org.springframework.web.bind.annotation.RestController *)")
    public void restController() {
    }

    @Around("(controller() || restController()) && execution(public * *(..))")
    public Object proceed(ProceedingJoinPoint joinPoint) throws Throwable {
        try {
            Object object = joinPoint.proceed();
            ServletRequestAttributes requestAttributes =
                    (ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
            if (requestAttributes != null) {
                HttpServletRequest request = requestAttributes.getRequest();
                handleReturnValue(object, request);
            }
            return object;
        } finally {
            FeignRequestHolder.removeAll();
        }
    }

    /**
     * Keep only the fields the front end asked for.
     */
    public void handleReturnValue(Object o, HttpServletRequest request) throws Exception {
        if (!isSuccess(o)) {
            return;
        }
        String fields1 = StringUtils.trimToEmpty(request.getHeader(INCLUDE_FIELDS));
        if (fields1.length() == 0) {
            fields1 = StringUtils.trimToEmpty(request.getParameter(P_INCLUDE_FIELDS));
        }
        String fields2 = StringUtils.trimToEmpty(request.getHeader(EXCLUDE_FIELDS));
        if (fields2.length() == 0) {
            fields2 = StringUtils.trimToEmpty(request.getParameter(P_EXCLUDE_FIELDS));
        }
        if (fields1.length() > 0 || fields2.length() > 0) {
            Set<String> includes = fields1.length() == 0
                    ? new HashSet<>() : new HashSet<>(Arrays.asList(fields1.split(",")));
            Set<String> excludes = fields2.length() == 0
                    ? new HashSet<>() : new HashSet<>(Arrays.asList(fields2.split(",")));
            if (o instanceof BaseResult) {
                BaseResult result = (BaseResult) o;
                Object object = result.getResult();
                result.setResult(convertResult(object, includes, excludes));
            } else if (o instanceof ItemsResult) {
                ItemsResult result = (ItemsResult) o;
                Object object = result.getItems();
                result.setItems(convertResult(object, includes, excludes));
            } else if (o instanceof PageResult) {
                PageResult result = (PageResult) o;
                Object object = result.getPage();
                if (object instanceof Page) {
                    Page page = (Page) object;
                    List datas = page.getDatas();
                    page.setDatas((List) convertResult(datas, includes, excludes));
                }
            }
        }
    }

    private boolean isSuccess(Object object) {
        if (object == null) {
            return false;
        }
        if (object instanceof BaseResult) {
            return ((BaseResult) object).isSuccess();
        }
        if (object instanceof ItemsResult) {
            return ((ItemsResult) object).isSuccess();
        }
        if (object instanceof PageResult) {
            return ((PageResult) object).isSuccess();
        }
        return false;
    }

    /**
     * Convert an object (or an array/collection of objects) to maps
     * containing only the requested properties.
     */
    private Object convertResult(Object object, Set<String> includes, Set<String> excludes) throws Exception {
        if (object instanceof Object[]) {
            return convertArray((Object[]) object, includes, excludes);
        } else if (object instanceof Collection) {
            return convertCollection((Collection) object, includes, excludes);
        } else {
            return convertObject(object, includes, excludes);
        }
    }

    private Collection<Map> convertCollection(Collection collection, Set<String> includes,
                                              Set<String> excludes) throws Exception {
        Collection<Map> result = new ArrayList<>();
        for (Object item : collection) {
            result.add(convertObject(item, includes, excludes));
        }
        return result;
    }

    private Map[] convertArray(Object[] objects, Set<String> includes,
                               Set<String> excludes) throws Exception {
        Map[] result = new HashMap[objects.length];
        for (int i = 0; i < objects.length; i++) {
            result[i] = convertObject(objects[i], includes, excludes);
        }
        return result;
    }

    /**
     * Convert a single object (bean or map) to a map holding only the requested properties.
     */
    private Map convertObject(Object object, Set<String> includes, Set<String> excludes) throws Exception {
        Map<Object, Object> result = new HashMap<>();
        if (!(object instanceof Map)) {
            PropertyDescriptor[] pds = PropertyUtils.getPropertyDescriptors(object);
            for (PropertyDescriptor pd : pds) {
                String name = pd.getName();
                if (name.equals("class")) {
                    continue;
                }
                if (!excludes.isEmpty() && excludes.contains(name)) {
                    continue;
                }
                if (!includes.isEmpty() && !includes.contains(name)) {
                    continue;
                }
                result.put(name, PropertyUtils.getProperty(object, name));
            }
        } else {
            Map<Object, Object> map = (Map<Object, Object>) object;
            for (Map.Entry<Object, Object> entry : map.entrySet()) {
                String name = entry.getKey() == null ? "" : entry.getKey().toString();
                if (!excludes.isEmpty() && excludes.contains(name)) {
                    continue;
                }
                if (!includes.isEmpty() && !includes.contains(name)) {
                    continue;
                }
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

Replacing the payload with Maps mutates the original object. A more efficient variant that leaves the object untouched is to set the unwanted fields to null and configure a JSON-processing bean that filters null fields globally. With this approach null fields are simply never serialized, which requires a little compatibility work on the front end.

```java
@Bean
@Primary
@ConditionalOnMissingBean(ObjectMapper.class)
public ObjectMapper jacksonObjectMapper(Jackson2ObjectMapperBuilder builder) {
    ObjectMapper objectMapper = builder.createXmlMapper(false).build();
    objectMapper.setSerializationInclusion(JsonInclude.Include.NON_NULL);
    return objectMapper;
}
```

Approach 2: a custom HttpMessageConverter

Spring Boot registers several message converters by default; they are matched against the response media type and the first match wins, the rest are ignored. MappingJackson2HttpMessageConverter is the converter that handles JSON. So first remove the default MappingJackson2HttpMessageConverter, then subclass it to implement the custom conversion.

```java
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.fasterxml.jackson.databind.ser.impl.SimpleBeanPropertyFilter;
import com.fasterxml.jackson.databind.ser.impl.SimpleFilterProvider;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpOutputMessage;
import org.springframework.http.MediaType;
import org.springframework.http.converter.HttpMessageConverter;
import org.springframework.http.converter.HttpMessageNotWritableException;
import org.springframework.http.converter.json.MappingJackson2HttpMessageConverter;
import org.springframework.http.converter.json.MappingJacksonValue;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;

import java.io.IOException;
import java.lang.reflect.Type;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

@Configuration
public class WebConfig extends WebMvcConfigurerAdapter {

    @Override
    public void extendMessageConverters(List<HttpMessageConverter<?>> converters) {
        // Remove the default JSON converter.
        for (int i = converters.size() - 1; i >= 0; i--) {
            HttpMessageConverter<?> messageConverter = converters.get(i);
            if (messageConverter instanceof MappingJackson2HttpMessageConverter) {
                converters.remove(i);
            }
        }
        // Add our own JSON converter.
        MappingJackson2HttpMessageConverter c = new MappingJackson2HttpMessageConverter() {
            @Override
            protected void writeInternal(Object object, Type type, HttpOutputMessage outputMessage)
                    throws IOException, HttpMessageNotWritableException {
                // Example 1: convert to a fastjson object and replace the "name" field.
                // Only one of the two examples should be active at a time, otherwise
                // the body would be written twice.
                // JSONObject json = (JSONObject) JSON.toJSON(object);
                // json.getJSONObject("result").put("name", "coolma");
                // super.writeInternal(json, type, outputMessage);

                // Example 2: use a filter to keep only the "name" field.
                // Note: this requires annotating BaseResult's result property with
                // com.fasterxml.jackson.annotation.JsonFilter:
                //   @JsonFilter("result")
                //   private T result;
                MappingJacksonValue value = new MappingJacksonValue(object);
                value.setFilters(new SimpleFilterProvider().addFilter("result",
                        SimpleBeanPropertyFilter.filterOutAllExcept("name")));
                super.writeInternal(value, type, outputMessage);
            }
        };
        c.setDefaultCharset(Charset.forName("UTF-8"));
        List<MediaType> mediaTypes = new ArrayList<>();
        mediaTypes.add(MediaType.APPLICATION_JSON_UTF8);
        c.setSupportedMediaTypes(mediaTypes);
        converters.add(c);
    }
}
```
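The include/exclude rule at the heart of the aspect is language-agnostic. As a minimal illustrative sketch (Python, not part of the original Java project), the same rule that convertObject applies, where an empty include set means "keep everything" and excludes always win, can be written as:

```python
def filter_fields(obj, includes=None, excludes=None):
    """Return a copy of a dict keeping only the requested fields.

    Mirrors the aspect's convertObject(): excluded keys are dropped first,
    and an empty include set keeps every remaining key.
    """
    includes = set(includes or [])
    excludes = set(excludes or [])
    return {
        k: v for k, v in obj.items()
        if k not in excludes and (not includes or k in includes)
    }


result = {"id": 1, "name": "rear camera 53", "price": 99, "stock": 7}

# Simulating: x-include-fields: id,name
print(filter_fields(result, includes=["id", "name"]))

# Simulating: x-exclude-fields: stock
print(filter_fields(result, excludes=["stock"]))
```

With no header at all, the dict passes through unchanged, which matches the aspect only rewriting the result when one of the two headers is present.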


Creating a Windows service with C#, continued: streamlining the service with Topshelf

Preface: I previously wrote a post, "Creating a Windows service with C#" (https://www.cnblogs.com/huangwei1992/p/9693167.html), after which a reader recommended the open-source framework Topshelf. I wrote some test code and found that Topshelf really is excellent for building Windows services, so I reworked my earlier code with it.

Development steps:

1. Without Topshelf we would need a dedicated Windows Service project; with it, a plain console application is enough.

2. Add the reference. Either run the package command:

Install-Package Topshelf

or search for Topshelf in the NuGet package manager and click install.

3. Create the core class CloudImageManager. Its three main methods are LoadCloudImage, Start and Stop:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Timers;

/// <summary>
/// Purpose      : satellite cloud-image download manager
/// Created by   : Administrator
/// Created on   : 2018/9/25 14:29:03
/// </summary>
public class CloudImageManager
{
    private string _ImagePath = System.Configuration.ConfigurationManager.AppSettings["Path"];
    private Timer _Timer = null;
    private double Interval = double.Parse(System.Configuration.ConfigurationManager.AppSettings["Minutes"]);

    public CloudImageManager()
    {
        _Timer = new Timer();
        _Timer.Interval = Interval * 60 * 1000;
        _Timer.Elapsed += _Timer_Elapsed;
    }

    void _Timer_Elapsed(object sender, ElapsedEventArgs e)
    {
        StartLoad();
    }

    /// <summary>
    /// Begin downloading cloud images.
    /// </summary>
    private void StartLoad()
    {
        LoadCloudImage();
    }

    public void Start()
    {
        StartLoad();
        _Timer.Start();
    }

    public void Stop()
    {
        _Timer.Stop();
    }

    /// <summary>
    /// Download all of the day's satellite cloud images.
    /// </summary>
    private void LoadCloudImage()
    {
        CreateFilePath(); // create the target folder if it does not exist

        // Yesterday's date
        string lastYear = DateTime.Now.AddDays(-1).Year.ToString();
        string lastMonth = DateTime.Now.AddDays(-1).Month.ToString();
        if (lastMonth.Length < 2) lastMonth = "0" + lastMonth;
        string lastDay = DateTime.Now.AddDays(-1).Day.ToString();
        if (lastDay.Length < 2) lastDay = "0" + lastDay;

        // Today's date
        string year = DateTime.Now.Year.ToString();
        string month = DateTime.Now.Month.ToString();
        if (month.Length < 2) month = "0" + month;
        string day = DateTime.Now.Day.ToString();
        if (day.Length < 2) day = "0" + day;

        // Build all candidate file names
        string[] dates0 = { lastYear + "/" + lastMonth + "/" + lastDay, year + "/" + month + "/" + day };
        string[] dates = { lastYear + lastMonth + lastDay, year + month + day };
        string[] hours = { "00", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11",
                           "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23" };
        string[] minutes = { "15", "45" };
        int hLength = hours.Count();

        // Download every image that is online for these two days
        for (int i = 0; i < 2; i++)
        {
            string date = dates[i];
            string date0 = dates0[i];
            for (int j = 0; j < hLength; j++)
            {
                string hour = hours[j];
                for (int k = 0; k < 2; k++)
                {
                    string minute = minutes[k];
                    string imageUrl = @"http://image.nmc.cn/product/" + date0
                        + @"/WXCL/SEVP_NSMC_WXCL_ASC_E99_ACHN_LNO_PY_"
                        + date + hour + minute + "00000.JPG";
                    string[] s = imageUrl.Split('/');
                    string imageName = s[s.Count() - 1];
                    HttpWebRequest request = HttpWebRequest.Create(imageUrl) as HttpWebRequest;
                    HttpWebResponse response = null;
                    try
                    {
                        response = request.GetResponse() as HttpWebResponse;
                    }
                    catch (Exception)
                    {
                        continue;
                    }
                    if (response.StatusCode != HttpStatusCode.OK) continue;
                    Stream reader = response.GetResponseStream();
                    FileStream writer = new FileStream(_ImagePath + imageName,
                        FileMode.OpenOrCreate, FileAccess.Write);
                    byte[] buff = new byte[512];
                    int c = 0; // bytes actually read
                    while ((c = reader.Read(buff, 0, buff.Length)) > 0)
                    {
                        writer.Write(buff, 0, c);
                    }
                    writer.Close();
                    writer.Dispose();
                    reader.Close();
                    reader.Dispose();
                    response.Close();
                }
            }
        }
    }

    /// <summary>
    /// Create the image folder if it does not exist; clear it if it does.
    /// </summary>
    private void CreateFilePath()
    {
        if (Directory.Exists(_ImagePath))
        {
            ClearImages();
            return;
        }
        Directory.CreateDirectory(_ImagePath);
    }

    /// <summary>
    /// Delete all files and subdirectories under the image folder.
    /// </summary>
    private void ClearImages()
    {
        try
        {
            DirectoryInfo dir = new DirectoryInfo(_ImagePath);
            FileSystemInfo[] fileinfo = dir.GetFileSystemInfos(); // all files and subdirectories
            foreach (FileSystemInfo i in fileinfo)
            {
                if (i is DirectoryInfo) // subdirectory: delete recursively
                {
                    DirectoryInfo subdir = new DirectoryInfo(i.FullName);
                    subdir.Delete(true);
                }
                else // plain file
                {
                    File.Delete(i.FullName);
                }
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

Then wire it up in Program.cs:

```csharp
static void Main(string[] args)
{
    HostFactory.Run(x =>                                          // 1
    {
        x.Service<CloudImageManager>(s =>                         // 2
        {
            s.ConstructUsing(name => new CloudImageManager());    // 3
            s.WhenStarted(tc => tc.Start());                      // 4
            s.WhenStopped(tc => tc.Stop());                       // 5
        });
        x.RunAsLocalSystem();                                     // 6
        x.SetDescription("卫星云图实时下载工具");                  // 7
        x.SetDisplayName("CloudImageLoad");                       // 8
        x.SetServiceName("CloudImageLoad");                       // 9
    });
}
```

As you can see, the host only ever touches the CloudImageManager constructor, its Start method and its Stop method.

Install, run and uninstall: under Topshelf these operations become much simpler:

Install:   Topshelf.CloudImageLoad.exe install
Start:     Topshelf.CloudImageLoad.exe start
Uninstall: Topshelf.CloudImageLoad.exe uninstall

(Note: the command prompt must be run as administrator.) Only the install command's screenshot was shown in the original post; the other commands work the same way. Checking the Windows service list confirms that the service has been installed successfully.

Reference: http://www.cnblogs.com/jys509/p/4614975.html
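The download loop above enumerates two days, 24 hours and two minute marks, padding single-digit months and days by hand. As a cross-check of the URL scheme (a Python sketch for illustration only; the original project is C#), the same candidate list can be built with zero-padded date formatting:

```python
from datetime import date, timedelta

BASE = ("http://image.nmc.cn/product/{slash}/WXCL/"
        "SEVP_NSMC_WXCL_ASC_E99_ACHN_LNO_PY_{compact}{hh}{mm}00000.JPG")


def image_urls(today=None):
    """Enumerate candidate image URLs for yesterday and today.

    strftime's zero-padded fields replace the manual '0' + month
    padding in the C# version.
    """
    today = today or date.today()
    urls = []
    for d in (today - timedelta(days=1), today):
        slash = d.strftime("%Y/%m/%d")    # e.g. 2018/09/25
        compact = d.strftime("%Y%m%d")    # e.g. 20180925
        for hh in range(24):
            for mm in ("15", "45"):
                urls.append(BASE.format(slash=slash, compact=compact,
                                        hh="%02d" % hh, mm=mm))
    return urls


urls = image_urls(date(2018, 9, 25))
print(len(urls))  # 96 candidates: 2 days x 24 hours x 2 minute marks
print(urls[0])
```

As in the C# version, most of these URLs will return 404 (the loop simply skips failed requests), since only images published so far exist on the server.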


Feature selection in scikit-learn and regression with XGBoost: hands-on model optimization

The other day I stumbled on a data-analysis competition site (sofasofa). I had studied some machine learning but never practised it in a competition, so I signed up out of curiosity.

Task: estimating footballers' transfer values

This is an individual practice competition, aimed at data-science newcomers who want to practise, improve and compare notes.
Time limit: 2018-03-05 to 2020-03-05
Task type: regression
Background: every footballer has a price on the transfer market. The goal is to predict a player's market value from his attributes and ability scores.

From this description it is clearly a regression problem. Before predicting anything, of course, the first step is to look at the format and content of the data (there are too many columns to list here; see the competition page for the full schema). Having skimmed the data, and with no practical experience, I simply went by gut feeling and assumed the following fields would matter most:

- club: the club the player belongs to (already encoded)
- league: the league the player plays in (already encoded)
- potential: the player's potential (numeric)
- international_reputation: international reputation (numeric)

Conveniently, none of these fields had missing values, so I happily assumed I could feed them straight into an XGBoost model (for usage details see the XGBoost docs, in particular "XGBoost Parameters"). Here is my first version:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File  : soccer_value.py
# @Author: Huangqinjian
# @Date  : 2018/3/22
# @Desc  :
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb
import numpy as np
from xgboost import plot_importance
from sklearn.preprocessing import Imputer


def loadDataset(filePath):
    df = pd.read_csv(filepath_or_buffer=filePath)
    return df


def featureSet(data):
    data_num = len(data)
    XList = []
    for row in range(0, data_num):
        tmp_list = []
        tmp_list.append(data.iloc[row]['club'])
        tmp_list.append(data.iloc[row]['league'])
        tmp_list.append(data.iloc[row]['potential'])
        tmp_list.append(data.iloc[row]['international_reputation'])
        XList.append(tmp_list)
    yList = data.y.values
    return XList, yList


def loadTestData(filePath):
    data = pd.read_csv(filepath_or_buffer=filePath)
    data_num = len(data)
    XList = []
    for row in range(0, data_num):
        tmp_list = []
        tmp_list.append(data.iloc[row]['club'])
        tmp_list.append(data.iloc[row]['league'])
        tmp_list.append(data.iloc[row]['potential'])
        tmp_list.append(data.iloc[row]['international_reputation'])
        XList.append(tmp_list)
    return XList


def trainandTest(X_train, y_train, X_test):
    # Train the XGBoost model
    model = xgb.XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=160,
                             silent=False, objective='reg:gamma')
    model.fit(X_train, y_train)

    # Predict on the test set
    ans = model.predict(X_test)
    ans_len = len(ans)
    id_list = np.arange(10441, 17441)
    data_arr = []
    for row in range(0, ans_len):
        data_arr.append([int(id_list[row]), ans[row]])
    np_data = np.array(data_arr)

    # Write the submission file
    pd_data = pd.DataFrame(np_data, columns=['id', 'y'])
    # print(pd_data)
    pd_data.to_csv('submit.csv', index=None)

    # Plot feature importance
    # plot_importance(model)
    # plt.show()


if __name__ == '__main__':
    trainFilePath = 'dataset/soccer/train.csv'
    testFilePath = 'dataset/soccer/test.csv'
    data = loadDataset(trainFilePath)
    X_train, y_train = featureSet(data)
    X_test = loadTestData(testFilePath)
    trainandTest(X_train, y_train, X_test)
```

I submitted the resulting submit.csv to the site: MAE 106.6977, rank 24/28. Far from ideal, but also to be expected, since I had done essentially no feature engineering.

Naturally I was not satisfied and kept wondering how to improve the score. Then it occurred to me to use scikit-learn: its feature-selection module sklearn.feature_selection offers several methods:

- Removing features with low variance
- Univariate feature selection
- Recursive feature elimination
- Feature selection using SelectFromModel

My first idea was univariate feature selection, to pick the few features most correlated with the target. According to the documentation, the available scoring functions for measuring dependence are:

- for regression: f_regression, mutual_info_regression
- for classification: chi2, f_classif, mutual_info_classif

Since this competition is a regression problem I chose f_regression. (At first I did not pay attention and mistakenly used chi2, a classification scoring function, so the program kept throwing errors. Exhausting!)

f_regression's signature:

sklearn.feature_selection.f_regression(X, y, center=True)

X: an array of shape (n_samples, n_features), i.e. one row per training sample and one column per feature.
y: a one-dimensional array of length n_samples.
Return value: the F value and p-value of each feature.

Before running it, however, one important task remains: handling missing values. Fortunately scikit-learn has a dedicated module for this too ("Imputation of missing values"):

sklearn.preprocessing.Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)

strategy is the fill strategy (default 'mean', the mean of the column):
- strategy='median' fills with the column median
- strategy='most_frequent' fills with the column mode

axis defaults to 0:
- axis=0 fills by column
- axis=1 fills by row

See the sklearn.preprocessing.Imputer documentation for the remaining parameters. With all that, I processed the data as follows:

```python
from sklearn.feature_selection import f_regression
from sklearn.preprocessing import Imputer

imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer.fit(data.loc[:, 'rw':'lb'])
x_new = imputer.transform(data.loc[:, 'rw':'lb'])
data_num = len(x_new)
XList = []
yList = []
for row in range(0, data_num):
    XList.append(list(x_new[row]))  # the ten position columns rw..lb
    yList.append(data.iloc[row]['y'])

F = f_regression(XList, yList)
print(len(F))
print(F)
```

Output:

```
2
(array([2531.07587725, 1166.63303449, 2891.97789543, 2531.07587725,
        2786.75491791, 2891.62686404, 3682.42649607, 1394.46743196,
         531.08672792, 1166.63303449]),
 array([0.00000000e+000, 1.74675421e-242, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 1.37584507e-286,
        1.15614152e-114, 1.74675421e-242]))
```

Based on these results I added the features rw, st, lw, cf, cam and cm (those with relatively large F values) to the model. Here is the improved code:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File  : soccer_value.py
# @Author: Huangqinjian
# @Date  : 2018/3/22
# @Desc  :
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb
import numpy as np
from xgboost import plot_importance
from sklearn.preprocessing import Imputer


def loadDataset(filePath):
    df = pd.read_csv(filepath_or_buffer=filePath)
    return df


def featureSet(data):
    imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
    imputer.fit(data.loc[:, ['rw', 'st', 'lw', 'cf', 'cam', 'cm']])
    x_new = imputer.transform(data.loc[:, ['rw', 'st', 'lw', 'cf', 'cam', 'cm']])
    data_num = len(data)
    XList = []
    for row in range(0, data_num):
        tmp_list = []
        tmp_list.append(data.iloc[row]['club'])
        tmp_list.append(data.iloc[row]['league'])
        tmp_list.append(data.iloc[row]['potential'])
        tmp_list.append(data.iloc[row]['international_reputation'])
        tmp_list.append(data.iloc[row]['pac'])
        tmp_list.append(data.iloc[row]['sho'])
        tmp_list.append(data.iloc[row]['pas'])
        tmp_list.append(data.iloc[row]['dri'])
        tmp_list.append(data.iloc[row]['def'])
        tmp_list.append(data.iloc[row]['phy'])
        tmp_list.append(data.iloc[row]['skill_moves'])
        for col in range(6):
            tmp_list.append(x_new[row][col])  # imputed rw, st, lw, cf, cam, cm
        XList.append(tmp_list)
    yList = data.y.values
    return XList, yList


def loadTestData(filePath):
    data = pd.read_csv(filepath_or_buffer=filePath)
    imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
    imputer.fit(data.loc[:, ['rw', 'st', 'lw', 'cf', 'cam', 'cm']])
    x_new = imputer.transform(data.loc[:, ['rw', 'st', 'lw', 'cf', 'cam', 'cm']])
    data_num = len(data)
    XList = []
    for row in range(0, data_num):
        tmp_list = []
        tmp_list.append(data.iloc[row]['club'])
        tmp_list.append(data.iloc[row]['league'])
        tmp_list.append(data.iloc[row]['potential'])
        tmp_list.append(data.iloc[row]['international_reputation'])
        tmp_list.append(data.iloc[row]['pac'])
        tmp_list.append(data.iloc[row]['sho'])
        tmp_list.append(data.iloc[row]['pas'])
        tmp_list.append(data.iloc[row]['dri'])
        tmp_list.append(data.iloc[row]['def'])
        tmp_list.append(data.iloc[row]['phy'])
        tmp_list.append(data.iloc[row]['skill_moves'])
        for col in range(6):
            tmp_list.append(x_new[row][col])  # imputed rw, st, lw, cf, cam, cm
        XList.append(tmp_list)
    return XList


def trainandTest(X_train, y_train, X_test):
    # Train the XGBoost model
    model = xgb.XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=160,
                             silent=False, objective='reg:gamma')
    model.fit(X_train, y_train)

    # Predict on the test set
    ans = model.predict(X_test)
    ans_len = len(ans)
    id_list = np.arange(10441, 17441)
    data_arr = []
    for row in range(0, ans_len):
        data_arr.append([int(id_list[row]), ans[row]])
    np_data = np.array(data_arr)

    # Write the submission file
    pd_data = pd.DataFrame(np_data, columns=['id', 'y'])
    # print(pd_data)
    pd_data.to_csv('submit.csv', index=None)

    # Plot feature importance
    # plot_importance(model)
    # plt.show()


if __name__ == '__main__':
    trainFilePath = 'dataset/soccer/train.csv'
    testFilePath = 'dataset/soccer/test.csv'
    data = loadDataset(trainFilePath)
    X_train, y_train = featureSet(data)
    X_test = loadTestData(testFilePath)
    trainandTest(X_train, y_train, X_test)
```

Submitting again gave MAE 42.1227, rank 16/28. A big improvement, but still far from first place, so more work is needed.

Next, the work_rate_att and work_rate_def fields. These two fields are labels ('Low'/'Medium'/'High') and must be encoded before they can go into the model, using sklearn.preprocessing.LabelEncoder:

```python
le = preprocessing.LabelEncoder()
le.fit(['Low', 'Medium', 'High'])
att_label = le.transform(data.work_rate_att.values)
# print(att_label)
def_label = le.transform(data.work_rate_def.values)
# print(def_label)
```

You can also handle discrete features directly with pandas; see "using get_dummies for one-hot encoding". Incidentally, scikit-learn has a method for this as well: sklearn.preprocessing.OneHotEncoder. The adjusted code:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File  : soccer_value.py
# @Author: Huangqinjian
# @Date  : 2018/3/22
# @Desc  :
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn import preprocessing
import numpy as np
from xgboost import plot_importance
from sklearn.preprocessing import Imputer


def featureSet(data):
    imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
    imputer.fit(data.loc[:, ['rw', 'st', 'lw', 'cf', 'cam', 'cm']])
    x_new = imputer.transform(data.loc[:, ['rw', 'st', 'lw', 'cf', 'cam', 'cm']])
    le = preprocessing.LabelEncoder()
    le.fit(['Low', 'Medium', 'High'])
    att_label = le.transform(data.work_rate_att.values)
    # print(att_label)
    def_label = le.transform(data.work_rate_def.values)
    # print(def_label)
    data_num = len(data)
    XList = []
    for row in range(0, data_num):
        tmp_list = []
        tmp_list.append(data.iloc[row]['club'])
        tmp_list.append(data.iloc[row]['league'])
        tmp_list.append(data.iloc[row]['potential'])
        tmp_list.append(data.iloc[row]['international_reputation'])
        tmp_list.append(data.iloc[row]['pac'])
        tmp_list.append(data.iloc[row]['sho'])
        tmp_list.append(data.iloc[row]['pas'])
        tmp_list.append(data.iloc[row]['dri'])
        tmp_list.append(data.iloc[row]['def'])
        tmp_list.append(data.iloc[row]['phy'])
        tmp_list.append(data.iloc[row]['skill_moves'])
        for col in range(6):
            tmp_list.append(x_new[row][col])  # imputed rw, st, lw, cf, cam, cm
        tmp_list.append(att_label[row])
        tmp_list.append(def_label[row])
        XList.append(tmp_list)
    yList = data.y.values
    return XList, yList


def loadTestData(filePath):
    data = pd.read_csv(filepath_or_buffer=filePath)
    imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
    imputer.fit(data.loc[:, ['rw', 'st', 'lw', 'cf', 'cam', 'cm']])
    x_new = imputer.transform(data.loc[:, ['rw', 'st', 'lw', 'cf', 'cam', 'cm']])
    le = preprocessing.LabelEncoder()
    le.fit(['Low', 'Medium', 'High'])
    att_label = le.transform(data.work_rate_att.values)
    # print(att_label)
    def_label = le.transform(data.work_rate_def.values)
    # print(def_label)
    data_num = len(data)
    XList = []
    for row in range(0, data_num):
        tmp_list = []
        tmp_list.append(data.iloc[row]['club'])
        tmp_list.append(data.iloc[row]['league'])
        tmp_list.append(data.iloc[row]['potential'])
        tmp_list.append(data.iloc[row]['international_reputation'])
        tmp_list.append(data.iloc[row]['pac'])
        tmp_list.append(data.iloc[row]['sho'])
        tmp_list.append(data.iloc[row]['pas'])
        tmp_list.append(data.iloc[row]['dri'])
        tmp_list.append(data.iloc[row]['def'])
        tmp_list.append(data.iloc[row]['phy'])
        tmp_list.append(data.iloc[row]['skill_moves'])
        for col in range(6):
            tmp_list.append(x_new[row][col])  # imputed rw, st, lw, cf, cam, cm
        tmp_list.append(att_label[row])
        tmp_list.append(def_label[row])
        XList.append(tmp_list)
    return XList


def trainandTest(X_train, y_train, X_test):
    # Train the XGBoost model
    model = xgb.XGBRegressor(max_depth=6, learning_rate=0.05, n_estimators=500,
                             silent=False, objective='reg:gamma')
    model.fit(X_train, y_train)

    # Predict on the test set
    ans = model.predict(X_test)
    ans_len = len(ans)
    id_list = np.arange(10441, 17441)
    data_arr = []
    for row in range(0, ans_len):
        data_arr.append([int(id_list[row]), ans[row]])
    np_data = np.array(data_arr)

    # Write the submission file
    pd_data = pd.DataFrame(np_data, columns=['id', 'y'])
    # print(pd_data)
    pd_data.to_csv('submit.csv', index=None)

    # Plot feature importance
    # plot_importance(model)
    # plt.show()


if __name__ == '__main__':
    trainFilePath = 'dataset/soccer/train.csv'
    testFilePath = 'dataset/soccer/test.csv'
    data = pd.read_csv(trainFilePath)
    X_train, y_train = featureSet(data)
    X_test = loadTestData(testFilePath)
    trainandTest(X_train, y_train, X_test)
```

This time the score only improved to 40.8686. I am out of ideas for now, so suggestions are very welcome! For more content, follow my personal WeChat account.
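A follow-up thought, not from the original post: instead of eyeballing the printed F values, scikit-learn's SelectKBest can pick the top-k features by the same f_regression score automatically. A minimal sketch on synthetic data (the data here is illustrative, not the competition's):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
# Make the target depend strongly on columns 0 and 3 only.
y = 3 * X[:, 0] + 2 * X[:, 3] + 0.05 * rng.rand(200)

# Keep the two features with the largest f_regression F values.
selector = SelectKBest(score_func=f_regression, k=2)
X_top = selector.fit_transform(X, y)

print(X_top.shape)             # (200, 2)
print(selector.get_support())  # boolean mask over the 5 columns
```

fit_transform both ranks the features and slices the matrix, so the reduced X_top can be fed straight into XGBRegressor without the manual column bookkeeping used above.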
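As for the get_dummies alternative mentioned above: unlike LabelEncoder's integer codes, one-hot columns impose no artificial ordering on categories such as work_rate_att. A minimal sketch (toy data, not the competition's):

```python
import pandas as pd

df = pd.DataFrame({'work_rate_att': ['High', 'Low', 'Medium', 'High']})
# One column per category, named att_<category> (sorted alphabetically).
dummies = pd.get_dummies(df['work_rate_att'], prefix='att')

print(dummies.columns.tolist())  # ['att_High', 'att_Low', 'att_Medium']
print(dummies.astype(int).values.tolist())
# [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The resulting columns can be concatenated onto the feature matrix with pd.concat in place of the att_label / def_label integers.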
